The Human Element and the Power of Big Data in Higher Education
“Big Data” was a research idea before it was the next big thing in the practical world of IT. In fact, researchers who incubated the ideas behind Big Data were surprised by, and somewhat suspicious of, the speed at which the predictive power of massive data sets came to dominate C-suite conversations in industries as diverse as consumer retailing and public health. I know those conversations well, because the computer science research lab I directed at Bellcore (the Bell Labs spin-off later known as Telcordia Technologies) was the birthplace of patents like the social filtering engine that quickly became the heart of Amazon’s recommender technology in the 1990s. Others noticed that clusters of statistically similar online behaviors were very good predictors of new behavior. Patterns of past banking transactions, for example, could be used to predict with uncanny accuracy whether a customer was about to switch banks. The more data in the cluster, the better the predictions. These were the early days of e-commerce, and the concept of “more data” was still moored in the ancient, manual world of pencil-and-paper data. A million data points was huge. Such was the world before Google.
Over the next decade, the scale by which we measure “big” exploded. Every leap forward (a billion- to trillion-fold, from megabytes and gigabytes to exabytes) opened new ways in which machines could draw startling conclusions from data, further convincing decision makers that the behavior of large groups is a very good indication of how an individual will behave. The accuracy of voter models used by Barack Obama’s presidential campaigns, the success of IBM’s Watson in beating human competitors at “Jeopardy!,” and a dozen other highly visible projects impressed on popular culture the potential of Big Data for changing our conception of the world and people around us. Critics who loudly complained that there was such a thing as too much data were unwitting accomplices in spreading the religion of predictive analytics.
It is curious that the potential impact of such data-intensive methods on higher education was so slow to be recognized. My theory is that it was not so much a problem of recognition as an unsettling realization of what the future might have in store. Educators, who over the previous century invested heavily in the massification of higher ed, knew that these new methods were problematic at many levels. Accurate analytics enabled personalized education at scale, for instance. But as the historian of higher education Frederick Rudolph pointed out fifty years ago, higher education was built on a factory model in which raw materials moved in lockstep through a set of manufacturing steps designed to produce graduates who had been subjected to equal doses of Carnegie units of learning, all under the watchful eye of quality control specialists who rejected defective products.
Psychologists like Benjamin Bloom undercut the factory model and established the primacy of personalization in education. If personalization were practical, most of the rationale for standardized testing, accreditation, and other linchpins of higher ed would disappear. Even worse, many of the methods for amassing large educational data sets involved online course delivery, a prospect abhorred by many critics who were already attacking MOOCs as a dehumanizing industrialization of the classroom experience. One can imagine the internal struggles to reconcile the desire for a small, intimate learning experience with the possibility that the path to better learning might run directly through virtual classrooms containing thousands or tens of thousands of students, or through statistical analysis of student records. Nevertheless, it was soon apparent that technology conceived to use business intelligence to improve efficiency and effectiveness might also help educators personalize the learning experience, something that the factory model was not able to do.
Adapting Big Data to higher education was bolstered by the peculiar nature of online instruction. Unlike online consumers, who might spend a few minutes purchasing a book, making a dinner reservation, or paying a credit card bill, students engage with their learning materials for hours. They watch and re-watch videos, take and re-take online quizzes, and use social networks in place of in-person recitations. On a relative scale, education generates massive amounts of data—a hundred million times more data than e-commerce according to some estimates. As I explained in my book Revolution in Higher Education, this scale is what makes Big Data particularly effective in education.
This advantage was apparent immediately. Knewton, one of the earliest tech firms specializing in Big Data for learning analytics, formed a partnership with Arizona State University to visualize the progress of ASU’s Math Readiness students. When co-founder Jose Ferreira demonstrated Knewton’s visualization engine at a 2012 invited meeting of higher education leaders in Washington, audience members could see for the first time how the ordering of small learning modules affected student comprehension and progress through course material. Some learners sped through the material in a straight line, some slowed considerably to return to concepts they had not yet mastered, and others failed to make any progress at all. It was obvious to all that professors guiding the course would be able to use this information to make real-time adjustments in teaching, creating personalized learning pathways. Within two semesters, success rates rose from 64 to 75 percent, while dropout rates were cut in half.
At about the same time, Georgia State University was building predictive models of student success based on a data set of millions of student records amassed over the course of 10 years. Under the leadership of GSU Vice Provost Tim Renick, the university used data-driven analytical models to flag students whose progress toward a degree might be derailed without intervention from an advisor. Over a five-year period, graduation rates were boosted by 6 percent and the average time to degree completion was reduced by half a semester. The completion gap for low-income, minority, and first-generation students virtually disappeared. The return on investment for using the technology funded the expansion in advising, so these quality improvements cost the university literally nothing.
In January of this year, Georgia Tech launched a completely online CS 1301 course offering for undergraduates hosted by edX and powered by McGraw-Hill’s SmartBook adaptive technology. Taught by online Master of Science in Computer Science (OMS CS) instructor and Udacity course developer David Joyner and incubated in Georgia Tech’s Center for 21st Century Universities, this new course is designed to utilize the best in technology and student data analysis to create a better and more accessible introductory computer science course for Georgia Tech students and, ultimately, learners all over the world. As universities embrace the operational efficiency and educational effectiveness enabled by Big Data, projects like this will lead the way in technology-enabled personalized learning pathways.
There are dozens of large-scale projects like these underway. Big Data empowers higher education to not only envision but also enable a world in which the individualized academic needs of students are understood and engaged through technology. While some fear the dehumanizing impact of technology, I think the opposite result is far more likely. Students who were not succeeding before, and those with lower grades who may previously have skated by unnoticed, are supported and empowered to succeed, which improves their educational trajectory and brings more frequent human engagement along the way. Not only that, but universities have been able to streamline resources while simultaneously improving outcomes. The power of Big Data in higher education lies in technology’s ability to personalize learning and to create an improved, more “human” experience than we’ve seen in outmoded factory models of education.
Author Perspective: Administrator