Personalized Content Recommendation At Scale: Issues and Approaches
As learning increasingly moves online and the cost-effectiveness of higher education undergoes more scrutiny, it becomes more important to find ways to identify struggling students and point them toward helpful content and formative assessment items. This is particularly true for a large online university like Western Governors University (WGU), where we have addressed seven key issues in building out our personalized curriculum recommendation capabilities.
1. Agreeing on what personalization means
Some attempts at personalization simply look at user preferences: the student says she prefers videos, so we show her more videos. Although some call this approach personalization, we consider it customization, modifying the user experience based on settings the user selects. Other approaches employ actual personalization by modifying the user experience based on user behavior. For example, if a student’s behavior indicates she tends to reread text rather than watch videos, text is emphasized over video.
Even though this latter approach does meet the definition of personalization, neither it nor the customization approach is likely to improve student outcomes. A key indicator of a successful outcome is the Course Completion Rate (CCR): the number of students who complete a course during a term divided by the number enrolled in that course during the term. To move the needle on outcomes like CCR, we want to identify behaviors that, for a particular student profile, are associated with better outcomes. This third definition of personalization is foundational to our approach at WGU.
2. Education is not a typical recommender system problem
When people think about recommendations, they often think about an e-commerce recommender system that says, “People who buy A also buy B.” That problem is often solved by a type of modelling called collaborative filtering, which identifies behaviors that tend to occur together across similar users. The problem of recommending content to improve course completion is much more complex.
If we used collaborative filtering, we might end up with recommendations like, “Students who don’t read chapter three also don’t read chapter four,” which would not be very helpful. This is like trying to improve someone’s health by saying, “People who eat potato chips also eat pork rinds.” Instead, we want to say, “People who eat potato chips should eat Brussels sprouts.” We do so by identifying groups of students who share certain similarities, like factors that predict completion, then comparing an individual student’s behavior to students like them who passed the course in previous terms, allowing us to identify the gaps and intervene. This concept of comparing an individual to students like them is foundational to our content recommendation systems.
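A minimal sketch of this peer-gap comparison follows. All activity names and counts are made up for illustration; they are not our actual course data.

```python
# Minimal sketch of gap-based recommendation: compare one student's activity
# to the average activity of similar students who passed the course in prior
# terms, and surface the activities with the largest shortfall.
# All names and numbers below are illustrative.

def behavior_gaps(student, passed_peers):
    """Return activities sorted by how far the student trails the peer average."""
    gaps = {}
    for activity in passed_peers[0]:
        peer_avg = sum(p[activity] for p in passed_peers) / len(passed_peers)
        shortfall = peer_avg - student.get(activity, 0)
        if shortfall > 0:
            gaps[activity] = shortfall
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical activity counts for a struggling student and three similar
# students who completed the course.
student = {"ch3_pageviews": 2, "ch4_pageviews": 0, "quiz1_attempts": 1}
peers = [
    {"ch3_pageviews": 10, "ch4_pageviews": 8, "quiz1_attempts": 3},
    {"ch3_pageviews": 12, "ch4_pageviews": 6, "quiz1_attempts": 2},
    {"ch3_pageviews": 8,  "ch4_pageviews": 10, "quiz1_attempts": 4},
]

for activity, shortfall in behavior_gaps(student, peers):
    print(f"Recommend more work on {activity} (gap: {shortfall:.1f})")
```

The key design point is that the comparison group is restricted to students who both resemble the individual and succeeded, so the recommendation says “do what worked for students like you,” not merely “do what students like you do.”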
3. Availability of LR data
Our institution uses dozens of learning resource (LR) content providers. Creating accurate student behavior models is much easier when data is delivered at the event level, such as the date and time of an individual pageview or question response. However, not all vendors provide this level of detail. Another challenge with accessing LR data is getting a timely and reliable feed from the vendor. In the past, LR data might have been used for research purposes where timeliness was not as important. Today, LR data is used for mission-critical applications to provide real-time advice to students or faculty. We actively work with our vendors to improve the breadth and frequency of the data we receive.
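To illustrate the difference, the sketch below contrasts an event-level record with an aggregate-only feed. The field names are hypothetical, not any vendor’s actual schema.

```python
# Illustrative contrast between event-level and aggregate-only LR data.
# The record fields below are hypothetical; actual vendor feeds vary widely.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LearningEvent:
    """One event-level record: a single pageview or question response."""
    student_id: str
    course_id: str
    event_type: str                 # e.g. "pageview" or "question_response"
    resource_id: str                # page or assessment-item identifier
    timestamp: datetime
    correct: Optional[bool] = None  # only meaningful for question responses

events = [
    LearningEvent("s1", "C101", "pageview", "ch3_p12",
                  datetime(2022, 3, 1, 9, 15)),
    LearningEvent("s1", "C101", "question_response", "q7",
                  datetime(2022, 3, 1, 9, 40), correct=False),
]

# Event-level data lets us derive any aggregate we need after the fact,
# whereas an aggregate-only feed fixes one summary and discards the rest.
pageviews = sum(1 for e in events if e.event_type == "pageview")
print(pageviews)  # 1
```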
4. Supervised clustering problem
We want to cluster students in a way that not only creates groups that are similar in the characteristics that define their learning but that also differ on the key success measure. For example, suppose we found clusters of students defined by first-generation status, math readiness and incoming GPA, but the resulting clusters all had the exact same CCR of 85%. Could we really say these are different types of students? The challenge for data scientists is using an unsupervised modelling method like clustering in a supervised way, so that the resulting clusters differ on some exogenous variable; in this case, the CCR. We have found highly significant differences in CCR by cluster by first using a predictive modelling technique called a Generalized Additive Model (GAM) for feature selection, then applying hierarchical or k-means clustering to students on the important variables the GAM identifies.
5. Scalable across dozens or hundreds of courses
We have over 500 active courses at WGU. Because of that, the methods we use need to be highly automated, both to create the initial models and to update them as students move through the course and term. Therefore, we use machine learning methods to identify which modules and activities most predict completion. For example, it could be the number of page views, number of questions attempted or the percentage of those questions the student answered correctly. We also use scalable systems to evaluate dozens of model types during the model-building process, such as logistic regression, neural networks and decision trees, and scale that process to hundreds of courses.
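One way this per-course model-evaluation loop might be sketched, with synthetic course data and an illustrative candidate list:

```python
# Sketch of automated per-course model selection: for every course, evaluate
# several model families with cross-validation and keep the best scorer.
# The course data is synthetic; in practice the features would be per-module
# activity counts and formative-assessment results.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

CANDIDATES = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "decision_tree": lambda: DecisionTreeClassifier(max_depth=4),
    "neural_network": lambda: MLPClassifier(hidden_layer_sizes=(16,),
                                            max_iter=500),
}

def best_model_for_course(X, y):
    """Return (name, mean CV accuracy) of the best candidate for one course."""
    scores = {name: cross_val_score(make(), X, y, cv=5).mean()
              for name, make in CANDIDATES.items()}
    return max(scores.items(), key=lambda kv: kv[1])

rng = np.random.default_rng(42)
for course in ["C101", "C202"]:  # in production this loops over 500+ courses
    X = rng.normal(size=(200, 6))            # e.g. pageviews, attempts, scores
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # completion label
    name, acc = best_model_for_course(X, y)
    print(f"{course}: best model = {name} (CV accuracy {acc:.2f})")
```

The same loop re-runs as new behavioral data arrives during the term, so each course’s model stays current without hand-tuning.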
6. Detecting rare but important triggering events for intervention
In addition to providing a course-level big picture of where student effort and formative success are strong or weak relative to successful, similar students, we also want to look for relatively small and rare microevents that act as predictors of low course completion. These will allow us to intervene with page- and even paragraph-level advice for a student struggling with a specific concept. For example, a student just viewed page 44, “DHTML vs. HTML,” seven times in a single day, a behavior that tends to predict low course completion and indicates that an intervention is required.
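A minimal sketch of one such trigger detector follows; the threshold, page names and event log are illustrative, and in practice the threshold would be learned from historical completion data.

```python
# Sketch of microevent detection: scan a day's pageview events for a rare
# pattern (the same student viewing the same page many times in one day)
# that historically predicts low completion. Threshold and data are illustrative.
from collections import Counter

REPEAT_VIEW_THRESHOLD = 7  # hypothetical; tuned per course from historical data

def detect_repeat_view_triggers(events, threshold=REPEAT_VIEW_THRESHOLD):
    """events: iterable of (student_id, page_id, date) pageview tuples."""
    counts = Counter((s, p, d) for s, p, d in events)
    return [(s, p, d, n) for (s, p, d), n in counts.items() if n >= threshold]

events = (
    [("s1", "p44_dhtml_vs_html", "2022-03-01")] * 7    # struggling student
    + [("s2", "p44_dhtml_vs_html", "2022-03-01")] * 2  # normal behavior
)

for student, page, day, n in detect_repeat_view_triggers(events):
    print(f"{student} viewed {page} {n}x on {day}: flag for intervention")
```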
7. Change management for integrating data and faculty judgment
It’s important to find the right balance between using insights gleaned from data and those of faculty members. Insight emerging from data is deep—coming from large sample sizes—but can be narrow in scope, compared to a faculty member’s experience with a student. No matter how good a model is, it will never explain 100% of the variance in student performance. Often, analyses are classified into three groups: descriptive, predictive and prescriptive. But perhaps in this realm we should add a fourth category: assistive. A recent article about Artificial Intelligence (AI) in Wharton Magazine (Fall/Winter 2021) calls this a successful approach in the medical field. It describes efforts where modelling discovers data patterns that may not be apparent to an individual practitioner and should be considered, but where the ultimate decision-making is left to the physician. We are taking an agile approach to development by involving faculty teams in testing various iterations of our recommendation systems as they are being built.
By addressing these challenges, we can create faculty-embraced content and formative item recommendations that are micro or macro in their focus while also being scalable, personalized to the student, interpretable and assistive, and that improve student performance.