Machine learning in higher education – McKinsey

Many higher-education institutions are now using data and analytics as an integral part of their processes. Whether the goal is to identify and better support pain points in the student journey, more efficiently allocate resources, or improve student and faculty experience, institutions are seeing the benefits of data-backed solutions.
This article is a collaborative effort by Claudio Brasca, Nikhil Kaithwal, Charag Krishnan, Monatrice Lam, Jonathan Law, and Varun Marya, representing views from McKinsey’s Public & Social Sector Practice.
Those at the forefront of this trend are focusing on harnessing analytics to increase program personalization and flexibility, as well as to improve retention by identifying students at risk of dropping out and reaching out proactively with tailored interventions. Indeed, data science and machine learning may unlock significant value for universities by ensuring resources are targeted toward the highest-impact opportunities to improve access for more students, as well as student engagement and satisfaction.
For example, Western Governors University in Utah is using predictive modeling to improve retention by identifying at-risk students and developing early-intervention programs. Initial efforts raised the graduation rate for the university’s four-year undergraduate program by five percentage points between 2018 and 2020.1
Yet higher education is still in the early stages of data capability building. With universities facing many challenges (such as financial pressures, the demographic cliff, and an uptick in student mental-health issues) and a variety of opportunities (including reaching adult learners and scaling online learning), expanding use of advanced analytics and machine learning may prove beneficial.
Below, we share some of the most promising use cases for advanced analytics in higher education to show how universities are capitalizing on those opportunities to overcome current challenges, both enabling access for many more students and improving the student experience.
Data science and machine learning may unlock significant value for universities by ensuring resources are targeted toward the highest-impact opportunities to improve access for more students, as well as student engagement and satisfaction.
Advanced-analytics techniques may help institutions unlock significantly deeper insights into their student populations and identify more nuanced risks than they could achieve through descriptive and diagnostic analytics, which rely on linear, rule-based approaches (Exhibit 1).
Advanced analytics—which uses the power of algorithms such as gradient boosting and random forest—may also help institutions address inadvertent biases in their existing methods of identifying at-risk students and proactively design tailored interventions to mitigate the majority of identified risks.
For instance, institutions using linear, rule-based approaches look at indicators such as low grades and poor attendance to identify students at risk of dropping out; institutions then reach out to these students and launch initiatives to better support them. While such initiatives may be of use, they often are implemented too late and only target a subset of the at-risk population. This approach could be a good makeshift solution for two problems facing student success leaders at universities. First, there are too many variables that could be analyzed to indicate risk of attrition (such as academic, financial, and mental health factors, and sense of belonging on campus). Second, while it’s easy to identify notable variance on any one or two variables, it is challenging to identify nominal variance on multiple variables. Linear, rule-based approaches therefore may fail to identify students who, for instance, may have decent grades and above-average attendance but who have been struggling to submit their assignments on time or have consistently had difficulty paying their bills (Exhibit 2).
A machine-learning model could address both of the challenges described above. Such a model looks at ten years of data to identify factors that could help a university make an early determination of a student’s risk of attrition. For example, did the student change payment methods on the university portal? How close to the due date does the student submit assignments? Once the institution has identified students at risk, it can proactively deploy interventions to retain them.
Though many institutions recognize the promise of analytics for personalizing communications with students, increasing retention rates, and improving student experience and engagement, institutions could be using these approaches for the full range of use cases across the student journey—for prospective, current, and former students alike.
For instance, advanced analytics can help institutions identify which high schools, zip codes, and counties they should focus on to reach prospective students who are most likely to be great fits for the institution. Machine learning could also help identify interventions and support that should be made available to different archetypes of enrolled students to help measure and increase student satisfaction. These use cases could then be extended to providing students support with developing their skills beyond graduation, enabling institutions to provide continual learning opportunities and to better engage alumni. As an institution expands its application and coverage of advanced-analytics tools across the student life cycle, the model gets better at identifying patterns, and the institution can take increasingly granular interventions and actions.
Institutions will likely want to adopt a multistep model to harness machine learning to better serve students. For example, for efforts aimed at improving student completion and graduation rates, the following five-step technique could generate immense value:
Institutions could deploy this model at a regular cadence to identify students who would most benefit from additional support.
Institutions could also create similar models to address other strategic goals or challenges, including lead generation and enrollment. For example, institutions could, as a first step, analyze 100 or more attributes from years of historical data to understand the characteristics of applicants who are most likely to enroll.
Institutions will likely want to adopt a multistep model to harness machine learning to better serve students.
The experiences of two higher education institutions that leaned on advanced analytics to improve enrollment and retention reveal the impact such efforts can have.
One private nonprofit university had recently enrolled its largest freshman class in history and was looking to increase its enrollment again. The institution wanted to both reach more prospective first-year undergraduate students who would be a great fit for the institution and improve conversion in the enrollment journey in a way that was manageable for the enrollment team without significantly increasing investment and resources. The university took three important actions:
For this institution, advanced-analytics modeling had immediate implications and impact. The initiative also suggested future opportunities for the university to serve more freshmen with greater marketing efficiency. When initially tested against leads for the subsequent fall (prior to the application deadline), the model accurately predicted 85 percent of candidates who submitted an application, and it predicted the 35 percent of applicants at that point in the cycle who were most likely to enroll, assuming no changes to admissions criteria (Exhibit 3). The enrollment management team is now able to better prioritize its resources and time on high-potential leads and applicants to yield a sizable class. These new capabilities will give the institution the flexibility to make strategic choices; rather than focus primarily on the size of the incoming class, it may ensure the desired class size while prioritizing other objectives, such as class mix, financial-aid allocation, or budget savings.
Similar to many higher-education institutions during the pandemic,2 one online university was facing a significant downward trend in student retention. The university explored multiple options and deployed initiatives spearheaded by both academic and administrative departments, including focus groups and nudge campaigns, but the results fell short of expectations.
The institution wanted to set a high bar for student success and achieve marked and sustainable improvements to retention. It turned to an advanced-analytics approach to pursue its bold aspirations.
To build a machine-learning model that would allow the university to identify students at risk of attrition early, it first analyzed ten years of historical data to understand key characteristics that differentiate students who were most likely to continue—and thus graduate—compared with those who unenrolled. After validating that the initial model was multiple times more effective at predicting retention than the baseline, the institution refined the model and applied it to the current student population. This attrition model yielded five at-risk student archetypes, three of which were counterintuitive to conventional wisdom about what typical at-risk student profiles look like (Exhibit 4).
Together, these three counterintuitive archetypes of at-risk students—which would have been omitted using a linear analytics approach—account for about 70 percent of the students most likely to discontinue enrollment. The largest group of at-risk individuals (accounting for about 40 percent of the at-risk students identified) were distinctive academic achievers with an excellent overall track record. This means the model identified at least twice as many students at risk of attrition than models based on linear rules. The model outputs have allowed the university to identify students at risk of attrition more effectively and strategically invest in short- and medium-term initiatives most likely to drive retention improvement.
With the model and data on at-risk student profiles in hand, the online university launched a set of targeted interventions focused on providing tailored support to students in each archetype to increase retention. Actions included scheduling more touchpoints with academic and career advisers, expanding faculty mentorship, and creating alternative pathways for students to satisfy their knowledge gaps.
Advanced analytics is a powerful tool that may help higher-education institutions overcome the challenges facing them today, spur growth, and better support students. However, machine learning is complex, with considerable associated risks. While the risks vary based on the institution and the data included in the model, higher-education institutions may wish to take the following steps when using these tools:
While many higher-education institutions have started down the path to harnessing data and analytics, there is still a long way to go to realizing the full potential of these capabilities in terms of the student experience. The influx of students and institutions that have been engaged in online learning and using technology tools over the past two years means there is significantly more data to work with than ever before; higher-education institutions may want to start using it to serve students better in the years to come.
Claudio Brasca is a partner in McKinsey’s San Francisco office, where Varun Marya is a senior partner; Nikhil Kaithwal is an associate partner in the London office, Charag Krishnan is a partner in the New Jersey–Summit office, Monatrice Lam is a consultant in the Bay Area–Silicon Valley office, and Jonathan Law is a senior partner in the Southern California office.
The authors wish to thank Inès Garceau-Aranda, Emily Cohen, Katie Owen, Xiaowo Sun, Xuecong Sun, and Shyla Ziade for their contributions to this article.


Leave a Comment