We use a unique national dataset of 41,359 college applications to prospectively predict 4-year bachelor's graduation in a generalizable manner. Our features include sociodemographics, institutional graduation rates, academic achievement, standardized test scores, engagement in extracurricular activities, work experiences, and ratings by teachers and high-school guidance counselors. Using all 166 of these features and a split-half validation method, a random forest classifier correctly classified 4-year graduation status for 71.4% of the students (base rate = 44%). A stochastic hill-climbing feature selection procedure maintained the same classification accuracy with a minimal set of 37 features, drawn in approximately equal proportion from sociodemographic, cognitive, and noncognitive factors. We advocate against using these results for admissions decisions and instead consider how they might be used to provide parents and educators with actionable information to guide students towards college success.
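The abstract does not spell out the details of the feature selection procedure, so the following is only a minimal sketch of one common form of stochastic hill climbing over feature subsets. Everything named here is an assumption for illustration: the function `hill_climb_select`, the acceptance rule (keep a move if the score improves, or ties with fewer features), and especially `toy_score`, a synthetic stand-in for what would in practice be a held-out accuracy such as a split-half random forest score.

```python
import random

def hill_climb_select(n_features, score, n_iter=2000, seed=0):
    """Stochastic hill climbing over feature subsets (illustrative sketch).

    Starts from the full feature set. At each step one randomly chosen
    feature is toggled in or out; the move is kept if the score improves,
    or if the score ties and the candidate subset is smaller.
    """
    rng = random.Random(seed)
    current = set(range(n_features))
    best = score(current)
    for _ in range(n_iter):
        f = rng.randrange(n_features)
        candidate = current ^ {f}  # toggle feature f
        if not candidate:
            continue  # never evaluate an empty feature set
        s = score(candidate)
        if s > best or (s == best and len(candidate) < len(current)):
            current, best = candidate, s
    return current, best

# Hypothetical toy setup: 20 features, of which 5 carry all the signal.
# This is synthetic and unrelated to the study's data.
INFORMATIVE = {0, 1, 2, 3, 4}

def toy_score(subset):
    """Stand-in for validation accuracy: share of informative features kept."""
    return len(subset & INFORMATIVE) / len(INFORMATIVE)

selected, best = hill_climb_select(20, toy_score)
print(sorted(selected), best)
```

In this toy setting, dropping an uninformative feature never lowers the score, so the search shrinks the subset while holding the score constant, which mirrors the idea of keeping accuracy with far fewer features. With a real, noisy accuracy estimate the tie rule would rarely fire, and a tolerance on the score would be a more realistic choice.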