Ubiquitous Learning and Instructional Technologies MOOC’s Updates
Essential Update #3
One aspect of big data that is becoming more widely used in education is machine learning in assessments, as discussed in Big Data Comes to School: Implications for Learning, Assessment, and Research (Cope & Kalantzis, 2016). Assessment data is collected via computer adaptive testing (CAT) and natural language processing.
CAT is often used for select-response tests to improve the validity of test items and to provide scores calibrated more accurately to each student's ability. Machine learning supports item development: newly written test items are evaluated against existing ones, giving multiple educators a way to “crowdsource” item creation and validation. CAT also makes it unlikely that any two students receive the same test, since it serves progressively easier or harder questions depending on whether the student answered the previous one correctly; this in turn makes the system harder to game. Examples of such tests include end-of-chapter tests in e-textbooks, quizzes delivered through learning management systems such as Blackboard, and evaluations of student proficiency in various subject areas. CATs are therefore useful for tracking student progress: they identify strengths and weaknesses and provide immediate feedback that both the student and the instructor can use to improve study habits or address knowledge gaps.
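To make the adaptive loop concrete, here is a minimal Python sketch of how a CAT engine might pick the next item. The item bank, the difficulty scale, and the ability-update rule are all invented for illustration and do not represent any particular testing product's algorithm:

```python
import random

# Illustrative item bank: question id -> difficulty on an arbitrary
# easier-to-harder scale (these values are made up for this sketch).
ITEM_BANK = {
    "q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0,
}

def next_item(ability, asked):
    """Serve the unanswered item whose difficulty best matches the
    current ability estimate."""
    remaining = {q: d for q, d in ITEM_BANK.items() if q not in asked}
    return min(remaining, key=lambda q: abs(remaining[q] - ability))

def update_ability(ability, correct, step=0.5):
    """Nudge the estimate up after a correct answer, down after an
    incorrect one (a crude stand-in for a real IRT update)."""
    return ability + step if correct else ability - step

# Simulated session: the test drifts harder or easier with each
# response, so two students rarely see the same question sequence.
ability, asked = 0.0, []
for _ in range(3):
    question = next_item(ability, asked)
    asked.append(question)
    correct = random.random() < 0.7  # placeholder for a real answer
    ability = update_ability(ability, correct)
    print(question, "correct" if correct else "wrong", "-> ability", ability)
```

A production system would replace the fixed step with a proper item-response-theory estimate, but the adaptive structure, matching item difficulty to a running ability estimate, is the same.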
Natural language processing is often used to grade short-answer and essay-length supply-response assessments with reliability equivalent to human graders. This removes the costs associated with human grading, including the time needed to read student responses, rater training, and moderation to ensure inter-rater reliability. In evaluating literacy, writing is used as a proxy more often than reading because of its importance across a range of curriculum areas. Natural language processing works by comparing a response against a “corpus” of validated, human-graded texts, using measures such as statistical similarity and degree of cohesion. However, natural language processing still carries costs of its own, because methods such as statistical corpus comparison and analytical text parsing are not yet perfected. These costs include an inability to provide meaningful feedback and an increased potential for students to game the system by writing longer sentences with more complex but superfluous vocabulary to earn a higher grade. Weighing these benefits and costs suggests that this mode of data collection increases scoring efficiency but still requires human refinement and moderation to provide feedback and catch finer details that the application may miss.
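To illustrate the corpus-comparison idea, here is a minimal sketch using scikit-learn's TF-IDF vectorizer and cosine similarity. The sample corpus, scores, and weighting scheme are invented for this example, and real automated essay scoring systems use far richer features than bag-of-words similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of responses already scored by human graders
# (both the texts and the scores are invented for this example).
graded_corpus = [
    ("Photosynthesis converts light energy into chemical energy "
     "stored in glucose, releasing oxygen as a byproduct.", 5),
    ("Plants use sunlight to make their own food.", 3),
    ("Photosynthesis is when plants breathe air.", 1),
]

student_response = ("Plants capture sunlight and turn it into sugar, "
                    "giving off oxygen in the process.")

# Vectorize the human-graded texts and the new response together.
texts = [text for text, _ in graded_corpus] + [student_response]
tfidf = TfidfVectorizer().fit_transform(texts)

# Statistical similarity between the response and each exemplar.
similarities = cosine_similarity(tfidf[-1], tfidf[:-1])[0]

# Predict a grade as a similarity-weighted average of the human scores.
scores = [score for _, score in graded_corpus]
weight_sum = max(similarities.sum(), 1e-9)  # guard against zero overlap
predicted = sum(w * s for w, s in zip(similarities, scores)) / weight_sum
print(f"predicted score: {predicted:.1f} out of 5")
```

Notably, a naive scheme like this one shows exactly why gaming is possible: padding a response with vocabulary drawn from high-scoring exemplars can inflate the similarity score without actually improving the answer, which is why human moderation remains necessary.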