Can We Get Better Assessment From A Tutoring System Compared to Traditional Paper Testing? (Feng & Heffernan, 2010)

Feng, M., & Heffernan, N. (2010). Can We Get Better Assessment From A Tutoring System Compared to Traditional Paper Testing? Can We Have Our Cake (Better Assessment) and Eat It too (Student Learning During the Test)? Intelligent Tutoring Systems (pp. 309–311).

The authors conducted an analysis of ITS data from 1,392 students over two school years, comparing two conditions: a ‘static condition’ in which student data comprised only practice items without intervention; and a ‘dynamic condition’ in which data included information related to whether or not the student sought help when they encountered difficulties. The main goal of this analysis was to investigate whether or not ‘dynamic assessment’ was an efficient and accurate way to assess student learning (based on a year-end state test).

“Dynamic assessment (DA, or sometimes called dynamic testing, Grigorenko & Sternberg, 1998) has been advocated as an interactive approach to conducting assessments to students in the learning systems as it can differentiate student proficiency at the finer grained level. Different from traditional assessment, DA uses the amount and nature of the assistance that students receive which is normally not available in traditional practice test situations as a way to judge the extent of student knowledge limitations.”

Grigorenko and Sternberg (1998) reviewed relevant literature on this topic and expressed enthusiasm for the idea.

Sternberg & Grigorenko (2001, 2002) argued that dynamic tests not only serve to enhance students’ learning of cognitive skills, but also provide more accurate measures of ability to learn than traditional static tests.

Bryant, Brown & Campione (1983) and Campione & Brown (1985) used a graduated prompting procedure to compare traditional testing paradigms against a dynamic testing paradigm. In the dynamic testing paradigm, learners are offered increasingly explicit prewritten hints in response to incorrect responses. They found that student learning gains were less strongly correlated with static ability scores (R = 0.45) than with “dynamic testing” scores (R = 0.60).

However, although DA has been shown to be effective at predicting student performance, it generally takes longer for students to finish a test using the DA approach than using a traditional test.

A computer-based intelligent tutoring system called ASSISTments was used. In this instance ASSISTments presented math problems to students 13 to 16 years old who were in middle or high school. “The hypothesis is that ASSISTments can do a better job of assessing student knowledge limitations than practice tests or other online testing approaches by using the DA approach based on the data collected online.”

The authors compared the same student’s work in two different conditions, ruling out the subject effect. The student’s end of year state accountability test score was used as the measure of student achievement.

  • Simulated static assessment condition (A’): 40 minutes of student work selected from existing log data on only main items. The data for condition A’ included student response data during the first 40 minutes of work on only main problems; all responses and other actions during the DA portion were ignored.
  • Dynamic assessment condition (B’): 40 minutes of work selected from existing log data on both main items and the scaffolding steps and hints. Data for condition B’ included all the responses for main questions and scaffoldings, as well as hint requests.
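The two selections described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogEvent` record, its fields, and the rule that an action started before the 40-minute mark is kept whole are all hypothetical, since the summary does not describe the actual ASSISTments log format or how the cut-off was applied.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

WINDOW_SECONDS = 40 * 60  # 40 minutes of student work


@dataclass
class LogEvent:
    """Hypothetical log record; not the real ASSISTments schema."""
    kind: str                 # "main", "scaffold", or "hint"
    duration_seconds: float   # time the student spent on this action
    correct: Optional[bool]   # None for hint requests


def take_window(events: List[LogEvent], kinds: Set[str],
                window: float = WINDOW_SECONDS) -> List[LogEvent]:
    """Keep events of the given kinds, in order, until `window` seconds
    of those events have accumulated. Events of other kinds are skipped
    and their time is not counted."""
    selected = []
    elapsed = 0.0
    for event in events:
        if event.kind not in kinds:
            continue
        if elapsed >= window:
            break
        # Assumption: an action that starts inside the window is kept whole.
        selected.append(event)
        elapsed += event.duration_seconds
    return selected


def static_condition(events: List[LogEvent]) -> List[LogEvent]:
    """Condition A': main items only; tutoring time is disregarded."""
    return take_window(events, {"main"})


def dynamic_condition(events: List[LogEvent]) -> List[LogEvent]:
    """Condition B': main items, scaffolding steps, and hint requests."""
    return take_window(events, {"main", "scaffold", "hint"})
```

Because time spent on scaffolding and hints counts toward the window only in the dynamic selection, the same log generally yields fewer main items under B’ than under A’.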

The metrics for dynamic testing measure student accuracy, speed, attempts, and help-seeking behaviors. Condition A’ used only the first metric as a predictor, simulating paper practice tests by scoring students as either correct or incorrect on each main problem, while condition B’ used all of the metrics.

  • Main_Percent_Correct – students’ percent correct on main questions; often referred to as the “static metric”.
  • Main_Count – the number of main items students completed. This measures students’ attendance and how on-task they are. It also reflects students’ knowledge, since stronger students can finish more items in the same amount of time. This is especially true for condition B’, where students’ work on scaffolding also counted toward the 40 minutes of work. In condition A’, low-performing students could go through many items while giving wrong answers, since the time they spent in the tutoring session is disregarded.
  • Scaffold_Percent_Correct – students’ percent correct on scaffolding questions. In addition to original items, students’ performance on scaffolding questions was also a reasonable reflection of their knowledge. For instance, two students who get the same original item wrong may, in fact, have different knowledge levels and this may be reflected in that one may do better on scaffolding questions than the other.
  • Avg_Hint_Request – the average number of hint requests per question.
  • Avg_Attempt – the average number of attempts students made for each question.
  • Avg_Question_Time – on average, how long it takes for a student to answer a question, whether original or scaffolding, measured in seconds.
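As a rough illustration, the six metrics above could be computed from per-question records along the following lines. The `QuestionRecord` type and its fields are hypothetical, and returning 0.0 for a percentage when a student has no questions of that type is an assumption of this sketch, not something the summary specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuestionRecord:
    """Hypothetical per-question summary for one student."""
    is_main: bool     # True for a main item, False for a scaffolding question
    correct: bool     # whether the question was scored correct
    hints: int        # number of hint requests on this question
    attempts: int     # number of attempts on this question
    seconds: float    # time spent on this question


def percent_correct(records: List[QuestionRecord]) -> float:
    # Assumption: report 0.0 when there are no records of this type.
    if not records:
        return 0.0
    return 100.0 * sum(r.correct for r in records) / len(records)


def average(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def dynamic_metrics(records: List[QuestionRecord]) -> Dict[str, float]:
    """Compute the six metrics over main and scaffolding questions."""
    main = [r for r in records if r.is_main]
    scaffold = [r for r in records if not r.is_main]
    return {
        "Main_Percent_Correct": percent_correct(main),
        "Main_Count": len(main),
        "Scaffold_Percent_Correct": percent_correct(scaffold),
        "Avg_Hint_Request": average([r.hints for r in records]),
        "Avg_Attempt": average([r.attempts for r in records]),
        "Avg_Question_Time": average([r.seconds for r in records]),
    }
```

Under this sketch, condition A’ would use only the `Main_Percent_Correct` entry, while condition B’ would use the whole dictionary.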

Stepwise linear regression was used to predict student state test scores. For all the models, the dependent variable was the state test score. For condition A’, the only independent variable was Main_Percent_Correct; for condition B’, the candidate independent variables were the full collection of metrics: Main_Percent_Correct, Main_Count, Scaffold_Percent_Correct, Avg_Hint_Request, Avg_Attempt, Avg_Question_Time.
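To make the idea of selecting predictors step by step concrete, here is a minimal pure-Python sketch of greedy forward selection. It is a simplification, not the authors' procedure: stepwise regression in statistical packages usually adds and removes predictors based on significance tests, whereas this sketch adds whichever candidate raises R² the most and stops when the gain falls below a threshold. The function names and the `min_gain` parameter are hypothetical; in the setting described here, the candidates would be the six metrics and `y` the state test scores.

```python
from typing import Dict, List, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors: List[List[float]], y: List[float]) -> float:
    """Fit y ~ intercept + predictors by least squares; return R^2."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(candidates: Dict[str, List[float]], y: List[float],
                   min_gain: float = 0.01) -> Tuple[List[str], float]:
    """Greedily add the predictor that raises R^2 most; stop when the
    best available gain is below `min_gain`."""
    chosen: List[str] = []
    best_r2 = 0.0
    remaining = set(candidates)
    while remaining:
        scores = {
            name: r_squared([candidates[c] for c in chosen] + [candidates[name]], y)
            for name in remaining
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        best_r2 = scores[name]
        remaining.remove(name)
    return chosen, best_r2
```

Fitting condition A’ corresponds to calling `r_squared` with the single static predictor, and condition B’ to calling `forward_select` over all candidates; comparing the two R² values mirrors the comparison of the two conditions.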


  • More attempts or more hints on a question correlate with a lower estimated score.
  • The dynamic assessment condition did a significantly better job at predicting state test scores than the control static condition.
  • Dynamic assessment is more efficient than just giving practice test items. DA can assess student math performance just as accurately as a traditional practice test, even when controlling for testing time.


Brown, A. L., Bryant, N.R., & Campione, J. C. (1983). Preschool children’s learning and transfer of matrices problems: Potential for improvement. Paper presented at the Society for Research in Child Development meetings, Detroit.

Campione, J.C., Brown, A.L., & Bryant, N.R. (1985). Individual differences in learning and memory. In R.J. Sternberg (Ed.). Human abilities: An information-processing approach, 103–126. New York: W.H. Freeman.

Campione, J.C., & Brown, A.L. (1985). Dynamic assessment: One approach and some initial data. Technical Report No. 361. Cambridge, MA: Illinois University, Urbana. Center for the Study of Reading. ED269735

Feng, M., Heffernan, N.T., & Koedinger, K.R. (2009). Addressing the assessment challenge in an online system that tutors as it assesses. User Modeling and User-Adapted Interaction: The Journal of Personalization Research, 19(3).

Feng, M., Heffernan, N., Beck, J., & Koedinger, K. (2008). Can we predict which groups of questions students will learn from? In Beck & Baker (Eds.). Proceedings of the 1st International Conference on Educational Data Mining. Montreal, 2008.

Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Addressing the testing challenge with a web based E-assessment system that tutors as it assesses. Proceedings of the 15th Annual World Wide Web Conference. ACM Press: New York.

Fuchs, L.S., Compton, D.L., Fuchs, D., Hollenbeck, K.N., Craddock, C.F., & Hamlett, C.L. (2008). Dynamic assessment of algebraic learning in predicting third graders’ development of mathematical problem solving. Journal of Educational Psychology, 100(4), 829–850.

Fuchs, D., Fuchs, L.S., Compton, D.L., Bouton, B., Caffrey, E., & Hill, L. (2007). Dynamic assessment as responsiveness to intervention. Teaching Exceptional Children, 39(5), 58–63.

Grigorenko, E.L., & Sternberg, R.J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–111.

Sternberg, R.J., & Grigorenko, E.L. (2001). All testing is dynamic testing. Issues in Education, 7, 137–170.

Sternberg, R.J., & Grigorenko, E.L. (2002). Dynamic testing: The nature and measurement of learning potential. Cambridge, England: Cambridge University Press.

