An Analysis of the Differences in the Frequency of Students’ Disengagement in Urban, Rural, and Suburban High Schools (Baker & Gowda, 2010)

Baker, R. S. J., & Gowda, S. M. (2010). An Analysis of the Differences in the Frequency of Students’ Disengagement in Urban, Rural, and Suburban High Schools. In Baker, R.S.J.d., Merceron, A., Pavlik, P.I. Jr. (Eds.) Proceedings of the 3rd International Conference on Educational Data Mining, 11-20.

The authors examine how students from various school settings (urban, rural, and suburban) differ in the frequency of behaviors signifying disengagement: off-task behavior, gaming the system, and carelessness. Automated detectors of these behaviors were applied to data from students using the geometry Cognitive Tutor software across an entire school year. Students in the urban school showed off-task and careless behaviors significantly more often than students in the rural and suburban schools. Differences between schools in gaming the system were less stable. These findings are discussed in relation to their possible connection to achievement.

“The gaming detector used was trained using data from students using a Cognitive Tutor for Algebra [5], using an age-similar population and an approach validated to generalize between students and between Cognitive Tutor lessons [4].”

“The off-task detector used was trained using data from students using a Cognitive Tutor for Middle School Mathematics. The off-task detector was validated to generalize to new students, and to function accurately in several Cognitive Tutor lessons [2].”

“Carelessness was detected using the slip detector from [3], which was trained on data from Cognitive Tutor Geometry. This use of contextual slip is in line with theoretical work by Clements [9], who argues that making errors despite knowing the skills needed
for successful performance should be considered evidence of carelessness. It is important, however, to note that contextual slip could potentially also be an indicator of shallow knowledge that does not apply to all items in the tutor, even if they are labeled as
involving the same skill.”

Teachers in each school used the software with their students for different amounts of time. This represents a selection bias in the data, but it also reflects natural usage in real-world contexts. To address this bias, the authors analyzed the data in two ways: using all data (the more ecologically valid choice), and using a time-slice consisting of the 3rd through 8th hours (minutes 120-480) of each student’s usage (this time-slice is less representative of usage in each school, but avoids the confound). The authors reasoned that the initial 2 hours likely represent interface learning (which, in turn, depends on prior experience with educational software), and therefore may not be representative of overall tutor use.
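The time-slice selection described above can be sketched as follows. This is a minimal illustration only: the event-log structure and field names are assumptions for the sketch, not drawn from the paper.

```python
# Minimal sketch of the minutes 120-480 time-slice selection.
# Assumes per-student logs of (cumulative_minutes, action) tuples, sorted
# by time; this log format is hypothetical, not from the paper.

def time_slice(events, start_min=120, end_min=480):
    """Keep only events in each student's 3rd-8th hours of tutor usage."""
    sliced = {}
    for student, rows in events.items():
        sliced[student] = [(t, a) for (t, a) in rows
                           if start_min <= t < end_min]
    return sliced

logs = {
    "s1": [(10, "hint"), (130, "attempt"), (500, "attempt")],
    "s2": [(125, "attempt"), (300, "hint")],
}
print(time_slice(logs))
```

The first 2 hours (and anything past hour 8) are simply dropped per student, so students with very different total usage contribute comparable slices.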

The authors recommend a future study comprising a large number of schools, in order to average out these implementation differences across each type of school.

“Automated machine-learned detectors provide an essential tool for analysis of this sort, in this author’s opinion a better tool than existing alternatives. For example, it is not tractable to use observational, text replay annotation, or video methods at this sort of scale. [5] presents a use of text replay methods to analyze a single behavior among 58 students over an entire school year; though text replay methods are significantly faster than live observation or video coding methods, the coding needed for this analysis took over 200 hours. Utilizing text replays to annotate the 3 school sample used in this paper would have taken over 2000 hours, assuming a rate of observation equal to that in [5]. Video coding and field observation would have taken even longer.

That said, it is worth noting that automated detectors have important challenges not present when using human labels. It is important to validate the generalizability of detectors across students, schools, and learning materials, a task which has been only
partially completed for the detectors used in this paper, and which has received insufficient attention in the literature in general. Construct validity is also a key issue in the use of machine-learned detectors, and is more a risk in detectors that are based on
theoretically determined training labels (e.g. the model of carelessness), compared to detectors based on human judgments shown to have good inter-rater reliability (e.g. the detectors of off-task behavior and gaming the system). It is worth noting that automated
detectors produced with a common alternative to machine learning, knowledge engineering, are likely to be prone to the same challenges to generalizability and construct validity as machine-learned detectors. Current practice with knowledge engineering often does not check detectors against human labels or across contexts, a potentially significant risk to using these models in discovery with models analyses.

As research applying detectors across contexts goes forward, it has significant potential to support progress in studying the impact of school context. By further study of which school contexts – and what attributes of those contexts – are associated with greater
frequencies of disengaged behavior, we may be able to better understand the differences in learning between different learning settings. This may in turn support education researchers and practitioners in designing curricula, learning software, and interventions tailored to different schools – a potentially key step towards developing educational software that is equally effective for all students, whether they are in urban schools, rural schools, suburban schools, or elsewhere.”
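The scaling argument in the quoted passage can be checked with simple arithmetic. Note that the paper does not state the three-school student count; the figure used below is a hypothetical round number chosen only to illustrate how the "over 2000 hours" estimate arises from the rate in [5].

```python
# Back-of-envelope check of the text-replay coding cost quoted above.
# [5] reports 200+ hours to code 58 students; the per-student rate is
# assumed constant. n_students = 600 is a hypothetical illustration,
# not a figure from the paper.
hours_single_study = 200
students_single_study = 58
rate = hours_single_study / students_single_study  # hours per student

n_students = 600  # hypothetical three-school sample size
estimated_hours = rate * n_students
print(round(estimated_hours))  # well over the 2000 hours cited
```

At roughly 3.4 coding hours per student, any sample in the several-hundred-student range quickly exceeds 2000 hours, which is the tractability point the quoted passage is making.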

[2] Baker, R.S.J.d. Modeling and Understanding Students’ Off-Task Behavior in Intelligent Tutoring Systems. Proceedings of ACM Computer-Human Interaction, 2007.

[3] Baker, R.S.J.d., Corbett, A.T., Aleven, V. More Accurate Student Modeling Through Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing. Proceedings of the 9th International Conference on Intelligent Tutoring Systems, 2008.

[4] Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. Developing a Generalizable Detector of When Students Game the System. User Modeling and User-Adapted Interaction, 18 (3), 2008, 287–314.

[5] Baker, R.S.J.d., de Carvalho, A.M.J.A. Labeling Student Behavior Faster and More Precisely with Text Replays. Proceedings of the 1st International Conference on Educational Data Mining, 2008, 38-47.

[6] Beck, J. Engagement Tracing: Using Response Times to Model Student Disengagement. Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), 88-95.

[10] Cocea, M., Hershkovitz, A., Baker, R.S.J.d. The Impact of Off-task and Gaming Behaviors on Learning: Immediate or Aggregate? Proceedings of the 14th International Conference on Artificial Intelligence in Education, 2009, 507–514.

[12] Gobel, P. Student Off-task Behavior and Motivation in the CALL Classroom. International Journal of Pedagogies and Learning, 4 (4), 2008, 4-18.

[13] Karweit, N., Slavin, R.E. Time-On-Task: Issues of Timing, Sampling, and Definition. Journal of Educational Psychology, 74 (6), 1982, 844–851.

[22] Rowe, J., McQuiggan, S., Robison, J., Lester, J. Off-Task Behavior in Narrative-Centered Learning Environments. Proceedings of the 14th International Conference on Artificial Intelligence in Education, 2009, 99-106.

