The State of Educational Data Mining in 2009: A Review and Future Visions (Baker, R.S.J.D. & Yacef, K., 2009)

Baker, R. S. J. D., & Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1). Retrieved from http://www.educationaldatamining.org/JEDM/images/articles/vol1/issue1/JEDMVol1Issue1_BakerYacef.pdf

Data mining, also called Knowledge Discovery in Databases (KDD), concerns the discovery of novel and potentially useful information from large amounts of data [Witten and Frank 1999]. Baker points out that educational data mining methods often differ from standard data mining methods because educational researchers must explicitly account for the multi-level hierarchy and non-independence in educational data (for example, answers nested within sessions, students, classrooms and schools). Models drawn from the psychometrics literature are often used in educational data mining [Barnes 2005; Desmarais and Pu 2005; Pavlik et al. 2008].

Important trend to note: “Whereas relationship mining was dominant between 1995 and 2005, in 2008-2009 it slipped to fifth place, with only 9% of papers involving relationship mining. Prediction, which was in second place between 1995 and 2005, moved to the dominant position in 2008-2009, representing 42% of EDM2008 papers… Another key trend is the increase in prominence of modeling frameworks from Item Response Theory, Bayes Nets, and Markov Decision Processes.”

Common Methodologies

Baker classifies work in educational data mining as follows:

  • Prediction
    • Classification
    • Regression
    • Density estimation
  • Clustering
  • Relationship mining
    • Association rule mining
    • Correlation mining
    • Sequential pattern mining
    • Causal data mining
  • Distillation of data for human judgment
  • Discovery with models

“In discovery with models, a model of a phenomenon is developed through any process that can be validated in some fashion (most commonly, prediction or knowledge engineering), and this model is then used as a component in another analysis, such as prediction or relationship mining. Discovery with models has become an increasingly popular method in EDM research, supporting sophisticated analyses such as which learning material sub-categories of students will most benefit from [Beck and Mostow 2008], how different types of student behavior impact students’ learning in different ways [Cocea et al. 2009], and how variations in intelligent tutor design impact students’ behavior over time [Jeong and Biswas 2008].”
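To make the two-step structure of discovery with models concrete, here is a minimal Python sketch built entirely on hypothetical assumptions: a toy "gaming" detector that flags any answer given in under two seconds (real detectors, such as the one by Baker et al. [2004], use much richer models and features) is first validated against hand-coded labels and then reused as a component in a correlation analysis with learning gains. All data, thresholds and names (`detect_gaming`, `THRESHOLD_S`) are invented for illustration.

```python
from statistics import mean

THRESHOLD_S = 2.0  # hypothetical cutoff: answers faster than this count as "gaming"


def detect_gaming(latency_s: float) -> bool:
    """Toy detector (hypothetical): flag an action as gaming if it was very fast."""
    return latency_s < THRESHOLD_S


def accuracy(preds, labels) -> float:
    """Share of actions where the detector agrees with the hand-coded label."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Step 1: validate the model against hand-coded (latency, is_gaming) pairs.
labeled = [(0.8, True), (1.5, True), (3.2, False), (6.0, False), (1.9, False), (4.4, False)]
val_acc = accuracy([detect_gaming(t) for t, _ in labeled], [y for _, y in labeled])

# Step 2: apply the validated model to unlabeled logs and use its output
# (per-student gaming rate) as a variable in a relationship-mining analysis.
logs = {
    "s1": [0.5, 0.9, 1.2, 5.0],
    "s2": [3.0, 4.1, 1.0, 6.2],
    "s3": [4.0, 5.5, 7.1, 3.3],
}
gains = {"s1": 0.05, "s2": 0.20, "s3": 0.35}  # hypothetical pre/post learning gains

students = sorted(logs)
gaming_rate = [sum(detect_gaming(t) for t in logs[s]) / len(logs[s]) for s in students]
r = pearson(gaming_rate, [gains[s] for s in students])
print(f"validation accuracy={val_acc:.2f}, r(gaming, gain)={r:.2f}")
```

On this made-up data the detector agrees with 5 of the 6 hand-coded labels, and the per-student gaming rate correlates negatively with learning gain. In practice the validation step would be far more rigorous (for example, testing on held-out students) before the model is trusted as a component of further analysis.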

Common Applications

The improvement of student models. Student models comprise information about a student’s individual characteristics and/or state, such as her current knowledge, motivation, meta-cognition, and attitudes. Researchers have developed such models in order to design software that can respond to each student in an individualized and appropriate way [Corbett 2001].
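One widely used way to represent the "current knowledge" part of a student model is Bayesian Knowledge Tracing (BKT), which this summary does not itself discuss. The Python sketch below is a minimal illustration under assumed parameter values (`p_init`, `p_learn`, `p_slip` and `p_guess` are illustrative, not fitted to any data): after each observed answer on a single skill, it updates the probability that the student knows that skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    p_init: float = 0.3   # hypothetical prior probability the skill is already known
    p_learn: float = 0.1  # hypothetical probability of learning at each opportunity
    p_slip: float = 0.1   # hypothetical probability of a wrong answer despite knowing
    p_guess: float = 0.2  # hypothetical probability of a right answer without knowing


def bkt_trace(answers, params: BKTParams = BKTParams()):
    """Return P(skill known) after each answer (1 = correct, 0 = incorrect)."""
    p_known = params.p_init
    history = []
    for correct in answers:
        # Bayes' rule: condition the knowledge estimate on the observed answer.
        if correct:
            evidence = p_known * (1 - params.p_slip)
            posterior = evidence / (evidence + (1 - p_known) * params.p_guess)
        else:
            evidence = p_known * params.p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - params.p_guess))
        # Allow for learning between this opportunity and the next.
        p_known = posterior + (1 - posterior) * params.p_learn
        history.append(p_known)
    return history


print([round(p, 3) for p in bkt_trace([1, 1, 1])])
```

A run of correct answers pushes the estimate steadily upward, while a wrong answer pulls it below the prior; tutors can use such an estimate to decide when a student needs more practice on a skill.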

EDM methods have also enabled researchers to model a broader range of potentially relevant student attributes in real time, including higher-level constructs such as gaming the system [Baker et al. 2004], experiencing poor self-efficacy [McQuiggan et al. 2008], being off-task [Baker 2007], or feeling bored or frustrated [D’Mello et al. 2008].

For examples of how EDM prediction methods can be used to develop student models, see Beck and Woolf [2000], Beck [2007], and Mavrikis [2008].

Discovering or improving models of a domain’s knowledge structure. “Barnes [2005] has developed algorithms which can automatically discover a QMatrix from data, and Desmarais & Pu [2005] and Pavlik et al [Pavlik et al. 2009; Pavlik, Cen, Wu and Koedinger 2008] have developed algorithms for finding partial order knowledge structure (POKS) models that explain the interrelationships of knowledge in a domain.”
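A Q-matrix records which skills each item requires. The minimal Python sketch below shows that data structure and one way to score a candidate Q-matrix against observed responses, assuming binary responses and a conjunctive model (an item is answered correctly only if every skill it requires is mastered): each student's response vector is compared with the nearest "ideal" response pattern the matrix allows. It illustrates a fit criterion only; the search procedure Barnes [2005] uses to propose candidate matrices is not shown, and the function names and toy data are hypothetical.

```python
from itertools import product


def ideal_response(q_matrix, state):
    """Predicted response vector for a student with the given skill state.

    q_matrix[j][k] == 1 means item j requires skill k; state[k] == 1 means skill k is mastered.
    Conjunctive assumption: an item is correct only if all of its required skills are mastered.
    """
    return tuple(
        int(all(state[k] for k, needed in enumerate(row) if needed)) for row in q_matrix
    )


def qmatrix_error(q_matrix, responses):
    """Sum over students of the Hamming distance to the nearest ideal response pattern."""
    n_skills = len(q_matrix[0])
    ideals = {ideal_response(q_matrix, s) for s in product((0, 1), repeat=n_skills)}
    return sum(
        min(sum(a != b for a, b in zip(r, ideal)) for ideal in ideals) for r in responses
    )


# Hypothetical data: 3 items, 2 skills, one observed response vector per student.
responses = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 0)]
q_candidate = [[1, 0], [0, 1], [1, 1]]  # the third item needs both skills
q_alternative = [[1, 0], [1, 0], [0, 1]]
print(qmatrix_error(q_candidate, responses), qmatrix_error(q_alternative, responses))
```

A lower total means the candidate matrix explains the responses better; here the first candidate scores 1 and the second scores 2, so a search over candidate matrices would prefer the first.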

Studying pedagogical support. This application is useful in discovering which types of pedagogical support are most effective, either overall or for different groups of students or in different situations [Beck and Mostow 2008; Pechenizkiy et al. 2008]. One popular method is learning decomposition [Beck and Mostow 2008], which involves fitting exponential learning curves to performance data, relating a student’s later success to the amount of each type of pedagogical support the student received up to that point. Using a best-fit model, one can assign relative weights for each type of pedagogical support, and thus infer the relative effectiveness of each type of support for promoting learning.
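The idea behind learning decomposition can be sketched in Python under several simplifying assumptions: performance is summarized as an error rate for each combination of prior practice counts, there are just two hypothetical support types (counts `n1` and `n2`), and the exponential model error = A · exp(-k · (n1 + B · n2)) is fit by grid-searching the weight B and fitting log(error) by ordinary least squares at each candidate. This is not necessarily how Beck and Mostow fit their models; the log-linear shortcut is a simplification chosen to keep the example dependency-free. B is the estimated worth of one type-2 opportunity in units of type-1 opportunities, so B > 1 would suggest type 2 is the more effective support.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def learning_decomposition(observations, candidate_weights=None):
    """Fit error = A * exp(-k * (n1 + B * n2)) to (n1, n2, error_rate) tuples.

    n1, n2: counts of prior opportunities of two (hypothetical) support types.
    Returns (B, A, k) for the candidate B with the lowest squared error on the log scale.
    """
    if candidate_weights is None:
        candidate_weights = [i / 100 for i in range(301)]  # B in [0, 3]
    ys = [math.log(err) for _, _, err in observations]
    best = None
    for w in candidate_weights:
        xs = [n1 + w * n2 for n1, n2, _ in observations]
        if len(set(xs)) < 2:
            continue  # no spread in effective practice, so the slope is undefined
        a, slope, sse = fit_line(xs, ys)
        if best is None or sse < best[3]:
            best = (w, math.exp(a), -slope, sse)
    return best[:3]


# Hypothetical noise-free data generated with A = 0.6, k = 0.3, B = 0.5.
obs = [(n1, n2, 0.6 * math.exp(-0.3 * (n1 + 0.5 * n2))) for n1 in range(5) for n2 in range(5)]
B, A, k = learning_decomposition(obs)
print(f"B={B:.2f}, A={A:.2f}, k={k:.2f}")
```

On this synthetic data the search recovers B = 0.5, meaning one type-2 opportunity is worth about half a type-1 opportunity. With real, noisy student data the estimate would carry uncertainty, and comparing weights across student subgroups is what lets learning decomposition ask which support works best for whom.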

Advancing educational theory. EDM may help researchers gain a deeper understanding of the key factors that affect learning, often with a view to designing better learning systems. For instance, Gong et al. [2009] investigated the impact of self-discipline on learning; Perera et al. [2009] used the Big Five theory of teamwork as a guiding framework when searching for successful patterns of interaction within student teams; and Madhyastha and Tanimoto [2009] investigated the relationship between consistency and student performance, with the aim of providing guidelines for scaffolding instruction.

REFERENCES
ABELSON, R. 1968. Theories of Cognitive Consistency: A Sourcebook. Rand McNally,
Chicago.

ALEVEN, V. and KOEDINGER, K.R. 2001. Investigations into help seeking and
learning with a Cognitive Tutor. In Proceedings of the AIED-2001 Workshop on Help
Provision and Help Seeking in Interactive Learning Environments, 47-58.

BAKER, R.S., CORBETT, A.T. and KOEDINGER, K.R. 2004. Detecting Student
Misuse of Intelligent Tutoring Systems. In Proceedings of the 7th International
Conference on Intelligent Tutoring Systems, Maceio, Brazil, 531-540.

BAKER, R.S.J.D. 2007. Modeling and Understanding Students’ Off-Task Behavior in
Intelligent Tutoring Systems. In Proceedings of the ACM CHI 2007: Computer-Human
Interaction conference, 1059-1068.

BAKER, R.S.J.D. in press. Data Mining For Education. In International Encyclopedia of
Education (3rd edition), B. MCGAW, P. PETERSON and E. BAKER Eds. Elsevier, Oxford, UK.

BAKER, R.S.J.D., BARNES, T. and BECK, J.E. 2008. 1st International Conference on
Educational Data Mining, Montreal, Quebec, Canada.

BARNES, T. 2005. The q-matrix method: Mining student response data for knowledge.
In Proceedings of the AAAI-2005 Workshop on Educational Data Mining.

BARNES, T., DESMARAIS, M., ROMERO, C. and VENTURA, S. 2009. Educational
Data Mining 2009: 2nd International Conference on Educational Data Mining,
Proceedings, Cordoba, Spain.

BARTNECK, C. and HU, J. 2009. Scientometric Analysis of the CHI Proceedings. In
Proceedings of the Conference on Human Factors in Computing Systems (CHI2009),
699-708.

BECK, J. and WOOLF, B. 2000. High-level student modeling with machine learning. In
Proceedings of the International Conference on Intelligent tutoring systems, 584-593.

BECK, J.E. 2007. Difficulties in inferring student knowledge from observations (and
why you should care). Proceedings of the AIED2007 Workshop on Educational Data
Mining, 21-30.

BECK, J.E. and MOSTOW, J. 2008. How who should practice: Using learning
decomposition to evaluate the efficacy of different types of practice for different types of
students. In Proceedings of the 9th International Conference on Intelligent Tutoring
Systems, 353-362.

CHOQUET, C., LUENGO, V. and YACEF, K. 2005. Proceedings of “Usage Analysis in
Learning Systems” workshop, held in conjunction with AIED 2005, Amsterdam, The
Netherlands, July 2005.

COCEA, M., HERSHKOVITZ, A. and BAKER, R.S.J.D. 2009. The Impact of Off-task
and Gaming Behaviors on Learning: Immediate or Aggregate? In Proceedings of the 14th
International Conference on Artificial Intelligence in Education, 507-514.

CORBETT, A.T. 2001. Cognitive Computer Tutors: Solving the Two-Sigma Problem. In
Proceedings of the International Conference on User Modeling, 137-147.

D’MELLO, S.K., CRAIG, S.D., WITHERSPOON, A.W., MCDANIEL, B.T. and
GRAESSER, A.C. 2008. Automatic Detection of Learner’s Affect from Conversational
Cues. User Modeling and User-Adapted Interaction 18, 45-80.

DEKKER, G., PECHENIZKIY, M. and VLEESHOUWERS, J. 2009. Predicting Students
Drop Out: A Case Study. In Proceedings of the International Conference on Educational
Data Mining, Cordoba, Spain, T. BARNES, M. DESMARAIS, C. ROMERO and S.
VENTURA Eds., 41-50.

DESMARAIS, M.C. and PU, X. 2005. A Bayesian Student Model without Hidden Nodes
and Its Comparison with Item Response Theory. International Journal of Artificial
Intelligence in Education 15, 291-323.

DONMEZ, P., ROSÉ, C., STEGMANN, K., WEINBERGER, A. and FISCHER, F. 2005.
Supporting CSCL with automatic corpus analysis technology. In Proceedings of the
International Conference of Computer Support for Collaborative Learning (CSCL 2005),
125-134.

GONG, Y., RAI, D., BECK, J. and HEFFERNAN, N. 2009. Does Self-Discipline Impact
Students’ Knowledge and Learning? In Proceedings of the 2nd International Conference
on Educational Data Mining, 61-70.

JEONG, H. and BISWAS, G. 2008. Mining Student Behavior Models in Learning-by-
Teaching Environments. In Proceedings of the 1st International Conference on
Educational Data Mining, 127-136.

KAY, J., MAISONNEUVE, N., YACEF, K. and REIMANN, P. 2006. The Big Five and
Visualisations of Team Work Activity. In Intelligent Tutoring Systems, M. IKEDA, K.D.
ASHLEY and T.-W. CHAN Eds. Springer-Verlag, Taiwan, 197-206.

KOEDINGER, K.R., CUNNINGHAM, K., SKOGSHOLM, A. and LEBER, B. 2008. An open
repository and analysis tools for fine-grained, longitudinal learner data. In Proceedings of
the 1st International Conference on Educational Data Mining, 157-166.

MADHYASTHA, T. and TANIMOTO, S. 2009. Student Consistency and Implications
for Feedback in Online Assessment Systems. In Proceedings of the 2nd International
Conference on Educational Data Mining, 81-90.

MAVRIKIS, M. 2008. Data-driven modeling of students’ interactions in an ILE. In
Proceedings of the 1st International Conference on Educational Data Mining, 87-96.

MCQUIGGAN, S., MOTT, B. and LESTER, J. 2008. Modeling Self-Efficacy in
Intelligent Tutoring Systems: An Inductive Approach. User Modeling and User-Adapted
Interaction 18, 81-123.

MERCERON, A. and YACEF, K. 2003. A Web-based Tutoring Tool with Mining
Facilities to Improve Learning and Teaching. In 11th International Conference on
Artificial Intelligence in Education, F. VERDEJO and U. HOPPE Eds. IOS Press,
Sydney, 201-208.

MERCERON, A. and YACEF, K. 2005. Educational Data Mining: a Case Study. In
Artificial Intelligence in Education (AIED2005), C.-K. LOOI, G. MCCALLA, B.
BREDEWEG and J. BREUKER Eds. IOS Press, Amsterdam, The Netherlands, 467-474.

MOORE, A.W. 2006. Statistical Data Mining Tutorials. Downloaded 1 August 2009
from http://www.autonlab.org/tutorials/

PAVLIK, P., CEN, H. and KOEDINGER, K.R. 2009. Learning Factors Transfer
Analysis: Using Learning Curve Analysis to Automatically Generate Domain Models. In
Proceedings of the 2nd International Conference on Educational Data Mining, 121-130.

PAVLIK, P., CEN, H., WU, L. and KOEDINGER, K. 2008. Using Item-type
Performance Covariance to Improve the Skill Model of an Existing Tutor. In Proceedings
of the 1st International Conference on Educational Data Mining, 77-86.

PECHENIZKIY, M., CALDERS, T., VASILYEVA, E. and DE BRA, P. 2008. Mining
the Student Assessment Data: Lessons Drawn from a Small Scale Case Study. In
Proceedings of the 1st International Conference on Educational Data Mining, 187-191.

PERERA, D., KAY, J., KOPRINSKA, I., YACEF, K. and ZAIANE, O. 2009. Clustering
and sequential pattern mining to support team learning. IEEE Transactions on Knowledge
and Data Engineering 21, 759-772.

ROMERO, C. and VENTURA, S. 2007. Educational Data Mining: A Survey from 1995
to 2005. Expert Systems with Applications 33, 125-146.

ROMERO, C., VENTURA, S., DE BRA, P. and CASTRO, C. 2003. Discovering
prediction rules in AHA! courses. In Proceedings of the International Conference on User
Modeling, 25-34.

ROMERO, C., VENTURA, S., ESPEJO, P.G. and HERVAS, C. 2008. Data Mining
Algorithms to Classify Students. In Proceedings of the 1st International Conference on
Educational Data Mining, 8-17.

SCHOFIELD, J. 1995. Computers and Classroom Culture. Cambridge University Press,
Cambridge, UK.

SUPERBY, J.F., VANDAMME, J.-P. and MESKENS, N. 2006. Determination of factors
influencing the achievement of the first-year university students using data mining
methods. In Proceedings of the Workshop on Educational Data Mining at the 8th
International Conference on Intelligent Tutoring Systems (ITS 2006), 37-44.

TAIT, K., HARTLEY, J.R. and ANDERSON, R.C. 1973. Feedback Procedures in
Computer-Assisted Arithmetic Instruction. British Journal of Educational Psychology 43,
161-171.

TANG, T. and MCCALLA, G. 2004. Utilizing Artificial Learners to Help Overcome the
Cold-Start Problem in a Pedagogically-Oriented Paper Recommendation System. In
Proceedings of the International Conference on Adaptive Hypermedia, 245-254.

TANG, T. and MCCALLA, G. 2005. Smart recommendation for an evolving e-learning
system: architecture and experiment. International Journal on E-Learning 4, 105-129.

TANIMOTO, S.L. 2007. Improving the Prospects for Educational Data Mining. In
Complete On-Line Proceedings of the Workshop on Data Mining for
User Modeling, at the 11th International Conference on User Modeling (UM 2007),
106-110.

WITTEN, I.H. and FRANK, E. 1999. Data mining: Practical Machine Learning Tools
and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, CA.

ZAÏANE, O. 2001. Web usage mining for a better web-based learning environment. In
Proceedings of conference on advanced technology for education, 60-64.

ZAÏANE, O. 2002. Building a recommender agent for e-learning systems. In
Proceedings of the International Conference on Computers in Education, 55-59.
