### Learning Factors Analysis (Cen, Koedinger, & Junker, 2006)

Cen, H., Koedinger, K., Junker, B. **Learning Factors Analysis – A General Method for Cognitive Model Evaluation and Improvement**. *The 8th International Conference on Intelligent Tutoring Systems*. 2006. 12 pages.

In this paper, the authors describe Learning Factors Analysis (LFA), a semi-automated method for improving a cognitive model that combines a statistical model, human expertise, and a combinatorial search.

A **cognitive model** is a set of production rules or skills encoded in intelligent tutors to model how students solve problems. (The terms production, skill, and rule are used interchangeably.)

A good cognitive model:

- captures the fine-grained knowledge components in a curriculum
- provides tailored feedback and hints
- selects problems with difficulty level and learning pace matched to individual students
- improves student learning.

The study used data from the Area Unit of the Geometry Cognitive Tutor. The initial cognitive model implemented in the Tutor had 15 skills that correspond to productions or, in some cases, groups of productions:

- Circle-area – Given the radius, find the area of a circle.
- Circle-circumference – Given the diameter, find the circumference of a circle.
- Circle-diameter – Given the radius or circumference, find the diameter of a circle.
- Circle-radius – Find the radius given the area, circumference, or diameter.
- Compose-by-addition – In a+b=c, given any two of a, b, or c, find the third.
- Compose-by-multiplication – In a*b=c, given any two of a, b, or c, find the third.
- Parallelogram-area – Given the base and height, find the area of a parallelogram.
- Parallelogram-side – Given the area and height (or base), find the base (or height).
- Pentagon-area – Given a side and the apothem, find the area of a pentagon.
- Pentagon-side – Given area and apothem, find the side (or apothem).
- Trapezoid-area – Given the height and both bases, find the area of a trapezoid.
- Trapezoid-base – Given area and height, find the base of a trapezoid.
- Trapezoid-height – Given the area and the base, find the height of a trapezoid.
- Triangle-area – Given the base and height, find the area of a triangle.
- Triangle-side – Given the base and side, find the height of a triangle.

The data consisted of 4102 data points involving 24 students and 115 problem steps. Each data point is a correct or incorrect student action corresponding to a single production execution.

Typical header row: Student | Success | Step | Skill | Opportunities

Success records whether the student performed the step correctly on the first attempt (1 = success, 0 = failure). Step is the particular step of a tutor problem that the student is working on (“p1s1” stands for problem 1, step 1). Skill is the production rule used in that step. Opportunities is the number of previous times the student has had to use that skill. It increments every time the same student uses the skill, so it can be computed from the first and fourth columns (Student and Skill).
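As a minimal sketch of how the Opportunities column could be derived from the Student and Skill columns, the function below keeps a running count per (student, skill) pair. The function name, the tuple layout, and the sample rows are illustrative assumptions and are not data from the paper. Counting starts at 0, following the "previous times" wording above.

```python
from collections import defaultdict


def add_opportunities(rows):
    """Append an opportunity count to each (student, success, step, skill) row.

    Rows are assumed to be in chronological order. The count is the number
    of earlier rows with the same student and the same skill.
    """
    seen = defaultdict(int)  # (student, skill) -> uses so far
    result = []
    for student, success, step, skill in rows:
        key = (student, skill)
        result.append((student, success, step, skill, seen[key]))
        seen[key] += 1
    return result


# Made-up rows, for illustration only.
sample = [
    ("A", 0, "p1s1", "Circle-area"),
    ("A", 1, "p1s2", "Circle-radius"),
    ("A", 1, "p2s1", "Circle-area"),
    ("B", 1, "p1s1", "Circle-area"),
]
for row in add_opportunities(sample):
    print(row)
```

Keying the counter on the (student, skill) pair is what makes the count restart for each student and for each skill.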

***

A **difficulty factor** is a property of the problem that causes students difficulty. By assessing the performance difference on pairs of problems that vary by one factor at a time, one can identify the hidden knowledge component(s) that can be used to improve a cognitive model.

A factor (for example, Embed) can take several values (embed, alone). One strategy is to examine how the different factor-value pairs (Embed = embed and Embed = alone) affect a student’s success at applying a particular production rule.

***

**Combinatorial search** conducts model selection within the logistic regression model space. Difficulty factors are incorporated into an existing cognitive model through a model operator called Binary Split, which splits a skill into two: a skill with a given factor value and a skill without that factor value.
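A Binary Split can be pictured as a relabeling of the Skill column. The sketch below is one possible reading, not the authors' implementation: rows are assumed to be dictionaries, the `figure_part` factor and its values are hypothetical, and the `skill*value` naming of the new skill is an illustrative choice.

```python
def binary_split(rows, skill, factor, value):
    """Split `skill` into rows that have `factor == value` and rows that do not.

    Rows that use `skill` and carry the factor value are relabeled with a
    new skill name; all other rows keep their label. The input is not
    modified.
    """
    new_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stays intact
        if row["skill"] == skill and row.get(factor) == value:
            new_row["skill"] = f"{skill}*{value}"
        new_rows.append(new_row)
    return new_rows


# Hypothetical rows and factor, for illustration only.
data = [
    {"student": "A", "skill": "Compose-by-multiplication", "figure_part": "area"},
    {"student": "A", "skill": "Compose-by-multiplication", "figure_part": "segment"},
    {"student": "A", "skill": "Circle-area", "figure_part": "area"},
]
for row in binary_split(data, "Compose-by-multiplication", "figure_part", "segment"):
    print(row["skill"])
```

After a split, any per-skill opportunity counts would presumably need to be recomputed, since the relabeled rows now belong to a different skill.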

A* search is the combinatorial search algorithm in LFA. It starts from an initial node, iteratively creates new neighboring nodes, and explores them to reach a goal node. To limit the search space, it employs a heuristic to rank each node and visits the nodes in order of this heuristic estimate.
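The ranking idea can be sketched as a plain best-first search with an expansion budget. This is a simplified stand-in for the search described above, not the paper's algorithm: there is no explicit goal test, candidate models are assumed to be hashable values, and `expand`, `score`, and `max_expansions` are hypothetical names. In LFA, `expand` would apply Binary Splits and `score` would be a model-fit score where lower is better.

```python
import heapq
from itertools import count


def best_first_search(initial, expand, score, max_expansions=50):
    """Return the lowest-scoring node found and its score.

    expand(node) yields neighboring nodes; score(node) returns a number
    where lower is better. Nodes are visited in order of their score.
    """
    tie = count()  # tie-breaker so the heap never compares nodes directly
    start_score = score(initial)
    frontier = [(start_score, next(tie), initial)]
    seen = {initial}
    best, best_score = initial, start_score
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, node = heapq.heappop(frontier)  # most promising node so far
        expansions += 1
        for child in expand(node):
            if child in seen:
                continue
            seen.add(child)
            child_score = score(child)
            heapq.heappush(frontier, (child_score, next(tie), child))
            if child_score < best_score:
                best, best_score = child, child_score
    return best, best_score


# Toy problem (not from the paper): walk the integers toward 7.
found = best_first_search(
    0,
    expand=lambda n: [n + 1, n + 2] if n < 10 else [],
    score=lambda n: (n - 7) ** 2,
)
print(found)
```

The expansion budget is one simple way to bound the search space; the priority queue ensures that the most promising candidates are expanded first.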

The heuristic guiding the search is one of two scoring functions for regression models: AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion), both estimators of prediction risk. Each search is run twice, once with each heuristic. (Lower scores are better.)
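For reference, both scores can be computed from a fitted model's log-likelihood using their standard definitions. The sketch below assumes the log-likelihood, the number of parameters, and the number of observations are already available; the function and argument names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

Both scores reward fit and penalize the number of parameters. BIC's penalty per parameter is ln(n), which exceeds AIC's constant 2 once n is larger than about 7. With 4102 data points, ln(n) is roughly 8.3, so a BIC-guided search would tend to favor models with fewer skills than an AIC-guided one.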

***

Ways in which LFA may improve the tutor and the curriculum:

- Identifying over-taught or under-taught rules
- Adjusting the amount of practice for certain skills in the curriculum

Examples:

Parallelogram-side has a high intercept (2.06) and a low slope (-.01). Its initial success probability is .94 and the average number of practices per student is 14.9. Spending so much practice on an easy skill is not a good use of student time. Reducing the amount of practice for this skill should save students time without compromising their performance.

Trapezoid-height has a low intercept (-1.55) and a positive slope (.27). Its initial success probability is .29 and the average number of practices per student is 4.2. The final success probability is .69, well below the level of mastery. More practice on this skill is needed for students to reach mastery.
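To make the roles of intercept and slope concrete, the sketch below assumes a logistic model in which the log-odds of success are the sum of a student term, the skill's intercept, and the skill's slope times the number of prior opportunities. That additive form, the function name, and the zero student term used in the loop are assumptions for illustration. The probabilities quoted above also reflect the students in the data set, so this sketch is not expected to reproduce them.

```python
import math


def success_probability(student_term, intercept, slope, opportunities):
    """Logistic learning curve: the slope moves the log-odds with practice."""
    logit = student_term + intercept + slope * opportunities
    return 1.0 / (1.0 + math.exp(-logit))


# Skill coefficients quoted in the text; the student term is set to 0
# here as a simplifying assumption.
skills = {
    "Parallelogram-side": (2.06, -0.01),
    "Trapezoid-height": (-1.55, 0.27),
}
for name, (intercept, slope) in skills.items():
    curve = [success_probability(0.0, intercept, slope, t) for t in range(6)]
    print(name, [round(p, 2) for p in curve])
```

A high intercept means the curve starts near the top, so extra practice has little room to help. A low intercept with a positive slope means each additional opportunity raises the success probability noticeably.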

Also, an original rule may be split into two rules that need decidedly different amounts of practice because they have different initial difficulties and learning rates. Students who appear to have mastered the original rule in the curriculum before even reaching the second split rule might not get enough practice on that second rule.

With a final probability of .92, students seem to have mastered Compose-by-multiplication. However, the decomposition of the skill shows a different picture. CMarea does well, with a final probability of .96. But CMsegment has a final probability of only .60 and an average amount of practice of less than 2. In the original model, the knowledge-tracing algorithm in the tutor may let students move on once they reach mastery on Compose-by-multiplication. With the model found by LFA, the knowledge-tracing algorithm will be able to catch students' weakness in acquiring CMsegment.