The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information (Miller)
Miller, G. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. The Psychological Review, 63, 81-97.
In this classic paper, Miller discusses the implications of findings from various studies probing the limits and functioning of our working memory. At one point, he summarizes span lengths as follows:
- Span of immediate memory=about seven items in length
- Span of absolute judgment=we can distinguish around seven categories
- Span of attention=encompasses about six objects at a glance
Miller points out that the similar capacity of these spans may seem to indicate a shared underlying mechanism, when in fact experiments provide evidence to the contrary. For example, if immediate memory were like absolute judgment, then the span of immediate memory would be limited by the amount of information an observer can retain, i.e., the span should be short when items contain a lot of information and long when items contain little information. However, Hayes found that subjects’ ability to remember five different test materials (containing different amounts of information per item) did not vary as much as would be expected if memory span were indeed a function of the amount of information transmitted. Rather, concludes Miller: “Absolute judgment is limited by the amount of information. Immediate memory is limited by the number of items…the number of bits of information is constant for absolute judgment and the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk…” (p.92-3). Furthermore, through a process called recoding, we can transform information so that there are more bits per chunk (and thus fewer chunks overall). Recoding, especially in the form of translating observations into a verbal code, is a common practice in our daily lives.
[*Note: Much of the following material was excerpted from the chapter…]
Miller suggests that we equate “amount of information” with “amount of variance” — this allows us to examine information retention in situations where metrics are not ordinarily considered, as well as compare the results of separate experiments (where dimensions are important). Here’s how he characterizes the relationship:
“When we have a large variance, we are very ignorant about what is going to happen. If we are very ignorant, then when we make the observation it gives us a lot of information. On the other hand, if the variance is very small, we know in advance how our observation must come out, so we get little information from making the observation” (p.82).
The input and output of a communication system can be described by their variance (or information); output is correlated with input — we may measure this correlation and surmise how much of the output variance is attributable to input and how much is due to “noise.” And finally, the measure of transmitted information is a measure of the input-output correlation.
Amount of information = variance
Amount of transmitted information = covariance or correlation
Person in the experiment = communication channel
One bit of information = amount of information needed to make a decision about two equally likely alternatives (p.82-3) [# alternatives = 2^(# bits); equivalently, # bits = log2(# alternatives) = ln(# alternatives)/ln(2)]
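The relation between bits and equally likely alternatives can be checked numerically. What follows is a minimal Python sketch, not anything from Miller's paper; the function names `bits_from_alternatives` and `alternatives_from_bits` are made up for illustration, and the only assumption is that all alternatives are equally likely.

```python
import math

def bits_from_alternatives(n_alternatives: float) -> float:
    """Bits needed to pick one of n equally likely alternatives."""
    return math.log2(n_alternatives)

def alternatives_from_bits(bits: float) -> float:
    """Number of equally likely alternatives that `bits` can distinguish."""
    return 2.0 ** bits

# One bit decides between two equally likely alternatives.
print(bits_from_alternatives(2))               # 1.0
# A channel capacity of 2.5 bits corresponds to roughly six alternatives.
print(round(alternatives_from_bits(2.5), 2))   # 5.66
```

The same conversion accounts for the "(n alternatives)" figures that accompany the channel capacities (CC) reported in these notes.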
In the experimental system, one can increase the amount of input information, resulting in a concomitant increase in transmitted information, followed by an eventual leveling off = channel capacity.
Channel capacity – “represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli we give him” (p.82)
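These notes do not spell out how transmitted information is computed. One standard information-theoretic estimate is the mutual information between stimulus and response, calculated from a table of how often each stimulus drew each response. The Python sketch below assumes that estimate; the function name `transmitted_information` and both data tables are hypothetical and do not come from any of the experiments summarized here.

```python
import math

def transmitted_information(counts):
    """Estimate transmitted information (in bits) from a table of counts.

    counts[i][j] = number of trials on which stimulus i drew response j.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]        # how often each stimulus occurred
    col_totals = [sum(col) for col in zip(*counts)]  # how often each response occurred
    info = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # empty cells contribute nothing
            p_joint = n / total
            p_indep = (row_totals[i] / total) * (col_totals[j] / total)
            info += p_joint * math.log2(p_joint / p_indep)
    return info

# Hypothetical data: four equally frequent stimuli, always identified correctly.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
# Hypothetical data: responses unrelated to the stimuli.
guessing = [[5, 5], [5, 5]]

print(transmitted_information(perfect))   # 2.0 (all input information is transmitted)
print(transmitted_information(guessing))  # 0.0 (nothing is transmitted)
```

In an experiment of the kind described here, one would compute this value for increasing numbers of stimulus alternatives and look for the point where it stops growing; that plateau is the channel capacity.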
Two ways to increase info: increase the rate of input; increase the number of alternatives (in this case, we are more interested in the latter scenario).
Pollack – asked listener to identify tones of varying frequency by assigning numerals to them. After the listener gave his response, he was told the correct identification of the tone. CC=2.5 bits (6 alternatives)
Garner – focused on loudness of tones (4,5,6,7,10 and 20 different intensities ranging 15-110 db). CC=2.3 bits (5 alternatives)
Beebe-Center, Rogers, and O’Connell – 3,5,9,and 17 different concentrations of salt solutions. CC=1.9 bits (4 alternatives)
Hake and Garner – visual position (unlimited vs. limited response techniques). CC=3.25 bits (9.5 alternatives). Confirmed by Coonan and Klemmer.
Eriksen and Hake – judging the sizes of squares. CC=2.2 bits (5 alternatives)
Eriksen – 2.8 bits for size, 3.1 bits for hue, 2.3 bits for brightness.
Summary: “Channel capacity does seem to be a valid notion for describing human observers (M=2.6 bits, SD=.6 bit)…There seems to be some limitation built into us either by learning or by the design of our nervous systems, a limit that keeps our channel capacities in this general range. On the basis of the present evidence it seems safe to say that we possess a finite and rather small capacity for making such unidimensional judgments and that this capacity does not vary a great deal from one simple sensory attribute to another” (p.86).
“It is interesting to consider that psychologists have been using seven-point rating scales for a long time, on the intuitive basis that trying to rate into finer categories does not really add much to the usefulness of the ratings” (p.84).
Important to keep in mind that the experimental conditions necessitated unidimensional stimuli.
Experiments involving multidimensional stimuli: “The second dimension augments the channel but not so much as it might” (p.87)
Klemmer and Frick: position of a dot in a square (horizontal and vertical dimensions involved). CC=4.6 bits (24 alternatives), vs. 3.25 bits.
Beebe-Center, Rogers, and O’Connell: identify both sweetness and saltiness. CC=2.3 bits (5 alternatives), vs. 1.9 bits.
Pollack: loudness and pitch of pure tones. CC=3.1 bits (8.6 alternatives).
**Pollack and Ficks: six different acoustic variables (freq, intensity, rate of interruption, on-time fraction, total duration, spatial location), each of which presented five different values, all told presenting 15,625 alternatives. CC=7.2 bits (150 alternatives). Closest approximation so far to real life experience.
“The addition of independently variable attributes to the stimulus increases the channel capacity, but at a decreasing rate…as we add more variables to the display, we increase the total capacity, but we decrease the accuracy for any particular variable. In other words, we can make relatively crude judgments of several things simultaneously…In order to survive in a constantly fluctuating world, it was better to have a little information about a lot of things than to have a lot of information about a small segment of the environment” (p.88-9).
“According to the linguistic analysis of the sounds of human speech, there are about eight or ten dimensions—the linguists call them distinctive features—that distinguish one phoneme from another. These distinctive features are usually binary, or at most ternary, in nature” (p.89) [vowels vs. consonants, oral vs. nasal consonants, front vs. middle vs. back phonemes].
“This approach gives us quite a different picture of speech perception than we might otherwise obtain from our studies of the speech spectrum and of the ear’s ability to discriminate relative differences among pure tones” (p. 89).
Kaufman, Lord, Reese, and Volkmann: flashed random patterns of dots (1-200+) on a screen. Subjects had to report the number of dots. Below seven, subjects were said to “subitize” [aka “span of attention”]; above seven, they “estimate.”
“Span of absolute judgment” – for unidimensional judgments, this span is in the neighborhood of seven.
Three ways we increase the accuracy of our judgments:
- To make relative rather than absolute judgments;
- To increase the number of dimensions along which the stimuli can differ;
- To arrange the task in such a way that we make a sequence of several absolute judgments in a row. “This device introduces memory as the handmaiden of discrimination” (p.91).
Span of immediate memory=about seven items in length
Span of absolute judgment=we can distinguish around seven categories
Span of attention=encompasses about six objects at a glance
**Do these spans reflect a single underlying process? NO, and here are two experiments that provide evidence.
If immediate memory is like absolute judgment, then the span of immediate memory should depend on the amount of information that an observer can retain, i.e., the span should be short when items contain a lot of information and the span should be long when items contain little information.
Hayes: five different test materials: binary digits, decimal digits, letters, letters + decimal digits, 1,000 monosyllabic words. Memory span=5-9. If span were dependent on the amount of information transmitted, then one would’ve expected a larger range (there is less info in each binary digit than in each monosyllabic word).
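To see why a larger range would have been expected, compare the information per item across Hayes's materials. The Python sketch below is a rough illustration only: it assumes every item in a vocabulary is equally likely, and the vocabulary sizes of 26 for letters and 36 for letters plus digits are assumptions, since these notes do not give them.

```python
import math

# Vocabulary sizes for the five test materials, assuming equally likely
# items (26 and 36 are assumed alphabet sizes, not figures from the notes).
materials = {
    "binary digits": 2,
    "decimal digits": 10,
    "letters": 26,
    "letters + decimal digits": 36,
    "monosyllabic words": 1000,
}

# Information carried by a single item of each material, in bits.
bits_per_item = {name: math.log2(size) for name, size in materials.items()}

for name, bits in bits_per_item.items():
    print(f"{name}: {bits:.2f} bits per item")

# How much more information a word carries than a binary digit.
ratio = bits_per_item["monosyllabic words"] / bits_per_item["binary digits"]
print(f"a word carries about {ratio:.0f} times the information of a binary digit")
```

Under these assumptions a word carries about ten times the information of a binary digit, yet the observed span only moves between 5 and 9 items, which is what points to a limit on the number of items rather than on the number of bits.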
Pollack: amount of information transmitted increases almost linearly as the amount of information per item in the input is increased.
“Absolute judgment is limited by the amount of information. Immediate memory is limited by the number of items…the number of bits of information is constant for absolute judgment and the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk…” (p.92-3).
Recoding – the process by which one transforms information so that there are more bits per chunk (and thus fewer chunks overall). Something we do constantly in our daily behavior, esp in terms of translating observations into a verbal code.
Smith: recoding schemes increased subjects’ memory span for binary digits.
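A recoding scheme for binary digits can be sketched in a few lines of Python. This is an illustration, not a description of Smith's procedure: the function name `recode_binary`, the 18-digit example sequence, and the choice of a 3:1 grouping are all assumptions made for the sketch. Each group of binary digits is renamed by the number it spells in base 2, so three binary digits become one octal digit.

```python
def recode_binary(digits: str, group_size: int = 3):
    """Recode a string of binary digits into chunks of `group_size` digits.

    Each chunk is replaced by the number it spells in base 2. A short
    final group is converted as-is.
    """
    return [int(digits[i:i + group_size], 2)
            for i in range(0, len(digits), group_size)]

sequence = "101000100111001110"      # 18 binary digits
chunks = recode_binary(sequence, 3)  # 3:1 recoding into octal digits

print(chunks)                        # [5, 0, 4, 7, 1, 6]
print(len(sequence), "items ->", len(chunks), "chunks")
```

The six recoded chunks carry the same 18 bits as the original sequence, but they fit within a span of about seven items, whereas the 18 separate digits do not.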