However, most people score between 85 and 115 on an IQ test, which is within 1 standard deviation of the average score of 100. A score in that range is considered “normal.”
Learning Objectives

By the end of this section, you will be able to: explain how intelligence tests are developed; describe the history of the use of IQ tests; and describe the purposes and benefits of intelligence testing.

While you’re likely familiar with the term “IQ” and associate it with the idea of intelligence, what does IQ really mean? IQ stands for intelligence quotient and describes a score earned on a test designed to measure intelligence. You’ve already learned that there are many ways psychologists describe intelligence (or more aptly, intelligences). Similarly, IQ tests, the tools designed to measure intelligence, have been the subject of debate throughout their development and use.

When might an IQ test be used? What do we learn from the results, and how might people use this information? IQ tests are expensive to administer and must be given by a licensed psychologist.
Intelligence testing has been considered both a bane and a boon for education and social policy. In this section, we will explore what intelligence tests measure, how they are scored, and how they were developed.
MEASURING INTELLIGENCE

It seems that the human understanding of intelligence is somewhat limited when we focus on traditional or academic-type intelligence. How, then, can intelligence be measured? And when we measure intelligence, how do we ensure that we capture what we’re really trying to measure (in other words, that IQ tests function as valid measures of intelligence)? In the following paragraphs, we will explore how intelligence tests were developed and the history of their use.

The IQ test has been synonymous with intelligence for over a century.
In the late 1800s, Sir Francis Galton developed the first broad test of intelligence (Flanagan & Kaufman, 2004). Although he was not a psychologist, his contributions to the concepts of intelligence testing are still felt today (Gordon, 1995). Reliable intelligence testing (you may recall from earlier chapters that reliability refers to a test’s ability to produce consistent results) began in earnest during the early 1900s with a researcher named Alfred Binet. Binet was asked by the French government to develop an intelligence test to use on children to determine which ones might have difficulty in school; it included many verbally based tasks.
American researchers soon realized the value of such testing. Louis Terman, a Stanford professor, modified Binet’s work by standardizing the administration of the test and tested thousands of different-aged children to establish an average score for each age.
As a result, the test was normed and standardized, which means that the test was administered consistently to a large enough representative sample of the population that the range of scores resulted in a bell curve (bell curves will be discussed later). Standardization means that the manner of administration, scoring, and interpretation of results is consistent. Norming involves giving a test to a large population so data can be collected comparing groups, such as age groups. The resulting data provide norms, or referential scores, by which to interpret future scores. Norms are not expectations of what a given group should know but a demonstration of what that group does know. Norming and standardizing the test ensures that new scores are reliable. This new version of the test was called the Stanford-Binet Intelligence Scale (Terman, 1916).
Remarkably, an updated version of this test is still widely used today.

(a) French psychologist Alfred Binet helped to develop intelligence testing. (b) This page is from a 1908 version of the Binet-Simon Intelligence Scale. Children being tested were asked which face, of each pair, was prettier.

In 1939, David Wechsler, a psychologist who spent part of his career working with World War I veterans, developed a new IQ test in the United States. Wechsler combined several subtests from other intelligence tests used between 1880 and World War I. These subtests tapped into a variety of verbal and nonverbal skills, because Wechsler believed that intelligence encompassed “the global capacity of a person to act purposefully, to think rationally, and to deal effectively with his environment” (Wechsler, 1958). He named the test the Wechsler-Bellevue Intelligence Scale (Wechsler, 1981).
This combination of subtests became one of the most extensively used intelligence tests in the history of psychology. Although its name was later changed to the Wechsler Adult Intelligence Scale (WAIS) and the test has been revised several times, its aims remain virtually unchanged since its inception (Boake, 2002). Today, there are three intelligence tests credited to Wechsler: the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV), the Wechsler Intelligence Scale for Children (WISC-V), and the Wechsler Preschool and Primary Scale of Intelligence, Third Edition (WPPSI-III) (Wechsler, 2002).
These tests are used widely in schools and communities throughout the United States, and they are periodically normed and standardized as a means of recalibration. The periodic recalibrations have led to an interesting observation known as the Flynn effect. Named after James Flynn, who was among the first to describe this trend, the Flynn effect refers to the observation that each generation has a significantly higher IQ than the last.
Flynn himself argues, however, that increased IQ scores do not necessarily mean that younger generations are more intelligent per se (Flynn, Shaughnessy, & Fulgham, 2012). As a part of the recalibration process, the WISC-V (which was released in 2014) was given to thousands of children across the country, and children taking the test today are compared with their same-age peers.

The WISC-V is composed of 10 subtests, which make up four indices, which then render an IQ score. The four indices are Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed. When the test is complete, individuals receive a score for each of the four indices and a Full Scale IQ score (Heaton, 2004).
The method of scoring reflects the understanding that intelligence is composed of multiple abilities in several cognitive realms and focuses on the mental processes that the child used to arrive at his or her answers to each test item (Heaton, 2004).

Ultimately, we are still left with the question of how valid intelligence tests are. Certainly, the most modern versions of these tests tap into more than verbal competencies, yet the specific skills that should be assessed in IQ testing, the degree to which any test can truly measure an individual’s intelligence, and the use of the results of IQ tests are still issues of debate (Gresham & Witt, 1997; Flynn, Shaughnessy, & Fulgham, 2012; Richardson, 2002; Schlinger, 2003).

THE BELL CURVE (I.E., NORMAL DISTRIBUTION)

The results of intelligence tests follow the bell curve, a graph in the general shape of a bell. When the bell curve is used in psychological testing, the graph demonstrates a normal distribution of a trait, in this case, intelligence, in the human population.
Many human traits naturally follow the bell curve. For example, if you lined up all your female schoolmates according to height, it is likely that a large cluster of them would be the average height for an American woman: 5’4”–5’6”. This cluster would fall in the center of the bell curve, representing the average height for American women.
There would be fewer women who stand closer to 4’11”. The same would be true for women of above-average height: those who stand closer to 5’11”. The trick to finding a bell curve in nature is to use a large sample size. Without a large sample size, it is less likely that the bell curve will represent the wider population. A representative sample is a subset of the population that accurately represents the general population. If, for example, you measured the height of the women in your classroom only, you might not actually have a representative sample. Perhaps the women’s basketball team wanted to take this course together, and they are all in your class.
Because basketball players tend to be taller than average, the women in your class may not be a good representative sample of the population of American women. But if your sample included all the women at your school, it is likely that their heights would form a natural bell curve.
Are you of below-average, average, or above-average height?

The same principles apply to intelligence test scores. Individuals earn a score called an intelligence quotient (IQ). Over the years, different types of IQ tests have evolved, but the way scores are interpreted remains the same. The average IQ score on an IQ test is 100. Standard deviations describe how data are dispersed in a population and give context to large data sets. The bell curve uses the standard deviation to show how all scores are dispersed from the average score. In modern IQ testing, one standard deviation is 15 points.
So a score of 85 would be described as “one standard deviation below the mean.” How would you describe a score of 115 and a score of 70? Any IQ score that falls within one standard deviation above and below the mean (between 85 and 115) is considered average, and about 68% of the population has IQ scores in this range. An IQ score of 130 or above is considered a superior level.

The majority of people have an IQ score between 85 and 115.

Only 2.2% of the population has an IQ score below 70 (American Psychological Association [APA], 2013).
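As a rough check on these percentages, the sketch below treats IQ scores as an idealized normal distribution with the mean of 100 and standard deviation of 15 given above. The helper names (`z_score`, `normal_cdf`) are illustrative assumptions, not part of any IQ test's scoring procedure, and real norming samples only approximate this curve.

```python
import math

MEAN_IQ = 100.0  # average IQ score given in the text
SD_IQ = 15.0     # one standard deviation in modern IQ testing


def z_score(score, mean=MEAN_IQ, sd=SD_IQ):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


def normal_cdf(score, mean=MEAN_IQ, sd=SD_IQ):
    """Share of an idealized normal population scoring at or below `score`."""
    return 0.5 * (1.0 + math.erf(z_score(score, mean, sd) / math.sqrt(2.0)))


within_one_sd = normal_cdf(115) - normal_cdf(85)  # between 85 and 115
below_70 = normal_cdf(70)                         # more than 2 SD below the mean
at_or_above_130 = 1.0 - normal_cdf(130)           # 2 SD or more above the mean

print(f"z-score of 115: {z_score(115):+.1f}")
print(f"z-score of 70:  {z_score(70):+.1f}")
print(f"Between 85 and 115: {within_one_sd:.1%}")
print(f"Below 70:           {below_70:.1%}")
print(f"130 or above:       {at_or_above_130:.1%}")
```

Under this model, a score of 115 sits one standard deviation above the mean and a score of 70 sits two below it. About 68% of scores fall between 85 and 115, and a little over 2% fall below 70, close to the 2.2% cited above.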
A score of 70 or below indicates significant cognitive delays, major deficits in adaptive functioning, and difficulty meeting “community standards of personal independence and social responsibility” when compared to same-aged peers (APA, 2013). An individual in this IQ range would be considered to have an intellectual disability and exhibit deficits in intellectual functioning and adaptive behavior (American Association on Intellectual and Developmental Disabilities, 2013). Formerly known as mental retardation, the accepted term now is intellectual disability, and it has four subtypes: mild, moderate, severe, and profound.
The Diagnostic and Statistical Manual of Mental Disorders lists criteria for each subgroup (APA, 2013).

Characteristics of Cognitive Disorders

| Intellectual Disability Subtype | Percentage of Intellectually Disabled Population | Description |
|---|---|---|
| Mild | 85% | 3rd- to 6th-grade skill level in reading, writing, and math; may be employed and live independently |
| Moderate | 10% | Basic reading and writing skills; functional self-care skills; requires some oversight |
| Severe | 5% | Functional self-care skills; requires oversight of daily environment and activities |
| Profound | | |

Since cognitive processes are complex, ascertaining them in a measurable way is challenging. Researchers have taken different approaches to define intelligence in an attempt to comprehensively describe and measure it.

The Wechsler-Bellevue IQ test combined a series of subtests that tested verbal and nonverbal skills into a single IQ test in order to get a reliable, descriptive score of intelligence. While the Stanford-Binet test was normed and standardized, it focused more on verbal skills than variations in other cognitive processes.
In order to understand test results from standardized tests it is important to be familiar with a variety of terms and concepts that are fundamental to “measurement theory,” the academic study of measurement and assessment. Two major areas in measurement theory, reliability and validity, were discussed in the previous chapter; in this chapter we focus on concepts and terms associated with test scores.
The basics

Frequency distributions

A frequency distribution is a listing of the number of students who obtained each score on a test. If 31 students take a test, and the scores range from 11 to 30, then the frequency distribution might look like Table 1. We also show the same set of scores on a histogram or bar graph in Figure 1. The horizontal axis (x-axis) represents the score on the test and the vertical axis (y-axis) represents the number or frequency of students. Plotting a frequency distribution helps us see what scores are typical and how much variability there is in the scores. We describe more precise ways of determining typical scores and variability next.

Table 1: Frequency distribution for 31 scores

| Score on test | Frequency | Central tendency measures |
|---|---|---|
| … | … | |
| 20 | 3 | |
| 21 | 2 | |
| 22 | 6 | Mode |
| 23 | 3 | Median |
| 24 | 2 | Mean |
| 25 | 0 | |
| 26 | 2 | |
| 27 | 6 | Mode |
| 28 | 2 | |
| 29 | 2 | |
| 30 | 1 | |
| TOTAL | 31 | |

Figure 1: Test scores from Table 1 represented as a bar graph

Central tendency and variability

There are three common ways of measuring central tendency, or which score(s) are typical.
The mean is calculated by adding up all the scores and dividing by the number of scores. In the example in Table 1, the mean is 24. The median is the “middle” score of the distribution; that is, half of the scores are above the median and half are below.
The median on the distribution is 23 because 15 scores are above 23 and 15 are below. The mode is the score that occurs most often. In Table 1 there are actually two modes: 22 and 27.
Thus, this distribution is described as bimodal. Calculating the mean, median, and mode is important, as each provides different information for teachers. The median represents the score of the “middle” students, with half scoring above and half below, but does not tell us about the scores on the test that occurred most often.
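The three measures described above can be computed with Python's standard `statistics` module. This is a minimal sketch using a small set of hypothetical scores (an assumption for illustration, not the data in Table 1), chosen so that two scores tie for most frequent.

```python
import statistics

# Hypothetical scores for illustration only (not the data in Table 1).
scores = [20, 22, 22, 22, 23, 24, 27, 27, 27, 30]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle of the sorted scores
modes = statistics.multimode(scores)      # every score tied for most frequent

print("mean:", mean_score)
print("median:", median_score)
print("modes:", modes)
```

Because 22 and 27 each occur three times, `multimode` returns both, which is what makes this hypothetical distribution bimodal.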
The mean is important for some statistical calculations but, unlike the median, is highly influenced by a few extreme scores (called outliers). To illustrate this, imagine a test out of 20 points taken by 10 students, most of whom do very well while one does very poorly. The scores might be 4, 18, 18, 19, 19, 19, 19, 19, 20, 20. The mean is 17.5 (175/10), but if the lowest score (4) is eliminated the mean is 1.5 points higher at 19 (171/9). However, in this example the median remains at 19 whether or not the lowest score is included. When there are some extreme scores, the median is often more useful for teachers in indicating the central tendency of the frequency distribution.

The measures of central tendency help us summarize scores that are representative, but they do not tell us anything about how variable or spread out the scores are. Figure 2 illustrates sets of scores from two different schools on the same test for fourth graders.
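Returning to the ten-student example above, the short sketch below recomputes the mean and median with and without the lowest score. The variable names are illustrative; the scores are the ones listed in the text.

```python
import statistics

# Scores from the ten-student example in the text.
scores = [4, 18, 18, 19, 19, 19, 19, 19, 20, 20]

# Drop the single lowest score (the outlier, 4).
without_outlier = sorted(scores)[1:]

mean_all = statistics.mean(scores)
mean_trimmed = statistics.mean(without_outlier)
median_all = statistics.median(scores)
median_trimmed = statistics.median(without_outlier)

print("mean with outlier:     ", mean_all)
print("mean without outlier:  ", mean_trimmed)
print("median with outlier:   ", median_all)
print("median without outlier:", median_trimmed)
```

The mean shifts from 17.5 to 19 when the outlier is removed, while the median stays at 19 in both cases.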
Note that the mean for each is 40 but in School A the scores are much less spread out. A simple way to summarize variability is the range, which is the lowest score subtracted from the highest score.
In School A, with low variability, the range is (45 − 35) = 10; in School B the range is (55 − 22) = 33.

Figure 2: Fourth grade math scores in two different schools with the same mean but different variability

However, the range is based on only two scores in the distribution, the highest and lowest scores, and so does not represent variability in all the scores. The standard deviation is based on how much, on average, all the scores deviate from the mean. In the example in Figure 2 the standard deviations are 2.01 for School A and 7.73 for School B. In Exhibit 1 below we demonstrate how to calculate the standard deviation.

Figure 5: Using trend lines to estimate grade equivalent scores.

Grade equivalent scores also assume that the subject matter being tested is emphasized to the same degree at each grade level and that mastery of the content accumulates at a mostly constant rate (Popham, 2005).
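To make the range and standard deviation described earlier in this section concrete, here is a minimal sketch with hypothetical scores for two schools. The scores are assumptions chosen so that both schools have a mean of 40 and the same ranges as in the text (10 and 33); they are not the data behind Figure 2, so the standard deviations will differ from the values quoted there. The sketch also assumes the population form of the standard deviation (dividing by the number of scores); some textbooks divide by one less than that.

```python
import math

# Hypothetical scores, chosen so both schools have a mean of 40.
school_a = [35, 38, 39, 40, 40, 41, 42, 45]
school_b = [22, 35, 38, 40, 42, 43, 45, 55]


def score_range(scores):
    """Lowest score subtracted from the highest score."""
    return max(scores) - min(scores)


def standard_deviation(scores):
    """Population SD: square root of the mean squared deviation from the mean."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(s - mean) ** 2 for s in scores]
    return math.sqrt(sum(squared_deviations) / len(scores))


for name, scores in (("School A", school_a), ("School B", school_b)):
    mean = sum(scores) / len(scores)
    print(f"{name}: mean={mean:.1f}, range={score_range(scores)}, "
          f"SD={standard_deviation(scores):.2f}")
```

Both hypothetical schools share the same mean, yet School B has the larger range and the larger standard deviation, because its scores sit farther from the mean on average.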
Many testing experts warn that grade equivalent scores should be interpreted with considerable skepticism and that parents often have serious misconceptions about grade equivalent scores. Parents of high achieving students may have an inflated sense of their child’s level of achievement.

References

Linn, R. L., & Miller, M. Measurement and Assessment in Teaching 9th ed. Upper Saddle River, NJ: Pearson.

Popham, W. (2005). Classroom Assessment: What teachers need to know. Boston, MA: Pearson.