Each year, students in thousands of school districts throughout the nation take the Measures of Academic Progress (MAP) test. Many school districts use norm tables prepared by the Northwest Evaluation Association (NWEA) in 2015 to report how their students are doing compared to other students in the nation. NWEA is the owner of the MAP test.
One common norm-based measure is the percent of students who scored at or above the national average. Another is the percentile rank of students’ average scores by subject and grade-level.
In recent years, Paul Zavitkovsky, a researcher and leadership coach at the Center for Urban Education Leadership Program at the University of Illinois at Chicago, has noticed that MAP tests are generating higher normative results than those generated by the National Assessment of Educational Progress (NAEP) and the Illinois PARCC tests.
In 2017, for example, the Chicago Public Schools system (CPS), which administers the MAP test to its students, reported that 73.5% of CPS eighth-graders scored at or above the national average in reading on the spring 2017 MAP test. In contrast, only 42% of CPS eighth-graders scored at or above the national average score in reading on the 2017 NAEP test.
The NAEP test is given every two years to a representative sample of students in fourth and eighth grades in every state and certain large cities, including Chicago. Sponsored by the U.S. Department of Education, NAEP assesses reading and math achievement and is commonly known as “The Nation’s Report Card.” Among assessment professionals, NAEP is widely considered to be the “gold standard” for large-scale, standardized testing in the United States.
“Different samples of the national test population will typically generate somewhat different results, but fully representative samples should produce roughly comparable results,” said Mr. Zavitkovsky.
In 2017, though, the percentage of CPS eighth-graders scoring at or above the national average on the MAP test was 31.5 percentage points higher than the percentage on the NAEP test.
Mr. Zavitkovsky said, “In hundreds of Illinois school districts, and thousands of school districts nationwide, parents and teachers rely on MAP scores to give them an accurate picture of how their kids are doing. When the results that NWEA reports for the MAP test get this far out of whack with NAEP results, somebody needs to say something.”
This article provides some background on MAP and NWEA’s 2015 norm tables. It illustrates that the percentile ranks generated on the MAP test for the average CPS student are significantly higher than the percentile ranks generated on the NAEP test. It also illustrates that much higher percentages of CPS students are reportedly scoring above the national average using NWEA’s 2015 norms than the percentages of CPS students scoring above the national average on NAEP and above the Illinois average on PARCC.
This article does not address NWEA’s growth targets. In addition, it does not address NWEA’s benchmarks for college readiness that were identified in a separate study and were not based on the 2015 norm study.
Background on MAP and NWEA 2015 Norms
All students who take the MAP test are assigned what NWEA calls a RIT score. In its 2015 norm study, NWEA determined percentile ranks for each RIT score by grade level and by subject. In its report, “NWEA 2015 MAP Norms for Students and School Achievement Status and Growth,” NWEA includes tables that show the percentile ranks for each score on the MAP test by grade level and subject tested.
Percentile ranks provide a measure of a student’s achievement status compared with the achievement status of his or her peers. For example, using NWEA’s 2015 norm tables, a score of 224 in eighth-grade reading on the Spring MAP test corresponds to a percentile rank of 60. This indicates that the student’s score is the same as or higher than the scores of 60% of students tested nationwide.
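The definition above can be sketched in a few lines of Python. This is only an illustration of the counting rule, not NWEA’s procedure: the function name `percentile_rank` and the ten sample scores are hypothetical and were invented for this example.

```python
def percentile_rank(score, norm_scores):
    """Percent of scores in the norm sample that are at or below `score`."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical norm sample for illustration only; these are NOT real MAP data.
sample = [200, 205, 210, 215, 220, 224, 228, 232, 236, 240]

# Six of the ten sample scores are at or below 224, so the rank is 60.0.
rank = percentile_rank(224, sample)
print(rank)
```

In a real norm study the sample would contain tens of thousands of records and would be weighted to be representative; the sketch shows only what a percentile rank of 60 means.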
NWEA says its 2015 norm study is based on K-11 grade samples, each composed of 72,000 to 153,000 student test records from approximately 1,000 schools. NWEA used student test scores from a three-year period, 2011-2014, in determining the percentile rank associated with each test score. NWEA says it employed rigorous procedures to ensure that the norms were representative of the U.S. school-age population.
The 2015 national norms replaced those that had been identified in a 2011 study conducted by NWEA. Page 8 of the 2015 Norm Study says that the status norms for grades 1 through 5 “minimally changed,” but the status norms for grade 6 and above “change a bit more than anticipated. Where differences were observed, the grade-level status means [i.e., the average] for the 2015 norms tended to be lower than the corresponding norms for 2011.”
The effect was “the percentile distributions, particularly in grades 6 and above, were shifted up. Thus, the same RIT score for the same grade-level, for the same season, in the same content area would be associated with a higher status percentile rank in the 2015 norms than in the 2011 norms,” says the study, at page 8.
In simple terms, the change from the 2011 to the 2015 norms meant that seventh- and eighth-grade students could exceed the national average on the MAP test with a lower RIT score.
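The mechanism can be illustrated with a short Python sketch. It assumes, purely for illustration, that scores in a norm group follow a normal distribution. The means (220 and 217) and the standard deviation (15) are hypothetical and are not taken from NWEA’s norm tables; the function name is also invented for this example.

```python
from statistics import NormalDist


def percentile_under_norms(rit_score, norm_mean, norm_sd):
    """Percentile rank of a score if the norm group were normally distributed."""
    return 100.0 * NormalDist(mu=norm_mean, sigma=norm_sd).cdf(rit_score)


# Hypothetical norm groups: the newer norms have a lower mean, same spread.
older = percentile_under_norms(220, norm_mean=220, norm_sd=15)
newer = percentile_under_norms(220, norm_mean=217, norm_sd=15)

# The same score ranks higher against the norm group with the lower mean.
print(round(older, 1), round(newer, 1))
```

Under these assumed numbers, a student whose score sat exactly at the older average would land above the average under the newer norms without any change in the score itself.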
NWEA said the differences were “partially due to the use of a more sensitive analytic model,” and that there were several other areas that “appear promising for helping to explain the differences,” one of which was the effects of the Common Core State Standards.
Table No. 1 below shows the difference between using NWEA’s 2011 norm tables and NWEA’s 2015 norm tables at the eighth-grade level. The first column lists a RIT score. The second column reports the percentile rank of that score, using 2011 norm tables. The third column reports the percentile rank of the score using 2015 norm tables. The fourth column shows the increase in percentile rank for each score due to using the 2015 norm tables rather than the 2011 norm tables. The fifth column estimates the size of the increase in percentile rank in terms of grade-level equivalents.
Table 1 illustrates that the same RIT scores produce higher percentile ranks using NWEA’s 2015 norms than they did using NWEA’s 2011 norms. In most cases, these differences are more pronounced for students who score below the 60th percentile.
MAP Generates Significantly Higher Percentile Ranks than NAEP
The average test score of students in a given grade, subject, and school district represents the performance of the average student in that grade, subject, and school district. The percentile rank of the average score represents how the average student in the school district compares to other students in the nation.
Using NWEA’s 2015 norm tables, the national percentile ranks of CPS students’ average scores on the 2015 and 2017 MAP tests are significantly higher than the national percentile ranks of CPS students’ average scores on the 2015 and 2017 NAEP tests.
As an example, the average score of CPS eighth-graders in reading on the 2015 MAP test was 224.8 (see Table 2 below). By NWEA’s reckoning, the average CPS eighth-grader performed at or above 61% of the students in the nation. A score at the 61st percentile is about 0.6 grade-level equivalents above the national average, says Mr. Zavitkovsky.
By contrast, the average score of CPS eighth-graders in reading on the 2015 NAEP test was 257. Grady Wilburn, Ph.D., Statistician at the National Center for Education Statistics, told the RoundTable that score corresponds to the 39th national percentile. A score at the 39th percentile is about 0.6 grade-level equivalents below the national average, says Mr. Zavitkovsky.
In this example, NWEA’s 2015 norms generated a percentile rank that is 22 percentile points, or about 1.2 grade equivalents, higher than that generated on the NAEP test, said Mr. Zavitkovsky.
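The arithmetic behind this comparison can be written out as a small Python sketch. The percentile ranks and grade-equivalent offsets are the figures reported above for CPS eighth-grade reading in 2015; the variable names are hypothetical and chosen only for readability.

```python
# Figures reported in the article for CPS eighth-grade reading, 2015.
map_percentile = 61       # percentile rank of the average MAP score
naep_percentile = 39      # percentile rank of the average NAEP score
map_grade_offset = 0.6    # grade equivalents above the national average
naep_grade_offset = -0.6  # grade equivalents below the national average

# Gap between the two tests, in percentile points and in grade equivalents.
percentile_gap = map_percentile - naep_percentile
grade_gap = round(map_grade_offset - naep_grade_offset, 1)

print(percentile_gap, grade_gap)
```

The result matches the 22 percentile points and roughly 1.2 grade equivalents cited by Mr. Zavitkovsky.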
Table 2 below provides the average score of CPS fourth- and eighth-graders in reading and math on the 2015 Spring MAP test and the 2015 NAEP test. It also provides national percentile ranks and matching grade equivalents for each score. Table 3 provides the same data for 2017. In all cases, MAP norms generate significantly higher percentile ranks and grade equivalents than NAEP.
It is important to recognize that MAP and NAEP use different scoring scales, so the average scores for each grade and subject will, of course, be different. But the percentile ranks of average scores purport to show how average scores compare with all students tested nationwide. Those ranks should be roughly the same for MAP and NAEP, says Mr. Zavitkovsky.
The difference in the percentile ranks is illustrated in Charts 1 and 2, below. See sidebar for data sources.
One possible reason for the difference in the MAP and NAEP percentiles is the difference in the way the percentile ranks for each test were determined. In its 2015 norm study, NWEA used test data from a representative sample of students for the period 2011-2014, and made other adjustments.
By contrast, the percentile ranks of scores on the 2015 and 2017 NAEP tests were identified by determining where the CPS average score ranked among all other scores in the nation on the same test, in the same grade, in the same year.
Mr. Zavitkovsky discounted the difference in methodology as explaining the wide difference in the percentile ranks. He said, “These comparisons make it pretty clear that NWEA is reporting percentile ranks that are about a half-grade level higher than NAEP percentiles at grade 4, and almost a full grade level higher than NAEP percentiles at grade 8. Distortions like this go way beyond what might reasonably be explained by differences in test content or minor differences in sample populations.”
The RoundTable asked NWEA by email on June 26 and July 9 to explain why MAP was generating significantly higher percentile ranks than NAEP and to provide any other comments they would like. Although NWEA’s representative said they should have a response by July 20, NWEA did not provide an explanation or comment.
MAP Generates Much Higher Percentages of CPS Students Scoring Above the National Average than NAEP and PARCC
As part of its annual achievement summaries, CPS reports the percent of CPS students scoring at or above the national average on the MAP test. Using NWEA’s 2015 norms, the percentages of CPS students who score at or above the national average on the MAP test are much higher than the percentages of students scoring at or above the national average on the NAEP.
As an example, CPS reported that 73.5% of its eighth-graders scored at or above the national average in reading on the 2017 MAP test, using NWEA’s 2015 norms. In contrast, only 42% of CPS eighth-graders scored at or above the national average in reading on the 2017 NAEP test.
Table 4 below provides: a) the percentage of CPS fourth- and eighth-graders who scored at or above the national average on the spring 2017 MAP test in reading and math using NWEA’s 2015 norms, and b) the percentage who scored at or above the national average on the 2017 NAEP test in reading and math.
The differences are illustrated in Chart 3, above. See sidebar for data sources.
A comparison of MAP and PARCC results tells a similar story. The State of Illinois began administering the PARCC test to all third- through eighth-graders in 2015 as the state-mandated achievement test under the No Child Left Behind Act. Both the MAP and PARCC tests purport to assess mastery of the Common Core State Standards.
Charts 4 and 5, below, compare results on MAP and PARCC for 2016. Chart 4 is for reading, Chart 5 for math. They show: a) the percent of CPS third- through eighth-graders who scored above the national average score in reading and math on the 2016 MAP test, and b) the percent of CPS third- through eighth-graders who scored above the State of Illinois average score in reading and math on the 2016 PARCC test. The data for PARCC were provided by Mr. Zavitkovsky. See sidebar for sources of data.
Charts Nos. 4 and 5 illustrate two things. First, the percent of CPS seventh- and eighth-graders scoring above the national average in reading and math on the MAP test is significantly higher than the percentage of CPS third- through sixth-graders scoring above the national average in reading and math on the MAP test. For example, 56% of CPS sixth-graders scored above the national average in reading on the 2016 MAP test, compared to 65% for seventh-graders and 73% for eighth-graders.
Similar increases at the seventh- and eighth-grade levels did not occur on the PARCC test for 2016.
Second, the charts show that significantly higher percentages of CPS students are scoring above the national average on the MAP test, using NWEA’s 2015 norms, than are scoring above the Illinois average on PARCC. For example, 73% of CPS eighth-graders scored at or above the national average score on the 2016 MAP test, compared to 43% on the 2016 PARCC test.
There are some obvious differences in the comparison between MAP and PARCC. The MAP data in Charts 4 and 5 present the percentage of CPS students above the national average, and the PARCC data present the percentage of CPS students above the State of Illinois average. But Mr. Zavitkovsky says that Illinois students have historically mirrored national data. In both 2011 and 2015, statewide PARCC norms at grades 4 and 8 were statistically indistinguishable from NAEP norms at grades 4 and 8, he said.
Another difference between Chicago’s MAP and PARCC scores is that different numbers of students took the tests. At third grade, 9% more third-graders took PARCC than MAP. At fourth through eighth grades, the difference in the number of students who took the tests ranges from less than 1% to about 3%.
In January 2018 the RoundTable provided the data shown in Charts 4 and 5, in a different format, to NWEA and asked NWEA to comment on the differences.
John Cronin, Vice President, Education Research of NWEA, and Yeow Neng Thum, Sr. Research Fellow of NWEA and an author of NWEA’s 2015 Norm study, told the RoundTable that they were not privy to the full methodology behind Mr. Zavitkovsky’s data and analyses, and so their response was limited to the information shared in the RoundTable’s request. They said in a Feb. 13 letter:
“The MAP Growth results cited in the graphs you shared show improvement related to a single stable, nationally representative sample - the 2015 MAP Growth norming group. The percentile scores reported for …. PARCC in the graphs are based on the results of each year’s assessment, which means the comparison group changed over time. In this case, the researcher is trying to compare percentile score changes as measured against a stable group (MAP), relative to score changes drawn from instable populations, which makes it impossible to draw conclusions about relative improvement.”
Mr. Zavitkovsky told the RoundTable that in determining the percentage of CPS students who scored at or above the average scores of Illinois students on the 2016 PARCC test, he used the average scores of Illinois students on the 2015 PARCC test, which was the first year of the PARCC test. He determined the percentage of students who scored above the 2015 average on the 2016 PARCC test, using a publicly available student-level data set. His methodology was thus anchored in the average scores in 2015. He also said that the differences between the average scores of Illinois students on the 2015 PARCC test and 2016 PARCC test were “negligible.”
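The anchoring approach described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Mr. Zavitkovsky’s actual analysis: the function name and both score lists are hypothetical, and the sketch counts scores at or above the base-year average.

```python
from statistics import mean


def percent_at_or_above_anchor(base_year_state_scores, later_year_district_scores):
    """Share of later-year district scores at or above the base-year state mean."""
    if not base_year_state_scores or not later_year_district_scores:
        raise ValueError("both score lists must be non-empty")
    anchor = mean(base_year_state_scores)  # fixed comparison point
    count = sum(1 for s in later_year_district_scores if s >= anchor)
    return 100.0 * count / len(later_year_district_scores)


# Hypothetical scores for illustration only; these are NOT real PARCC data.
state_2015 = [700, 720, 740, 760, 780]     # base-year statewide scores, mean 740
district_2016 = [690, 710, 735, 745, 770]  # later-year district scores

# Two of the five district scores are at or above 740, so the share is 40.0.
share = percent_at_or_above_anchor(state_2015, district_2016)
print(share)
```

Because the anchor is computed once from the base year, the comparison point does not move from year to year, which is the sense in which the methodology is "anchored."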
Mr. Cronin and Dr. Thum also told the RoundTable, “You are also likely familiar with the recent study of Chicago student achievement by Stanford professor Sean Reardon, which concluded that Chicago schools have shown a pattern of improvement based on ISAT and NAEP test results from 2008 through the 2015 school year. We believe that this corroboration of academic improvement stems from important methodological communality shared by our work. Analysis of growth is actually based on a vertically equated national grade-equivalent metric, not on percentiles per se. In the case of MAP Growth, MAP-percentiles are deployed as a normative reporting metric, and not as inputs to the analysis.”
The study, conducted by Dr. Reardon, Professor of Poverty and Inequality in Education at Stanford University, and other researchers, analyzed about 215 million standardized test scores taken by 40 million students in grades 3-8 in every public school in the United States between 2009 and 2014. The researchers standardized the scores into a common scale.
In a report on students in the Chicago Public School system titled, “Test Score Growth Among Chicago Public School Students, 2009-2014” (Nov. 1, 2017), Dr. Reardon said that CPS third-graders in 2009 grew by six grade-level equivalents in the five years between 2009 and the time they were eighth-graders in 2014, or an average of about 1.19 grade equivalents per year.
There is no question that CPS students have shown significant growth in achievement during the last decade. Like Dr. Reardon, Mr. Zavitkovsky has also documented improved CPS achievement in numerous reports, including Taking Stock (2016) and Upstate/Downstate (2017). Both studies are available at urbanleadership.org/what-we-do/research/upstate-downstate-report.
Despite this growth, both Dr. Reardon and Mr. Zavitkovsky found that average achievement among CPS eighth-graders still fell short of state and national averages. Dr. Reardon’s study found that in 2014, CPS eighth-graders were performing at a grade-level equivalent score of 7.39 in reading and 7.53 in math. These scores were 0.61 grade levels below the national average in reading and about 0.47 grade levels below the national average in math. These findings are statistically indistinguishable from the 0.60 and 0.45 grade equivalents below the national average that are reported in Table 2 (page 24) for Chicago eighth-graders on the 2015 NAEP, said Mr. Zavitkovsky.
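The shortfalls cited above follow from simple subtraction, sketched here in Python. The sketch assumes the national average for eighth-graders corresponds to a grade-level equivalent of 8.0, which is what the reported figures imply; that value and the variable names are assumptions made for this illustration.

```python
# Assumed national average for eighth-graders on the grade-equivalent scale.
NATIONAL_AVERAGE_GRADE_8 = 8.0

# Grade-level equivalent scores reported for CPS eighth-graders in 2014.
cps_scores = {"reading": 7.39, "math": 7.53}

# Distance below the national average, rounded to two decimal places.
shortfall = {
    subject: round(NATIONAL_AVERAGE_GRADE_8 - score, 2)
    for subject, score in cps_scores.items()
}

print(shortfall)
```

The computed gaps agree with the 0.61 and 0.47 grade levels reported for reading and math.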
The Differences Deserve an Explanation
The data presented in this article show that MAP is generating significantly higher norm-based results for CPS students than the NAEP test, the PARCC test, and the Stanford study. This article does not address whether the results of any particular test are more valid than another. Determining which test or tests offer the most valid estimates of student achievement, and why, is beyond the scope of this article. But the differences are significant and deserve more scrutiny and analysis.