A new report, “Taking Stock: Achievement in Illinois Under No Child Left Behind,” provides a wealth of data on trends in student achievement in grades three through eight in Illinois for the period 2001 through 2015, and it makes a number of surprising findings:

• For the last 15 years, achievement has increased in Chicago for all racial subgroups, while overall achievement in the rest of the State has for the most part stagnated;

• In Chicago, average growth proceeds at a fairly even pace between third and eighth grades, but in the rest of the State, growth slows markedly as students transition to middle school;

• Achievement for non-ELL (English Language Learner) Latino students has significantly outpaced that of black and white students, whose growth has stagnated since 2008.

The report also demonstrates that when a common grading system is used, the Illinois Standard Achievement Test (ISAT), the Measures of Academic Progress (MAP), the National Assessment of Education Progress (NAEP) and the PARCC test all produce similar results; and the report delves into what standardized tests actually test. It explains how PARCC has huge potential to support teacher conversations about what students know and where they are getting stuck, but expresses concern that PARCC is not paying enough attention to making that potential easily accessible to teachers and parents. (See sidebar.)

The 95-page report, prepared by Paul Zavitkovsky, Denis Roarty and Jason Swanson at the Center for Urban Education Leadership, University of Illinois at Chicago, is scheduled to be available on March 31.

Growth in Chicago v. Illinois – Using NAEP

Illinois students’ growth in achievement on the NAEP reading and math tests was “statistically flat” in 2015 … just as it was in 2013, 2011, 2007, and 2005, says the report.

“More disturbing still, NAEP results in 2015 offered further evidence that the only thing keeping statewide trends from outright decline was sustained growth in Chicago, which accounts for 20% of all statewide scoring.”

The NAEP test is widely recognized as the “gold standard” for American testing.

As an example, the percentage of Chicago fourth-graders who scored “proficient” or above in reading on NAEP increased from 14% to 27%, or by 13 percentage points, between 2003 and 2015. All Illinois fourth-graders (including Chicago) increased from 31% to 35%, or by 4 percentage points.

For math, the percentage of Chicago fourth-graders who scored “proficient” or above on NAEP increased from 10% to 30%, or by 20 percentage points, during the same period. All Illinois fourth-graders (including Chicago) increased from 32% to 37%, or by 5 percentage points.

NAEP reports the data separately for Chicago and for all of Illinois, but does not break it out for Illinois excluding Chicago. It is apparent, though, that Chicago carried the State between 2003 and 2015.

Chicago Third Graders Growing Faster Than State

Taking Stock found that results on the ISATs show the same trends. For its ISAT analysis, the report tracked changes in third-graders’ median scores in Chicago and in Illinois outside Chicago. Using median scores gives a consistent measure of whole-group growth over time.

For reading, Taking Stock found that the median ISAT score of third-graders in Chicago grew by 16 points between 2001 and 2014. In contrast, the median score of third-graders outside Chicago grew by only 3 points.

For math, the median ISAT score of third-graders in Chicago grew by 22 points between 2001 and 2014. It grew by only 1 point for third-graders outside Chicago.

Growth in Chicago was thus much higher than in the rest of the State. The trends are similar when the data is disaggregated by both ethnicity and income. The report found that black, Latino, and white third-graders from both low-income households and from non-low-income households in Chicago grew at a significantly faster pace in both reading and math than their counterparts outside Chicago.

Significantly, the six Chicago subgroups not only grew at a much faster pace than their counterparts outside Chicago, but by 2014, the median score of all six Chicago subgroups either matched or exceeded that of their counterparts statewide.

Figure Nos. 1 and 2, below, illustrate the growth of the low-income subgroups in Chicago and outside Chicago in reading and math. The blue portion of each bar depicts the median ISAT score for each subgroup in 2001. The green portion of each bar depicts the point gain in the median score for each subgroup between 2001 and 2014.

Chicago’s Growth in Middle School

One reason for the accelerated growth in Chicago schools may be instructional, but another may be the configuration of Chicago’s schools: most Chicago students attend K-8 schools, rather than transitioning from a K-5 or K-6 school to a middle school, which is common in school districts outside Chicago.

Taking Stock compares the growth of students in Chicago in grades three through eight with the growth of students outside Chicago in those grades. This was done for low-income and non-low-income black, Hispanic and white students. Median scores were converted into standard deviation units, so that growth rates would be comparable at each grade level from year to year.
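
For readers curious about how that conversion works, here is a minimal sketch with invented numbers (the means and standard deviations below are not the report’s actual figures):

```python
# Convert grade-level median scores to standard deviation units (z-scores)
# so that growth is comparable across grades whose scales differ.
# All numbers below are invented for illustration.

def to_sd_units(score, grade_mean, grade_sd):
    """Express a score as standard deviations from the grade-level mean."""
    return (score - grade_mean) / grade_sd

# Hypothetical statewide means and standard deviations for grades 3 and 8
grade3_mean, grade3_sd = 200.0, 15.0
grade8_mean, grade8_sd = 260.0, 18.0

# A subgroup's median score in each grade
z3 = to_sd_units(206.0, grade3_mean, grade3_sd)   # 0.4 SD above the grade-3 mean
z8 = to_sd_units(263.6, grade8_mean, grade8_sd)   # 0.2 SD above the grade-8 mean

# Growth in SD units between grades 3 and 8
print(round(z8 - z3, 2))  # -0.2: the subgroup lost ground relative to peers
```

Because each grade’s scores are measured against that grade’s own mean and spread, a subgroup that keeps pace with its peers shows flat z-scores, and one that falls behind shows a decline, regardless of how the raw scale stretches from grade to grade.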

The report finds, “In Chicago, average growth over time proceeds fairly evenly from grade three through eight while average achievement in the rest of Illinois slows markedly as students transitioned from intermediate grades 3-5 to middle school grades 6-8.”

The report says, “Surprisingly little policy attention has been paid to how the transition to middle school affects the achievement of early adolescents. Big differences in school organization in and out of Chicago offer an interesting opportunity to explore the question.”

Chicago may be bucking a national trend. Research shows that on average, children’s grades drop dramatically during the first year of middle school compared to their grades in elementary school.

Growth of Third Graders by Subgroup

One point emphasized in the report is that it is essential to control for changes in students’ demographics in order to get a meaningful picture of changes in student achievement.

Section 5 of the report analyzes subgroups by simultaneously controlling for race, income, and English language proficiency. For purposes of this analysis, the report considers only third-graders who are not ELL, and it disaggregates this group further by both ethnicity and income. As an example, the report analyzes the data for a subgroup of third-graders who are low-income Hispanic and who are not ELL.

In making this analysis, the report uses a consistent measure, namely the percentage of students who scored above the statewide median score (i.e., the 50th percentile).
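
As a rough illustration of that measure, here is a sketch with invented scores (not the report’s data):

```python
# Sketch of the report's consistent measure: the share of a subgroup
# scoring above the statewide median. All scores below are invented.

statewide_scores = [180, 190, 195, 200, 205, 210, 215, 220, 225, 230]
subgroup_scores  = [185, 198, 204, 212, 221]

# Statewide median (50th percentile)
s = sorted(statewide_scores)
n = len(s)
median = (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

# Percentage of the subgroup scoring above that median
pct_above = 100 * sum(x > median for x in subgroup_scores) / len(subgroup_scores)
print(median, pct_above)  # 207.5 40.0
```

By definition, 50% of all students statewide score above the median, so a subgroup’s percentage above it can be read directly against that 50% benchmark from year to year.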

Figure Nos. 3 and 4, below, show the percentage of non-ELL third-graders in Illinois who scored above the statewide median score in math and reading combined between 2001 and 2014. Figure No. 3 provides the data for low-income black, Latino and white students. Figure No. 4 provides the data for non-low-income students.

Latino students showed the highest growth, with significant improvement between 2001 and 2014.

Black students, both low-income and non-low-income, made significant gains between 2001 and 2008. After 2008, their growth flattened out.

White students, both low-income and non-low-income, made gains between 2001 and 2008, but then low-income white students showed declines between 2008 and 2015; and the achievement of non-low-income white students flattened out.

The charts illustrate two other points. First, there is a gap in achievement between ethnic groups even when low-income subgroups and non-low-income subgroups are compared. Significantly though, by 2014, low-income Latino students outperformed low-income white students. The gap for this subgroup was eliminated. The gaps between white and black students were reduced.

Second, for 2015, the charts include results on the PARCC test, which was given for the first time in 2015. The results on PARCC are very close to the results on the 2014 ISAT. The report contains many other charts that illustrate this point.

Research cited in the report shows that students who are not proficient in reading in third grade are four times more likely to drop out of high school than proficient readers. Struggling readers are six times more likely to drop out.

Suburban Chicagoland vs. Other Regions

The table below shows the median score of students in five regions of the State on the 2006 and 2014 ISATs. The table shows the highest growth was in Chicago, followed by Suburban Chicagoland (DuPage, Kane, Lake, McHenry, Will and suburban Cook counties) which has a total of 1,145 schools serving approximately 40% of students tested statewide.

The table also shows that the average median ISAT score of students in Suburban Chicagoland in 2014 was about 8 points higher than that of the City of Chicago and about 4 to 5 points higher than those of the rest of the regions.

Because ISAT scale scores have different values at different grade levels, it is difficult to give a precise answer as to what the growth in scores and the change in scores indicate. The report notes, however, that the growth in median scores between third grade and fourth grade on a statewide basis is about 11.5 points; the growth in median scores between seventh grade and eighth grade is about 8.5 points. A difference of 10 points may represent about one-year’s growth.
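
That rule of thumb can be followed with simple arithmetic; the per-grade growth figures below come from the report, while the 8-point regional gap is used only as an illustration:

```python
# Rough interpretation of ISAT point differences in "years of growth,"
# using the report's per-grade growth figures. The conversion is only
# approximate because ISAT scale values differ by grade level.

growth_gr3_to_gr4 = 11.5   # statewide median growth, grade 3 -> 4 (from report)
growth_gr7_to_gr8 = 8.5    # statewide median growth, grade 7 -> 8 (from report)
points_per_year = (growth_gr3_to_gr4 + growth_gr7_to_gr8) / 2  # about 10 points

score_gap = 8.0  # e.g., an 8-point difference between two regional medians
print(points_per_year, round(score_gap / points_per_year, 2))  # 10.0 0.8
```

On this reading, an 8-point difference in median scores corresponds very roughly to eight-tenths of a year of growth, though the report is careful to present the 10-points-per-year figure as an approximation.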

The percentage of low-income households in Suburban Chicagoland increased from about 17% to 37% between 2006 and 2014.

Taking Stock takes a look at the similarity of results produced by the ISAT, MAP, NAEP and PARCC, delves into what they actually test, considers whether they generate data in a format useful to aid teachers in instructing students, and proposes ways to improve reporting the results on PARCC.

ISATs and Other Tests Produce Similar Results

The publicly reported results on the ISAT, MAP, NAEP and PARCC differ significantly because the benchmarks to “meet standards” or to be “proficient” differ significantly for each test. The apparent differences on all of these tests, though, mostly disappear when the tests are graded in the same way. Taking Stock illustrates that if a common grading system is applied, the tests all produce virtually identical results.

Figures 3 and 4, on page 26, illustrate that the statewide results for six different student subgroups were virtually the same on PARCC as they were on the ISATs, when a common grading system is used. Taking Stock illustrates the same point in charts comparing PARCC and ISAT results for all 56 school districts in Illinois’ Large Unit District Association.

The reason most standardized tests produce similar results is that they are designed to measure “general knowledge” – or higher order thinking and depth of knowledge – more than specific skills and content knowledge, says the report.

Measuring Critical Thinking

“The most persistent message about testing during the NCLB era has been that scoring on standards-based assessments is based on mastery of specific content. But a decade and a half of standards-based test results tell a very different story,” the report says. “Scoring on the ISAT and most other standardized tests was heavily determined by something else that produced similar test results across tests and across content areas. That something else is what assessment professionals euphemistically describe as ‘general knowledge,’” which is higher order thinking and depth of knowledge.

Standardized tests’ “most important job is to estimate the depth and breadth of students’ academic strengths, and to identify where that estimate fits on a standardized continuum of academic capacities. Numerical scales are the yardsticks used to represent that continuum. Scale scores are the ‘units of knowledge’ that make up that yardstick.”

“Higher scale scores have at least as much to do with depth and breadth of student thinking as they do with the volume of discrete skills and concepts that students have mastered. For the most part, students who are able to size up and work through items and passages that reflect higher levels of depth and complexity earn higher scale scores than students who get stumped by those items.”

Taking Stock gives an example from a fourth-grade math exam on NAEP. Students were shown the thermometer pictured below, asked what temperature it showed, and given a choice of four answers: 43, 46, 52, or 54 degrees. The correct answer is 46 degrees, which 47% of the students selected; 52% selected 43 degrees.

The report says that most students who failed to select the correct answer did so because they did not attend carefully enough to the scale to infer that each tick represents two degrees instead of one. “Basic inferential reasoning and rudimentary understanding of ratio and proportion were required to answer the question.

“Contrary to stereotype, inferential reasoning and conceptual understanding are central requirements for achieving higher scale scores on virtually all standardized tests.”

One important aspect of the scoring of a standardized test is that each question is given a different “value,” based on the “correct-response frequencies that items produce when they are administered to large, representative samples of typical test takers.” Questions with a lower correct-response rate are assigned a higher value. Because of this, scale scores “are statistical abstractions” that do not assess specific skills and content knowledge but “assess the probability that test takers have of being able to respond successfully to different types of skill, content, and ways of academic thinking.” This information makes it possible to rank student proficiencies along a continuum of academic difficulty. In turn, this ranking “creates a reliable predictor of future performance within normal margins of error,” says the report.
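
One simplified way to picture that weighting (illustrative only; operational tests use full item response theory models rather than this one-line formula) is to assign each item a difficulty value from the log-odds of its correct-response rate:

```python
import math

# Illustrative only: give each item a difficulty "value" that rises as the
# correct-response rate from a large norming sample falls. Operational
# scaling uses item response theory (IRT) models; this log-odds difficulty
# is just the simplest version of the same idea.

def item_difficulty(p_correct):
    """Log-odds difficulty: 0 for a 50% item, positive for harder items."""
    return -math.log(p_correct / (1 - p_correct))

for p in (0.9, 0.5, 0.2):
    print(round(item_difficulty(p), 2))
# An item that 90% of test takers answer correctly gets a low (negative)
# value; an item that only 20% answer correctly gets a high positive value.
```

The point the report makes survives the simplification: a student’s scale score reflects which difficulty levels he or she can reliably clear, not a tally of specific facts mastered.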

The ISAT’s Rigor

Taking Stock reports that “weak standards” were not the major reason why the ISAT was heavily weighted with easier questions. Because Illinois policymakers set very low cut scores for “meeting standards” on the ISAT, test makers had to include larger numbers of questions at lower skill levels to increase the reliability of scores that were connected most directly with high-stakes accountability. Loading up the ISAT with questions designed to assess knowledge at low levels of achievement gave the test the feel of being an “easier test than it actually was,” says Taking Stock.

Despite this limitation, the ISAT still had enough questions at higher levels of difficulty to measure and predict progress toward college readiness about as well as other more reputable tests, says the report. Scoring at the 60th percentile of statewide ISAT scores in reading was a reliable eighth grade predictor of meeting ACT’s college readiness benchmarks in 11th grade. Adding more questions at upper achievement levels, as is done in the PARCC exam, makes PARCC scoring more reliable at the middle and upper ranges of student achievement.

Some Advantages of PARCC

While the ISATs were capable of measuring higher order thinking, “What they lacked was the ability to report back deep, rich information about how students are thinking and where they were getting stuck,” says Mr. Zavitkovsky. “They were not designed to return useful diagnostic information to support teaching and learning.”

The real promise of PARCC, he says, is “It is specifically designed to gather rich information about what students know and where they’re getting stuck.” Unlike the ISATs, PARCC has also committed to release many of the actual questions that students were tested with. This “carries huge potential for supporting classroom conversations at all grade levels, teacher-to-teacher conversations in grade/departmental teams, and parent-student conversations across the kitchen table.”

While PARCC has this potential, Mr. Zavitkovsky told the RoundTable, “The first round of PARCC reports was really disappointing. I say that because it didn’t offer educators and parents clear, concrete examples of what new scores and proficiency levels mean, much less how they can be used to improve teaching and learning. Without that kind of information, it’s only natural for people to wonder why PARCC should be done at all.”

The report concludes with a reminder that “the original promise of standards-based assessment was to provide educators and parents with meaningful, standards-based information about what students are learning and where they are getting stuck.” It says, “making actual items and student responses readily available to educators and parents in user-friendly formats will go a long way toward delivering on that promise.”

Taking Stock has a wealth of additional data and information. Parts I and II of Taking Stock are available by clicking on Taking Stock below. The final report is scheduled to be posted on March 31.

ISAT Cut Scores and College Ready Benchmarks

Taking Stock concludes that the cut scores to “meet standards” on the ISAT that were adopted in 2006 were set extremely low, one to two years below grade level compared with state and national norms – or at about the 20th percentile. While ISBE raised the cut scores in January 2013, the new scores were at the lower edge of grade level on statewide scoring distributions – or at about the 42nd percentile. Both the 2006 and the 2013 cut scores were far below the level of achievement required to be on track to meet ACT’s college readiness benchmarks in eleventh grade.

The report concludes that a student in third through eighth grade would have needed to score at about the 60th to the 65th percentile on the ISATs to be on track to meeting ACT’s benchmarks for college readiness in eleventh grade.

Significantly, the cut scores to be “proficient” on NAEP correspond to the 65th to 70th percentiles, and the cut scores to be “proficient” on the 2015 PARCC test correspond to the 60th and 68th percentiles in reading and math, respectively.