Please Note: Jacquie's Lessons are derived from Rhoades to Reading and other publications authored by Jacquie Rhoades. Additional research is available upon request.
A Statistical Analysis of the Effectiveness of the Rhoades to Reading Intervention Project
Sequoia Middle School
Dr. David V Chavez
California State University, San Bernardino
Suzanne L Reid
Loma Linda University
Claremont Graduate School
Presented to Jacquie Rhoades & Dr. Maria Palacio
Sequoia Middle School
April 16, 1999
The purpose of this analysis is to provide statistical support for the effectiveness of the reading program currently in place at Sequoia Middle School in the Fontana Unified School District, using reading scores collected in 1997 and 1998. It is hypothesized that students who participated in the reading intervention program would show greater growth between 1997 and 1998 than students who did not participate. To this end, 231 children who participated in the intervention and had both 1997 and 1998 reading scores were compared to 104 children who did not participate and had both 1997 and 1998 reading scores (the non-intervention group). Children were selected to participate in the intervention based on an initial assessment of reading ability, with the poorer readers entered into the intervention group.
The mean score for the children in the intervention group on the 1997 reading test was 4.197. The mean score for the children in the non-intervention group on the 1997 reading test was 4.483. A t-test revealed that this difference approached significance (t=1.97, df=333, p=.063). While not significant, the difference in the initial mean scores called for a second analysis, which is described later. A second t-test was run for the grade equivalent growth of each group. This t-test revealed that the mean gain of .977 years for the intervention group was significantly greater than the mean gain of .507 years for the non-intervention group (t=3.49, df=207, p=.001). Interestingly, a third t-test revealed that the 1998 grade equivalent mean reading scores of 4.894 for the intervention group and 4.989 for the non-intervention group were not significantly different from each other.
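The group comparisons above are independent-samples t-tests. A minimal sketch of the pooled-variance form is given below, assuming equal variances in the two groups. The report does not state which variant was used, although the reported df of 333 equals 231 + 104 - 2, which is consistent with this form. The function name and the sample gains are hypothetical and are not the study's data.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weight each sample variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical grade-equivalent gains (NOT the study's data).
intervention_gains = [1.2, 0.8, 1.0, 0.9, 1.1]
comparison_gains = [0.5, 0.6, 0.4, 0.7, 0.3]

t_stat, df = independent_t(intervention_gains, comparison_gains)
print(f"t = {t_stat:.2f}, df = {df}")
```

The p values quoted in the report would then be read from a t distribution with the resulting degrees of freedom, a step this sketch leaves out.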
Taken as a set, these analyses suggest that the two groups tested at relatively different levels in 1997, with the children placed into the intervention scoring below those who were not placed in the program. Test scores for 1998 revealed that those in the intervention program gained almost a full grade in their abilities, while those who did not participate gained about half a grade. This difference in growth is a meaningful one, as shown by the fact that both groups’ reading skills are comparable in 1998.
As mentioned earlier, the groups were not equal in their pre-test reading scores. This difference can be perceived as a limitation of the previous analysis. It can be argued that a group that begins at a lower level stands to gain more ground in the following year. While this argument runs counter to the reality seen in education, where low-achieving children fall further and further behind each year rather than naturalistically “catching up”, it is a legitimate research critique. Therefore, a second analysis of selected cases was conducted.
In this second analysis, we removed cases from the larger group (the intervention group) so that the remaining group would be comparable to the non-intervention group. To do this, the distributions of scores for both groups were first compared. The ranges were remarkably similar (1.5 to 7.9 for the intervention group and 1.5 to 9.9 for the non-intervention group). It was the frequency distribution that differed, with a greater number of low-scoring cases in the intervention group. This is consistent with the initial selection process for inclusion in the intervention group. Where the intervention group had a large number of cases at a score level (primarily in the lower range), cases were randomly selected for deletion using a random numbers table. A single case was also left at each score level in the intervention group when there was no representation in the non-intervention group, to maintain some resemblance to the original sample. In addition, a single case was deleted from the non-intervention group because its reading score of 9.9 had no comparable matched case in the intervention group. This left a sample of 111 in the intervention group, with scores ranging from 1.5 to 7.9, and a sample of 103 in the non-intervention group, with scores ranging from 1.5 to 8.1.
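The case selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the report does not say exactly how many cases were deleted at each score level, so the sketch assumes the intervention group is thinned to the non-intervention group's count at that level, with one case kept where the non-intervention group has none. A seeded pseudo-random generator stands in for the random numbers table, and the function name and sample cases are hypothetical.

```python
import random
from collections import Counter, defaultdict

def thin_to_match(intervention, comparison, seed=0):
    """Randomly drop intervention cases so per-score counts track the comparison group."""
    rng = random.Random(seed)  # stand-in for the random numbers table
    target = Counter(score for _, score in comparison)
    by_score = defaultdict(list)
    for case in intervention:
        by_score[case[1]].append(case)
    kept = []
    for score, cases in by_score.items():
        # Assumption: keep as many cases as the comparison group has at this
        # score level, and keep one case when the comparison group has none.
        quota = max(target.get(score, 0), 1)
        if len(cases) > quota:
            cases = rng.sample(cases, quota)
        kept.extend(cases)
    return kept

# Hypothetical (student_id, 1997 score) pairs, NOT the study's data.
intervention = [("a", 2.1), ("b", 2.1), ("c", 2.1), ("d", 3.0), ("e", 5.5)]
comparison = [("x", 2.1), ("y", 3.0), ("z", 3.0)]

matched = thin_to_match(intervention, comparison)
print(sorted(score for _, score in matched))
```

In this toy input, two of the three cases at 2.1 are dropped, the lone case at 3.0 is kept, and the case at 5.5 is kept even though the comparison group has no one at that level.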
Once case selection was done to create matched samples, the same series of t-tests was conducted. The mean score for the remaining 111 children in the intervention group on the 1997 reading test was 4.475. The mean score for the remaining 103 children in the non-intervention group on the 1997 reading test was 4.430. A t-test revealed that this difference did not approach significance. This suggests that the groups were now comparable and that the results from this second analysis would not have the same sample limitations as the previous analysis. A second t-test was run for the grade equivalent growth of each group. This t-test revealed that the mean gain of .934 years for the intervention group was significantly greater than the mean gain of .554 years for the non-intervention group (t=2.15, df=212, p=.033). Interestingly, a third t-test revealed that the 1998 grade equivalent mean score of the intervention group (5.409) now significantly exceeded that of the non-intervention group (4.984) (t=2.13, df=212, p=.034).
Taken as a set, these analyses suggest that the random selection of cases was successful in creating a comparable subset of intervention group scores for analysis. Test scores for 1998 revealed that those children in the intervention group gained almost a full grade in their abilities while those that were not in the intervention gained about half a grade. This difference in growth leads to the subset of comparable intervention children actually exceeding the reading abilities of the non-intervention children in 1998.
The analyses conducted suggest that the reading intervention was a successful one. Children in the intervention are making meaningful gains that are helping them reach and exceed the reading level of children who were not selected for it because of their higher initial reading scores. While this support is compelling, further research might randomly assign comparable cases to the two groups from the outset to determine whether the same findings persist.
Statistical Analysis of Rhoades to Reading
By Steven Brannon, MBA
Introduction
This presentation reports results from a number of examinations covering students’ verbal skills. On each examination, a student took a pretest and, at a later point in time, a post test covering the same information but using what is called a “parallel form” of the test. Using the same test for both would bias results because of what are called “memory effects”: simply stated, students would remember large amounts of information from the previous test. The fact that they would have time to discuss the pretest afterwards would further bias results. That error was not made in this study.
Presenting Numeric Information
Many find the science of statistics intimidating and horrifying. This is understandable, considering that terms like multicollinearity, coefficients of skewness and kurtosis, autocorrelation and others tend to glaze the eyes. This presentation will avoid intimidating terms. Data are presented in terms of each student’s first test score (called Pretest hereafter) and last test score (called Post Test hereafter). Note that each student has two scores, one from the pretest and one from the post test, and the two are always kept together. It would bias the study to pair a pretest score from Susie Smith with a post test score from Kevin Simms. Performance is judged on the basis of mean scores. “Mean” is another name for “Average.” To compute one, add up the scores and divide by the number of scores. One problem area is how to determine what is called “Statistical Significance.” This is an arbitrary construct, although on an intuitive basis it is easily understood. If a teacher’s salary is increased by ten cents per day, she would be earning more but would hardly be impressed.
For this presentation’s purposes, the word “Significant” is used interchangeably with “Huge,” “Tremendous,” and “Exceptional.” Real-world examples would include a student improving from 500 to 700 on the SAT, or a golfer whose average for 18 holes was usually 110 suddenly starting to shoot consistently in the high 60s. Space considerations do not permit providing visuals, so the presentation will focus on the mean and what is called the t statistic. The t test for paired scores (it is incorrect to simply say one used a t test) is used because the data are parametric (data with decimal places), and it is applied at what is called the .05 level of significance. In other words, researchers accept that, regardless of the size of the t value, there is a 5% chance of declaring a difference significant when it is really due to chance. This is the industry standard.
Pre vs Post Word Reading Gr. Equiv 2007-2008
t-Test: Paired Two Sample for Means
                        Pre      Post
Mean                    4.403    5.557
t Stat                  2.49
t Critical two-tail     2.059
The means of the two tests (pre vs post) are 4.403 (pre) and 5.557 (post). Although there seems to be some distance between the two means, the paired t test must be used to determine whether the distance is “Significant” or not. Although the computation is not a difficult one, the formulas used are sufficiently appalling that most people avoid them religiously. Those curious may refer to any statistics text under Paired (Related) t Test. Although it will only be done for this set of scores, showing an Excel printout will help.
Reading the table is easily done. The t stat is derived from computations based upon the scores. The t critical is the dividing line between what is declared significant and not significant. In other words, if the t critical is larger than the t stat, nothing significant happened or, in layman’s terms, there was not enough improvement to state that the treatment applied (a reading program, new text, etc.) worked any better than what had been done in the past. With all of that said, we can state that on this particular test students improved tremendously, as the computed t stat of 2.49 is greater than the critical value of 2.059. We end the process with a plain English statement that students in this study improved substantially. From our information, luck was not a factor.
Pre vs Post Sentence Comp Gr. Equiv 2007-2008
Students were administered this test under the same conditions as the others. We are dealing with paired scores, as two scores were obtained from the same student at different points in time using the parallel form tests defined earlier. Students’ performances improved significantly (tremendously), as the computed t statistic of 2.957 exceeded the critical value of 2.055.
Pre vs Post Spelling Gr. Equiv. 2007-2008
Comparing the t statistic of 5.668 with the t critical value of 1.996 indicates improvement that is not only significant but staggering, bringing to mind a weekend golfer who suddenly qualifies for the PGA Tour and becomes a consistent money winner.
Pre vs Post Word Reading Gr. Equiv 2006-2007
Given the computed t statistic of 8.31592 compared to the t critical value of 1.9949, we must conclude that students made tremendous gains well outside the realms of chance.
Pre vs Post Sentence Comp Gr. Equiv 2006-2007
Given the computed t statistic of 6.88763 compared to the t critical value of 1.994945, students made tremendous gains over this study, gains which cannot be attributed to chance.
Pre vs Post Spelling Gr. Equiv. 2006-2007
The computed t statistic of 8.315, compared to a t critical value of 1.99, indicates improvement on a level which can only be considered staggering.
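For those curious about the computation, the paired comparisons reported above can be sketched as follows. This is a minimal illustration of the standard paired t statistic (the mean of the per-student differences divided by its standard error) and of the rule of comparing the t stat with the t critical. The function names and the four-student example are hypothetical and are not the study's data. The (t stat, t critical) pairs are the values quoted in the text.

```python
import math
import statistics

def paired_t(pre, post):
    """Paired t statistic: mean of per-student differences over its standard error."""
    if len(pre) != len(post):
        raise ValueError("each student needs one pretest and one post test score")
    diffs = [after - before for before, after in zip(pre, post)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err

def is_significant(t_stat, t_critical):
    """Two-tailed decision rule: significant when |t stat| exceeds t critical."""
    return abs(t_stat) > t_critical

# Hypothetical paired scores for four students (NOT the study's data).
pre_scores = [3.0, 4.0, 5.0, 4.0]
post_scores = [4.0, 6.0, 6.0, 7.0]
example_t = paired_t(pre_scores, post_scores)
# 3.182 is the two-tailed .05 critical value for 3 degrees of freedom
# (four students minus one), taken from a standard t table.
print(round(example_t, 3), is_significant(example_t, 3.182))

# (t stat, t critical) pairs as quoted in the text above.
reported = {
    "Word Reading 2007-2008": (2.49, 2.059),
    "Sentence Comp 2007-2008": (2.957, 2.055),
    "Spelling 2007-2008": (5.668, 1.996),
    "Word Reading 2006-2007": (8.31592, 1.9949),
    "Sentence Comp 2006-2007": (6.88763, 1.994945),
    "Spelling 2006-2007": (8.315, 1.99),
}
for name, (t_stat, t_crit) in reported.items():
    print(name, is_significant(t_stat, t_crit))
```

Differences are taken within each student, never across students, which is why the function refuses score lists of unequal length.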