Please Note: Jacquie's Lessons are derived from Rhoades to Reading and other publications authored by Jacquie Rhoades. Additional research is available upon request.
A Statistical Analysis of the Effectiveness of the Rhoades to Reading Intervention Project
at
Sequoia Middle School
Dr. David V Chavez
California State University, San Bernardino
Suzanne L Reid
Loma Linda University
Muriel Lopez
Claremont Graduate School
Presented to Jacquie Rhoades & Dr. Maria Palacio
Sequoia Middle School
April 16, 1999
The purpose of this analysis is to statistically support the effectiveness of the reading program currently in place at Sequoia Middle School in the Fontana Unified School District, using reading scores collected in 1997 and 1998. It is hypothesized that students who participated in the reading intervention program would show greater growth between 1997 and 1998 than students who did not participate. Toward this end, two hundred and thirty-one children who participated in the intervention and had both 1997 and 1998 reading scores (intervention group) were compared to one hundred and four children who did not participate and had both 1997 and 1998 reading scores (nonintervention group). Children were selected for the intervention based on an initial assessment of reading ability, with the poorer readers entered into the intervention group. The mean score on the 1997 reading test was 4.197 for the intervention group and 4.483 for the nonintervention group. A t-test revealed that this difference approached significance (t=1.97, df=333, p=.063). While not significant, the difference in the initial mean scores strongly recommended a second analysis, which is described later. A second t-test was run on the grade-equivalent growth of each group. This t-test revealed that the mean gain of .977 years for the intervention group was significantly greater than the mean gain of .507 years for the nonintervention group (t=3.49, df=207, p=.001). Interestingly, a third t-test revealed that the 1998 grade-equivalent mean reading scores of 4.894 for the intervention group and 4.989 for the nonintervention group were not significantly different from each other.
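The group comparisons above are independent-samples t-tests. As a minimal sketch, assuming the classic pooled-variance form (the report does not say which variant was run), the statistic and its degrees of freedom could be computed as follows. The function name and the small score lists are hypothetical and are not the Sequoia data.

```python
import math
import statistics


def pooled_t_test(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weight each group's sample variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical grade-equivalent gains, for illustration only.
intervention_gains = [1.2, 0.8, 1.1, 0.9, 1.0]
comparison_gains = [0.4, 0.6, 0.5, 0.7, 0.3]
t_stat, df = pooled_t_test(intervention_gains, comparison_gains)
print(f"t = {t_stat:.2f}, df = {df}")
```

With the group sizes reported here (231 and 104), the same formula gives 231 + 104 - 2 = 333 degrees of freedom, which matches the df reported for the comparison of 1997 scores.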
Taken as a set, these analyses suggest that the two groups tested at relatively different levels in 1997, with the children placed into the intervention scoring below those who were not placed in the program. Test scores for 1998 revealed that children in the intervention program gained almost a full grade in their abilities, while those who did not participate gained about half a grade. This difference in growth is a meaningful one, as shown by the fact that both groups’ reading skills were comparable in 1998.
As mentioned earlier, the groups were not equal in their pretest reading scores. This difference can be perceived as a limitation of the previous analysis. It can be argued that a group that begins at a lower level stands to gain more ground in the following year. While this argument belies the reality seen in education that low-achieving children fall further and further behind each year rather than naturalistically “catching up”, it is a legitimate research critique. Therefore, a second analysis of selected cases was conducted.
In this second analysis, we removed cases from the larger group (the intervention group) so that the remaining cases would be comparable to the nonintervention group. The range of scores for the two groups was already remarkably similar (1.5 to 7.9 for the intervention group and 1.5 to 9.9 for the nonintervention group). It was the frequency distribution that differed, with a greater number of low-scoring cases in the intervention group. This is consistent with the initial selection process for inclusion in the intervention group. Where the intervention group had a large number of cases at a given score level (primarily in the lower range), cases were selected for deletion at random using a random numbers table. Where a score level in the intervention group had no representation in the nonintervention group, a single case was left at that level to maintain some degree of semblance to the original sample. In addition, a single case was deleted from the nonintervention group because its reading score of 9.9 had no comparable matched case in the intervention group. This left a sample of 111 in the intervention group with scores ranging from 1.5 to 7.9 and a sample of 103 in the nonintervention group with scores ranging from 1.5 to 8.1.
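The selection procedure can be sketched in code. This is only an illustration under stated assumptions: the report does not give the exact number of cases kept at each score level, so the sketch assumes the intervention group is thinned to at most the nonintervention count at that level, with one case kept where the nonintervention group has none. It uses a seeded pseudorandom generator in place of a random numbers table, and it does not model the removal of the single 9.9 case from the nonintervention group. The function name and the records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def thin_to_match(intervention, comparison, seed=0):
    """Randomly drop intervention cases so that no score level holds more
    cases than the comparison group (assumed rule, not stated in the report).
    Levels absent from the comparison group keep a single case."""
    rng = random.Random(seed)
    target = Counter(score for _, score in comparison)
    by_level = defaultdict(list)
    for case in intervention:
        by_level[case[1]].append(case)
    kept = []
    for level in sorted(by_level):
        cases = by_level[level]
        keep_n = min(len(cases), max(target[level], 1))
        kept.extend(rng.sample(cases, keep_n))
    return kept


# Hypothetical (student_id, 1997 score) records, for illustration only.
intervention = [("i1", 1.5), ("i2", 1.5), ("i3", 1.5), ("i4", 2.0), ("i5", 3.3)]
comparison = [("c1", 1.5), ("c2", 2.0), ("c3", 2.0)]
matched = thin_to_match(intervention, comparison)
print(Counter(score for _, score in matched))
```

In this toy input, two of the three cases at 1.5 are dropped at random, while the lone cases at 2.0 and 3.3 are kept.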
Once case selection was done to create matched samples, the same series of t-tests was conducted. The mean score on the 1997 reading test was 4.475 for the remaining 111 children in the intervention group and 4.430 for the remaining 103 children in the nonintervention group. A t-test revealed that this difference did not approach significance. This suggests that the groups were now comparable and that the results from this second analysis would not have the same sample limitations as the previous analysis. A second t-test was run on the grade-equivalent growth of each group. This t-test revealed that the mean gain of .934 years for the intervention group was significantly greater than the mean gain of .554 years for the nonintervention group (t=2.15, df=212, p=.033). Interestingly, a third t-test revealed that the 1998 grade-equivalent mean score of the intervention group (5.409) now significantly exceeded that of the nonintervention group (4.984) (t=2.13, df=212, p=.034).
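The p-values in this analysis can be sanity-checked from t and df alone. The following is a minimal sketch, assuming two-tailed tests and df = n1 + n2 - 2 (the report states neither explicitly). The helper names are hypothetical, and the integration is a plain Simpson's rule over the t density rather than a statistics-library routine.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_area


df = 111 + 103 - 2  # matched group sizes from the text
p_gain = two_tailed_p(2.15, df)
p_1998 = two_tailed_p(2.13, df)
print(df, p_gain, p_1998)
```

Under these assumptions, df comes out to 212 and both p-values land close to the reported .033 and .034.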
Taken as a set, these analyses suggest that the random selection of cases was successful in creating a comparable subset of intervention group scores for analysis. Test scores for 1998 revealed that those children in the intervention group gained almost a full grade in their abilities while those that were not in the intervention gained about half a grade. This difference in growth leads to the subset of comparable intervention children actually exceeding the reading abilities of the nonintervention children in 1998.
The analyses conducted suggest that the reading intervention was a successful one. Children are making meaningful gains that are helping them reach and exceed the reading level of children who were placed outside the intervention because of their higher initial reading scores. While this support is compelling, further research might randomly assign comparable cases to the two groups from the outset to determine whether the same findings persist.
Introduction
This presentation reports results from a number of examinations covering students’ verbal skills. On each examination, a student took a pretest and, at a later point in time, a post test covering the same information but using what is called a “parallel form” of the test. Using the same test for pre and post testing would bias results because of what are called “Memory Effects”: simply stated, students would remember large amounts of information from the previous test. The fact that they would have time to discuss the pretest afterwards would further bias results. That error was not made in this study.
Presenting Numeric Information
Many find the science of statistics intimidating and horrifying. This is understandable, considering that terms like multicollinearity, coefficients of skewness and kurtosis, autocorrelation and others tend to glaze the eyes. This presentation avoids intimidating terms, and data are presented in terms of the students’ first test scores (called Pretest hereafter) and last test scores (called Post Test hereafter). Note that each student has two scores, one from the pretest and one from the post test. It would bias the study to pair a pretest score from Susie Smith with a post test score from Kevin Simms. Performance is judged on the basis of mean scores. “Mean” is another name for “Average.” To compute one, add up the scores and divide by the number of scores. One problem area is how to determine what is called “Statistical Significance.” This is an arbitrary construct, although on an intuitive basis it is easily understood. If a teacher’s salary is increased by ten cents per day, she would be earning more but would hardly be impressed.
For this presentation’s purposes, the word “Significant” is used interchangeably with “Huge,” “Tremendous,” and “Exceptional.” Real-world examples would include a student improving from 500 to 700 on the SAT, or a golfer whose average for 18 holes was usually 110 suddenly starting to shoot consistently in the high 60s. Space considerations do not permit providing visuals, so the presentation focuses on the mean and what is called the t statistic. The t test for paired scores (it is incorrect to simply say one used a t test) is used because the data are parametric (numeric scores measured on a continuous scale), and it is applied at what is called the .05 level of significance. In other words, regardless of the size of the t value, researchers accept a 5% chance of declaring an improvement significant when it is really due to chance. This is the industry standard.
Pre vs Post Word Reading Gr. Equiv 2007-2008
The means of the two tests are 4.403 (pre) and 5.557 (post). Although there seems to be some distance between the two means, the paired t test must be used to determine whether the distance is “Significant” or not. Although the computation is not a difficult one, the formulas used are sufficiently appalling that most people avoid them religiously. Those curious may refer to any statistics text under Paired (Related) t Test. Although it will only be done for this set of scores, showing an Excel printout will help.
Reading the table is easily done. The t stat was derived from computations based upon the scores. The t critical is the dividing line between what is declared significant and not significant. In other words, if the t critical is larger than the t stat, nothing significant happened or, in layman’s terms, there was not enough improvement noted to state that the treatment applied (a reading program, new text, etc.) worked any better than what had been done in the past. With all of that said, we can state that on this particular test students improved tremendously, as the computed t stat of 2.49 is greater than the critical value of 2.059. We end the process with a plain English statement that students in this study improved substantially. From our information, luck was not a factor.
Pre vs Post Sentence Comp Gr. Equiv 2007-2008
Students were administered this test under the same conditions as the others. We are dealing with paired scores, as two scores were obtained from the same student at different points in time using the parallel form tests defined earlier. Students’ performances improved significantly (tremendously), as the computed t statistic of 2.957 exceeded the critical value of 2.055.
Pre vs Post Spelling Gr. Equiv. 2007-2008
Comparing the t statistic of 5.668 with the t critical value of 1.996 indicates improvement that is not only significant but staggering, bringing to mind a weekend golfer who suddenly qualifies for the PGA Tour and becomes a consistent money winner.
Pre vs Post Word Reading Gr. Equiv 2006-2007
Given the computed t statistic of 8.31592 compared to the t critical value of 1.9949, we must conclude that students made tremendous gains well outside the realms of chance.
Pre vs Post Sentence Comp Gr. Equiv 2006-2007
Given the computed t statistic of 6.88763 compared to the t critical value of 1.994945, students made tremendous gains over this study, gains which cannot be attributed to chance.
Pre vs Post Spelling Gr. Equiv. 2006-2007
The computed t statistic of 8.315 compared to a t critical value of 1.99 indicates improvement on a level which can only be considered staggering.
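For readers who want to see the paired t test and the “t stat versus t critical” decision rule in action, here is a minimal sketch. The scores are hypothetical and are not taken from the study, the function name is an assumption, and the critical value is the standard two-tailed .05 table entry for 3 degrees of freedom.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a paired t test on same-student score pairs."""
    if len(pre) != len(post):
        raise ValueError("each student needs both a pre and a post score")
    # Work on each student's own gain, never on scores mixed across students.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


# Hypothetical grade-equivalent scores; position i is the same student.
pre = [3.0, 4.0, 5.0, 4.0]
post = [4.0, 5.5, 5.5, 6.0]
t_stat, df = paired_t(pre, post)
t_critical = 3.182  # two-tailed .05 critical value for df = 3, from a t table
print(f"t = {t_stat:.3f}, df = {df}, significant = {abs(t_stat) > t_critical}")
```

Because the t stat in this toy example is larger than the t critical, the improvement would be declared significant, the same comparison made for each test above.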
 