What is Effect Size?

Effect size is a simple measure for quantifying, on a common scale, the difference between two groups or the change in the same group over time. In an educational setting, effect size is one way to measure the effectiveness of an intervention. It takes into account both the improvement (gain) in achievement for a group of learners and the variation in student performance, expressed on a standardised scale. Because it considers both improvement and variation, it indicates which interventions are worth pursuing.



How can we use effect size?

There are many ways to use effect sizes, including investigating the effectiveness of an intervention for a defined group of students, comparing the effectiveness of different interventions, and evaluating growth over time.


Example

A curriculum leader is using effect size to understand and estimate the impact of an approach to reading comprehension, by comparing achievement scores on PAT R Comprehension (or an equivalent assessment) for the same students over a year.


In reviewing the school’s PAT R results for the same students from Year 5, Term 3, 2010 to Year 6, Term 3, 2011, the leader finds an overall effect size of 0.49, but effect sizes of 0.86, 0.42 and 0.18 for the three individual classes.


Because an effect size of 0.4 is typically interpreted as average progress, the overall result of 0.49 indicates that more than the expected average progress is being made. The spread across classes raises questions such as: “How well is what I am doing working for different groups of students each year, and why?”, “What possible reasons could there be for some students or groups progressing more or less?” and “How does student progress compare with their achievement levels?”
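As a rough sketch, the figures in this example can be set against the 0.4 average effect size benchmark. The class labels below are hypothetical; only the effect size values come from the example above, and treating 0.4 as the dividing line is the same "average effect size interpretation" the text relies on.

```python
# Benchmark assumed here: an effect size of 0.4 is treated as average progress.
AVERAGE_EFFECT_SIZE = 0.4


def describe_progress(effect_size: float, benchmark: float = AVERAGE_EFFECT_SIZE) -> str:
    """Label an effect size relative to the average-progress benchmark."""
    if effect_size > benchmark:
        return "above average"
    if effect_size < benchmark:
        return "below average"
    return "average"


# School-level and class-level effect sizes from the PAT R example.
# "Class A/B/C" are hypothetical labels added for illustration.
results = {
    "School": 0.49,
    "Class A": 0.86,
    "Class B": 0.42,
    "Class C": 0.18,
}

for group, es in results.items():
    print(f"{group}: {es:.2f} ({describe_progress(es)})")
```

Class B sits only just above the benchmark and Class C well below it, which is why the school-level figure on its own can hide the variation that the questions above are probing.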


How is Effect Size calculated?

Effect size is calculated by taking the difference between two mean scores and dividing it by the average spread of student scores (i.e. the average standard deviation*). For the result to be valid, the scores should be approximately normally distributed (a ‘normal’ bell curve shape). See the formula below:


Effect Size (ES) = (Average of the post-test scores – Average of the pre-test scores) / Average Standard Deviation*

*The average standard deviation in the formula above is obtained by calculating the standard deviations of the pre-test and post-test scores separately and then averaging them.
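The calculation described above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: it assumes the sample standard deviation (Python's `statistics.stdev`), since the text does not say whether the sample or population form should be used, and the scores in the example are hypothetical.

```python
from statistics import mean, stdev


def effect_size(pre_scores: list[float], post_scores: list[float]) -> float:
    """Effect size = (mean post-test - mean pre-test) / average standard deviation.

    The average standard deviation is the pre-test SD and the post-test SD,
    calculated separately and then averaged. Using the sample SD
    (statistics.stdev) is an assumption; the text does not specify.
    """
    if len(pre_scores) < 2 or len(post_scores) < 2:
        raise ValueError("need at least two scores per test to compute a standard deviation")
    average_sd = (stdev(pre_scores) + stdev(post_scores)) / 2
    if average_sd == 0:
        raise ValueError("scores have no spread; effect size is undefined")
    return (mean(post_scores) - mean(pre_scores)) / average_sd


# Hypothetical scores for five students, pre-test and post-test.
pre = [40, 45, 50, 55, 60]
post = [45, 50, 55, 60, 65]
print(round(effect_size(pre, post), 2))  # prints 0.63
```

Here the mean rises by 5 points and both standard deviations are about 7.9, so the gain is a little under two thirds of a standard deviation.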


How can Effect Size be used reliably?

Multiple measures are still required: Effect size is only a single measure of progress, and educators are encouraged to use multiple measures to reliably understand and replicate evidence of what works. It is difficult to conclude from a single measure that an intervention is effective or ineffective.


Caution for all small sample sizes and at the individual student level: Effect sizes for cohorts of fewer than thirty students are often not suitable for reliably estimating the impact of an intervention. Hattie suggests that care should be taken in interpreting any findings from small samples, because outliers in student scores can skew the effect sizes and may require special consideration. Effect sizes derived from small samples, and individual student effect sizes, should be used only as an indication, prompting teachers to ask questions such as: “What possible reasons could there be for why that group of students recorded these estimated effect sizes?” and “What will we do for students who are achieving at expected achievement levels but not showing the expected growth effect size?” Effect sizes for individual students should be interpreted with caution, because larger errors in effect size are expected at this level (refer to Appendix 1). Therefore, individual-level effect sizes must always be used alongside other reliable information and teacher professional judgement.
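The text does not give a formula for individual student effect sizes. One common approach, assumed in this sketch, divides each student's own gain by the group's average standard deviation (the same denominator as the group formula). The function name and dictionary layout are hypothetical; the cohort threshold of thirty comes from the caution above and simply triggers a warning.

```python
import warnings
from statistics import stdev

# Threshold taken from the caution above: cohorts smaller than thirty are flagged.
SMALL_COHORT = 30


def individual_effect_sizes(pre: dict[str, float], post: dict[str, float]) -> dict[str, float]:
    """Indicative per-student effect sizes, keyed by a student identifier.

    Assumption (not stated in the text): each student's gain is divided by the
    mean of the pre-test and post-test standard deviations for the group.
    """
    if pre.keys() != post.keys():
        raise ValueError("pre-test and post-test must cover exactly the same students")
    if len(pre) < 2:
        raise ValueError("need at least two students to compute a standard deviation")
    if len(pre) < SMALL_COHORT:
        warnings.warn(
            f"only {len(pre)} students: treat these effect sizes as indicative only",
            stacklevel=2,
        )
    average_sd = (stdev(list(pre.values())) + stdev(list(post.values()))) / 2
    if average_sd == 0:
        raise ValueError("scores have no spread; effect size is undefined")
    return {student: (post[student] - pre[student]) / average_sd for student in pre}
```

Any single value returned here is best read as a prompt for the questions above, not as a verdict on a student, since a small cohort and individual-level error can both move it considerably.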


Accuracy is enhanced when comparing the exact same group of students: When comparing pre-test and post-test scores, it is most useful to ensure that all students are tested and that scores from the same group of students are compared. This improves both the accuracy and the interpretation of the results.
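Before an effect size is calculated, the results can be restricted to students who sat both tests. This sketch assumes scores are stored in dictionaries keyed by a student identifier; the identifiers, scores and function name are hypothetical.

```python
def matched_scores(
    pre: dict[str, float], post: dict[str, float]
) -> tuple[list[float], list[float]]:
    """Keep only students who sat both tests, returning scores in matching order."""
    both = sorted(pre.keys() & post.keys())      # students present in both tests
    excluded = sorted(pre.keys() ^ post.keys())  # students present in only one test
    if excluded:
        print("Excluded (sat only one test):", ", ".join(excluded))
    return [pre[s] for s in both], [post[s] for s in both]


# Hypothetical data: S02 missed the post-test and S04 missed the pre-test.
pre_test = {"S01": 42, "S02": 55, "S03": 61}
post_test = {"S01": 50, "S03": 66, "S04": 58}
print(matched_scores(pre_test, post_test))
# Excluded (sat only one test): S02, S04
# ([42, 61], [50, 66])
```

The two returned lists describe the same students in the same order, so they can then be used in the effect size formula without mixing in students who sat only one of the tests.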


NAPLAN effect sizes cannot be compared equally: NAPLAN effect sizes calculated for the Year 3-5 cohort should not be compared with Year 5-7 and Year 7-9 cohort effect sizes using the 0.4 average effect size interpretation, because effect sizes are larger for Year 3-5 than for Year 5-7 and Year 7-9. In addition, students in lower proficiency bands tend to show greater gains than students in higher proficiency bands, and care is needed for students who attain maximum or near-maximum scores, as this ceiling effect makes it difficult for them to show growth. It is recommended that NAPLAN effect sizes be compared only over time for equivalent groups in the same school (e.g. Year 3-5), across statistically similar (like) schools, or with the corresponding state-level effect size.