As a teacher, I have been asked to make predictions as to how my pupils will do in GCSE and A-level exams more times than I can remember. At my previous school, we did this three times a year for A-level students (which made up 80% of my teaching).
I questioned the value of these predictions, especially after reading about the illusion of expertise in Thinking, Fast and Slow. The example given was of stockbrokers who consistently thought that they could out-perform algorithms in making good predictions. The data did not support them.
I had a database of several hundred A-level students from my school, so I decided to calculate how accurate our predictions were and compare them with my super-hi-tech algorithm for predicting A2 performance: AS grade + 8 UMS points.
I then calculated the mean squared error in all of these predictions and you can see these numbers in the top right of the spreadsheet.
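For anyone who wants to repeat this kind of comparison on their own data, here is a minimal sketch of the mean squared error calculation in Python. The numeric grade coding (A=5 down to E=1) and the five students are made up purely for illustration; they are not the real data, and the spreadsheet may have coded grades differently.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predicted and actual values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: grades coded as numbers (A=5, B=4, C=3, D=2, E=1).
actual_a2 = [5, 4, 4, 3, 2]
teacher_prediction = [5, 5, 3, 3, 3]    # illustrative teacher predictions
algorithm_prediction = [5, 4, 3, 3, 2]  # illustrative rule-based predictions

teacher_error = mean_squared_error(teacher_prediction, actual_a2)
algorithm_error = mean_squared_error(algorithm_prediction, actual_a2)

print(f"teacher MSE:   {teacher_error:.2f}")    # 0.60
print(f"algorithm MSE: {algorithm_error:.2f}")  # 0.20
```

Because the differences are squared, a prediction that is two grades out counts four times as heavily as one that is a single grade out, so a few big misses can dominate the figure.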
My super-hi-tech algorithm produced an error of 0.42. (Note that I could have added anywhere between 6 and 11 UMS points without changing this much.)
In January, the team of expert teachers (I’m not joking here: my colleagues were very experienced and effective teachers) produced an error of 0.64. In March they’d reduced this to 0.45, but it wasn’t until April, about a month before the exams, that the experts finally beat the algorithm, with an error of 0.35.
This suggests that there was absolutely no point in making the earlier predictions. To be honest, I’m not sure what use the April predictions were either but at least they were slightly more accurate than the simplest model I could think of. Moreover, I think it shows how bad teachers are at judging students and why we shouldn’t use teacher assessment in reports, or school data generally. This point is also made well in Daisy Christodoulou’s blog: Tests are inhuman, and that is what’s so good about them.