Friday, 2 January 2015

The Standards Puzzle

"Have standards improved over time?" is one of the most persistent questions in education policy. And understandably so - under the last Government, spending on education doubled, so it's reasonable to want to know if that's made any difference to the "output" of the education system.

Unfortunately our main measuring tools in education - national exams - are useless for answering the question: first, because they've changed so often; and second, because they are used for school accountability, so schools have got better at teaching to the test (this isn't a criticism - it's an entirely rational response to a clear incentive).

Professor Rob Coe at the University of Durham has made the best attempt at using alternative "low stakes" test data to look at standards over time. In his inaugural lecture he set out his analysis of international tests like PISA as well as Durham's own tests, used by many schools. His conclusion: "The best I think we can say is that overall there probably has not been much change."

In the absence of any better evidence I'd have to agree with Professor Coe that this is what the data shows. And yet it feels counter-intuitive. I've worked in education for ten years and it certainly feels to me that schools have improved over that time. Likewise most of the more experienced teachers and headteachers I've discussed this issue with think things have got significantly better too.

Of course this could simply be cognitive biases at work. We all desperately want things to improve so we convince ourselves they have. But a recently published DfE report suggests another possible explanation.

The snappily titled "Longitudinal study of young people in England: cohort 2" (LSYPE2) will track 13,000 young people from year 9 to the age of 20. The first cohort were tracked from 2004-2010. Comparing the two cohorts will show how things have changed over the past ten years. This report looks at the first year's data from the new cohort and compares it with the first year's data from 2004.

And the trends are very clear. Ironically, given the recent obsession with "character building" amongst policymakers, there have been big improvements in a range of "non-cognitive" measures. Reported bullying has fallen; the percentage who claim to have tried alcohol has plummeted; aspiration has increased - higher percentages say they are likely to go to university; and relationships with parents seem to be a lot stronger too.

This is entirely consistent with other data showing massive falls in risky behaviours by young people over the last decade - including a huge fall in criminal behaviour - as well as big increases in participation in education post-16 and in higher education.

All of this would suggest I, and others, are not imagining it when we claim that schools are - on average - nicer places to be than ten years ago. And that pupils are making more progress, at least in the sense that more are going on to further and higher education.

But here's the puzzle: given the improvements in behaviour; the reduction in criminality; the falls in truancy; the increase in aspiration; the improvements in home lives - all of which are known to link to academic attainment - why haven't we seen a commensurate, observable, rise in academic standards? Either academic standards have actually improved, but we just don't have the measurements available to identify it properly, or something is happening in schools that's preventing us capitalising on these "non-cognitive" improvements to genuinely improve standards. So what's going on?

All thoughts welcome!

Tuesday, 15 July 2014

Education after Gove

I wasn't expecting to be writing this post today. There has been a rumour going round for months that Michael Gove would be moved to an election role - though as Party Chair rather than Chief Whip. But in recent weeks all the noises were that he would be staying in post until the election.

Tim Montgomerie said earlier on Twitter that: "I understand Osborne opposed Gove move but dire opinion polling presented by Lynton Crosby of MG's standing with teachers forced change." Another possibility is that he's simply got fed up with being blocked on any further policy by the Lib Dems. If holding the fort until the election is all that's left then he's happy for someone else to do it.

Whatever the reason what does his departure mean for education? Here are a few initial thoughts:

1) There are unlikely to be any major policy reversals. No. 10 have very deliberately ensured the new Secretary of State Nicky Morgan is surrounded by Govites - Nick Boles, Nick Gibb and John Nash. Moreover Gove himself will still be in No. 10 and will be in the PM's daily meeting - he is still in a position to prevent anything he thinks would significantly undermine his legacy. What's more the election is only 10 months away - it's not a time for big U-turns.

2) I don't know Nicky Morgan and she doesn't have a track record in education but I'm sure, like all ministers, she will want some specific policies that are identified as hers. Briefing around her appointment suggests her thing will be early years and the Conservatives certainly see this as a key election battleground. Labour already have some big and expensive policies in this space.

3) Some officials will see the appointment of a new Secretary as an opportunity to tweak various policies they are worried about. The slew of upcoming exam changes is an obvious area where they may try to use her appointment to lengthen the timelines of reform. There is also an on-going review of initial teacher training (ITT) which may not now go as far as might have been expected.

4) There won't be much time for the new Secretary to learn her brief. Her first crisis might come as soon as next month. Ofqual recently wrote to schools saying they expected greater than normal turbulence in exam results this summer as a result of earlier reforms (including linearity and the end of vocational equivalences). Last time there was greater than normal turbulence - in English results in 2012 - there was a firestorm of complaints from schools that ended in a judicial review. Even if results week passes without incident, in September we have the launch of a new curriculum; new rules on assessment; the introduction of free school meals in key stage one; compulsory English and Maths post-16 for those without GCSE "C" grades; and about a dozen other things. While the Govian reform phase may be over, the implementation phase is at a critical moment.

5) Gove's enemies may be celebrating prematurely. Though policy is unlikely to change much it will be significantly harder to demonise Nicky Morgan than it has been to attack Gove. He was something of a unifying factor for the teacher unions - the last NUT strike was effectively an anti-Gove demonstration. They may find their campaigns lose some momentum now.

Now is not the time for a proper retrospective of Michael Gove's time at the DfE. But - as Andrew Old says - perhaps his greatest achievement has been to normalise comprehensive education for the Conservative party; to shift the argument from "saving" a few bright poor kids through grammar schools or assisted places to creating a genuinely world class system for all. In time I suspect that will be more widely recognised than it is now.

Saturday, 28 June 2014

The London Schools Effect - what have we learned this week?

Perhaps the biggest question in education policy over the past few years is why the outcomes for London schools have been improving so much faster than in the rest of the country. I wrote about it here last year. Until now there's been little in the way of research into the question but last week two reports came out - one by the IFS and one from CFBT - that seek to provide some answers.

They both agree that the change in GCSE results has been spectacular. There's plenty of data in both reports on this but I found this graph from the IFS particularly powerful because it relates to a metric that isn't something schools are held accountable to - and so feels like authentic proof that something extraordinary has happened in London.

But what, exactly, has happened? Here the two reports seem to disagree. According to the IFS - whose analysis is purely quantitative - the main reasons are:
  • Changes in pupil and school characteristics - in particular London and other inner-city areas have seen an increase in pupils from a range of ethnic backgrounds (partly) as a result of immigration. The IFS analysis suggests this accounts for about half the improvement in London between 2002-2012.
  • Changes in "prior attainment" - the authors argue that once higher levels of attainment in key stage 2 (end of primary) tests are taken into account then the "London effect" in secondaries looks less impressive. Indeed once prior attainment and changes in pupil/school characteristics have been controlled for, the gap between London and the rest of the country falls from 21 percentage points in the 5 A*-C GCSEs with English and Maths measure to just 5 percentage points. Moreover this gap is fairly stable between 2002-2012 - though it does increase by about 2 percentage points over the period.
  • There was a big increase in key stage 2 scores for disadvantaged pupils between 1999-2003 and that led to big increases in GCSE scores for these pupils between 2004-08 - but the GCSE improvement was largely the result of this improved prior attainment. The authors hypothesise this may be due to the introduction of "national strategies" in primary literacy and numeracy in the late 90s - these were piloted in inner London authorities (as well as some other urban areas e.g. Liverpool).
  • London secondaries do have a better record at getting disadvantaged pupils to stay in education post-16. After controlling for pupil/school characteristics they are around 10 percentage points more likely to stay in education.
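
The IFS's "controlling for" step is easy to see in miniature. The sketch below (Python, with entirely made-up numbers - not the National Pupil Database the IFS actually used, and a single covariate rather than their full set of controls) simulates pupils whose GCSE scores depend mostly on prior attainment plus a small genuine "London effect". The raw gap between London and the rest looks large, but regressing out prior attainment shrinks it sharply, which is exactly the shape of the IFS finding:

```python
import random

random.seed(0)

# Illustrative only: synthetic pupils, not real data.
# London pupils get higher prior (KS2) attainment; GCSE scores depend
# mostly on that prior attainment plus a small genuine "London effect".
def make_pupils(n, london):
    pupils = []
    for _ in range(n):
        prior = random.gauss(60 if london else 50, 10)
        gcse = 0.8 * prior + (5 if london else 0) + random.gauss(0, 5)
        pupils.append((prior, gcse, london))
    return pupils

pupils = make_pupils(5000, True) + make_pupils(5000, False)

def mean(xs):
    return sum(xs) / len(xs)

# Raw gap: simple difference in average GCSE score between the groups.
raw_gap = (mean([g for _, g, l in pupils if l])
           - mean([g for _, g, l in pupils if not l]))

# Adjusted gap: regress GCSE on prior attainment (one-variable OLS) and
# compare average residuals - the gap left after "controlling for" priors.
priors = [p for p, _, _ in pupils]
gcses = [g for _, g, _ in pupils]
mp, mg = mean(priors), mean(gcses)
beta = (sum((p - mp) * (g - mg) for p, g in zip(priors, gcses))
        / sum((p - mp) ** 2 for p in priors))
residuals = [(g - mg - beta * (p - mp), l) for p, g, l in pupils]
adj_gap = (mean([r for r, l in residuals if l])
           - mean([r for r, l in residuals if not l]))

print(f"raw gap: {raw_gap:.1f} points, adjusted gap: {adj_gap:.1f} points")
```

Note the adjusted gap doesn't go to zero: the residual difference is the part of the London advantage the covariate can't explain - the analogue of the IFS's remaining 5 percentage points.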

The CFBT report does include quantitative analysis but is much more focused on qualitative research - specifically interviews with headteachers, academics, civil servants and other experts. This report argues the key reasons for London's improvement are:
  • Four key "improvement interventions" between 2002 and 2014 - the "London Challenge" (a Labour initiative that used data to focus attention on weaker schools and used better schools to support their improvement); Teach First; the introduction of sponsored academies; and improvements driven by local authorities.
  • They conclude that: "each of these interventions played a significant role in driving improvement. Evaluations of each of these interventions have overall been positive, although the absence of RCT evidence makes it impossible to identify the precise gains from each set of activities. The exact causal mix also varied from borough to borough because there were variations in the level of involvement in London Challenge, variations in the effectiveness of local authority activity, variations in the level of ‘academisation’ and variations in the level of input from Teach First."
  • The authors argue that there were cross-cutting themes covering these interventions and the wider improvement story. In particular - the better use of data; practitioner-led professional development and, particularly, leadership - both politically and at school level.

At first glance it's hard to reconcile the positions taken in the two reports. The IFS focus on primary, and to a lesser extent pupil characteristics, while CFBT focus on secondary policy changes. I think, though, they are two different bits of an extremely complicated jigsaw that hasn't been finished yet - and, because of the lack of evidence/data, never will be. Like the apocryphal blind men with the elephant they're looking at different parts of the whole.

1) Both reports probably underestimate the importance of changes in pupil characteristics. CFBT completely dismiss this as a driver based on an inadequate analysis of ethnicity data. The IFS analysis is more comprehensive and so does pick up a significant effect but may still miss the true extent because of the limitations of available data on ethnicity. I think this may explain the extent of the "primary effect" in the IFS report. Essentially they're saying the big improvements in GCSE results are partially illusory because they were already built into those pupils' primary attainment. However, they are unable (because of a lack of data) to analyse whether those primary results were also partly illusory because those pupils started primary at a higher level.

There is a clue that this may be a factor in their analysis of Key Stage 1 data for more recent years. Controlling for prior attainment at KS1 reduces the "London effect" at Key Stage 2 by about half. But the authors are unable to do this analysis for the crucial 1999-2003 period when results really improved. They are also unable to look from the beginning of primary - because we don't have baseline assessments when pupils start school.

2) The IFS report probably underestimates the secondary effect. As Chris Cook has shown the London secondary effect at least doubles if you exclude equivalents.

3) The CFBT report definitely underestimates the primary effect because it doesn't look for it. Though there are some quotes from people who worked in local authorities during the crucial period who highlight their focus on literacy and numeracy during the late 90s.

So pupil characteristics; primary schools and secondary schools all seem to have played a role in boosting attainment in London. The CFBT report is convincing on some of the factors at play in secondaries; the IFS report is convincing that primaries also played some kind of a role. The big questions for me after digesting both reports:

  • Are there "London specific" pupil characteristics that wouldn't be apparent from the available data? E.g. are immigrants who go to London different to those who don't? Are some of the ethnicity effects stronger than identified because key groups (e.g. Polish) are hidden in larger categories?
  • Are there policy reasons why London primaries improved faster than those elsewhere in the crucial 1999-2003 period? I struggle to buy the idea that the national strategies were the key driver here as they were rolled out nationally (albeit that the pilots were focused on inner London). But the quotes in the CFBT report suggest there might be something here around a general focus on literacy/numeracy. This is a key area for further research.
  • To what extent were the policy interventions (London Challenge, academies etc...) the main reasons for secondary improvement? Or was it more to do with the number of good school leaders during that period? One of the most interesting tables in the CFBT report - pasted below - shows that inner London is the only part of the country where headteacher recruitment has got easier in the last ten years. And the importance of leadership shines through in the interviews conducted for the CFBT report. Is it possible to more closely identify the relationship between individual leaders and school improvement? What can we learn from these leaders?

And of course the really big question - is any of this replicable in other areas? We're starting to see a raft of local improvement initiatives across the country - Wales Challenge; Somerset Challenge; North East Challenge and so on. It's really important that in these areas we do a better job of evaluating all the interventions put in place from the start, so that if we see big improvements we have a better understanding of the causes.

Further reading:

The IFS report

The CFBT report

Chris Cook's analysis

Loic Menzies - one of the CFBT authors - on the two reports

The London Challenge evaluation by Merryn Hutchings and others

Transforming Education For All: The Tower Hamlets Story by Chris Husbands et al

Wednesday, 14 May 2014

In defence of baseline assessments

Earlier this year the DfE announced new proposals for holding primary schools accountable. These include a "baseline assessment" for pupils in reception. Primary schools that opt in to using this assessment will then be measured on the progress pupils make over the course of their time in school rather than on the raw results of Key Stage tests.

It's fair to say the idea hasn't been universally welcomed. While the NAHT have made some positive noises the NUT have voted to investigate boycotting these assessments. And I suspect their position is held by the majority of early years teachers.

I don't think the DfE proposal as it stands is perfect - for one thing the suggestion that schools could pick from a range of assessments seems unhelpfully complex. But, given we have high-stakes accountability for primaries, and that this isn't going to change any time soon, the principle seems sensible to me.

However, opponents of the tests have raised some reasonable concerns, particularly that the assessments could be used to "label" children from a young age. I recently received an email from a correspondent (who doesn't wish to be named) which shows how labelling could be avoided while still allowing primaries to be measured on the progress they were making rather than their raw scores, regardless of intake. I've reposted the email in full as, I think, it shows how the benefits of the assessments could be secured without the negatives opponents are worried about.
"My starting point on baseline assessments is that a teacher's focus for ages 4-7 should mostly be about absolutes rather than relatives. As an absolute bottom line, every 7yo should have completed learning to decode (including the complex code, not just the simplified initial code) and thus to read with reasonable fluency, to write properly, to spell (though not full spelling code mastery by 7), and have had opportunities to practice their new skills in worthwhile activities; and similarly for maths. These aspirations should be there for all children (with the perennial exception of true heavy-duty special needs), not just the brighter ones. KS1 assessment ought to be showing us whether these aspirations are met.

But this creates a problem, in that children arrive at primary school with very different levels of development and (though many hate the idea) variable capacities to learn. School intakes are far from homogeneous, and the accountability system will persistently punish some schools if we simply compare KS1 outcomes and don't recognise this. In a high-accountability world, this creates disincentives to work in and run these schools, which over time will tend to lead to differences in teacher and curriculum quality, creating a vicious circle.

I therefore think it is important to have a measure in the system that provides a primary education baseline, so from the first term of Reception. I also favour a test over teacher assessment: teachers are too conflicted otherwise. But I would explicitly make this a measure of schools, not pupils. I might send schools information about cohort performance: average score vs national average, range from highest to lowest, probably no more than this: really just enough for schools to see that there is a fair external perspective on their intake, and to have a sense of what overall level of performance at KS1 ought to be expected. But I would definitely NOT give them individual child scores, nor would I give these to parents. (This sounds shocking to many ears, but it is in fact absolutely normal - eg schools administer all sorts of tests for internal purposes whose results don't go to students or parents.) So children would not be labelled, and schools could not set differentiated child level targets explicitly designed to meet specific Ofsted progress expectations. The child level data would sit in the NPD until needed for KS1 progress/VA calculations for all matched children.

This would allow proper assessment of progress and value-added from YR to Y2 at school (and perhaps classroom) level, but without individual labelling with all its negative consequences and without refocusing lower primary teachers away from absolute expectations. And I really do think that this early stage accountability is necessary, as we all tend to judge the lower end of our children's primary schools by how nice the people are, and only realise what they haven't been taught when it is already getting rather late to do something about it. (My older child was in Y2 before I realised that the school's reading and spelling teaching was lamentable, and I am a fairly well-informed parent who recognised quickly that the problem was with the school and not the child. I know many parents lamenting their children's dyslexia who still don't realise that it was probably avoidable.)

Administration of tests to 4/5 yos is of course a challenge. But

(a) modern computer-based tests are quite accessible to the vast majority of children who will already have seen (and often played) tablet/PC/phone games

(b) they can be adaptive, using quite complex algorithms to determine which questions they use to refine the measure, so that even a teacher watching a child take the test cannot deduce their precise score

(c) the incentive to teachers is to under-report baselines, but it would take a degree of nastiness that I hope not too many are capable of to nudge a child away from the right answer towards a wrong answer

(d) I suspect that screening algorithms will be capable of picking up anomalous patterns of answers if teachers impersonate children and try to replicate their mistakes.

So I think it will be possible to establish a worthwhile baseline test if these technical issues can be dealt with and if the temptation to use this as an accountability test for nursery classes can be resisted, as this would infallibly lead to nursery classes starting to teach to typical test items, thus undermining the value of the baseline."
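
The correspondent's point (b) - that an adaptive test chooses each question from the answers so far, refining its estimate as it goes - can be illustrated with a toy sketch. This is just a binary search on question difficulty with a deterministic simulated child, not the algorithm any real baseline assessment uses (real adaptive tests rest on item response models and allow for slips and lucky guesses):

```python
# Toy adaptive test: binary search on question difficulty. Each answer
# narrows the interval the child's ability must lie in, so the next
# question is pitched where it is most informative.
def adaptive_estimate(answers_correctly, lo=0.0, hi=100.0, rounds=10):
    """answers_correctly(difficulty) -> bool simulates the child's response."""
    for _ in range(rounds):
        difficulty = (lo + hi) / 2        # pitch a question at the midpoint
        if answers_correctly(difficulty):
            lo = difficulty               # too easy: ask something harder
        else:
            hi = difficulty               # too hard: ask something easier
    return (lo + hi) / 2

# A deterministic simulated child with "true ability" 62.
child = lambda difficulty: difficulty <= 62
print(round(adaptive_estimate(child), 1))  # prints 62.0
```

After ten questions the interval is under 0.1 wide, which also shows why a teacher watching a single question couldn't read off the final score: any one answer only tells you which half of the current interval the child is in.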

Friday, 11 April 2014

Some thoughts on grief

Until it happened it didn't occur to me that our daughter would be stillborn. I'd worried about a difficult birth; 4-day labours; emergency c-sections; brain damage and so on. It didn't cross my mind that when we arrived at the hospital there'd simply be no heartbeat.

Stillbirth turns out to be relatively common - around 1 in every 200 births in the UK. This rate hasn't fallen in the UK for over 20 years despite significant improvements in other aspects of maternity care. As 90% of stillbirths have no congenital abnormality it should be possible to reduce the rate significantly with better screening.


I suspect part of the reason for the lack of investment in stillbirth research - and lack of media attention - is because it's very hard to talk about. The absence of the paraphernalia that usually accompanies a death reduces the number of opportunities to engage with friends and relatives - leaving instead an almost complete lack of activity in a household prepared for the exhaustions of a newborn. And unlike other deaths, where you can share stories about the deceased from happier times, there's no hook for positive conversations.

Which is why so many of the messages we've received contain words along the lines of "words are futile at this time" or "there are no words" or "I know nothing I can say can make anything better". Of course this isn't true. For me at least the hundreds of messages we've received have been very helpful in processing what's happened. Without them we'd have had almost no communication at all outside of our immediate families. And the box of cards we now have are pretty much the only thing we have to remember her by.


I've found the process of grieving much as one would expect - it comes in waves and there are increasingly long periods - hours at a time now - where I feel pretty normal (and then feel guilty for feeling normal). But everyone's grief is individual and there are some odd quirks which I think are less common. After the horror of the initial few days I've held it together pretty well. The only times I've really felt myself going to pieces was after someone has done me a significant and unexpected kindness. I don't know why - perhaps because it reminds me of the enormity of what's happened?

The most noticeable thing has been the difference in the way my wife and I think about her. Because I never had the chance to meet her I think of her in terms of lost possibility; the girl - and woman - she could have been. My wife, though, had a physical relationship with her over many months - making the loss much more visceral. She thinks of her by the name we chose. For some reason I can't.


At some point in the future we'll be holding a fundraising event for Sands, the UK's stillbirth charity, in honour of our daughter and to help pay for research that will hopefully stop other families going through this.

Thursday, 3 April 2014

The worst few days of my life

As many of you know my wife, Linda, and I have been expecting our third child. On Monday afternoon Linda had a 38 week check-up and was told everything was fine. Later that evening she went into a normal labour and early on Tuesday morning we arrived at the hospital. When the midwife did the initial check she was unable to find the baby's heartbeat. After some Casualty-like scenes of panic a doctor confirmed the bad news. Shortly after our daughter was delivered stillborn. As yet the doctors are unable to establish a reason why this happened and in most cases like this they never do.

Needless to say we are heartbroken. The last few days have been the hardest of our lives. But we're very lucky to have our wonderful twins as well as an incredibly supportive network of family and friends. They will see us through this.

I'm writing this public note so that I don't have to tell everyone individually and so that people understand why I'm not returning calls, texts, emails and DMs at the moment. But I am very grateful to everyone who has already offered condolences and support. I'll be back in action soon.

9 things you should know about the new PISA "creative problem-solving" test

Today sees the launch of the first international test of "creative problem-solving". It is the latest addition to the suite of PISA tests run by the OECD which have become hugely influential in global education policy-making.

This test was taken by pupils in late 2012 at the same time as PISA tests in maths, science and reading but the results were held back for a separate launch. I was invited to a pre-embargo briefing yesterday and the information here is taken from a mix of the published reports and answers given by OECD experts at the briefing.

1. The purpose of the test was to measure students' ability to solve problems which do not require technical knowledge. The PISA subject tests in maths, science and reading are also based around problem-solving but they do require knowledge in these subject areas (e.g. mathematical concepts and mental arithmetic). Examples of questions include working out which ticket to buy at a vending machine, given a list of constraints, or finding the most efficient place for three people to meet. Unlike the subject PISA tests it was completed on computers which allowed for more sophisticated interactive assessment.

2. Overall the results correlated fairly closely with the PISA subject tests. Unsurprisingly students who are good at maths problems are also good at ones involving general reasoning. The correlation with maths results was 0.8 and with reading was 0.75.

3. But England was one of the countries that did significantly better in this test than in the subject ones. It came 11th overall but the individual rankings are misleading. It makes more sense to think of clusters of countries that did about as well as each other. The leading group of seven consists entirely of Far East countries and jurisdictions. England is in the second group with countries that traditionally do well in PISA like Australia, Canada, Finland and Estonia. Then comes a third group made up other larger European countries and the United States. The countries below the OECD average are primarily smaller European countries and developing nations.

4. This is unhelpful for a number of the big narratives in English education policy. It undercuts the "England is falling behind in the world" narrative so beloved of right-wing newspapers. On a test of intellectual reasoning (which is what this is) our 15 year olds do as well as any other nation bar a small group of Far East jurisdictions (only two of which - Japan and Korea - are not cities or city states).

5. But it's also perhaps unhelpful for those who argue that our education system is dominated by an obsession with tests and narrow curriculum knowledge. It turns out we're actually pretty good at "21st century skills" already. Our students performed better in this test than you would expect based on their maths, science and reading ability. Likewise all the employers arguing that our system isn't delivering the kind of problem-solving skills they need should reflect on these results.

6. The reason England outperformed its subject PISA scores is that students at the top end did better on the problem-solving test than on the subject ones. Students at the bottom end did no better. This suggests that we're doing something with our more gifted students that we're not doing with our weaker ones. In other countries - e.g. Japan - the opposite was true: weaker students did better in problem-solving than subject tests but the strongest ones didn't.

7. In England there was no statistically significant gender difference in performance on this test (in maths and science boys do better; in reading girls do). Interestingly, immigrants scored below non-immigrants, which is a change from the maths and reading tests where there is no significant difference.

8. The domination of Far East countries puts paid to the notion that their success in PISA subject tests is somehow down to rote-learning or fact-cramming. It also puts paid to the idea that all Far East systems are the same. While Shanghai and Hong Kong are still in the top group they did much worse on this test than would be expected given their stellar scores in maths, science and reading. Conversely Korea, Japan and Singapore all did better than would be expected.

9. While the test results are interesting they don't tell us why some countries do better than others. Singapore and Korea - who come top - have both tried over the past few years to add "21st century competencies" to their curricula to make them less purely focused on academic knowledge. But it's unclear whether their high scores in this test are due to that or because their traditional strength in the academic basics transfers to "creative problem-solving" tests of this type. The OECD presenters were clear that they thought it was impossible to teach problem-solving skills in the abstract without content, but they also felt it was possible to embed them in a knowledge-based curriculum.