The Wiliam Test: Does Your Feedback Actually Drive Student Action?

Watch a boy get an essay back. Not the essay itself, but the boy. I bet he flicks straight past the comments to the grade, his face does whatever the grade deserves, and then he leans over to see what his mate got. The comments that took your precious time might as well be wallpaper.

I teach at a boys' college in NZ, so I get a concentrated dose of this. But it isn't a boys-only thing. Ruth Butler showed it back in 1988: students given comments on their work improved; students given a grade, or a grade with comments attached, didn't. Sit with that for a moment. The comments were still there but the grade switched them off. NCEA has its own dialect for the same move. Got Achieved? Sweet. Stop reading.

I've been chewing on this since I first read Inside the Black Box, which is longer ago than I care to admit, and Dylan Wiliam has had a permanent spot on my shelf ever since. So Kate Jones's Feedback: Strategies to Support Teacher Workload and Improve Pupil Progress (Hachette Learning, 2024) was always going to end up in my reading pile. I've spent a while working through it slowly, with notes. This post is the first thing those notes produced.

This is a great book on feedback

The line I've been thinking about most

Jones builds the whole book on top of one line from Wiliam and his long-time collaborator Siobhan Leahy:

❝

The only thing that matters with feedback is the reaction of the recipient. ... Feedback that the student does not act upon is a waste of time.

I think this should be called the Wiliam test, because you can run it over any feedback you've ever given. Did the student do something with your feedback? That's the test.

It sounds too simple to be worth naming. Then you run it honestly and it starts knocking things over.

Is it just a ritual?

Jones opens with a confession I suspect most of us could co-sign, including me. Earlier in her career she'd return marked books and tell the class to read the comments before moving briskly on to the next thing. The marking was conscientious but the acting-on never happened, because nothing in the lesson asked for it.

That routine isn't hers alone - it's the one we inherited. Comments down the margin. A target at the bottom. Books handed back, "have a read of your feedback", next topic. And a few weeks later the next set of books shows the same errors, faithfully preserved. If you teach an essay-heavy subject you've lived some version of this, probably from both sides of the desk.

Run the Wiliam test over that routine and the uncomfortable part isn't the marking. It's the schedule. There was never a moment where acting on the feedback was the actual task. Wiliam again, but blunter:

❝

Don't ever give feedback to students unless you make the time, the next time they are in the classroom with you, for them to respond to that feedback.

No time for learners to respond, no reaction. No reaction and the test fails, however good the comments were. The marking wasn't wrong. It was just expensive.

Marking policies skipped the research

The feedback literature has a translation problem. The headline from John Hattie's Visible Learning was that feedback is one of the highest-impact influences on learning, and that headline went everywhere. Schools rewrote marking policies around it. More comments, more often, in more colours. This is the era that gave us triple-impact marking and the verbal feedback stamp, marking designed to be seen by an auditor rather than used by a student. I emigrated from the UK to NZ in 2012 and missed that era, though former colleagues' descriptions of it left me in no doubt how lucky I was to have got out before experiencing it.

The part that got lost sits in a paper Hattie wrote with Helen Timperley two years earlier, The Power of Feedback. About 38% of the feedback effects in the studies they reviewed were negative. Not neutral. Negative, as in the students would have done better without it. Feedback is high-variance, and badly designed feedback makes students worse.

It took until 2021 for the EEF's feedback guidance to say the obvious thing in print:

❝

In many cases, written 'marking' has often been conflated with 'feedback' and may indeed have unhelpfully supplanted other forms of feedback.

Marking is one method of feedback. It isn't a synonym for it, and it's often not even a good example of it. The Wiliam test is the cheapest tool I know for telling the two apart.

What the Wiliam test changes

It retires the guilt. Most of us were trained to feel that anything less than detailed marking on everything is a corner cut. The test reframes it. Detailed marking that nobody acts on isn't feedback, it's a ritual, and you're allowed to retire a ritual.

It puts the design problem on my desk. If the feedback didn't land, the lazy question is "why don't they care?" The better question is "what about the design made acting on it unlikely?" Only one of those questions has an answer I can do something about by Thursday.

It promotes the strategies that scale. Whole-class feedback. Wiliam and Leahy's detective strategy, where you tell a student there are two errors in the paragraph and they have to find them. Audio comments a student can replay. A redraft of one boxed-off section rather than a vague instruction to "improve". None of these are shortcuts. They're designs that pass the test more often than thirty individually crafted sets of margin comments. A teacher at my school gives regular oral feedback via OneNote on student English essays and it works.

What it doesn’t say

It doesn't say give less feedback. Jones is unambiguous on this. Students need feedback, plenty of it. The point is to give the kind that changes what they do next.

It doesn't say marking is bad. Some work deserves the full treatment: exam-condition mocks, practice essays in the run-up to externals, anything where authentication needs my eyes on every script. The claim is narrower - most work in a normal week doesn't justify it.

It doesn't hand you a frequency. Wiliam is explicit that over-specifying is how schools wreck this. A policy that mandates two marked pieces a fortnight mostly generates evidence of compliance, not feedback.

And it isn't a quick fix. The strategies that pass the test take real training to do well. The UCL Verbal Feedback Project described moving from written marking to verbal feedback as "a complex series of refinements to practice", not a switch you flip. Schools that flip it anyway, without the development time, mostly produce what the literature calls lethal mutations (as a biologist I love this incursion from genetics!): the shape of the strategy with none of its working parts.

Where am I going with this…

This is the first post in a series I'm currently calling Feedback that doesn't kill your weekend, drawn from the Jones book and the post-2021 literature it sits inside. Coming up, roughly fortnightly: how marking got conflated with feedback, and how to prise them apart. The yellow box, a selective marking method that fits inside a working week. The four-part case for keeping numbers off practice work. Whole-class feedback, and the claim that it turns marking into planning. And AI-assisted feedback, with the guard rails that stop it eroding the judgement it's supposed to support.

Each post will end with one thing you can try in your next lesson, and I'll say which tools cost money and which don't. NCEA notes throughout, because nearly all of this literature is British and only some of it survives the trip. But it should still be relevant wherever you are in the world.

For my own classes, the plan for the rest of this term is a whole-class first pass on every piece of written work set, a boxed redraft, and a hinge question to check whether the misconception actually moved. I'll report back on what survives contact with my ākonga (that’s a te reo Māori word meaning students).

If you want the series in your inbox and to hear more about my thoughts on learning as well as tech to support learning, the signup is at the bottom of the page.

So, the one to try this week is the test itself. Before you write anything on a student's work, decide when they'll act on it. If there's no when, don't write it yet.

And my honest read after a month in this literature: it's more workable than I expected. Better feedback in less time isn't a fantasy, and most of it needs no new technology. It needs one question, asked routinely, of everything we give back: did the student act on it?

If not, there's your redesign.

Until next time, Ngā mihi

Eliot
