The parent's guide to CAT4: levels, scoring, and how to actually prepare your child

The Cognitive Abilities Test, fourth edition (CAT4), is a standardised reasoning assessment used by a large share of UK primary and secondary schools to measure how a child thinks rather than what they have been taught. It is published by GL Assessment, takes around two hours split across two or three sittings, and produces a numeric snapshot of a child’s verbal, quantitative, non-verbal, and spatial reasoning.

Most parents meet CAT4 in one of three ways. Your child sits it in Year 4, 5, or 7 and you see the report at a parents’ evening; the school uses the results to decide sets or streams for the year ahead; or you are looking at independent or selective-grammar admissions and the test (or one very like it) is part of the entrance process. None of these is cause for panic, and none justifies a high-pressure tutoring campaign, but the results do influence what your child’s school thinks they can do. That makes the test worth understanding.

This guide covers what the test actually measures, how the levels and scores work, what schools do with the results, and how to practise sensibly without making your child anxious.

What CAT4 actually measures

CAT4 has four batteries — the official word for groups of related sub-tests. Each battery has two sub-tests, giving eight in total:

Verbal reasoning. How well the child reasons with words and verbal concepts. The two sub-tests are verbal classification (“which word goes with these three?”) and verbal analogies (“cat is to kitten as dog is to ___”).
Quantitative reasoning. Reasoning with numbers and numerical relationships — not arithmetic skill in the sense of fast mental maths, but pattern-spotting. The sub-tests are number series (continue the sequence) and number analogies (find the rule that turns 3 into 9 and 5 into 25, then apply it).
Non-verbal reasoning. Reasoning with shapes and patterns, deliberately language-light so it works for children with English as an additional language or weaker-than-average literacy. The sub-tests are figure classification and figure matrices.
Spatial reasoning. Mentally manipulating shapes — rotating, folding, completing partial pictures. The sub-tests are figure analysis (paper-folding) and figure recognition (find the hidden shape).

The four batteries are deliberately broad. A child who is verbal-strong but spatial-weak (common in fluent readers who find Lego frustrating) shows up clearly in the profile. So does the reverse: a child whose written work undersells how sharp they actually are. That profile, more than any single number, is what makes CAT4 useful to teachers.

Levels Pre-A to F: which one is your child sitting?

CAT4 ships at seven levels with overlapping age ranges. Schools pick the level that suits the year group:

Level X / Pre-A — Year 3 (ages 7–8).
Level A — Year 4 (ages 8–9).
Level B — Year 5 (ages 9–10).
Level C — Year 6 (ages 10–11).
Level D — Year 7 (ages 11–12).
Level E — Year 8 (ages 12–13).
Level F — Year 9 (ages 13–14).

The questions get harder up the levels, but each level is normed against a national sample of children of the same age. That means a Standardised Age Score of 110 on Level B and the same 110 on Level D are comparable in what they say about the child relative to peers. Raw marks are not comparable in the same way: a raw mark of 28 on Level B and a 28 on Level D come from different questions.

How CAT4 is scored: SAS, NPR, stanines

The school’s report typically quotes three scores per battery, plus an overall “mean” score averaged across all four batteries.

Standardised Age Score (SAS)

The headline number. SAS has a mean of 100 and a standard deviation of 15, adjusted for the child’s exact age in years and months. Roughly two-thirds of children score between 85 and 115; the “average range” schools quote on reports is usually 89–111. An SAS of 130 means your child is somewhere around the top 2–3% of children of the same age; an SAS of 70 means the bottom 2–3%.

National Percentile Rank (NPR)

How the child compares to a national cohort, expressed as a percentile. NPR 75 means “your child scored higher than 75% of the children we normed against”. It’s the same information as SAS, just rescaled — SAS 110 is roughly NPR 75; SAS 115 is roughly NPR 84; SAS 100 is NPR 50.
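The rescaling can be sketched in a few lines of Python. This is only an approximation, assuming SAS follows a normal distribution with mean 100 and standard deviation 15 as described above; the function name `sas_to_npr` is made up for illustration, and the published conversion tables behind a real report may differ by a point here and there.

```python
from statistics import NormalDist

# Assumption: SAS is normally distributed with mean 100 and SD 15.
# Real reports use the publisher's own conversion tables.
SAS_SCALE = NormalDist(mu=100, sigma=15)


def sas_to_npr(sas: float) -> int:
    """Approximate percentile rank for a Standardised Age Score."""
    percentile = SAS_SCALE.cdf(sas) * 100
    # Percentile ranks are conventionally reported as whole numbers from 1 to 99.
    return min(99, max(1, round(percentile)))


for sas in (70, 100, 110, 115, 130):
    print(f"SAS {sas} is roughly NPR {sas_to_npr(sas)}")
```

Under that assumption the sketch reproduces the figures quoted above: SAS 100 gives NPR 50, SAS 110 gives 75, and SAS 115 gives 84.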

Stanines

A 1–9 banding (5 = average, 1 = lowest, 9 = highest) used when schools want to talk about broad ability bands rather than precise scores. Stanines 7–9 together cover roughly the top quarter of children (about 23% under the standard stanine definition); stanines 1–3 cover roughly the bottom quarter.
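For readers who like to see the banding spelled out, here is a minimal sketch. It assumes the conventional stanine definition (bands half a standard deviation wide, centred on the mean) applied to a mean-100, SD-15 scale and rounded to whole SAS points. The cut-offs and the name `sas_to_stanine` are illustrative; a school's report uses the publisher's own tables.

```python
from bisect import bisect_left

# Assumption: conventional half-SD stanine bands on a mean-100, SD-15 scale.
# Each entry is the highest whole SAS that still falls in stanines 1 to 8.
STANINE_UPPER_BOUNDS = [73, 81, 88, 96, 103, 111, 118, 126]

# Textbook share of a normal population in stanines 1 to 9, in percent.
STANINE_SHARES = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def sas_to_stanine(sas: int) -> int:
    """Return the stanine (1 to 9) for a whole-number SAS."""
    return bisect_left(STANINE_UPPER_BOUNDS, sas) + 1


top_band_share = sum(STANINE_SHARES[6:])      # stanines 7, 8 and 9
bottom_band_share = sum(STANINE_SHARES[:3])   # stanines 1, 2 and 3

print(sas_to_stanine(100), top_band_share, bottom_band_share)
```

With these cut-offs, the 89–111 "average range" corresponds to stanines 4 to 6, and the top and bottom three stanines each hold about 23% of children.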

Two practical points worth knowing. First, schools usually round SAS scores into bands when allocating sets, so SAS scores of 112 and 116 often land in the same group. Second, the four battery scores are more diagnostic than the overall mean: a child with 115/115/85/85 across the four batteries has a very different profile from one with 100/100/100/100, even though both averages land at 100.
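The second point is easy to check with a little arithmetic. The sketch below is illustrative only: the battery keys and the `profile_summary` helper are hypothetical, and the "spread" it reports (highest battery minus lowest) is just one simple way to show how uneven a profile is, not a measure taken from the CAT4 report.

```python
from statistics import mean


def profile_summary(scores: dict[str, int]) -> tuple[float, int]:
    """Return (overall mean, gap between strongest and weakest battery)."""
    values = list(scores.values())
    return mean(values), max(values) - min(values)


# The two example profiles from the paragraph above.
uneven = {"verbal": 115, "quantitative": 115, "non_verbal": 85, "spatial": 85}
flat = {"verbal": 100, "quantitative": 100, "non_verbal": 100, "spatial": 100}

print(profile_summary(uneven))  # same mean as the flat profile, 30-point gap
print(profile_summary(flat))    # same mean, no gap
```

Both profiles average 100, but only the first has a 30-point gap between batteries, which is the detail a teacher would want to look at.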

What schools actually do with the results

CAT4 is one input among many. Schools pair it with teacher assessment, key-stage results, and ongoing classwork. In practice, the report feeds into five things:

Set or stream allocation. Most common in Year 7 entry, sometimes earlier for maths and English in primary. Where setting happens, CAT4 is rarely the only factor but it is usually one of them.
Identifying able pupils. A high SAS, particularly in non-verbal or spatial reasoning, can flag children whose written work doesn’t reflect how they’re thinking. Those children get pushed onto more challenging work.
Identifying support needs. A wide gap between batteries can hint at a specific learning difficulty worth investigating — verbal much lower than non-verbal can suggest dyslexia, for example. The CAT4 report does not diagnose anything; it points teachers at where to look.
Predicted outcomes. GL Assessment publishes indicative GCSE predictions tied to CAT4 profiles. Schools treat these as floors to aim above, not ceilings to settle for, but they do appear in target-setting conversations.
Selective and independent admissions. Where CAT4 is part of the application, schools have their own thresholds — published or otherwise — and use them alongside interviews and other assessments.

What CAT4 is not: a definitive verdict on your child’s intelligence, a predictor of life outcomes, or something a single bad morning will permanently tarnish. A cold, a row at breakfast, or unfamiliarity with the question format on test day all show up in the score. Reasonable schools know this.

How to practise without anxiety

The honest version: a child who has seen the question types before, knows what “figure matrices” means without having to read the rubric, and is comfortable working under light time pressure does noticeably better than one for whom the test is a complete surprise. That’s a familiarity gap, not an ability gap, and it’s worth closing. What does not work is treating CAT4 like a GCSE.

Little and often. Twenty minutes twice a week, for a few weeks, beats a single long Saturday session. Fatigue collapses reasoning performance fast.
Rotate sub-tests. All eight, not just the two your child enjoys. Verbal-confident kids almost always have a non-verbal or spatial blind spot worth nudging.
Start without a timer. Get them comfortable with the question types first. Bring the timer in only once they’re solving D1-level problems consistently.
Explanations matter. Wrong answers are where the learning lives. A practice tool that just says “that’s wrong, try again” is worth less than one that walks through why.
Don’t frame it as the test. “If you do well at this you’ll do well in the real one” is the most reliable way to make a child freeze on test day. Frame practice as the puzzles, not the prep.

What to look for in a practice tool

Five criteria, in roughly the order they matter:

All four batteries. Not just the easy-to-author ones. If a tool is mostly verbal because verbal questions are quick to write, half the test is missing.
Calibrated difficulty. A genuine spread across difficulty bands, not 200 D1 questions in different colours.
Honest scoring. Avoid tools that congratulate your child on 99% accuracy at every level — that inflation is worse than no scoring at all, because it sets expectations the real test won’t meet.
No ads, no third-party trackers. Children aren’t a marketing surface, and child practice data shouldn’t end up in an ad-network database.
Deletable data. If you can’t delete the account, you don’t actually own it.

That’s the brief we set ourselves when building Puzzitron: all four batteries, calibrated difficulty, honest scoring, no ads, no trackers on the part the child uses, and one click to delete everything. It’s free during the beta.

Useful next reads

Non-verbal reasoning explained — the four NVR sub-tests, with worked examples a 9-year-old can follow.
11+ in 2026–27: a parent’s guide by region — the 11+ exam differs by LEA. What to expect in each selective-school heartland.
CAT4 practice on Puzzitron — the practice game itself.