The Measurement Policy in Practice

The four-tier system on /legal/measurement-policy exists because the immersive industry has a measurement-theatre problem. This is the operational companion: what the tiers mean when we apply them to real work, and where our own case studies land.

The problem the policy solves

Immersive consulting has spent two decades selling outcomes it didn't measure. The decks are familiar. A VR module "increased retention 75%" with no baseline, no control, and a sample of 14. A projection installation "drove 2 million impressions," counting every passerby on a public street as an impression. A simulation "trained surgeons faster" with no comparable cohort and no statistical test.

None of those claims are fraud. They're the industry's normal grammar. The grammar still costs everyone trust. When the validated study eventually arrives, it doesn't get to sound different, because all the other studies already used the validated vocabulary.

We split the vocabulary. Four tiers, each with a rule, each appearing next to the number on the page.

The tiers, operationally

| Tier | Rule | Visual | What it means in a meeting |
| --- | --- | --- | --- |
| VALIDATED | n ≥ 30, p < 0.05, with control or pre/post baseline | Signal green, no asterisk | "We're willing to defend this in front of a reviewer." |
| DIRECTIONAL | Real measurement, underpowered or no control | Warn amber | "The signal is real. The study wouldn't pass peer review." |
| REPORTED | Public source or client-reported, not independently verified | Neutral | "The number came from somewhere credible. We didn't count it." |
| ANECDOTE | Story, no number | Italic, faint | "Worth telling. Not data." |

A claim that can't earn any tier doesn't ship. Not on the site, not in a deck, not in a proposal.
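
To make the taxonomy concrete, here's a minimal sketch of the tiers as a data structure. Everything about the code is illustrative: the names (EvidenceTier, TierSpec, TIERS) are hypothetical, not our site code; only the rules, visuals, and meanings come from the policy itself.

```typescript
// Hypothetical encoding of the four-tier vocabulary. Names are
// illustrative; the rule/visual/meaning text is the policy's own.
type EvidenceTier = "VALIDATED" | "DIRECTIONAL" | "REPORTED" | "ANECDOTE";

interface TierSpec {
  rule: string;    // what the tier requires of the evidence
  visual: string;  // how the tag renders next to the number
  meaning: string; // what claiming the tier commits us to in a meeting
}

const TIERS: Record<EvidenceTier, TierSpec> = {
  VALIDATED: {
    rule: "n >= 30, p < 0.05, with control or pre/post baseline",
    visual: "signal green, no asterisk",
    meaning: "We're willing to defend this in front of a reviewer.",
  },
  DIRECTIONAL: {
    rule: "real measurement, underpowered or no control",
    visual: "warn amber",
    meaning: "The signal is real. The study wouldn't pass peer review.",
  },
  REPORTED: {
    rule: "public source or client-reported, not independently verified",
    visual: "neutral",
    meaning: "The number came from somewhere credible. We didn't count it.",
  },
  ANECDOTE: {
    rule: "story, no number",
    visual: "italic, faint",
    meaning: "Worth telling. Not data.",
  },
};
```

One reason to keep the visual treatment in the same record as the rule: the tag and its rendering can't drift apart.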

How our own case studies are graded

Walking through the live case studies on the site is the fastest way to show the tiers working against our own incentives.

Crash Course Engineering: REPORTED

The headline number on the Crash Course case study draws on PBS / Complexly viewership figures the production team published. Real numbers, large audience, public source. We didn't run the analytics. The tier reflects that honestly. Calling Crash Course "VALIDATED" would require us to have instrumented learning outcomes against a control group, which we didn't.

The temptation to upgrade is real. Big audience, prestigious brand, easy to imply rigour the engagement didn't include. The policy is the discipline that says no.

Polar Lab (PBS NOVA): REPORTED

Polar Lab shipped to broadcast and earned engagement metrics from PBS. Those numbers are REPORTED for the same reason: the audience figures came from the broadcaster's instrumentation, not ours, and we didn't run a controlled learning study. The Webby and the press coverage are real. They're industry-recognition signals. They're not effect-size measurements.

Dissolving Boundaries: ANECDOTE

Dissolving Boundaries at Nuit Blanche projected onto a 350×200 ft facade. People stopped. People filmed. The night happened. There's no controlled outcome to validate, and no platform-supplied number we'd defend as "impressions" without inventing a count. The honest tier is ANECDOTE, and the case study says so. Spectacle work mostly lives here. That's the nature of one-night cultural installations, and pretending otherwise insults the form.

The gap between "we measure" and "the measurement is validated"

Most of our case studies sit in REPORTED or DIRECTIONAL. That's not a confession. It's the truth of the field. Validated work requires the right study design, sufficient sample size, control or pre/post baseline, and the budget to do it. Most commercial engagements don't fund that. The policy makes the gap visible instead of papering over it.

VALIDATED work is the ambition. Constellation engagements are where we build the instrumentation that makes validated claims possible across multiple engagements. Atlas and Summit engagements usually leave DIRECTIONAL or REPORTED evidence, because that's what their budgets and timelines support.

What it costs us

Pitches. Reliably. A competitor saying "our VR training increased retention 75%" beats us in the room every time the buyer doesn't know to ask "validated by whom, against what control, with what n." We've lost work to that competitor and we'll lose more.

We're betting the buyers who do ask the question are the ones worth keeping, and that the ones who don't ask today will start asking once enough validated work exists to reset the grammar.

What it gives us

A clean conscience and a clean legal posture. When a client cites a number from our site, they can see exactly what kind of number it is. When a regulator or academic partner asks "how do you classify your evidence," we hand them this page. When a future engagement produces a validated result, the VALIDATED tag will mean something, because we didn't burn it on directional work.

The policy is unglamorous. It's also the only honest answer we found.

How a claim gets tiered, step by step

The grading is mechanical, not editorial. Every quantitative claim goes through the same four questions before it ships.

  1. Where did the number come from? Our instrumentation, the client's instrumentation, a third party, or a public source. If we can't name the source, the claim doesn't ship.
  2. What was the sample size? If n < 30, VALIDATED is off the table regardless of how clean the study was. The number can still earn DIRECTIONAL if the rest of the design was sound.
  3. Was there a control or pre/post baseline? No control and no baseline means the claim is at best DIRECTIONAL, more often REPORTED.
  4. Who counted? If we counted, the claim can earn any tier the rest of the criteria support. If someone else counted, the ceiling is REPORTED.

The grading runs against every claim, including the ones we'd prefer to grade higher. The point of mechanical grading is that the grader doesn't get to want the answer.
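
Read as code, the four questions collapse into a single gate. The sketch below is hypothetical: the Claim shape, the field names, and the DOES_NOT_SHIP sentinel are invented for illustration, and the EvidenceTier type is redeclared from the earlier sketch so the block stands alone. Only the thresholds and the ordering come from the policy.

```typescript
// A minimal sketch of the four-question gate. Thresholds mirror the
// policy text; the Claim shape and names are hypothetical.
type EvidenceTier = "VALIDATED" | "DIRECTIONAL" | "REPORTED" | "ANECDOTE";

type Source = "ours" | "client" | "third-party" | "public" | "unknown";

interface Claim {
  source: Source;                // Q1: where did the number come from?
  n?: number;                    // Q2: sample size, if there is a number
  hasControlOrBaseline: boolean; // Q3: control or pre/post baseline?
  weCounted: boolean;            // Q4: did our instrumentation count it?
  pValue?: number;               // significance, if a test was run
}

function grade(claim: Claim): EvidenceTier | "DOES_NOT_SHIP" {
  // Q1: if we can't name the source, the claim doesn't ship at all.
  if (claim.source === "unknown") return "DOES_NOT_SHIP";

  // A story with no number can still ship, just never as data.
  if (claim.n === undefined) return "ANECDOTE";

  // Q4: someone else counted, so the ceiling is REPORTED.
  if (!claim.weCounted) return "REPORTED";

  // Q3: no control and no baseline caps the claim below VALIDATED.
  if (!claim.hasControlOrBaseline) return "DIRECTIONAL";

  // Q2: underpowered (n < 30) or non-significant stays DIRECTIONAL.
  if (claim.n < 30 || claim.pValue === undefined || claim.pValue >= 0.05) {
    return "DIRECTIONAL";
  }

  return "VALIDATED";
}
```

The ordering is the point: the hard gates (an unnameable source, someone else's count) fire before any statistics get consulted, which is how the grader is kept from wanting the answer.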