Richard Hill

Judgement for AI-mediated work


Judgement as an Operating Model Problem in the Age of AI

Most organisations treat judgement as a personal attribute. You hire “good people”, you develop leaders, you run training on critical thinking, you encourage “better decisions”. When something goes wrong, you look for a flawed individual judgement call.

That framing is convenient, and often wrong.

In AI-mediated work, judgement is less an individual virtue and more an operating model property. The quality of judgement you get is shaped by how work is organised: who has decision rights, how exceptions are handled, what evidence is required, what gets logged, what gets escalated, and what counts as “done”. AI doesn’t replace judgement. It changes how judgement failure happens, and it makes operating-model weaknesses visible faster and more painfully. 

What the operating model lens reveals

An operating model is the practical machinery of “how we run this place”: accountabilities, workflows, governance, and the information flows that coordinate action. The operating model lens is ruthless, because it assumes that outcomes are generated by systems, not by intentions.

That matters because AI is now woven into the operating model in three ways:

  1. It accelerates throughput. Drafts, summaries, analyses, and responses appear instantly.
  2. It increases plausibility. Outputs sound coherent even when they are wrong, incomplete, or mis-scoped.
  3. It blurs the boundary between drafting and deciding. Text is no longer a bottleneck, so the natural pause where judgement used to sit gets squeezed out. 

If your operating model assumed that drafting time created a moment for review, you just lost that safety feature. Not because anyone is reckless, but because the system no longer generates friction by default.

Judgement failures are usually “process failures wearing a human mask”

In post-mortems, organisations often say “we made a poor decision”. What they frequently mean is one of these:

  • We didn’t know who was allowed to decide.
  • We didn’t define what evidence was required.
  • We didn’t distinguish routine from exception.
  • We didn’t create a checkpoint where dissent could surface.
  • We didn’t log the rationale, so we can’t learn or defend it.
  • We didn’t notice a commitment had been made until the customer, regulator, or employee treated it as binding.

Those are operating model failures. AI just makes them faster and easier to trigger.

Classical judgement theory already hints at this. Under bounded rationality, decision quality depends on process appropriateness more than optimisation fantasies. Sensemaking emphasises that meaning is constructed socially and continuously, not discovered fully-formed. AI intensifies both realities: it participates in attention and framing, and it pushes organisations toward premature closure when outputs feel “good enough”. 

Four operating model questions that decide your judgement quality

If you want judgement to survive contact with ubiquitous AI, the operating model has to answer four questions explicitly, in writing, in workflows.

1) Where do decision rights sit, and how are they triggered?

The most common AI-era failure mode is decision-rights ambiguity. An AI-assisted email goes out promising a delivery date, a refund, or a concession. Nobody thought they were “making a decision”. They thought they were drafting.

Operating model fix: define decision rights by decision type (refunds, contract language, pricing exceptions, patient risk thresholds, disciplinary records, etc.), and embed triggers. “If X, escalate to Y.” Not as a cultural norm. As a routing rule.

This is the difference between “we trust people” and “we trust the system we’ve designed people to operate within”.
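To make “routing rule, not cultural norm” concrete, here is a rough sketch of decision rights encoded as explicit triggers, written in Python purely for illustration. The decision types, thresholds, and roles are hypothetical placeholders, not a recommended policy.

  # Illustrative sketch only: decision rights expressed as routing rules rather
  # than informal norms. Decision types, thresholds, and roles are hypothetical.
  ESCALATION_RULES = {
      # decision_type: (trigger, escalate_to)
      "refund":            (lambda d: d["amount"] > 500,           "team_lead"),
      "contract_language": (lambda d: d["deviates_from_template"], "legal"),
      "pricing_exception": (lambda d: d["discount_pct"] > 10,      "commercial_director"),
  }

  def route(decision_type, details):
      """Return who must approve before a draft becomes a commitment."""
      trigger, escalate_to = ESCALATION_RULES[decision_type]
      return escalate_to if trigger(details) else "drafter"

  # The AI-assisted email promising a refund is no longer "just a draft":
  print(route("refund", {"amount": 750}))  # -> team_lead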

2) What counts as sufficient evidence, and who verifies it?

AI increases the risk of evidence laundering: plausible claims appear in polished prose, and the organisation treats them as if they were checked. This isn’t always hallucination. It can be selective summarisation, missing base rates, or silent assumptions.

Operating model fix: specify evidence requirements for high-impact decisions. What sources are acceptable? What must be verified? What uncertainty must be disclosed? Who signs off?

This is governance, not pedantry. In the language of heuristics and biases research, you’re managing systemic error pathways rather than trying to “train bias out” of individuals.
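If you want to see what “specify evidence requirements” looks like when it is written down rather than assumed, a minimal sketch might be the following; the decision type, sources, and sign-off role are invented for illustration.

  # Illustrative sketch only: an evidence policy for one high-impact decision
  # type, recorded as explicit requirements rather than tacit expectation.
  from dataclasses import dataclass

  @dataclass
  class EvidencePolicy:
      decision_type: str
      acceptable_sources: list[str]   # where claims may legitimately come from
      must_be_verified: list[str]     # claims that need independent checking
      disclose_uncertainty: bool      # must the owner state what is unknown?
      sign_off: str                   # who accepts the evidence as sufficient

  credit_policy = EvidencePolicy(
      decision_type="credit_limit_increase",
      acceptable_sources=["core_banking_ledger", "verified_income_documents"],
      must_be_verified=["income_figures", "existing_exposure"],
      disclose_uncertainty=True,
      sign_off="credit_risk_officer",
  )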

3) How do we handle exceptions, not just the happy path?

AI is great on the happy path: standard queries, routine cases, predictable customers, typical students, normal demand patterns. Executive pain lives in exceptions: novelty, edge cases, moral ambiguity, regime shifts.

Naturalistic decision-making research is blunt here: experts don’t optimise; they recognise patterns and simulate consequences. AI can help by acting as a cognitive simulator (surfacing edge cases, second-order effects), but it can also drown teams in plausible alternatives with no grounding. 

Operating model fix: define exception-handling explicitly. What constitutes an exception? What is the escalation path? What is the “stop the line” rule? Where must a human deliberately re-frame the situation rather than accept the AI’s framing?

If you can’t state your exception model, your organisation is outsourcing it to the path of least resistance.
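As a sketch of what “stating your exception model” can mean in practice, here is one hypothetical version, with made-up signals and thresholds; the substance is that the test and the stop-the-line rule exist in writing.

  # Illustrative sketch only: an explicit exception model with a
  # "stop the line" rule. Signals and thresholds are hypothetical.
  def is_exception(case: dict) -> bool:
      """Cases that must leave the happy path and be re-framed by a human."""
      return (
          case.get("novel_pattern", False)            # nothing comparable in history
          or case.get("model_confidence", 1.0) < 0.6  # AI outside its competence
          or case.get("ethical_flag", False)          # moral ambiguity raised
      )

  def handle(case: dict) -> str:
      if is_exception(case):
          return "stop the line: escalate to the named owner and re-frame; do not accept the AI's framing"
      return "routine: proceed through the standard workflow"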

4) What gets logged so we can learn, calibrate, and defend?

AI-mediated work increases epistemic risk: uncertainty about what was known, what was assumed, what was generated, and who endorsed it. Without logs, you can’t learn and you can’t assign accountability. You get blame, not improvement.

Operating model fix: implement judgement traceability proportional to risk. Not “log everything” (you’ll create noise and resentment), but log the rationale for consequential decisions and the provenance of key claims. At minimum: decision owner, inputs, assumptions, dissent surfaced, and the reason for the final call.

This aligns with the practical wisdom point: responsibility remains human, and executives become responsible for the cognitive environment that produced the decision, not just the final act. 
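If it helps to picture the “minimum log” above, here is one hypothetical shape for such a record, with invented contents; the point is the fields, not the tooling.

  # Illustrative sketch only: the minimum log for a consequential decision,
  # capturing rationale and provenance rather than "logging everything".
  from dataclasses import dataclass, field
  from datetime import datetime, timezone

  @dataclass
  class DecisionRecord:
      decision_owner: str          # the named human accountable for the call
      inputs: list[str]            # key sources, including AI-generated material
      assumptions: list[str]       # what was taken on trust, and why
      dissent_surfaced: list[str]  # counter-arguments raised before the call
      rationale: str               # the reason for the final call
      decided_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

  record = DecisionRecord(
      decision_owner="operations_director",
      inputs=["AI summary of supplier performance", "Q3 delivery data"],
      assumptions=["Supplier capacity figures are current"],
      dissent_surfaced=["Procurement flagged single-source risk"],
      rationale="Renew for 12 months with a quarterly review trigger",
  )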

The operating model reframes what “AI readiness” really means

A lot of AI readiness talk focuses on literacy, tooling, and prompt competence. Those are useful, but they’re downstream. A technically fluent organisation can still be judgement-poor if its operating model creates accidental commitments and untraceable responsibility.

From the operating model lens, AI readiness is closer to judgement readiness:

  • Can we name who decides what?
  • Can we tell “draft” from “decision” in our workflows?
  • Can we show what evidence we relied on?
  • Can we prove we considered counter-arguments?
  • Can we detect when the model is outside its domain of competence?
  • Can we learn when we were wrong?

If those answers are fuzzy, AI will amplify the fuzziness, then external reality will punish it.

A concrete way to start: map “decision surfaces”

One practical move is to identify your organisation’s decision surfaces: points where text, analysis, or recommendations cross a boundary into an external commitment or irreversible internal action.

Examples:

  • Customer emails and proposals
  • Contract amendments
  • HR notes and performance documentation
  • Clinical pathways and risk stratification thresholds
  • Financial approvals and credit decisions
  • Public statements and investor communications

For each surface, ask:

  • What decisions are implicitly made here?
  • Who owns them?
  • What evidence is required?
  • What’s the exception rule?
  • What is the minimum log?
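Written down, the answers can be as plain as the register sketched below; the two surfaces and their entries are hypothetical placeholders for whatever your own mapping produces.

  # Illustrative sketch only: a decision-surface register answering the five
  # questions above for each surface. Entries are hypothetical placeholders.
  DECISION_SURFACES = {
      "customer_proposal": {
          "implicit_decisions": ["price", "delivery date", "scope commitments"],
          "owner": "account_director",
          "evidence_required": ["cost model", "capacity check"],
          "exception_rule": "non-standard terms -> commercial review",
          "minimum_log": ["owner", "inputs", "assumptions", "rationale"],
      },
      "hr_performance_note": {
          "implicit_decisions": ["formal record of underperformance"],
          "owner": "line_manager",
          "evidence_required": ["documented examples", "dates of prior feedback"],
          "exception_rule": "possible disciplinary outcome -> HR partner",
          "minimum_log": ["owner", "inputs", "dissent_surfaced", "rationale"],
      },
  }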

This is boring work. It is also exactly the kind of boring work that prevents spectacularly expensive failures.

The core claim

In AI-mediated organisations, judgement quality is not primarily a training issue. It is an operating model issue.

AI makes drafting cheap and fast. That’s the feature. The cost is that the organisation loses the natural friction that used to protect decision points. If you don’t replace that friction deliberately with decision rights, evidence rules, exception handling, and traceability, you won’t get “augmented intelligence”. You’ll get faster confusion with nicer wording.

Executive value, then, becomes the governance of this machinery: designing an operating model where judgement remains visible, owned, and defensible even when cognition is distributed across humans and machines.