Richard Hill

Judgement for AI-mediated work

Category: Adoption

  • When Trustees Must Push Back on AI Leadership Narratives

    Much of the current leadership commentary on AI is directionally sound. It emphasises that AI should be advisory rather than authoritative, that leaders should provide context rather than instructions, and that human responsibility does not dissolve simply because drafting has been automated. These are sensible claims. They are also, from the perspective of trustees and non-executive directors, insufficient.

    Boards are not responsible for endorsing the moral centre of a narrative. They are responsible for the operating conditions under which that narrative remains true. In AI-mediated work, the failure mode is rarely dramatic at first. It is procedural and quiet. When text becomes cheap, the organisation can move decision-making into drafting without noticing. A polished paragraph begins to function as a commitment. An AI-generated summary starts to substitute for a review of evidence. A recommendation becomes a default because it is convenient, not because it is justified. In this environment, the distinction between “draft” and “decision” becomes a governance boundary, not a matter of good personal habits.

    This is where trustees should press. Not because the narrative is wrong, but because it is incomplete in precisely the areas that define governance: decision rights, risk ownership, control design, and assurance. The core trustee question is always the same: what are we relying on here, and how do we know it is working?

    Guardrails are asserted, not operationalised

    Leadership writing often calls for “guardrails” in the form of values and decision rights. The difficulty is that guardrails are not a statement of intent. They are operational boundaries that hold under pressure. Values are necessary but not sufficient. Decision rights are decisive only if they are made explicit in workflows, not merely implied by an organisational chart.

    Trustees should treat “guardrails” as a claim that requires demonstration. Which decisions are currently being influenced by AI, and where are those decisions recorded as such? Who is the accountable owner for each? Where is the transition from drafting to deciding made explicit? What triggers escalation, and to whom? How are exceptions handled, and how is exception-handling reviewed?

    Without concrete answers, the organisation has not introduced guardrails. It has introduced language about guardrails.

    In practice, operationalisation begins with a decision inventory: a finite list of recurring decisions in which AI is used or will soon be used. The list is typically shorter than people assume, especially if it is constrained to decisions that create obligations, exposures, or material impacts. It then requires a decision rights map that specifies who drafts, who checks, who decides, who can override, and who must be informed. This is not procedural theatre. It is the minimum structure required to prevent accidental delegation, which is the characteristic governance hazard of AI-assisted drafting.
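    A decision rights map need not be elaborate. As a minimal sketch (every decision name and role below is hypothetical, invented purely for illustration), it can be a structured list that a script can check for the one failure that matters most: a decision with no named owner.

```python
# Hypothetical sketch of a decision rights map: each recurring decision
# names who drafts, who checks, who decides, who can override, and who
# must be informed. All names here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DecisionRight:
    decision: str         # the recurring decision being governed
    drafts: str           # who may produce the draft (AI-assisted or not)
    checks: str           # who verifies evidence before commitment
    decides: str          # the accountable owner
    overrides: str        # who can override, and is logged when they do
    inform: list[str] = field(default_factory=list)  # who must be told

INVENTORY = [
    DecisionRight(
        decision="customer refund above threshold",
        drafts="support agent",
        checks="team lead",
        decides="support manager",
        overrides="head of operations",
        inform=["finance"],
    ),
]

def accidental_delegations(inventory):
    """Flag decisions with no named decider: the characteristic hazard."""
    return [d.decision for d in inventory if not d.decides.strip()]
```

    The check at the end is the point of the exercise: accidental delegation shows up as an empty "decides" field long before it shows up as an incident.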

    Trustees should also focus attention where governance actually lives: in exceptions. Routine decisions tend to look coherent even in poorly governed systems. It is at the edges (unusual cases, time pressure, emotional friction, incomplete information) that accountability becomes unclear. If the organisation cannot explain how exceptions are labelled, owned, rationalised, and reviewed, then decision rights remain largely nominal.

    Risk is discussed implicitly, but not mapped into controls and assurance

    A second weakness in leadership narratives is the absence of a risk taxonomy and the corresponding absence of a control map and assurance story. Responsibility is asserted, but the organisation is not described as a system of risks, controls, and tests. Trustees cannot discharge their responsibilities in that register.

    AI-mediated work tends to change risk profiles in consistent ways. Confidentiality risk rises because staff can unknowingly disclose sensitive information through prompts or outputs. IP risk rises because commercially valuable content can be reproduced, shared, or stored in ways that were not previously plausible at scale. Regulatory and legal exposure can increase because outputs can contain ungrounded assertions, discriminatory language, or defamatory implications, particularly in high-stakes contexts such as HR, safeguarding, compliance, and client communications. Auditability deteriorates if the organisation cannot reconstruct who approved what, on what evidence, and under which conditions. Operational dependency grows as tools become embedded before they are treated as dependencies with resilience requirements. Model behaviour can drift as systems update and workflows evolve, undermining implicit assumptions about reliability.

    The trustee posture here should be clear. The question is not whether management has considered these risks in principle. The question is whether risks are mapped to concrete use cases and converted into preventive and detective controls with named owners, and whether there is an assurance mechanism capable of validating that the controls are actually functioning.

    An organisation does not need an elaborate governance programme to start. It does need to demonstrate basic discipline. For each high-impact AI-assisted use case, management should be able to state what can go wrong, what controls prevent it, what controls detect it, who owns those controls, and how they are tested. Trustees should not accept “we are careful” as an assurance story. Nor should they accept “nothing has gone wrong so far” as evidence of safety. Absence of detected failure is not evidence of a robust control environment.
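    That basic discipline can be made concrete. The sketch below is illustrative only (the use case, risk, controls, and owner are all invented examples, not prescriptions): a control map is simply a structure that pairs each risk with a preventive control, a detective control, a named owner, and a test, and the assurance question reduces to which entries are missing the last two.

```python
# Illustrative control map for one AI-assisted use case. Every risk,
# control, owner, and test named here is a made-up example.
CONTROL_MAP = {
    "use_case": "AI-drafted client communications",
    "risks": {
        "ungrounded assertion reaches a client": {
            "preventive": "human attests to reviewing underlying evidence",
            "detective": "weekly sample audit of sent communications",
            "owner": "head of client services",
            "test": "quarterly check that attestations match audit findings",
        },
    },
}

def unassured_risks(control_map):
    """The assurance question: which risks lack a named owner or a test?"""
    return [risk for risk, controls in control_map["risks"].items()
            if not controls.get("owner") or not controls.get("test")]
```

    "We are careful" has no entry in a structure like this; a control without an owner or a test surfaces immediately.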

    Judgement is treated as decisiveness, rather than decision mechanics

    A third weakness is the tendency to equate judgement with decisiveness and accountability. This association is understandable. Many organisations do need decisiveness. However, in AI-assisted contexts, decisiveness can become a mechanism for fast error. The very quality that is celebrated in leadership narratives can be amplified into a liability if it is not constrained by decision mechanics.

    Judgement, in governance terms, is not a temperament. It is a design property of decision-making under uncertainty. It depends on evidence thresholds, dissent channels, explicit sign-off points, escalation rules, and learning loops that produce system change rather than mere reflection. It depends on distinguishing reversible decisions from irreversible ones, and on ensuring that the organisation does not mistake persuasive language for justified commitment.

    AI increases the plausibility and fluency of drafts. That is precisely why trustees should insist that the organisation strengthens its decision mechanics at the points where commitment is made. If the organisation cannot identify those points, or cannot describe the evidence discipline that governs them, then it is likely that decision-making is already drifting into drafting.

    The “only humans can” framing obscures the control problem

    A fourth weakness is the rhetorical neatness of claims about what AI cannot do: it cannot set aspirations, cannot create truly new ideas, cannot take responsibility. These claims may be defensible philosophically. Trustees should not anchor on them operationally.

    The practical governance question is not whether AI can do something in principle. It is how AI is allowed to shape organisational decisions, and how the organisation prevents unacknowledged delegation. AI can influence organisational direction without possessing values. It can shape agendas by surfacing certain issues and suppressing others. It can shape options by generating some alternatives more readily than others. It can shape framing by presenting trade-offs in ways that nudge preferences. These are not abstract concerns. They are mechanisms through which AI can affect judgement.

    The trustee implication is straightforward. Governance cannot rely on metaphysical reassurance. It must rely on boundary design. Where may AI propose options? Where may it draft language? Where may it summarise evidence? Where is it prohibited from making recommendations without human verification? Where must a human explicitly attest that they reviewed underlying evidence rather than simply approving a draft?

    In an AI-mediated environment, governance requires friction at commitment points. Comforting narratives reduce friction. Trustees should be explicit about this trade-off, and resist the temptation to treat human exceptionalism as a substitute for operational control.

    Skills-based hiring is not a governance improvement unless validity is demonstrated

    Finally, the “paper ceiling” point (the claim that organisations should reduce reliance on credentials and adopt skills-based hiring) is socially important and potentially valuable. It is not, in itself, a governance improvement unless it is treated as a selection system that must be validated.

    Trustees should ask a simple question: what evidence shows that the proposed method predicts performance and reduces bias? Without validation, an organisation can replace one unfair filter with another, and make it harder to detect because the new filter is presented as progressive and modern.

    Audition-style selection can introduce its own biases through unequal access to preparation time, familiarity with the cultural norms of performance, and variability in evaluation. It can become inconsistent unless inter-rater reliability is tested and the audition tasks are designed and reviewed with the same seriousness as assessment in education. AI complicates this further because candidates can use AI in preparation or during the audition itself. If AI use is permitted, what competencies are being tested? If it is prohibited, how is enforcement designed without introducing new inequities? These are governance questions because they concern predictability, fairness, defensibility, and organisational reputation.

    The trustee move is to translate narrative into mechanism

    Across these five weaknesses, a single pattern recurs. Leadership narratives describe intent. Trustees must insist on mechanism. That means translating the language of judgement into decision rights and decision logs, converting risk awareness into control mapping, and converting learning culture into a cadence that produces measurable change.

    The central governance risk in AI-mediated work is not “bad AI” in the abstract. It is the gradual relocation of commitment into drafting, and the diffusion of accountability that follows. Trustees should therefore ask management to show where the boundaries are, who owns them, and how they are tested. If those questions are answered with clarity, the leadership narrative becomes more than a narrative. It becomes a governable operating model.

    That, ultimately, is what boards require: an organisation that can reconstruct who decided what, on what basis, with what safeguards against accidental delegation, and with what mechanisms for correction when judgement proves wrong. In an environment where language is cheap, making judgement visible is not a stylistic preference. It is a governance necessity.

     

  • Up tempo work

    Generative AI has quietly changed the tempo of work. Not in the headline places. In the boring places. Email, agendas, briefing notes, drafts of policies, draft replies to customers, draft performance notes. Stuff that used to take just enough effort to force a pause. 

    Now the pause is optional. That sounds like a productivity win. It is, sometimes. It’s also a governance problem wearing a productivity moustache.

    Because when drafting becomes effortless, organisations start committing to things without noticing. The thing that used to be “a draft” becomes “the decision”, because it reads cleanly and moves fast. 

    The claim

    The biggest leadership risk in the AI era is not that AI will make leaders obsolete. It’s that AI will make commitment too cheap, and organisations will confuse fluent drafting with actual decision-making.

    What would change my mind? Evidence that teams using AI heavily can consistently show (a) clear decision rights, (b) reliable escalation paths for exceptions, and (c) an audit trail that explains who owned what when it mattered, without slowing everything to a crawl. Not a policy. Actual practice.

    What I think is going on

    The current leadership narrative goes something like: AI can draft, but it can’t lead. Leaders must provide context, set guardrails, build trust, show judgement, and so on. 

    All true, in the abstract. But it misses the mechanics.

    AI doesn’t “replace leadership”. It changes the surface area of leadership. It pushes leadership into thousands of micro-moments, distributed across the organisation, where people are generating text and making commitments at speed. And those micro-moments are exactly where decision rights usually get fuzzy.

    So the right question isn’t “Can AI lead?” The question is: Where are decisions being made by accident, because text became cheap?

    The part people get wrong

    A lot of writing about AI leadership leans on “guardrails (clear values and decision rights)” as if saying it makes it real. 

    But guardrails are not values on a slide. Guardrails are a working control system:

    • which decisions exist (not “be responsible”, actual decisions)
    • who owns them
    • what counts as an exception
    • what must be escalated
    • what evidence is required before committing
    • how you find out when people bypass the route

    If you can’t answer those in plain English, the “guardrails” are vibes. Vibes do not survive contact with the inbox.

    A cleaner mental model

    McKinsey frames a shift from “command” to “context”. I mostly agree, but I’d sharpen it:

    Leadership is moving from “deciding” to “designing decision conditions”.

    That means your job is to design the conditions under which other people, often using AI, can make decent calls under time pressure, without turning the organisation into a liability farm.

    Concrete example: customer support.

    AI helps a support agent draft a reply in 30 seconds. The model is good at sounding helpful. It will often over-promise because over-promising sounds helpful. A human who’s tired, new, or keen to close tickets can hit send.

    Now you’ve got an implied contract. Delivery teams inherit a mess. Finance gets dragged into refunds. Nobody can say whether this was an authorised exception or an accidental commitment.

    The fix isn’t “tell agents to be careful”. The fix is to explicitly separate:

    • drafting authority (anyone can draft),
    • commitment authority (only named roles can approve terms, money, timelines, exceptions),
    • release control (what must be checked before “send”, and who checks it).

    That’s governance. Not glamorous. Very effective.

    Judgement is not a personality trait

    McKinsey says leaders must demonstrate judgement, aligning choices to values, because AI is advisory not authoritative. 

    Yes. But “judgement” as a leadership trait is too squishy to operate at scale. I used to treat judgement as something you either have or you don’t. Now I think judgement is a system property as much as a personal one.

    Judgement shows up in:

    • what evidence is required before acting
    • whether uncertainty is made visible or papered over
    • how exceptions are handled
    • whether reversals are allowed without punishment
    • whether you can trace a decision back to a person, a rationale, and a timestamp

    If your operating environment rewards speed and punishes hesitation, you’ll get confident nonsense. AI just helps you generate it faster.

    Creativity, but make it accountable

    McKinsey argues leaders must design for nonlinear outcomes, not “20 percent better” but “10 times better”, and that humans must frame the problem, invite dissent, and hold the creative line. 

    Again, broadly right. Here’s the catch: AI makes it easy to produce ten options, which can create the illusion of creativity while reducing actual thinking. You get a pile of plausible outputs and nobody wants to be the boring person who asks, “What are we optimising for?”

    So I treat creativity work with AI like this:

    1. Write the brief like a contract. What is in scope, out of scope, what constraints are real, what success looks like, what failure looks like.
    2. Force one hard trade-off. Speed vs accuracy. Cost vs user harm. Personalisation vs privacy. Pick one. Make it explicit.
    3. Require a dissent paragraph. Not “risks”, a genuine counter-argument. If the best dissent you can write is weak, you probably don’t understand the space.
    4. Name the decision owner. The person who is on the hook when the shiny idea breaks.

    That’s how you get novelty without random motion.

    Where this breaks

    A few objections are fair.

    “This is too heavy for small teams.”

    If you try to build a full enterprise control framework, yes. But decision rights can be lightweight. A one-page “commitment map” is often enough to stop the worst mistakes.

    “We move too fast to add process.”

    You’re already paying for process. You’re just paying after the fact, in rework, customer fallout, HR pain, and fire drills. The question is where you want to spend your admin budget: before or after damage.

    “But leaders do need softer skills, trust, empathy, learning culture.”

    Agreed. The McKinsey piece makes a strong case for learning loops like premortems and after-action reviews. I’m not arguing against the human stuff. I’m arguing that the human stuff fails without mechanics. Trust doesn’t scale by declaration. It scales when people can predict how decisions get made and how exceptions are handled.

    “AI tools can be configured to prevent this.”

    Sometimes. But configuration is still a governance choice. Who decides the rules? Who can override? What gets logged? Same problem, new wrapper.

    What I’d do if I were responsible

    • Map “commitment moments”. Where can someone, with a draft, create an obligation? Email, proposals, HR notes, customer replies, invoices, policy statements, procurement requests.
    • Define three decision classes.
      1. routine, can be auto-approved
      2. exception, needs named sign-off
      3. high-stakes, needs a second human and a record
    • Write a “draft vs decision” rule into workflows. Not training slides. Actual steps. If it matters, it gets reviewed.
    • Require minimal decision logs for exceptions. Two minutes, not a dissertation: what changed, why, who approved, what evidence, what you’ll check later.
    • Run one premortem per month on an AI-assisted process. “Assume this went wrong. How?” Then fix the top two failure modes. 
    • Protect leadership attention for inflection points. McKinsey cites the example of a CEO keeping 20 percent of the calendar empty. The principle is right: protect time for the moments where judgement actually sits.
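    The decision classes and the minimal log above can be sketched in a few lines. The classification inputs and log fields are illustrative assumptions, not a standard; the two-minute log is deliberately short so people actually fill it in.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Sketch of the three decision classes. The inputs used to classify
# (creates_obligation, within_policy) are illustrative assumptions.
ROUTINE, EXCEPTION, HIGH_STAKES = "routine", "exception", "high-stakes"

def classify(creates_obligation: bool, within_policy: bool) -> str:
    if not creates_obligation:
        return ROUTINE        # can be auto-approved
    if within_policy:
        return EXCEPTION      # needs named sign-off
    return HIGH_STAKES        # needs a second human and a record

@dataclass
class DecisionLog:
    """Two minutes, not a dissertation."""
    what_changed: str
    why: str
    approved_by: str
    evidence: str
    check_later: str   # what you'll verify after the fact
    when: str = ""

    def __post_init__(self):
        if not self.when:
            self.when = datetime.now(timezone.utc).isoformat()
```

    The value is not the code; it is that “who approved what, on what evidence, and when” becomes a record you can reconstruct rather than an argument you have later.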

    Close

    I’m watching one thing more than anything else: whether organisations can keep the speed benefits of AI while making it harder to commit to things by accident.

    Because that’s the new baseline. Drafting is cheap. Accountability is not. If you don’t design for that, your “AI transformation” will mostly be an expensive way to manufacture confident errors faster.