Richard Hill

Judgement for AI-mediated work

Category: Judgement

  • Drafting Is Not Deciding

    When text becomes cheap, organisations start making decisions by accident.

    That is the central governance risk of generative AI in everyday work. Not hallucinations. Not “bias” in the abstract. The more common failure mode is quieter: fluent drafts begin to function like commitments. The pause where judgement normally sits is compressed, displaced, or skipped entirely.

    This is not primarily a technology problem. It is an operating problem.

    Judgement is a decision, not a view

    Judgement is what you do when there is no clean rulebook, the evidence is partial, and the consequences matter.

    It has four features.

    • Uncertainty is real, not cosmetic
    • Trade-offs are unavoidable
    • Reversibility is limited
    • Accountability attaches to someone, whether you like it or not

    That is why judgement cannot be reduced to “good writing” or “better thinking”. It is organisational work that needs roles, boundaries, artefacts, and review.

    If you want to find judgement in the wild, look for moments where the organisation cannot simply “follow the process”, and somebody must own a choice.

    The draft–decision collapse

    Modern organisations already run on documents. Emails, policies, proposals, HR notes, risk assessments, customer communications. These texts do not merely record decisions. They often trigger action as if a decision has been made.

    Generative AI accelerates this because it produces text that is fast, fluent, and socially credible. The fluency is the trap. A draft that reads well feels complete. That feeling is not evidence.

    The result is what I will call the draft–decision collapse. The boundary between “we are drafting” and “we are deciding” erodes. A piece of text is circulated, and the organisation behaves as if commitment has occurred.

    You can see this in small, expensive moments.

    • A customer email includes a deadline, a refund, or an exception that nobody consciously authorised.
    • HR notes drafted for “professional tone” drift into inference about motive, and later become evidence under subject access or dispute.
    • A policy update makes a subtle shift in risk posture because the wording sounds reasonable.
    • A procurement response contains unverified claims that then become part of the audit trail.
    • An internal briefing becomes “the plan” because it looks coherent, even though it is assembled from fragments.

    The AI did not decide. The organisation acted as if it had.

    Speed now, rework later

    The immediate benefit is throughput. The downstream cost arrives later, and it is distributed across teams.

    Typical outcomes include:

    • rework, because the exception is discovered after it has been implied in writing
    • stress, because accountability lands on whoever clicked send
    • reputational exposure, because polished language reads as official position
    • legal and compliance risk, because records contain unauthorised commitments or loaded statements
    • erosion of trust, because nobody can trace who approved what

    This is why “productivity” can become a misleading metric in AI-assisted environments. Output rises, but coherence can fall. You get more text and less clarity about what is true, what is agreed, and who owns the consequences.

    Decision rights are the core unit of governance

    A lot of AI governance is principle-heavy and operationally thin. Principles matter, but organisations fail at the level of decision rights.

    Decision rights answer questions such as:

    • Who can commit money, deadlines, or service levels?
    • Who can approve exceptions to policy?
    • Who can accept risk on behalf of the organisation?
    • Who can publish statements that external parties can treat as official?
    • Who is accountable when this goes wrong?

    Generative AI blurs decision rights because it enables decision-shaped text to be produced by people who do not hold authority to commit. The remedy is not panic, and it is not simply training. It is to rebuild the boundary between drafting and deciding into the workflow.

    Evidence discipline, applied

    Judgement quality is strongly linked to evidence discipline. What counts as “enough” evidence to justify commitment?

    AI-assisted drafting increases the risk that evidential standards become implicit and inconsistent. The text looks finished before the claim is justified.

    A practical way to restore evidence discipline is to make standards explicit in categories:

    • Must verify: facts, figures, dates, prices, terms, references, compliance claims, clinical or legal assertions.
    • Must attribute: sources, datasets, assumptions, and who supplied the information.
    • Must log: rationale, uncertainty, trade-offs, dissent, and alternatives considered.
    • Must escalate: exceptions, novel risks, reputational exposure, and anything difficult to reverse.
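These categories can be made machine-checkable rather than aspirational. The sketch below encodes them as a small review config that routes a draft's claims to the disciplines they trigger; the category names follow the list above, but the claim tags and function are illustrative assumptions, not a prescribed schema.

```python
# Illustrative sketch: explicit evidence-discipline categories encoded as a
# reviewable config, so a draft's claims can be routed before anything is sent.
# Claim tags are hypothetical examples, not an exhaustive taxonomy.
EVIDENCE_DISCIPLINE = {
    "must_verify": {"fact", "figure", "date", "price", "term",
                    "reference", "compliance_claim", "clinical_assertion"},
    "must_attribute": {"source", "dataset", "assumption", "supplied_information"},
    "must_log": {"rationale", "uncertainty", "trade_off", "dissent", "alternative"},
    "must_escalate": {"exception", "novel_risk", "reputational_exposure", "irreversible"},
}

def required_disciplines(claim_tags):
    """Return the evidence disciplines triggered by a claim's tags."""
    return sorted(
        category
        for category, tags in EVIDENCE_DISCIPLINE.items()
        if tags & set(claim_tags)  # non-empty intersection means the rule applies
    )

# A drafted price resting on an unstated assumption triggers two disciplines:
print(required_disciplines({"price", "assumption"}))
# ['must_attribute', 'must_verify']
```

The point of the sketch is that the standard is explicit: a claim either triggers a discipline or it does not, which is the opposite of standards that remain implicit and inconsistent.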

    This is what “epistemic governance” looks like when you take it out of the seminar room and put it into operations. It is governance over how claims are formed, justified, and validated inside the organisation.

    Without it, the organisation drifts into a substitution: it treats fluent text as evidence of adequate reasoning.

    Judgement as an operating model

    If judgement is organisational work, it belongs in the operating model. Not in a slide deck about values.

    A workable operating model for judgement has a few visible components.

    Tempo

    Where do decisions actually happen?

    • Which meetings and checkpoints function as decision points in practice?
    • Where are exceptions approved, and by whom?
    • What is the minimum pause required before a draft becomes a commitment?

    Artefacts

    What gets produced when a decision is made?

    If the artefact is only a polished email, then decisions will be enacted without rationale, evidential trace, or clear ownership. Stronger artefacts include short decision notes, sign-off records, risk acceptances, and decision briefs that make the basis of commitment explicit.

    Controls

    Controls should be selective and proportionate. They should sit where irreversible cost lives.

    Examples include pre-send review for customer communications with financial implications, second-person review for HR notes, approval gates for policy changes, and escalation triggers when language shifts from exploratory to committing.

    The goal is not bureaucracy. The goal is to prevent unauthorised commitments wearing the disguise of professional prose.

    Feedback and learning

    Judgement improves when it is reviewed.

    • Were the assumptions warranted at the time?
    • Did the decision produce the intended effect?
    • What was missed or mis-weighted?
    • Should evidence thresholds change?

    Without review, organisations do not learn. They repeat the same failure patterns at higher speed.

    Make judgement visible

    One of the most effective interventions is simply to make judgement visible and inspectable.

    A judgement log is a lightweight record of:

    • the decision being made
    • options considered
    • evidence available at the time
    • uncertainties and risks
    • who owned the decision
    • what would change the decision
    • when it will be reviewed
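The fields above amount to a simple record type. A minimal sketch, assuming nothing beyond the list itself (the field names and review check are illustrative, not a prescribed schema):

```python
# Illustrative judgement-log entry as a structured record.
# Field names mirror the list above; all naming is an assumption.
from dataclasses import dataclass
from datetime import date

@dataclass
class JudgementLogEntry:
    decision: str                     # the decision being made
    owner: str                        # who owned the decision
    options_considered: list[str]     # options considered
    evidence_available: list[str]     # evidence available at the time
    uncertainties: list[str]          # uncertainties and risks
    reversal_conditions: list[str]    # what would change the decision
    review_date: date                 # when it will be reviewed

    def is_due_for_review(self, today: date) -> bool:
        """A log entry that never reaches review is just a formatted opinion."""
        return today >= self.review_date
```

The structure forces the explicitness the text describes: an entry cannot be created without naming an owner, the evidence relied on, and a review date.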

    Used properly, this improves decision-making in the moment by forcing explicitness. It also improves learning later by reducing hindsight reconstruction.

    There is an obvious failure mode. Logs can become performative, post-hoc justification. The difference is follow-through. A judgement log without review is just a nicely formatted opinion.

So the practice needs a review tempo: scheduled points where judgements are confirmed, refined, or reversed, with explicit notes on what was learned.

    The point

    The risk is not bad AI. It is invisible decisions.

    When drafting becomes effortless, organisations start committing to things without noticing, and then cannot trace who approved what. The response is not to ban tools or publish vague principles. It is to design operations so that drafting and deciding are separated, decision rights are explicit, evidence standards are operational, and judgements are recorded and reviewed.

    Text is getting cheaper. Accountability is not.

     

  • When Trustees Must Push Back on AI Leadership Narratives

    Much of the current leadership commentary on AI is directionally sound. It emphasises that AI should be advisory rather than authoritative, that leaders should provide context rather than instructions, and that human responsibility does not dissolve simply because drafting has been automated. These are sensible claims. They are also, from the perspective of trustees and non-executive directors, insufficient.

    Boards are not responsible for endorsing the moral centre of a narrative. They are responsible for the operating conditions under which that narrative remains true. In AI-mediated work, the failure mode is rarely dramatic at first. It is procedural and quiet. When text becomes cheap, the organisation can move decision-making into drafting without noticing. A polished paragraph begins to function as a commitment. An AI-generated summary starts to substitute for a review of evidence. A recommendation becomes a default because it is convenient, not because it is justified. In this environment, the distinction between “draft” and “decision” becomes a governance boundary, not a matter of good personal habits.

    This is where trustees should press. Not because the narrative is wrong, but because it is incomplete in precisely the areas that define governance: decision rights, risk ownership, control design, and assurance. The core trustee question is always the same: what are we relying on here, and how do we know it is working?

    Guardrails are asserted, not operationalised

    Leadership writing often calls for “guardrails” in the form of values and decision rights. The difficulty is that guardrails are not a statement of intent. They are operational boundaries that hold under pressure. Values are necessary but not sufficient. Decision rights are decisive only if they are made explicit in workflows, not merely implied by an organisational chart.

    Trustees should treat “guardrails” as a claim that requires demonstration. Which decisions are currently being influenced by AI, and where are those decisions recorded as such? Who is the accountable owner for each? Where is the transition from drafting to deciding made explicit? What triggers escalation, and to whom? How are exceptions handled, and how is exception-handling reviewed?

    Without concrete answers, the organisation has not introduced guardrails. It has introduced language about guardrails.

    In practice, operationalisation begins with a decision inventory: a finite list of recurring decisions in which AI is used or will soon be used. The list is typically shorter than people assume, especially if it is constrained to decisions that create obligations, exposures, or material impacts. It then requires a decision rights map that specifies who drafts, who checks, who decides, who can override, and who must be informed. This is not procedural theatre. It is the minimum structure required to prevent accidental delegation, which is the characteristic governance hazard of AI-assisted drafting.

Trustees should also focus attention where governance actually lives: in exceptions. Routine decisions tend to look coherent even in poorly governed systems. It is at the edges (unusual cases, time pressure, emotional friction, incomplete information) that accountability becomes unclear. If the organisation cannot explain how exceptions are labelled, owned, rationalised, and reviewed, then decision rights remain largely nominal.

    Risk is discussed implicitly, but not mapped into controls and assurance

    A second weakness in leadership narratives is the absence of a risk taxonomy and the corresponding absence of a control map and assurance story. Responsibility is asserted, but the organisation is not described as a system of risks, controls, and tests. Trustees cannot discharge their responsibilities in that register.

    AI-mediated work tends to change risk profiles in consistent ways. Confidentiality risk rises because staff can unknowingly disclose sensitive information through prompts or outputs. IP risk rises because commercially valuable content can be reproduced, shared, or stored in ways that were not previously plausible at scale. Regulatory and legal exposure can increase because outputs can contain ungrounded assertions, discriminatory language, or defamatory implications, particularly in high-stakes contexts such as HR, safeguarding, compliance, and client communications. Auditability deteriorates if the organisation cannot reconstruct who approved what, on what evidence, and under which conditions. Operational dependency grows as tools become embedded before they are treated as dependencies with resilience requirements. Model behaviour can drift as systems update and workflows evolve, undermining implicit assumptions about reliability.

    The trustee posture here should be clear. The question is not whether management has considered these risks in principle. The question is whether risks are mapped to concrete use cases and converted into preventive and detective controls with named owners, and whether there is an assurance mechanism capable of validating that the controls are actually functioning.

    An organisation does not need an elaborate governance programme to start. It does need to demonstrate basic discipline. For each high-impact AI-assisted use case, management should be able to state what can go wrong, what controls prevent it, what controls detect it, who owns those controls, and how they are tested. Trustees should not accept “we are careful” as an assurance story. Nor should they accept “nothing has gone wrong so far” as evidence of safety. Absence of detected failure is not evidence of a robust control environment.

    Judgement is treated as decisiveness, rather than decision mechanics

    A third weakness is the tendency to equate judgement with decisiveness and accountability. This association is understandable. Many organisations do need decisiveness. However, in AI-assisted contexts, decisiveness can become a mechanism for fast error. The very quality that is celebrated in leadership narratives can be amplified into a liability if it is not constrained by decision mechanics.

    Judgement, in governance terms, is not a temperament. It is a design property of decision-making under uncertainty. It depends on evidence thresholds, dissent channels, explicit sign-off points, escalation rules, and learning loops that produce system change rather than mere reflection. It depends on distinguishing reversible decisions from irreversible ones, and on ensuring that the organisation does not mistake persuasive language for justified commitment.

    AI increases the plausibility and fluency of drafts. That is precisely why trustees should insist that the organisation strengthens its decision mechanics at the points where commitment is made. If the organisation cannot identify those points, or cannot describe the evidence discipline that governs them, then it is likely that decision-making is already drifting into drafting.

The “only humans can” framing obscures the control problem

    A fourth weakness is the rhetorical neatness of claims about what AI cannot do: it cannot set aspirations, cannot create truly new ideas, cannot take responsibility. These claims may be defensible philosophically. Trustees should not anchor on them operationally.

    The practical governance question is not whether AI can do something in principle. It is how AI is allowed to shape organisational decisions, and how the organisation prevents unacknowledged delegation. AI can influence organisational direction without possessing values. It can shape agendas by surfacing certain issues and suppressing others. It can shape options by generating some alternatives more readily than others. It can shape framing by presenting trade-offs in ways that nudge preferences. These are not abstract concerns. They are mechanisms through which AI can affect judgement.

    The trustee implication is straightforward. Governance cannot rely on metaphysical reassurance. It must rely on boundary design. Where may AI propose options? Where may it draft language? Where may it summarise evidence? Where is it prohibited from making recommendations without human verification? Where must a human explicitly attest that they reviewed underlying evidence rather than simply approving a draft?

    In an AI-mediated environment, governance requires friction at commitment points. Comforting narratives reduce friction. Trustees should be explicit about this trade-off, and resist the temptation to treat human exceptionalism as a substitute for operational control.

    Skills-based hiring is not a governance improvement unless validity is demonstrated

Finally, the “paper ceiling” point (the claim that organisations should reduce reliance on credentials and adopt skills-based hiring) is socially important and potentially valuable. It is not, in itself, a governance improvement unless it is treated as a selection system that must be validated.

    Trustees should ask a simple question: what evidence shows that the proposed method predicts performance and reduces bias? Without validation, an organisation can replace one unfair filter with another, and make it harder to detect because the new filter is presented as progressive and modern.

    Audition-style selection can introduce its own biases through unequal access to preparation time, familiarity with the cultural norms of performance, and variability in evaluation. It can become inconsistent unless inter-rater reliability is tested and the audition tasks are designed and reviewed with the same seriousness as assessment in education. AI complicates this further because candidates can use AI in preparation or during the audition itself. If AI use is permitted, what competencies are being tested? If it is prohibited, how is enforcement designed without introducing new inequities? These are governance questions because they concern predictability, fairness, defensibility, and organisational reputation.

    The trustee move is to translate narrative into mechanism

    Across these five weaknesses, a single pattern recurs. Leadership narratives describe intent. Trustees must insist on mechanism. That means translating the language of judgement into decision rights and decision logs, converting risk awareness into control mapping, and converting learning culture into a cadence that produces measurable change.

    The central governance risk in AI-mediated work is not “bad AI” in the abstract. It is the gradual relocation of commitment into drafting, and the diffusion of accountability that follows. Trustees should therefore ask management to show where the boundaries are, who owns them, and how they are tested. If those questions are answered with clarity, the leadership narrative becomes more than a narrative. It becomes a governable operating model.

    That, ultimately, is what boards require: an organisation that can reconstruct who decided what, on what basis, with what safeguards against accidental delegation, and with what mechanisms for correction when judgement proves wrong. In an environment where language is cheap, making judgement visible is not a stylistic preference. It is a governance necessity.

     

  • Judgement as an Operating Model Problem in the Age of AI

    Most organisations treat judgement as a personal attribute. You hire “good people”, you develop leaders, you run training on critical thinking, you encourage “better decisions”. When something goes wrong, you look for a flawed individual judgement call.

    That framing is convenient, and often wrong.

    In AI-mediated work, judgement is less an individual virtue and more an operating model property. The quality of judgement you get is shaped by how work is organised: who has decision rights, how exceptions are handled, what evidence is required, what gets logged, what gets escalated, and what counts as “done”. AI doesn’t replace judgement. It changes how judgement failure happens, and it makes operating-model weaknesses visible faster and more painfully. 

    What the operating model lens reveals

    An operating model is the practical machinery of “how we run this place”: accountabilities, workflows, governance, and the information flows that coordinate action. The operating model lens is ruthless, because it assumes that outcomes are generated by systems, not by intentions.

    That matters because AI is now woven into the operating model in three ways:

    1. It accelerates throughput. Drafts, summaries, analyses, and responses appear instantly.
    2. It increases plausibility. Outputs sound coherent even when they are wrong, incomplete, or mis-scoped.
    3. It blurs the boundary between drafting and deciding. Text is no longer a bottleneck, so the natural pause where judgement used to sit gets squeezed out. 

    If your operating model assumed that drafting time created a moment for review, you just lost that safety feature. Not because anyone is reckless, but because the system no longer generates friction by default.

    Judgement failures are usually “process failures wearing a human mask”

    In post-mortems, organisations often say “we made a poor decision”. What they frequently mean is one of these:

    • We didn’t know who was allowed to decide.
    • We didn’t define what evidence was required.
    • We didn’t distinguish routine from exception.
    • We didn’t create a checkpoint where dissent could surface.
    • We didn’t log the rationale, so we can’t learn or defend it.
    • We didn’t notice a commitment had been made until the customer, regulator, or employee treated it as binding.

    Those are operating model failures. AI just makes them faster and easier to trigger.

    Classical judgement theory already hints at this. Under bounded rationality, decision quality depends on process appropriateness more than optimisation fantasies. Sensemaking emphasises that meaning is constructed socially and continuously, not discovered fully-formed. AI intensifies both realities: it participates in attention and framing, and it pushes organisations toward premature closure when outputs feel “good enough”. 

    Four operating model questions that decide your judgement quality

    If you want judgement to survive contact with ubiquitous AI, the operating model has to answer four questions explicitly, in writing, in workflows.

    1) Where do decision rights sit, and how are they triggered?

    The most common AI-era failure mode is decision-rights ambiguity. An AI-assisted email goes out promising a delivery date, a refund, or a concession. Nobody thought they were “making a decision”. They thought they were drafting.

    Operating model fix: define decision rights by decision type (refunds, contract language, pricing exceptions, patient risk thresholds, disciplinary records, etc.), and embed triggers. “If X, escalate to Y.” Not as a cultural norm. As a routing rule.

    This is the difference between “we trust people” and “we trust the system we’ve designed people to operate within”.

    2) What counts as sufficient evidence, and who verifies it?

    AI increases the risk of evidence laundering: plausible claims appear in polished prose, and the organisation treats them as if they were checked. This isn’t always hallucination. It can be selective summarisation, missing base rates, or silent assumptions.

    Operating model fix: specify evidence requirements for high-impact decisions. What sources are acceptable? What must be verified? What uncertainty must be disclosed? Who signs off?

    This is governance, not pedantry. In the terms of heuristics and biases research, you’re managing systemic error pathways rather than trying to “train bias out” of individuals. 

    3) How do we handle exceptions, not just the happy path?

    AI is great on the happy path: standard queries, routine cases, predictable customers, typical students, normal demand patterns. Executive pain lives in exceptions: novelty, edge cases, moral ambiguity, regime shifts.

    Naturalistic decision-making research is blunt here: experts don’t optimise; they recognise patterns and simulate consequences. AI can help by acting as a cognitive simulator (surfacing edge cases, second-order effects), but it can also drown teams in plausible alternatives with no grounding. 

    Operating model fix: define exception-handling explicitly. What constitutes an exception? What is the escalation path? What is the “stop the line” rule? Where must a human deliberately re-frame the situation rather than accept the AI’s framing?

    If you can’t state your exception model, your organisation is outsourcing it to the path of least resistance.

    4) What gets logged so we can learn, calibrate, and defend?

    AI-mediated work increases epistemic risk: uncertainty about what was known, what was assumed, what was generated, and who endorsed it. Without logs, you can’t do learning, and you can’t do accountability. You get blame, not improvement.

    Operating model fix: implement judgement traceability proportional to risk. Not “log everything” (you’ll create noise and resentment), but log the rationale for consequential decisions and the provenance of key claims. At minimum: decision owner, inputs, assumptions, dissent surfaced, and the reason for the final call.

    This aligns with the practical wisdom point: responsibility remains human, and executives become responsible for the cognitive environment that produced the decision, not just the final act. 

    The operating model reframes what “AI readiness” really means

    A lot of AI readiness talk focuses on literacy, tooling, and prompt competence. Those are useful, but they’re downstream. A technically fluent organisation can still be judgement-poor if its operating model creates accidental commitments and untraceable responsibility.

    From the operating model lens, AI readiness is closer to judgement readiness:

    • Can we name who decides what?
    • Can we tell “draft” from “decision” in our workflows?
    • Can we show what evidence we relied on?
    • Can we prove we considered counter-arguments?
    • Can we detect when the model is outside its domain of competence?
    • Can we learn when we were wrong?

    If those answers are fuzzy, AI will amplify the fuzziness, then external reality will punish it.

    A concrete way to start: map “decision surfaces”

    One practical move is to identify your organisation’s decision surfaces: points where text, analysis, or recommendations cross a boundary into an external commitment or irreversible internal action.

    Examples:

    • Customer emails and proposals
    • Contract amendments
    • HR notes and performance documentation
    • Clinical pathways and risk stratification thresholds
    • Financial approvals and credit decisions
    • Public statements and investor communications

    For each surface, ask:

    • What decisions are implicitly made here?
    • Who owns them?
    • What evidence is required?
    • What’s the exception rule?
    • What is the minimum log?
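The five questions above define a small inventory schema. A minimal sketch, with surface names, owners, and rules invented purely for illustration:

```python
# Sketch of a decision-surface inventory answering the five questions above.
# All surface names, owners, and rules are illustrative assumptions.
DECISION_SURFACES = [
    {
        "surface": "customer_proposal",
        "implicit_decisions": ["price", "delivery_date"],   # decisions made here
        "owner": "commercial_director",                     # who owns them
        "evidence_required": ["approved_rate_card"],        # evidence required
        "exception_rule": "escalate discounts above threshold",
        "minimum_log": ["owner", "rationale", "sign_off"],
    },
]

def incomplete_surfaces(surfaces):
    """Flag surfaces whose governance answers are still missing or empty."""
    required = {"implicit_decisions", "owner", "evidence_required",
                "exception_rule", "minimum_log"}
    return [s["surface"] for s in surfaces
            if any(not s.get(key) for key in required)]

print(incomplete_surfaces(DECISION_SURFACES))
# []
```

Running the completeness check across the inventory is a direct way to find surfaces where decisions are being made without an owner, an evidence rule, or a log.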

    This is boring work. It is also exactly the kind of boring work that prevents spectacularly expensive failures.

    The core claim

    In AI-mediated organisations, judgement quality is not primarily a training issue. It is an operating model issue.

    AI makes drafting cheap and fast. That’s the feature. The cost is that the organisation loses the natural friction that used to protect decision points. If you don’t replace that friction deliberately with decision rights, evidence rules, exception handling, and traceability, you won’t get “augmented intelligence”. You’ll get faster confusion with nicer wording.

    Executive value, then, becomes the governance of this machinery: designing an operating model where judgement remains visible, owned, and defensible even when cognition is distributed across humans and machines. 

  • Executive Judgement in AI-Mediated Organisations: Why Leadership Value Hasn’t Moved to the Machine

    Artificial intelligence is now embedded in the everyday cognitive environment of senior leaders. Executives operate amid algorithmically generated forecasts, recommendations, alerts, scenarios, summaries, and ranked options. Much of the commentary asks whether this makes decisions faster, smarter, or more consistent. That framing is understandable, but it risks missing the main event.

    The central challenge for executives is not learning AI tools, becoming more “data-driven”, or expanding analytical capacity. It is preserving and governing sound judgement when cognition itself is increasingly distributed across humans and machines. AI does not remove uncertainty, responsibility, or value conflict. Instead, it changes the conditions under which judgement is exercised, defended, and institutionalised.

    A useful way to see this is to start from a deliberately unfashionable claim: executive judgement is the primary unit of executive value in AI-mediated organisations. Tools can speed up analysis. Models can improve prediction. But no amount of optimisation can tell an organisation what it ought to do when aims conflict, when evidence is incomplete, when consequences are irreversible, or when legitimacy matters as much as performance. Those are the permanent features of executive work. AI reshapes how these features present themselves, but it does not eliminate them.

    Bounded rationality has moved

    Herbert Simon’s account of bounded rationality remains a solid foundation for understanding executive decision-making. Executives do not optimise across all alternatives with full information. They satisfice, using processes that are “good enough” under constraints of attention, time, uncertainty, and limited cognitive capacity. Contemporary AI discourse often treats these constraints as computational problems to be solved: more data, more processing, broader search, better predictions.

    But bounded rationality is not just about computation. It is a structural condition of decision-making under uncertainty and responsibility. AI may expand analytical reach, yet it introduces new constraints and new risk surfaces. Executives now face governance demands that did not previously exist: interpreting probabilistic outputs; judging whether a model’s domain of competence matches the current situation; managing drift and contextual misalignment; and justifying machine-influenced decisions to boards, regulators, and stakeholders. The satisficing threshold becomes multi-dimensional. Decisions must be not only effective, but also defensible, explainable (to the extent possible), and accountable.

    So AI doesn’t dissolve bounded rationality. It relocates it. The bottleneck shifts from “can we analyse enough?” to “can we govern what this analysis is doing to our decision system?”

    Sensemaking is the real battleground, and AI participates in it

    If Simon explains the limits of optimisation, Karl Weick explains something even more important for executive work: executives do not merely choose among options, they construct the situations to which they respond. Sensemaking is continuous, social, and oriented toward plausibility rather than objective completeness. People act on the story that feels coherent enough to coordinate action.

    AI systems now shape this story-making process. They do not merely present information. They generate framings: summaries, narratives, ranked priorities, recommended actions, and “what this means” interpretations. They influence salience (what gets noticed), urgency (what feels pressing), and plausibility (what seems like the obvious conclusion). And they do this persistently, not episodically. The executive is not just consulting a tool at decision time; the organisation is living inside a stream of machine-shaped attention and interpretation.

    This matters because sensemaking is upstream of “the decision”. If an AI system stabilises one framing too early, the organisation can converge prematurely. If it privileges what is easily measured, difficult-but-important considerations can be squeezed out. If it produces fluently written explanations, it can create an illusion of coherence that substitutes for scrutiny. When AI becomes an always-on participant in meaning construction, governing judgement becomes inseparable from governing the cognitive environment.

    Bias doesn’t vanish. It migrates across the human–machine boundary.

    A persistent misunderstanding is that AI reduces bias by replacing human judgement with statistical rigour. Research on heuristics and biases shows why this is naïve. Human judgement relies on heuristics that are adaptive under uncertainty, but they create systematic distortions. The tempting story is that AI corrects these distortions.

    In practice, bias doesn’t disappear. It relocates and recombines. It enters through problem formulation, data selection, model objectives, defaults, interface design, prompt choices, and interpretation of outputs. It also interacts with organisational incentives and human cognitive habits in ways that create systemic failure modes rather than individual errors.

    Two familiar patterns illustrate the problem. The first is automation bias: people defer to machine recommendations even when context suggests caution. The second is selective scepticism: people distrust the machine only when it conflicts with their prior beliefs, while embracing it as “objective” when it supports them. In both cases, the bias is no longer located neatly in a person’s head. It is distributed across a socio-technical system.

    This is why “debiasing training” for executives is a weak intervention. Bias in AI-mediated contexts is often embedded in workflows, defaults, and organisational routines. Likewise, technical interventions like explainability features or fairness metrics help, but they only touch fragments of a broader epistemic governance problem: how an organisation decides what to believe, what to ignore, and what to act on.

    Expert judgement isn’t option-comparison. It’s pattern recognition plus mental simulation.

    Naturalistic Decision Making research reinforces another awkward reality: experienced decision-makers rarely choose by comparing options in a neat analytic spreadsheet. In high-stakes environments, experts recognise patterns, generate a plausible course of action, and mentally simulate consequences to test feasibility. Alternatives are considered mainly when the first approach fails the simulation.

    This is not irrationality. It’s a sophisticated adaptation to time pressure and complexity. And it has big implications for AI. If executives are fundamentally operating through recognition and simulation, then AI’s most natural value is not “make the decision” but “extend the simulation”.

    Used well, AI can surface edge cases, generate counterfactuals, reveal second- and third-order consequences, or stress-test assumptions. It can act as a cognitive simulator that enriches the executive’s mental models. Used badly, it can swamp intuition with noise, or lock the organisation into historically dominant patterns by privileging what the training data makes salient.

    The key is that this interaction is governance-dependent. The outcome hinges less on raw model performance and more on how leaders calibrate trust, integrate outputs into deliberation, and preserve the ability to say: “This situation is outside the model’s competence.”

    Practical wisdom and responsibility are non-transferable

    Even if AI became dramatically better at prediction and recommendation, executive accountability would not change. Executive judgement is not purely cognitive; it is normative. Decisions commit the organisation to action under uncertainty, with consequences for people, resources, and futures. That is responsibility-bearing work.

    The classical language for this is practical wisdom (phronesis): deliberating well about what ought to be done in specific circumstances where rules are incomplete and values conflict. Technical systems can generate analysis. They cannot bear responsibility, justify decisions in moral terms, reconcile competing goods, or absorb blame when harm occurs. Institutions still hold humans accountable. More importantly, organisations still need humans to decide what kind of organisation they are trying to be.

    This is where much “AI governance” talk becomes too thin. Compliance frameworks and technical safeguards matter, but they are not a substitute for judgement. They can reduce certain classes of harm while leaving the core executive task untouched: deciding what to do when the formal criteria are insufficient or conflicting.

    The AI-Augmented Executive: a governance construct, not a tooling narrative

    Taken together, these strands point toward a reframing: executive capability in AI-mediated environments is best understood as the governance of distributed cognition.

    This is the heart of the “AI-Augmented Executive” construct. The AI-augmented executive is not simply someone who uses AI tools well, writes good prompts, or adopts analytics enthusiastically. It is an executive who retains responsibility for consequential decisions while deliberately governing how AI participates in organisational judgement.

    Four responsibilities follow.

    Governing cognitive boundaries. Leaders must decide where AI is epistemically valid and where it is not, especially under novelty, regime shift, or moral ambiguity. This includes knowing when to treat AI outputs as tentative hypotheses rather than as actionable conclusions.

    Governing sensemaking. Leaders must manage how machine-generated framings shape organisational narratives, salience, and closure. This includes protecting interpretive plurality, creating room for dissent, and preventing fluency from masquerading as truth.

    Governing bias and failure modes. Leaders must treat bias as systemic, not personal. The task is to design processes that detect and counterbalance bias propagation across data, defaults, prompts, incentives, and interpretation.

    Retaining accountability. Leaders remain answerable for outcomes and for the cognitive environment that produced them. In AI-saturated settings, this means being accountable not just for a decision, but for the socio-technical decision system.

    None of this implies that AI is unhelpful. It implies the opposite: AI is powerful enough to change the shape of judgement work. The risk is not “bad AI” in the simplistic sense. The risk is organisational decision-making drifting into a state where commitments occur by accident, responsibility becomes untraceable, and machine-shaped narratives quietly replace deliberation.

    A closing implication: stop treating AI as a decision upgrade

    Many organisations treat AI adoption as a competence story: train people, buy tools, improve speed and output. That approach can deliver local efficiency and still degrade judgement quality. It can create faster decisions with weaker accountability, more confident narratives with less epistemic humility, and broader analytics with narrower sensemaking.

    A more serious posture is to treat AI as a structural change in the organisation’s cognitive architecture. That demands governance, not just capability-building. It demands attention to decision rights, epistemic risk, interpretive discipline, and the design of workflows that keep responsibility visible.

    In an age where analysis is cheap and fluent text is abundant, judgement becomes the scarce resource again. That is not a romantic claim about human exceptionalism. It is the boring, durable reality of leadership: deciding what to do when the world refuses to become tidy, and owning the consequences when it doesn’t go to plan. 

  • Up tempo work

    Generative AI has quietly changed the tempo of work. Not in the headline places. In the boring places. Email, agendas, briefing notes, drafts of policies, draft replies to customers, draft performance notes. Stuff that used to take just enough effort to force a pause. 

    Now the pause is optional. That sounds like a productivity win. It is, sometimes. It’s also a governance problem wearing a productivity moustache.

    Because when drafting becomes effortless, organisations start committing to things without noticing. The thing that used to be “a draft” becomes “the decision”, because it reads cleanly and moves fast. 

    The claim

    The biggest leadership risk in the AI era is not that AI will make leaders obsolete. It’s that AI will make commitment too cheap, and organisations will confuse fluent drafting with actual decision-making.

    What would change my mind? Evidence that teams using AI heavily can consistently show (a) clear decision rights, (b) reliable escalation paths for exceptions, and (c) an audit trail that explains who owned what when it mattered, without slowing everything to a crawl. Not a policy. Actual practice.

    What I think is going on

    The current leadership narrative goes something like: AI can draft, but it can’t lead. Leaders must provide context, set guardrails, build trust, show judgement, and so on. 

    All true, in the abstract. But it misses the mechanics.

    AI doesn’t “replace leadership”. It changes the surface area of leadership. It pushes leadership into thousands of micro-moments, distributed across the organisation, where people are generating text and making commitments at speed. And those micro-moments are exactly where decision rights usually get fuzzy.

    So the right question isn’t “Can AI lead?” The question is: Where are decisions being made by accident, because text became cheap?

    The part people get wrong

    A lot of writing about AI leadership leans on “guardrails (clear values and decision rights)” as if naming them makes them real.

    But guardrails are not values on a slide. Guardrails are a working control system:

    • which decisions exist (not “be responsible”, actual decisions)
    • who owns them
    • what counts as an exception
    • what must be escalated
    • what evidence is required before committing
    • how you find out when people bypass the route

    If you can’t answer those in plain English, the “guardrails” are vibes. Vibes do not survive contact with the inbox.
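    One way to force the plain-English test: turn the six questions above into a small register you can actually inspect. This is a minimal sketch, not a prescribed schema; the names (`DecisionRight`, `unanswered`) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class DecisionRight:
    """One entry in a decision-rights register.
    Each field answers one of the six guardrail questions."""
    decision: str                 # which decision exists
    owner: str                    # who owns it
    exception_criteria: str       # what counts as an exception
    escalation_path: str          # what must be escalated, and to whom
    required_evidence: list[str]  # what must exist before committing
    bypass_signal: str            # how you find out when people skip the route

def unanswered(rights: list[DecisionRight]) -> list[str]:
    """Return the decisions that are still vibes:
    any empty field means the guardrail is not yet real."""
    vague = []
    for r in rights:
        text_fields = [r.decision, r.owner, r.exception_criteria,
                       r.escalation_path, r.bypass_signal]
        if not all(text_fields) or not r.required_evidence:
            vague.append(r.decision or "<unnamed decision>")
    return vague

rights = [DecisionRight(
    decision="Offer a goodwill refund above £500",
    owner="Support team lead",
    exception_criteria="Refund exceeds published policy",
    escalation_path="Finance sign-off before send",
    required_evidence=["ticket ID", "policy clause"],
    bypass_signal="Weekly audit of sent refunds vs approvals",
)]
```

    The point of the data structure is not the code; it is that a guardrail with a blank field is visibly a guardrail that does not exist yet.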

    A cleaner mental model

    McKinsey frames a shift from “command” to “context”. I mostly agree, but I’d sharpen it:

    Leadership is moving from “deciding” to “designing decision conditions”.

    That means your job is to design the conditions under which other people, often using AI, can make decent calls under time pressure, without turning the organisation into a liability farm.

    Concrete example: customer support.

    AI helps a support agent draft a reply in 30 seconds. The model is good at sounding helpful. It will often over-promise because over-promising sounds helpful. A human who’s tired, new, or keen to close tickets can hit send.

    Now you’ve got an implied contract. Delivery teams inherit a mess. Finance gets dragged into refunds. Nobody can say whether this was an authorised exception or an accidental commitment.

    The fix isn’t “tell agents to be careful”. The fix is to explicitly separate:

    • drafting authority (anyone can draft),
    • commitment authority (only named roles can approve terms, money, timelines, exceptions),
    • release control (what must be checked before “send”, and who checks it).

    That’s governance. Not glamorous. Very effective.
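    The drafting/commitment/release separation can be sketched as a release-control check: anyone may draft, but a draft containing commitment content needs a named approver before “send”. The trigger patterns below are purely illustrative assumptions; a real deployment would tune them per organisation.

```python
import re

# Hypothetical markers of "commitment content" in an outbound draft.
COMMITMENT_PATTERNS = {
    "money":     re.compile(r"refund|credit|£|\$|€", re.IGNORECASE),
    "timeline":  re.compile(r"\b(by|within|deadline|no later than)\b", re.IGNORECASE),
    "exception": re.compile(r"\b(waive|exception|one-off|as a gesture)\b", re.IGNORECASE),
}

def release_check(draft: str, sender_can_commit: bool) -> tuple[bool, list[str]]:
    """Release control: drafting authority is universal,
    commitment authority is not."""
    triggers = [name for name, pat in COMMITMENT_PATTERNS.items()
                if pat.search(draft)]
    ok_to_send = not triggers or sender_can_commit
    return ok_to_send, triggers

ok, why = release_check(
    "We'll process your refund within 5 days as a one-off exception.",
    sender_can_commit=False,
)
# The draft promises money, a timeline, and an exception,
# and the sender holds drafting authority only, so it is held for approval.
```

    A keyword check is deliberately crude; the design point is that the gate sits between draft and send, not inside the drafter’s head.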

    Judgement is not a personality trait

    McKinsey says leaders must demonstrate judgement, aligning choices to values, because AI is advisory, not authoritative.

    Yes. But “judgement” as a leadership trait is too squishy to operate at scale. I used to treat judgement as something you either have or you don’t. Now I think judgement is a system property as much as a personal one.

    Judgement shows up in:

    • what evidence is required before acting
    • whether uncertainty is made visible or papered over
    • how exceptions are handled
    • whether reversals are allowed without punishment
    • whether you can trace a decision back to a person, a rationale, and a timestamp

    If your operating environment rewards speed and punishes hesitation, you’ll get confident nonsense. AI just helps you generate it faster.

    Creativity, but make it accountable

    McKinsey argues leaders must design for nonlinear outcomes, not “20 percent better” but “10 times better”, and that humans must frame the problem, invite dissent, and hold the creative line. 

    Again, broadly right. Here’s the catch: AI makes it easy to produce ten options, which can create the illusion of creativity while reducing actual thinking. You get a pile of plausible outputs and nobody wants to be the boring person who asks, “What are we optimising for?”

    So I treat creativity work with AI like this:

    1. Write the brief like a contract. What is in scope, out of scope, what constraints are real, what success looks like, what failure looks like.
    2. Force one hard trade-off. Speed vs accuracy. Cost vs user harm. Personalisation vs privacy. Pick one. Make it explicit.
    3. Require a dissent paragraph. Not “risks”, a genuine counter-argument. If the best dissent you can write is weak, you probably don’t understand the space.
    4. Name the decision owner. The person who is on the hook when the shiny idea breaks.

    That’s how you get novelty without random motion.
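    The four steps above amount to treating the brief as a required artefact with no optional fields. A minimal sketch, with illustrative names (`CreativeBrief`, `brief_is_complete`):

```python
from dataclasses import dataclass

@dataclass
class CreativeBrief:
    """Brief-as-contract: each of the four steps becomes a required field."""
    in_scope: str
    out_of_scope: str
    hard_trade_off: str      # exactly one, made explicit
    success_looks_like: str
    failure_looks_like: str
    dissent: str             # a genuine counter-argument, not a risk list
    decision_owner: str      # who is on the hook when the shiny idea breaks

def brief_is_complete(b: CreativeBrief) -> bool:
    """A brief with any empty field is not ready to hand to an AI."""
    return all(vars(b).values())

brief = CreativeBrief(
    in_scope="Onboarding emails for trial users",
    out_of_scope="Pricing changes",
    hard_trade_off="Personalisation vs privacy",
    success_looks_like="Higher activation without new data collection",
    failure_looks_like="Creepy personalisation complaints",
    dissent="Trial users may churn on volume, not content; emails fix nothing",
    decision_owner="Asha (Growth lead)",
)
```

    The completeness check is trivial by design: the discipline is in refusing to run the generation step until it passes.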

    Where this breaks

    A few objections are fair.

    “This is too heavy for small teams.”

    If you try to build a full enterprise control framework, yes. But decision rights can be lightweight. A one-page “commitment map” is often enough to stop the worst mistakes.

    “We move too fast to add process.”

    You’re already paying for process. You’re just paying after the fact, in rework, customer fallout, HR pain, and fire drills. The question is where you want to spend your admin budget: before or after damage.

    “But leaders do need softer skills, trust, empathy, learning culture.”

    Agreed. The McKinsey piece makes a strong case for learning loops like premortems and after-action reviews. I’m not arguing against the human stuff. I’m arguing that the human stuff fails without mechanics. Trust doesn’t scale by declaration. It scales when people can predict how decisions get made and how exceptions are handled.

    “AI tools can be configured to prevent this.”

    Sometimes. But configuration is still a governance choice. Who decides the rules? Who can override? What gets logged? Same problem, new wrapper.

    What I’d do if I were responsible

    • Map “commitment moments”. Where can someone, with a draft, create an obligation? Email, proposals, HR notes, customer replies, invoices, policy statements, procurement requests.
    • Define three decision classes.
      1. routine, can be auto-approved
      2. exception, needs named sign-off
      3. high-stakes, needs a second human and a record
    • Write a “draft vs decision” rule into workflows. Not training slides. Actual steps. If it matters, it gets reviewed.
    • Require minimal decision logs for exceptions. Two minutes, not a dissertation: what changed, why, who approved, what evidence, what you’ll check later.
    • Run one premortem per month on an AI-assisted process. “Assume this went wrong. How?” Then fix the top two failure modes. 
    • Protect leadership attention for inflection points. McKinsey cites the example of a CEO keeping 20 percent of the calendar empty. The principle is right: protect time for the moments where judgement actually sits.

    Close

    I’m watching one thing more than anything else: whether organisations can keep the speed benefits of AI while making commitments harder to do by accident.

    Because that’s the new baseline. Drafting is cheap. Accountability is not. If you don’t design for that, your “AI transformation” will mostly be an expensive way to manufacture confident errors faster.