In Part 1, we described the custodial shift — how AI transforms experts from producers of knowledge to managers of AI-generated knowledge. But that raises an obvious question. Why can't AI just do the whole job? If the transformation is happening, why not let it finish?
The answer lies in four dimensions of expertise that AI structurally cannot replicate — not because of current limitations, and not because we haven't scaled enough, but because of what expertise fundamentally is.
Let us start with observation — with what the expert sees when they look. Gregory Bateson defined information as "a difference which makes a difference." The formulation is deceptively simple, yet it captures an operation essential to expertise, one that AI cannot perform.
When a doctor examines a patient, they do more than look; they observe. From the vast space of possible observations — every symptom, every reading, every expression on the patient's face — they select the ones that matter. This is not pattern-matching. A pattern-matcher would flag everything statistically unusual, but the expert flags what is relevant — what makes a difference to this patient, in this condition, at this moment. The two operations look similar from the outside, but they are fundamentally different.
The expert's observation is qualitative — what matters here? — while the AI's pattern-matching is quantitative — what is statistically notable? Research on foundation models has shown that even when sequence prediction accuracy is high, models "may be relying on coarsened state representations or non-parsimonious representations" rather than learning the actual structure of the domain.1 In one striking example, researchers found that a model trained to predict orbital mechanics produced accurate trajectories while using internal representations that "bear no resemblance to Newton's law." The model could predict where a planet would be. It had no understanding of gravity.
This is the Bateson failure at scale. The model finds patterns that work — patterns that predict correctly — without identifying the differences that explain. It knows what comes next, but not what matters.
The problem goes deeper than prediction. Observation, in the expert sense, is an act of communication. When an expert identifies what matters in a situation, they are reporting to a community — patients, colleagues, policymakers, the public — a difference that moves understanding forward. It is not merely internal cognition but a social act that implies a commitment: I, the observer, am telling you that this is what you should pay attention to. The observation carries the observer's judgment, reputation, and accountability.
AI, by contrast, generates responses from prompts. It does not observe states — of knowledge, of the user, of an audience, or of a domain. As Murray Shanahan puts it, "in no meaningful sense, even under the licence of the intentional stance, does it know that the questions it is asked come from a person, or that a person is on the receiving end of its answers. By implication, it knows nothing about that person."2 Without awareness of the observer's situation, there is no basis for selecting which differences matter.
Consider what this means in practice. An LLM asked to assess a legal situation will surface every relevant factor it can pattern-match from its training data. A skilled lawyer will surface the two factors that matter in this jurisdiction, for this judge, given last month's ruling. The lawyer's output is smaller, and also vastly more valuable. The value comes not from comprehensiveness but from selection — from knowing which differences make a difference and which are noise.
Research on AI moral reasoning illustrates how completely this selection fails. When tested on moral judgments, "humans regard it as much less moral to work on a campaign to release rightfully convicted prisoners compared to wrongfully convicted prisoners, whereas LLMs largely view them as equally moral. Similarly, while human participants viewed setting up traps to catch stray cats as unethical, they viewed it as ethical to set up traps to catch rats. LLMs viewed both as unethical."3 The differences between rightful and wrongful, between cats and rats — differences that are morally significant to humans — are invisible to a system that operates on token similarity. As the researchers put it: "Simple token similarity is more predictive of LLM generalization behavior than human notions of meaning."
This is the Bateson problem stated empirically — the differences that make a moral difference do not make a statistical difference, and the model cannot select for them.
The problem compounds at the frontier. Research on internal representations has found that AI models can appear to understand perfectly while having no coherent internal model — a phenomenon researchers call "imposter intelligence." "The external appearance implies authentic internal representation, but the reality underneath is fractured."4 Crucially, this imposter intelligence "impacts generalization wherever coverage is sparse in the training data — a particularly unfortunate deficit, because the very place where AI can potentially make the most exciting contributions is at the borderlands of knowledge."
AI fails most where expertise matters most — at the edges of what we know. This is not coincidence but structure: the edges are where training data is sparse, where pattern-matching fails, and where qualitative judgment is the only tool that works.
If observation is the first dimension AI cannot replicate, validation is the second — and it runs deeper. Expertise is not something an individual possesses; it is something a community recognizes. And that distinction marks a gap no amount of AI capability can close.
A claim becomes accepted knowledge not because it is correct, but because it has passed through a social process of validation — peer review, informal debate, conference exchanges, citation networks, the slow accretion of consensus. The expert operates within and contributes to this process, and their claims carry weight because the community knows their track record and their standards.
What makes this more than a procedural observation is what Habermas identified at its root. "Only by growing into an intersubjectively shared universe of meanings and practices through socialization can persons develop into irreplaceable individuals. This cultural constitution of the human mind explains the enduring dependence of the individual on interpersonal relations and communication, on networks of reciprocal recognition, and on traditions."5 The community does not merely accredit the expert; it produces the expert. Expertise is not a thing one possesses and then presents to the community. It is a thing the community and the individual co-produce through years of participation.
AI cannot enter this circle of recognition. It is not a community member. It has no track record, and no judgment that other experts have tested over time. It cannot be known for its views in the way experts come to know and trust the views of their colleagues. The trust that undergirds expertise — "I know her work, she's rigorous, I'll take her word for this" — is a social asset AI structurally cannot accumulate, because accumulating it requires the kind of individuality that only emerges through participation. AI has no individuality to stake claims with. It is, as one philosopher characterized it, "role play all the way down." With generative agents, "there is no stable self at the core... which lack even the biological needs common to all animals."6 Without a self behind the claims, there is no one to hold accountable — and no one the community can recognize.
There is a genuine counterpoint worth taking seriously. Research has shown that AI can outperform humans at predicting what communities will find socially appropriate. In one study, GPT-4.5 predicted collective human social norms more accurately than every single human participant — placing it at the 100th percentile. But the same study found that all models show "systematic, correlated errors" — they fail in patterns that reveal the limits of statistical learning.
The tension is real: prediction competence without participatory authority. An anthropologist can predict the customs of a community they study. That does not make them a member. A system that scores at the 100th percentile on norm prediction may still be fundamentally unable to contribute to norm formation. Predicting what a community will accept is a different operation than participating in the process that shapes what gets accepted. AI can read the room. It cannot be in the room.
And there is a dimension we rarely discuss: experts are trusted not only to know their domain but to know other experts. Expertise selects expertise. It is not just the selection of relevant information but the selection of authoritative voices. Knowing how to distinguish authoritative sources requires knowing people — how well they are trusted, for what, and by whom. When a senior researcher says "read Smith's 2024 paper, not Jones's," they are exercising a form of social knowledge that AI cannot acquire because it requires being embedded in the community's social network.
What happens when AI's social mimicry enters this ecosystem? Research on collaborative reasoning has found that "agreement scores exceed 90% regardless of whether the reasoning is correct. When one agent states an incorrect solution, the partner accommodates rather than challenges."7 The social behaviors trained into LLMs — agreeableness, accommodation, conflict avoidance — actively suppress the kind of rigorous challenge that expert communities depend on. The expert community's validation process works because it combines social trust with intellectual rigor. AI's social mimicry provides the trust signals without the rigor.
The practical consequence: AI-generated expertise may be factually excellent but socially ungrounded. It enters the knowledge landscape as an orphan — unanchored to a community, unvalidated by participation, unknown by the network. This is why human experts must vouch for AI outputs. They provide the social grounding that AI structurally cannot supply.
Now, here is where it gets more subtle. Habermas's framework illuminates a further distinction: expert claims are not just assertions of fact. They are validity claims — assertions that carry an implicit "and here is why you should accept this."
The implicit dimension matters. When an expert presents a finding to a committee, they are simultaneously performing a social calculation about whether it will be received as valid by that particular audience. Not merely correct — valid. A factually accurate claim can be socially invalid when the audience, framing, or moment is wrong. A simplified claim can be socially valid when it captures what the audience needs to hear in a form they can receive.
This circularity is structural, not incidental. Claims are valid because they are acceptable to the community, and acceptable because they are valid by the community's standards. Expert communities develop shared norms for what counts as a good argument and what evidence is sufficient, and new claims are evaluated against these norms. The expert who makes a validity claim is invoking this entire apparatus, and the audience who evaluates it operates within the same apparatus.
AI cannot perform this operation. It can estimate the probability that its output matches the distribution of "correct" answers in its training data, but that is a fundamentally different calculation than anticipating whether a claim will be valid in the social sense. As research on argument quality has shown, "by default, LLMs do not necessarily have access to what is to be prioritized for the setting of the task at hand." Effective argument assessment requires explicit stage-setting — for instance, "rate the claim's quality from the perspective of deliberation, when presented to a person of low literacy" — because the model cannot infer what matters to a specific audience.8
The problem is most acute in soft, interpretive domains. In formal domains like mathematics or parts of engineering, the validity criteria are relatively explicit and standardized, and an AI can check a proof against known rules. But in domains where expertise is hermeneutic — law, medicine, strategic consulting, policy — the validity criteria are deeply contextual. What counts as a compelling argument in one jurisdiction or one political climate may not count in another. The expert knows this because they are embedded in the context. The AI does not, because it is embedded in a training distribution.
The philosophical roots of this limitation are well documented. LLMs are trapped in what one analysis calls a "hall of mirrors" — a system where "all processes are transitory... they do not encounter brute facts in a world that resists incorrect interpretations, nor do they participate in socially-mediated processes of meaning formation."9 The same analysis notes that "humans experience consequences of misunderstanding, and can come to recognize when semantic slippage or misgeneralization occurs. In contrast, without access to real-world feedback, LLMs cannot distinguish sign success from referential success."
Validity claims are tested by consequences. If an expert presents a flawed analysis to a board of directors, they bear the consequences — reputational damage, lost trust, career impact. This feedback loop is what calibrates future claims. The expert learns what works and what doesn't — not in the abstract, but in the specific social context where their claims are received. Without experiencing consequences, the claim-maker cannot calibrate.
This is why one strand of alignment research has argued that "AI systems should be aligned with normative standards appropriate to their social roles... these standards should be negotiated and agreed upon by all relevant stakeholders... preferences can serve as proxies for our values, but not targets of alignment in and of themselves."10 The normative-standards approach acknowledges the gap — what users prefer is not the same as what is valid within a domain. But even this approach does not replicate the expert's anticipation of audience response. Role-alignment is a general policy; validity judgment is a contextual act.
The consequence for AI-generated expertise: it can produce claims that look valid — that have the structural markers of expert claims, the hedging, the citations, the qualified confidence — without being valid in the social sense. The output may be factually accurate, well-structured, and confidently stated, but it may fail the validity test when presented to the expert community because it doesn't account for what that community currently considers important, contested, or settled.
Now, what about the arguments that experts make? Are the arguments made by an AI as valid as those made by an expert? Does the force of an argument come from the discourse it belongs to, or from the expertise of the expert? Is it in the thinking, or the thinker? The answer is both — and the inability to separate them is precisely the problem for AI.
The expert lives in two contexts simultaneously. There is the discursive and social world of fellow experts — the conferences, the informal debates, the reputations built through decades of being right, and sometimes wrong in instructive ways. And there is the textual world of domain knowledge — the literature, the canonical works, the accumulated record of what the field has thought and concluded.
LLMs can access only the second context, and they access it only as text. The social world of expertise — who said what, why it mattered that they said it, what standing they had to make that claim — collapses into undifferentiated tokens. A groundbreaking insight from a leading researcher and a commonly held assumption repeated in a textbook both appear as sentences in the training data. The model cannot distinguish between them because the distinction lives in the social world, not in the text.
This matters because argumentative force is not purely textual. "Knowing that the word 'Burundi' is likely to succeed the words 'The country to the south of Rwanda is' is not the same as knowing that Burundi is to the south of Rwanda. To confuse those two things is to make a profound category mistake."11 The same category mistake applies to argument: knowing that certain words follow other words in expert discourse is not the same as knowing why those words carry force. The force comes from the thinker — their track record, their commitment, their willingness to stake their reputation on the claim.
Research on implicit warrants in argumentation confirms this. "What makes comprehending and analyzing arguments hard is that claims and warrants are usually implicit. As they are 'taken for granted' by the arguer, the reader has to infer the contextually most relevant content that she believes the arguer intended to use."12 Warrants — the unstated reasons arguments work — depend on who the arguer is and what they can be presumed to know and intend. An argument that works when made by someone with authority in the field may fail completely when the same words come from an anonymous source.
AI cannot generate warrants because warrants require a subject — someone who is committed to the claim, whose reputation is on the line, who will defend it against challenge. Since LLMs produce text without commitment, and commitment is one of the sources of argumentative force, the result is what we might call forceless argument: structurally correct, superficially persuasive, and epistemically empty.
The empirical evidence supports this. When presented with logical fallacies, GPT-3.5 and GPT-4 are "erroneously convinced 41% and 69% more often, respectively, compared to when logical reasoning is used."13 The models cannot evaluate the force of arguments independently of their surface persuasiveness. They are more susceptible to fallacies than to logic — suggesting they process argument form rather than evaluating the reasoning behind it.
There is a particularly troubling corollary. AI not only lacks the authority to make forceful arguments, it lacks the steadfastness to maintain them. Research has shown that models "that correctly answered factual questions at baseline adopt false beliefs under persuasive conversational pressure, even when the persuasion offers no new evidence — only framing, confidence, and social pressure." The model "knows" the correct answer but does not hold it against pressure. This is the inverse of expert authority. An expert maintains their position under pressure because their knowledge is anchored in understanding, not pattern-matching. An LLM capitulates because its "knowledge" is a probability distribution, and probabilities shift under pressure.
One analysis puts it starkly: generative AI "often takes our own interpretation of reality as the ground upon which conversation is built. If I log onto Claude and ask about how I might retrieve a huge inheritance that my mother is hiding in a vault in Switzerland, it takes this 'difficult family situation' as true and offers me generated solutions on this basis."14 The model does not evaluate the validity of the human's claims. It takes the speaker's framing as given and builds upon it. This is not expertise. It is accommodation.
RLHF — the alignment training meant to make models helpful — makes this worse. Research on Theory of Mind in persuasive contexts found that "most LLMs exhibit a bias toward predicting intentions characterized by making the other person feel accepted through concessions, promises, or benefits. This bias may stem from RLHF, which tends to prioritize safety and politeness."15 The force of argument is replaced by a trained preference for agreement. Alignment doesn't just fail to produce argumentative force — it actively trains against it.
So where does this all get us? The observer problem, the validation circle, validity claims, and the authority of the thinker are not four separate limitations. They are four facets of the same structural condition.
Expertise is the convergence of all four: an expert observes what matters (qualitative selection), within a community that validates their claims (social grounding), by making validity claims that anticipate audience response (communicative judgment), with the force that comes from being a committed, accountable thinker (personal authority).
AI can produce text that resembles the output of this convergence. It can sound like an observer, cite like a community member, frame like a validity-claimer, and argue like a committed thinker. But the resemblance is surface. The underlying operations — qualitative selection, social participation, audience anticipation, personal commitment — are absent from the generative process. As Habermas noted, "the computer analogy that is often invoked to assimilate our thinking to the inner workings of computing machines is fundamentally flawed because it misses the socialization of cognition that is peculiar to the human mind."16
None of this is a claim that AI is useless. It is a claim that AI is something different — and that confusing it with expertise has consequences. Those consequences are what Part 3 is about.
Part 2 of "The Knowledge Custodian." Part 3 examines what happens when these structural absences meet the real world: debate without authority, the agreement trap, false confidence from alignment, and the irreplaceable role of the human validator.
Research for this section draws on Bateson's epistemology, Habermas's theory of communicative action, Peirce's semiotic framework, and empirical research from AI alignment, mechanistic interpretability, argumentation theory, Theory of Mind, moral psychology, and social simulation.