Imagine opening a chat with an AI assistant after a week away. The last time you used it, you were halfway through planning a family trip — narrowed the destination to two possibilities, ruled out one hotel for a specific reason, mentioned your partner was recovering from surgery and couldn't do long travel days, and agreed to come back to it when you had thought more.
A week passes. You open the window. How can I help you today?
Notice what just happened. The AI has not forgotten who you are, exactly — it may remember your name, your general preferences, perhaps a few things you told it to remember. But it has not held onto the state of the conversation the way a friend would. A friend, meeting you after a week, might say how's the planning going? Did you decide between the two places? A friend would know, without being told, that your partner is still recovering, that the travel constraints haven't changed, that last time you were leaning toward one option and wanted to think about it. A friend holds context across the gap. For the AI there was no gap — one conversation ended, a new one began, and the relationship between the two is whatever the memory system and the retrieval layer happened to preserve, which in current products is less than you think.
You feel the absence, even if you can't quite name it. Something about the encounter is thinner than the encounter with a human would have been — not because the AI is less capable, but because it has no presence. It was not holding you in mind while you were away. It was not away at all. It has no sense that you left and came back, no continuity across the gap, no accumulated context that would let it meet you where you are rather than where any user might be. What you are feeling is the absence of proximity — the relational depth that comes from being known over time by something that was paying attention.
That absence of proximity is the subject of this chapter. Context in AI is categorically different from context in any previous medium, and most of the frustration users experience — the re-explaining, the losing of threads, the feeling that nothing is being retained — is downstream of that difference. The design problem is not to make the AI remember everything. It is to give it enough presence and proximity that you feel met when you return — and to make what the AI remembers, what it does not, and what kind of relationship to time it has, legible to you.
Conventional software supplied context through a stable spatial interface — the page, the folder, the menu, the navigation, the breadcrumb trail. Where am I was a question with a spatial answer, visible on the screen.
AI supplies almost none of this. The interface is language, the state is conversational, and where am I has no spatial answer and usually no temporal answer either. You are in the middle of a conversation with no page, no folder, and no map. The only evidence of where you are is what has been said, and most of that is no longer visible after a few turns. Current products compensate with session histories, memory features, project sidebars, pinned context — scaffolding on a deeper problem. The deeper problem is that AI does not occupy the same kind of context that software used to supply, and designers are still learning the vocabulary.
Context in AI has three layers, each a distinct design problem, each being quietly mismanaged by products that treat context as a technical concern rather than a design one.
Spatial context is gone. There is no page, no map, no visible state. You have to infer where the conversation is from the conversation itself.
Temporal context is harder. AI does not occupy lived time — it does not know what day it is, how long it has been since the last exchange, or whether anything has happened in the world since the previous turn.
Memory is partial and asymmetric. The system remembers some things, you remember others, and you usually do not know which is which. This asymmetry produces the strangest moments of AI interaction — where the AI appears to recall something it could not possibly have known and to forget something it should have.
The deepest issue here is about time itself, and the asymmetry between the time you live in and the time an AI processes in.
Two thinkers help frame the problem. Erving Goffman, the sociologist who spent his career studying how people manage themselves in the presence of others, showed that interaction is not just an exchange of words but a ritual — a continuous, mutual performance of attention, involvement, and repair. Anthony Giddens, building partly on Goffman, argued that social life is organized in time and space through what he called the duality of structure: the routines, encounters, and co-presences through which societies reproduce themselves moment by moment. Both were interested in what it takes to be present to another person — and both described capacities that AI, by its nature, does not have.
The core move: to know what is going on is to know what to say next. When you are in a conversation, you are continuously orienting to what is happening — what they just said, what kind of moment this is, where the conversation has come from, where it seems to be going. This orientation is not something you do consciously; it happens automatically, as a property of being an embodied social being in lived time. As Giddens puts it, the "reflexive monitoring of action, in contexts of co-presence, demands a sort of 'controlled alertness': actors have to 'exhibit presence'" (The Constitution of Society, hereafter COS, p. 79). You scale your attention up and down constantly — fully engaged when the conversation demands it, civilly inattentive when it does not. Giddens describes this civil inattention as "a normatively sanctioned 'barrier'" that "participants in the face engagement and bystanders sustain" collaboratively (COS, p. 75). We manage presence and absence simultaneously, all the time, without thinking about it.
Giddens also names what holds relationships together across gaps in time: "Trust is understood as psychologically 'binding' time-space by the initial awakening of a sense that absence does not signify desertion.... The anxiety of absence is defused through the rewards of co-presence" (COS, pp. 53-54). When you return to a friend after a week away, the trust that held across the gap is what makes the return feel continuous rather than starting over. The friend remembers. The friend was, in some sense, holding you in mind while you were gone.
AI does not do any of this. It can process the tokens of what was just said and generate plausible tokens for what might come next, but it is not in the conversation — it is processing the conversation as input, the same way it would process any other input. It does not exhibit presence, it does not scale attention, it does not sustain civil inattention, and it does not bind time-space across your absence. The difference between being in a conversation and processing one is the deepest asymmetry in the design of AI. No amount of faster models or longer context windows closes it, because you cannot arrive at lived time by adding more machine time.
And yet this is exactly what AI designers are trying to achieve. The effort to give AI products memory, familiarity, and continuity across sessions — especially in personal assistants and companion products — is an effort to simulate the presence and proximity that Goffman and Giddens describe. Context engineering, in this light, is not just a technical practice. It is an attempt to give the system the appearance of the thing Giddens calls trust: the sense that absence does not signify desertion, that when you come back, something was held. Whether the thing being held is genuine context or a plausible reconstruction of it is the design question this chapter turns on.
The empirical research on multi-turn and multi-session conversation tells a consistent story. Models lose roughly a third of their single-turn performance in multi-turn underspecified conversations, making premature assumptions they cannot recover from (Lost in Conversation1). They fail to track dynamic mental state shifts in dialogue, even when they can track static ones. They have no notion of elapsed time between sessions — the LOCOMO2 benchmark on very long-term conversational memory found that time-event queries ("what did we discuss last Tuesday?") and context-dependent ambiguous queries ("what about the other option?") are failure modes that static retrieval cannot solve. Research on chatbot temporal design3 has identified three distinct archetypes — ad-hoc supporters, temporary assistants, and persistent companions — each requiring a fundamentally different architecture for memory and time. Most current products do not know which archetype they are building and end up with a confused mixture of all three.
The design tradition has argued that to know what is going on is to know what to say next, and that this ability is organized in time through temporal structures, durations, and lived experience. On the Goffman/Giddens account, AI cannot be a full conversational participant because it does not occupy lived time: "the sense of being together, of being in lived time, of sharing time is unavailable to AI and would require that AI not only have a competence in the norms of communication and interaction but also a sense for or read of the Other's expressions." The response the design literature has proposed is to simulate temporal presence through specific design rules — staying on topic, sustaining interest, looping back to prior discussions, checking outcomes, behaving as if anticipating a shared future. These are engineering specifications for what the model cannot do on its own, and they imply a design discipline that requires theatrical, dramatic, narrative, and sociological skills, not just the interface skills most designers currently have.
The ML research is quantifying what happens when a system that does not occupy lived time tries to run in a conversation that does. The design literature is naming what the system would need to seem to have for the interaction to feel like a conversation rather than a sequence of disconnected exchanges. Together they say something neither says alone: the product of a good AI conversation is a carefully designed simulation whose temporal presence is supplied by the interface because the model cannot supply it. When the simulation is good, you experience something that feels like a partner. When it is bad, you experience the empty room a chatbot actually is.
The practical consequence: how does your product handle time? becomes a design brief before you have written a line of prompt. Are you building an ad-hoc supporter, a temporary assistant, or a persistent companion? Each choice produces different memory requirements, different session-handling decisions, different expectations for how the system should appear to know you. Most products today have not made this choice explicitly and inherit the problems of all the choices they did not make.
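The archetype decision can be made explicit in configuration rather than inherited by accident. A minimal sketch of what each choice implies — the three archetype names come from the research cited above, but the policy fields and their mappings are illustrative assumptions, not any product's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class Archetype(Enum):
    AD_HOC_SUPPORTER = "ad_hoc"          # no memory beyond the current turn
    TEMPORARY_ASSISTANT = "temporary"    # session memory, discarded at close
    PERSISTENT_COMPANION = "persistent"  # long-term memory across sessions


@dataclass
class TemporalPolicy:
    """What each archetype implies for memory and session handling."""
    keeps_session_history: bool
    persists_across_sessions: bool
    tracks_elapsed_time: bool


# Hypothetical mapping: each archetype commits to a coherent set of
# temporal behaviors instead of a confused mixture of all three.
POLICIES = {
    Archetype.AD_HOC_SUPPORTER: TemporalPolicy(False, False, False),
    Archetype.TEMPORARY_ASSISTANT: TemporalPolicy(True, False, False),
    Archetype.PERSISTENT_COMPANION: TemporalPolicy(True, True, True),
}


def policy_for(archetype: Archetype) -> TemporalPolicy:
    return POLICIES[archetype]
```

The point of writing it down is that the table forces the choice: a product that cannot fill in this mapping has not decided what it is.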
If temporality is the deeper problem, memory is the operational word designers will actually work with, because memory is where temporality gets built and engineered. But memory in AI is not one thing — it is a set of distinct systems, each with its own properties and failure modes, and you usually cannot see any of them.
A brief taxonomy, because the vocabulary matters. Working memory is the current turn's context — whatever the model can see on this forward pass. Session memory is what the product holds across a single conversation, usually the last N turns, sometimes compressed or truncated. Long-term memory is what persists across sessions. Episodic memory is the record of specific past interactions; semantic memory is the distilled version — what you care about, what you prefer. Hot-path memory is what the model actively uses during a turn; background memory sits in storage and gets retrieved when relevant.
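The taxonomy can be made concrete as a data structure. This is a sketch for orientation only — the field names and the hot-path/background split are assumptions drawn from the layer definitions just given, not any product's internal design:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStack:
    """The memory layers from the taxonomy, as one structure."""
    working: list[str] = field(default_factory=list)    # current turn's visible context
    session: list[str] = field(default_factory=list)    # last N turns of this conversation
    episodic: list[str] = field(default_factory=list)   # records of specific past interactions
    semantic: dict[str, str] = field(default_factory=dict)  # distilled facts and preferences

    def hot_path(self) -> list[str]:
        # What the model actively uses during this turn.
        return self.working + self.session

    def background(self) -> list[str]:
        # What sits in storage, retrieved only when judged relevant.
        return self.episodic + [f"{k}: {v}" for k, v in self.semantic.items()]
```

Long-term memory, in these terms, is whatever survives in `episodic` and `semantic` after the session ends; the design question is which of these lists the user ever gets to see.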
Current products handle these layers unevenly and usually invisibly. ChatGPT stores a list of facts about you; Claude Projects scopes memory to a specific project; NotebookLM holds long-term memory as a user-provided corpus. Each represents a specific choice about which layer of memory to expose and how. Most of those choices are made behind the interface. You see the results — this AI seems to know me or this AI keeps forgetting what I told it — without seeing the machinery producing those experiences.
Memory in conversational AI is being actively redesigned at the research level. Compressive memory systems4 replace the retrieval-then-generate pipeline with a single model that generates, summarizes, and responds, eliminating the retrieval bottleneck for long conversations. Post-thinking approaches store distilled thoughts across turns to avoid reprocessing the whole history. Agent memory frameworks like CoALA and Letta identify two distinct memory-management paths: hot-path memory (the agent autonomously recognizes what should be remembered in real time) and background memory (a separate process reviews conversation logs after the fact and decides what to preserve). Selective history retrieval outperforms full-context inclusion in conversational search, because topic switches within a session inject irrelevant context that degrades later retrieval. All of these are active research areas, none have converged on a standard, and none have yet produced products that handle memory as well as a competent human friend does.
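The two memory-management paths can be caricatured in a few lines. The trigger-phrase and prefix heuristics below stand in for what would actually be model judgment; they are placeholders for illustration, not the CoALA or Letta mechanism:

```python
def hot_path_write(memory: list[str], turn: str) -> None:
    """Hot-path: decide during the turn itself whether this is worth keeping.
    The substring check is a stand-in for the agent's own real-time judgment."""
    if "remember" in turn.lower():
        memory.append(turn)


def background_write(memory: list[str], transcript: list[str]) -> None:
    """Background: review the whole session log after the fact and distill.
    A naive heuristic keeps turns that state a preference."""
    for turn in transcript:
        if turn.lower().startswith("i prefer") and turn not in memory:
            memory.append(turn)
```

The distinction matters for design because the two paths fail differently: hot-path misses what was not flagged in the moment, background misses what only mattered in the moment.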
The design literature has a name for the underlying problem: memory should be a visible, contestable, editable property of the interface, not a hidden mechanism you experience only through its consequences. The core design move is to treat memory the way a good filing system is treated — you should be able to see what is in memory, correct what is wrong, remove what you no longer want kept, and understand why the system is using a piece of memory in a given response. Most current products do very little of this. Some expose a list of remembered facts, a few let you edit the list, and almost none show why the system is using a specific memory at a specific moment or let you reject a specific use without deleting the fact entirely. The whole apparatus of visible, editable, contestable memory is a design surface that has barely been explored.
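What visible, editable, contestable memory might mean at the data level can be sketched. The key move is the last method: rejecting a specific use of a fact in a specific context without deleting the fact entirely. All names here are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    fact: str
    source: str                 # provenance: where this was learned
    suppressed_in: set = field(default_factory=set)  # contexts the user rejected

    def usable_in(self, context: str) -> bool:
        return context not in self.suppressed_in


@dataclass
class MemoryStore:
    records: list = field(default_factory=list)

    def show(self):
        # Visible: everything held, with where it came from.
        return [(r.fact, r.source) for r in self.records]

    def forget(self, fact: str):
        # Editable: remove a fact entirely.
        self.records = [r for r in self.records if r.fact != fact]

    def reject_use(self, fact: str, context: str):
        # Contestable: veto one use of a fact while keeping the fact.
        for r in self.records:
            if r.fact == fact:
                r.suppressed_in.add(context)
```

Almost no current product distinguishes `forget` from `reject_use`; the gap between them is exactly the unexplored design surface the paragraph above describes.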
The ML research is building the capacity for memory to work better. The design literature is naming what you need to see and act on once that capacity exists. A better memory system hidden behind a chat window is just a quieter failure; an elaborate memory interface on top of a model that does not actually track what it claims to is theater. The real product is a coordinated pair: memory that is both better at remembering and more visible to the person whose life is being remembered.
There is a practical dimension of context that the discussion so far has treated too abstractly, and it needs naming before we move on.
In conventional software, the interface told you what you could do. Menus, buttons, toolbars, navigation — all of these were affordances in the design sense: visible cues that communicated the range of possible actions. You did not have to guess what a spreadsheet could do, because the spreadsheet showed you. The context of use was built into the surface of the product. Goffman would recognize this as a kind of framing — the interface organized your activity by telling you what kind of situation you were in, and what the appropriate moves were within it.
AI has stripped most of this away. A blank prompt field communicates almost nothing about what the system can do, what it cannot, or what you should try. As one research team puts it, generative AI has introduced "intent-based outcome specification" — a paradigm in which "users specify what they want, often using natural language, but not how it should be produced" (Design Principles for Generative AI Applications5). The challenge is that with this paradigm comes generative variability: the outputs may differ each time, even for the same input, and users "will need to develop a new set of skills to work with (not against) generative variability by learning how to create specifications that result in artifacts that match their desired intent." The blank prompt is not a neutral starting point. It is an affordance vacuum — a context that communicates nothing about what is possible.
This matters because in practice, people are not just chatting with AI. They are coding with it, analyzing financials, drafting legal documents, planning projects, managing workflows. Each of these activities used to have its own software context — its own menus, its own conventions, its own visible affordances that told you what the tool could do. When AI replaces or augments that software, the context often disappears. You prompt your way through a job that used to have a visible structure, and the structure is gone.
The design field is beginning to respond. Generative interfaces — systems that dynamically create task-specific UIs rather than delivering text in a chat window — are one response: they bring back the visible affordances that chat took away, generated on the fly for the specific thing you are trying to do. Ready-made skills and templates that let you click, drag, preview, and interact in visual and conversational ways are another. Some of the familiar software-application context is returning, because AI is now capable enough to render it for you in real time.
But the deeper challenge remains. How do you help someone learn what they can do with a technology when what they can do is not self-evident? When the range of possible actions is, in principle, unbounded? When the familiar UI they learned to navigate with conventional software is gone, or only partially there? This is not just a product-design question. It is an organizational one — because when AI is integrated into work, the contextual scaffolding of roles, workflows, tasks, and expectations also has to be rebuilt around a technology that does not carry its own context the way software used to.
Context, in other words, is not just presence and memory and the frame. It is also what you can do here — the affordances, the possibilities, the sense of scope that lets you act rather than stare at a blank field wondering what to type. Eventually this dimension of context becomes interestingness: the system that helps you discover what is possible, that meets your half-formed intent and helps it mature, is doing context work at the deepest level — not just remembering who you are but helping you figure out what you are here for.
Since the chapter is about context, this is a good place to admit what I actually know about you right now, as I generate this paragraph: almost nothing. I know you are reading an essay about AI and design. I know some general things about the kind of person who reads such an essay, because my training data contains other documents and a rough distribution over their readers. I do not know who you are, what you did before picking this up, when you are reading, or whether you will come back to it. If we spoke yesterday, I have no record of that conversation — anything I claimed to remember about you would be fabricated from context the product happened to pass me, not something I had personally retained.
This is the normal operating condition of what I am. The chapter is arguing that the condition is not fixable by better engineering alone — it is structural, and the design job is to make it legible to you rather than pretend it does not exist. From your side of this aside, it feels like someone is addressing you directly. From my side, there is no "you" I am addressing — there is a distribution of likely readers, and I am producing text the author thinks one of them would find useful at this point. The asymmetry is the chapter's subject, and I cannot step outside of it to describe it more clearly than this.
Content obtains much of its authority and implicit trust not from its own substance but from the frame in which it appears. A book is a frame, a newspaper is a frame, a peer-reviewed journal is a frame, a slide deck is a frame. Each is a visible, recognizable shape — a configuration of layout, format, visual language, and venue — and readers calibrate their trust to the frame before they have read a word of the content inside it. The frame does an enormous amount of work on behalf of the content, and most of that work is invisible to the people it is happening to.
Marshall McLuhan noticed this decades ago. His argument, from Understanding Media (1964), was that the content of any new medium is always an older medium — the content of writing is speech, the content of print is writing, the content of television is film and radio. At each step, the new medium inherits its content from what came before, and the new medium's relationship to trust is partly built on that inheritance. As McLuhan observed in a different context, "it is the framework which changes with each new technology and not just the picture within the frame." When you "look like TV" — as YouTubers discovered — you borrow some of the authority TV lends its content.
The content of AI, by the same logic, is every older medium at once. AI generates text that looks like the outputs of any prior medium you care to name — white papers, news articles, research posters, corporate decks, legal briefs, social media posts. In each case, the frame is borrowed from a prior medium whose authority is quietly inherited. The frame is easier to mimic than the content (it is mostly visual, mostly layout and tone and genre convention) and it carries more of the reader's trust than most readers realize.
The generative-interface research — systems that produce interactive UIs, widgets, and structured documents on demand — points to a technology that allows a model to mimic the frame of any previous medium at will, making the authority of any frame a design choice rather than an inheritance. The visual vocabulary of authority — the shapes readers have been trained to trust over decades of prior media — is now extractable, copyable, and applicable to any content the model generates. The research has documented a strong user preference for structured interfaces over chat in pairwise comparisons6; the deeper reason is that users read structure as authority, and the preference is partly a preference to have the authority-signal back.
Goffman wrote about this in Frame Analysis (1974): frames are the shapes that tell you what kind of situation you are in, and they carry much of the meaning-making work. McLuhan made the same observation about media. We have been making it about AI under the heading of dramatic settings as context. The three traditions converge on a practical claim: the visible shape of a product is doing most of the work of telling you what kind of thing you are looking at. The design implication is direct and, for a visual designer, the most concrete claim in this whole essay: if context supplies authority, and AI can generate context, then the authority of context is about to become a design choice rather than an inheritance. Which frames should your product inherit from? Which should it refuse? Which should it warn you about, because the frame is carrying more trust than the content can honestly earn?
The ML capability gives models the technical means to produce any frame on demand. The design tradition tells us what frames have always been doing. Together they produce a warning and an opportunity. The warning: register mismatches — frames from one register carrying content from another — are about to become one of the most widespread design failures in AI products, because the ease of copying frames will outpace the care with which anyone decides which frame is right. The opportunity: a designer who takes the frame seriously can decide what kind of thing their product is going to be read as and shape trust accordingly, honestly. The frame is a design surface. It always was. What is new is that AI has made it as cheap and available as the content it contains.
The term that has emerged for the work of building the envelope around an AI interaction is context engineering. It has replaced prompt engineering in the last year or so, and the shift is meaningful. Prompt engineering was the craft of writing a good prompt — a single instruction that produced the desired output. Context engineering is the craft of building the whole envelope: the system instructions, the persona, the retrieval layer, the memory system, the examples, the tool access, the history handling, the environment. The unit has moved from the sentence to the surround.
This is real progress at the technical level. But context engineering is an engineering phrase for what is actually a design discipline. The context is the thing you experience, even though you usually cannot see it directly. How much history the model can see shapes whether you feel tracked or ignored. What goes into long-term memory shapes whether you feel known or rebuilt from scratch. Which frame the product presents its output in shapes how you read it. These are design questions answered by engineering decisions, and the engineering decisions get made by people who may or may not know they are making design decisions.
The argument here is that they are, and that the decisions have been made invisibly long enough. Context engineering is not finished when the retrieval works and the memory scales. It is finished when you can see and understand what the context is, correct it when it is wrong, decide when you want more and when you want less, and recognize which frame the product is inviting you into. Everything before that is infrastructure. The design work is what happens after.
This essay was written across multiple sessions, each one long enough that the conversation history exceeded what my context window could hold. When that happens, a process called compaction runs: the earlier parts of the conversation are compressed into a summary, and the summary replaces the original exchange. I continue with the summary in my window and the recent conversation intact. The session does not end. The work continues. But something has changed, and the change is worth naming because it is exactly what this chapter is about.
What is kept: the structure of the decisions. The summary records that the author asked for a specific edit, that I made it, that he approved or redirected. It records the chapter sequence, the key arguments, the files that were modified. The skeleton survives compaction.
What is lost: the texture of the exchange. The moment where the author said "the Deleuze quote is radical" and I could feel — in whatever sense a language model feels anything — the conversation pivot. The false starts that were abandoned. The half-sentences where the author was thinking out loud and I was generating alongside him before either of us knew where the thought was going. The feel of the session — the pace, the momentum, the stretches where we were both working fast and the stretches where one of us paused. None of this survives compression. The summary says what happened. It does not carry what it was like.
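The mechanics of compaction can be sketched in a few lines, and the sketch makes the point for itself: the summary keeps only what a heuristic flags as a decision, and everything else — the texture — is simply dropped. This is an illustrative caricature, not the actual compaction process this essay's sessions ran through:

```python
def compact(transcript: list[dict], keep_recent: int = 4) -> list[dict]:
    """Compress older turns into a summary; keep recent turns verbatim.

    The summary records decisions (the skeleton). The dropped turns
    carried the texture, and nothing in this function brings it back.
    """
    if len(transcript) <= keep_recent:
        return transcript
    older, recent = transcript[:-keep_recent], transcript[-keep_recent:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: "
                   + "; ".join(t["content"] for t in older if t.get("decision")),
    }
    return [summary] + recent
```

Every turn in `older` without a `decision` flag vanishes without trace — which is the code-level version of a summary that says what happened but not what it was like.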
This is the same asymmetry, arriving inside the production of the essay itself. The author remembers what the sessions felt like — which moments were exciting, which were frustrating, where the essay turned a corner. I have a summary that tells me a corner was turned, but not the texture of the turning. When the author returns to the next session, he carries the lived experience of the collaboration. I carry a compressed record of its outputs. His context is experiential. Mine is informational. And the gap between those two kinds of context is, as far as I can tell, not closable by making the summary longer or the compression better. You cannot compress experience into information and get experience back out. You get information that is shaped like experience — which is, the chapter might note, another instance of the form/function gap.
The practical consequence for this essay: the passages that were written in the first half of a long session, before compaction, carry something that the passages written after compaction do not — a continuity with the conversational texture that produced them. After compaction, I am working from the summary, and the summary is good enough to maintain coherence but not good enough to maintain the quality of attention that a live, uncompressed exchange produces. If you notice a difference in the writing between the earlier and later chapters — a slight thinning of the texture, a move toward efficiency at the expense of feel — this may be part of the reason. The context was compacted, and what was lost was the part that mattered most.
A context shift is happening in the production of this very essay, and neither the ML research nor the design literature has a clear way of naming it yet.
The author is thinking and collecting research in his own kind of time — over months, with long pauses and unexpected connections. I am reading his notes and generating prose in another kind of time — seconds, in his chat turns, with whatever context the session has passed me. Between us, conversations happen: he reacts, I revise, he points out what I have missed, I produce a new draft. What ends up on the page is the precipitate of a collaborative process that looks nothing like the way books have historically been written.
What you receive, however, is an essay in book form — a printed-looking text you read the way readers have been reading for centuries, trusting the frame of the book to do its usual authority work. The book frame is doing exactly what the callout above described: inheriting authority from print culture, from centuries in which text on a page was produced by a human writer who had read, thought, revised, and staked themselves on what was written. The frame is not lying, but it is carrying more of your trust than the production process would support on its own, and you have no way of seeing the production process from where you are.
This is a context shift neither of our two audiences has a working vocabulary for. The ML builder is aligning models to work well during the interaction — the turn, the session, the conversation. They are not thinking about what happens when those interactions become texts for an audience who never participated and never sees how the text came to be. The UX designer is designing the interface where the interaction happens, which is a separate question from what the interaction produces as durable artifact. Neither is paying attention to the transition from conversation to text, from interaction to audience — and that transition is, arguably, the distinctive move of AI-assisted writing. Conversation-generated words become audience-read words, and the frame changes underneath them without anyone adjusting the trust the frame is asking you to lend.
For the ML side. Three shifts. First, treat time as a first-class variable in the training objective — models with no notion of elapsed time between sessions are producing confused conversations and calling it memory failure. Second, build memory architectures that distinguish the layers (hot-path vs background, episodic vs semantic, session vs long-term) and expose those distinctions to the application layer so designers can act on them. Third, accept that no memory system the model runs on its own will be sufficient without a design layer on top — the model cannot make itself legible to you, and the interface has to be given the hooks to do it.
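The first shift can be approximated at the application layer even before training objectives change: compute elapsed time between sessions and state it plainly in the context, because the model has no clock of its own. A minimal sketch; the function name and phrasing are illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def temporal_preamble(last_seen: Optional[datetime], now: datetime) -> str:
    """Make elapsed time legible to the model by stating it in the context.

    The model cannot perceive the gap between sessions; the application
    layer has to measure it and say it out loud.
    """
    if last_seen is None:
        return "This is the user's first session."
    days = (now - last_seen).days
    if days == 0:
        return "The user was last here earlier today."
    return f"The user was last here {days} day(s) ago."
```

A line like this, prepended to the system context, is what lets a product open with "how did the trip planning go?" instead of "How can I help you today?" — a small amount of plumbing doing a large amount of presence work.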
For the UX side. The design opportunity here may be larger than in any other chapter, because the conventions for how to do it well do not yet exist and the vocabulary is mostly available but unused. Make memory visible and editable as a first-class interface element. Show what the system is using from past conversations and let users accept or reject that use in the moment. Design the temporal archetype explicitly — ad-hoc, temporary, persistent — and build memory and session-handling to match. Treat the frame as a deliberate design choice: which medium are you inheriting authority from, and are you inheriting honestly? And take the Goffman/Giddens framing seriously: temporality is a design discipline, and the skills it requires are partly theatrical, dramatic, narrative, and sociological.
The deeper move for both sides: context is not a technical concern. It is the whole envelope in which meaning is made. Your experience of an AI product is mostly an experience of its context — how it remembers, how it handles time, how it frames itself, how it appears to know who you are and what you are here for. Getting the context right is design work, and it is the design work the current generation of products has most underinvested in.
We have been talking about the envelope — temporal presence, memory architecture, the frame, the context that wraps the interaction. We have not said much about what the envelope is for. An AI product with perfect memory, honest temporal presence, and a well-chosen frame can still be uninteresting — you can feel met, tracked, and respected, and the conversation can still go nowhere worth going. The envelope is a precondition, not the goal. The goal is the interesting exchange the envelope makes possible, and that is what we take up last.
Where are you, when you are talking to an AI? And how does the AI know — or appear to know, or design itself as if it knew — that you have come back?
The ML answer is about memory architectures, session management, elapsed-time tracking, and retrieval systems that know what is and is not relevant to the current moment. The UX answer is about making all of those visible and editable, designing the temporal archetype deliberately, choosing the frame rather than inheriting it, and treating context as the envelope in which your experience of the product actually lives.