AI Context Window Explained: Why Your AI Forgets Mid-Conversation
You're deep into a conversation with ChatGPT. An hour in, you reference something you explained at the start. The AI responds like it never heard you. You scroll up—the message is right there. But the AI can't see it anymore.
This is the context window at work. Understanding it is the first step to actually solving AI memory problems.
What Is a Context Window?
The context window is the AI's working memory. It's the total amount of text the model can "see" when generating a response—your messages, its previous responses, any system instructions, and uploaded files.
Think of it like a physical window sliding over a document. The AI can only read what's visible through that window; everything outside it simply doesn't exist for the purpose of generating the next response.
Context Window Sizes by Model
Different AI models have different context window sizes. Here's the landscape as of this writing:
| Model | Context Window | Approximate Words |
|---|---|---|
| GPT-4o | 128K tokens | ~96,000 words |
| GPT-4 Turbo | 128K tokens | ~96,000 words |
| Claude 3.5 Sonnet | 200K tokens | ~150,000 words |
| Claude 3 Opus | 200K tokens | ~150,000 words |
| Gemini 1.5 Pro | 1M tokens | ~750,000 words |
These numbers look impressive until you use them in practice. A detailed back-and-forth about a complex project can consume 10,000+ tokens per exchange. Upload a few documents and you're already at 50,000 tokens.
What "Tokens" Actually Means
AI models don't process words—they process tokens. A token is roughly 3/4 of a word on average, but it varies:
- Common words like "the" or "and" are single tokens
- Longer words get split into multiple tokens
- Technical terms and proper nouns often use more tokens
- Code typically requires more tokens than prose
- Non-English languages often use more tokens per word
A 128K token limit isn't 128,000 words. It's closer to 96,000 words for typical English text. Include code, technical content, or non-English text and that number drops further.
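As a back-of-envelope sketch of that arithmetic (assuming the ~3/4-words-per-token heuristic above; `estimate_tokens` is an illustrative helper, not a real tokenizer):

```python
# Rough token estimator using the heuristic that one token is about
# 3/4 of an English word, so each word costs about 4/3 tokens.
# This is an approximation, not a real tokenizer.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)

def fits_in_window(text: str, window_tokens: int = 128_000) -> bool:
    """Check whether text fits a model's context window."""
    return estimate_tokens(text) <= window_tokens

prose = "the " * 96_000  # ~96,000 words of typical English
print(estimate_tokens(prose))  # 128000
print(fits_in_window(prose))   # True -- just barely fits a 128K window
```

Real tokenizers vary by model and by content, which is why code and non-English text shrink the effective window further than this estimate suggests.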
What Happens When You Exceed the Window
When your conversation exceeds the context window, the AI doesn't crash or warn you. It simply stops seeing the oldest content.
In ChatGPT, this happens silently. Your messages from earlier in the conversation still appear in the interface—you can scroll up and read them—but the AI literally cannot access that text when generating responses.
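A minimal sketch of that silent truncation, assuming a hypothetical `count_tokens` function supplied by the caller:

```python
def truncate_history(messages, max_tokens, count_tokens):
    """Keep the newest messages that fit in the window and silently
    drop the oldest -- the behavior the chat interface hides from you."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break                       # everything older falls out
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["intro: project goals", "middle: requirements", "latest: question"]
window = truncate_history(history, max_tokens=6,
                          count_tokens=lambda m: len(m.split()))
print(window)  # the oldest message is gone
```

Note that nothing signals the drop to the user: the return value simply no longer contains the earliest messages.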
Why Bigger Windows Don't Solve Everything
Gemini 1.5 Pro offers a million-token context window. Problem solved, right?
Not quite. Three issues remain:
1. Attention Dilution
The more text in the context window, the harder it is for the AI to focus on what matters. Studies show that AI models struggle to use information in the middle of very long contexts—a phenomenon called "lost in the middle."
A smaller context window with relevant information often outperforms a massive window with everything you've ever said.
2. Cost and Speed
Processing larger context windows costs more (for API users) and takes longer (for everyone). Every token in the window gets processed for every response. That adds up.
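The growth is easy to underestimate: because the full window is re-read on every turn, total processing grows quadratically with conversation length. A quick illustration (pure arithmetic, no particular API assumed):

```python
def total_tokens_processed(tokens_per_turn: int, turns: int) -> int:
    """Sum the tokens the model must read across a conversation,
    assuming each turn adds a fixed amount and the full history is
    re-processed for every response."""
    window, processed = 0, 0
    for _ in range(turns):
        window += tokens_per_turn   # history grows each turn
        processed += window         # entire window read again
    return processed

# 2,000 tokens per exchange over 50 exchanges:
print(total_tokens_processed(2_000, 50))  # 2550000 tokens read in total
```

Fifty exchanges of 2,000 tokens each is only 100,000 tokens of history, but the model reads over 2.5 million tokens to produce all fifty responses.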
3. Cross-Conversation Context
Even a million-token window only covers the current conversation. Start a new chat and you're back to zero. The real memory problem isn't within conversations—it's across them.
Strategies for Working With Context Limits
Within a Single Conversation
- Front-load important context. Put critical information early in the conversation where it's less likely to fall out of the window.
- Summarize periodically. Ask the AI to summarize what it knows so far, then start a new conversation with that summary.
- Stay on topic. Tangents consume tokens. Keep conversations focused to preserve space for what matters.
- Use structured formats. Bullet points and clear headers help the AI find relevant information faster.
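The summarize-and-restart tactic above can be sketched as a small helper. Everything here is illustrative: `chat` stands in for whatever model call or copy-paste step you actually use:

```python
def compress_conversation(chat, history, keep_recent=4):
    """Replace older turns with a model-written summary, keeping only
    the most recent messages verbatim. `chat` is a stand-in for your
    actual model call: it takes a list of messages, returns a string."""
    if len(history) <= keep_recent:
        return history                      # nothing worth compressing
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = chat(["Summarize the key facts and decisions so far:"] + older)
    return [f"Summary of earlier discussion: {summary}"] + recent
```

Start the new conversation with the compressed list: critical facts survive in the summary while recent turns stay verbatim, at a fraction of the token cost.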
Across Conversations
This is where the real solution lives: external context that persists regardless of conversation length.
Custom instructions and ChatGPT's memory feature are attempts at this, but they're severely limited. Real persistent memory requires context files—documents the AI reads at the start of every conversation.
Claude Code supports this through CLAUDE.md files. You write markdown documents with your context, and Claude reads them automatically. Combined with a knowledge base system like Obsidian, you can give AI access to unlimited external memory.
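A minimal illustrative CLAUDE.md might look like this (the headings and contents are placeholders; structure it however your project needs):

```markdown
# Project context

## What this project is
One paragraph on what the project does and who it serves.

## Conventions
- Writing style, naming rules, and code style the AI should follow.

## Decisions already made
- Choices that are settled, so the AI never re-litigates them.

## Where things live
- Pointers to the Obsidian notes or docs worth reading for deeper context.
```

Because the file is read at the start of every conversation, anything you record here survives across chats without re-explaining.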
Your context window becomes a viewport into a larger system, not the entire system itself.
The Architecture Shift
Most people treat AI chat as the complete experience. Type a message, get a response. The conversation is the whole thing.
That architecture has a ceiling. No matter how big the context window gets, you're still constrained by what fits in a single conversation.
The shift is treating the chat as an interface to something larger. Your knowledge base. Your operational context. Your business memory. The conversation accesses this information rather than containing it.
This is how you move from AI that forgets everything to AI that knows everything relevant about your work.
Ready to Build AI Memory That Persists?
I've built a system using Claude Code and Obsidian that gives AI access to my entire knowledge base. No context window limits. No re-explaining. Every conversation starts with full context.
Get the Setup ($997)
What to Do Next
Understanding context windows is foundational. Now you know why AI forgets—it's not a bug, it's architecture.
The question is what you do with that understanding. You can optimize within the constraints: better prompts, periodic summaries, focused conversations. These help.
Or you can build around the constraints: persistent context files, external knowledge bases, AI that reads your documentation before every response. This transforms how you use AI entirely.
Both approaches are valid. One keeps you managing limits. The other eliminates them.