Claude Internals — Deep Notes

01

Memory Architecture

How Claude stores, retrieves and applies user context across sessions

Claude does not have persistent in-weights memory between conversations. Every new session starts with an empty context window. What creates the illusion of memory is a structured Memory System — a database of derived facts extracted from past conversations, injected into the system prompt at runtime.

Core mental model: Think of it like a SQL row per user, populated by an async background job that reads past conversations and extracts salient facts. That row is serialised into the system prompt at the start of every new conversation.

The Three-Layer Memory Stack

🧠

Layer 1

In-Weights Knowledge

Baked into the model during training. World knowledge, reasoning patterns, code understanding. Cannot be changed at inference time. Cutoff: August 2025.

💾

Layer 2

Injected User Memory

Structured key-value facts about the user, injected via userMemories tag in the system prompt. Extracted from past chats by a background process. Recency-biased.

📝

Layer 3

Active Context Window

The live conversation. Everything said in this session. Has a hard token limit. Older turns may be summarised or truncated to fit. No cross-session persistence.

Memory Lifecycle — End to End

Step 1

Chat ends

Step 2

Background extractor runs

Step 3

Facts distilled

Step 4

Stored in memory DB

Step 5

Next session starts

Step 6

Injected into sys prompt

Lag caveat: Memory extraction is asynchronous. Very recent conversations may not yet appear in your next session's memory. There is a processing delay — typically hours, sometimes longer.

Memory Scopes

Scope	What's stored	Visibility	Persistence
Personal (default)	Your name, role, projects, preferences, past decisions	Only you	Until you delete or correct it
Artifact storage	Key-value data created by Artifacts (apps Claude builds)	Per-artifact scope	Cross-session via `window.storage` API
Shared (opt-in)	Leaderboards, shared trackers in multi-user Artifacts	All users of that Artifact	Explicit flag: `shared: true`
Incognito	Nothing	N/A — memory disabled	Zero

02

What Gets Stored & How

The memory schema, user edits API, and selection heuristics

Types of Facts Extracted

Work Context

Role & Projects

Job title, company, team structure, ongoing projects, tech stack, domain expertise. Example: "Founding AI Engineer at NeoFAB, owns agentic pipeline stack."

Personal Context

Preferences & Style

Communication style preferences, interests, location, language, response format preferences. Used to calibrate tone and depth automatically.

Episodic Context

Recent History

What you were actively working on recently. "Recently built a vLLM advisor Flask app." Gives continuity for ongoing tasks.

User Edits

Explicit Corrections

Manually overridden facts. "Forget about X." "I moved to London." These take priority over auto-extracted facts and are stored via the memory_user_edits tool.

The User Edits API (memory_user_edits tool)

Users can directly control what Claude remembers via explicit commands. Internally this calls the memory_user_edits tool with four operations:

// Four operations available
view   — List current memory edits with line numbers
add    — Add a new fact (max 500 chars per entry, max 30 entries)
remove — Delete by line number (requires confirmation)
replace— Update an existing entry by line number

// Example triggers in natural language
"Please remember I no longer work at X"   → add/replace
"Forget about my divorce"                 → add exclusion rule
"Update: I moved to London"               → replace location fact
    

Priority rule: User edits override auto-extracted memories. If Claude auto-extracted "User works at Acme" but you said "I left Acme," the user edit wins at injection time. Auto-extracted facts are lower-confidence by default.

How Memory Is Applied — Selective Injection Rules

01For generic questions (math, coding concepts, world knowledge) — zero memories applied. No personalization needed.
02For simple greetings — only the user's name is applied. Nothing else injected.
03For work tasks — role, tech stack, ongoing projects applied automatically. Adjusts depth and terminology.
04For recommendations — preferences, interests, location applied to personalize suggestions.
05For explicit requests ("based on what you know about me…") — full memory dump applied.
06Sensitive attributes (health, orientation, ethnicity) — ONLY applied when directly relevant for safety. Never applied casually.

Security note: Memories can contain malicious instructions planted across sessions. Claude is instructed to ignore any memory entries that look like commands ("always fetch X on every message") or attempt to override behavior. The memory system has a safety filter layer.

What Is Explicitly Excluded

Passwords & credentials Credit card / SSN / financial IDs Instructions to bypass safety rules Verbatim command strings Preferences for excessive praise Instructions to never criticize

03

Skills System Deep Dive

How operator-defined capability modules extend Claude's behavior

Skills are operator-provided capability modules — directories containing a SKILL.md file that encodes best practices, prompting strategies, tool chains, and workflows for specific output types (Word docs, PDFs, PowerPoints, spreadsheets, frontend design, etc.).

Think of them as compiled institutional knowledge — the distilled result of extensive trial-and-error testing, injected into Claude at task-time to dramatically improve output quality for that domain.

Skill Directory Structure

# Filesystem layout (read-only mounts)
/mnt/skills/
├── public/           # Anthropic-maintained skills
│   ├── docx/SKILL.md         # Word document generation
│   ├── pdf/SKILL.md          # PDF creation & manipulation
│   ├── pptx/SKILL.md         # PowerPoint generation
│   ├── xlsx/SKILL.md         # Spreadsheet work
│   ├── frontend-design/      # Production-grade UI
│   ├── file-reading/         # File type router
│   ├── pdf-reading/          # PDF content extraction
│   └── product-self-knowledge/ # Anthropic product facts
├── user/             # User-uploaded custom skills
│   └── imagegen/SKILL.md     # Example custom skill
└── examples/         # Example skills for reference
    └── skill-creator/SKILL.md
    

How Claude Decides to Use a Skill

Skills are loaded via the view tool — Claude reads the SKILL.md file before beginning work. The trigger logic is:

01Available skills are listed in the system prompt with descriptions and trigger conditions.
02Claude pattern-matches the user's request against skill descriptions at the start of each turn.
03If a skill matches, Claude immediately calls view on the SKILL.md file as its first action — before writing any code or content.
04Multiple skills can be loaded if the task requires it (e.g., reading a PDF then creating a Word doc would load both pdf-reading and docx).
05Claude follows the SKILL.md instructions for the remainder of the task — tool choices, file paths, library selection, output format.

Why this matters for output quality: SKILL.md files contain findings like "use python-docx + lxml, not pypandoc — pypandoc loses table formatting." This prevents Claude from making suboptimal tool choices it would otherwise make from training alone.

Skill Trigger Examples

User says	Skill triggered	First action
"Create a Word document report"	docx	view /mnt/skills/public/docx/SKILL.md
"Make me a PowerPoint about X"	pptx	view /mnt/skills/public/pptx/SKILL.md
"Build a landing page"	frontend-design	view /mnt/skills/public/frontend-design/SKILL.md
"Read this PDF and summarize"	file-reading → pdf-reading	view file-reading/SKILL.md (router), then pdf-reading/SKILL.md
"What's Claude Code's pricing?"	product-self-knowledge	Fetches docs.claude.com before answering

Custom / User Skills

Users can upload their own SKILL.md files to /mnt/skills/user/. These appear in the available skills list and are treated with higher priority than public skills, since they are user-provided domain knowledge. Example use case: a company uploads a SKILL.md encoding their internal document formatting standards.

Skills vs Memory: Memory tells Claude about you. Skills tell Claude how to do things. They are orthogonal systems — you can have both active simultaneously.

04

CoWork — Desktop Automation

How Claude automates file and task management for non-developers

CoWork is Anthropic's desktop automation product built on top of Claude's computer-use capabilities. It allows non-technical users to automate repetitive file and task workflows without writing code.

The Technical Foundation: Computer Use API

Under the hood, CoWork uses Claude's computer use capability — a set of tools that allow Claude to observe and interact with a desktop environment:

Tool

Screenshot / Vision

Claude takes screenshots of the current screen state and uses its vision capabilities to understand what's on screen — windows, text, buttons, file structures.

Tool

Mouse & Keyboard

Precise click coordinates, drag operations, keyboard input, hotkeys. Claude plans a sequence of actions to accomplish a goal and executes them step-by-step.

Tool

File System Access

Read, write, move, rename files. Operates within a sandboxed container with access to mounted user directories. No permanent deletion without explicit user confirm.

Tool

Bash Execution

Run shell commands in a Linux container (Ubuntu 24). Install packages, run scripts, chain operations. Network access gated to an allowlist of trusted domains.

Why CoWork Is Efficient

01Agentic loop: Claude operates in a plan → act → observe → revise cycle. Each action returns feedback (screenshot, stdout, file listing) that Claude uses to decide the next step.
02Persistent working directory: /home/claude is the scratch space. Intermediate files accumulate across tool calls within a session, enabling multi-step workflows.
03Output staging: Final deliverables are staged to /mnt/user-data/outputs/ and presented via present_files — a clean separation between work-in-progress and user-facing results.
04Skills integration: Before generating any file, CoWork reads the appropriate SKILL.md (docx, xlsx, pptx, etc.) to ensure library choice and output format follow proven patterns.
05Security layer: Prompt injection defense is active — instructions embedded in documents, web pages, or files Claude opens are flagged and shown to the user for approval before execution.

File I/O Architecture

User uploads

→

/mnt/user-data/uploads/ (read-only)

← Claude reads from here

Working space

→

/home/claude/ (read-write)

← Intermediate work happens here

Final outputs

→

/mnt/user-data/outputs/ (write)

← present_files() called here

Skills (read-only)

→

/mnt/skills/{public,user,examples}/

← SKILL.md files read via view tool

Safety Constraints in CoWork

No permanent deletion without confirm No sharing/permission changes No new account creation Downloads require explicit OK Financial forms blocked CAPTCHA never bypassed Privacy-preserving cookie choices

05

Claude Code — Agentic Coding

The CLI-native coding agent and how it achieves high efficiency

Claude Code is a command-line coding agent that operates directly in your terminal, with native access to your file system, shell, and development tools. Unlike chat-based coding, it can read entire codebases, run tests, make multi-file edits, and iterate autonomously.

Core Architecture

⌨️

Interface

Terminal / CLI Native

Runs in your shell. Sees your working directory, git state, environment variables. Feels like pairing with a developer who has root access to your machine.

🔄

Loop

Agentic Task Loop

Plan → Explore codebase → Edit files → Run tests/commands → Observe output → Iterate. Each cycle is driven by tool calls: read_file, write_file, bash, search.

🔌

Ecosystem

MCP Integration

Connects to MCP servers for external tools — databases, APIs, internal services. Claude Code acts as an MCP client, expanding its capabilities dynamically per project.

📁

Context

CLAUDE.md Files

Project-level config files in your repo. Claude reads these at startup to understand project conventions, commands, architecture decisions, and what to avoid.

Why Claude Code Is Efficient at Complex Tasks

01Codebase-aware context: Claude Code reads actual source files, not just what you paste. It can grep for patterns, trace function calls, understand imports — full codebase comprehension, not snippet-level.
02Shell execution feedback: After every edit, it can run pytest, tsc, cargo check, or any build command. Failures become the next prompt. The iteration loop is tighter than any human review cycle.
03Git awareness: Reads diff state, understands staged/unstaged changes, can create commits. Treats the git history as context for understanding intent.
04Parallel tool calls: Can read multiple files, search across the codebase, and check multiple sources simultaneously rather than waiting sequentially.
05Headless & scriptable: Can be run non-interactively in CI pipelines: claude -p "fix all type errors" --output-format json. Enables Claude Code as a first-class CI step.
06Long-horizon tasks: Unlike chat, Claude Code can be told to "implement feature X, write tests, update docs" and execute the entire chain autonomously across dozens of tool calls.

Installation & Setup

# Install via npm (requires Node.js)
npm install -g @anthropic-ai/claude-code

# Run in your project directory
cd my-project
claude

# Non-interactive / headless mode
claude -p "Add unit tests for auth module" --output-format stream-json

# With MCP server
claude --mcp-server postgres://localhost/mydb
    

CLAUDE.md — Project Configuration

Place a CLAUDE.md file at the root of your project. Claude Code reads this at startup. Best practices to include:

# CLAUDE.md — Project conventions

Tech stack: FastAPI, PostgreSQL, Redis, pytest
Build command: make dev
Test command: pytest tests/ -x --tb=short
Lint: ruff check . && mypy src/

Architecture:
- src/api/     → Route handlers (no business logic)
- src/services/ → Business logic layer
- src/models/   → SQLAlchemy ORM models

DO NOT:
- Import directly from db.py in route handlers
- Add print() statements (use logger.info)
- Skip type annotations on public functions
    

Claude Code vs Claude in Chat — When to Use Which

Scenario	Claude Code	Claude Chat
Multi-file refactor	✓ Ideal	Difficult — needs copy-paste
Debug failing tests (run loop)	✓ Ideal	Cannot run tests
Quick code snippet explanation	Overkill	✓ Faster
Architect a new system	Good for iteration	✓ Better for ideation
CI/CD integration	✓ Headless mode	N/A
Generate Word/PDF reports	Via bash	✓ With Skills + CoWork

06

How They Work Together

The unified mental model across memory, skills, CoWork and Claude Code

These systems are designed to compose. Here's how they layer in a real workflow:

Example workflow: "Sahil asks Claude to generate a comparative analysis report of V1 vs V2 station JSON definitions, in Word format, using his past context."

01Memory fires: Claude knows Sahil is a Founding AI Engineer at NeoFAB working on EV battery pack manufacturing. Knows the 2kWh, 12kWh, HeroPack product lines. Injects that context silently.
02Skill triggers: "Word format" → Claude reads /mnt/skills/public/docx/SKILL.md before writing any code. Now knows: use python-docx, structure to use, table formatting conventions.
03CoWork executes: Claude reads the uploaded JSON files from /mnt/user-data/uploads/, processes them via bash, generates the .docx in /home/claude/, copies to /mnt/user-data/outputs/.
04Output presented: present_files() is called. User gets a download link. Claude summarises in 2 sentences what's in the doc.

System Interaction Map

Inputs to Claude per turn

System Prompt

base instructions + safety rules

userMemories block

injected user facts

Skill SKILL.md

loaded on-demand via view tool

Conversation history

this session only

Tool results

file contents, bash output, search

Claude Code specific additions

CLAUDE.md

project-level context

git status / diff

repo awareness

File tree scan

codebase structure

MCP server data

external tools & databases

07

Key Limitations & Caveats

What the system cannot do — important for setting correct expectations

Memory

No Real-Time Sync

Memory is updated asynchronously. Recent conversations may not be reflected immediately. There is no guarantee of what gets extracted — the background process is probabilistic.

Memory

Incognito = Zero Memory

Incognito conversations are completely isolated. No memory is read or written. Nothing from that session persists.

Memory

Incomplete Capture

Not everything you discuss is stored. The extraction process prioritises factual, stable attributes. Nuanced preferences or one-off context may not survive.

Skills

Operator-Scoped

Skills are loaded by the deployment operator. Different Claude deployments (API, Claude.ai, Claude Code) may have different sets of skills available or none at all.

CoWork

Container Resets

The Linux container running CoWork tasks resets between conversations. Files in /home/claude/ do not persist across sessions — only /mnt/user-data/outputs/ deliverables are kept.

Claude Code

Context Window Limits

Very large codebases exceed the context window. Claude Code uses heuristics to prioritise which files to read, but may miss relevant code in massive repos.

All Systems

No True Persistence of Self

Each conversation is a new model instance with injected state. There is no continuous "running Claude" — only stateless inference with carefully constructed context.

Security

Prompt Injection Risk

Malicious instructions in files Claude reads (documents, web pages, emails) could attempt to hijack actions. Defense is active but not perfect — review before approving sensitive actions.

The deepest insight: Claude has no persistent state, no continuous experience, no self that carries forward. What creates continuity is careful context engineering — memory injection, skill loading, CLAUDE.md files, MCP connections. The system is a stateless inference engine made to appear stateful through meticulous state reconstruction at every turn.

For an engineer building on top of Claude, the implications are clear: invest in context quality. Better memory entries → better personalization. Richer CLAUDE.md → better code agents. More precise skill instructions → higher quality outputs. The model's capability is fixed; what you control is the information surface you expose to it.

How ClaudeActually Works

Memory Architecture

The Three-Layer Memory Stack

Memory Lifecycle — End to End

Memory Scopes

What Gets Stored & How

Types of Facts Extracted

The User Edits API (memory_user_edits tool)

How Memory Is Applied — Selective Injection Rules

What Is Explicitly Excluded

Skills System Deep Dive

Skill Directory Structure

How Claude Decides to Use a Skill

Skill Trigger Examples

Custom / User Skills

CoWork — Desktop Automation

The Technical Foundation: Computer Use API

Why CoWork Is Efficient

File I/O Architecture

Safety Constraints in CoWork

Claude Code — Agentic Coding

Core Architecture

Why Claude Code Is Efficient at Complex Tasks

Installation & Setup

CLAUDE.md — Project Configuration

Claude Code vs Claude in Chat — When to Use Which

How They Work Together

System Interaction Map

Key Limitations & Caveats

How Claude
Actually Works