Architecture & Runtime

This chapter explains how Codex completes a task from an engineering perspective: where instructions come from, how tools are invoked, and how sandbox and approvals intervene. After reading it, you should understand why AGENTS.md should stay short and why Skills do not flood context on startup.


1. Macro flow of a task

Codex CLI follows a typical ReAct-style agent loop (Reason + Act):

flowchart LR
  A[User prompt] --> B[Load instruction chain]
  B --> C[Model plans]
  C --> D{Need tools?}
  D -->|Yes| E[Tool call]
  E --> F[Sandbox / approval check]
  F --> G[Execute and observe]
  G --> C
  D -->|No| H[Summary / patch output]

Key points:

  1. At session start, Codex builds an instruction chain (global AGENTS.md + path-scoped AGENTS.md files)
  2. The model picks tools: read files, apply patches, run Shell, call MCP, enable Skills, etc.
  3. Shell commands and writes outside the allowed scope are gated by sandbox_mode and approval_policy
  4. The loop continues until the task finishes, the user stops it, or limits are hit

Unlike a one-shot chat completion, a task is not confined to a single turn: each tool result is fed back to the model, so state carries forward across iterations of the loop.
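
The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not Codex's implementation: Step, run_agent, scripted_model, and the read_file tool are hypothetical names, the model is replaced by a scripted stub, and the sandbox/approval check is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model decision: either call a tool or finish with an answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    final: Optional[str] = None


def run_agent(prompt: str,
              model: Callable[[list], Step],
              tools: dict,
              max_steps: int = 8) -> str:
    """Plan, optionally call a tool, observe, and repeat until done."""
    history = [("user", prompt)]
    for _ in range(max_steps):
        step = model(history)                        # "Model plans"
        if step.final is not None:                   # "Need tools?" -> No
            return step.final
        observation = tools[step.tool](**step.args)  # tool call (no gate here)
        # Feed the observation back so the next plan can build on it.
        history.append(("tool", f"{step.tool}: {observation}"))
    return "stopped: step limit reached"


def scripted_model(history: list) -> Step:
    """Hypothetical stand-in for the model: read one file, then finish."""
    observations = [text for role, text in history if role == "tool"]
    if not observations:
        return Step(tool="read_file", args={"path": "README.md"})
    return Step(final=f"done after {len(observations)} tool call(s)")


fake_files = {"README.md": "# demo"}
result = run_agent("summarize the repo", scripted_model,
                   {"read_file": lambda path: fake_files[path]})
print(result)  # done after 1 tool call(s)
```

The history list is the only state here, which is the point: the second call to the model sees the first tool result, and the max_steps limit stands in for the "limits are hit" exit.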


2. Instruction chain

At the start of each run (typically each new TUI session), Codex resolves instructions in this order:

  1. Global: ~/.codex/AGENTS.override.md or ~/.codex/AGENTS.md (first non-empty wins)
  2. Project: From Git root down to the current working directory, at most one file per directory:
    • AGENTS.override.md → AGENTS.md → names in project_doc_fallback_filenames
  3. Merge: Concatenate root → leaf; closer to CWD = higher priority (appears later in the prompt)

Default cap: ~32 KiB merged (project_doc_max_bytes); overflow is truncated.
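
The resolution order can be sketched as follows. This is a simplified model under stated assumptions, not Codex's code: pick_file and build_chain are hypothetical helpers, fallback filenames are omitted, and the byte-level truncation is only one plausible reading of "overflow is truncated".

```python
import tempfile
from pathlib import Path

CANDIDATES = ["AGENTS.override.md", "AGENTS.md"]  # fallback names would follow
MAX_BYTES = 32 * 1024  # mirrors the ~32 KiB default described above


def pick_file(directory: Path, names=CANDIDATES):
    """Return the first non-empty candidate in a directory, or None."""
    for name in names:
        path = directory / name
        if path.is_file() and path.read_text(encoding="utf-8").strip():
            return path
    return None


def build_chain(global_dir: Path, git_root: Path, cwd: Path,
                max_bytes: int = MAX_BYTES) -> str:
    """Concatenate global instructions, then project files root -> leaf."""
    # Directories from the Git root down to cwd, inclusive.
    dirs = [git_root]
    for part in cwd.relative_to(git_root).parts:
        dirs.append(dirs[-1] / part)

    picked = [pick_file(global_dir)] + [pick_file(d) for d in dirs]
    merged = "\n\n".join(p.read_text(encoding="utf-8") for p in picked if p)
    # Truncate on bytes; drop a trailing code point if the cut splits it.
    return merged.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


with tempfile.TemporaryDirectory() as tmp:
    home, repo = Path(tmp, "home"), Path(tmp, "repo")
    leaf = repo / "services" / "api"
    home.mkdir()
    leaf.mkdir(parents=True)
    (home / "AGENTS.md").write_text("global rules", encoding="utf-8")
    (repo / "AGENTS.md").write_text("root rules", encoding="utf-8")
    (leaf / "AGENTS.md").write_text("api rules", encoding="utf-8")
    (leaf / "AGENTS.override.md").write_text("api override", encoding="utf-8")
    chain = build_chain(home, repo, leaf)

print(chain.split("\n\n"))  # ['global rules', 'root rules', 'api override']
```

In the demo, the override file in the leaf directory shadows the AGENTS.md next to it, the intermediate services directory contributes nothing, and the leaf text lands last in the merged prompt.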

Design implications

Observation                                  Reason
───────────                                  ──────
Subdirectory rules override root             Later concatenation = higher priority
Keep AGENTS.md concise                       Shares context budget with system prompt and Skill metadata
Put rules in the nearest relevant directory  Less noise, better hit rate

Live system/developer/user instructions beat AGENTS.md; AGENTS.md beats the model’s default coding habits.


3. Skills: progressive disclosure

Skills follow the Agent Skills open standard (SKILL.md + YAML frontmatter). Codex uses progressive disclosure:

Discovery             Selection              Execution
─────────             ─────────              ─────────
name/description   →  full SKILL.md body  →  references/, scripts/
(resident metadata)   (only when selected)   (only when needed)

Benefits:

  • Many Skills in a repo without loading all bodies at startup
  • Explicit invocation via $skill-name and implicit invocation when the task matches
  • Clear description text with trigger scenarios improves selection accuracy
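
The three stages can be illustrated with a small sketch. It is an assumption-laden toy, not Codex's loader: Skill, discover, and select are hypothetical names, the release-notes skill is invented for the example, and the frontmatter is parsed as flat key: value lines rather than real YAML.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    source: str          # raw SKILL.md text (a file path in practice)
    body_loads: int = 0  # counts how often the full body was pulled in

    def body(self) -> str:
        """Selection stage: return everything after the frontmatter."""
        self.body_loads += 1
        return self.source.split("---", 2)[2].strip()


def discover(source: str) -> Skill:
    """Discovery stage: read only name/description from the frontmatter."""
    _before, frontmatter, _body = source.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _sep, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], source)


def select(task: str, skills):
    """Explicit $skill-name invocation; implicit matching is the model's job."""
    return next((s for s in skills if f"${s.name}" in task), None)


SKILL_MD = """---
name: release-notes
description: Draft release notes from merged PRs. Use when asked to cut a release.
---
1. Collect merged PRs since the last tag.
2. Group them by label and write the summary.
"""

skills = [discover(SKILL_MD)]
# Only this metadata stays resident in context after startup.
resident = [f"{s.name}: {s.description}" for s in skills]

chosen = select("please run $release-notes for v2", skills)
instructions = chosen.body()  # the body is loaded only now
```

Until select returns a match, the context cost of a Skill is one line of metadata, which is why a precise description with trigger scenarios matters so much.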

Plugins package Skills for distribution and may bundle MCP config and App metadata (agents/openai.yaml).


4. Where MCP sits

MCP (Model Context Protocol) connects Codex to systems outside the repo:

┌──────────┐    MCP Client     ┌─────────────┐
│  Codex   │ ◄──────────────► │ MCP Server  │
│  Agent   │   tools/resources│ (GitHub…)   │
└──────────┘                   └─────────────┘
  • Host: Codex
  • Server: External service (tools, resources, optional prompt templates)
  • Configured in ~/.codex/config.toml under [mcp_servers] (project .codex/config.toml after trusting the project)
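
On the wire, MCP messages are JSON-RPC 2.0 requests and responses, with methods such as tools/list and tools/call. The sketch below shows only the message shapes: fake_server and the get_issue tool are hypothetical, there is no transport, and the initialization handshake that a real client performs first is skipped.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)


def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def fake_server(msg: dict) -> dict:
    """Hypothetical in-process server exposing one tool, get_issue."""
    if msg["method"] == "tools/list":
        result = {"tools": [{"name": "get_issue",
                             "description": "Fetch an issue by number",
                             "inputSchema": {"type": "object"}}]}
    elif msg["method"] == "tools/call" and msg["params"]["name"] == "get_issue":
        number = msg["params"]["arguments"]["number"]
        result = {"content": [{"type": "text",
                               "text": f"Issue #{number}: demo"}],
                  "isError": False}
    else:
        # Standard JSON-RPC error code for an unknown method.
        return {"jsonrpc": "2.0", "id": msg["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


listing = fake_server(request("tools/list"))
call = fake_server(request("tools/call",
                           {"name": "get_issue", "arguments": {"number": 7}}))
print(json.dumps(call["result"], indent=2))
```

The host discovers tool names and schemas through tools/list, then the model's tool call is translated into a tools/call request; the text content that comes back becomes the observation in the agent loop.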

Skills vs. MCP:

  • Skill = workflow (order of steps, which tool names to use)
  • MCP = capability (Issues, browser, internal doc APIs)


5. Subagents: parallel and specialized

Complex work can be delegated to subagents (for example, one configured only to run tests, or one that reads production logs via MCP). Each subagent:

  • Has a narrower tool set and prompt
  • Can explore in parallel, reducing main-thread context bloat

Typical uses: code review (independent agent reads diff without implementation bias), large-repo exploration.
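
A rough sketch of the delegation pattern, assuming a thread pool stands in for parallel subagents. Subagent, the tool names, and the canned tool outputs are all hypothetical; the prompt field is carried along but not sent to any model in this stub.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Subagent:
    name: str
    prompt: str              # would be sent to the model; unused in this stub
    allowed_tools: frozenset

    def run(self, task: str, tools: dict) -> str:
        """Run with only the allowed tools and return a short summary."""
        scoped = {k: v for k, v in tools.items() if k in self.allowed_tools}
        notes = [f"{name} -> {fn(task)}" for name, fn in sorted(scoped.items())]
        return f"[{self.name}] " + "; ".join(notes)


ALL_TOOLS = {
    "read_diff": lambda task: "3 files changed",
    "run_tests": lambda task: "42 passed",
    "write_file": lambda task: "wrote file",
}

reviewer = Subagent("reviewer", "Review the diff only.", frozenset({"read_diff"}))
tester = Subagent("tester", "Run tests, never edit.", frozenset({"run_tests"}))

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(agent.run, "check PR", ALL_TOOLS)
               for agent in (reviewer, tester)]
    # Results are collected in submission order, not completion order.
    summaries = [f.result() for f in futures]

for line in summaries:
    print(line)
```

The main thread receives two one-line summaries instead of the full diff and test log, and neither subagent can reach write_file because it is not in its allowed set.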


6. Sandbox and approvals: two gates

These two gates are a major difference between Codex and Shell agents that run with full permissions.

Concept          Question it answers
───────          ───────────────────
sandbox_mode     What read/write is technically allowed (e.g. read-only, workspace-write)
approval_policy  Must the agent stop and ask when crossing boundaries (untrusted / on-request / never)

Recommended local default:

codex --sandbox workspace-write --ask-for-approval on-request

Or in config.toml:

sandbox_mode = "workspace-write"
approval_policy = "on-request"

In short:

  • Inside the workspace: read, edit, and run most commands
  • Writes outside workspace, network access, “untrusted” commands: trigger approval
  • --yolo (--dangerously-bypass-approvals-and-sandbox): only in disposable VMs
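
How the two gates combine can be sketched as a pair of functions. This is a deliberately simplified model, not Codex's actual decision logic: Action, sandbox_allows, and decide are hypothetical, only the on-request and never policies are modeled, and network access is treated as always outside the sandbox.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Action:
    kind: str       # "read", "write", or "network"
    path: str = ""


def inside(path: str, workspace: str) -> bool:
    """True if path is the workspace or below it, after resolving '..'."""
    p = PurePosixPath(posixpath.normpath(path))
    w = PurePosixPath(posixpath.normpath(workspace))
    return p == w or w in p.parents


def sandbox_allows(action: Action, sandbox_mode: str, workspace: str) -> bool:
    """Gate 1: is the action technically permitted by the sandbox?"""
    if action.kind == "read":
        return True
    if action.kind == "write":
        return sandbox_mode == "workspace-write" and inside(action.path, workspace)
    return False  # assumption: network is never allowed by the sandbox itself


def decide(action: Action, sandbox_mode: str,
           approval_policy: str, workspace: str) -> str:
    """Gate 2: when the sandbox says no, ask the user or refuse."""
    if approval_policy not in ("on-request", "never"):
        raise ValueError("policy not modeled in this sketch")
    if sandbox_allows(action, sandbox_mode, workspace):
        return "run"
    return "deny" if approval_policy == "never" else "ask"


ws = "/repo"
print(decide(Action("write", "/repo/src/app.py"), "workspace-write", "on-request", ws))  # run
print(decide(Action("write", "/etc/hosts"), "workspace-write", "on-request", ws))        # ask
print(decide(Action("write", "/repo/a.py"), "read-only", "on-request", ws))              # ask
print(decide(Action("network"), "workspace-write", "never", ws))                         # deny
```

Keeping the gates separate is the design point: sandbox_mode answers "can this happen at all", while approval_policy only decides what to do once the answer is no.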

Use /permissions in the TUI for temporary mode changes (e.g. read-only planning).


7. Models and reasoning

The CLI supports /model to switch models (GPT-5.x, Codex-tuned variants, and so on; availability depends on your account). Trade-offs include:

  • Instruction following
  • Multi-file refactor stability
  • Latency and cost

Use stronger reasoning models for long refactors; faster models for simple script generation.


8. Cloud and IDE relationship

Surface            Execution                  Typical use
───────            ─────────                  ───────────
CLI / App (local)  Local sandbox              Fast iteration, sensitive on-prem code
IDE extension      Local + editor context     Small incremental edits
Cloud              Remote container + GitHub  Async PRs, @codex on comments

AGENTS.md, Skills, and MCP can move across surfaces; see Migrate to Codex.


9. Mental model

Think of Codex as a contract engineer:

  • AGENTS.md = handbook + “how to build/test” from the README
  • Skills = standard operating procedures (SOPs)
  • MCP = accounts on internal systems
  • Sandbox / approvals = physical access control and change approval
  • Subagents = specialists pulled in for a subtask

Agent quality depends on the engineering guardrails, not just the model.


Next steps