This episode unveils OpenAI's ChatGPT Codex, an advanced AI agent for software development, exploring its potential to autonomously handle complex coding tasks and what this signifies for the future of AI-driven development and the infrastructure supporting it.
Episode Introduction: The Dawn of Agentic Software Engineering
Alessio, swyx, and their OpenAI guests, Josh and Alexander, dive into the launch of ChatGPT Codex. The conversation kicks off with a lighthearted observation about the demo videos portraying engineers coding alongside their "AI friends," setting the stage for a deeper exploration of human-AI collaboration in software development.
The Genesis of ChatGPT Codex: Personal Journeys and Motivations
- Alexander's Path to Agentic AI:
- Alexander, from OpenAI's product team, recounts his journey from working on "Multi," a human-to-human pair programming tool, to exploring human-AI collaboration.
- His work on desktop software and reasoning models at OpenAI led to a key insight: an agent is a "reasoning model with tools and an environment, guardrails, and then maybe like training on specific tasks."
- Early experiments involved giving models terminal access, with a "scientist" demo (an AI updating its own code) being a pivotal "AGI moment." This highlighted the need for safety and sandboxing, influencing the development of the Codex CLI.
- The drive for ChatGPT Codex stemmed from the need for models to "think for longer," use bigger models, and perform more actions safely without constant approvals, essentially "giving the model its own computer."
- Josh's Leap into Agentic Development:
- Josh, founder of Airplane (an internal developer tool platform), joined OpenAI driven by the conviction that an "agentic software engineer" was an imminent "moon-landing kind of moment."
- His experience at Airplane with developer workflows, cloud deployment, and composing compute primitives provided a strong foundation.
- He connected with Alexander's team as they were conceptualizing ChatGPT Codex, immediately engaging in debates about its form factor (CLI vs. other interfaces, parallel execution).
- Alexander noted Josh's clear vision: "Here's exactly the change that I see in the world and therefore the type of product that I want to build... this is the only thing I want to work on."
ChatGPT Codex vs. Codex CLI: Beyond a Hosted Solution
- Josh clarifies that while ChatGPT Codex involves running agents in OpenAI's cloud, it's fundamentally about a new form factor. This encompasses UI binding, scalability, caching, permissioning, and collaboration, going beyond a simple hosted CLI.
- Alexander elaborates on ChatGPT Codex's evolution, driven by the concept of an "agent that is good at like independent software engineering work."
- This means capabilities beyond mere code generation, including adherence to instructions, inferring code style, and generating concise, useful PR descriptions that cite relevant code.
- A significant feature is its testing capability: the agent attempts to test its changes and reports outcomes with references to logs.
- SWE-bench, a benchmark that grades the functional correctness of code changes, was mentioned as a reference, though Alexander noted ChatGPT Codex aims for PRs that are genuinely mergeable, considering code style and clarity, not just functional correctness.
- Josh emphasizes the "feeling" of using ChatGPT Codex: an initial leap of faith followed by witnessing the agent's impressive long-running independence in writing code, creating mod scripts, and testing.
Strategic Implication for Crypto AI: The development of sophisticated AI agents like ChatGPT Codex signals a future where complex software, potentially including smart contracts or decentralized applications, could be developed and maintained with greater autonomy. Investors should monitor how these tools might lower barriers to entry for building on-chain or AI-integrated crypto projects.
Best Practices for Maximizing AI Agent Effectiveness
- Essential Setup:
- Install linters and formatters: Agents can automatically use these for code quality.
- Utilize commit hooks: Beneficial for agent-driven development workflows.
- `agents.md` - The Agent's Playbook:
- Josh highlights `agents.md` as crucial. This file allows users to provide hierarchical instructions to the agent, with awareness of subdirectories. OpenAI even uses GPT-4 to help draft these files.
- Start simple with `agents.md` and iterate. The future vision includes auto-generating this file based on PRs and feedback.
- Codebase Discoverability and Structure (Alexander):
- Analogy: A base reasoning model is like a "precocious college grad"; ChatGPT Codex has "a few years of job experience." `agents.md` is like their "first day at your company" guide.
- Make your codebase discoverable: Good engineering practices (modular design, clear naming) help agents navigate and understand the code, similar to onboarding new human developers.
- Language choice: Typed languages like TypeScript are preferred over JavaScript for better agent performance.
- Modular architecture: Critical for agents to effectively test and modify code. Poor architecture can drastically reduce an agent's (and human's) commit velocity.
- Intentional Naming: OpenAI's internal codename "wham" was chosen for its uniqueness in the codebase, making it easy for agents to find relevant sections via prompts.
- Human vs. Agent Readability:
- Josh believes that systems for human and AI developers are largely convergent. "How humans communicate to AI where to make the change... all those things aren't going to go away immediately and so I think the whole system still feels actually very human." Human oversight for review, deployment, and requirement specification remains vital.
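The hierarchical, subdirectory-aware `agents.md` behavior described above can be sketched as a root-to-leaf lookup, where instructions closer to the file being edited take precedence. This is an illustrative assumption about the lookup order, not OpenAI's actual implementation, and the function name is hypothetical:

```python
from pathlib import Path

def collect_agents_files(repo_root: str, target: str) -> list[Path]:
    """Collect agents.md files from the repo root down to the target's
    directory. Files closer to the target come last, so an agent reading
    them in order lets deeper, more specific instructions take precedence."""
    root = Path(repo_root).resolve()
    directory = Path(target).resolve()
    if directory.is_file():
        directory = directory.parent
    found = []
    # Walk from the target directory up to the repo root, collecting
    # any agents.md along the way, then reverse to get root-first order.
    for folder in [directory, *directory.parents]:
        candidate = folder / "agents.md"
        if candidate.is_file():
            found.append(candidate)
        if folder == root:
            break
    return list(reversed(found))
```

An agent following this scheme would concatenate the files in order, so repository-wide guidance comes first and subdirectory-specific instructions override it.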
Actionable Insight for Researchers: The `agents.md` concept and the emphasis on discoverable, modular codebases offer a framework for how researchers might structure data and instructions for more complex AI research tasks, potentially improving reproducibility and agent performance in scientific discovery.
The `agents.md` File: Purpose and Design Philosophy
- Alexander explains the rationale behind `agents.md` instead of reusing `readme.md` or creating branded files (e.g., `codexagent.md`).
- Agents require different information than human contributors. While agents read `readme.md`, `agents.md` provides specific instructions not automatically inferred.
- A generic name (`agents.md`) was chosen for openness and to avoid a proliferation of tool-specific instruction files, promoting interoperability. "Part of why we made the Codex CLI open source is that there are a lot of problems, like safety issues, that you need to figure out for how to deploy these things safely, and no one should have to figure these out more than once."
- Josh adds that agents can infer code style from the existing codebase, a task that often needs explicit instruction for human developers.
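To make the distinction from a `readme.md` concrete, a hypothetical root-level `agents.md` might contain instructions like the following (the contents are illustrative, not from the episode):

```markdown
# agents.md (repository root)

## Build and test
- Run `make test` before proposing any change.
- Lint with `make lint`; fix warnings rather than suppressing them.

## Conventions
- Prefer small, focused PRs with a one-paragraph description.
- Cite the files and functions a change touches in the PR body.
```

Subdirectories can carry their own `agents.md` with narrower, more specific guidance.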
Agent Design: Trusting the Model Over Deterministic Scaffolding
- swyx notes OpenAI's approach of trusting the model via prompting, questioning how context limits are managed with large codebases or extensive `agents.md` files.
- Josh clarifies `agents.md` is a file the agent actively retrieves and parses, not just a static system prompt. OpenAI's philosophy is to "teach the model why this is the right way to do it" rather than hardcoding behavior, focusing on providing robust tools for context management and codebase exploration.
- Alexander emphasizes that "the model is the product." Development involves deciding what users, developers, or the model itself should control.
- Many existing AI agents are complex state machines built by developers around multiple short model calls, limiting the problem's complexity to what a human developer can manage.
- OpenAI aims to push this state management and complexity into the model itself, enabling it to tackle more intricate tasks and eventually facilitate teams of agents. This involves curating training data to teach the model desired behaviors.
Strategic Implication for Investors: OpenAI's "model-as-the-product" philosophy, relying on emergent capabilities from larger models trained on diverse and complex tasks, suggests a long-term investment in foundational model capabilities. This contrasts with approaches focusing heavily on deterministic programming around smaller models.
Data, Development Cycles, and Long-Term Vision
- Alexander describes an "AGI-pilled" approach to model improvement: instead of specific interventions for every problem (like context window management), the strategy is to "give it harder and harder problems and then like it will just have an emergent property of managing its own context."
- swyx raises concerns about potentially slow and expensive development cycles if bug fixes require extensive data collection and retraining.
- Alexander acknowledges this is a "long-term play." OpenAI's strategy involves building powerful bespoke models (like ChatGPT Codex for coding) and then generalizing these learnings into larger, more capable foundation models (e.g., GPT-4.1 benefiting from coding-specific improvements). This can lead to outsized returns due to knowledge transfer across domains.
ChatGPT Codex Operational Parameters: Task Duration and Concurrency
- Task Length: While initial documentation suggested 1-30 minutes, Josh states the current hard cutoff is one hour (subject to change), with 30 minutes being a good ballpark for complex tasks. The average task time is significantly lower. This aligns with trends observed in other research, like the METR paper estimating a 1-hour average autonomous time for agents.
- Concurrency: Users can run multiple ChatGPT Codex tasks in parallel (e.g., 5-10 simultaneously). There's a limit of 60 tasks per hour, primarily for fraud prevention.
- Alexander stresses an "abundance mindset": users are encouraged to delegate many tasks in parallel without excessive prompt engineering, letting the agent explore different avenues.
Actionable Insight for Researchers: The ability to run multiple long-duration autonomous tasks concurrently opens avenues for large-scale experimentation and exploration in AI research, provided the underlying compute and environment can support it.
User Experience: Shifting to a Delegation Mindset
- swyx shares his initial experience of watching the agent code, then realizing the intended workflow is to "fire and forget," delegating tasks and moving on.
- Alexander suggests using ChatGPT Codex on a mobile phone as a way to reinforce this delegation mindset. The platform is web-responsive, though not yet integrated into the native ChatGPT mobile app.
The Compute Backbone: Environment and Safety
- ChatGPT Codex leverages a sophisticated compute platform, sharing some infrastructure with OpenAI's reinforcement learning (RL) efforts.
- Environment Customization (Josh):
- Users can set up environments and run scripts, primarily for installing dependencies.
- A REPL (Read-Eval-Print Loop) allows humans to interactively test the environment setup.
- Safety Measures:
- Crucially, when an agent is running a task, its internet access is currently cut off. Josh explains, "We still don't fully understand what letting an agent loose in its own environment is going to do... there's still a lot of risk to this category."
- Future plans include allowing limited, controlled network access.
- Interactivity: Direct human intervention or correction while an agent is mid-task is not yet supported. The current focus is on a "fully independent, deliver massive value in one shot" approach.
Strategic Implication for Crypto AI: The sandboxed, no-internet-access execution environment for agents highlights critical security considerations for any AI interacting with valuable assets or sensitive data, a paramount concern in crypto. Future developments in controlled external access will be key for agents interacting with blockchains or external data oracles.
Research Preview: Path to Full Release and Call for Feedback
- Alexander positions the ChatGPT Codex research preview as a "thought experiment" to explore the purest form of an "AGI-pilled" coding agent. It also serves as an experiment for how AGI might feel in non-developer roles.
- The vision is a future where humans handle ambiguous or creative tasks, delegating most other work to ubiquitous AI agents.
- Key areas for development towards a full release include:
- Multimodal inputs (e.g., visual inputs for UI generation).
- Expanded, safe access to external resources (like controlled network access).
- Tighter integration with existing developer tools and UIs.
- Call to Action:
- Josh seeks feedback on environment customization preferences (e.g., Docker images vs. dev containers).
- Alexander encourages users to experiment extensively with ChatGPT Codex, discover novel workflows, and provide feedback, especially during the current period of generous rate limits. "We just want you to try it and figure out what sticks, what works, what doesn't, how do you prompt it, and then we want to learn from that."
Pricing and Value Proposition
- While specific pricing is yet to be announced, Josh emphasizes OpenAI's goal: "We aim to deliver a lot of value... it's on us to show that and really make people realize like, wow, this is like doing very economically valuable work for me. And I think a lot of the pricing can fall from that."
- swyx notes the general desire in the market for the "cheapest form of code attention," highlighting the challenge of pricing such powerful tools.
Conclusion: Agentic AI is Reshaping Development's Future
ChatGPT Codex represents a significant step towards autonomous AI software engineers, capable of handling complex, long-running tasks. For Crypto AI investors and researchers, this signals a need to monitor advancements in agentic AI, its impact on development paradigms, and the evolving infrastructure and safety protocols required to harness its potential.