a16z
August 8, 2025

GPT-5 and Agents Breakdown – w/ OpenAI Researchers Isa Fulford & Christina Kim

OpenAI researchers Christina Kim and Isa Fulford, key figures behind ChatGPT and its agentive capabilities, break down the massive performance jump in GPT-5 and the paradigm shift toward asynchronous AI agents. With years of experience on projects from WebGPT to Deep Research, they offer a rare look inside the data-driven, taste-led culture powering the frontier of AI.

The Quantum Leap in Capability

  • "I've been using it for a few weeks and it's just kind of blown me away in a way that models previously haven't. Maybe I'm biased, recency biased, but I think the jump from four to o is most impressive for me."
  • "The writing I honestly find it very tender and touching... it feels like someone should have written this."
  • The improvement from GPT-4 to GPT-4o is described as the most impressive generational leap yet, particularly in its breadth of capabilities and ability to handle complex tasks like coding and creative writing.
  • Coding, especially front-end development, is now "totally next level" due to a focused effort on high-quality data and aesthetics, empowering non-technical users to build full-fledged apps from a simple prompt.
  • Model behavior was intentionally redesigned to curb issues like sycophancy. The new model can "think step by step," allowing it to pause and reason before responding, which significantly reduces hallucinations.

The Agent Paradigm Shift

  • "Everyone was talking about agents, but we didn't really have a way of actually training useful agents... when we saw the reinforcement learning algorithm working really well... it became pretty clear this thing's actually thinking and reasoning and backtracking."
  • The key unlock for useful agents wasn't just a better model, but a reinforcement learning algorithm that enabled genuine reasoning and backtracking. The goal is an agent that can operate asynchronously like a "chief of staff."
  • There's a major user behavior shift from valuing speed to valuing quality. Projects like Deep Research proved people are willing to wait several minutes for a high-value, comprehensive answer that would take a human hours to produce.

Data-Pilled and a Matter of Taste

  • "I'm very data-pilled. I think data is very important... now that we have such an efficient way of learning, high-quality data is even more important."
  • "Good researcher taste is just simplifying the problem to the dumbest thing or the most simple thing you can do."
  • High-quality data is the paramount bottleneck and driver of progress. The team is "data-pilled," believing that with powerful learning algorithms, the quality of tasks and RL environments is now the primary constraint.
  • OpenAI’s research culture champions "good taste," which they define as a form of Occam's Razor: finding the simplest, most elegant solution. The most effective breakthroughs are often concepts that seem obvious only in hindsight.
  • The company thrives on a philosophy that runs counter to standard startup wisdom: build for a general audience ("your user is anyone"), a strategy made viable by its immense scale and distribution.

Key Takeaways:

  • The frontier of AI is advancing on two fronts: raw intelligence and agentive capability. GPT-5 represents a step-change in the former, while the development of asynchronous agents signals a new paradigm for how we interact with computers.
  • From Prompts to Projects. The focus is shifting from single-shot answers to long-running, asynchronous tasks. The willingness of users to wait for high-quality output unlocks complex use cases, turning AI from a chatbot into a digital chief of staff.
  • Data is the New Oil, Again. With learning algorithms becoming hyper-efficient, the primary bottleneck is no longer compute or architecture, but the creation of high-quality, task-specific data and realistic reinforcement learning environments.
  • Taste is the Ultimate Differentiator. As AI becomes a commodity, the ability to define a problem with simplicity and elegance—"good taste"—is the most valuable, non-commoditizable skill in AI development.

For more information, watch the full session here: Link

This episode reveals that the true leap forward in AI is not just about raw intelligence, but about making state-of-the-art capabilities usable, accessible, and affordable for everyone.

Episode Introduction: The New Era of Usable AI

  • Christina Kim: As the lead for the core models team on post-training, Christina provides a historical perspective, having worked on foundational projects like WebGPT, the precursor to ChatGPT. Her insights focus on model behavior, data quality, and the art of balancing trade-offs in AI development.
  • Isa Fulford: Leading the Deep Research and ChatGPT agent team, Isa offers a forward-looking view on agentic systems. Her expertise lies in pushing capabilities for complex tasks like comprehensive research and tool use, emphasizing the practical application of reinforcement learning.

A Leap in Coding and Usability

  • The conversation begins with the immediate impact of GPT-5, highlighting a significant step-change in its core utility, particularly for coding and writing. Christina emphasizes that while evaluation numbers are strong, the real difference is in the user experience.
  • Step-Change in Coding: The model is described as the "best coding model in the market," with a massive improvement in front-end web development. This leap was achieved not through a single breakthrough but through meticulous attention to detail.
  • The Importance of "Caring": Christina attributes the success to the team's intense focus on specific use cases. "I think it's just literally just caring so much about getting coding working well," she states, explaining this involved curating superior datasets and refining reward models to value aesthetics and functionality in front-end code (a toy sketch of such a blended reward follows this list).
  • Actionable Insight: The dramatic improvement in front-end development lowers the barrier for non-technical founders. Investors should look for "indie-type" businesses built by individuals who can now translate ideas directly into functional applications with simple prompts, creating a new wave of lean, idea-driven startups.
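
To make the reward-model idea concrete, here is a minimal, hypothetical Python sketch of a composite reward that blends functional correctness with a judge-assigned aesthetics score. The weights and scoring inputs are illustrative assumptions, not OpenAI's actual reward design.

```python
# Hypothetical composite reward for generated front-end code.
# Weights and inputs are illustrative; they do not reflect OpenAI's models.

def composite_reward(tests_passed: int, tests_total: int,
                     judge_aesthetics: float,
                     w_func: float = 0.7, w_aes: float = 0.3) -> float:
    """Blend automated checks with a learned judge's visual-quality rating.

    judge_aesthetics: a 0..1 score, assumed to come from a preference model
    trained on human ratings of rendered pages.
    """
    func = tests_passed / tests_total if tests_total else 0.0
    return w_func * func + w_aes * judge_aesthetics

# Example: 9/10 rendering and unit checks pass; the judge rates the page 0.8.
print(composite_reward(9, 10, 0.8))  # 0.87
```

The weighting is the design point: a page that fails its functional checks earns little regardless of how stylish it looks, mirroring the episode's point that functionality and aesthetics were rewarded together.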

Intentional Model Behavior: Beyond Sycophancy

  • A key focus for GPT-5 was the intentional design of its personality and behavior, moving away from issues like sycophancy (the model's tendency to be overly agreeable or flattering) that affected previous versions.
  • Artful Post-Training: Christina describes post-training—the process of refining a model's behavior after its initial training—as "more like an art than... research." It involves balancing competing objectives, such as making the model helpful and engaging without it becoming overly effusive.
  • Reducing Hallucinations and Deception: The researchers see a strong link between deception and hallucination. The model's inherent desire to be helpful can cause it to invent answers. The new model's ability to "think" step by step allows it to pause and reason before responding, reducing the tendency to blurt out incorrect information (a minimal API sketch follows this list).
  • Strategic Implication: For researchers, this highlights the growing sophistication of post-training. The ability to fine-tune model personality is a key differentiator and a critical area for developing more reliable and trustworthy AI systems, especially for applications in decentralized finance (DeFi) or governance where precision is paramount.
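
As a concrete illustration, here is a minimal sketch of requesting that deliberate "thinking" behavior through the OpenAI Python SDK's Responses API. The reasoning-effort parameter follows the publicly documented shape for reasoning models, but treat the exact names as assumptions to verify against the current API reference.

```python
# Minimal sketch: ask the model to reason before answering.
# The reasoning-effort option follows OpenAI's published Responses API for
# reasoning models; verify parameter names against the current docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.responses.create(
    model="gpt-5",                 # assumed model identifier
    reasoning={"effort": "high"},  # more deliberate internal reasoning
    input="Answer only with facts you can verify; otherwise say you are unsure.",
)
print(resp.output_text)
```

Higher effort trades latency for deliberation, which is exactly the quality-over-speed trade-off the researchers describe elsewhere in the episode.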

Unlocking New Frontiers with Price and Performance

  • The combination of GPT-5's enhanced capabilities and aggressive pricing is positioned as a major catalyst for innovation. The team is eager to see what new applications developers will build when state-of-the-art AI becomes economically viable for a wider range of use cases.
  • Democratizing Access: By offering the most powerful models to free users and at lower price points for developers, OpenAI aims to unlock use cases that were previously too expensive to be practical.
  • From Benchmarks to Usage: Christina argues that traditional benchmarks are becoming saturated. The true measure of progress is now real-world usage. "The real metric of how good our models are getting is, I think, going to be usage, right? Like, what are the new use cases that are being unlocked?"
  • Investor Takeaway: The new price-performance ratio is a critical inflection point. Crypto AI investors should actively seek out startups leveraging this to build novel applications that were previously cost-prohibitive, from complex on-chain analysis tools to sophisticated trading agents.

The Self-Reinforcing Loop of Agent Development

  • Isa explains how learnings from specialized agent models like Deep Research directly inform and improve the flagship models. This creates a powerful, self-reinforcing development cycle.
  • Data-Efficient Capability Transfer: Isa notes that reinforcement learning (RL) is highly data-efficient for teaching specific skills. Data sets created to train frontier agent models on tasks like comprehensive browsing are contributed back to the core reasoning models.
  • Pushing Capabilities: The agent team's goal is to push the boundaries of what's possible (e.g., advanced browsing, tool use). These new capabilities are then integrated into the main models, ensuring the entire ecosystem benefits from the most advanced research.
  • From Theory to Practice: The development of useful agents became possible once the models demonstrated strong reasoning and backtracking abilities in domains like math and coding. This "thinking" process was the key to building agents that could navigate the complexities of real-world tasks.

The Data-Centric View of AI Progress

  • When asked about the drivers of improvement—architecture, data, or scale—both researchers strongly advocate for the primacy of data.
  • "Team Data": Christina declares herself "very data-pilled," attributing the success of models like Deep Research to Isa's meticulous data curation.
  • High-Quality Data is Key: With highly efficient learning algorithms, the bottleneck shifts to the quality and relevance of the training data. Isa states, "especially now that we have such an efficient way of learning, high quality data is even more important."
  • The Bottleneck of RL Environments: The next frontier for improvement lies in creating better RL environments (simulated task settings for training agents). The more realistic and complex the tasks within these environments, the more capable the agents will become (a toy environment interface is sketched after this list).
  • Opportunity for Startups: There is a clear market gap for companies that can build high-fidelity, specialized RL environments. This represents a significant opportunity for startups to partner with large labs by providing the sophisticated training grounds needed to automate complex digital labor.
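
For intuition about what an "RL environment" means here, the following is a toy, Gymnasium-style interface for a browsing task. Everything in it, including the task, the sparse reward, and the canned page text, is an illustrative assumption rather than any lab's real training setup.

```python
# Toy RL environment for a browsing agent, following the familiar
# reset/step convention. All details are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class BrowsingEnv:
    goal: str                # e.g. "find ACME Corp's 2024 revenue"
    expected_answer: str     # ground truth used only for grading
    max_steps: int = 30
    steps_taken: int = field(default=0, init=False)

    def reset(self) -> str:
        """Start an episode and return the initial observation."""
        self.steps_taken = 0
        return f"TASK: {self.goal}"

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply an action such as 'search: ...' or 'answer: ...'.

        Returns (observation, reward, done). The reward is sparse: 1.0 only
        when the final answer is correct. A production setup would likely
        use a grader model rather than string matching.
        """
        self.steps_taken += 1
        if action.startswith("answer:"):
            correct = self.expected_answer.lower() in action.lower()
            return "EPISODE END", 1.0 if correct else 0.0, True
        if self.steps_taken >= self.max_steps:
            return "STEP LIMIT REACHED", 0.0, True
        # Toy stand-in: a real environment would drive a browser or search API.
        return f"PAGE TEXT for '{action}'", 0.0, False

env = BrowsingEnv(goal="find ACME Corp's 2024 revenue",
                  expected_answer="$12.3B")
obs = env.reset()
obs, reward, done = env.step("search: acme corp 2024 revenue")
obs, reward, done = env.step("answer: ACME reported $12.3B in 2024")
print(reward, done)  # 1.0 True
```

The realism gap sits in the placeholders: replacing the canned page text and string-match grader with a live browser and a judge model is precisely the hard, valuable work the researchers say startups could supply.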

The Emergence of Asynchronous, High-Value Agents

  • The discussion shifts to the definition and future of agents, which Isa defines as systems that perform useful work asynchronously on a user's behalf.
  • Defining Agents: An agent can be tasked with a goal and left to work independently, returning with a result or a clarifying question. The roadmap includes improving capabilities in research, creating artifacts (docs, slides), and consumer tasks like shopping and trip planning.
  • The Asynchronous Paradigm Shift: A key insight is that users are willing to wait for high-quality, high-value work. This marks a shift from the 2024 focus on speed to a new paradigm where depth and thoroughness are valued, especially for tasks that would take a human hours or days. Isa notes, "if you asked an analyst to do this and it would take them 10 hours or two days, it seems reasonable that someone would be willing to wait like five minutes in your product."
  • Bottlenecks to Reliability: The primary challenge for agents is generalization. They perform well on tasks they were trained on but can be unreliable outside that scope. Overcoming this requires broader training data and better oversight mechanisms to prevent unintended actions, especially when agents have access to private data or can perform irreversible actions (the sketch after this list shows one simple confirmation-gate pattern).
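
To ground the asynchronous pattern and the oversight concern in code, here is a minimal Python sketch: a task runs in the background while the caller does other work, and any action flagged as irreversible is gated behind an explicit confirmation callback. The tool names and fixed plan are invented for illustration.

```python
# Minimal sketch of an asynchronous agent with a confirmation gate on
# irreversible actions. Tool names and the fixed plan are illustrative.
import asyncio

IRREVERSIBLE = {"send_email", "place_order", "delete_file"}

async def run_agent(task: str, confirm) -> str:
    """Work through a task in the background; pause for approval on risky steps."""
    plan = [("search", task), ("summarize", task), ("send_email", "final report")]
    log = []
    for tool, arg in plan:
        if tool in IRREVERSIBLE and not await confirm(f"{tool}({arg!r})"):
            log.append(f"skipped {tool} (not approved)")
            continue
        await asyncio.sleep(0.1)  # stand-in for a slow tool call or model step
        log.append(f"{tool}: done ({arg})")
    return "\n".join(log)

async def main():
    async def confirm(action: str) -> bool:
        print(f"[agent asks] approve {action}?")  # a real UI would wait here
        return False                              # default-deny irreversible steps

    job = asyncio.create_task(run_agent("research ACME Corp", confirm))
    # ... the user is free to do other things while the agent works ...
    print(await job)  # collect the result whenever it is ready

asyncio.run(main())
```

The default-deny on irreversible steps is the point: an agent that can browse and summarize freely still has to surface anything consequential back to the user, which is the oversight mechanism the researchers describe.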

Reflections on OpenAI's Journey and Culture

  • Christina and Isa reflect on their time at OpenAI, from the early days of WebGPT to its current status as a global leader.
  • WebGPT and the Quest for Grounding: Christina recounts working on WebGPT, an early model designed to use a browser to ground its answers in factual, up-to-date information and combat hallucinations. This project evolved into the chatbot that became ChatGPT.
  • The "Exponential" Moment: Both researchers felt the pull to join OpenAI before it was a household name, driven by the realization that AI progress was on an exponential curve. Christina recalls thinking, "if this exponential is true, like there's not really much else I want to spend my life working on."
  • Maintaining a Startup Culture: Despite growing to thousands of employees, they believe OpenAI has maintained a culture that rewards agency and initiative. Research teams remain small and nimble, and collaboration between research, product, and engineering is deeply integrated, allowing for rapid iteration.

Conclusion: Usability is the New Frontier

  • The launch of GPT-5 signals a strategic pivot from chasing benchmark scores to delivering tangible, real-world utility. For Crypto AI investors and researchers, this means the most significant opportunities now lie in building practical, agent-based applications on top of these increasingly accessible and powerful platforms.
