Autonomy Is All You Need
– Michele Catasta, Replit (by AI Engineer)
Quick Insight: This summary is for builders moving beyond simple AI chat to fully autonomous software creation. It explains how Replit is removing the steering wheel so non-technical users can build production-grade apps without making a single technical choice.
This episode answers:
- Why is long-running autonomy often a vanity metric rather than a feature?
- How does Replit use Playwright to fix the painted door problem in AI-generated code?
- Why is the core loop the best orchestrator for parallel agent tasks?
Michele Catasta, VP of AI at Replit, argues that the future of software isn't AI assistance but total autonomy. By offloading technical decisions to agents, Replit aims to turn every knowledge worker into a software creator.
Top 3 Ideas
1. The Waymo Experience
- Full Autonomy: Replit is building for the non-technical user who cannot provide technical feedback. This means the agent must handle every architectural choice without human intervention.
- Rethinking Autonomy: True autonomy is measured by the irreducible amount of work an agent completes without user input. This shifts the goal from how long a model runs to how much value it delivers.
- Technical Collaborator: Agents are transitioning from simple autocomplete tools to full-stack partners. This allows users to focus on what they are building rather than how it is built.
2. The Verification Pillar
- Painted Doors: Over 30% of AI-generated features are broken on the first attempt. Automated verification ensures the agent catches these errors before the user ever sees the interface.
- Playwright Integration: Replit agents write their own browser tests to verify functional correctness. This creates a regression suite that makes the software more resilient over time.
3. Orchestration and Parallelism
- Sub-agent Orchestration: Breaking tasks into fresh context windows prevents the main loop from getting confused. This separation of concerns keeps the agent coherent during long-running jobs.
- Parallel Execution: Moving task decomposition from the user to the agent reduces cognitive load. It allows the system to handle merge conflicts and speed up the development cycle.
Actionable Takeaways
- The Macro Shift: Software development is moving from human-led logic to agent-led verification.
- The Tactical Edge: Use sub-agents to isolate testing from creation to prevent context pollution.
- The Bottom Line: The technical barrier is evaporating. In the next 12 months, the winning platforms will be those that require the fewest technical decisions from the user.

Replit is engineering a transition from technical assistants to fully autonomous software agents that require zero user intervention or coding knowledge.
The Shift to Waymo-Style Autonomy
- Catasta distinguishes between supervised autonomy and full autonomy. He compares current coding tools to Tesla Full Self-Driving (FSD), which requires a licensed driver to manage edge cases. Replit targets a Waymo-style experience in which the user sits in the back seat and there is no steering wheel.
- Replit defines autonomy as the ability for an agent to make technical decisions independently.
- The target audience is knowledge workers who cannot provide technical feedback.
- Catasta argues that autonomy should not be a vanity metric based on runtime alone.
- Success depends on maximizing "irreducible runtime," which is the duration an agent operates without human intervention.
- “We should offload completely the level of complexity away from them.” Speaker: Michele Catasta.
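One way to make the runtime metric above concrete is to split a session at each point where the user had to step in, then look at the lengths of the uninterrupted stretches. The sketch below is illustrative only: the function name, the use of seconds, and the choice to summarize with the longest stretch are assumptions, not Replit's actual definition.

```python
def uninterrupted_stretches(start, end, interventions):
    """Split the session [start, end] at each human intervention.

    All arguments are timestamps in seconds (an assumption for this sketch).
    Returns the length of every stretch the agent ran without user input.
    """
    # Ignore interventions outside the session, then cut at the rest.
    inside = sorted({t for t in interventions if start < t < end})
    cuts = [start] + inside + [end]
    return [later - earlier for earlier, later in zip(cuts, cuts[1:])]


# A 100-second session in which the user stepped in at t=30 and t=90.
stretches = uninterrupted_stretches(0, 100, [30, 90])
longest = max(stretches)
```

Under this reading, two agents with the same total runtime can score very differently: the one that needs fewer technical decisions from the user has the longer uninterrupted stretch.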
Eliminating Painted Doors via Verification
- Agents frequently create "painted doors": UI elements that appear functional but have no working code behind them. Internal Replit data shows 30% of agent-generated features are broken on the first attempt.
- Replit uses Playwright (a framework for automating web browser interactions) to verify that the generated code is functionally correct.
- The system writes Playwright scripts to simulate user behavior and catch errors.
- This autonomous testing breaks the feedback bottleneck that arises because non-technical users cannot report technical faults.
- Catasta notes that Playwright code is more expressive than standard tool-calling libraries like Stagehand.
- “Without testing, agents build a lot of painted doors.” Speaker: Michele Catasta.
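A painted-door check of the kind described above might look like the following minimal sketch, written against Playwright's Python sync API. The heuristic (a working control should change the page's HTML when clicked) and the function names are assumptions made for illustration; the talk does not describe how Replit's agents actually structure their tests.

```python
def find_painted_doors(page, selectors):
    """Return the selectors whose click leaves the page HTML unchanged.

    `page` is expected to behave like a Playwright Page: it needs
    `content()` (current HTML) and `click(selector)`.
    """
    painted = []
    for selector in selectors:
        before = page.content()
        page.click(selector)
        # Heuristic (assumption): a wired-up control changes the DOM.
        # Handlers that update the page asynchronously would need a wait here.
        if page.content() == before:
            painted.append(selector)
    return painted


def check_url(url, selectors):
    """Run the check in headless Chromium.

    Requires `pip install playwright` and `playwright install chromium`.
    """
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return find_painted_doors(page, selectors)
        finally:
            browser.close()
```

Calling `check_url` with an app URL and a list of button selectors (both supplied by the caller) would return the controls that look clickable but do nothing. Keeping such checks in the project is what turns them into a regression suite over time.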
Orchestration Over Context Volume
- Catasta claims that massive context windows (the amount of data a model can process at once) are unnecessary for long-horizon tasks. Efficient state management and sub-agent orchestration provide better results than 100-million-token windows.
- Replit uses sub-agents to maintain a "separation of concerns" (the design principle of dividing a program into distinct sections).
- Agents persist memories and plans in the file system rather than keeping everything in the active prompt.
- Sub-agents start with a blank slate and receive only the specific context needed for their task.
- This approach increased Replit's "memories per compression" metric from 35 to 50.
- “Long context models are not needed to work on coherent and long trajectories.” Speaker: Michele Catasta.
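The pattern above, with state kept on disk and each sub-agent given a deliberately small starting context, can be sketched as follows. The class and function names, the JSON file layout, and the shape of the context dictionary are all hypothetical; this is not Replit's implementation.

```python
import json
from pathlib import Path


class FileMemory:
    """Persists named notes (plans, memories) as JSON files in a directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name, data):
        (self.root / f"{name}.json").write_text(json.dumps(data))

    def load(self, name):
        return json.loads((self.root / f"{name}.json").read_text())


def fresh_context(memory, task, needed_notes):
    """Build a sub-agent's starting context from a blank slate.

    Only the task and the explicitly requested notes are included;
    nothing from the main loop's conversation history is carried over.
    """
    notes = {name: memory.load(name) for name in needed_notes}
    return {"task": task, "notes": notes}
```

The main loop decides which notes a sub-agent needs, so everything else it has accumulated stays out of that sub-agent's prompt.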
Parallelism as a UX Necessity
- Parallelism is required to maintain user engagement during long-running tasks. Current parallel agents require users to decompose tasks and resolve merge conflicts (conflicting edits that arise when two agents modify the same code).
- Replit is developing a "Core Loop Orchestrator" where the agent handles task decomposition.
- The agent manages parallel threads and attempts to mitigate merge conflicts automatically.
- Parallelism allows testing to run alongside code generation to reduce total latency.
- Catasta views extra compute as a necessary trade for improved user experience.
- “The core loop as an orchestrator is going to be our main bet for the next few months.” Speaker: Michele Catasta.
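As a rough illustration of an orchestrator that fans work out and then reconciles the results, the sketch below runs sub-tasks in a thread pool and flags files that two sub-agents edited differently. The function names and the representation of edits as `{path: content}` dictionaries are assumptions. It only detects conflicts; resolving them automatically is the harder problem the talk points to.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, worker, max_workers=4):
    """Run worker(subtask) for every subtask in parallel, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, subtasks))


def merge_edits(edit_sets):
    """Combine per-agent {path: content} edits.

    Returns (merged, conflicts): `merged` holds paths every agent agrees on,
    `conflicts` lists paths that received differing content.
    """
    proposals = defaultdict(set)
    for edits in edit_sets:
        for path, content in edits.items():
            proposals[path].add(content)
    merged = {p: next(iter(c)) for p, c in proposals.items() if len(c) == 1}
    conflicts = sorted(p for p, c in proposals.items() if len(c) > 1)
    return merged, conflicts
```

Here `worker` stands in for a sub-agent invocation, and the core loop would be the caller that chooses the sub-tasks and decides what to do with any conflicting paths.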
The New Metric:
- Investors should evaluate agents based on "irreducible runtime" rather than total runtime or context window size. This measures how long an agent can work before requiring a technical decision from a human.
Verification Bottleneck:
- Capital is moving toward autonomous verification layers. Generation is becoming a commodity while functional testing is the new moat for reliability.
Orchestration Moats:
- The value in the AI stack is shifting from the base model to the orchestration layer that manages sub-agents and parallel execution.
Strategic Conclusion:
- Replit is building a system where the agent acts as the primary orchestrator of software creation. This removes the technical burden from the user. The next industry milestone is the perfection of autonomous merge conflict resolution in parallel agentic workflows.