Latent Space
March 27, 2025

Building Manus AI (first ever Manus Meetup)

This podcast delves into the origin story of Manus AI, a general AI agent, and the journey of its creators, from their first product, Monica, a Chrome extension, to their ambitious foray into AI browsers and ultimately, Manus. The discussion highlights the evolution of their thinking, the key learnings from past projects, and the core design principles behind Manus.

The Origin of Manus AI

  • "This name, this word 'Manus,' comes from an old Latin word... which is also the MIT motto. It's like 'Mens et Manus,' which is 'mind and hand.'"
  • "For the past few years, the problem is that we've already had very powerful LLMs, but we just lock it in a black box, give it only the pen and a very little notebook. But we don't give it a computer. We don't give it outside world access, and we ask them to do very hard work, but they can’t do that."
  • Manus aims to empower LLMs by providing them with tools and access to interact with the real world, much like providing hands to a mind.
  • The name "Manus" emphasizes the importance of action and real-world impact, contrasting with purely theoretical or simulated AI capabilities.
  • Manus is designed to bridge the gap between the potential of LLMs and their practical application in real-world scenarios.

From Monica to Manus: A Product Journey

  • "Monica is a Chrome browser extension... the problem that you have to switch between apps, copy-paste text, copy-paste image, right, it's really frustrating. So we think we... let user to like use AI just in the context without switching."
  • "After building Monica we always... thinking about the next move... because... not everyone is using Chrome and everyone is using browser. And... we talk about extension, many people didn't know what extension is. So after building Monica we always think about the next move."
  • Monica, a successful Chrome extension with millions of users, streamlined AI integration within the browser context.
  • Limitations of browser extensions, such as platform dependence and user unfamiliarity, prompted exploration of new product avenues.
  • The team's experience with Monica informed their approach to Manus, emphasizing user experience and accessibility.

The AI Browser Pivot and its Lessons

  • "We found there are some problems... browser is for single-user usage... Once the AI started to control your browser, you have to take your hands off the keyboard and the mouse. Even one very small move will just break the whole process."
  • "AI should use browser... AI knows a lot of techniques humans doesn’t know."
  • An ambitious AI browser project was abandoned after six months due to single-user limitations and usability challenges.
  • The key learning was that AI agents should operate in the cloud, using their own browsers, freeing up the user's computer.
  • This pivot led to the conceptualization of Manus, with a focus on cloud-based, autonomous AI agents.

Key Design Principles of Manus

  • "At the very heart of Manus, actually, we just keep it very simple but very... sophisticated structure and it just gives it more intelligence... provide more context to the LLM and not try to control the LLM thinking."
  • "We give Manus a computer... we are using... E2B... to... assign each Manus task with a virtual machine on the cloud."
  • Manus prioritizes a simple yet sophisticated architecture, focusing on providing context to the LLM rather than constraining its thinking.
  • Each Manus task operates within its own virtual machine in the cloud, enabling flexibility and scalability.
  • The system incorporates tools, data access, and a training mechanism (knowledge system) to enhance the agent's capabilities and user experience.
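
The "one isolated environment per task" idea above can be illustrated with a small sketch. This is not how Manus or E2B work internally: the class `TaskSandbox` and the function `run_task` are hypothetical names, and a private temporary directory stands in for a cloud virtual machine (it gives separate files per task, not real security isolation).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class TaskSandbox:
    """Hypothetical stand-in for a per-task cloud VM: a private temp directory."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        # Each task gets its own directory, created fresh and removed on close().
        self._tmp = tempfile.TemporaryDirectory(prefix=f"task-{task_id}-")
        self.root = Path(self._tmp.name)

    def write_file(self, name: str, text: str) -> Path:
        """Create a file inside this task's private directory."""
        path = self.root / name
        path.write_text(text, encoding="utf-8")
        return path

    def run_python(self, script_name: str) -> str:
        """Run a script with this task's directory as the working directory."""
        result = subprocess.run(
            [sys.executable, script_name],
            cwd=self.root,
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
        return result.stdout.strip()

    def close(self) -> None:
        """Throw the whole environment away once the task is finished."""
        self._tmp.cleanup()


def run_task(task_id: str, script: str) -> str:
    """Create a sandbox, run one script in it, and always clean up afterwards."""
    sandbox = TaskSandbox(task_id)
    try:
        sandbox.write_file("main.py", script)
        return sandbox.run_python("main.py")
    finally:
        sandbox.close()
```

Calling `run_task("demo", "print(1 + 1)")` runs the script in a fresh directory and discards that directory afterwards, so files written by one task are never visible to another. A real deployment would replace the temporary directory with a remote sandbox or VM.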

Key Takeaways:

  • Manus AI represents a shift from passive AI tools to active, autonomous agents capable of performing complex tasks in the real world.
  • The creators of Manus emphasize a user-centric approach, prioritizing accessibility and seamless integration into existing workflows.
  • The evolution of Manus highlights the importance of iterative development, learning from past projects, and adapting to the evolving AI landscape.

Actionable Insights:

  • Manus AI empowers LLMs with “hands” to interact with the real world.
  • The platform's cloud-based architecture allows for seamless task execution without disrupting user workflows.
  • Manus is designed for general use, catering to diverse needs beyond specialized coding tasks.

This episode unveils Manus, an AI agent designed to bridge the gap between powerful LLMs and real-world action, offering Crypto AI investors and researchers insights into the evolution of agentic AI and its practical implementation.

Introducing Manus: From 'Mind and Hand' to Actionable AI

  • Core Concept: Manus provides the necessary environment and tools for powerful LLMs to execute complex, multi-step tasks that require interaction with external systems (like browsers, files, APIs).
  • Speaker Context: Forest positions Manus not just as a technical solution but as a philosophical step forward in AI usability, emphasizing the need for action beyond pure reasoning.
  • Investor Insight: The core value proposition addresses a key limitation of current LLMs – their inability to reliably perform complex real-world tasks. This focus on the execution layer is a critical area for AI development and investment.

Benchmarking Manus: Performance and Cost-Efficiency on GAIA

  • Key Stat: Manus achieved an average cost of $2 per task on the GAIA benchmark, reportedly 10x cheaper than the previous state-of-the-art ($20) and significantly lower than other attempts ($100+).
  • Quote: “We found some articles discuss about the previous SOTA and research about when they run Benchmark... it seems like that vertical say like maybe $20 per for the previous SOTA... during our test we just solved the task... at average cost $2.” - Forest, highlighting the significant cost advantage.
  • Technical Term: GAIA Benchmark: A benchmark specifically designed to evaluate the performance of general-purpose AI agents on tasks requiring complex web navigation, tool usage, and multi-step reasoning, mimicking real-world challenges.
  • Investor Insight: Demonstrating strong performance on a recognized benchmark like GAIA, coupled with significant cost-efficiency, signals potential competitive advantages in the emerging AI agent market. Cost is a major factor in the scalability and economic viability of AI solutions.

GAIA Benchmark Examples: Showcasing Agent Capabilities

  • Astronaut Task: Required the agent to find a specific image on NASA's website from 2006, identify the smaller astronaut in the image, find his name, research all his space missions across the internet, and sum the total time spent in space – an answer not directly available online.
  • Dog Harness Task: Involved analyzing an image to identify the brand of harness worn by dogs, navigating to a specific blog post on a non-official website from a particular date, scrolling through a lengthy article, and extracting specific information about meat mentioned in the text related to the brand ambassador.
  • Gaming Task: Briefly mentioned a task related to World of Warcraft, showcasing the benchmark's diversity.
  • Narrative: These examples underscore the agent's ability to perform multi-step reasoning, navigate complex websites, interact with visual information, scrape data, and synthesize information from multiple sources – capabilities far beyond simple chatbots.
  • Investor Insight: The ability to handle such diverse and complex tasks demonstrates the potential for general-purpose agents like Manus to automate sophisticated workflows currently performed by humans, opening up significant market opportunities across various industries.

Positioning Manus: The First General AI Agent?

  • Competitive Analysis: Forest mentions analyzing 21 agent-focused projects from Y Combinator's W25 batch. He states Manus could cover the use cases of ~76% of these diverse agents (spanning medical, legal, marketing, etc.) and, in the team's admittedly biased view, often performed better.
  • Strategic Goal: The focus is on serving "average people" and "normal users" with universal task capabilities, differentiating Manus from specialized tools like coding assistants.
  • Investor Insight: The ambition to create a general agent targets a potentially massive market but also presents significant technical challenges. Success hinges on the agent's adaptability and robustness across unforeseen tasks, a key area for researchers to monitor.

Company Journey Part 1: Monica - The Precursor Extension

  • Monica Features: Included in-context article simplification (preserving structure), YouTube video summarization/podcast generation, and PDF interaction within academic sites.
  • Success Metrics: Monica achieved significant traction, boasting 20 million monthly active users and generating substantial revenue (targeting $50M ARR).
  • Investor Insight: The success of Monica demonstrates the team's ability to identify user pain points related to AI interaction, build a popular product, and achieve market fit. This track record adds credibility to their current venture with Manus.

Company Journey Part 2: The AI Browser Pivot - A Crucial Learning Experience

  • Challenges Encountered:
    • User Experience: Having AI control the user's primary browser proved frustrating. Users had to take their hands off the keyboard/mouse and couldn't multitask.
    • Uncertainty: Users had no idea when the AI task would finish, requiring constant attention without interaction.
    • High Bar for Browsers: Users expect a vast feature set from their browser before valuing AI additions, making it hard for a startup to compete with incumbents like Chrome.
  • Outcome: The project was canceled in September 2023, just two weeks before its planned release, despite significant progress. Forest notes the irony of seeing Arc Browser announce a similar pivot away from their initial browser concept around the same time.
  • Investor Insight: This pivot, though costly, reveals critical lessons about human-AI interaction and market realities. It showed that deep AI integration might require dedicated environments rather than retrofitting existing user interfaces like the primary browser. The team's willingness to abandon a major project based on these learnings demonstrates strategic adaptability.

Company Journey Part 3: Inspiration from Cursor and the Birth of Manus

  • Cursor Observation: Non-technical users focused entirely on the output or result (Cursor's "right panel") and ignored the underlying code generation ("left panel"), using it for tasks like data visualization or file processing, not traditional coding.
  • Key Insight: The team realized they should build the "right panel" (task execution) in the cloud, hiding the complexity, inspired by non-coder usage patterns.
  • Synthesized Learnings:
    1. AI should use its own browser/environment, not the user's primary one.
    2. AI tasks should run in the cloud, freeing the user.
    3. Building a full browser is strategically difficult; focus on the AI value layer.
  • Decision: In October 2023, combining these insights, the team decided to build Manus, essentially the "right panel of Cursor, in the cloud."
  • Technical Term: Cursor: An AI-native code editor that integrates LLMs deeply into the software development workflow.
  • Investor Insight: This origin story highlights how observing user behavior and combining lessons from previous attempts led directly to the Manus concept. It emphasizes a product strategy focused on abstracting complexity and delivering task completion value, informed by real-world user interaction patterns.

How Manus Works: The Technical Pillars

  • Compute Environment: Manus assigns each task a dedicated cloud virtual machine (VM) using E2B, an open-source platform providing secure, sandboxed cloud environments for AI agents. This choice allows full OS access (terminal, VS Code, browser) and future flexibility to run diverse software (Windows, Android apps if needed), unlike more restrictive container solutions.
    • Technical Term: E2B: A platform specifically designed to give AI agents access to secure, sandboxed cloud execution environments.
    • Technical Term: Virtual Machine (VM): A software-based emulation of a physical computer, allowing operating systems and applications to run in an isolated environment.
  • Data Access: Manus integrates pre-paid access to various data APIs (e.g., stock data, Twitter, LinkedIn search) to overcome limitations where necessary data isn't on the open internet or requires payment/authentication. This simplifies the process for the end-user.
  • Training & Personalization: A "Know How" system allows users to teach Manus their preferences and standard operating procedures (e.g., "always deliver resume screening results as a spreadsheet," "remember 'Sam' refers to Sam Altman"). This allows the agent's output to become more tailored over time.
  • Investor Insight: The technical architecture prioritizes flexibility and capability (VMs via E2B) over potentially simpler but more limited approaches. Integrating data APIs and personalization addresses practical hurdles in making AI agents genuinely useful for complex, real-world tasks. This infrastructure layer is a key differentiator.
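
To make the "Know How" idea concrete, here is a minimal sketch of how taught preferences might be stored and prepended to a task before it reaches the LLM. The source does not describe the actual mechanism; `KnowHowStore`, `teach`, `relevant`, and `build_prompt` are hypothetical names, and simple keyword matching is an assumed stand-in for whatever retrieval the real system uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowHowStore:
    """Hypothetical store of user-taught preferences, keyed by trigger keywords."""

    entries: list[tuple[frozenset[str], str]] = field(default_factory=list)

    def teach(self, keywords: list[str], instruction: str) -> None:
        """Remember an instruction and the words that should trigger it."""
        keys = frozenset(k.lower() for k in keywords)
        self.entries.append((keys, instruction))

    def relevant(self, task: str) -> list[str]:
        """Return instructions whose keywords appear in the task text."""
        words = set(re.findall(r"[a-z0-9']+", task.lower()))
        return [text for keys, text in self.entries if keys & words]


def build_prompt(store: KnowHowStore, task: str) -> str:
    """Prepend matching preferences to the task; leave the task as-is otherwise."""
    notes = store.relevant(task)
    if not notes:
        return task
    header = "\n".join(f"- {note}" for note in notes)
    return f"User preferences:\n{header}\n\nTask: {task}"


# Example entries taken from the preferences quoted in the summary.
store = KnowHowStore()
store.teach(["resume", "resumes"], "Deliver resume screening results as a spreadsheet.")
store.teach(["sam"], "'Sam' refers to Sam Altman.")
```

With this store, `build_prompt(store, "Screen these resumes for the designer role")` adds only the spreadsheet preference, while a task that mentions none of the keywords is passed through unchanged. The point of the sketch is that personalization here adds context to the prompt rather than changing the model.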

Manus' Core Philosophy: General Agent vs. Predefined Workflows

  • Approach: Instead of trying to control the LLM's thinking with rigid workflows, Manus focuses on providing a rich context, a capable environment ("hands"), and robust tools, allowing the LLM's intelligence to drive the task execution.
  • Quote: “At the very heart of Manus, actually, we just keep it very simple but very like sophisticated structure and it just gives it more intelligence... not try to control the LLM thinking... we just keep focusing on the hand work yeah the environment building.” - Forest, explaining their focus on enabling the LLM rather than restricting it.
  • Investor Insight: This commitment to a general agent model is ambitious. If successful, it could unlock a much broader range of applications than specialized agents. However, it relies heavily on the increasing capabilities and reliability of underlying LLMs and the robustness of the execution environment. Researchers should track progress in agent reliability and adaptability within this less constrained framework.
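
The contrast with predefined workflows can be sketched as a loop in which the environment only offers tools and records observations, while the choice of the next step is left entirely to the model. This is an illustrative sketch, not the Manus implementation: `run_agent`, `Tool`, `Policy`, and `scripted_policy` are hypothetical names, and the scripted policy is a stand-in for a real LLM call.

```python
from typing import Callable

# A tool takes a string argument and returns a string observation.
Tool = Callable[[str], str]
# A policy reads the transcript so far and returns (tool_name, argument),
# or ("finish", answer) once it decides the task is done.
Policy = Callable[[list[str]], tuple[str, str]]


def run_agent(task: str, tools: dict[str, Tool], policy: Policy, max_steps: int = 10) -> str:
    """Let the policy pick tools freely; the loop only executes and records."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = policy(transcript)
        if action == "finish":
            return arg
        if action not in tools:
            # Report the mistake back as context instead of aborting.
            transcript.append(f"error: unknown tool {action!r}")
            continue
        observation = tools[action](arg)
        transcript.append(f"{action}({arg}) -> {observation}")
    raise RuntimeError("step budget exhausted before the policy finished")


def scripted_policy(transcript: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: call the 'add' tool once, then return its result."""
    if len(transcript) == 1:
        return ("add", "2,3")
    return ("finish", transcript[-1].rsplit("-> ", 1)[1])


# A single toy tool; a real agent would expose a browser, terminal, data APIs, etc.
toy_tools: dict[str, Tool] = {
    "add": lambda s: str(sum(int(x) for x in s.split(","))),
}
```

Running `run_agent("add 2 and 3", toy_tools, scripted_policy)` returns `"5"`. Note that nothing in `run_agent` encodes the order of steps: adding capability means adding entries to the tool dictionary, which mirrors the "focus on the hand work" described in the quote.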

Q&A Highlights: Challenges and Future Directions

  • Improving Reliability: Improvements come from adding more tools (like the recently added image reading capability) and integrating more data APIs based on observed user needs. They also rely on advancements in foundation models from providers like OpenAI and Anthropic.
  • Handling Long Context: While foundation models are improving context length, Manus currently uses context "splitting" techniques when limits are exceeded, acknowledging this is an ongoing challenge.
  • Foundation Models: The team has no plans to build their own foundation models due to the immense cost, believing agentic capabilities will eventually become commoditized. Their focus remains on the execution environment and tooling.
  • Data Access & Paywalls: Accessing data behind logins or paywalls (often blocked by services like Cloudflare) is a major hurdle. Current workarounds involve residential IPs (used cautiously). Future strategies include potential partnerships with security providers to allow legitimate agent traffic and potentially paying for paywalled content access as part of their consumption-based pricing model.
  • Investor Insight: The Q&A candidly addresses critical hurdles for AI agents: reliability, context limits, data access, and cost. The team's strategy relies on leveraging external foundation models while focusing differentiation on the agent's execution environment, tooling, data integration, and adaptability – key areas for investors and researchers to evaluate.
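
The summary does not say how context is split when model limits are exceeded, so the following is only a generic sketch of one common approach: cutting the text into fixed-size chunks with an optional overlap. The function name `split_context` is hypothetical, and whitespace-separated words are used as a rough stand-in for model tokens.

```python
def split_context(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Consecutive chunks share `overlap` words so that content near a
    boundary appears in both neighbours.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")

    words = text.split()
    if not words:
        return []

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk already reaches the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks
```

For example, `split_context("a b c d e", 3, overlap=1)` returns `["a b c", "c d e"]`. A production system would count tokens with the model's own tokenizer and would probably split on semantic boundaries (messages, tool results, documents) rather than on raw word counts.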

Conclusion: Infrastructure is Key for Actionable AI

Manus represents a strategic bet on general-purpose AI agents, moving beyond LLM chat by providing cloud-based compute environments and tools. For Crypto AI investors and researchers, this highlights the critical role of infrastructure and execution capabilities in unlocking AI's real-world potential, demanding attention to agent architecture and data access solutions.
