a16z
March 15, 2025

Building the Next Generation of Conversational AI

In this episode, we dive into the creation of Sesame's conversational AI, featuring insights from Anit, one of its brilliant creators. The discussion explores the challenges and breakthroughs in developing a more human-like AI, focusing on the nuances of voice interaction and the future of AI as a computing interface.

The Challenge of Building Human-Like AI

  • “When you're building the product, you're using it every day... you kind of know how big the delta is from what you're putting out and what it could be.”
  • Developers often underestimate their product's quality because they are acutely aware of the gap between what they have shipped and what it could become.
  • The qualitative human reaction is a critical metric, yet difficult to quantify, in evaluating AI progress.
  • Trusting intuition is not enough; rigorous evaluation methods are essential for meaningful AI development.

Bridging Transcription and Text Processing

  • “Getting it to be very fast is a big challenge... we're moving towards just removing transcription entirely.”
  • Transcription latency is a major challenge, and the team is moving toward models that skip transcription entirely.
  • Future models aim to process audio directly, bypassing text transcription to enhance response times.
  • The current demo does not yet understand audio context, which remains a significant area for future improvement.

Focus on Product Experience

  • “We have picked the right things to focus on... the naturalness of the voice, getting the voice to generate these human imperfections.”
  • Sesame prioritizes natural voice interactions over raw technical capabilities, creating a more engaging user experience.
  • The small team focuses on specific axes of improvement, such as voice naturalness, to differentiate their product.
  • The company draws inspiration from Pixar, emphasizing the importance of creative taste in technology development.

The Future of AI as a Computing Interface

  • “We see companions as an interface... natural language as an interface.”
  • AI companions are envisioned as a new interface for computing, emphasizing natural interaction over pure functionality.
  • The goal is to create a seamless, low-friction interface that mirrors human perception and context.
  • Glasses are seen as an optimal form factor for AI companions, providing constant, unobtrusive access to computing.

Key Takeaways:

  • AI development requires balancing technical rigor with qualitative user feedback to create truly engaging experiences.
  • Sesame expects conversational AI to move away from transcription toward direct audio processing, for faster, more natural interactions.
  • AI companions are poised to become a new computing interface, emphasizing natural, human-like interactions over traditional utility.

For further insights and detailed discussions, watch the full podcast.
