a16z
June 13, 2025

What You Missed in AI This Week (Google, Apple, ChatGPT)

This week, a16z partners and identical twins, Justine and Olivia, unpack the whirlwind of advancements in consumer AI, from mind-bending video generation to the new economics of AI startups. They explore how these tools are not just novelties but are rapidly reshaping content creation and business models.

The Video Revolution: Google's Veo 3 & AI Storytelling

  • “Veo 3 was sort of like the ChatGPT moment for AI video.”
  • “What's very different about it is it generates audio natively at the same time it generates video.”
  • Google's Veo 3 video model is a showstopper, generating video and synchronized audio directly from text prompts (think full talking-head vlogs in one go). Clips are currently limited to 8 seconds, and image-to-video generation does not yet produce audio, so creators are cleverly using masked characters like Stormtroopers to build longer narratives.
  • Accessibility has broadened: what started on a $250/month Google plan is now hitting consumer platforms like Hedra and Krea via API, making high-end AI video creation more democratized and leading to an explosion of "faceless channels."

Voice AI Gets Real: ChatGPT and ElevenLabs Upgrades

  • “They've been rolling out some updates to make my voice sound more natural and expressive.” (re: ChatGPT)
  • “And now they essentially take all of the weird inflections, emotion, even accents, and they turn it into text prompting through these things called tags.” (re: ElevenLabs' Eleven v3)
  • ChatGPT's advanced voice mode is back with a vengeance, now boasting incredibly human-like nuances—pauses, "ums," upward inflections—making conversations feel startlingly natural. This catches OpenAI up after competitors briefly stole the "most human" voice crown.
  • Meanwhile, ElevenLabs' Eleven v3 is revolutionizing audio creation by allowing text prompts to dictate emotion, accents, and even interruptions using "tags," eliminating the cumbersome speech-to-text-to-speech workflows of yore. This opens up a new frontier for AI-driven narrative storytelling.
  • Contrast this with Apple, whose "Apple Intelligence" announcements felt a bit like outsourcing the heavy AI lifting to ChatGPT, leaving some users underwhelmed.

The New Economics of AI: Consumer Startups Booming

  • “The median ARR (annualized revenue run rate) is now $4.2 million at month 12 for consumer startups.”
  • “Consumer companies are actually ramping revenue faster, which again is like a total reversal from what we saw before.”
  • Forget slow burns; consumer AI startups are hitting a median $4.2M ARR within their first year, with top performers nearing $8.7M. This is double the pace of B2B AI counterparts, flipping the old startup script.
  • High inference costs forced early AI companies to charge, and it turns out consumers will pay (around $22/month on average) for powerful AI tools that offer tangible benefits, from creative superpowers to personalized coaching.
  • Paid user retention is solid, and for the first time, consumer subscriptions are seeing enterprise-like revenue expansion through upsells like credit packs.

Democratizing Creation: AI-Powered Branding

  • “Flux Kontext... you can edit with words for the first time.”
  • “The next generation of entrepreneurs are going to be completely AI assisted.”
  • Tools like Flux Kontext (on Krea) are bringing Photoshop-level editing to the masses via natural language, maintaining impressive consistency for objects and characters.
  • The hosts demoed creating a froyo brand, "Melt," from scratch in a few hours: ideation with ChatGPT, a logo with Ideogram, and product and store mockups with Flux Kontext. This signals a future where "full stack AI brands" (logo, product shots, AI-generated ads, even AI influencers) are the norm.

Key Takeaways:

  • AI is rapidly democratizing sophisticated content creation and enabling entirely new business models. The speed of innovation is intense, making it both an exciting and "exhausting" time for creatives and builders.
  • Video & Voice Converge: AI is now generating synchronized audio-visual content from simple text prompts, opening a Pandora's box for storytellers and "faceless" creators.
  • Consumer AI Pays: Startups in consumer AI are scaling revenue at unprecedented rates, proving users will pay premium subscriptions for powerful, AI-native experiences.
  • AI-Assisted Entrepreneurship: The barrier to launching a brand or product is crumbling, as AI tools empower anyone to design, market, and even conceptualize businesses with previously unimaginable speed and ease.

For further insights and detailed discussions, watch the full podcast.

This episode reveals how breakthrough AI video and voice models are not just democratizing creation but also forging new, rapidly monetizing markets, offering critical signals for Crypto AI investors and researchers.

Google's Veo 3: The "ChatGPT Moment" for AI Video

  • The podcast kicks off with Justine and Olivia, partners at a16z, discussing the seismic shift in AI video, largely catalyzed by Google's Veo 3 model, which Olivia describes as "sort of like the ChatGPT moment for AI video." This new model from Google DeepMind generates audio natively alongside video, allowing for complex prompts that include dialogue and specific character interactions. However, it is currently limited to 8-second generations and does not produce audio when generating from an image, which makes longer-form content and consistent character portrayal difficult. Creators have responded with workarounds such as masked or non-human characters (e.g., Stormtroopers, a Yeti), where facial inconsistencies are less noticeable.
  • Veo 3 (Google's Video Model): Google DeepMind's latest video generation model, capable of producing video with synchronized, natively generated audio from text prompts. It represents a significant step in creating more integrated and dynamic AI-generated video content.
  • Accessibility & Cost: Initially exclusive to the $250/month Google AI Ultra plan, Veo 3 is now becoming available via API (Application Programming Interface) on consumer platforms like Hedra or Krea for around $10/month, and through developer-focused platforms like fal or Replicate at approximately 75 cents per second of video. An API allows different software to communicate, enabling broader integration of Veo 3.
  • Creator Impact: The speakers anticipate an explosion of "faceless channels," where creators use AI-generated characters, democratizing content creation for those not wishing to be on camera.
  • Future Outlook: While Google will likely aim for longer, more coherent video generation, the high operational costs of Veo 3 suggest a push towards more optimized, cost-effective distilled models in the future.
  • Actionable Insight for Crypto AI Investors/Researchers: The evolving accessibility and pricing of foundational models like Veo 3 are key for the economic viability of AI video startups. The trend of "faceless channels" could intersect with decentralized identity and content ownership models in the crypto space.
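
To make the pricing above concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures cited in this section (the 8-second clip limit and roughly 75 cents per second on developer platforms). The function name and the assumption that every clip is billed at its full 8 seconds are illustrative, not taken from any platform's actual billing rules.

```python
import math

MAX_CLIP_SECONDS = 8      # per-generation limit cited above
PRICE_PER_SECOND = 0.75   # approximate developer-platform price cited above (USD)


def estimate_cost(target_seconds: float) -> tuple[int, float]:
    """Return (number of clips, estimated cost in USD) for a video of the given length."""
    # A longer video has to be stitched together from several short clips.
    clips = math.ceil(target_seconds / MAX_CLIP_SECONDS)
    # Assumption: each clip is generated and billed at the full 8 seconds.
    cost = clips * MAX_CLIP_SECONDS * PRICE_PER_SECOND
    return clips, cost


clips, cost = estimate_cost(60)
print(f"{clips} clips, about ${cost:.2f}")  # prints "8 clips, about $48.00"
```

Under these assumptions a single 8-second clip costs about $6 and a one-minute video about $48 before any retries, which helps explain the expected push toward cheaper distilled models.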

ChatGPT's Voice Gets More Human: Analyzing the "Advanced Voice Mode" Update

  • Olivia details a significant update to ChatGPT's "Advanced Voice Mode," which aims to make interactions far more human-like. ChatGPT is OpenAI's flagship assistant, built on LLMs (Large Language Models), AI systems trained on vast data to understand and generate human-like text and speech. This update, rolled out to paid users first, introduces more natural vocal inflections, such as upward intonations for questions, and human-like disfluencies like "um" and "uh." A live demo during the podcast effectively showcased these improvements.
  • Context: The update was notable as ChatGPT's voice capabilities had seemed to stagnate compared to rapidly advancing competitors like Sesame, open-source alternatives, Google's Gemini, and Grok's voice products.
  • Speaker Analysis: Olivia, drawing on her experience tracking the AI space, speculates that OpenAI's previous caution might have stemmed from the "Her" movie controversy (fears of AI sounding too human) and the company's broad research priorities, including text-based AGI (Artificial General Intelligence), the Sora video model, and image generation. AGI refers to AI with human-like cognitive abilities across diverse tasks.
  • Quote: During the demo, the updated ChatGPT voice states, "Exactly. Those little touches are all intentional to make the conversation feel more natural and relatable."
  • Actionable Insight for Crypto AI Investors/Researchers: The push towards hyper-realistic AI voices by major labs like OpenAI underscores the importance of user experience in AI adoption. For crypto AI, this could translate to more engaging AI agents in metaverses or more natural interfaces for decentralized applications, potentially increasing trust and usability.

Apple Intelligence: Underwhelming AI Updates and Siri's Continued Struggles

  • The conversation shifts to Apple's recent AI announcements, with both Justine and Olivia expressing a general market sentiment of disappointment towards Apple Intelligence, the company's new suite of AI features. Justine recounts a personal anecdote where Siri failed a simple date query and offered to search ChatGPT instead, highlighting Siri's ongoing limitations.
  • Apple's Strategy: The speakers note Apple's seemingly cautious approach, possibly influenced by past missteps with AI features (like jumbled notification summaries), and its tendency to outsource more complex AI tasks to an on-device version of ChatGPT.
  • Key Features: Announced updates include Genmoji (generative emoji), call transcription, and real-time translation for calls and FaceTime. Genmoji are AI-created emoji for personalized expression.
  • Adoption Gaps: Despite the utility of features like real-time translation, Olivia notes a surprising lack of widespread adoption so far.
  • Actionable Insight for Crypto AI Investors/Researchers: Apple's measured pace and reliance on partnerships for some advanced AI functionalities may create openings for specialized third-party AI applications, including decentralized ones, to thrive on iOS. Researchers could explore privacy-preserving AI solutions that align with Apple's user privacy stance but offer more robust capabilities.

ElevenLabs' Eleven v3: Revolutionizing AI Voice with Emotional Depth and Text-Based Control

  • Olivia introduces ElevenLabs' third-generation text-to-speech model, Eleven v3, as a significant advancement in AI voice generation. ElevenLabs is a company focused on creating realistic AI voices. Text-to-speech technology converts written input into audible speech. The standout feature of Eleven v3 is its ability to imbue voices with a wide range of emotions, inflections, and even accents directly through text-based "tags": commands like "sadly," "whispering," or "interrupted." This bypasses the older, more cumbersome method of recording an emotional voice sample to guide the AI.
  • Demonstration: Olivia shares an example she created: a dialogue featuring a character with a Texan accent, cow mooing sound effects, and a natural-sounding interruption, all orchestrated via text prompts within the Eleven v3 editor.
  • Quote: Olivia explains, "And now they essentially take all of the weird inflections, emotion, even accents, and they turn it into text prompting through these things called tags."
  • Impact on Storytelling: Justine and Olivia agree that this level of control, combined with advancements like Google's Veo 3 video model, unlocks vast new possibilities for AI-driven narrative storytelling in video, gaming, and advertising.
  • Actionable Insight for Crypto AI Investors/Researchers: The granular, text-based control over vocal emotion and delivery offered by Eleven v3 is a powerful tool for creating immersive experiences. This could be leveraged in crypto for dynamic NFT characters, AI-driven game narratives in play-to-earn ecosystems, or more persuasive AI assistants in decentralized commerce.
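
As a rough illustration of what tag-based prompting might look like, the Python sketch below assembles a short two-speaker script with inline tags. It only builds a text string and calls no ElevenLabs API. The square-bracket syntax, the helper name `tag_line`, the speaker names, and the dialogue itself are assumptions for illustration; the tag words are the ones quoted above, and the exact supported tags should be checked against ElevenLabs' own documentation.

```python
def tag_line(speaker: str, text: str, *tags: str) -> str:
    """Format one line of dialogue, prefixing the text with bracketed delivery tags."""
    # Assumption: tags are written inline in square brackets before the text they affect.
    prefix = "".join(f"[{tag}] " for tag in tags)
    return f"{speaker}: {prefix}{text}"


# Hypothetical dialogue using the tag words mentioned in this section.
script = "\n".join([
    tag_line("Rancher", "I reckon the herd is getting restless.", "sadly"),
    tag_line("Visitor", "Should we keep our voices down?", "whispering"),
    tag_line("Rancher", "No need, they are just...", "interrupted"),
])

print(script)
```

The point of the sketch is that delivery cues live in the same text as the words being spoken, so a script can be edited and versioned like any other prompt.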

Decoding the AI Gold Rush: Consumer Startups Outpace B2B in Revenue Ramp

  • Olivia presents compelling data from a16z's analysis of AI startup growth, revealing that consumer AI companies are monetizing and scaling revenue at an unprecedented, and often faster, pace than their B2B counterparts. This marks a significant reversal of pre-AI trends, where B2B SaaS typically showed faster initial revenue traction and consumer apps monetized much later, if at all directly.
  • Key Statistic: "What we found was actually pretty surprising which is that the median ARR (annualized revenue run rate) is now $4.2 million at month 12 for consumer startups," Olivia states. ARR here projects current recurring subscription revenue over a full year. The top quartile reaches $8.7 million ARR in the first year.
  • Monetization Shift: The high inference costs (cost to run AI models for each user query) of early AI models compelled consumer companies to adopt subscription models. Consumers have shown willingness to pay, with average AI subscriptions around $22/month, double pre-AI rates.
  • Drivers of Willingness to Pay: Powerful AI-native products in creative tools, companion apps, education (language learning), and personalized coaching (nutrition) are delivering tangible value.
  • Retention & Expansion: While "AI tourism" (high churn of free users) exists, paid user retention is comparable to pre-AI benchmarks. Notably, consumer AI sees significant revenue expansion through upsells (e.g., credit packs), a dynamic previously common mostly in enterprise SaaS or gaming.
  • Actionable Insight for Crypto AI Investors/Researchers: The rapid monetization and high consumer willingness to pay for AI services present a strong case for consumer-focused crypto AI ventures. Business models incorporating tokenomics with subscription tiers or pay-per-use credits could find fertile ground, especially if they offer clear utility or novel experiences.
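
For a sense of scale, the figures above can be combined into a rough estimate of how many paying users such a run rate implies. The Python sketch below assumes, purely for illustration, that all revenue comes from subscriptions at the roughly $22/month average, ignoring upsells such as credit packs. The function name is hypothetical and the result is an order-of-magnitude estimate, not a number from the a16z analysis.

```python
def implied_subscribers(arr_usd: float, monthly_price_usd: float) -> float:
    """Paying subscribers needed to reach a given ARR at a flat monthly price."""
    # ARR is the monthly recurring revenue extrapolated over 12 months.
    return arr_usd / (monthly_price_usd * 12)


median = implied_subscribers(4_200_000, 22)        # median consumer startup at month 12
top_quartile = implied_subscribers(8_700_000, 22)  # top-quartile figure

print(f"median: ~{round(median):,} subscribers")              # ~15,909
print(f"top quartile: ~{round(top_quartile):,} subscribers")  # ~32,955
```

In other words, the median figure corresponds to roughly sixteen thousand paying subscribers under these simplifying assumptions, and fewer if upsells contribute meaningfully to revenue.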

Full-Stack AI Brands: Creating "Melt" Froyo with Flux Kontext and Generative Tools

  • Justine demos how new AI tools can facilitate rapid brand creation, using her "Melt" Froyo brand concept as an example. The workflow involved using ChatGPT for initial ideation (name, branding), Ideogram (an AI image generator strong with text and logos) for logo design, and then Flux Kontext for product and store imagery.
  • Flux Kontext: This new image editing model from Black Forest Labs, hosted on platforms like Krea, allows users to edit images using natural language prompts while keeping the subject's appearance remarkably consistent across different modifications or environments. Justine describes it as "Photoshop but with natural language prompts."
  • Demonstration: Justine showed how she took an initial AI-generated image of a "Melt" Froyo cup and used Flux Kontext to place it in various settings (on a counter, in someone's hand), change its color, and even generate images of a "Melt" branded storefront.
  • Future Potential: The speakers envision "full-stack AI brands" where everything from logo and product design to marketing assets and even AI avatar-led ad campaigns is AI-generated. Olivia notes, "the next generation of entrepreneurs are going to be completely AI assisted."
  • Actionable Insight for Crypto AI Investors/Researchers: Tools like Flux Kontext dramatically lower barriers for creating compelling visual assets, crucial for marketing and community building in crypto AI projects. Researchers can explore how such consistent image generation and editing can be applied to creating diverse synthetic datasets for training AI models, or for generating unique, verifiable digital assets (NFTs) at scale.

This week's AI advancements in video, voice, and image editing are democratizing creation and forging new, rapidly monetizing AI-native business models. Crypto AI investors and researchers must closely track the accessibility and cost-efficiency of these tools to identify emerging opportunities in decentralized media, AI-driven entrepreneurship, and novel digital asset creation.
