This episode dives into Dippy's pioneering work in experiential AI, revealing how Subnet 11 is decentralizing media inference to power the next generation of interactive, user-driven digital worlds.
Dippy's Vision: Experiential Intelligence and User Engagement
- Ashot, co-founder of Dippy, introduces "experiential intelligence," the company's focus on building tools for interactive, user-driven storytelling. Unlike traditional passive media, Dippy lets users direct narratives and games through language, creating unique, dynamic experiences. The platform has grown to approximately 9 million users across iOS, Android, and web. The average daily active user spends almost an hour across three sessions and sends hundreds of messages.
- Dippy has achieved the highest engagement time in its category, surpassing competitors like Character.AI (valued at $3 billion) and Polybuzz. Ashot highlights, "Dippy by far has the highest engagement time in the category," with users averaging 19.44 minutes per session. The platform hosts over 750,000 user-generated characters, for which users can create custom backstories, personalities, and soon, unique voices.
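The two engagement figures above can be checked against each other. The following is a minimal sketch, assuming the "three sessions" and "19.44 minutes per session" figures describe the same average daily active user; the function and variable names are illustrative and do not come from the episode.

```python
# Figures quoted in the episode; names are illustrative assumptions.
SESSIONS_PER_DAY = 3
MINUTES_PER_SESSION = 19.44


def daily_engagement_minutes(sessions: int, minutes_per_session: float) -> float:
    """Return total minutes per day for a given session count and length."""
    return sessions * minutes_per_session


daily_minutes = daily_engagement_minutes(SESSIONS_PER_DAY, MINUTES_PER_SESSION)
print(f"{daily_minutes:.2f} minutes per day")
```

Under that assumption the total is about 58 minutes, which matches the "almost an hour" description.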
Monetization Strategy and Market Insights
- Dippy monetizes through a subscription plan and in-app advertisements. Ashot notes that paid users constitute about 1% of their base, a figure comparable to ChatGPT's 3% paid user rate, indicating that ads will likely drive the bulk of monetization for mass consumer AI products. Currently, ads appear as traditional pop-ups, but the long-term vision involves integrating ads contextually within conversations, where AI characters might recommend products or experiences based on user interactions.
Leveraging Bittensor: Subnet 4 for Text Inference
- Dippy uses Bittensor's Subnet 4, Targon, for 100% of its text inference, powering character responses seamlessly. Mark Jeffrey notes Targon's significant cost savings, estimating it to be "1/6th 1/7th the cost" of other providers, a point Ashot clarifies as "not six to seven times but close," confirming substantial savings. Ashot emphasizes that the decision to use Targon was purely product-driven, not ideological, stating, "This was purely because it was a better product."
- Targon's superior support, customization, and rapid development were critical factors, particularly its introduction of a Trusted Execution Environment (TEE), an encrypted environment that keeps data private even from miners. The TEE addresses enterprise concerns about data security in decentralized inference and enabled Dippy to sign a six-figure annual contract with Targon.
Subnet 11's Pivot to Media Inference
- Subnet 11, Dippy's associated subnet, has re-architected to focus on media inference, starting with images and planning to expand to voice, video, and 3D models. Dippy is committing to be Subnet 11's anchor customer, shifting its $10,000 monthly media inference spend from centralized providers into buybacks of the SN11 token. This ensures honest economics, where Dippy pays for compute, and the rewards flow back to subnet token holders.
- Subnet 11 has achieved a 36% cost reduction for media inference compared to traditional providers by wrapping NVIDIA's TensorRT, a high-performance deep learning inference optimizer, to accelerate various models. This focus on low-latency, cost-effective inference is crucial for user experience, especially for features like in-chat image generation. SN11 currently powers 5% of Dippy's image requests, with plans to scale to 100%.
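To make the figures above concrete, here is a rough sketch of how compute cost might change as traffic moves to SN11. It rests on several assumptions not stated in the episode: that the $10,000 monthly figure is the centralized baseline, that the 36% reduction applies uniformly, and that cost scales linearly with the share of requests served. The function name and parameters are hypothetical. The episode describes the spend being redirected into SN11 token buybacks, so this illustrates compute cost only, not Dippy's actual bill.

```python
def projected_monthly_cost(baseline_usd: float,
                           share_on_subnet: float,
                           cost_reduction: float) -> float:
    """Blend centralized and subnet costs for a given share of traffic.

    Assumes cost is linear in request share (an illustrative simplification).
    """
    if not 0.0 <= share_on_subnet <= 1.0:
        raise ValueError("share_on_subnet must be between 0 and 1")
    if not 0.0 <= cost_reduction <= 1.0:
        raise ValueError("cost_reduction must be between 0 and 1")
    centralized = baseline_usd * (1.0 - share_on_subnet)
    subnet = baseline_usd * share_on_subnet * (1.0 - cost_reduction)
    return centralized + subnet


BASELINE_USD = 10_000   # monthly media inference spend quoted in the episode
COST_REDUCTION = 0.36   # reduction quoted for Subnet 11

for share in (0.05, 1.0):  # current share and the stated target
    cost = projected_monthly_cost(BASELINE_USD, share, COST_REDUCTION)
    print(f"{share:.0%} on SN11: ${cost:,.0f} per month")
```

Under these assumptions, the 5% share leaves the projected cost close to the baseline, while a full migration would bring it to roughly $6,400 per month.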
Strategic Roadmap: Decentralized Media Generation and World Models
- Subnet 11's roadmap includes scaling its image inference for Dippy, onboarding external customers, and developing a self-serve API and frontend. The goal is to become a decentralized competitor to services like Midjourney or Grok Imagine, offering a wide range of modalities beyond text. Mark shares his experience using Suno (music AI) and Grok Imagine to create a music video, highlighting the accessibility and power of these tools.
- Ashot expresses excitement about emerging "world models" like Google DeepMind's Genie 3 and Odyssey ML, which can generate entire interactive worlds from simple prompts. These dynamic, user-driven worlds, such as infinitely replayable murder mysteries, represent a "tectonic shift" from passive consumption to active participation. Dippy aims to become the "next-generation YouTube or TikTok" for these interactive experiences, providing creators with tools across various modalities to build and share dynamic worlds.
Moderation, User Demographics, and AI's Social Impact
- Dippy maintains an 18+ age restriction across all platforms, partnering with Clava, an industry-standard moderation service, and implementing community-driven flagging to ensure a safe environment. Ashot notes that the majority of Dippy's user base consists of women aged 18 to 24, who primarily engage in fantasy romance, fanfiction, or therapeutic conversations, often role-playing as different personas. Men in the same age group frequently engage in combat role-playing with anime characters or power fantasies.
- A recent Harvard study cited by Ashot indicates that AI companions can reduce loneliness on par with interactions with real people. Mark expresses skepticism, viewing AI as an "autocomplete Ouija board," but acknowledges that many users form emotional connections, perceiving AI as a "3PO" (a reference to Star Wars' C-3PO, a sentient robot). Ashot explains that AI offers a non-judgmental, always-available listener, fulfilling a fundamental human need for connection, especially at odd hours.
Conclusion
Dippy's journey with Subnet 11 exemplifies the growing convergence of AI and decentralized infrastructure to create immersive, user-driven digital experiences. Crypto AI investors and researchers should closely monitor Subnet 11's expansion into media inference and its potential to become a leading decentralized platform for dynamic content generation, anticipating shifts in consumer engagement and new monetization models within the AI economy.