This episode reveals how Macrocosmos's Subnet 13 is turning more than 55 billion rows of scraped social media data into a foundational data layer for AI, and how it is tackling the critical challenge of generating real-world revenue beyond Bittensor's protocol emissions.
The Cosmic Scale of Subnet 13
- Will Squires and Chris Zacharia from Macrocosmos begin by detailing the immense scale of their data scraping subnet. Subnet 13 has aggregated over 55 billion rows of data from sources like X, Reddit, and YouTube, a volume so large that, by the team's estimate, printing it on A4 paper would produce a stack reaching 25% of the way to the moon. The operation is a testament to the power of decentralized incentives on the Bittensor network, a protocol that rewards participants for contributing machine intelligence.
- Daily Volume: The subnet scrapes approximately 300 million rows of data daily, which, by the team's account, covers most of the relevant content on X and Reddit.
- Data Validation: A critical component is the validation system. Miners must continuously verify that scraped data remains "real" and hasn't been deleted or edited, a process that requires checking 150 million rows daily through complex, miner-developed "shadow systems."
- Investor Insight: The sheer scale of the operation and the emergent complexity of the miners' systems suggest that Bittensor has the capacity to outperform centralized data-gathering systems. The decentralized approach yields a powerful, self-optimizing data acquisition engine.
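The validation step described above is, at its core, a spot-check: re-fetch a sample of stored rows and confirm that each one still exists at its source, unchanged. The sketch below is a minimal illustration under stated assumptions, not Subnet 13's actual validation code. The `Row` schema, the `spot_check` function, and the `fetch_live` callback (standing in for a real request to X or Reddit) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Row:
    """One scraped item as it was stored (hypothetical schema)."""
    uri: str
    content: str


def spot_check(
    claimed: List[Row],
    fetch_live: Callable[[str], Optional[str]],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Re-fetch a random sample of stored rows; return the fraction still intact.

    fetch_live(uri) is a hypothetical stand-in for querying the source
    platform: it returns the current text, or None if the item was deleted.
    """
    if not claimed:
        return 0.0
    # Sampling keeps each check's cost bounded, whatever the dataset size.
    sample = rng.sample(claimed, min(sample_size, len(claimed)))
    passed = 0
    for row in sample:
        live = fetch_live(row.uri)
        # A row passes only if it still exists and its text has not been edited.
        if live is not None and live == row.content:
            passed += 1
    return passed / len(sample)
```

For example, with a fake "source" where one post was edited and another deleted, `spot_check(claimed, source.get, 4, random.Random(0))` over four stored rows returns `0.5`. A real validator would fetch live data over the network and would likely store compact fingerprints rather than full text, but the pass/fail logic is the same idea.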
The Unknowable Power of Decentralized Miners
- Will explains that the true power, and the mystery, of Subnet 13 lies in its incentive mechanism. Unlike prescriptive subnets that dictate exact tasks, Subnet 13 sets a goal (scrape data) and lets miners innovate freely to achieve it. The result is an arms race in which earlier tactics, such as running 40,000 X accounts purchased from Russia, are no longer competitive.
- Emergent Innovation: The team admits they don't know the exact methods the top miners use, speculating it could range from massive account parallelization to direct access to data firehoses.
- Quote: Will likens the subnet's incentive design to a Texas highway: "You just take that Mustang and you put your foot down, Sunny, and let that V8 work out the rest of it. And the miners are the V8 engine."
- Strategic Implication: This "black box" innovation is a core feature of well-designed decentralized networks. For investors, it signifies a system that can adapt and scale in ways that are difficult for centralized competitors to replicate. Within six months of launch, Subnet 13 became the largest repository of social media data on Hugging Face, a key platform for sharing AI models and datasets.
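The contrast between goal-setting and prescriptive subnets can be made concrete with a scoring rule that looks only at outputs. The sketch below is a hypothetical illustration, not Subnet 13's real incentive formula, which is more involved: `outcome_scores`, the `min_pass_rate` threshold, and the "rows times pass rate" weighting are assumptions chosen for clarity.

```python
from typing import Dict


def outcome_scores(
    claimed_rows: Dict[str, int],
    pass_rates: Dict[str, float],
    min_pass_rate: float = 0.9,  # hypothetical quality bar
) -> Dict[str, float]:
    """Turn each miner's verifiable output into a normalized weight.

    Only outputs are scored: how many rows a miner claims and how well those
    rows survive spot-checks. Nothing here inspects *how* the rows were
    scraped, which is what leaves miners free to innovate on method.
    """
    raw = {}
    for miner, rows in claimed_rows.items():
        rate = pass_rates.get(miner, 0.0)
        # Miners whose data fails too many checks earn nothing.
        raw[miner] = rows * rate if rate >= min_pass_rate else 0.0
    total = sum(raw.values())
    if total == 0:
        return {miner: 0.0 for miner in raw}
    # Normalize so the weights sum to 1.
    return {miner: value / total for miner, value in raw.items()}
```

With `claimed_rows={"a": 1_000, "b": 3_000, "c": 10_000}` and `pass_rates={"a": 1.0, "b": 0.95, "c": 0.5}`, miner `c` scores zero despite claiming the most rows, because its data fails verification; `b` earns roughly 74% of the weight and `a` roughly 26%. Whether a miner uses parallel accounts or a data firehose never enters the formula.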
Commercial Strategy: "Bigger, Faster, Better"
- When engaging commercial clients, Macrocosmos simplifies its pitch. Instead of detailing the complexities of the Bittensor network, they focus on the outcome: a superior data scraping service. Zach notes that the $5 billion data scraping market is mature, and customers primarily care about performance, not the underlying mechanics.
- Value Proposition: The core message to customers is that their system is "bigger, faster, and better" than alternatives.
- Legal Abstraction: From a legal standpoint, Macrocosmos sources its data from miners, who are required to acquire it legally. This is a cleaner operating model than running a proprietary scraping operation that might violate platforms' terms of service.
- Investor Takeaway: The ability to abstract away blockchain complexity is crucial for mainstream adoption. The success of this strategy will depend on translating decentralized technical superiority into a simple, compelling commercial product.
Powering the Bittensor Ecosystem: Subnet Partnerships
- A key sign of Subnet 13's foundational role is its integration with other Bittensor subnets, which are coming to Macrocosmos for data, not the other way around.
- Subnet 57 (Gaia): This global weather forecasting subnet uses Subnet 13 to add a "social context layer." By analyzing social media discussions, Gaia can move beyond predicting weather to modeling its impact on communities and predicting human behavior in response to climate events.
- Subnet 44 (Score): A sports prediction and computer vision subnet, Score uses Subnet 13's data to analyze how social media sentiment affects player performance. This helps model the psychological variables in sports, which are notoriously difficult to forecast.
- Subnet 64 (Squads): This AI agent platform, built by Taoshi, launched with Subnet 13 as one of its two inaugural tools. It allows users to build agents that can scrape social media for real-time context, demonstrating the data's utility for autonomous systems.
Evolving from Volume to Real-Time Access
- Will notes that Subnet 13 was initially designed for volumetric scraping to fuel AI model training, prioritizing scale over speed. Demand from partners and users, however, showed a strong need for real-time data access for applications like sentiment analysis and immediate signal detection.
- Dual-Use Case: The subnet now supports both large-scale, asynchronous scraping and a real-time API, demonstrating an ability to adapt to market needs.
- Upcoming Data Marketplace: Macrocosmos is developing a "Data Universe" marketplace. This platform will allow users to query, package, and share curated datasets, moving from a raw data firehose to a user-friendly, queryable repository, similar to a "Bittensor Hugging Face."
- Actionable Insight: The development of this marketplace is a critical strategic move. Researchers and investors should monitor its launch, as it represents a direct attempt to monetize the massive dataset and create a user-centric product layer on top of the subnet's raw capabilities.
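The shift from a raw firehose to a queryable marketplace comes down to two operations: filter rows by criteria such as source, keyword, and time window, then package the result into a shareable file. The sketch below is a minimal, in-memory illustration under assumed names; the `Post` schema, `query`, and `package_jsonl` are hypothetical and do not describe the Data Universe marketplace's actual interface.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Post:
    """A scraped post (hypothetical schema)."""
    source: str          # e.g. "x" or "reddit"
    created_at: datetime
    text: str


def query(
    posts: Iterable[Post],
    source: Optional[str] = None,
    keyword: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[Post]:
    """Return posts matching every filter that is set (None means no filter)."""
    out = []
    for p in posts:
        if source is not None and p.source != source:
            continue
        if keyword is not None and keyword.lower() not in p.text.lower():
            continue
        if since is not None and p.created_at < since:
            continue
        if until is not None and p.created_at >= until:
            continue
        out.append(p)
    # Chronological order makes the packaged dataset easier to consume.
    return sorted(out, key=lambda p: p.created_at)


def package_jsonl(posts: Iterable[Post]) -> str:
    """Serialize a query result as JSON Lines, one post per line."""
    lines = []
    for p in posts:
        record = asdict(p)
        record["created_at"] = p.created_at.isoformat()  # datetimes are not JSON-native
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The same filter can serve both use cases: run it over an archive for a training dataset, or over a short recent window for real-time signals. At the scale of tens of billions of rows, a real system would push these filters into an indexed store rather than scan a Python list.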
Data Strategy: Balancing Virality with Rich Knowledge
- The team is strategically expanding its data sources beyond social media. The focus is shifting towards longer-form, knowledge-rich content that is more valuable for training sophisticated AI models.
- Structured vs. Unstructured Data: Subnet 13 provides raw, unstructured data, which serves as fuel for other subnets, such as ReadyAI, that create structured, labeled datasets. The marketplace will also offer curated, cleaned, and structured data to add value.
- New Data Sources: Macrocosmos is exploring sources like academic archives (e.g., arXiv), legal documents, and planning submissions. These provide validated, high-knowledge data crucial for training models that require deep understanding, not just sentiment signals.
- Quote: Zach highlights the dual utility they seek: "Generally our bias is towards longer form content that has these dual use cases of being relevant and valuable in the AI field and useful and accessible in the sort of social intelligence field."
The Challenge: Monetization Beyond Protocol Emissions
- The conversation concludes with a candid discussion of the primary challenge facing all Bittensor subnets: achieving real-world commercial viability. Zach offers a critical perspective on the dangers of relying solely on emissions, the native token rewards distributed by the protocol.
- The "Double-Edged Sword" of Emissions: While emissions are effective for bootstrapping a network, they can distract teams from the difficult work of finding product-market fit and generating external revenue.
- The Ultimate Test: Success for Subnet 13 is defined by two clear goals: creating high-quality datasets for training AI models like IOTA and driving external, real-world revenue into the subnet.
- Advice for Subnets: Zach's advice is to seek market signals early and relentlessly. "Actually going into a real world marketplace is a real test and subnets need to pass that test sooner rather than later before they invest too much time and capital on routes that actually don't lead to real world impact."
Conclusion
This discussion underscores a critical inflection point for Bittensor subnets: the transition from building powerful, decentralized commodities to creating viable commercial products. Investors and researchers must scrutinize a subnet's strategy for generating external revenue, as this, not emissions, will ultimately determine its long-term success and impact.