Ventura Labs
July 29, 2025

Will Squires & Chris Zacharia: Macrocosmos Bittensor Subnet 13, Data Scraping, AI Training | Ep. 54

Will Squires and Chris Zacharia of Macrocosmos dive deep into Subnet 13, the Bittensor network's data powerhouse. They break down how their subnet has scraped over 55 billion data points from X, Reddit, and YouTube and is now building a universal data marketplace on top of it.

The Unfathomable Scale of Subnet 13

  • "We only reward miners for mining data when the data is real... Miners have built like shadow systems behind the scraper systems to check 150 million rows a day for if things have been deleted or changed... I have no idea how you do this."
  • "It's such a perfect demonstration of Bittensor's power and what you can achieve when you align incentives... When Subnet 13 launched, we ended up becoming the largest repository of social media data on Hugging Face within about six months."
  • Subnet 13 has amassed a staggering 55 billion rows of data and scrapes roughly 300 million new rows daily. The operation is a testament to Bittensor’s incentive model, which functions as a "black box": the team sets the goal, and miners develop their own hyper-efficient, often opaque methods to achieve it. The team says these methods outperform any commercial system it knows of.
  • The miner network is intensely competitive. One miner’s sophisticated setup using 40,000 purchased X accounts was still not competitive, a measure of how much capacity the decentralized network has assembled.
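To put the daily figures quoted above in per-second terms, here is a simple average. It assumes the load is spread evenly over 24 hours, which real traffic is not, so peak rates would be higher:

```latex
\frac{300\times10^{6}\ \text{rows/day}}{86{,}400\ \text{s/day}} \approx 3{,}472\ \text{rows/s (scraped)},
\qquad
\frac{150\times10^{6}\ \text{rows/day}}{86{,}400\ \text{s/day}} \approx 1{,}736\ \text{rows/s (re-checked)}
```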

Data as a Universal Bittensor Commodity

  • "Data is upstream from everything: from training, from forecasting, from any kind of business you can think of. Potentially, it's the most applicable, universally applicable subnet on Bittensor."
  • Macrocosmos has positioned SN13 as a foundational data layer, with other subnets proactively integrating its real-time data feeds. These partnerships showcase data's universal applicability.
  • SN57 (Gaia): Uses SN13 to add a social context layer to weather forecasts, moving beyond predicting the weather to modeling how communities will react to it.
  • SN44 (Score): Leverages social media data to analyze sports performance, modeling how online sentiment and discussion impact player psychology and game outcomes.
  • SN64 (Squads): Integrated SN13 as a core tool, allowing its AI agents to scrape social media for real-time situational awareness.

Building the "Hugging Face" for Bittensor

  • "We're going to be packaging data sets that users can pull together... almost like a Bittensor Hugging Face data set repo."
  • Macrocosmos is evolving from a raw data firehose into a curated, queryable resource. The vision is to build the "Data Universe," a marketplace that makes the 55 billion-row dataset accessible and useful.
  • This platform will allow users to create, package, and share curated datasets, transforming the massive data ocean into a library of refined, actionable insights for AI training, market research, and more.
  • The strategy is to balance "viral signal" with "rich knowledge," prioritizing long-form content like YouTube transcripts that serve the dual purpose of social intelligence and high-value AI training fuel for models like IOTA.

Key Takeaways:

  • Macrocosmos is transforming Subnet 13 from a brute-force data scraper into a sophisticated, revenue-generating marketplace that serves as a foundational utility for the entire Bittensor ecosystem. Their core advice to the ecosystem is to relentlessly pursue real-world market validation over passively collecting protocol emissions.
  • Data is the New Oil; Subnet 13 is the Rig: With 55 billion rows scraped, Subnet 13 is the de facto data layer for Bittensor, providing the essential fuel for everything from AI model training to real-time sentiment analysis for other subnets.
  • From Raw Scale to Refined Value: The focus is shifting from merely scraping data to making it accessible. The upcoming "Data Universe" marketplace aims to be a "Bittensor Hugging Face," turning a chaotic data ocean into a library of actionable insights.
  • The Real Test is Revenue, Not Emissions: The team’s starkest advice for other subnets is to escape the "golden cage" of token emissions. Proving your product has real-world commercial value by generating off-chain revenue is the ultimate test of long-term viability.

For further insights and detailed discussions, watch the full podcast.

This episode reveals how Macrocosmos's Subnet 13 is turning 55 billion scraped social media posts into a foundational data layer for AI, tackling the critical challenge of generating real-world revenue beyond Bittensor's protocol emissions.

The Cosmic Scale of Subnet 13

  • Will Squires and Chris Zacharia from Macrocosmos begin by detailing the immense scale of their data scraping subnet. Subnet 13 has aggregated over 55 billion rows of data from sources like X, Reddit, and YouTube, a volume so large that, by the team's estimate, the stack would reach 25% of the way to the moon if printed on A4 paper. This operation is a testament to the power of decentralized incentives on the Bittensor network, a protocol that incentivizes participants to contribute machine intelligence.
    • Daily Volume: The subnet scrapes approximately 300 million rows of data daily, effectively indexing most of the relevant content on X and Reddit.
    • Data Validation: A critical component is the validation system. Miners must continuously verify that scraped data remains "real" and hasn't been deleted or edited, a process that requires checking 150 million rows daily through complex, miner-developed "shadow systems."
    • Investor Insight: The sheer scale and the emergent complexity of the miners' operations demonstrate Bittensor's capacity to outperform centralized data-gathering systems. This decentralized approach creates a powerful, self-optimizing data acquisition engine.
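The validation step described above can be pictured as random spot-checking: re-fetch a sample of a miner's submitted rows and count how many still exist unchanged. This is a minimal illustration, not Subnet 13's actual validator code. The `ScrapedRow` record, the `spot_check` function, and the `fetch_live` callback are hypothetical names, and a real checker would re-scrape live platforms rather than read from a dictionary.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ScrapedRow:
    uri: str           # hypothetical: permalink of the original post
    content_hash: str  # fingerprint of the text the miner submitted


def fingerprint(text: str) -> str:
    """Hash post text so a checker can compare without storing full copies."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def spot_check(
    rows: list[ScrapedRow],
    fetch_live: Callable[[str], Optional[str]],
    sample_size: int,
    rng: random.Random,
) -> float:
    """Re-fetch a random sample of rows; return the fraction still unchanged.

    fetch_live is a hypothetical stand-in for re-scraping a URI; it returns
    None when the post has been deleted.
    """
    if not rows:
        return 0.0
    sample = rng.sample(rows, min(sample_size, len(rows)))
    verified = 0
    for row in sample:
        live_text = fetch_live(row.uri)
        # A deleted post (None) or an edited one (hash mismatch) fails the check.
        if live_text is not None and fingerprint(live_text) == row.content_hash:
            verified += 1
    return verified / len(sample)


# Usage: one row is intact, one was edited after scraping, one was deleted.
live = {"x.com/a/1": "gm", "x.com/b/2": "edited text"}
rows = [
    ScrapedRow("x.com/a/1", fingerprint("gm")),
    ScrapedRow("x.com/b/2", fingerprint("original text")),
    ScrapedRow("x.com/c/3", fingerprint("deleted post")),
]
print(spot_check(rows, live.get, sample_size=3, rng=random.Random(0)))  # prints 0.3333333333333333
```

Storing only a hash per row keeps the check cheap in memory; the trade-off is that a mismatch says the text changed, not how.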

The Unknowable Power of Decentralized Miners

  • Will explains that the true power—and mystery—of Subnet 13 lies in its incentive mechanism. Unlike prescriptive subnets that dictate exact tasks, Subnet 13 sets a goal (scrape data) and lets miners innovate freely to achieve it. This has led to an arms race where methods like using 40,000 X accounts purchased from Russia are no longer competitive.
    • Emergent Innovation: The team admits they don't know the exact methods the top miners use, speculating it could range from massive account parallelization to direct access to data firehoses.
    • Quote: Will likens the subnet's incentive design to a Texas highway: "You just take that Mustang and you put your foot down, sonny, and let that V8 work out the rest of it. And the miners are the V8 engine."
    • Strategic Implication: This "black box" innovation is a core feature of well-designed decentralized networks. For investors, it signifies a system that can adapt and scale in ways that are difficult for centralized competitors to replicate. Within six months of launch, Subnet 13 became the largest repository of social media data on Hugging Face, a key platform for sharing AI models and datasets.
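The goal-based design described above can be illustrated with a scoring rule that looks only at outcomes and never at methods. The sketch below is hypothetical and is not Subnet 13's published formula. It assumes each miner's score is its verified row count scaled by its spot-check pass rate (the fraction of sampled rows still unchanged when re-checked), then normalizes the scores into weights that sum to 1. `weights_from_scores` is an illustrative name.

```python
def weights_from_scores(
    verified_rows: dict[str, int],
    pass_rate: dict[str, float],
) -> dict[str, float]:
    """Turn per-miner outcomes into normalized weights that sum to 1.

    Only the outcome is scored (how much data each miner delivered and how
    much of it survived spot checks); how the miner scraped it is not inspected.
    """
    raw = {m: verified_rows[m] * pass_rate.get(m, 0.0) for m in verified_rows}
    total = sum(raw.values())
    if total == 0:
        # Nobody delivered verifiable data, so nobody earns weight.
        return {m: 0.0 for m in raw}
    return {m: score / total for m, score in raw.items()}


# Usage: miner_b submits more rows but fails half of its spot checks.
print(weights_from_scores({"miner_a": 1000, "miner_b": 3000},
                          {"miner_a": 1.0, "miner_b": 0.5}))
# prints {'miner_a': 0.4, 'miner_b': 0.6}
```

Because the rule never asks how rows were obtained, a miner is free to use any technique, which is the "black box" property the team describes.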

Commercial Strategy: "Bigger, Faster, Better"

  • When engaging commercial clients, Macrocosmos simplifies its pitch. Instead of detailing the complexities of the Bittensor network, they focus on the outcome: a superior data scraping service. Zach notes that the $5 billion data scraping market is mature, and customers primarily care about performance, not the underlying mechanics.
    • Value Proposition: The core message to customers is that their system is "bigger, faster, and better" than alternatives.
    • Legal Abstraction: From a legal standpoint, Macrocosmos sources its data from miners, who are required to acquire it legally. This is a cleaner operational model than running a proprietary scraping operation that might violate platforms' terms of service.
    • Investor Takeaway: The ability to abstract away blockchain complexity is crucial for mainstream adoption. The success of this strategy will depend on translating decentralized technical superiority into a simple, compelling commercial product.

Powering the Bittensor Ecosystem: Subnet Partnerships

  • A key sign of Subnet 13's foundational role is its integration with other Bittensor subnets, which are coming to Macrocosmos for data, not the other way around.
    • Subnet 57 (Gaia): This global weather forecasting subnet uses Subnet 13 to add a "social context layer." By analyzing social media discussions, Gaia can move beyond predicting weather to modeling its impact on communities and predicting human behavior in response to climate events.
    • Subnet 44 (Score): A sports prediction and computer vision subnet, Score uses Subnet 13's data to analyze how social media sentiment affects player performance. This helps model the psychological variables in sports, which are notoriously difficult to forecast.
    • Subnet 64 (Squads): This AI agent platform, built by Taoshi, launched with Subnet 13 as one of its two inaugural tools. It allows users to build agents that can scrape social media for real-time context, demonstrating the data's utility for autonomous systems.
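As a rough illustration of what a "social context layer" could mean in practice, the sketch below turns timestamped posts into hourly keyword-mention counts. That is the kind of simple time series a forecasting or sports-prediction model might take as an extra input. It is an assumption for illustration only: neither Gaia nor Score is described here as using this exact feature, and `hourly_mentions` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def hourly_mentions(posts: list[tuple[datetime, str]], keyword: str) -> dict[datetime, int]:
    """Count posts mentioning `keyword` (case-insensitive) per clock hour."""
    needle = keyword.lower()
    counts = Counter()
    for timestamp, text in posts:
        if needle in text.lower():
            # Truncate to the start of the hour so posts share a bucket.
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return dict(sorted(counts.items()))


posts = [
    (datetime(2025, 7, 29, 10, 5), "Heatwave warning issued"),
    (datetime(2025, 7, 29, 10, 40), "this heatwave is brutal"),
    (datetime(2025, 7, 29, 11, 10), "match day!"),
    (datetime(2025, 7, 29, 11, 20), "HEATWAVE again"),
]
print(hourly_mentions(posts, "heatwave"))
```

A real pipeline would likely add sentiment scoring and deduplication; raw counts are simply the smallest signal that shows the idea.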

Evolving from Volume to Real-Time Access

  • Will reveals that Subnet 13 was initially designed for volumetric scraping to fuel AI model training, prioritizing scale over speed. However, demand from partners and users revealed a strong need for real-time data access for applications like sentiment analysis and immediate signal detection.
    • Dual-Use Case: The subnet now supports both large-scale, asynchronous scraping and a real-time API, demonstrating an ability to adapt to market needs.
    • Upcoming Data Marketplace: Macrocosmos is developing a "Data Universe" marketplace. This platform will allow users to query, package, and share curated datasets, moving from a raw data firehose to a user-friendly, queryable repository, similar to a "Bittensor Hugging Face."
    • Actionable Insight: The development of this marketplace is a critical strategic move. Researchers and investors should monitor its launch, as it represents a direct attempt to monetize the massive dataset and create a user-centric product layer on top of the subnet's raw capabilities.
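The query-and-package workflow the marketplace is meant to offer can be sketched in a few lines: filter a pool of scraped records by source, keyword, and date, then serialize the result as a shareable dataset file. This is a hypothetical sketch. The `Record` schema, the `query` and `package_jsonl` helpers, and the choice of JSON Lines as the packaging format are assumptions, not the Data Universe API.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Record:
    source: str      # hypothetical labels: "x", "reddit", "youtube"
    created_at: str  # ISO-8601 timestamp with a UTC offset
    text: str


def query(
    records: list[Record],
    source: Optional[str] = None,
    keyword: Optional[str] = None,
    since: Optional[datetime] = None,
) -> list[Record]:
    """Filter records by source, case-insensitive keyword, and start time."""
    matches = []
    for r in records:
        if source is not None and r.source != source:
            continue
        if keyword is not None and keyword.lower() not in r.text.lower():
            continue
        if since is not None and datetime.fromisoformat(r.created_at) < since:
            continue
        matches.append(r)
    return matches


def package_jsonl(records: list[Record]) -> str:
    """Serialize records as JSON Lines (one JSON object per line) for sharing."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


records = [
    Record("reddit", "2025-07-20T09:00:00+00:00", "TAO subnet discussion"),
    Record("x", "2025-07-21T12:00:00+00:00", "tao price chatter"),
    Record("reddit", "2025-06-01T08:00:00+00:00", "older TAO thread"),
]
recent = query(records, source="reddit", keyword="tao",
               since=datetime(2025, 7, 1, tzinfo=timezone.utc))
print(package_jsonl(recent))
```

The same `query` function could serve both access modes the section mentions: run it over a large archive for bulk training sets, or over a small rolling window of fresh records for near-real-time signals.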

Data Strategy: Balancing Virality with Rich Knowledge

  • The team is strategically expanding its data sources beyond social media. The focus is shifting towards longer-form, knowledge-rich content that is more valuable for training sophisticated AI models.
    • Structured vs. Unstructured Data: While Subnet 13 provides raw, unstructured data, this serves as the fuel for other subnets like Ready AI to create structured, labeled datasets. The marketplace will also feature curated, cleaned, and structured data to add value.
    • New Data Sources: Macrocosmos is exploring sources like academic archives (e.g., arXiv), legal documents, and planning submissions. These provide validated, high-knowledge data crucial for training models that require deep understanding, not just sentiment signals.
    • Quote: Zach highlights the dual utility they seek: "Generally our bias is towards longer form content that has these dual use cases of being relevant and valuable in the AI field and useful and accessible in the sort of social intelligence field."
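One way to picture the bias toward longer-form content that Zach describes is a ranking score that blends a knowledge proxy (length) with a virality proxy (engagement). The sketch is purely hypothetical. The log scaling, the `knowledge_weight` parameter, and the use of word count as a stand-in for knowledge are all assumptions, not Macrocosmos's method.

```python
import math


def priority(word_count: int, engagement: int, knowledge_weight: float = 0.7) -> float:
    """Blend a knowledge proxy (length) with a virality proxy (engagement).

    log1p dampens both inputs so a single huge value cannot dominate.
    knowledge_weight=1.0 ranks purely by length; 0.0 purely by engagement.
    """
    if not 0.0 <= knowledge_weight <= 1.0:
        raise ValueError("knowledge_weight must be between 0 and 1")
    knowledge = math.log1p(word_count)
    virality = math.log1p(engagement)
    return knowledge_weight * knowledge + (1.0 - knowledge_weight) * virality


transcript = priority(word_count=5000, engagement=50)   # long video transcript
viral_post = priority(word_count=20, engagement=10000)  # short, widely shared post
print(transcript > viral_post)  # True with the default knowledge-leaning weight
```

Lowering `knowledge_weight` flips the ranking toward viral posts. Exposing that single dial is one simple way to express a "viral signal versus rich knowledge" balance.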

The Challenge: Monetization Beyond Protocol Emissions

  • The conversation concludes with a candid discussion on the primary challenge facing all Bittensor subnets: achieving real-world commercial viability. Zach provides a critical perspective on the dangers of relying solely on emissions—the native token rewards distributed by the protocol.
    • The "Double-Edged Sword" of Emissions: While emissions are effective for bootstrapping a network, they can distract teams from the difficult work of finding product-market fit and generating external revenue.
    • The Ultimate Test: Success for Subnet 13 is defined by two clear goals: creating high-quality datasets for training AI models like IOTA and driving external, real-world revenue into the subnet.
    • Advice for Subnets: Zach's advice is to seek market signals early and relentlessly. "Actually going into a real world marketplace is a real test and subnets need to pass that test sooner rather than later before they invest too much time and capital on routes that actually don't lead to real world impact."

Conclusion

This discussion underscores a critical inflection point for Bittensor subnets: the transition from building powerful, decentralized commodities to creating viable commercial products. Investors and researchers must scrutinize a subnet's strategy for generating external revenue, as this, not emissions, will ultimately determine its long-term success and impact.
