This episode unpacks ReadyAI's mission to structure the world's data for AI, detailing how transforming raw information—from social media to podcasts—into actionable insights drives value for enterprises and the Bittensor ecosystem.
ReadyAI's Mission and Roadmap
- David Fields, representing ReadyAI, outlines the company's core mission: structuring global data to make it universally accessible for AI applications.
- He introduces ReadyAI's toolkit, starting with the "jobs interface" available on their site. This tool allows users (enterprise or individual) to structure data, generate metadata tags, perform sentiment analysis, and apply custom tagging to both private and public datasets (like those on Hugging Face).
- David emphasizes that structured data is fundamental for optimizing AI model performance.
- Looking ahead, ReadyAI is piloting an end-to-end product for businesses, focusing on real-time social signal aggregation.
- This upcoming offering aims to provide enterprises with structured data outputs, enabling them to track sentiment shifts around their brand and competitors.
- David notes this is currently being tested with about a dozen enterprises and will roll out soon.
Data Sources and Structuring Process
- ReadyAI processes diverse data types.
- Initially focused on structuring large volumes of varied podcast data, the subnet now handles organic queries submitted through the jobs interface, primarily involving Hugging Face datasets.
- A key collaboration involves Subnet 13 (Data Universe), which aggregates raw social media data (mainly Twitter/X and Reddit).
- ReadyAI then processes this data, adding structure through sentiment analysis and relevant metadata tagging.
- David explains this structured output helps organize data for use in RAG systems or for fine-tuning AI models; a sketch of such a record follows this list.
- He clarifies ReadyAI's role as the processing layer that derives insights from the raw data aggregated by partners like Subnet 13.
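A minimal sketch of what one such structured record could look like, written in Python; the field names and values are hypothetical, not ReadyAI's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class StructuredPost:
    """Hypothetical structured record for one raw social post."""
    source: str                  # e.g. "twitter" or "reddit"
    text: str                    # original raw content
    sentiment: str               # e.g. "positive" / "neutral" / "negative"
    tags: list[str] = field(default_factory=list)  # topic / entity metadata tags

# Raw post pulled from an upstream aggregator (e.g. Subnet 13), followed by
# the structured version a processing layer might emit for RAG or fine-tuning.
raw = "Loving the new subnet tooling, onboarding took five minutes."
record = StructuredPost(
    source="twitter",
    text=raw,
    sentiment="positive",
    tags=["bittensor", "developer-experience", "onboarding"],
)
print(record)
```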
The Case for Structured Data: Accuracy, Cost, and RAG
- Accuracy: Large amounts of unstructured data in a context window can reduce the accuracy of the information retrieved, as the model struggles to pinpoint relevant details.
- Cost: AI model interactions are priced per token (units of text, roughly word fragments). Repeatedly querying large unstructured documents therefore incurs high token costs. David estimates that vectorizing the information and serving it from a vector database through a RAG implementation can cut query costs by 100x to 1,000x.
- RAG (Retrieval-Augmented Generation): This technique involves retrieving relevant information from an external knowledge base (like a vector database containing structured data) before generating a response. David explains that using RAG with structured, vectorized data significantly lowers query costs and improves accuracy compared to relying solely on large context windows.
- Vectorization converts data into numerical representations (vectors) that capture semantic meaning, allowing efficient similarity searches in a vector database.
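As a rough illustration of this retrieval pattern, the sketch below embeds a handful of documents once, stores them in a vector index, and pulls back only the top matches for each query instead of resending a whole corpus. The library and model choices (sentence-transformers, FAISS, all-MiniLM-L6-v2) are illustrative assumptions, not anything stated in the episode:

```python
# pip install sentence-transformers faiss-cpu
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "Subnet 33 structures raw social data with sentiment and metadata tags.",
    "Structured, vectorized data lets RAG retrieve only the relevant chunks.",
    "Token costs grow with the amount of text sent in each model call.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")      # small example embedding model
doc_vecs = model.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])         # inner product = cosine on normalized vectors
index.add(doc_vecs)

query_vec = model.encode(["Why does RAG reduce query cost?"], normalize_embeddings=True)
scores, ids = index.search(query_vec, 2)             # fetch only the top-2 chunks
context = "\n".join(docs[i] for i in ids[0])
print(context)  # a few hundred tokens of context instead of the full corpus
```

If a source corpus runs to hundreds of thousands of tokens but each retrieved context is only a thousand or two, the per-query token bill shrinks by roughly two to three orders of magnitude, which is the 100x-to-1,000x range David cites.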
Enhancing AI Agents with Structured Data
- Structured data is crucial for the reliability of AI agents (autonomous programs performing tasks). David uses the example of Twitter-based AI agents like AIXBT, noting they sometimes provide incorrect information because they process a raw, untagged data firehose.
- Structuring this data—tagging relevance, source trustworthiness, etc.—prevents "garbage in, garbage out."
- He explains that structuring helps agents efficiently access the specific information they need, reducing token usage and improving the likelihood of correct outcomes; a small filtering sketch follows this list.
- This is vital as agents performing multi-step tasks (like research or making purchases) require high accuracy at each step to avoid overall failure.
- David mentions that emerging standards like the Model Context Protocol (MCP) from Anthropic (the maker of Claude) aim to create consistent ways of feeding information into agents, underscoring the need for organized data inputs.
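A toy sketch of the filtering idea: keep only relevant, trusted records before they reach an agent, so it reasons over less but better data. Field names and thresholds are invented for illustration:

```python
# Hypothetical tagged records; field names are illustrative only.
posts = [
    {"text": "Protocol X ships its v2 upgrade", "relevance": 0.92, "source_trust": "high"},
    {"text": "Random spam thread",              "relevance": 0.11, "source_trust": "low"},
    {"text": "Rumor with no source",            "relevance": 0.70, "source_trust": "low"},
]

def agent_context(records, min_relevance=0.5, trusted=("high",)):
    """Keep only relevant, trustworthy records for the agent's context window."""
    kept = [r for r in records
            if r["relevance"] >= min_relevance and r["source_trust"] in trusted]
    return "\n".join(r["text"] for r in kept)

print(agent_context(posts))  # only the high-trust, relevant post reaches the agent
```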
Evolving Data Formats: The Importance of Markdown
- David notes that ReadyAI is exploring different structured data formats beyond basic tagging.
- He specifically highlights Markdown, a lightweight markup language with plain-text formatting syntax, as increasingly important.
- Research shows that feeding Markdown-formatted information into RAG systems significantly boosts accuracy compared to unstructured text; a small conversion sketch follows this list.
- ReadyAI aims to support an evolving set of structured data standards as AI models develop.
- David emphasizes the goal is to continuously adapt the subnet to provide the most effective data outputs required by new AI techniques and standards.
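One way such a Markdown output could look, sketched with an invented record format:

```python
def to_markdown(record: dict) -> str:
    """Render a structured record as a Markdown chunk for a RAG index (illustrative format)."""
    tags = ", ".join(record["tags"])
    return (
        f"## {record['title']}\n\n"
        f"- **Source:** {record['source']}\n"
        f"- **Sentiment:** {record['sentiment']}\n"
        f"- **Tags:** {tags}\n\n"
        f"{record['text']}\n"
    )

chunk = to_markdown({
    "title": "Subnet 33 community update",
    "source": "reddit",
    "sentiment": "positive",
    "tags": ["bittensor", "readyai"],
    "text": "Miners report easier onboarding with the new toolkit.",
})
print(chunk)
```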
Subnet 33 Evolution: Incentive Mechanisms and Expansion
- Expanding the types of data ReadyAI's subnet (Subnet 33 on Bittensor) can process requires careful evolution of its incentive mechanism.
- Bittensor is a decentralized network whose tokenomics incentivize the creation and operation of specialized AI services, organized into subnets.
- David explains the core challenge is ensuring the mechanism accurately rewards miners for producing high-quality structured output across diverse data types.
- ReadyAI has stabilized its mechanism after initial challenges and is now focused on generalizing how organic data is ingested and evaluated.
- Future plans involve incorporating a time dimension into the incentive mechanism, evaluating not just the quality but also the speed at which miners structure data (see the toy scoring sketch after this list).
- The goal remains aligning miner scores directly with the quality and utility of their output.
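To make the time dimension concrete, here is a toy scoring function in which equal-quality responses earn less as latency grows; it is only an illustration of the idea, not Subnet 33's actual incentive mechanism:

```python
def miner_score(quality: float, latency_s: float, half_life_s: float = 30.0) -> float:
    """Toy score: quality in [0, 1], discounted by response time.

    Illustration only, not Subnet 33's real mechanism: the time factor halves
    every `half_life_s` seconds, so faster miners earn more for equal quality.
    """
    time_factor = 0.5 ** (latency_s / half_life_s)
    return quality * time_factor

print(miner_score(quality=0.9, latency_s=10))   # fast, high quality
print(miner_score(quality=0.9, latency_s=120))  # same quality, much slower -> lower score
```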
Incentive Mechanism Stability and Miner Ecosystem Growth
- David confirms that ReadyAI's incentive mechanism has stabilized after overcoming early issues like "tag stuffing," where miners added irrelevant tags to game the scoring system (a sketch of one possible countermeasure follows this list).
- He stresses the importance of the miner community and ReadyAI's efforts to grow it.
- They recently released a "miner optimization toolkit" with a Docker image (a standardized unit of software packaging) downloadable from Docker Hub.
- This toolkit simplifies the onboarding process, allowing miners to run on various hardware (from Raspberry Pi to Mac M-series silicon) with one click.
- While onboarding is easier, David cautions that competition remains fierce, with top miners significantly outperforming standard models like GPT-4o.
- The aim is to attract top talent by lowering entry barriers.
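As one illustration of how tag stuffing could be discouraged, a validator might check each submitted tag's semantic similarity to the source document and reward only the relevant fraction. The sketch below uses toy embedding vectors and a made-up threshold rather than a real encoder or ReadyAI's actual scoring:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevant_tag_ratio(doc_vec, tag_vecs, threshold=0.3):
    """Fraction of submitted tags that are semantically close to the document.

    Folding this ratio into a score means padding a submission with irrelevant
    tags lowers, rather than raises, the reward.
    """
    if not tag_vecs:
        return 0.0
    hits = sum(cosine(doc_vec, t) >= threshold for t in tag_vecs)
    return hits / len(tag_vecs)

# Toy embeddings standing in for vectors from a real sentence encoder.
doc = np.array([0.9, 0.1, 0.0])
good_tag, stuffed_tag = np.array([0.8, 0.2, 0.1]), np.array([0.0, 0.1, 0.99])
print(relevant_tag_ratio(doc, [good_tag, stuffed_tag]))  # 0.5: half the tags look stuffed
```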
Enterprise Adoption: Privacy, Use Cases, and Monetization Strategy
- ReadyAI is actively engaging with enterprises, leveraging the team's background (Disney, Google/AdSense acquisition) and a dedicated sales director.
- David acknowledges enterprise concerns around data privacy and security.
- Strategically, ReadyAI is currently focusing on structuring publicly accessible data, like social media signals and Common Crawl web data, avoiding sensitive customer data flowing through the public subnet for now.
- For future handling of private data, they are exploring Trusted Execution Environments (TEEs), secure areas within a processor that keep code and data confidential and tamper-proof, as well as privacy-preserving tokenization techniques in which data is anonymized before reaching miners (see the sketch after this list).
- The primary enterprise use case currently revolves around deriving insights (like sentiment analysis, competitor tracking, customer risk signals) from public social media data.
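A highly simplified sketch of the anonymization idea: replace identifying strings with opaque tokens before data leaves the enterprise, keep the mapping private, and restore identities after structuring. Real deployments would need far broader PII detection (and possibly TEEs); this example handles only email addresses:

```python
import re

def pseudonymize(text: str):
    """Swap simple PII (emails, as a stand-in) for opaque tokens; keep the mapping private."""
    mapping = {}

    def swap(match):
        token = f"<PII_{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", swap, text)
    return redacted, mapping

redacted, mapping = pseudonymize("Contact jane.doe@example.com about the churn-risk report.")
print(redacted)   # Contact <PII_0> about the churn-risk report.
print(mapping)    # kept internally to re-identify results after structuring
```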
Business Model Insights: Comparisons to Scale AI and Product Strategy
- David draws parallels between ReadyAI's business model and Scale AI, a company known for data annotation using human labelers.
- He argues that enterprises are already comfortable sharing anonymized data with distributed human workforces (like Scale AI's annotators), suggesting the model of distributing data to decentralized miners on Bittensor isn't an insurmountable privacy hurdle for many use cases.
- ReadyAI aims to emulate Scale AI by not only providing the structured data pipeline but also working directly with companies (like their Common Crawl project) to build custom AI models and systems leveraging that data.
- Additionally, they plan to offer "wrapper" products—seamless end-user applications built on top of the subnet, where the underlying Bittensor infrastructure might be invisible to the user.
Partnership Deep Dive: Common Crawl Collaboration
- The partnership with Common Crawl, a non-profit maintaining an open repository of web crawl data, serves as a key case study.
- ReadyAI processed Common Crawl's public data (from their listserv, Discord, and the web crawl itself) using Subnet 33.
- This structured data was used to build a highly accurate AI model/agent for Common Crawl's community of AI researchers.
- This agent provides insights based on Common Crawl's data and is continuously updated in real-time as new information becomes available.
- The model is accessible on Common Crawl's site, Discord, and Slack, demonstrating a practical application of ReadyAI's structured data pipeline.
Target Sectors and Showcasing Capabilities
- While engaging various enterprises, David emphasizes that ReadyAI is also focused on showcasing the power of structured data.
- The Common Crawl project is one example.
- Another upcoming project involves collaborating with a prominent crypto Twitter creator to build a sophisticated AI agent (similar in concept to AIXBT but aiming for higher quality).
- This agent will integrate diverse data sources, including social media and on-chain data, structured by ReadyAI's subnet.
- David states the goal is to demonstrate tangible value and accelerate adoption by showcasing what's possible with high-quality structured data inputs.
- He indicates this creator-focused agent will be announced in the coming weeks.
Navigating the dTAO Launch: Market Volatility and Strategy
- David shares his perspective on the launch of dTAO (Dynamic TAO), Bittensor's major 2.0 upgrade introducing dynamic token allocation across subnets.
- He praises the technical execution by the core team (Opentensor Foundation).
- ReadyAI anticipated high volatility in the initial 30-60 days due to the fair launch nature of subnet tokens and extremely low liquidity.
- Their strategy was to remain "heads down," focusing on building and delaying major public announcements until the market stabilized (~45 days post-launch).
- This was due to the initial downward price pressure caused by the root proportion mechanism in a low-liquidity environment.
- With the market now more stable and liquidity improved, ReadyAI is becoming more active publicly.
- David expresses confidence in the subnet's trajectory moving forward.
Analyzing dTAO Dynamics: Sum Prices and Tokenomics
- David discusses the "sum price," the combined price of all subnet (alpha) tokens denominated in TAO (Bittensor's native token).
- He notes the sum price recently exceeded 1, hitting roughly 2.04 at the time of recording, suggesting the market values subnet outputs (the "commodities") beyond just their share of TAO emissions.
- This indicates confidence in individual subnets' execution and value proposition.
- He explains the initial downward pressure stemmed from the root vs. alpha proportion.
- At launch, 100% of validator rewards flowed through the "root network" (based on overall TAO stake), and these rewards (paid in subnet tokens) were often immediately sold back into TAO, suppressing subnet token prices.
- Over 60 days, this shifted gradually towards the "alpha proportion" (based on stake delegated directly to the subnet's validators using the subnet's own token).
- David mentions his subnet was around 40% alpha / 60% root at the time of recording, moving towards 50/50.
- He cautions that while a sum price above 1 is positive, the low-liquidity environment means volatility will likely continue.
- He also clarifies that dTAO's design doesn't inherently pull the sum price back to 1; it's a free-market dynamic.
- High APRs (Annual Percentage Rates) on alpha tokens are possible but highly volatile, as they depend heavily on the amount of alpha staked.
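A back-of-the-envelope illustration of these dynamics, using made-up numbers rather than the actual dTAO emission formulas: split a hypothetical daily emission between the root and alpha proportions David mentions, then note how alpha APR falls as more alpha is staked:

```python
# Simplified, illustrative arithmetic only; not the real dTAO emission formulas.
daily_validator_emission = 100.0   # hypothetical subnet-token emission to validators/stakers per day

alpha_share, root_share = 0.40, 0.60                 # split David cites for his subnet
print(daily_validator_emission * alpha_share)        # portion distributed in proportion to alpha stake
print(daily_validator_emission * root_share)         # portion distributed via root-network TAO stake

# Why alpha APR is volatile: the same emissions spread over more (or less) staked alpha.
for total_alpha_staked in (10_000, 50_000, 250_000):
    apr = daily_validator_emission * alpha_share * 365 / total_alpha_staked
    print(f"alpha staked={total_alpha_staked:>7} -> APR ~ {apr:.1%}")
```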
Operational Impact of dTAO
- David states dTAO hasn't fundamentally changed ReadyAI's day-to-day operations much, aside from managing team expectations regarding initial token price volatility.
- Their focus remains on long-term building, and they have no immediate plans to sell their token holdings.
- The main operational consideration was navigating the early low-liquidity, high-volatility period.
Long-Term Vision: Enterprise AI Readiness and Data's Role
- Looking 5-10 years out, ReadyAI aims to make all the world's data AI-ready.
- This involves creating open-source datasets to benefit the broader AI community and, crucially, helping enterprises navigate their transition to becoming AI-ready.
- David observes that most large companies are "not even in the first inning" of AI adoption.
- He highlights enterprise challenges: data silos across different systems and business units, concerns over data privacy (especially with models from companies like OpenAI perceived as not respecting data ownership), and the difficulty of integrating legacy systems.
- The ultimate goal for enterprises, which ReadyAI wants to facilitate, is creating seamless systems where authorized employees can access cross-company insights instantly.
- David believes AI holds the promise to make large organizations vastly more agile. "We think data... may be the most important piece of... what makes AI so powerful," he asserts.
Strategic Advice for Subnets: Horizontal vs. Vertical Integration
- David reflects on a shift in the Bittensor ecosystem post-dTAO.
- While subnets were initially designed primarily as commodity producers for validators, dTAO emphasizes the role of subnet owners (often the largest validators on their own subnet) in monetizing these commodities directly, driven by alpha token staking.
- He suggests that even for horizontally-focused subnets like ReadyAI (providing infrastructure/data usable by many), there's a need to also build vertical products.
- While the structured data pipeline is a horizontal offering, ReadyAI is also building specific applications (like the Common Crawl agent, the upcoming creator agent, enterprise wrappers) on top of it.
- He advises other subnets to consider that while millions of businesses need foundational tools (like vector databases), the larger unserved market lies in providing tailored solutions and end-user products built upon the subnet's core commodity.
Conclusion
ReadyAI's strategy underscores structured data's pivotal role in boosting AI model accuracy and cost-effectiveness, particularly for enterprises navigating early AI adoption. Crypto AI investors and researchers should closely monitor the development of structured data pipelines like ReadyAI's and the evolving tokenomic incentives within Bittensor's dTAO framework.