This episode reveals how Tensorplex Dojo is pioneering a human-in-the-loop system on Bittensor to solve AI's critical data scarcity problem, creating high-quality, human-generated datasets that could unlock the next wave of model improvements.
Introduction to Tensorplex and Backprop Finance
- Backprop Finance: A trader-centric platform and progressive web app (PWA) that allows users to trade and analyze subnets without needing a separate wallet. It features a non-custodial wallet generated locally on the user's device, streamlining staking and swapping.
- Traction: The platform has processed over $500 million in transaction volume across more than 200,000 transactions.
- The Backprop Analogy: Darwin explains the name is a nod to backpropagation, a core concept in training neural networks. He draws a powerful analogy where the Bittensor network is a three-layer neural network, and staking or unstaking via Backprop Finance is akin to the backpropagation step, adjusting the "weights" (i.e., capital allocation) of subnets to optimize the entire network's output.
Darwin: "If you think about the entire Bittensor network, it's kind of like a three-layer neural network... by staking and unstaking onto the subnet you effectively adjust the weights of the neurons... That's why Backprop Finance, the interface, is kind of like the backpropagation step for the entire Bittensor network."
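The analogy above can be made concrete with a toy sketch. This is purely illustrative (not Backprop Finance's actual logic): the "logits", reward values, and learning rate are all made up. Staking toward better-performing subnets plays the role of a gradient step on the network's weights.

```python
import math

def softmax(xs):
    """Normalize raw scores into allocation weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical "logits" behind each subnet's stake allocation.
logits = [0.0, 0.0, 0.0]
rewards = [0.2, 0.5, 0.1]   # observed per-subnet performance (invented numbers)
lr = 1.0                    # step size of the "backprop" adjustment

# Staking/unstaking nudges capital toward better-performing subnets,
# much like a gradient step nudges neuron weights toward lower loss.
logits = [l + lr * r for l, r in zip(logits, rewards)]
weights = softmax(logits)
```

After the step, the best-performing subnet (index 1) holds the largest share of the allocation, mirroring how stake flows adjust the network's effective weights.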
The Genesis of Dojo: A Human Feedback Layer
- The Problem: If a miner develops a model that outperforms the model validators use for scoring, the validators may fail to recognize its quality and even penalize it. This caps innovation within a subnet at the validators' own capability.
- Dojo's Solution: To break this ceiling, Tensorplex proposed abstracting human feedback into a dedicated service layer. Dojo acts as a universal "human feedback layer" that any subnet can tap into to complement its automated validation mechanisms.
- Strategic Goal: This allows the network to validate and reward true state-of-the-art performance, pushing the boundaries of what can be built on Bittensor and attracting attention from the broader AI ecosystem.
The AI Data Bottleneck and Model Collapse
- The Data Limit: A 2022 study projected the world would exhaust high-quality public data for training large models between 2026 and 2032. The shortage is compounded as platforms increasingly place content behind logins.
- Synthetic Data's Pitfall: The natural solution is to generate synthetic data, but this introduces the risk of model collapse. This phenomenon occurs when a model is iteratively trained on its own synthetic outputs, causing its biases to compound until the generated data becomes corrupted and useless.
- Actionable Insight: Human validation is the critical ingredient to prevent model collapse. It "grounds" the synthetic data generation process, ensuring that the data remains aligned with real-world quality and user preferences.
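Model collapse can be demonstrated with a minimal simulation, under the simplifying assumption that the "model" is just a Gaussian fitted to its own previous outputs. Each generation trains only on the last generation's synthetic samples, and the estimated diversity steadily collapses. This is a standard toy illustration, not an implementation from the episode.

```python
import random
import statistics

def fit_and_resample(samples, n):
    """Fit a Gaussian to the samples, then draw n fresh 'synthetic' samples from it."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)], sigma

random.seed(0)
n = 10
data = [random.gauss(0.0, 1.0) for _ in range(n)]  # the original "real" data
initial_sigma = statistics.stdev(data)

# Each generation trains only on the previous generation's synthetic output,
# with no fresh human-grounded data mixed in.
for _ in range(1000):
    data, sigma = fit_and_resample(data, n)

# The estimated spread collapses toward zero: the model's outputs lose diversity.
final_sigma = statistics.stdev(data)
```

Injecting even a small fraction of human-validated real data at each generation is what prevents this drift, which is the grounding role Dojo aims to play.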
The Critical Role of Human Preference and RLHF
- Why Humans Are Needed: While a model can be trained via simulation for tasks with clear, objective functions (e.g., winning a game of Go), it fails on subjective tasks. Defining what makes an interface "intuitive" or an image "visually appealing" requires human judgment.
- The Moat of Human Preference: The host compares this to Scale AI's business model, which is built on a moat of human-labeled data. You cannot synthetically generate the "ground truth" of human preference; it must be sourced directly from humans.
Darwin: "Unless we're training for some other entity to use these AI models, then we'll have to get feedback from that entity instead. But in this case, it's all a human-centric development."
Dojo's Subnet Mechanism: Generating and Ranking Tasks
- The Workflow:
- Task Generation: A validator synthetically generates a prompt, for example, "create an interactive educational quiz about data packets."
- Output Generation: The validator uses a model to generate multiple different interactive UIs based on that prompt.
- Human Ranking: These outputs are sent to miners (human labelers), who must interact with each version—testing buttons, answering questions, and trying to break them—to rank them based on quality, appeal, and functionality.
- Investor Takeaway: This mechanism is designed to generate data for tasks that are difficult for AI to evaluate automatically, such as user experience in interactive applications. The data collected is therefore unique and highly valuable.
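One way to picture the ranking step is as an aggregation over miner-submitted orderings. The data shapes and the Borda-count aggregation below are assumptions for illustration; the episode does not specify how Dojo combines rankings internally.

```python
# Each miner returns a ranking of output IDs, best first (hypothetical format).
miner_rankings = [
    ["ui_b", "ui_a", "ui_c"],
    ["ui_b", "ui_c", "ui_a"],
    ["ui_a", "ui_b", "ui_c"],
]

def borda_scores(rankings):
    """Aggregate rankings with a Borda count: rank i among k outputs earns k - 1 - i points."""
    scores = {}
    for ranking in rankings:
        k = len(ranking)
        for i, output_id in enumerate(ranking):
            scores[output_id] = scores.get(output_id, 0) + (k - 1 - i)
    return scores

scores = borda_scores(miner_rankings)
best = max(scores, key=scores.get)
```

The aggregate preference ("ui_b" here) becomes a labeled data point pairing the prompt with a human-preferred output, which is exactly the kind of record automated evaluation cannot produce.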
Validating Human Input: The Perturbation Method
- The Challenge: In subjective tasks, there is no single "correct" answer, making it difficult to score miners' contributions.
- The Solution: Validators create a "relative ground truth" by introducing perturbations—slight changes to the prompts. Some of these are negative augmentations, designed to subtly degrade the quality of the output in ways that only an attentive human would notice.
- Validation in Action: Miners are scored on their ability to consistently rank the negatively perturbed (lower quality) outputs below the original ones. This allows the network to validate the integrity of the human feedback without needing a definitive ground truth.
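The scoring logic described above can be sketched as a consistency check: did the miner rank each original above its deliberately degraded twin? The pair format and function below are hypothetical, not Dojo's actual validator code.

```python
# Hypothetical probe pairs: (original_id, perturbed_id), where the perturbed
# version was subtly degraded by the validator via a negative augmentation.
probe_pairs = [("orig_1", "pert_1"), ("orig_2", "pert_2"), ("orig_3", "pert_3")]

def miner_consistency(ranking, pairs):
    """Fraction of probe pairs where the miner ranked the original above its
    negatively perturbed twin (higher means more trustworthy feedback)."""
    position = {output_id: i for i, output_id in enumerate(ranking)}
    hits = sum(1 for orig, pert in pairs if position[orig] < position[pert])
    return hits / len(pairs)

# This miner catches two of the three planted degradations.
ranking = ["orig_1", "pert_1", "orig_2", "pert_3", "orig_3", "pert_2"]
score = miner_consistency(ranking, probe_pairs)
```

Because the validator knows which outputs it degraded, it gets a relative ground truth for free, without ever needing an absolute "correct" answer for the subjective task.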
The Miner Ecosystem: Industrialized Human Labeling
- Industrialized Operations: Darwin reveals that large, organized teams of miners have emerged.
- A team in Vietnam hires students and recent graduates to perform labeling tasks.
- Teams in China have developed sophisticated AI pipelines to automate parts of the workflow, with humans stepping in for final validation and complex cases.
- Lowering Barriers to Entry: To encourage broad participation, Dojo offers a platform where miners can provide API keys to non-technical labelers, who can then contribute directly through a web interface without needing to set up their own server.
- Strategic Insight: The emergence of these specialized "human labeling farms" validates the economic model of the subnet and proves its ability to source human intelligence at scale.
Iterative Improvement with Rich Human Feedback (RHF)
- Beyond Rankings: In addition to ranking outputs, miners now provide specific, written text feedback on how to improve the best-ranked option.
- The Iterative Loop: This text feedback is fed back into the generation pipeline to create a new, improved output. This new version is then returned to the miner pool for another round of ranking and feedback.
- Data Flywheel: This continuous, iterative process creates an incredibly valuable dataset. It not only improves the quality of the synthetic data but also captures the reasoning process of human labelers, which can be used to train more advanced reasoning models.
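The iterative loop can be sketched with placeholder functions. Both `generate` and `collect_feedback` below are stubs standing in for the real generation pipeline and the miner pool; only the loop structure reflects the process described in the episode.

```python
def generate(prompt, feedback=None):
    """Stub for the generation model; written feedback conditions the next draft."""
    suffix = f" [revised per: {feedback}]" if feedback else ""
    return f"<UI for '{prompt}'>{suffix}"

def collect_feedback(output):
    """Stub for miner review: a quality score plus free-text improvement notes."""
    return {"score": len(output) % 10, "notes": "make the buttons larger"}

prompt = "interactive quiz about data packets"
output = generate(prompt)
history = []

# Each round feeds the miners' written notes back into generation,
# accumulating (output, review) pairs that capture the reasoning trail.
for _ in range(3):
    review = collect_feedback(output)
    history.append((output, review))
    output = generate(prompt, feedback=review["notes"])
```

The `history` list is the valuable artifact: a chain of drafts paired with the human critiques that drove each revision, usable as training data for reasoning models.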
Proof of Value: Fine-Tuning a State-of-the-Art Model
- The Experiment: The team took Qwen2.5-Coder (a 7-billion-parameter model) and fine-tuned it using two types of data from Dojo:
- Codebases: Over 160,000 UI codebases generated on the subnet were used for supervised fine-tuning.
- Human Preferences: Miner rankings were used to perform DPO (Direct Preference Optimization), a technique that directly trains a model on human preference data (e.g., "Output A is better than Output B").
- The Results: The model fine-tuned with DPO produced "substantially more visually appealing" and functional user interfaces than the base model. Remarkably, this specialized training also led to generalized performance improvements across other coding benchmarks.
- Actionable Insight: This result proves that Dojo's human-generated data is not just theoretical; it is a tangible asset capable of producing best-in-class, specialized AI models that can outperform larger, more generalized ones in specific domains. Darwin asserts this is likely the best model in the world at its size for this specific task.
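The core of DPO is a simple per-pair loss on log-probabilities. The sketch below shows that objective for a single (chosen, rejected) pair; the log-prob values and `beta` are invented for illustration, and a real run would use a training library rather than this scalar function.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    logp_w, logp_l         : policy log-probs of the chosen (w) / rejected (l) output
    ref_logp_w, ref_logp_l : frozen reference model's log-probs for the same outputs
    beta                   : strength of the implicit KL constraint
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

# If the policy already prefers the chosen output more strongly than the
# reference model does, the margin is positive and the loss is small.
loss = dpo_loss(logp_w=-4.0, logp_l=-9.0, ref_logp_w=-5.0, ref_logp_l=-6.0, beta=0.1)
```

Minimizing this loss pushes the policy to widen its preference for the chosen output relative to the reference model, which is how miner rankings like "Output A is better than Output B" directly shape the fine-tuned model.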
Future Directions and Long-Term Vision
- Ecosystem Collaboration: Dojo is partnering with Subnet 17 (3D Generation) and Subnet 44 (Video Analytics) to provide a human feedback and validation layer for their outputs.
- New Frontier: Robotics: The team is working with a robotics startup, recognizing a huge opportunity for human operators to provide feedback for complex physical tasks that are difficult to simulate.
- Long-Term Vision:
- Near-Term: Evolve into a platform for human-agentic collaboration, where AI agents can tap the human labor pool to complete tasks.
- Far-Term: Serve as a crucial alignment mechanism, allowing humanity to steer the values and principles of advanced AI systems.
Conclusion
Dojo's mechanism for generating high-quality, human-verified data creates a defensible moat and a unique asset for training specialized AI models. Investors and researchers should monitor the performance of models fine-tuned on this data, as it represents a new frontier for creating differentiated AI capabilities within the Bittensor ecosystem.