Opentensor Foundation
July 19, 2025

SN52 :: Tensorplex Dojo :: High-Quality Human-Generated Datasets on Bittensor

Darwin from Tensorplex breaks down Subnet 52, Dojo, a platform designed to solve AI’s looming data crisis by crowdsourcing high-quality human feedback. This isn't just about labeling images; it's about building an essential human-in-the-loop layer to ground, refine, and align AI across the entire Bittensor ecosystem.

The Human Feedback Layer for Bittensor

  • “Instead of doing it for every subnet, why not just abstract this away as a service where we essentially act as the human feedback layer that all subnets can tap onto to complement their validation mechanisms.”

As AI models risk "model collapse" from training on their own synthetic outputs, Dojo provides the crucial grounding of real human preference. It acts as a service for other subnets, enabling them to push beyond the limits of automated validation and create truly state-of-the-art models.

The Dojo Mechanism: Grounding AI with Human Preference

  • “It's based on this idea that only a human would be able to detect these subtle changes, subtle disimprovements in the outputs generated by an LLM in a complex prompt.”
  • Dojo’s validators generate complex tasks, like interactive UIs, and then create slightly flawed versions. Miners, acting as human labelers, must rank these outputs. This clever trick validates the labelers by testing their ability to spot subtle flaws that an AI would miss.
  • The system has evolved to collect not just rankings but also rich text feedback. This iterative loop—where human reasoning is used to regenerate and further refine outputs—creates a powerful flywheel for improving model performance and capturing reasoning steps.

From Data to Dominance: Fine-Tuning a Superior Model

  • “Specifically for generating interfaces at this size, I don't think there would be a model that would be better than what we have here.”
  • By leveraging over 10 million labels from its network, the Dojo team fine-tuned a Qwen2.5 Coder 7B model. The resulting model is now arguably the world’s best at its size for generating complex, interactive user interfaces.
  • The model was trained using Direct Preference Optimization (DPO), which directly incorporates human preference rankings into the training process. This not only improved UI generation but also helped the model generalize its capabilities to other domains.

Key Takeaways:

  • Dojo demonstrates how Bittensor's incentive mechanism can bootstrap not just compute, but a global, on-demand pool of human intelligence. It provides a unique solution to AI's data problem and a path toward more aligned, capable models.
  • 1. Human Intelligence is the Ultimate Moat: In an era of synthetic data, Dojo is creating a defensible moat by generating proprietary, high-quality human preference data. This is the raw material for the next generation of fine-tuned, specialized models.
  • 2. A New Paradigm for Validation: Dojo’s mechanism of using subtle "perturbations" to test labelers is a breakthrough. It solves the cold start problem of validating subjective human feedback in a decentralized network.
  • 3. The Future is Human-Agentic Collaboration: Dojo is evolving from a data-generation subnet to a platform for human-agentic workflows, with applications in robotics, video analytics, and 3D generation. In the long term, it aims to be a crucial tool for aligning AI with human values.

For further insights and detailed discussions, watch the video: Link

This episode reveals how Tensorplex Dojo is pioneering a human-in-the-loop system on Bittensor to solve AI's critical data scarcity problem, creating high-quality, human-generated datasets that could unlock the next wave of model improvements.

Introduction to Tensorplex and Backprop Finance

  • Backprop Finance: A trader-centric platform and progressive web app (PWA) that allows users to trade and analyze subnets without needing a separate wallet. It features a non-custodial wallet generated locally on the user's device, streamlining staking and swapping.
  • Traction: The platform has processed over $500 million in transaction volume across more than 200,000 transactions.
  • The Backprop Analogy: Darwin explains the name is a nod to backpropagation, a core concept in training neural networks. He draws a powerful analogy where the Bittensor network is a three-layer neural network, and staking or unstaking via Backprop Finance is akin to the backpropagation step, adjusting the "weights" (i.e., capital allocation) of subnets to optimize the entire network's output.

Darwin: "If you think about the entire of the tensor network, it's kind of like a three-layer neuro network... by staking and unstaking onto the subnet you effectively adjust the weights of the neurons... That's why back prop finance the interface is kind of like the back propagation step for the entire tensor network."

The Genesis of Dojo: A Human Feedback Layer

  • The Problem: If a miner develops a model whose outputs are better than what the validator's evaluation can recognize, those outputs may be scored poorly and the miner penalized. This creates a ceiling on innovation within a subnet.
  • Dojo's Solution: To break this ceiling, Tensorplex proposed abstracting human feedback into a dedicated service layer. Dojo acts as a universal "human feedback layer" that any subnet can tap into to complement its automated validation mechanisms.
  • Strategic Goal: This allows the network to validate and reward true state-of-the-art performance, pushing the boundaries of what can be built on Bittensor and attracting attention from the broader AI ecosystem.

The AI Data Bottleneck and Model Collapse

  • The Data Limit: A 2022 study projected the world would exhaust high-quality public data for training large models between 2026 and 2032. This problem is accelerated as platforms increasingly place content behind logins.
  • Synthetic Data's Pitfall: The natural solution is to generate synthetic data, but this introduces the risk of model collapse. This phenomenon occurs when a model is iteratively trained on its own synthetic outputs, causing its biases to compound until the generated data becomes corrupted and useless.
  • Actionable Insight: Human validation is the critical ingredient to prevent model collapse. It "grounds" the synthetic data generation process, ensuring that the data remains aligned with real-world quality and user preferences.

The Critical Role of Human Preference and RLHF

  • Why Humans Are Needed: While a model can be trained via simulation for tasks with clear, objective functions (e.g., winning a game of Go), it fails on subjective tasks. Defining what makes an interface "intuitive" or an image "visually appealing" requires human judgment.
  • The Moat of Human Preference: The host compares this to Scale AI's business model, which is built on a moat of human-labeled data. You cannot synthetically generate the "ground truth" of human preference; it must be sourced directly from humans.

Darwin: "Unless we're training for some other entity to use this AI models, then we'll have to get feedback from that entity instead. But in this case, it's all a human-centric development."

Dojo's Subnet Mechanism: Generating and Ranking Tasks

  • The Workflow (sketched in code after this list):
    • Task Generation: A validator synthetically generates a prompt, for example, "create an interactive educational quiz about data packets."
    • Output Generation: The validator uses a model to generate multiple different interactive UIs based on that prompt.
    • Human Ranking: These outputs are sent to miners (human labelers), who must interact with each version—testing buttons, answering questions, and trying to break them—to rank them based on quality, appeal, and functionality.
  • Investor Takeaway: This mechanism is designed to generate data for tasks that are difficult for AI to evaluate automatically, such as user experience in interactive applications. The data collected is therefore unique and highly valuable.
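
As a rough illustration, the validator loop described in the workflow above might look something like the Python sketch below. Every name in it (Task, generate_prompt, generate_ui_variants, validator_round, RandomMiner) is a hypothetical stand-in for exposition, not Dojo's actual code or API.

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    outputs: list[str]  # candidate interactive UIs to be ranked by humans

def generate_prompt() -> str:
    # Step 1: the validator synthesizes a prompt.
    topics = ["data packets", "sorting algorithms", "photosynthesis"]
    return f"Create an interactive educational quiz about {random.choice(topics)}."

def generate_ui_variants(prompt: str, n: int = 4) -> list[str]:
    # Step 2: in the real subnet an LLM generates n different interactive UIs;
    # placeholders are returned here to keep the sketch self-contained.
    return [f"<!-- UI variant {i} for: {prompt} -->" for i in range(n)]

def validator_round(miner_pool) -> dict[str, list[int]]:
    # Step 3: fan the task out to miners (human labelers), who interact with
    # each variant and return a ranking such as [2, 0, 3, 1] (best first).
    prompt = generate_prompt()
    task = Task(prompt=prompt, outputs=generate_ui_variants(prompt))
    return {miner.id: miner.rank(task) for miner in miner_pool}

class RandomMiner:
    # Stand-in for a human labeler; a real miner tests each UI before ranking.
    def __init__(self, miner_id: str):
        self.id = miner_id
    def rank(self, task: Task) -> list[int]:
        order = list(range(len(task.outputs)))
        random.shuffle(order)
        return order

print(validator_round([RandomMiner("m1"), RandomMiner("m2")]))
```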

Validating Human Input: The Perturbation Method

  • The Challenge: In subjective tasks, there is no single "correct" answer, making it difficult to score miners' contributions.
  • The Solution: Validators create a "relative ground truth" by introducing perturbations—slight changes to the prompts. Some of these are negative augmentations, designed to subtly degrade the quality of the output in ways that only an attentive human would notice.
  • Validation in Action: Miners are scored on their ability to consistently rank the negatively perturbed (lower quality) outputs below the original ones. This allows the network to validate the integrity of the human feedback without needing a definitive ground truth.
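
A minimal sketch of how such a check could be scored, assuming the validator keeps a hidden list of (original, perturbed) pairs; the pairing structure and scoring rule here are illustrative assumptions, not Dojo's exact implementation:

```python
def perturbation_score(ranking: list[int], pairs: list[tuple[int, int]]) -> float:
    """ranking: the miner's ordering of output indices, best first.
    pairs: (original_idx, perturbed_idx) pairs where the validator knows the
    negatively perturbed version should rank strictly worse than the original.
    Returns the fraction of known pairs the miner ordered correctly."""
    position = {idx: rank for rank, idx in enumerate(ranking)}
    correct = sum(1 for orig, pert in pairs if position[orig] < position[pert])
    return correct / len(pairs) if pairs else 0.0

# Example: 4 outputs, where outputs 2 and 3 are subtly degraded copies of 0 and 1.
ranking = [0, 1, 3, 2]            # miner ranks output 0 best, then 1, 3, 2
pairs = [(0, 2), (1, 3)]          # hidden ground-truth pairs held by the validator
print(perturbation_score(ranking, pairs))   # -> 1.0, both pairs ordered correctly
```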

The Miner Ecosystem: Industrialized Human Labeling

  • Industrialized Operations: Darwin reveals that large, organized teams of miners have emerged.
    • A team in Vietnam hires students and recent graduates to perform labeling tasks.
    • Teams in China have developed sophisticated AI pipelines to automate parts of the workflow, with humans stepping in for final validation and complex cases.
  • Lowering Barriers to Entry: To encourage broad participation, Dojo offers a platform where miners can provide API keys to non-technical labelers, who can then contribute directly through a web interface without needing to set up their own server.
  • Strategic Insight: The emergence of these specialized "human labeling farms" validates the economic model of the subnet and proves its ability to source human intelligence at scale.

Iterative Improvement with Rich Human Feedback (RHF)

  • Beyond Rankings: In addition to ranking outputs, miners now provide specific, written text feedback on how to improve the best-ranked option.
  • The Iterative Loop: This text feedback is fed back into the generation pipeline to create a new, improved output. This new version is then returned to the miner pool for another round of ranking and feedback.
  • Data Flywheel: This continuous, iterative process creates an incredibly valuable dataset. It not only improves the quality of the synthetic data but also captures the reasoning process of human labelers, which can be used to train more advanced reasoning models.
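
A compact sketch of that flywheel, using hypothetical stand-in functions for the miner-facing feedback step and the LLM regeneration step (neither is Dojo's real interface):

```python
def collect_feedback(miners, prompt: str, output: str) -> str:
    # Stand-in: in the real subnet this is rich written critique from human labelers.
    return "Make the quiz buttons larger and show a running score."

def regenerate_with_feedback(prompt: str, output: str, feedback: str) -> str:
    # Stand-in: in the real subnet this is another LLM call conditioned on the critique.
    return output + f"\n<!-- revised per feedback: {feedback} -->"

def refine(prompt: str, output: str, rounds: int, miners) -> list[dict]:
    """Iteratively improve one output, recording every (output, feedback) step."""
    history = []
    for _ in range(rounds):
        feedback = collect_feedback(miners, prompt, output)
        history.append({"prompt": prompt, "output": output, "feedback": feedback})
        # The critique is fed back into generation; the new version returns to
        # the miner pool for another round of ranking and feedback.
        output = regenerate_with_feedback(prompt, output, feedback)
    return history  # captures the improving outputs and the human reasoning between them

for step in refine("Create an interactive quiz about data packets.", "<!-- UI v0 -->",
                   rounds=2, miners=[]):
    print(step["feedback"])
```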

Proof of Value: Fine-Tuning a State-of-the-Art Model

  • The Experiment: The team took Qwen2.5 Coder (a 7-billion-parameter model) and fine-tuned it using two types of data from Dojo:
    • Codebases: Over 160,000 UI codebases generated on the subnet were used for supervised fine-tuning.
    • Human Preferences: Miner rankings were used to perform DPO (Direct Preference Optimization), a technique that trains a model directly on human preference pairs (e.g., "Output A is better than Output B"); a minimal sketch of the DPO objective appears after this list.
  • The Results: The model fine-tuned with DPO produced "substantially more visually appealing" and functional user interfaces than the base model. Remarkably, this specialized training also led to generalized performance improvements across other coding benchmarks.
  • Actionable Insight: This result proves that Dojo's human-generated data is not just theoretical; it is a tangible asset capable of producing best-in-class, specialized AI models that can outperform larger, more generalized ones in specific domains. Darwin asserts this is likely the best model in the world at its size for this specific task.
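
As a point of reference for how preference rankings enter training, below is a minimal PyTorch sketch of the standard DPO objective (Rafailov et al., 2023), written from scratch rather than with any particular training library. The batching, beta value, and toy numbers are illustrative assumptions, and this is not a claim about the Dojo team's exact training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over sequence log-probabilities of preferred ("chosen") and
    dispreferred ("rejected") responses under the trained policy and a frozen
    reference model."""
    # Implicit reward: how much more the policy favors a response than the reference does.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected rewards, scaled by beta.
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -8.5]), torch.tensor([-14.0, -9.0]),
                torch.tensor([-12.5, -8.7]), torch.tensor([-13.5, -8.8]))
print(loss)  # scalar; lower means the policy already agrees with the human rankings
```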

Future Directions and Long-Term Vision

  • Ecosystem Collaboration: Dojo is partnering with Subnet 17 (3D Generation) and Subnet 44 (Video Analytics) to provide a human feedback and validation layer for their outputs.
  • New Frontier: Robotics: The team is working with a robotics startup, recognizing a huge opportunity for human operators to provide feedback for complex physical tasks that are difficult to simulate.
  • Long-Term Vision:
    • Near-Term: Evolve into a platform for human-agentic collaboration, where AI agents can tap the human labor pool to complete tasks.
    • Far-Term: Serve as a crucial alignment mechanism, allowing humanity to steer the values and principles of advanced AI systems.

Conclusion

Dojo's mechanism for generating high-quality, human-verified data creates a defensible moat and a unique asset for training specialized AI models. Investors and researchers should monitor the performance of models fine-tuned on this data, as it represents a new frontier for creating differentiated AI capabilities within the Bittensor ecosystem.
