The People's AI
May 23, 2025

How Decentralized Data Marketplaces Could Reshape AI Forever

This episode features Art Able, co-founder of Vana, who brings his background in public policy and the data industry to explore how decentralized data marketplaces are poised to revolutionize AI by empowering users. We dive deep into the transition from exploitative data practices to a future where individuals own and profit from their digital essence.

The Broken State of Data Today

  • "People's data is traded every day and for a lot of money and it's used to make the technology that we use every day and we're not anywhere in that equation right now."
  • "Most tech companies spend more on data than they do on ML engineers because it's just so expensive to source this kind of data."
  • Current data marketplaces are opaque and inefficient, operating like old-school consultancies where users are the product, not participants. Single data contracts can hit a billion dollars, yet the users who generated the data see pennies, if anything.
  • This high-cost, centralized model stifles innovation: scrappy developers can't afford the entry ticket to quality training data and often stop at free dataset repositories like Kaggle.
  • Art terms the current exploitation "data colonialism," where buyers leverage information asymmetry to extract maximum value from unaware sellers (users).

Vana’s Blueprint: User-Owned AI Through Decentralized Data

  • "With that data capital, we can not only realize financial upside, but we can actually shape what the future of technology looks like by being able to contribute our data to things that matter to us."
  • Vana champions user-owned data, enabling individuals to form "data DAOs" – think data labor unions – to collectively bargain and monetize their data.
  • The VRC20 token standard is key, representing fractional ownership in datasets. These tokens allow for novel value accrual, from direct revenue shares to buy-and-burn mechanisms, decided by each Data DAO.
  • An example is DLP Labs, whose vehicle telemetry data (think Tesla-like data) is sold via Vana to an EV battery company, with revenue flowing back to the data contributors.

The Evolving Marketplace: From Hybrid to Hyper-Liquid

  • "Enter what I call the data market maker... somebody who can earn on the spread between what somebody's willing to pay on the demand side of the data and how they position their token allocations within the data DAOs."
  • Currently, Vana Data acts as a Web2-friendly intermediary, translating traditional data procurement needs into Web3 token transactions.
  • The future vision involves "data market makers" – speculators who will buy up data tokens they deem undervalued, provide liquidity, and earn from the spread. This is expected to channel capital efficiently to valuable datasets, boosting token prices and user incentives.
  • This could lead to dynamic pricing, where rare and diverse data (e.g., a 75-year-old man's BMI for a health AI) becomes incredibly valuable, properly compensating vital, hard-to-source information.

Key Takeaways:

  • The conversation paints a future where your data isn't just digital exhaust but a valuable, tradable asset you control. This isn't just about fairer compensation; it's about democratizing AI development and giving individuals a stake in the technology shaping our world, potentially culminating in a "Universal Data Income."
  • Data is the New Asset Class: Vana is pioneering frameworks (like VRC20) to treat data as an ownable, tradable asset, potentially revolutionizing finance as much as property ownership once did.
  • Market Makers Will Ignite Liquidity: The emergence of "data market makers" is projected to significantly enhance capital flow and price discovery in decentralized data marketplaces.
  • From UBI to UDI: Instead of a Universal Basic Income, imagine a Universal Data Income where you’re paid for your unique data contributions that make AI more human and effective.

For further insights and detailed discussions, watch the full podcast: Link

This episode delves into the mechanics of decentralized data marketplaces, revealing how Vana aims to empower users by transforming their data into a tradable asset class, and what this means for the future of AI development and investment.

Vana's Vision: Reclaiming Data Value for Users

  • Art Able, co-founder of Vana, kicks off by highlighting a fundamental issue: individuals' data is constantly traded for significant profit and used to train AI, yet the data creators (users) are excluded from this economic loop. Vana's vision is to create a new economic system where users can realize the financial upside of their data and influence technology development.
  • Art explains, "It's our essence that's going into the AI that's coming to us today... And now I'm just an empty consumer of that. I'm just consuming the product and I am the product. Where am I in that equation?"
  • The core idea is that user-generated data is not "digital waste" but valuable capital. By pooling data, users gain leverage to profit and direct their data towards AI projects they value.
  • Strategic Implication: Investors should note the shift towards user-centric data economies, which could unlock new data sources and AI applications previously inaccessible.

The Current Landscape: Centralized and Inefficient Data Marketplaces

  • Art, drawing from his experience at Appen, a company specializing in providing human-generated data for AI, describes the current data marketplace. Large tech companies seeking specific data – for example, human-like conversation data for Large Language Models (LLMs), AI models trained on vast amounts of text to understand and generate human language – approach firms like Appen.
  • These firms source data, often without users' full awareness or fair compensation. The process is opaque, resembling a consulting model rather than a fluid market.
  • Art notes, "The structure of the industry is that this transaction goes on like a brokerage where the end users are not really part of the equation."
  • This system is also highly inefficient and expensive. Art mentions a "billion dollar contract" his former company had, highlighting that most tech companies spend more on data than on ML engineers.
  • Actionable Insight: The high cost and exclusivity of current data acquisition create a significant barrier for smaller AI developers, an inefficiency decentralized models aim to solve. Researchers should consider how decentralized access could democratize AI development.

Barriers for Independent AI Developers

  • The conversation underscores the difficulties independent AI developers face in accessing quality training data.
  • Art shares an anecdote: "I ask this question in a community that I'm involved in in Australia of what do you do when you can't find your data set on Kaggle? Kaggle's the free open source data set place. And they say we stop imagining."
  • Without "big tech money," developers are often stuck, as data brokers cater to large contracts, making small-scale data purchases unfeasible.
  • Strategic Implication: Decentralized marketplaces could unlock innovation by providing affordable, accessible data, creating opportunities for new AI startups and research projects that are currently non-starters due to data acquisition costs.

Vana's Approach: A Phased Transition to Decentralized Markets

  • Art outlines Vana's strategy for building new markets around data-backed assets, specifically VRC20 tokens – a new token standard designed by Vana to represent ownership and access rights to specific datasets.
  • Phase 1 (Current): Bridging Web2 and Web3. Vana's data arm, Vana Data, interfaces with traditional Web2 companies and researchers who want to buy data. This process is currently a "fairly manual brokerage," similar to existing data markets, but with a key difference: proceeds are directed back to the data owners (users).
  • Art emphasizes meeting clients where they are: "We have to really turn up to these conferences and look and smell exactly like what these people are used to."
  • Vana Data acts as an intermediary, converting Web2 cash into Web3 tokens to facilitate data transactions.
  • Actionable Insight: This hybrid model is a pragmatic approach to onboarding traditional players into Web3 data economies. Investors should watch how this model scales and attracts Web2 demand.

Early Success: The DLP Labs Example

  • Art provides a concrete example of Vana's model in action with DLP Labs.
  • DLP Labs collects vehicle telemetry data (like Tesla data on driving, battery usage, etc.).
  • This data is sold via a subscription model to an electric vehicle battery company for analytics to optimize EV building and charging station deployment.
  • The company buys data from the creator of the Data DAO (Decentralized Autonomous Organization focused on governing a specific dataset), and the DAO rewards data contributors.
  • Art contrasts this: "The alternative would have been these battery companies ring up Tesla or ring up Appen... Now we have this really cool world in which, hey, hang on, the user is no longer just the product. The user is actually involved in that economic transaction."
  • Another example involves a company seeking monthly pricing data from Facebook Marketplace, which will be accessed via a token burn mechanism once the data DAO is created.
  • Strategic Implication: These early use cases demonstrate the viability of user-owned data marketplaces for specialized, high-value datasets, offering a new avenue for AI companies to source ethically and transparently.

The VRC20 Token: Powering Data Economies

  • The discussion delves into the VRC20 token and its role in compensating data contributors.
  • The VRC20 standard is flexible, allowing Data DAO creators to experiment with value accrual models.
  • Pass-through revenue: Token holders receive a share of the revenue generated from data sales directly in their wallets.
  • Buy-and-burn mechanism: Data buyers purchase and burn a certain amount of the DAO's VRC20 tokens, increasing the value of remaining tokens for holders due to constrained supply.
  • Art explains the rationale for flexibility: "The reason why we left it undefined in VRC20 is because there's going to be a lot of innovation for actually how you pass the rewards through to the end users."
  • The Reddit Data DAO Example: A Google ML engineer wanted access to the Reddit dataset but lacked funds. The DAO allowed him to contribute his skills to build a model for them in exchange for tokens, making him part of the community. This illustrates how Data DAOs can opt for non-monetary value exchange, fostering collaborative development.
  • Actionable Insight: The VRC20 token and its flexible mechanics are crucial for Crypto AI investors to understand. The success of these models will depend on their ability to align incentives between data contributors, DAO governors, and data buyers.
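
The two value-accrual options above can be sketched as a toy model. This is purely illustrative: the class and method names are assumptions, not the actual VRC20 interface, which the episode leaves deliberately undefined.

```python
from dataclasses import dataclass, field

@dataclass
class DataToken:
    """Toy model of a dataset-backed token (illustrative names,
    not the real VRC20 standard)."""
    supply: float
    balances: dict = field(default_factory=dict)

    def pass_through(self, revenue: float) -> dict:
        # Pass-through model: distribute data-sale revenue
        # pro rata to current token holders.
        return {holder: revenue * bal / self.supply
                for holder, bal in self.balances.items()}

    def buy_and_burn(self, tokens_bought: float) -> None:
        # Buy-and-burn model: a data buyer purchases tokens and
        # destroys them, shrinking supply so remaining holders
        # own a larger share of the dataset.
        self.supply -= tokens_bought

token = DataToken(supply=1_000.0, balances={"alice": 600.0, "bob": 400.0})
payouts = token.pass_through(revenue=50.0)  # alice gets 30.0, bob gets 20.0
token.buy_and_burn(200.0)                   # supply falls to 800.0
```

Either mechanism (or a hybrid) could be chosen per Data DAO, which is exactly the flexibility Art says VRC20 leaves open.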

Why a New Token Standard? The VRC20 Genesis

  • Art explains that the VRC20 token emerged from observing the unique dynamics of data-related tokens.
  • A data token isn't a memecoin (it has underlying asset value), nor a conventional project token (it represents a specific dataset rather than a protocol). It's akin to an RWA (Real-World Asset) token – a token representing ownership of tangible or intangible real-world assets – but for a dynamic, growing data asset.
  • The VRC20 standard includes protections against pump-and-dump schemes, crucial because mishandling data tokens impacts not just financial value but also user data itself.
  • Art states, "We actually see VRC20 as this kind of in between between web 2 and web 3 where if you wanted to invest in Facebook without buying Facebook equity stock, you could invest in the thing that has a lot of value for Facebook which is the Facebook data and that can be decentrally owned."
  • Strategic Implication: The VRC20 standard aims to formalize data as a tradable asset class. Researchers and investors should monitor its adoption and potential influence on regulatory discussions around data ownership and valuation.

Data as a New Asset Class: Challenging Big Tech

  • The conversation explores the disruptive potential of treating data as a standalone, tradable asset class, independent of the platforms that currently control it.
  • Art argues that Big Tech valuations are largely based on the data they hold and the future potential they can unlock with it.
  • Decoupling data value from company equity could shift capital flows directly to data assets.
  • He posits, "Data is capital. In a future world, one might think about VCs might start to offer data as the thing that they invest in companies in because that's the thing that steps a company up from series A to series B."
  • Art draws an analogy to how property became individually ownable post-feudalism, suggesting data is at a similar transformative juncture.
  • Actionable Insight: If data becomes a widely recognized and liquid asset class, it could fundamentally alter investment strategies in the tech sector, creating new financial products and opportunities centered around data itself.

Act Three: The Future of Decentralized Data Marketplaces

  • Art envisions a frictionless, more liquid future for data marketplaces, introducing the concept of the "data market maker."
  • Data Market Makers: These entities would earn on the spread between what data buyers are willing to pay and the cost of acquiring tokenized data access from Data DAOs. They would take early positions in promising data tokens, anticipating demand.
  • Art explains, "Vana Data is the first of those businesses. So, it will take early positions in some tokens that it thinks it can commercialize because it knows the demand side very well."
  • This model aims to move capital efficiently, particularly towards data sourcing, which Art identifies as the hardest part of the problem, combating what he terms "data colonialism" – where buyers exploit sellers' lack of awareness of their data's true value.
  • Strategic Implication: The emergence of data market makers could significantly increase liquidity and price discovery for data assets. Investors might find opportunities in backing or becoming such market makers.
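
The market-maker economics described above reduce to a simple spread calculation. A hypothetical sketch follows; the 10% risk margin, function names, and prices are all assumptions for illustration, not anything Vana has specified.

```python
def should_take_position(dao_token_price: float,
                         expected_demand_price: float,
                         risk_margin: float = 0.10) -> bool:
    """Take an early position in a data token only if the price a
    known demand-side buyer would pay clears the acquisition cost
    plus a margin (margin is an illustrative assumption)."""
    return expected_demand_price >= dao_token_price * (1.0 + risk_margin)

def spread_pnl(units: float, buy_price: float, sell_price: float) -> float:
    # Profit from buying tokenized data access cheaply from a
    # Data DAO and reselling access to a demand-side buyer.
    return units * (sell_price - buy_price)

take = should_take_position(0.40, 0.55)  # 37.5% upside clears the 10% bar
pnl = spread_pnl(1_000, 0.40, 0.55)      # $0.15/unit spread on 1,000 units
```

Knowing the demand side well, as Vana Data does, is what makes `expected_demand_price` estimable in the first place.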

The Enduring Role of Data DAOs

  • Jeff Wilser poses a provocative question: if markets become perfectly liquid, are Data DAOs still necessary?
  • Art believes DAOs (or collectives) remain crucial, especially for reaching the "step function" where a dataset achieves a minimum viable size to be useful. Bonding curves – mathematical curves that define a relationship between an asset's price and its circulating supply – can help incentivize initial data contributions to reach this threshold.
  • Data sourcing requires a different skillset (human empathy, UX, methodology) than financialization. Vana's architecture aims to integrate these different skill sets.
  • Actionable Insight: While liquidity is key, the organizational and governance aspects provided by Data DAOs will likely remain vital for curating, verifying, and ethically managing datasets, especially in the early stages of a dataset's lifecycle.
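
A minimal illustration of how a bonding curve can bootstrap a dataset toward its minimum viable size: early contributors mint tokens cheaply, so the same contribution earns more tokens before the "step function" is reached. The linear curve, base price, and slope here are made-up parameters, not Vana's design.

```python
def price(supply: float, base: float = 0.10, slope: float = 0.001) -> float:
    # Linear bonding curve: token price rises with circulating supply,
    # so earlier contributors mint at a discount.
    return base + slope * supply

# Tokens earned for a contribution valued at $1 at the current curve price:
early_tokens = 1.0 / price(0)        # at launch: 10 tokens
late_tokens = 1.0 / price(10_000)    # once supply is large: ~0.099 tokens
```

The steepness of the curve is a design lever: a steeper slope front-loads the incentive to contribute before the dataset is useful to any buyer.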

The Value of Unique Data: The BMI Example

  • Art shares a compelling story from his time at Appen about sourcing data for an AI model to predict BMI from a photo.
  • It was easy to get data from common demographics but incredibly hard to source data from underrepresented groups (e.g., a 75-year-old man willing to participate).
  • This highlights that in data, diversity and uniqueness are exceptionally valuable: "Data is the one place in which diversity matters. If you're unique and not part of the standard deviation, your data is incredibly valuable."
  • He hopes future on-chain systems will dynamically price data based on its marginal value to a dataset.
  • Strategic Implication: Crypto AI investors and researchers should recognize that niche, diverse, or hard-to-reach datasets hold premium value. Systems that can effectively source and price such data will have a competitive edge.
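
One way such dynamic, rarity-sensitive pricing could work is to pay more for data points from demographic buckets the dataset currently lacks. This is a toy sketch under that assumption, not Vana's actual pricing mechanism; the bucket labels and multiplier are invented for illustration.

```python
from collections import Counter

def rarity_price(dataset_buckets: list, new_bucket: str,
                 base_price: float = 1.0) -> float:
    """Price a new data point inversely to how common its
    demographic bucket already is in the dataset (illustrative)."""
    counts = Counter(dataset_buckets)
    total = len(dataset_buckets)
    share = counts[new_bucket] / total if total else 0.0
    # A bucket holding half the data earns base price;
    # an unseen bucket earns the maximum (2x) multiplier.
    return base_price * (1.0 - share) * 2.0

data = ["25-34"] * 80 + ["35-44"] * 19 + ["75+"] * 1
common = rarity_price(data, "25-34")  # ~0.4: already 80% of the set
rare = rarity_price(data, "75+")      # ~1.98: nearly absent, priced high
```

Under this scheme the hard-to-source 75-year-old contributor from Art's BMI story would automatically command a premium, because the marginal value of his data point to the dataset is highest.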

Art Able's Background: A Public Policy Lens on Data

  • Art's background in public policy, particularly in monitoring and evaluation for governments and corporations (e.g., for the government of Timor-Leste), informs Vana's human-centric approach.
  • His work involved understanding how interventions impact human lives, a fundamentally data-driven question.
  • An experience in Colombia, struggling to gather farm yield data traditionally while observing rich data on his host mom's Facebook, sparked his realization about the untapped value in everyday digital interactions.
  • Art sees Vana "playing in this realm of in between humans and machines," applying a policy lens to understand how technology affects people.
  • Speaker Analysis: Art's perspective, shaped by his public policy and traditional data industry experience, brings a unique blend of human-centric ethics and market pragmatism to the Web3 data space.

Prediction: Universal Data Income (UDI)

  • Art concludes with a powerful prediction, contrasting with the common notion of Universal Basic Income (UBI) – a theoretical regular, unconditional payment given to all individuals by the government.
  • Instead of UBI, which assumes humans add no value in an AI-driven world, Art proposes Universal Data Income (UDI).
  • "The economies will be built not on what we do but on who we are, and that is why data is so important... We're teaching technology to be better for us. So, what's the upside to that?"
  • UDI implies users are compensated for their data contributions, which continuously train and improve AI systems.
  • Actionable Insight: The concept of UDI reframes the human role in the AI economy from passive recipients to active value creators. This vision has profound implications for future economic models and the societal value placed on personal data.

Conclusion

This episode highlights the transformative potential of decentralized data marketplaces to shift data ownership and economic benefits to users. For Crypto AI investors and researchers, the key takeaway is the emergence of data as a tradable, valuable asset class, requiring new infrastructure, financial models, and ethical considerations.
