The People's AI
November 20, 2025

AI Is Eating Your Data: NYT vs OpenAI, Smart Home Surveillance, and Digital Feudalism

Hosted by journalist Jeff Wilser, this podcast investigates how our personal data—from intimate moments captured by a robot vacuum to decades of journalism—is being harvested to train AI. The episode breaks down the consequences of this data grab for our private lives, our jobs, and the future power structures of society.

The Smart Home Surveillance State

  • “Whenever you have a device that claims to be smart, you have behind it this massive apparatus of data and artificial intelligence.”
  • “We just click through the terms and conditions and click 'I agree' so we can get to the product. But in every single term and condition... there’s almost always a line that says that we consent to our data being shared for product improvement, and product improvement now always includes testing and development for AI.”
  • The episode opens with a chilling real-world example: an iRobot vacuum in a beta test took compromising photos of users, including a woman on a toilet. These images were shared in a private Facebook group by low-paid Venezuelan gig workers tasked with data labeling for Scale AI, a contractor for iRobot.
  • While the users had agreed to join a beta test, they were unaware that humans would review these intimate images. The scenario highlights a broader truth buried in endless "Terms and Conditions": users routinely consent to their data being used for "product improvement," a phrase that now implicitly covers training AI models.

The Great AI Heist

  • “OpenAI has built an empire worth what, $250 billion... But the raw materials for its product, it steals for free.”
  • “If what OpenAI and Microsoft have done is held to be fair use, that writing will be fair game. It won't be the property of the people who wrote the material... What you write isn't going to be yours anymore.”
  • The New York Times is suing OpenAI and Microsoft, alleging the companies committed mass copyright infringement by using its articles to train their large language models without permission or compensation.
  • The lawsuit argues that AI firms systematically remove copyright information, like author bylines, making it impossible to trace or compensate original creators. If this practice is deemed "fair use," it sets a precedent that could devalue all creative and knowledge work, from journalism to photography and even internal company reports.

Rise of the Digital Feudal Lords

  • “My personhood is connected to my data... Control of your data is a fundamental human right.”
  • “If we lose control of our data, we lose control of our agency. Without that, I am now a serf... you are Facebook's serf... to the baron that is the feudal lord running everything.”
  • The ultimate battle is over data sovereignty. The podcast argues that in the digital age, control over one's data is a fundamental human right tied directly to personal agency.
  • Concentrating data in the hands of a few tech giants creates a new "digital feudalism," where corporations are the lords and individuals are the serfs, their data harvested as a resource. This concentration of power poses a societal risk, as AI could eventually usurp human decision-making authority on a mass scale.

Key Takeaways:

  • The podcast argues that without intervention, we are on a one-way street toward a future where our data, creativity, and agency are owned by a handful of powerful corporations. The central fight is to reclaim individual sovereignty in the age of AI.
  • Your Data is the New Oil, and You're Giving It Away. Every smart device, social media post, and email you create is a valuable asset used to build multi-billion dollar AI empires, yet you receive no compensation.
  • The Creator Economy is Facing an Existential Threat. The outcome of lawsuits like NYT vs. OpenAI will determine whether creative work remains intellectual property or becomes free raw material for AI, potentially decimating entire professions.
  • Reclaim Your Digital Sovereignty. Losing control of your data isn't just a privacy issue; it's a slide into "digital feudalism." The podcast champions decentralized technologies as a tool to break these data monopolies and reassert individual ownership.

For further insights and detailed discussions, watch the full podcast: Link

This episode reveals how the AI industry's insatiable hunger for data creates a battleground over privacy, ownership, and power, from smart home devices to the creative economy.

The Hidden Data Trail of Smart Home Devices

  • The episode opens with a startling investigation by Eileen Guo, a senior reporter at MIT Technology Review, who uncovered how a robot vacuum captured and leaked compromising images of people in their homes, including a woman in a bathroom. The images came not from a consumer device but from a special development model made by iRobot, the largest robot vacuum manufacturer, and were part of a dataset used to train the vacuum's AI to recognize household objects.
    • Data Labeling Explained: The process of training AI models requires humans to manually identify and tag objects in images, a task known as data labeling or data annotation. This work is often outsourced to gig workers on a contract basis.
    • The Supply Chain Breakdown: In this case, iRobot contracted with Scale AI, a major data annotation firm, which in turn used gig workers in Venezuela. These workers, confused about how to label certain images, shared screenshots in private Facebook groups for assistance, which is how the images were leaked.
    • The Illusion of Consent: While the images came from beta testers who agreed to data collection for "product improvement," they were unaware that humans would view the images, let alone that they would be shared on social media. Eileen Guo notes, "That was absolutely not within their realm of understanding. That was not what they had consented to." This highlights a critical gap between boilerplate terms of service and genuine user understanding.
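The labeling workflow described above essentially turns each human judgment into a structured record that is later fed to model training. A minimal illustrative sketch follows; the field names and the `make_annotation` helper are hypothetical, loosely modeled on common object-detection annotation formats, and are not iRobot's or Scale AI's actual schema:

```python
# Hypothetical sketch of a single data-labeling (annotation) record.
# Field names are illustrative, not any vendor's real schema.
import json


def make_annotation(image_id, labeler_id, label, bbox):
    """Package one human-applied label for one region of one image."""
    x, y, w, h = bbox
    return {
        "image_id": image_id,      # which training image was shown
        "labeler_id": labeler_id,  # which gig worker tagged it
        "label": label,            # the object class the worker chose
        "bbox": [x, y, w, h],      # pixel rectangle around the object
    }


# A labeled dataset is simply many such records, one per tagged object.
annotations = [
    make_annotation("img_0001", "worker_42", "chair", (10, 20, 100, 150)),
    make_annotation("img_0001", "worker_42", "lamp", (200, 30, 40, 90)),
]
print(json.dumps(annotations, indent=2))
```

Note that the record ties the worker's identity and the raw image together, which is why ambiguous images end up in front of human eyes at all: someone has to decide what `label` to write.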

Strategic Implication: For investors, this incident exposes the fragile and often unethical data supply chains behind major AI companies. Projects that can verify ethical data sourcing or provide privacy-preserving training methods present a significant opportunity. Researchers should scrutinize the provenance of datasets, as reputational and legal risks are high.

The Battle for Data: New York Times vs. OpenAI

  • The conversation shifts from the home to the workplace, focusing on the landmark lawsuit filed by The New York Times against OpenAI and Microsoft. The case centers on the claim that these AI companies unlawfully used millions of copyrighted articles to train their Large Language Models (LLMs) like ChatGPT without permission or compensation.
    • Core Allegation: Jennifer Maisel, a partner at the law firm representing the Times, explains that the lawsuit alleges the "theft of their intellectual property." The core of the case is that OpenAI used copyrighted content to build a commercial product that now directly competes with the original creators.
    • The Economic Argument: Steven Lieberman, lead counsel for the firm, frames the issue starkly: "The raw materials for its product, it steals for free." He argues that while AI companies pay handsomely for talent and computing power, they have built their empires by taking the most crucial ingredient—high-quality data—without payment.
    • Legal Status and Broader Impact: The case is currently in the discovery phase, with key motions due in mid-August. The lawsuit includes claims of copyright infringement and violations of the Digital Millennium Copyright Act (DMCA) for intentionally removing author bylines and copyright notices. The outcome will have profound implications for all creators, as a ruling in favor of "fair use" could devalue all original content, from articles and books to images and music.

Strategic Implication: This lawsuit is a critical event for the Crypto AI space. Its outcome will directly influence the future of data licensing and monetization. Investors should monitor the case closely, as it could validate business models built on decentralized content registries, tokenized intellectual property, and transparent revenue sharing for data contributors.

Digital Feudalism and the Fight for Data Sovereignty

  • The final section broadens the scope to the societal level, exploring the philosophical and political stakes of centralized data control. Michael Casey, Chairman of the Decentralized AI Society, argues that control over personal data is a fundamental human right in the digital age, directly linked to individual agency and freedom.
    • The Rise of Digital Serfdom: Casey warns that when large corporations control our data, we become “digital serfs” in a system of “digital feudalism.” In this model, individuals produce the valuable raw material (data) but have no ownership or control over how it is used or the value it creates.
    • AI as a Concentration of Power: The concentration of data in the hands of a few Big Tech companies creates an unprecedented concentration of power. Casey highlights the risk of this power being used for totalitarian ends, where AI systems make decisions about resource allocation—including human resources—without oversight. As he puts it, "The most important thing that AI is going to take away from... humans is decision-making authority."
    • The Decentralized Alternative: The conversation frames decentralized, open-source technologies as a crucial counterforce. By reasserting individual ownership over data, these systems can break down monopolies and create a more equitable digital economy where users are participants, not products.

Strategic Implication: Casey's analysis provides the core investment thesis for many Crypto AI projects. The fight against "digital feudalism" is the primary market driver for solutions focused on data sovereignty, decentralized AI, and user-owned data economies. Investors and researchers should prioritize projects that offer credible, scalable frameworks for returning data control to individuals.

Conclusion

  • This episode frames the AI revolution as a battle over its most critical resource: data. The discussion highlights how centralized data harvesting threatens individual privacy, devalues creative work, and risks creating a dystopian "digital feudalism." For Crypto AI investors, the key takeaway is that the most valuable future innovations will be those that solve this fundamental conflict by building decentralized, user-owned data ecosystems.
