a16z
June 16, 2025

AI is Revolutionizing Web Security - Bots, Agents, & Real-Time Defense

This podcast dives into the rapidly evolving landscape of web security, where AI-driven bots and agents are reshaping traffic patterns and challenging traditional defense mechanisms. Experts discuss how to navigate this new era of automated interactions effectively.

The Shifting Bot Battleground

  • "50% of traffic is already bots. It's already automated and agents are only really just getting going."
  • The internet's underbelly is buzzing, and it's not just nefarious hackers. Half of all web traffic is already automated. While old-school Denial of Service (DoS) attacks are now largely commoditized, handled by cloud providers like a utility, the real chess match has moved up the stack. The challenge isn't just fending off brute force; it's discerning the intent behind increasingly sophisticated automated traffic, with AI agents promising to amplify this trend exponentially.
  • Automated bots generate a staggering 50% of current internet traffic.
  • The rise of AI agents, still in their early days, signals an impending explosion in automated web interactions.
  • Traditional volumetric DoS attacks are no longer the primary concern for most application owners, thanks to provider-level mitigations.

AI Agents: Friend, Foe, or Future Customer?

  • "Just blocking them just because they're AI is the wrong answer. You've really got to understand why you want them, what they're doing, who they're coming from."
  • AI isn't just a new attack vector; it's a new user category. These AI agents, sometimes acting directly on behalf of humans for tasks like research or booking, mean a simple "block all bots" strategy is a surefire way to shoot your revenue in the foot. OpenAI, for instance, has multiple "personalities"—some train models, others power search, and some execute user commands. Businesses are already seeing higher conversions from this AI traffic, making a nuanced approach essential.
  • Blanket-blocking AI traffic is counterproductive, as many AI-driven interactions can lead to increased revenue and user engagement.
  • Distinguishing intent is key: an AI scraping data maliciously versus an AI helping a user make a purchase requires different responses.
  • Application context is crucial. For an e-commerce site, wrongly blocking an AI-assisted purchase means lost revenue.

Smarter Defenses for a Smarter Threat

  • "Delaying an HTTP request for 5 seconds, that's not going to work. And so I think the trend that we're seeing with the improvement cost, the inference cost, but also the latency... that's going to be the key."
  • Forget the sledgehammer; modern web security needs a scalpel. Defenses are becoming layered, starting with robots.txt for well-behaved bots, moving to IP reputation, user-agent analysis, and cryptographic verification for declared identities (like Googlebot or OpenAI bots). Advanced techniques involve fingerprinting TLS and HTTP characteristics to spot anomalies. The holy grail? Real-time, on-device AI that analyzes the full context of a request in milliseconds.
  • Defense is multi-layered: robots.txt, IP/User-Agent analysis, reverse DNS lookups, and client fingerprinting (e.g., JA3/JA4 hashes); a sketch of verifying a declared crawler via reverse DNS follows this list.
  • The latency of current large language models (LLMs) is a barrier for real-time request blocking, which needs millisecond-level decisions.
  • Future solutions involve lightweight, fast-inference edge AI models embedded within applications for real-time, context-aware security decisions.
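
As a concrete illustration of the identity-verification layer, here is a minimal sketch (assuming Node.js; illustrative, not production code) of checking a crawler that declares itself in its User-Agent, such as Googlebot: reverse-resolve the source IP to a hostname, check that the hostname belongs to the expected domain, then confirm the hostname resolves back to the same IP.

```typescript
import { reverse, lookup } from "node:dns/promises";

// Verify a crawler that declares itself in the User-Agent (e.g. Googlebot) by
// checking that its IP reverse-resolves to an expected domain, and that the
// hostname resolves back to the same IP (forward-confirmed reverse DNS).
async function verifyDeclaredCrawler(
  ip: string,
  allowedSuffixes: string[] // e.g. ["googlebot.com", "google.com"]
): Promise<boolean> {
  try {
    const hostnames = await reverse(ip);
    for (const host of hostnames) {
      if (!allowedSuffixes.some((s) => host === s || host.endsWith(`.${s}`))) continue;
      const { address } = await lookup(host);
      if (address === ip) return true; // hostname resolves back to the claimed IP
    }
  } catch {
    // NXDOMAIN or lookup failure: treat as unverified, not necessarily malicious
  }
  return false;
}
```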

Key Takeaways:

  • The era of simple bot-blocking is over. AI agents are becoming significant web citizens, demanding sophisticated, context-aware security.
  • Embrace Nuance: AI traffic isn't monolithic. Develop granular controls to allow beneficial AI while blocking malicious actors, understanding that AI can be a customer.
  • Layer Your Defenses: Combine traditional methods with modern fingerprinting and identity verification, preparing for a future where AI analyzes traffic in real time.
  • Context is King: Security decisions must be deeply integrated with application logic to avoid harming user experience or revenue.

For further insights, watch the full podcast episode.

This episode unpacks the evolving landscape of web traffic, where AI-driven bots and agents are rapidly becoming dominant, forcing a paradigm shift in how we approach web security and user interaction.

The New Reality of Bot Traffic

  • The speaker highlights a critical statistic: "50% of traffic is already bots... and agents are only really just getting going." This sets the stage for an impending explosion in automated traffic.
  • While current AI agents are often slow or in preview, their proliferation is inevitable, demanding a move beyond simply blocking AI traffic.
  • Strategic Implication: Investors and researchers must recognize that AI agents are not just a niche but the future of web interaction, requiring infrastructure and strategies that can differentiate and manage this new wave of traffic.

From DDoS to Discerning Good Bots from Bad

  • The speaker, drawing on historical context, notes that traditional DDoS (Distributed Denial of Service) attacks—overwhelming a server with traffic from multiple compromised computer systems—are now largely commoditized and handled by network or cloud providers.
  • The contemporary challenge lies in distinguishing between beneficial bots (e.g., search engine crawlers), malicious bots, and AI agents acting on behalf of humans. This is no longer a binary decision.
  • "The challenge is really about how do you distinguish between the good bots and the bad bots? And then with AI changing things, it's bots that might even be acting on behalf of humans," the speaker explains, emphasizing the increased complexity.
  • Actionable Insight: Businesses must shift from blunt blocking mechanisms to sophisticated systems that understand the intent and origin of bot traffic to avoid alienating legitimate AI-driven interactions.

The Perils of Imprecise Blocking and the Need for Application Context

  • Legacy bot detection methods, often relying on simple IP address or user-agent string blacklisting, are described as imprecise, akin to "using a hammer."
  • These outdated approaches risk blocking legitimate traffic, including AI bots acting for users looking to make purchases, leading to lost revenue.
  • The speaker stresses the importance of application context: "You need to know where in the application the traffic is coming to. You need to know who the user is, the session and to understand in which case you want to allow or deny that."
  • For instance, in e-commerce, blocking a potentially fraudulent but ultimately legitimate transaction is worse than flagging it for human review.
  • Strategic Consideration: Crypto AI platforms, especially those with transactional components, must integrate application-level awareness into their security, as network-level blocking alone is insufficient and potentially harmful (a sketch of such a context-aware check follows this list).
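
To make the application-context point concrete, the sketch below shows a hypothetical decision function that weighs who the client is, which part of the application it is hitting, and what action it is attempting. The categories, routes, and actions are illustrative assumptions, not anything prescribed in the episode.

```typescript
type BotCategory = "verified-search" | "ai-agent" | "unverified-bot" | "human";
type Decision = "allow" | "challenge" | "flag-for-review" | "deny";

interface RequestContext {
  category: BotCategory;
  path: string;               // which part of the application is being hit
  authenticatedUser?: string; // known user/session, if any
  action: "read" | "add-to-cart" | "checkout";
}

// Hypothetical policy: the same client can be fine on one route and not another.
function decide(ctx: RequestContext): Decision {
  if (ctx.category === "verified-search") {
    // Crawlers may read public pages but have no business checking out.
    return ctx.action === "read" ? "allow" : "deny";
  }
  if (ctx.category === "ai-agent") {
    // An agent shopping on behalf of a signed-in user is potential revenue:
    // let it browse and buy, but route risky steps to review instead of blocking.
    if (ctx.action === "checkout") {
      return ctx.authenticatedUser ? "flag-for-review" : "challenge";
    }
    return "allow";
  }
  if (ctx.category === "unverified-bot") {
    return ctx.path.startsWith("/account") ? "deny" : "challenge";
  }
  return "allow"; // human traffic
}
```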

Critiquing Legacy Solutions and Embracing Nuance for AI Traffic

  • Many existing security solutions, even those marketed with "AI names," still rely on old-school network telemetry, analyzing traffic before it reaches the application. This lack of application context is a significant drawback.
  • The speaker points out that entities like OpenAI deploy multiple types of bots, some for training models, others for user-initiated searches, or real-time actions. A blanket "block AI" approach is "too blunt of an instrument."
  • Blocking all AI traffic can lead to businesses "duping" themselves out of new revenue streams or getting down-ranked by AI crawlers, similar to blocking Google.
  • Actionable Insight: Researchers should investigate and develop security models that can dynamically assess the utility of different AI bot types based on their declared purpose and behavior within the application context.

The Role and Limitations of `robots.txt`

  • The `robots.txt` file is a long-standing, voluntary standard allowing website owners to instruct web crawlers which parts of their site should not be accessed (an illustrative example follows this list).
  • While useful for guiding "good bots" like Googlebot, its voluntary nature means it has no enforcement mechanism.
  • Newer or malicious bots may ignore `robots.txt` or even use it to identify sensitive areas to target.
  • The speaker notes, "The challenge with that is it's voluntary and there's no enforcement of it."
  • Strategic Implication: While `robots.txt` remains a foundational tool, it's insufficient for robust bot management. Crypto AI projects need to layer more sophisticated, enforceable controls.
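
For reference, an illustrative `robots.txt` might look like the following, wrapped here as a TypeScript constant for consistency with the other sketches. The user-agent tokens are the ones OpenAI documents for its crawlers at the time of writing; check current documentation before relying on them, and remember that compliance is entirely voluntary.

```typescript
// Illustrative robots.txt content; directives are requests, not enforcement.
const robotsTxt = `
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Allow the search-index crawler (drives visibility when users ask ChatGPT questions)
User-agent: OAI-SearchBot
Allow: /

# User-initiated fetches ("summarize this URL for me")
User-agent: ChatGPT-User
Allow: /

# Everyone else: stay out of account and checkout pages
User-agent: *
Disallow: /account/
Disallow: /checkout/
`;

export default robotsTxt; // e.g. served verbatim at GET /robots.txt
```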

Understanding and Managing Diverse AI Agents

  • The speaker details various OpenAI crawlers as examples:
    • One crawls sites to train OpenAI models (the most common target for blocking).
    • Another acts like Googlebot, building a search index when users ask questions in ChatGPT. This is generally beneficial for site visibility and traffic.
    • A real-time agent fetches and summarizes specific URLs or answers questions from documents on behalf of a user.
    • The "computer operator" agent uses headless web browsers (web browsers without a graphical user interface, often used for automation) or full browsers in a VM to take actions like booking tickets.
  • The challenge is nuanced: allowing an agent to research for a user is good, but allowing it to scalp concert tickets is bad. Control needs to be granular, perhaps allowing a bot to queue but requiring human intervention for purchase.
  • Actionable Insight: Developers must design systems that can identify and apply different policies to various AI agent types, considering the specific actions they are attempting within the application; a per-agent policy sketch follows below.
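
A hypothetical per-agent policy table, sketched below, shows what that granularity could look like in practice. The policy fields, defaults, and the example check are assumptions for illustration only; the user-agent tokens mirror OpenAI's documented crawlers.

```typescript
type AgentPolicy = {
  crawl: boolean;             // may index / read public pages
  actOnBehalfOfUser: boolean; // may fetch or summarize for a signed-in user
  purchase: "allow" | "require-human" | "deny";
};

// Hypothetical policy table keyed by declared User-Agent token.
// Verification (reverse DNS, signatures) should happen before this lookup;
// an unverified client claiming one of these names gets the default policy.
const agentPolicies: Record<string, AgentPolicy> = {
  "GPTBot":        { crawl: false, actOnBehalfOfUser: false, purchase: "deny" },          // training crawler
  "OAI-SearchBot": { crawl: true,  actOnBehalfOfUser: false, purchase: "deny" },          // search indexing
  "ChatGPT-User":  { crawl: false, actOnBehalfOfUser: true,  purchase: "require-human" }, // user-initiated agent
};

const defaultPolicy: AgentPolicy = { crawl: false, actOnBehalfOfUser: false, purchase: "deny" };

function policyFor(userAgentToken: string): AgentPolicy {
  return agentPolicies[userAgentToken] ?? defaultPolicy;
}

// Example: an agent may join a ticket queue for a user, but the actual purchase
// step still requires a human in the loop.
const p = policyFor("ChatGPT-User");
console.log(p.actOnBehalfOfUser && p.purchase === "require-human"); // true
```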

Layered Defense: A Multi-Faceted Approach to Bot Management

  • The speaker advocates for building layers of protection:
    • `robots.txt`: Manages well-behaved bots.
    • IP Reputation: Analyzing traffic origin (e.g., data centers vs. residential IPs). However, abusers use proxies (intermediary servers) on residential IPs, complicating this.
    • User-Agent String: A field where bots can identify themselves. Many legitimate bots do, and their identity can be verified via reverse DNS lookup (querying the DNS for hostnames associated with an IP address).
    • Fingerprinting: Creating a unique identifier for a client based on its characteristics.
      • JA3/JA4 hashes: Algorithms (some open source) that hash TLS handshake parameters to identify clients.
      • JA4H: Looks at HTTP headers. Modern hashing resists simple evasion tactics like reordering headers.
  • "So you take all of the metrics around a session and you create a hash of it and then you stick it in a database... and you look for matches to that hash," the host summarizes, capturing the essence of fingerprinting.
  • Strategic Consideration: Crypto AI security architectures should implement a defense-in-depth strategy, combining these techniques to build a comprehensive profile of incoming traffic before it interacts with sensitive application logic.
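
The sketch below captures the fingerprinting idea in its simplest form: canonicalize a set of client signals, hash them, and count recurrences. It is deliberately not the JA3/JA4 specification, which defines exactly which TLS and HTTP fields are used and how they are normalized; the field names here are illustrative.

```typescript
import { createHash } from "node:crypto";

// Minimal illustration of the fingerprinting idea: collapse a set of client
// characteristics into one hash and count how often it recurs.
interface ClientSignals {
  tlsVersion: string;
  cipherSuites: string[];    // as offered in the TLS ClientHello
  httpHeaderNames: string[]; // header names, made order-insensitive below
  httpVersion: string;
}

function fingerprint(s: ClientSignals): string {
  const canonical = [
    s.tlsVersion,
    s.cipherSuites.join(","),
    [...s.httpHeaderNames].sort().join(","), // normalize ordering so reshuffling headers doesn't evade it
    s.httpVersion,
  ].join("|");
  return createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}

// "Stick it in a database and look for matches": here, an in-memory counter.
const seen = new Map<string, number>();
function recordAndCount(s: ClientSignals): number {
  const fp = fingerprint(s);
  const count = (seen.get(fp) ?? 0) + 1;
  seen.set(fp, count);
  return count; // a sudden spike from one fingerprint is a signal worth acting on
}
```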

Emerging Identity and Signature Solutions

  • The speaker discusses new developments in providing verifiable signatures for requests:
    • Apple's Private Access Tokens (its implementation of the Privacy Pass standard): A cryptographic token attached to requests from Apple ecosystem users, leveraging an iCloud subscription as a proxy for human verification.
    • Cloudflare's similar initiative for automated requests, using public key cryptography for verification.
  • These aim to help distinguish legitimate automated clients from malicious ones; a simplified signature-verification sketch follows this list.
  • Actionable Insight: Researchers should monitor the development and adoption of these cryptographic attestation methods, as they could become crucial for establishing trust in an increasingly automated web.
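
To illustrate the underlying public-key idea only (the real Privacy Pass and Cloudflare proposals define their own token formats, headers, and key-discovery mechanisms, none of which are reproduced here), a signing-and-verification round trip might look like this in Node.js:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Keys would normally be long-lived, with the public key published by the
// agent operator; generated inline here just to keep the sketch self-contained.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Client side (e.g. an agent operator): sign the request metadata it wants to attest.
const payload = Buffer.from(JSON.stringify({ method: "GET", path: "/products", ts: Date.now() }));
const signature = sign(null, payload, privateKey); // Ed25519: algorithm is implied by the key

// Server side: verify against the operator's published public key before
// trusting the declared identity.
const ok = verify(null, payload, publicKey, signature);
console.log(ok ? "verified automated client" : "unverified");
```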

The Agent-Driven Future and Evolving Bot Behavior

  • The host observes a personal trend: "I interact with the internet less and less directly... I'm going through some sort of AI type thing." This points to a future where AI agents are primary internet consumers.
  • With 50% of traffic already bots and agents emerging, an "explosion in traffic" is expected.
  • The speaker notes that old-school methods assume malicious intent, which is increasingly inaccurate as beneficial AI agents proliferate.
  • Encouragingly, AI bot behavior is improving: "Today we know that these bots can be verified. They are identifying themselves. They are much better citizens of the internet."
  • Strategic Implication: Future-proofing systems means designing for a world where AI agents are the norm, requiring sophisticated, context-aware rules rather than simple block/allow decisions.

The Enduring Challenge of Proving Humanness and the Role of AI in Detection

  • Proving humanness online is a long-standing, unsolved problem, as evidenced by a decades-long NIST working group.
  • While digital signatures are a pure solution, their user experience (UX) has hindered adoption.
  • AI, specifically Machine Learning (ML), has been used in traffic analysis for over a decade. The new generation of Large Language Models (LLMs) offers new possibilities.
  • A key challenge for LLMs is inference speed; decisions for web requests need to be made in milliseconds.
  • Actionable Insight: Crypto AI researchers could explore hybrid models combining fast, classic ML for initial filtering with more sophisticated LLM analysis for complex cases or background pattern detection; one way to wire this together is sketched after this list.
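
One possible shape for such a hybrid is sketched below: a sub-millisecond fast path answers inline, ambiguous requests are queued, and a slower background analyzer (an LLM or heavier model) feeds its verdicts back into the fast path. The scoring heuristic and the analyzer are placeholders, not a real implementation.

```typescript
type Verdict = "allow" | "deny" | "uncertain";

interface LightweightSignals {
  fingerprint: string;
  ipReputation: number; // placeholder score, 0 (bad) to 1 (good)
  path: string;
}

const blocklist = new Set<string>();
const reviewQueue: Array<{ fingerprint: string; context: unknown }> = [];

// Fast path: cheap features only (a stand-in for a small classic-ML model);
// must answer well within the request's latency budget.
function fastScore(req: LightweightSignals): Verdict {
  if (blocklist.has(req.fingerprint)) return "deny";
  if (req.ipReputation > 0.9) return "allow";
  if (req.ipReputation < 0.2) return "deny";
  return "uncertain";
}

// Inline handler: never waits on the slow model.
function handleRequest(req: LightweightSignals): "allow" | "deny" {
  const v = fastScore(req);
  if (v === "uncertain") {
    // Defer the expensive analysis; fail open for now so the request isn't delayed.
    reviewQueue.push({ fingerprint: req.fingerprint, context: { path: req.path } });
    return "allow";
  }
  return v;
}

// Background worker: richer, slower analysis whose verdicts feed back into
// the fast path via the blocklist.
async function reviewWorker(analyze: (context: unknown) => Promise<"malicious" | "benign">) {
  const item = reviewQueue.shift();
  if (!item) return;
  if ((await analyze(item.context)) === "malicious") blocklist.add(item.fingerprint);
}
```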

The Future: Edge Inference and AI-Powered Security Co-Pilots

  • The speaker anticipates the deployment of new edge models—AI models optimized for low-resource environments like mobile devices or IoT—capable of millisecond inference for real-time application security.
  • The falling cost of inference is a major driver. The host draws a parallel to cloud storage costs plummeting.
  • The host envisions a future with "an LLM running locally that's basically going to be Clippy but for CISOs," providing real-time security advice.
  • The speaker confirms work towards embedding analysis into every request, using the full context (user, session, application) for local decision-making on web servers or at the edge; a latency-budgeted sketch follows this list.
  • "Delaying an HTTP request for 5 seconds, that's not going to work. And so I think the trend that we're seeing with the improvement cost, the inference cost, but also the latency... that's going to be the key," states the speaker.
  • Strategic Consideration: Investors should look for solutions leveraging low-latency edge AI for real-time threat detection and response, as this will be critical for securing next-generation applications. Advertisers, too, will benefit from fast, on-edge inference to combat issues like click spam.
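
A latency-budgeted wrapper around a hypothetical local model might look like the following. `EdgeModel` and its `classify` method are assumptions for illustration; the point is simply that the HTTP request is never delayed past a hard millisecond budget, falling back to a cheap default when the model cannot answer in time.

```typescript
// Hypothetical local model client running next to the web server or at the edge.
interface EdgeModel {
  classify(input: { context: string }): Promise<"allow" | "deny">;
}

// Enforce a hard latency budget: if the model can't answer within `budgetMs`,
// fall back rather than delaying the request.
async function decideWithinBudget(
  model: EdgeModel,
  context: string,
  budgetMs = 10,
  fallback: "allow" | "deny" = "allow"
): Promise<"allow" | "deny"> {
  const timeout = new Promise<"timeout">((resolve) => setTimeout(() => resolve("timeout"), budgetMs));
  const result = await Promise.race([model.classify({ context }), timeout]);
  return result === "timeout" ? fallback : result;
}
```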

Conclusion

The discussion underscores a critical shift: AI agents are becoming primary web interactors, demanding security that moves beyond crude blocking to nuanced, context-aware analysis. Crypto AI investors and researchers must prioritize developing and adopting layered, AI-driven security capable of real-time, granular control to harness the benefits of this new automated era.
