Latent Space
April 19, 2025

The Rise and Fall of the Vector DB category: Jo Kristian Bergum (ex-Chief Scientist, Vespa)

Jo Kristian Bergum, drawing on 20 years in search infrastructure including time as Chief Scientist at Vespa, dissects the rapid ascent and subsequent decline of the standalone vector database category, arguing that "search" is the more enduring abstraction for the AI era.

The Fading Vector Database Category

  • "I'm not saying that the companies are dying, right? I'm just saying that the separate infrastructure category [vector databases] is dying because you have vector search capabilities in almost any DB technology nowadays."
  • "I think Pinecone was one of the pioneers framing it as a new infrastructure category... And naturally then if you want to do anything in AI, then you need to have a vector database."
  • The hype cycle, fueled by early AI cookbooks and VC funding ($230M+ raised quickly across the category), positioned vector databases as essential AI infrastructure.
  • However, vector search is becoming a feature within existing databases (Postgres/PGVector, Elasticsearch, MongoDB) and search engines, rather than a standalone category.
  • This convergence, combined with increased competition (e.g., Turbopuffer), challenges the necessity of a separate vector DB, making the category itself appear too narrow and potentially unsustainable long-term.

Search: The Natural Abstraction for RAG

  • "I want to call these new companies that they are like search engines... I think that's a more natural abstraction for connecting AI with knowledge and all arguments for doing RAG."
  • "I think the natural concept there is search... [It's] more like the natural abstraction instead of jumping into vectors and how you represent that is more of like a detail like of how you implement search."
  • Bergum argues that focusing solely on "vector databases" misses the point; the core task for Retrieval-Augmented Generation (RAG) is search.
  • Vectors and embeddings are implementation details, not the fundamental concept. Framing the interaction as "search" allows for more flexible approaches (keyword, semantic, hybrid).
  • Modern AI agents often interact with distinct "search tools" (web search, code search), reinforcing this broader abstraction over specific database types.
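The "search as abstraction" idea above can be sketched in code: the agent sees only a search interface, and whether the backend uses keywords, vectors, or a hybrid is an implementation detail. This is a minimal illustrative sketch (the class names and the naive term-overlap scorer are my own, not from the episode):

```python
from abc import ABC, abstractmethod

class SearchTool(ABC):
    """An agent-facing search tool: callers see queries and ranked
    results, never the underlying index or vectors."""
    @abstractmethod
    def search(self, query: str, k: int = 5) -> list[str]:
        ...

class KeywordSearch(SearchTool):
    """One possible backend; an embedding-based backend would subclass
    SearchTool the same way, and the agent's calling code would not change."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int = 5) -> list[str]:
        # Naive term-overlap ranking stands in for BM25 here.
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.docs]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [d for s, d in scored[:k] if s > 0]

tool: SearchTool = KeywordSearch(["vector search in postgres",
                                  "bm25 keyword ranking",
                                  "hybrid retrieval for rag"])
print(tool.search("keyword ranking", k=2))  # → ['bm25 keyword ranking']
```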

Embeddings: Vital, But Not the Whole Story

  • "Embeddings are here to stay... It's just that it's not only about similarity searches in this kind of embedding space."
  • While embeddings became accessible to all developers post-ChatGPT and are crucial for representing complex data, relying solely on embedding similarity for search is insufficient.
  • Effective search needs more signals than just vector proximity, incorporating factors like data freshness, authority, and potentially traditional keyword relevance (hybrid search). Simply embedding the web won't create the next Google.
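The point about blending signals can be made concrete with a toy scoring function. This is a hand-rolled sketch, not anything from the episode: the weights, the exponential freshness decay, and the half-life are illustrative assumptions a real system would tune against evaluation data.

```python
import math
from datetime import date

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_score(query_vec, doc_vec, doc_date, keyword_overlap,
                 w_sem=0.6, w_kw=0.3, w_fresh=0.1, half_life_days=180):
    """Blend semantic similarity with keyword and freshness signals.
    Weights and half-life are illustrative, not tuned."""
    age_days = (date.today() - doc_date).days
    freshness = 0.5 ** (age_days / half_life_days)  # exponential decay
    return (w_sem * cosine(query_vec, doc_vec)
            + w_kw * keyword_overlap
            + w_fresh * freshness)

# Two docs equally similar in embedding space; the fresher one wins.
q = [1.0, 0.0]
old = hybrid_score(q, [1.0, 0.0], date(2020, 1, 1), keyword_overlap=0.5)
new = hybrid_score(q, [1.0, 0.0], date.today(), keyword_overlap=0.5)
print(new > old)  # True
```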

RAG Endures, Implementation Evolves

  • "I think that RAG is definitely not dead... augmenting AI with retrieval or search is still going to be relevant and I think it's going to be relevant for a very long time."
  • Despite the vector DB category shift and growing LLM context windows, RAG remains vital for grounding AI in specific, large, or dynamic knowledge bases.
  • For implementation, start with a BM25 baseline, layer in embeddings for hybrid search, and consider reranking based on cost/latency needs. Data quality ("look at your data") is paramount.
  • Bergum advises against running ML models directly in the database (like Postgres ML), preferring separate infrastructure due to different scaling needs and developer experience concerns.

Key Takeaways:

  • The standalone "vector database" category is likely dissolving into a feature set within broader data platforms. While the specific tooling evolves, the core need for effective retrieval (search) to augment AI is stronger than ever. Developers should focus on the search problem itself, leveraging hybrid techniques and robust baselines, rather than fixating on specific database technologies.
  • Vector DBs Fading: The category is dying as capabilities merge into existing databases; focus on vector search as a feature.
  • Search Over Vectors: Frame RAG around the core concept of "search," not the implementation detail of "vector databases."
  • RAG is Here to Stay: Longer context windows won't kill RAG for most real-world applications; hybrid search and data quality are key.

Link: https://www.youtube.com/watch?v=fiXDxS7xGks

This episode dissects the rapid rise and potential fall of the dedicated vector database category, exploring why traditional search paradigms might be reclaiming their dominance in the AI retrieval landscape and what this means for AI infrastructure investment.

Meet Jo Kristian Bergum & Setting the Stage

  • Jo Kristian Bergum, based in Trondheim, Norway (home to a strong technical university), brings two decades of experience in search infrastructure from roles at Yahoo, Fast Search & Transfer, and Vespa.
  • His recent focus includes embeddings and neural search, leading up to the pivotal "ChatGPT moment" in late 2022.
  • Bergum notes the influence of early OpenAI cookbooks that framed connecting LLMs to data primarily through embeddings and vector databases, setting the stage for a specific infrastructure trend.

The "Rise and Fall of Vector Databases" Thesis

  • Bergum clarifies his motivation for writing his widely discussed piece: the framing of vector databases (pioneered by companies like Pinecone) as a new, essential infrastructure category for AI, particularly for handling embeddings.
  • He argues that while embeddings are crucial, the idea that a separate vector database is mandatory for AI is flawed.
  • His core thesis isn't that companies like Pinecone or Chroma are failing, but that the distinct category of "vector database" is dissolving as vector search capabilities become integrated into existing databases and search engines.
  • "I'm not saying that the companies are dying, but I'm saying that the category is dying," Bergum emphasizes, distinguishing between company viability and category definition.

Market Dynamics and Convergence

  • The rapid ascent and perceived cooling-off of pioneers like Pinecone are discussed, alongside their recent repositioning towards a more developer-focused message, away from broader "memory for AI" slogans.
  • Competition from newer players (e.g., Turbopuffer) and the integration of vector search into established systems (Postgres via pgvector, Elasticsearch, Vespa, etc.) are driving this convergence.
  • Strategic Insight: The rapid commoditization of basic vector search functionality within existing data platforms challenges the long-term defensibility of standalone vector database providers, signaling a potential market consolidation or pivot for these companies. Investors should scrutinize the unique value proposition beyond basic vector indexing.

Search as the Natural Abstraction for RAG

  • Bergum posits that "search" is a more natural and robust abstraction than "vector database" for connecting AI models with knowledge, a process often termed Retrieval-Augmented Generation (RAG).
  • RAG (Retrieval-Augmented Generation): A technique where an AI model retrieves relevant information from an external knowledge base before generating a response, improving accuracy and grounding answers in specific data.
  • He references tools like Perplexity AI, which orchestrate multiple search tools (web search, semantic search) rather than a single vector lookup, as evidence of this more flexible approach.
  • The specific method of search (keyword, semantic vector search, hybrid) becomes an implementation detail under the broader "search" umbrella, rather than defining the core infrastructure category.

The Enduring Importance (and Mainstreaming) of Embeddings

  • While critical of the vector database category, Bergum strongly affirms the importance of embeddings.
  • Embeddings: Numerical representations (vectors) of data (text, images, etc.) that capture semantic meaning, allowing mathematical operations like similarity comparisons.
  • Embeddings allow representation of diverse data types and enable manipulation in vector space for domain adaptation.
  • The key shift was embeddings moving from specialized use in large tech companies to mainstream accessibility for all developers via APIs (like OpenAI's), fueling the initial vector database boom.
  • However, he cautions that effective search requires more than just vector similarity (e.g., cosine similarity), citing the need for signals like freshness, authority, and metadata filtering – hallmarks of traditional search systems.
  • Actionable Insight: Relying solely on semantic similarity via embeddings for retrieval is insufficient for robust applications. Researchers and builders must incorporate hybrid approaches and traditional search signals for relevance and quality.
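One concrete way traditional search signals combine with embeddings is metadata filtering before similarity ranking. The sketch below is my own toy example (field names and data are invented), showing filter-then-rank: the nearest vector can still lose to a filter, which pure similarity search cannot express.

```python
def filtered_vector_search(query_vec, docs, k=3, **filters):
    """Pre-filter on metadata (a traditional search signal), then
    rank the survivors by embedding similarity (dot product)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    candidates = [d for d in docs
                  if all(d["meta"].get(f) == v for f, v in filters.items())]
    candidates.sort(key=lambda d: dot(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "a", "vec": [0.9, 0.1],  "meta": {"lang": "en", "year": 2024}},
    {"id": "b", "vec": [0.8, 0.2],  "meta": {"lang": "en", "year": 2019}},
    {"id": "c", "vec": [0.99, 0.0], "meta": {"lang": "de", "year": 2024}},
]
# "c" is the nearest vector, but the language filter excludes it.
print(filtered_vector_search([1.0, 0.0], docs, lang="en"))  # → ['a', 'b']
```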

Integrated vs. Dedicated Systems: The Database Dilemma

  • The conversation explores whether vector search should live within the primary database (like Postgres with pgvector) or a separate, dedicated search system (like Elasticsearch or Vespa).
  • Bergum acknowledges the significant improvements in pgvector (adding algorithms like HNSW, IVF, quantization support), making it a viable option for many use cases already using Postgres, especially at moderate scale.
  • HNSW (Hierarchical Navigable Small World): An efficient algorithm for approximate nearest neighbor search, commonly used in vector databases.
  • Strategic Consideration: For applications where search quality and performance at scale are critical business differentiators, a dedicated search/retrieval engine often provides more control and optimization capabilities than integrated database extensions. However, for simpler needs or existing Postgres users, pgvector can reduce infrastructure complexity.
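To give intuition for what HNSW-style indexes do, here is a deliberately simplified sketch of the core move: a greedy best-first walk over a neighbor graph. Real HNSW adds a layer hierarchy, beam search (the `ef` parameter), and incremental graph construction; none of that is shown here, and the 1-D line graph is purely for illustration.

```python
def greedy_graph_search(graph, vectors, query, entry, dist):
    """Greedy walk over a neighbor graph: repeatedly move to the
    neighbor closest to the query, stop at a local minimum. This is
    the step HNSW repeats per layer (here, one flat layer only)."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query))
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current  # local minimum: the approximate nearest neighbor
        current = best

def dist(a, b):
    return abs(a - b)

# Toy 1-D "vectors" 0..9, each node linked to its immediate neighbors.
vectors = {i: float(i) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(greedy_graph_search(graph, vectors, query=7.2, entry=0, dist=dist))  # → 7
```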

RAG, Search, and Embeddings: An Intertwined Future

  • Embedding-based retrieval, long used in large-scale recommender systems (like TikTok's), is now converging with traditional search techniques.
  • Modern retrieval systems often use a multi-stage "cascade" approach: initial candidate retrieval (often embedding-based) followed by more sophisticated re-ranking layers.
  • Key Takeaway: RAG is fundamentally a search problem. Effective RAG systems will increasingly blend embedding-based (semantic) retrieval with keyword search, metadata filtering, and re-ranking, moving beyond simple vector lookups.
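The multi-stage cascade described above can be sketched in a few lines: a cheap similarity pass narrows the corpus, then a costlier scorer reranks only the survivors. Everything here is a stand-in (the word-count "embedding" and the term-coverage "reranker" mimic the shape of real models, not their quality):

```python
def cascade_retrieve(query, docs, embed, rerank_score, k1=50, k2=5):
    """Stage 1: cheap embedding similarity keeps k1 candidates.
    Stage 2: a costlier scorer reranks just those k1, returns k2."""
    qv = embed(query)
    def sim(d):
        dv = embed(d)
        return sum(a * b for a, b in zip(qv, dv))
    candidates = sorted(docs, key=sim, reverse=True)[:k1]
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:k2]

VOCAB = ["postgres", "vector", "search", "bm25"]
def embed(text):
    # Toy bag-of-words "embedding" over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def rerank_score(query, doc):
    # Stand-in for a cross-encoder: fraction of query terms present.
    doc_words = set(doc.lower().split())
    q_words = query.lower().split()
    return sum(w in doc_words for w in q_words) / len(q_words)

docs = ["vector search", "postgres vector search", "bm25 ranking"]
print(cascade_retrieve("postgres vector search", docs, embed, rerank_score,
                       k1=2, k2=1))  # → ['postgres vector search']
```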

Practical Recommendations for Building RAG Systems

  • Bergum suggests a pragmatic sequence for building retrieval systems:
  • 1. Start with Data: Focus on cleaning and preparing your source data.
  • 2. Establish a Baseline: Use a strong classical algorithm like BM25 (keyword-based) for an initial performance benchmark.
  • BM25: A standard TF-IDF-like ranking function used in information retrieval to estimate relevance based on keyword matching and document statistics.
  • 3. Introduce Embeddings: Use an off-the-shelf embedding model and explore hybrid search capabilities offered by most engines.
  • 4. Add Re-ranking: If latency and cost permit, implement a re-ranking layer to refine the results from the initial retrieval stages.
  • He notes a personal adjustment from designing low-latency online systems to embracing easier-to-use (though potentially higher latency) API-based services for tasks like embedding generation in less demanding scenarios.
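Step 2 of the sequence above, the BM25 baseline, is small enough to implement from scratch. A minimal sketch (whitespace tokenization and the common `log(1 + ...)` IDF variant are simplifying choices; production systems use an engine's built-in BM25):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with BM25.
    Tokenization is just lowercase whitespace split."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / N
    df = Counter()                       # document frequency per term
    for t in tokenized:
        df.update(set(t))
    scores = []
    for t in tokenized:
        tf = Counter(t)                  # term frequency in this doc
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

docs = ["hybrid search for rag", "bm25 is a strong baseline", "vector databases"]
scores = bm25_scores("bm25 baseline", docs)
print(scores.index(max(scores)))  # → 1 (the doc mentioning bm25 and baseline)
```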

Critique of In-Database ML Processing

  • Bergum expresses skepticism about running complex ML logic (inference, agentic behavior, embedding generation) directly inside the database (e.g., via extensions like PostgresML).
  • He argues for keeping infrastructure components separate due to different scaling properties, potential complexity in expressing logic via SQL, and loss of control over performance and cost.
  • Investor Note: The trend of pushing ML workloads into databases presents potential developer experience challenges and operational complexities compared to more modular architectures. Evaluate claims of "all-in-one" database ML solutions critically.

Addressing Misconceptions: RAG is Not Dead

  • A key misunderstanding Bergum addresses is that his critique of the vector database category implies RAG itself is obsolete.
  • He firmly states: "RAG is definitely not dead... augmenting AI with retrieval or search is still going to be relevant... for a very long time."
  • The debate around long context windows (e.g., 10M tokens) versus RAG is nuanced. While very long contexts might negate the need for RAG in simple cases (e.g., chatting with a single PDF or a small set of documents), it doesn't scale for large knowledge bases.
  • He cites a small benchmark dataset (TREC COVID) having 170,000 documents equating to 36 million tokens – far too large to fit entirely into current context windows for a single query.
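A quick back-of-envelope check of that sizing claim (the 10M-token window is the hypothetical long-context figure mentioned earlier, not a specific model):

```python
# TREC-COVID sizing from the episode: 170k docs totaling ~36M tokens.
docs = 170_000
total_tokens = 36_000_000
tokens_per_doc = total_tokens / docs      # ≈ 212 tokens per document
context_window = 10_000_000               # hypothetical "long" context
docs_that_fit = context_window // int(tokens_per_doc)
print(round(tokens_per_doc), docs_that_fit)
```

Even a 10M-token window holds well under a third of this small benchmark corpus, so stuffing the whole knowledge base into context does not scale.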

Perspective on Knowledge Graphs and Graph RAG

  • Bergum views Knowledge Graphs (KGs) as potentially powerful but highlights the primary challenge: building the graph (extracting entities and relationships accurately).
  • Knowledge Graph (KG): A data structure representing entities (nodes) and their relationships (edges).
  • He cautions against the assumption that "Graph RAG" necessitates a dedicated graph database. Graph exploration and retrieval can often be implemented using capable search engines.
  • While LLMs make KG creation easier than before, the bottleneck remains data quality and extraction, not necessarily the underlying storage technology.
  • Strategic Point: Focus on the data modeling and entity/relationship extraction pipeline quality before over-investing in specialized graph database infrastructure solely for RAG.
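The claim that graph exploration can run on a search engine can be illustrated with a toy: store entities as document metadata, and express each "graph hop" as a filtered lookup over that metadata. This is my own minimal sketch (the index structure and entity names are invented), not a pattern from the episode:

```python
def neighbors(index, entity):
    """One 'graph hop' as a metadata-filtered search: find docs
    mentioning the entity, collect their co-occurring entities."""
    out = set()
    for doc in index:
        if entity in doc["entities"]:
            out |= set(doc["entities"]) - {entity}
    return out

def two_hop(index, start):
    """Explore two hops out from a starting entity."""
    first = neighbors(index, start)
    second = set()
    for e in first:
        second |= neighbors(index, e)
    return first, second - first - {start}

index = [
    {"id": 1, "entities": ["pgvector", "postgres"]},
    {"id": 2, "entities": ["postgres", "sql"]},
    {"id": 3, "entities": ["hnsw", "vector search"]},
]
print(two_hop(index, "pgvector"))  # → ({'postgres'}, {'sql'})
```

A real search engine would do the per-hop filtering with its metadata/filter query syntax instead of a Python scan, but the access pattern is the same.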

The Future of Embedding Models

  • There's a need for more domain-specific embedding models (e.g., for legal, finance, health) like those pioneered by Voyage (recently acquired by MongoDB).
  • Using Visual Language Models (VLMs) as a backbone for embeddings, processing visual document layouts directly without complex OCR pipelines, is a promising direction.
  • VLM (Visual Language Model): AI models capable of understanding and processing both text and image data simultaneously.
  • However, the business model for standalone embedding model providers is challenging due to compute costs and monetization difficulties, potentially leading to more acquisitions or integration into larger platforms. Jina AI is noted as doing good work, particularly with European languages.

Conclusion: Infrastructure Convergence and Hybrid Retrieval

  • The discussion highlights a clear trend away from standalone vector databases towards integrated vector capabilities within existing data stores or sophisticated, multi-modal search engines.
  • For AI infrastructure investors and researchers, this signals a need to evaluate infrastructure choices based on scale and complexity, prioritizing robust, hybrid retrieval strategies over simplistic vector-only approaches.
