This episode dissects the rapid rise and potential fall of the dedicated vector database category, exploring why traditional search paradigms might be reclaiming their dominance in the AI retrieval landscape and what this means for AI infrastructure investment.
Meet Jo Kristian Bergum & Setting the Stage
- Jo Kristian Bergum, based in Trondheim, Norway (home to a strong technical university), brings two decades of experience in search infrastructure from roles at Yahoo, Fast Search & Transfer, and Vespa.
- His recent focus includes embeddings and neural search, leading up to the pivotal "ChatGPT moment" in late 2022.
- Bergum notes the influence of early OpenAI cookbooks, which framed connecting LLMs to data primarily as a matter of embeddings and vector databases, setting the stage for a specific infrastructure trend.
The "Rise and Fall of Vector Databases" Thesis
- Bergum clarifies his motivation for writing his widely discussed piece: the framing of vector databases (pioneered by companies like Pinecone) as a new, essential infrastructure category for AI, particularly for handling embeddings.
- He argues that while embeddings are crucial, the idea that a separate vector database is mandatory for AI is flawed.
- His core thesis isn't that companies like Pinecone or Chroma are failing, but that the distinct category of "vector database" is dissolving as vector search capabilities become integrated into existing databases and search engines.
- "I'm not saying that the companies are dying, but I'm saying that the category is dying," Bergum emphasizes, distinguishing between company viability and category definition.
Market Dynamics and Convergence
- The rapid ascent and perceived cooling-off of pioneers like Pinecone are discussed, alongside their recent repositioning towards a more developer-focused message, away from broader "memory for AI" slogans.
- Competition from newer players (e.g., Turbopuffer) and the integration of vector search into established systems (Postgres via pgvector, Elasticsearch, Vespa, etc.) are driving this convergence.
- Strategic Insight: The rapid commoditization of basic vector search functionality within existing data platforms challenges the long-term defensibility of standalone vector database providers, signaling a potential market consolidation or pivot for these companies. Investors should scrutinize the unique value proposition beyond basic vector indexing.
Search as the Natural Abstraction for RAG
- Bergum posits that "search" is a more natural and robust abstraction than "vector database" for connecting AI models with knowledge, a process often termed Retrieval-Augmented Generation (RAG).
- RAG (Retrieval-Augmented Generation): A technique where an AI model retrieves relevant information from an external knowledge base before generating a response, improving accuracy and grounding answers in specific data.
- He references tools like Perplexity AI (using its "cascade mode") which employ various search tools (code search, web search, semantic search) as evidence of this more flexible approach.
- The specific method of search (keyword, semantic vector search, hybrid) becomes an implementation detail under the broader "search" umbrella, rather than defining the core infrastructure category.
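The point that "search" is the stable abstraction, with keyword, semantic, and hybrid retrieval as swappable implementation details, can be sketched as a small tool registry. All names here are hypothetical illustrations, not a real API:

```python
# Sketch: "search" as the abstraction; the specific retrieval method
# (keyword, semantic, hybrid) is an implementation detail behind it.
# Function and registry names are invented for illustration.

def keyword_search(query):
    # Stand-in for e.g. BM25 over an inverted index.
    return [f"keyword-hit for {query!r}"]

def semantic_search(query):
    # Stand-in for embedding similarity over a vector index.
    return [f"semantic-hit for {query!r}"]

SEARCH_TOOLS = {
    "keyword": keyword_search,
    "semantic": semantic_search,
}

def search(query, tools=("keyword", "semantic")):
    """One 'search' entry point; which index answers is a detail."""
    results = []
    for name in tools:
        results.extend(SEARCH_TOOLS[name](query))
    return results

print(search("vector databases"))
```

Swapping in a new retrieval method (code search, web search) is then an addition to the registry, not a new infrastructure category.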
The Enduring Importance (and Mainstreaming) of Embeddings
- While critical of the vector database category, Bergum strongly affirms the importance of embeddings.
- Embeddings: Numerical representations (vectors) of data (text, images, etc.) that capture semantic meaning, allowing mathematical operations like similarity comparisons.
- Embeddings allow representation of diverse data types and enable manipulation in vector space for domain adaptation.
- The key shift was embeddings moving from specialized use in large tech companies to mainstream accessibility for all developers via APIs (like OpenAI's), fueling the initial vector database boom.
- However, he cautions that effective search requires more than just vector similarity (e.g., cosine similarity), citing the need for signals like freshness, authority, and metadata filtering – hallmarks of traditional search systems.
- Actionable Insight: Relying solely on semantic similarity via embeddings for retrieval is insufficient for robust applications. Researchers and builders must incorporate hybrid approaches and traditional search signals for relevance and quality.
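The caution above can be made concrete with a minimal sketch: cosine similarity on its own, then a blended score that mixes in traditional signals such as freshness and authority. The weights and the freshness decay are illustrative assumptions, not values from the episode:

```python
import math

def cosine_similarity(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_score(sim, freshness_days, authority,
                 w_sim=0.7, w_fresh=0.2, w_auth=0.1):
    """Blend semantic similarity with classic search signals.
    Weights and the 30-day freshness decay are illustrative."""
    freshness = 1.0 / (1.0 + freshness_days / 30.0)  # newer scores higher
    return w_sim * sim + w_fresh * freshness + w_auth * authority

sim = cosine_similarity([1.0, 0.0], [0.8, 0.6])  # = 0.8
print(hybrid_score(sim, freshness_days=0, authority=0.5))  # 0.81
```

Metadata filtering (e.g. restricting to a tenant or date range) would typically be applied before scoring, which is exactly what mature search engines provide out of the box.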
Integrated vs. Dedicated Systems: The Database Dilemma
- The conversation explores whether vector search should live within the primary database (like Postgres with pgvector) or a separate, dedicated search system (like Elasticsearch or Vespa).
- Bergum acknowledges the significant improvements in pgvector (adding algorithms like HNSW, IVF, quantization support), making it a viable option for many use cases already using Postgres, especially at moderate scale.
- HNSW (Hierarchical Navigable Small World): An efficient algorithm for approximate nearest neighbor search, commonly used in vector databases.
- Strategic Consideration: For applications where search quality and performance at scale are critical business differentiators, a dedicated search/retrieval engine often provides more control and optimization capabilities than integrated database extensions. However, for simpler needs or existing Postgres users, pgvector can reduce infrastructure complexity.
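For the integrated-database route, a pgvector similarity query from application code looks roughly like the following. The table and column names (`documents`, `embedding`) are hypothetical; `<=>` is pgvector's cosine-distance operator:

```python
# Sketch of a pgvector similarity query as issued from application
# code. Schema names are invented; only the `<=>` operator and the
# ::vector cast are pgvector syntax.

query_embedding = [0.12, -0.03, 0.97]  # would come from an embedding model
vector_literal = "[" + ",".join(f"{x:g}" for x in query_embedding) + "]"

sql = (
    "SELECT id, content, embedding <=> %(q)s::vector AS distance "
    "FROM documents "
    "ORDER BY embedding <=> %(q)s::vector "
    "LIMIT 10"
)
params = {"q": vector_literal}
print(sql)
```

With an HNSW index on the `embedding` column, the `ORDER BY ... LIMIT` pattern is served as an approximate nearest neighbor lookup rather than a full scan.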
RAG, Search, and Embeddings: An Intertwined Future
- Embedding-based retrieval, long used in large-scale recommender systems (like TikTok's), is now converging with traditional search techniques.
- Modern retrieval systems often use a multi-stage "cascade" approach: initial candidate retrieval (often embedding-based) followed by more sophisticated re-ranking layers.
- Key Takeaway: RAG is fundamentally a search problem. Effective RAG systems will increasingly blend embedding-based (semantic) retrieval with keyword search, metadata filtering, and re-ranking, moving beyond simple vector lookups.
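The cascade pattern described above can be sketched in a few lines: cheap embedding-based retrieval over the whole corpus, then a more expensive re-ranker applied only to the shortlist. Both scoring functions here are crude stand-ins (a dot product and term overlap), not real models:

```python
# Sketch of a two-stage retrieval "cascade". The scorers are
# illustrative stand-ins: embed_score for a vector index lookup,
# rerank_score for an expensive cross-encoder-style re-ranker.

def embed_score(query_vec, doc_vec):
    return sum(q * d for q, d in zip(query_vec, doc_vec))  # dot product

def rerank_score(query, doc_text):
    # Stand-in for a re-ranking model: crude term overlap.
    q_terms = set(query.lower().split())
    d_terms = set(doc_text.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

corpus = [
    ("a", [0.9, 0.1], "vector search in postgres"),
    ("b", [0.7, 0.7], "keyword search with bm25"),
    ("c", [0.1, 0.9], "cooking pasta at home"),
]

def retrieve(query, query_vec, k_candidates=2, k_final=1):
    # Stage 1: cheap candidate retrieval by embedding similarity.
    candidates = sorted(corpus, key=lambda d: embed_score(query_vec, d[1]),
                        reverse=True)[:k_candidates]
    # Stage 2: re-rank only the shortlist with the expensive scorer.
    ranked = sorted(candidates,
                    key=lambda d: rerank_score(query, d[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked[:k_final]]

print(retrieve("bm25 keyword search", [0.8, 0.6]))  # → ['b']
```

The economics follow the same shape at scale: the first stage must be fast over millions of documents, while the second stage can afford heavier models because it only sees the top candidates.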
Practical Recommendations for Building RAG Systems
- Bergum suggests a pragmatic sequence for building retrieval systems:
- 1. Start with Data: Focus on cleaning and preparing your source data.
- 2. Establish a Baseline: Use a strong classical algorithm like BM25 (keyword-based) for an initial performance benchmark.
- BM25: A standard TF-IDF-like ranking function used in information retrieval to estimate relevance based on keyword matching and document statistics.
- 3. Introduce Embeddings: Use an off-the-shelf embedding model and explore hybrid search capabilities offered by most engines.
- 4. Add Re-ranking: If latency and cost permit, implement a re-ranking layer to refine the results from the initial retrieval stages.
- He notes a personal shift: after years of designing low-latency online systems, he now embraces easier-to-use (if higher-latency) API-based services for tasks like embedding generation when latency demands are modest.
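Step 2 of the sequence above, the BM25 baseline, is simple enough to sketch from scratch. This is a minimal self-contained implementation using one common BM25 variant (parameter defaults k1=1.5, b=0.75 are conventional choices, and real engines handle tokenization, stemming, and indexing for you):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average doc length
    df = Counter()                                  # document frequencies
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                             # term frequencies
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "stock markets fell sharply today".split(),
]
scores = bm25_scores(["cat", "mat"], docs)
print(scores.index(max(scores)))  # doc 0 matches both query terms
```

A baseline like this makes the later steps measurable: an embedding or hybrid setup that cannot beat BM25 on your own evaluation data is not yet earning its complexity.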
Critique of In-Database ML Processing
- Bergum expresses skepticism about running complex ML logic (inference, agentic behavior, embedding generation) directly inside the database (e.g., via extensions like PostgresML).
- He argues for keeping infrastructure components separate due to different scaling properties, potential complexity in expressing logic via SQL, and loss of control over performance and cost.
- Investor Note: The trend of pushing ML workloads into databases presents potential developer experience challenges and operational complexities compared to more modular architectures. Evaluate claims of "all-in-one" database ML solutions critically.
Addressing Misconceptions: RAG is Not Dead
- A key misunderstanding Bergum addresses is that his critique of the vector database category implies RAG itself is obsolete.
- He firmly states: "RAG is definitely not dead... augmenting AI with retrieval or search is still going to be relevant... for a very long time."
- The debate around long context windows (e.g., 10M tokens) versus RAG is nuanced. While very long contexts might negate the need for RAG in simple cases (e.g., chatting with a single PDF or a small set of documents), it doesn't scale for large knowledge bases.
- He cites TREC COVID, a relatively small benchmark dataset, whose 170,000 documents amount to roughly 36 million tokens – far too many to fit into current context windows for a single query.
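The sizing argument is simple arithmetic; the average tokens-per-document figure below is an assumption chosen to match the ~36M total cited in the episode:

```python
# Back-of-the-envelope check of the TREC COVID sizing: even a "small"
# benchmark corpus dwarfs a hypothetical 10M-token context window.

num_docs = 170_000           # corpus size as cited in the episode
avg_tokens_per_doc = 212     # assumed average, tokenizer-dependent
total_tokens = num_docs * avg_tokens_per_doc
print(total_tokens)               # ≈36M
print(total_tokens > 10_000_000)  # exceeds even a 10M-token window
```

Stuffing the context also means paying for (and attending over) every token on every query, whereas retrieval narrows the corpus to a handful of relevant passages first.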
Perspective on Knowledge Graphs and Graph RAG
- Bergum views Knowledge Graphs (KGs) as potentially powerful but highlights the primary challenge: building the graph (extracting entities and relationships accurately).
- Knowledge Graph (KG): A data structure representing entities (nodes) and their relationships (edges).
- He cautions against the assumption that "Graph RAG" necessitates a dedicated graph database. Graph exploration and retrieval can often be implemented using capable search engines.
- While LLMs make KG creation easier than before, the bottleneck remains data quality and extraction, not necessarily the underlying storage technology.
- Strategic Point: Focus on the data modeling and entity/relationship extraction pipeline quality before over-investing in specialized graph database infrastructure solely for RAG.
The Future of Embedding Models
- There's a need for more domain-specific embedding models (e.g., for legal, finance, health) like those pioneered by Voyage AI (recently acquired by MongoDB).
- Using Visual Language Models (VLMs) as a backbone for embeddings, processing visual document layouts directly without complex OCR pipelines, is a promising direction.
- VLM (Visual Language Model): AI models capable of understanding and processing both text and image data simultaneously.
- However, the business model for standalone embedding model providers is challenging due to compute costs and monetization difficulties, potentially leading to more acquisitions or integration into larger platforms. Jina AI is noted as doing good work, particularly with European languages.
Conclusion: Infrastructure Convergence and Hybrid Retrieval
- The discussion highlights a clear trend away from standalone vector databases towards integrated vector capabilities within existing data stores or sophisticated, multi-modal search engines.
- For Crypto AI investors and researchers, this signals a need to evaluate infrastructure choices based on scale and complexity, prioritizing robust, hybrid retrieval strategies over simplistic vector-only approaches.