Semantic Vector Search and Other Topics to Win Friends and Lovers
The full search landscape: exact, fuzzy, semantic, hybrid — and when to layer all of them.
Search is not one thing, and semantic search is not a replacement for the rest of it.
“Find user with email dan@example.com” and “find me articles about debugging as a new engineer” are both described as search, but they have almost nothing in common as engineering problems. The first has a correct answer and an O(log n) index lookup. The second has no correct answer — only relevance — and requires understanding language, intent, and meaning.
The engineers who are most persuasive about search decisions — the ones who win the arguments and ship the right system — understand the whole landscape. They know which tool to reach for and why, and they can explain it clearly.
This article covers the semantic layer: what vector search actually does, when it wins, and where it should stay out of the way. The useful version is not “embed everything.” It is knowing when vectors belong beside lexical, fuzzy, and exact-match search in a hybrid architecture.
The lexical and fuzzy half of the picture — tsvector, pg_trgm, pg_search — is in Postgres Text Searching Guide 2026.
Terms at a Glance
Embedding — A dense list of floating-point numbers produced by a model, representing a piece of text (or image, audio, etc.) as a point in high-dimensional space. Semantically related content lands nearby; unrelated content lands far apart.
Lexical search — Search based on exact word and token matching. Fast, deterministic, and correct for known terms. Doesn’t understand synonyms, paraphrases, or cross-language equivalents.
Semantic search — Search based on meaning rather than tokens. A query for “how do I handle timeouts” can match a document titled “configuring retry policies” with no shared words, because their embeddings are geometrically close.
Vector — A list of numbers. In search contexts, the output of an embedding model. “Vector search” finds the vectors closest to a query vector by geometric distance.
FTS (Full-Text Search) — Postgres’s built-in lexical search, powered by tsvector / tsquery. Tokenizes, stems, and indexes text for keyword queries. Strong for prose and exact-term lookup; blind to meaning.
BM25 — A ranking algorithm for lexical search (used by Elasticsearch, Qdrant, and others). Scores results by term frequency weighted against how rare the term is across the corpus. Better than raw keyword matching; still lexical.
HNSW (Hierarchical Navigable Small World) — The standard approximate nearest-neighbor index for vector search. Builds a layered proximity graph for fast, high-recall similarity queries. pgvector, Qdrant, Weaviate, and most others use it.
RRF (Reciprocal Rank Fusion) — An algorithm for merging ranked result lists from multiple retrieval systems. Uses rank position only — no score normalization needed. A result that ranks high in both FTS and vector lists gets a stronger combined score than one that dominates only one.
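The BM25 entry compresses a formula, so here is a minimal Python sketch of Okapi BM25 scoring. It assumes whitespace tokenization, the Lucene-style IDF, and the commonly used defaults k1 = 1.2 and b = 0.75. None of these choices come from this article. k1 controls how quickly repeated occurrences of a term stop adding score, and b controls how much longer documents are penalized.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["retry policy for timeouts", "timeouts and more timeouts", "unrelated cooking recipe"]
print(bm25_scores("retry timeouts", docs))
```

The third document scores zero because it shares no tokens with the query. Lexical scoring has no notion of meaning, however well the weights are tuned.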
What Semantic Search Actually Does
Vector embeddings convert text (or images, audio, etc.) into a list of numbers — a point in high-dimensional space. An embedding model is trained so that semantically related text lands nearby in that space. “Dog” and “canine” end up close. “Running a marathon” and “running a Python script” end up far apart despite sharing a word.
Similarity search in that space finds documents whose meaning is closest to the query’s meaning, regardless of exact word overlap.
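Here is a brute-force sketch of that idea in Python. The three-dimensional vectors are hand-picked stand-ins for real embeddings, which have hundreds or thousands of dimensions, and the titles are hypothetical. An approximate index like HNSW exists precisely to avoid this full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbors: rank every document by similarity to the query."""
    return sorted(corpus, key=lambda title: cosine_similarity(query, corpus[title]), reverse=True)[:k]


# Hand-made 3-d vectors for illustration only; real embeddings come from a model
corpus = {
    "configuring retry policies": [0.9, 0.1, 0.0],
    "setting connection limits": [0.8, 0.3, 0.1],
    "sourdough starter basics": [0.0, 0.1, 0.95],
}
query = [0.85, 0.2, 0.05]  # stands in for an embedding of "how do I handle timeouts"
print(nearest(query, corpus))
```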
This means:
- “How do I configure request timeouts?” can match an article titled “Setting connection limits and retry policies” — no overlapping keywords, high conceptual relevance
- “Something light for a summer evening” can match a wine recommendation without any keywords appearing in the product description
- A query in English can match relevant documents in French, Spanish, or Japanese if the embedding model was trained multilingually
Lexical search (tsvector, pg_trgm) can’t do any of this. It operates on words and characters, not meaning. The tools are not interchangeable — they solve different problems.
When pgvector Wins
Building RAG. Retrieval-Augmented Generation retrieves the document chunks whose meaning is closest to the user’s question, then passes them to a language model as context. This retrieval step is a vector operation. FTS will miss paraphrases, synonyms, and conceptual matches that a relevant chunk might express differently. The pgvector advantage over a standalone vector store: it runs inside your existing Postgres instance — no separate service to deploy, operate, or sync data into.
Users describe what they want, not what to search for. “Articles about building confidence as a new manager” has no keywords that reliably appear in the relevant posts. “A lightweight framework for handling side effects” may not use those exact words in the documentation. Vector search matches the intent, not the spelling.
Finding similar items. Related products, similar support tickets, duplicate bug reports, articles you might also like. “Find issues similar to this one” is a nearest-neighbor search — embed the item, find its geometric neighbors. One important caveat: vector search always returns results, even when nothing is genuinely similar. For dedup and recommendation use cases, filter by a minimum similarity threshold (e.g., cosine similarity ≥ 0.80) to avoid surfacing low-confidence matches as if they were meaningful.
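A minimal Python sketch of the threshold filter, using the 0.80 cosine cutoff from the paragraph above. The ticket IDs and two-dimensional vectors are made up. The point is that the function is allowed to return an empty list, which plain nearest-neighbor search never does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similar_items(query_vec: list[float], items: dict[str, list[float]],
                  threshold: float = 0.80, k: int = 5) -> list[str]:
    """Top-k neighbors, dropping anything below the similarity threshold."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), item_id) for item_id, vec in items.items()),
        reverse=True,
    )
    # Unlike raw nearest-neighbor search, this can legitimately return nothing
    return [item_id for score, item_id in scored[:k] if score >= threshold]


items = {"ticket-1": [1.0, 0.0], "ticket-2": [0.9, 0.1], "ticket-3": [0.0, 1.0]}
print(similar_items([1.0, 0.05], items))   # close to tickets 1 and 2
print(similar_items([-1.0, -1.0], items))  # nothing clears the bar
```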
Semantic deduplication. Before indexing content for RAG or search, you often need to identify near-duplicates in the corpus — articles revised multiple times, support tickets filed twice, knowledge base entries that overlap significantly. Embed the documents and threshold-filter by cosine similarity to flag or merge near-duplicates before they pollute your index. This prevents retrieval from returning multiple near-identical chunks and diluting the context window.
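A greedy version of that pass fits in a few lines of Python. This is a sketch under stated assumptions: the 0.95 threshold is illustrative and not from this article, and the pairwise loop is O(n²). On a real corpus you would use the vector index to fetch candidate neighbors instead of comparing every pair.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def find_near_duplicates(docs: dict[str, list[float]], threshold: float = 0.95) -> dict[str, str]:
    """Greedy pass: map each near-duplicate to the first kept document it resembles."""
    kept: dict[str, list[float]] = {}
    duplicates: dict[str, str] = {}
    for doc_id, vec in docs.items():
        match = next(
            (kept_id for kept_id, kept_vec in kept.items()
             if cosine_similarity(vec, kept_vec) >= threshold),
            None,
        )
        if match is None:
            kept[doc_id] = vec  # first of its kind: keep it
        else:
            duplicates[doc_id] = match  # flag for merge or removal before indexing
    return duplicates


docs = {
    "kb-1": [1.0, 0.0, 0.0],
    "kb-1-rev2": [0.99, 0.05, 0.0],  # hypothetical revised copy of kb-1
    "kb-2": [0.0, 1.0, 0.0],
}
print(find_near_duplicates(docs))
```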
Multilingual search. Multilingual embedding models map semantically equivalent content across languages into nearby vectors. A query in Spanish for “perder peso” can match an English article on “sustainable weight loss habits” — no shared tokens, same underlying meaning. FTS requires per-language dictionary configuration and handles cross-language queries poorly. pg_trgm is language-agnostic but orthographic, not semantic.
Setting Up pgvector
From extension install to similarity query, the setup is a handful of SQL statements:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- HNSW is usually the first index to try for moderate-size datasets
CREATE INDEX documents_embedding_idx ON documents
  USING hnsw (embedding vector_cosine_ops);

-- Semantic search query
SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
```

<=> is pgvector's cosine distance operator, so 1 - cosine distance gives cosine similarity (1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite). IVFFlat is the older alternative. It builds faster and uses less memory, but it has a worse speed/recall tradeoff than HNSW. If you use it, the pgvector README suggests starting with lists = rows / 1000 for up to 1M rows and sqrt(rows) beyond that.
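Because the IVFFlat starting values depend on table size, a small helper can compute them. This is a sketch of the pgvector README heuristics: lists is derived from the row count, and probes starts at sqrt(lists) for the ivfflat.probes setting. The helper emits the SQL to run. Its name is hypothetical, and the numbers are starting points to tune against measured recall.

```python
import math


def ivfflat_starting_params(row_count: int) -> tuple[int, int]:
    """lists = rows/1000 up to 1M rows, sqrt(rows) beyond; probes = sqrt(lists)."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = math.isqrt(row_count)
    probes = max(1, math.isqrt(lists))
    return lists, probes


lists, probes = ivfflat_starting_params(500_000)
print(f"CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = {lists});")
print(f"SET ivfflat.probes = {probes};")
```

Raising probes improves recall at the cost of latency. Setting it equal to lists makes the query scan every list, which is no faster than an exact search.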
What pgvector Doesn’t Handle Well
- Exact token matching: product SKUs, error codes, function names. ORD-12345 is not semantically similar to anything. An embedding-based search may return ORD-12344 or nothing relevant. Use FTS or a B-tree index.
- Names and proper nouns. Embedding space organizes by meaning, not spelling. A user record for “Micheal Jordan” (misspelled) doesn’t necessarily land near a query for “Michael Jordan” in vector space.
- Short strings where character-level similarity matters more than meaning. pg_trgm handles this.
- Queries where the exact term must appear. BM25 and FTS are more reliable for known-term matching.
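To see why character-level matching catches the misspelled name, here is a simplified Python re-implementation of pg_trgm's `similarity()`. Text is lowercased and split into alphanumeric words. Each word is padded with two leading spaces and one trailing space, its three-character substrings are collected, and similarity is shared trigrams divided by all distinct trigrams. It is an approximation for illustration (ASCII-only word splitting, for one), not the extension's exact code. pg_trgm's default `%` threshold is 0.3.

```python
import re


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigrams: each alphanumeric word padded with '  ' before and ' ' after."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(similarity("Micheal Jordan", "Michael Jordan"))  # above the default 0.3 threshold
print(similarity("Michael Jordan", "Basketball"))      # no shared trigrams
```

The misspelling still shares most of its trigrams with the correct name. An embedding model gives no such guarantee for a proper noun.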
Hybrid Search: The Case for Both
Technical documentation is the clearest example where neither tool is enough alone.
Users searching for “how to configure timeouts” need conceptual matching: an article titled “Setting retry policies and connection limits” has no overlapping keywords but is exactly what they need.
The same users also search for withRetry(), ECONNRESET, and ERR_SOCKET_TIMEOUT. These exact strings must appear — semantic matching may not find them reliably, and a false positive (conceptually similar but not the right API) is actively misleading.
Vector search handles the conceptual queries. FTS handles the exact terms. Neither handles both well alone.
The solution is hybrid search: run both and fuse the results.
Reciprocal Rank Fusion
Reciprocal Rank Fusion (RRF) is the standard algorithm for combining ranked lists from different retrieval systems. It doesn’t require normalizing scores across systems — it only uses rank positions. A result that appears high in both lists gets a stronger combined score than one that dominates only one.
```sql
WITH fts_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
  FROM documents, websearch_to_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank
  LIMIT 50
),
vector_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT COALESCE(f.id, v.id) AS id,
         COALESCE(1.0 / (60 + f.rank), 0) +
         COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM fts_results f
  FULL OUTER JOIN vector_results v ON f.id = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;
```

websearch_to_tsquery accepts raw search-box text. to_tsquery expects tsquery syntax and raises an error on plain multi-word input like “how to configure timeouts”. The ORDER BY rank before LIMIT 50 keeps the top 50 FTS matches; without it, Postgres may keep an arbitrary 50.

The 60 in the denominator is the RRF constant, k. Higher values dampen rank-position differences; lower values amplify them. 60 is the value from the original RRF paper and a reasonable default for most content types.
RRF avoids the harder problem of normalizing ts_rank (a frequency-based lexical score) against cosine distance (a geometric measure). The two are not on comparable scales. RRF only asks: “how high did this result appear in each list?”
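The same fusion is easy to express in application code when the ranked lists come from different systems, for example Postgres FTS and an external vector store. Here is a minimal Python sketch, assuming each input is a list of IDs ordered best-first and using the same k = 60:

```python
def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[str]:
    """Fuse best-first ID lists: each appearance adds 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; an ID missing from a list gets no contribution from it
    return sorted(scores, key=scores.__getitem__, reverse=True)


fts_ids = ["a", "b", "c"]
vector_ids = ["b", "d", "a"]
print(reciprocal_rank_fusion(fts_ids, vector_ids))
```

Here "b" edges out "a". Ranks 2 and 1 beat ranks 1 and 3, which is the "high in both lists" effect RRF is designed to reward.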
Hybrid Search with Trigrams Too
For user-facing search over mixed content — where users might search for a person name, a concept, or an exact term in the same session — three-way fusion handles all of them:
```sql
WITH trgm_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY similarity(title, $1) DESC) AS rank
  FROM documents
  WHERE title % $1
  ORDER BY rank
  LIMIT 50
),
fts_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
  FROM documents, websearch_to_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank
  LIMIT 50
),
vector_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT COALESCE(t.id, f.id, v.id) AS id,
         COALESCE(1.0 / (60 + t.rank), 0) +
         COALESCE(1.0 / (60 + f.rank), 0) +
         COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM trgm_results t
  FULL OUTER JOIN fts_results f ON t.id = f.id
  FULL OUTER JOIN vector_results v ON COALESCE(t.id, f.id) = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;
```

This handles fuzzy name matches (trigrams), exact keyword matches (FTS), and conceptual queries (vector). A single search box can serve all three user intents.
Multi-Layer Hybrid Architectures
Real applications rarely have a single search surface. They have multiple, each with a different need:
| Surface | What users query | Recommended layers |
|---|---|---|
| Blog / documentation search | Keywords + concepts | FTS + pgvector (RRF) |
| User/customer name lookup | Names with typos | pg_trgm |
| Product search | Names, descriptions, “similar to” | pg_trgm + FTS + pgvector |
| Support ticket dedup | “Issues similar to this one” | pgvector only |
| Internal SKU/order search | Exact identifiers | B-tree index |
| RAG over large knowledge base | Natural language questions | pgvector (chunked docs) |
| E-commerce “you might also like” | Behavioral + semantic similarity | pgvector |
| Autocomplete | Prefix, spelling-tolerant | pg_trgm |
These aren’t hypothetical. Most content-heavy applications need at least two distinct search surfaces with different query shapes. The temptation is to pick one approach and use it everywhere — usually vector search now, since it’s the fashionable choice. That leads to expensive embeddings for problems where a trigram index would have been faster, cheaper, and more correct.
The Rule of Thumb
Add a layer when a failure mode appears that the current layer can’t fix:
- Users complain about typos not matching → add pg_trgm
- Users search by concept and miss relevant results → add pgvector
- Users search for exact symbols or codes and get conceptual results instead → add FTS or check if you’re over-relying on vector search
- Latency becomes a problem → evaluate pre-filtering, approximate indexes, or a dedicated store
If You Do Need a Dedicated Vector Store
pgvector handles a lot of application search before you need another database. The rough cutoff depends on vector count, index settings, write rate, filters, hardware, and concurrency, so treat any “under 10M vectors” rule as a starting assumption to benchmark, not a product limit. When you genuinely outgrow it — very high concurrency, very low p99 latency requirements, billions of vectors, or serious multi-tenant isolation needs — the dedicated vector database landscape is wide and worth understanding.
What the Matrix Columns Actually Mean
Hybrid search means BM25 keyword search and vector similarity run in one query, with the results merged by a fusion step such as RRF. Without it, you either pick one search mode or fuse two queries yourself.
Sparse vectors go further than BM25. A SPLADE sparse vector has one dimension per token in the model's vocabulary (30,522 for BERT-based models, so roughly 30,000), and typically well over 99% of them are zero. The non-zero positions tell you which terms matter and how much. Because the model learns term expansion, a query for “dogs” can also weight “canine” and “pet”. The result is BM25-style exact-term precision plus expansion, inside a vector index. If this column is false, you need a separate FTS layer for exact-term queries.
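Scoring with sparse vectors is a dot product over the positions the two vectors share, so only terms present (or expanded into) on both sides contribute. Here is a minimal Python sketch using the {"indices", "values"} layout that the SPLADE encoder in this section returns. The example indices and weights are made up.

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse vectors stored as {"indices": [...], "values": [...]}."""
    doc_weights = dict(zip(doc["indices"], doc["values"]))
    # Only vocabulary positions that are non-zero in both vectors contribute
    return sum(w * doc_weights.get(i, 0.0) for i, w in zip(query["indices"], query["values"]))


# Made-up indices and weights, standing in for encoder output
query = {"indices": [101, 2045], "values": [1.2, 0.4]}
doc = {"indices": [2045, 7777], "values": [0.9, 0.3]}
print(sparse_dot(query, doc))  # only index 2045 overlaps
```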
```python
# SPLADE: ~30,000 dims, ~60 non-zero: only relevant vocabulary positions fire
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "naver/splade-cocondenser-ensembledistil"  # assumed checkpoint; any SPLADE model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

def encode_splade(text: str) -> dict:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**tokens).logits  # (1, seq_len, vocab_size)
    vec = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)  # max-pool over tokens
    idx = vec.nonzero().squeeze(1)  # squeeze(1) keeps a list even with a single non-zero
    return {"indices": idx.tolist(), "values": vec[idx].tolist()}
```

SQL / SQL-like is really about filtering. Vector search without filtering is a demo. You still need tenant scope, date ranges, permissions, and category filters. Full SQL (pgvector, LanceDB) expresses this beside your existing joins. Purpose-built databases use JSON filter objects (Qdrant, Pinecone), a query DSL (Elasticsearch, Milvus), or GraphQL (Weaviate). They work, but SQL becomes more attractive as filter logic gets complex.
```sql
-- pgvector: vector similarity is just another expression
SELECT id, title, 1 - (embedding <=> $1) AS score
FROM documents
WHERE tenant_id = $2
  AND category = ANY($3::text[])
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
```

One caveat: with an HNSW index, pgvector applies WHERE filters after the index scan, so a selective filter can return fewer than 10 rows. pgvector 0.8.0 and later offer iterative index scans (hnsw.iterative_scan) to compensate.

```python
# Qdrant: equivalent filter as a Python object; same filter, more ceremony
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local instance
# query_embedding, tenant_id, categories, and cutoff come from your application
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=models.Filter(must=[
        models.FieldCondition(key="tenant_id", match=models.MatchValue(value=tenant_id)),
        models.FieldCondition(key="category", match=models.MatchAny(any=categories)),
        models.FieldCondition(key="created_at", range=models.DatetimeRange(gte=cutoff)),
    ]),
    limit=10,
)
```

Multimodal native means the database ships embedding models for non-text content. You hand it a raw image URL, and it handles vectorization. Most databases are embedding-agnostic: you own the embedding pipeline. Marqo and Weaviate (via CLIP/ImageBind modules) close this loop.
```python
# Marqo: POST raw images, query with text; no external embedding step
import marqo

mq = marqo.Client(url="http://localhost:8882")  # assumed local Marqo instance
# Assumes "products" was created with a CLIP model and image-URL handling enabled
mq.index("products").add_documents(
    [{"_id": "shoe-001", "image": "https://cdn.example.com/shoes/001.jpg"}],
    tensor_fields=["image"],
)
results = mq.index("products").search(q="lightweight shoes for summer")
# Can return shoe-001 despite zero keyword overlap: CLIP handles the cross-modal match
```

Disk-based index is a cost lever. RAM-resident HNSW indexes need several GB of RAM per million 1536-dimension vectors. The raw float32 values alone are 1536 × 4 bytes ≈ 6 KB per vector, or about 6.1 GB per million, before graph links and metadata are counted. Disk-native alternatives (Milvus DiskANN, Elasticsearch DiskBBQ, LanceDB’s Lance format, Turbopuffer’s object storage tier) often trade some query latency for lower infrastructure cost. For RAG workloads where model latency already dominates, that tradeoff is frequently worth benchmarking.
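The arithmetic is worth running for your own corpus before picking a store. Here is a back-of-envelope Python sketch with loudly stated assumptions. m = 16 neighbors per node (pgvector's default) and the HNSW base layer holds up to 2·m links per node. The 8 bytes per link is an assumption, not any engine's documented layout. Upper layers, metadata, and allocator overhead are ignored, so treat the result as a floor.

```python
def hnsw_memory_estimate_gb(n_vectors: int, dims: int, m: int = 16,
                            bytes_per_dim: int = 4, bytes_per_link: int = 8) -> dict[str, float]:
    """Rough RAM floor for an in-memory HNSW index (ignores upper layers and metadata)."""
    raw = n_vectors * dims * bytes_per_dim      # float32 = 4 bytes; half precision = 2
    graph = n_vectors * 2 * m * bytes_per_link  # base layer keeps up to 2*m neighbors per node
    return {"vectors_gb": raw / 1e9, "graph_gb": graph / 1e9}


print(hnsw_memory_estimate_gb(1_000_000, 1536))
print(hnsw_memory_estimate_gb(1_000_000, 1536, bytes_per_dim=2))  # e.g. pgvector halfvec
```

Under these assumptions, the raw vectors dominate. That is why half precision and disk-native indexes are the main cost levers.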
Max dimensions is a migration hiding in your architecture. text-embedding-3-large defaults to 3,072 dimensions. That is already above pgvector’s 2,000-dimension index limit for the vector type (halfvec raises it to 4,000), and research models keep pushing higher. Some managed services publish hard dimension caps; others document high caps or no practical cap for typical embedding models. Check current docs before committing. Pick something with headroom, because migrating a vector index after hitting a dimension ceiling is a painful sprint.
The Landscape
| Database | Deployment | License | Hybrid Search | Sparse Vectors | SQL / SQL-like | Multimodal | Disk Index | Max Dims | Sweet Spot |
|---|---|---|---|---|---|---|---|---|---|
| pgvector | Self-host / managed (Supabase, Neon, RDS) | OSS (PostgreSQL) | Manual (RRF via SQL) | ✅ sparsevec (0.7.0+) | ✅ Full SQL | ❌ | ✅ HNSW on disk | 16,000 storage; 2,000 indexed (vector), 4,000 indexed (halfvec) | Already on Postgres; moderate vector counts |
| Qdrant | Self-host / Cloud | Apache 2.0 | ✅ Native BM25 | ✅ Mature support | ❌ (REST/gRPC) | ❌ | ✅ | 65,535 | Filtered queries at scale; complex metadata |
| Weaviate | Self-host / Cloud | BSD 3 | ✅ Native BM25 + RRF | ✅ | ❌ (GraphQL / gRPC) | ✅ via modules | ✅ | 65,535 | GraphQL access patterns; built-in vectorization |
| Pinecone | Cloud only | Proprietary | ✅ (added 2024) | ✅ | ❌ | ❌ | ✅ (serverless) | 20,000 | Managed simplicity; no ops team |
| Milvus / Zilliz | Self-host / Cloud (Zilliz) | Apache 2.0 | ✅ Native | ✅ | ✅ SQL-like (Milvus Query Language) | ✅ | ✅ DiskANN | 32,768 | Billion-scale; enterprise on-prem |
| Chroma | Embedded / self-host | Apache 2.0 | ❌ | ❌ | ❌ | ❌ | ❌ | 65,535 | Local dev and prototyping only |
| LanceDB | Embedded / Cloud | Apache 2.0 | ✅ | ❌ | ✅ SQL via DataFusion | ✅ Native | ✅ (Lance format) | Unlimited | Edge / serverless; multimodal lakehouse |
| Orama | Embedded / Cloud | Apache 2.0 | ✅ Full-text + vector | ❌ | ❌ | ❌ | ❌ | Varies | JS/edge apps; lightweight site/app search |
| Turbopuffer | Cloud only (serverless) | Proprietary | ✅ BM25 + vector | ❌ | ❌ | ❌ | ✅ (object storage) | 16,000 | Multi-tenant SaaS; millions of namespaces |
| Elasticsearch | Self-host / Elastic Cloud | ELv2 / SSPL / AGPLv3 | ✅ RRF + ELSER sparse | ✅ (ELSER) | ✅ Query DSL | ❌ | ✅ DiskBBQ | 4,096 | Already on Elastic stack; hybrid enterprise search |
| OpenSearch | Self-host / AWS managed | Apache 2.0 | ✅ RRF + Neural Search | ✅ | ✅ Query DSL | ❌ | ✅ FAISS + HNSW | 16,000 | AWS-native; open-source Elastic alternative |
| Vespa | Self-host / Cloud | Apache 2.0 | ✅ Native | ✅ Tensors / lexical ranking | ✅ YQL | ✅ Tensors | ✅ | Effectively unbounded | Search + ranking + recommendation systems |
| ClickHouse | Self-host / Cloud | Apache 2.0 | Manual | ❌ | ✅ Full SQL | ❌ | ✅ Columnar + HNSW | Varies | Analytics/logs with vector search beside OLAP |
| MongoDB Atlas | Cloud / self-host | SSPL | ✅ Built-in | ❌ | ✅ MQL + aggregation | ❌ | ✅ HNSW | 8,192 | Already on MongoDB; document + vector in one |
| Redis (VSS) | Self-host / Redis Cloud | RSALv2 / SSPL | ✅ (RediSearch) | ✅ | ❌ | ❌ | ❌ RAM-only | 32,768 | Ultra-low latency; cache-layer vector search |
| Marqo | Cloud / self-host | Apache 2.0 | ✅ | ❌ | ❌ | ✅ Native focus | ✅ | Varies | End-to-end multimodal: image + text + video |
A Few Things That Don’t Fit in the Table
Turbopuffer’s multi-tenancy is built around very high namespace counts. Its public positioning and customer stories emphasize workloads like Notion’s large, namespace-heavy corpus. If each user or organization needs isolated vector search, that architecture can change the economics, but still benchmark your own tenant shape.
LanceDB embedded mode is the closest thing to “SQLite for vector search.” It runs in-process, requires no server, and works in Lambda, Cloudflare Workers, and edge environments. The Lance columnar format makes embedded operation practical at real scale.
Chroma is strongest at dev/test and small app deployments. If you are aiming at very large corpora, HA, disk-heavy operation, or first-class hybrid search, evaluate a production-oriented store before promoting the prototype into infrastructure.
Vespa is what you reach for when retrieval is only half the product. It combines lexical retrieval, nearest-neighbor search, tensors, ranking expressions, grouping, and online serving. That power is real, but so is the operational and modeling complexity. It fits search/recommendation teams more than “add semantic search to my CRUD app.”
ClickHouse belongs in the conversation when search is attached to analytics. If your source of truth is events, logs, traces, or metrics, ClickHouse keeps vector distance, filtering, aggregation, and serious full-text indexing in one SQL engine. Not a purpose-built vector database, but often the boring-right answer for analytical retrieval.
Sparse vectors are how you get BM25-quality keyword matching inside a vector index — without running a separate full-text engine. Qdrant and Elasticsearch have especially mature implementations here. If hybrid search is critical and a two-system architecture is a deal-breaker, sparse vector support is what to look for.
Choosing When You’ve Outgrown pgvector
- SaaS product with per-tenant isolation → Turbopuffer
- Complex metadata filtering at scale → Qdrant
- Already on Elastic/ELK stack → Elasticsearch with DiskBBQ
- AWS shop that wants open-source → OpenSearch
- Search/recommendation platform with serious ranking needs → Vespa
- Analytics, observability, log/event search → ClickHouse
- Billion-scale on-prem / self-hosted → Milvus
- Edge / serverless / multimodal → LanceDB
- Small JS app, docs site, or edge-native search UX → Orama
- Zero ops, cost is secondary → Pinecone
- Multimodal-first (images, video, audio) → Marqo
- Already on MongoDB → Atlas Vector Search
- Already on Postgres, need more headroom → Supabase Vector or Neon (both pgvector managed, with better tooling)
The One Thing to Not Do
Don’t use vector search as fuzzy text search for things that have correct answers.
“Find me the user with email dan@example.com” is not a vector search problem. “Find the order with ID ORD-12345” is not either. Embedding ORD-12345 and searching by cosine similarity will return something — but it may be wrong. An identifier has a correct answer. An approximate match on an identifier is a bug.
Vector search returns the most similar thing in your dataset, even when nothing is actually relevant. It doesn’t know when no good answer exists. That’s fine for related documents. It’s a serious problem for exact record lookup, where a confident wrong answer is worse than an empty result.
The same applies in the other direction: don’t use FTS for queries where the user is describing a concept. “articles about making hard decisions under uncertainty” contains no reliable keywords. FTS will either return noise or nothing. Use the right tool for the query shape.
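One way to enforce this at the application layer is to route by query shape before searching. Here is a minimal Python sketch. The patterns and layer names are hypothetical placeholders for your own identifier formats, and anything ambiguous falls through to hybrid search.

```python
import re

# Hypothetical patterns: swap in your own identifier formats
EXACT_PATTERNS = [
    re.compile(r"^ORD-\d+$"),                   # order IDs such as ORD-12345
    re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),  # email addresses
]
KEYWORD_PATTERNS = [
    re.compile(r"^[A-Z][A-Z0-9_]{2,}$"),        # error codes such as ECONNRESET
    re.compile(r"^[A-Za-z_][\w.]*\(\)$"),       # function names such as withRetry()
]


def route(query: str) -> str:
    """Pick a search layer by query shape: exact lookup, keyword (FTS), or hybrid."""
    q = query.strip()
    if any(p.match(q) for p in EXACT_PATTERNS):
        return "exact"    # B-tree lookup; an empty result is a valid answer
    if any(p.match(q) for p in KEYWORD_PATTERNS):
        return "keyword"  # FTS / BM25, where the term must appear
    return "hybrid"       # FTS + vector (+ trigram) fused with RRF


for q in ["ORD-12345", "dan@example.com", "ECONNRESET", "withRetry()",
          "articles about making hard decisions under uncertainty"]:
    print(q, "->", route(q))
```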
The Full Picture
Most production search systems need more than one layer:
- pg_trgm for names, typos, autocomplete
- FTS / pg_search for keyword-based prose search
- pgvector for semantic and conceptual queries
- RRF fusion for surfaces where users mix query types
- Regular indexes for exact identifiers, filters, and sorted lists
These are not competing tools. They’re complementary. A well-built search system picks the right layer for each query shape — and when query shapes overlap, it runs multiple layers and fuses the results.
The teams that ship good search features understand the whole stack. Teams that don’t tend to reach for a vector database, embed everything, and then wonder why exact lookups sometimes return the wrong record.