DanLevy.net

Semantic Vector Search and Other Topics to Win Friends and Lovers

The full search landscape: exact, fuzzy, semantic, hybrid — and when to layer all of them.

Search is not one thing, and semantic search is not a replacement for the rest of it.

“Find user with email dan@example.com” and “find me articles about debugging as a new engineer” are both described as search, but they have almost nothing in common as engineering problems. The first has a correct answer and an O(log n) index lookup. The second has no correct answer — only relevance — and requires understanding language, intent, and meaning.

The engineers who are most persuasive about search decisions — the ones who win the arguments and ship the right system — understand the whole landscape. They know which tool to reach for and why, and they can explain it clearly.

This article covers the semantic layer: what vector search actually does, when it wins, and where it should stay out of the way. The useful version is not “embed everything.” It is knowing when vectors belong beside lexical, fuzzy, and exact-match search in a hybrid architecture.

The lexical and fuzzy half of the picture — tsvector, pg_trgm, pg_search — is in Postgres Text Searching Guide 2026.


Terms at a Glance

Embedding — A dense list of floating-point numbers produced by a model, representing a piece of text (or image, audio, etc.) as a point in high-dimensional space. Semantically related content lands nearby; unrelated content lands far apart.

Lexical search — Search based on exact word and token matching. Fast, deterministic, and correct for known terms. Doesn’t understand synonyms, paraphrases, or cross-language equivalents.

Semantic search — Search based on meaning rather than tokens. A query for “how do I handle timeouts” can match a document titled “configuring retry policies” with no shared words, because their embeddings are geometrically close.

Vector — A list of numbers. In search contexts, the output of an embedding model. “Vector search” finds the vectors closest to a query vector by geometric distance.

FTS (Full-Text Search) — Postgres’s built-in lexical search, powered by tsvector / tsquery. Tokenizes, stems, and indexes text for keyword queries. Strong for prose and exact-term lookup; blind to meaning.

BM25 — A ranking algorithm for lexical search (used by Elasticsearch, Qdrant, and others). Scores results by term frequency weighted against how rare the term is across the corpus. Better than raw keyword matching; still lexical.

HNSW (Hierarchical Navigable Small World) — The standard approximate nearest-neighbor index for vector search. Builds a layered proximity graph for fast, high-recall similarity queries. pgvector, Qdrant, Weaviate, and most others use it.

RRF (Reciprocal Rank Fusion) — An algorithm for merging ranked result lists from multiple retrieval systems. Uses rank position only — no score normalization needed. A result that ranks high in both FTS and vector lists gets a stronger combined score than one that dominates only one.


What Semantic Search Actually Does

Vector embeddings convert text (or images, audio, etc.) into a list of numbers — a point in high-dimensional space. An embedding model is trained so that semantically related text lands nearby in that space. “Dog” and “canine” end up close. “Running a marathon” and “running a Python script” end up far apart despite sharing a word.

Similarity search in that space finds documents whose meaning is closest to the query’s meaning, regardless of exact word overlap.

This means:

Lexical search (tsvector, pg_trgm) can’t do any of this. It operates on words and characters, not meaning. The tools are not interchangeable — they solve different problems.
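To make "geometrically close" concrete, here is a minimal sketch of cosine similarity over toy vectors. The three-dimensional "embeddings" below are invented for illustration; a real embedding model emits hundreds or thousands of dimensions, and the names `TOY_EMBEDDINGS` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-picked so related terms point the same way.
TOY_EMBEDDINGS = {
    "dog": [0.90, 0.10, 0.00],
    "canine": [0.85, 0.15, 0.05],
    "python script": [0.00, 0.20, 0.95],
}


def rank_by_similarity(query: str) -> list[tuple[str, float]]:
    """Rank every other entry by cosine similarity to the query entry, best first."""
    q = TOY_EMBEDDINGS[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in TOY_EMBEDDINGS.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_by_similarity("dog")` puts "canine" first even though the two strings share no characters, which is exactly what a lexical index cannot do.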


When pgvector Wins

Building RAG. Retrieval-Augmented Generation retrieves the document chunks whose meaning is closest to the user’s question, then passes them to a language model as context. This retrieval step is a vector operation. FTS will miss paraphrases, synonyms, and conceptual matches that a relevant chunk might express differently. The pgvector advantage over a standalone vector store: it runs inside your existing Postgres instance — no separate service to deploy, operate, or sync data into.

Users describe what they want, not what to search for. “Articles about building confidence as a new manager” has no keywords that reliably appear in the relevant posts. “A lightweight framework for handling side effects” may not use those exact words in the documentation. Vector search matches the intent, not the spelling.

Finding similar items. Related products, similar support tickets, duplicate bug reports, articles you might also like. “Find issues similar to this one” is a nearest-neighbor search — embed the item, find its geometric neighbors. One important caveat: vector search always returns results, even when nothing is genuinely similar. For dedup and recommendation use cases, filter by a minimum similarity threshold (e.g., cosine similarity ≥ 0.80) to avoid surfacing low-confidence matches as if they were meaningful.

Semantic deduplication. Before indexing content for RAG or search, you often need to identify near-duplicates in the corpus — articles revised multiple times, support tickets filed twice, knowledge base entries that overlap significantly. Embed the documents and threshold-filter by cosine similarity to flag or merge near-duplicates before they pollute your index. This prevents retrieval from returning multiple near-identical chunks and diluting the context window.
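A rough sketch of that threshold filter, assuming the embeddings are already computed and small enough to compare pairwise in memory. The function name, the sample ticket IDs, and the two-dimensional vectors are all hypothetical; the 0.80 default mirrors the example threshold above and should be tuned per corpus.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_near_duplicates(
    embeddings: dict[str, list[float]], threshold: float = 0.80
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    # Pairwise comparison is O(n^2): fine for a sketch, too slow for a large corpus.
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical sample data: two tickets that say nearly the same thing, one that does not.
SAMPLE_TICKETS = {
    "ticket-1": [1.00, 0.00],
    "ticket-2": [0.99, 0.10],
    "ticket-3": [0.00, 1.00],
}
```

For anything beyond a few thousand documents, the same idea is usually expressed as a nearest-neighbor query per document against the vector index, keeping only neighbors above the threshold.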

Multilingual search. Multilingual embedding models map semantically equivalent content across languages into nearby vectors. A query in Spanish for “perder peso” can match an English article on “sustainable weight loss habits” — no shared tokens, same underlying meaning. FTS requires per-language dictionary configuration and handles cross-language queries poorly. pg_trgm is language-agnostic but orthographic, not semantic.

Setting Up pgvector

From extension install to similarity query, the setup is a handful of SQL statements:

CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
-- HNSW is usually the first index to try for moderate-size datasets
CREATE INDEX documents_embedding_idx
ON documents USING hnsw (embedding vector_cosine_ops);
-- Semantic search query
SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

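<=> is cosine distance. 1 - cosine_distance gives cosine similarity (1.0 = identical direction, 0.0 = orthogonal, negative = pointing in opposite directions). For ivfflat (the older, faster-to-build alternative), pgvector’s documentation suggests lists = rows / 1000 for up to about a million rows and lists = sqrt(rows) beyond that as a starting point.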
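On the application side, the query embedding has to reach Postgres in a form the vector type can parse. pgvector accepts a bracketed text literal such as '[0.1,0.2,0.3]', so one option is to format the list yourself and bind it as the $1 parameter. The sketch below assumes that approach; the helper names are hypothetical, and the 1536 default simply matches the column definition above.

```python
def to_pgvector_literal(embedding: list[float], expected_dims: int = 1536) -> str:
    """Format a Python list as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    if len(embedding) != expected_dims:
        # A mismatch would otherwise surface as a Postgres error at query time.
        raise ValueError(f"expected {expected_dims} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def distance_to_similarity(cosine_distance: float) -> float:
    """Convert the value returned by <=> into cosine similarity."""
    return 1.0 - cosine_distance
```

Some drivers and ORMs ship their own pgvector adapters that make the manual formatting unnecessary; check what your stack provides before hand-rolling it.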

What pgvector Doesn’t Handle Well


Hybrid Search: The Case for Both

Technical documentation is the clearest example where neither tool is enough alone.

Users searching for “how to configure timeouts” need conceptual matching: an article titled “Setting retry policies and connection limits” has no overlapping keywords but is exactly what they need.

The same users also search for withRetry(), ECONNRESET, and ERR_SOCKET_TIMEOUT. These exact strings must appear — semantic matching may not find them reliably, and a false positive (conceptually similar but not the right API) is actively misleading.

Vector search handles the conceptual queries. FTS handles the exact terms. Neither handles both well alone.

The solution is hybrid search: run both and fuse the results.

Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) is the standard algorithm for combining ranked lists from different retrieval systems. It doesn’t require normalizing scores across systems — it only uses rank positions. A result that appears high in both lists gets a stronger combined score than one that dominates only one.

WITH fts_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
  -- websearch_to_tsquery accepts raw user input; to_tsquery expects operator syntax
  FROM documents, websearch_to_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank  -- without this, LIMIT keeps an arbitrary 50 matches
  LIMIT 50
),
vector_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT
    COALESCE(f.id, v.id) AS id,
    COALESCE(1.0 / (60 + f.rank), 0) +
    COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM fts_results f
  FULL OUTER JOIN vector_results v ON f.id = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;

The 60 in the denominator is the RRF constant. Higher values dampen rank-position differences; lower values amplify them. The default of 60 works well across most content types.

RRF avoids the harder problem of normalizing ts_rank (a log-frequency score) against cosine distance (a geometric measure). They’re not comparable. RRF only asks: “how high did this result appear in each list?”
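The same fusion is easy to reason about outside SQL. This is a minimal sketch of RRF in Python, assuming each retriever hands back a list of document IDs ordered best first; the function name and the sample IDs are hypothetical. It accepts any number of lists, so it covers two-way and three-way fusion alike.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based, matching ROW_NUMBER() in the SQL version.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a lexical and a vector retriever.
fts_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([fts_hits, vector_hits])
```

Here "b" edges out "a" because it ranks second and first, versus first and third; "c" and "d" trail because each appears in only one list.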

Hybrid Search with Trigrams Too

For user-facing search over mixed content — where users might search for a person name, a concept, or an exact term in the same session — three-way fusion handles all of them:

WITH trgm_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY similarity(title, $1) DESC) AS rank
  FROM documents
  WHERE title % $1
  ORDER BY rank  -- without this, LIMIT keeps an arbitrary 50 matches
  LIMIT 50
),
fts_results AS (
  -- websearch_to_tsquery accepts raw user input; to_tsquery expects operator syntax
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $1)) DESC) AS rank
  FROM documents
  WHERE search_vector @@ websearch_to_tsquery('english', $1)
  ORDER BY rank
  LIMIT 50
),
vector_results AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT
    COALESCE(t.id, f.id, v.id) AS id,
    COALESCE(1.0 / (60 + t.rank), 0) +
    COALESCE(1.0 / (60 + f.rank), 0) +
    COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM trgm_results t
  FULL OUTER JOIN fts_results f ON t.id = f.id
  FULL OUTER JOIN vector_results v ON COALESCE(t.id, f.id) = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;

This handles: fuzzy name matches (trigrams), exact keyword matches (FTS), and conceptual queries (vector). A single search box can serve all three user intents.


Multi-Layer Hybrid Architectures

Real applications rarely have a single search surface. They have multiple, each with a different need:

| Surface | What users query | Recommended layers |
| --- | --- | --- |
| Blog / documentation search | Keywords + concepts | FTS + pgvector (RRF) |
| User/customer name lookup | Names with typos | pg_trgm |
| Product search | Names, descriptions, “similar to” | pg_trgm + FTS + pgvector |
| Support ticket dedup | “Issues similar to this one” | pgvector only |
| Internal SKU/order search | Exact identifiers | B-tree index |
| RAG over large knowledge base | Natural language questions | pgvector (chunked docs) |
| E-commerce “you might also like” | Behavioral + semantic similarity | pgvector |
| Autocomplete | Prefix, spelling-tolerant | pg_trgm |

These aren’t hypothetical. Most content-heavy applications need at least two distinct search surfaces with different query shapes. The temptation is to pick one approach and use it everywhere — usually vector search now, since it’s the fashionable choice. That leads to expensive embeddings for problems where a trigram index would have been faster, cheaper, and more correct.

The Rule of Thumb

Add a layer when a failure mode appears that the current layer can’t fix:


If You Do Need a Dedicated Vector Store

pgvector handles a lot of application search before you need another database. The rough cutoff depends on vector count, index settings, write rate, filters, hardware, and concurrency, so treat any “under 10M vectors” rule as a starting assumption to benchmark, not a product limit. When you genuinely outgrow it — very high concurrency, very low p99 latency requirements, billions of vectors, or serious multi-tenant isolation needs — the dedicated vector database landscape is wide and worth understanding.

What the Matrix Columns Actually Mean

Hybrid search means BM25 keyword search and vector similarity run in one query, merged via RRF. Without it, you either pick one search mode or fuse two queries yourself.

Sparse vectors go further than BM25. A SPLADE sparse vector has ~30,000 dimensions (one per vocabulary term), ~98% zeros. Non-zero positions tell you which terms matter and how much. A query for “dogs” also weights “canine” and “pet” — BM25-level precision plus term expansion inside a vector index. If this column is false, you need a separate FTS layer for exact-term queries.

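# SPLADE: ~30,000 dims, nearly all zero; only relevant vocabulary positions fire
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "naver/splade-cocondenser-ensembledistil"  # example checkpoint (an assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

def encode_splade(text: str) -> dict:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        output = model(**tokens)
    vec = torch.log1p(torch.relu(output.logits)).max(dim=1).values.squeeze(0)
    idx = vec.nonzero().squeeze(1)  # stays a 1-D list even with a single non-zero term
    return {"indices": idx.tolist(), "values": vec[idx].tolist()}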
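Scoring with sparse vectors is a dot product over the positions both sides share. The sketch below assumes the {"indices", "values"} shape returned by encode_splade above; the function name, vocabulary IDs, and weights are made up for illustration.

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse vectors stored as parallel indices/values lists."""
    doc_weights = dict(zip(doc["indices"], doc["values"]))
    # Only vocabulary positions present in both vectors contribute to the score.
    return sum(
        weight * doc_weights.get(index, 0.0)
        for index, weight in zip(query["indices"], query["values"])
    )


# Hypothetical vocabulary positions: 7 and 42 are shared, 3 and 99 are not.
query_vec = {"indices": [3, 7, 42], "values": [1.0, 0.5, 2.0]}
doc_vec = {"indices": [7, 42, 99], "values": [2.0, 0.25, 1.0]}
score = sparse_dot(query_vec, doc_vec)
```

Term expansion shows up here as extra non-zero positions: a document about "canines" can still score against a "dogs" query if the model gave both vectors weight at the same vocabulary index.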

SQL / SQL-like is really about filtering. Vector search without filtering is a demo. You still need tenant scope, date ranges, permissions, and category filters. Full SQL (pgvector, LanceDB) expresses this beside your existing joins. Purpose-built databases use JSON filter objects (Qdrant, Pinecone), a query DSL (Elasticsearch, Milvus), or GraphQL (Weaviate). They work; SQL becomes more attractive as filter logic gets complex.

-- pgvector: vector similarity is just another expression
SELECT id, title, 1 - (embedding <=> $1) AS score
FROM documents
WHERE tenant_id = $2
  AND category = ANY($3::text[])
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1
LIMIT 10;

# Qdrant: equivalent filter as a Python object; same result, more ceremony
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=models.Filter(must=[
        models.FieldCondition(key="tenant_id", match=models.MatchValue(value=tenant_id)),
        models.FieldCondition(key="category", match=models.MatchAny(any=categories)),
        models.FieldCondition(key="created_at", range=models.DatetimeRange(gte=cutoff)),
    ]),
    limit=10,
)

Multimodal native means the database ships embedding models for non-text content. You hand it a raw image URL; it handles vectorization. Most databases are embedding-agnostic — you own the embedding pipeline. Marqo and Weaviate (via CLIP/ImageBind modules) close this loop.

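# Marqo: POST raw images, query with text; no external embedding step
# Assumes a local Marqo server and a "products" index created with a CLIP model and image URLs enabled
import marqo

mq = marqo.Client(url="http://localhost:8882")
mq.index("products").add_documents(
    [{"_id": "shoe-001", "image": "https://cdn.example.com/shoes/001.jpg"}],  # Marqo's document key is _id
    tensor_fields=["image"],
)
results = mq.index("products").search(q="lightweight shoes for summer")
# Returns shoe-001 despite zero keyword overlap; CLIP handles the cross-modal match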

Disk-based index is a cost lever. RAM-resident HNSW indexes can require several GB of RAM per million 1536-dimension vectors once raw vectors, graph overhead, and metadata are counted. Disk-native alternatives (Milvus DiskANN, Elasticsearch DiskBBQ, LanceDB’s Lance format, Turbopuffer’s object storage tier) often trade some query latency for lower infrastructure cost. For RAG workloads where model latency already dominates, that tradeoff is frequently worth benchmarking.
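A back-of-envelope version of that RAM figure, as a sketch only. It assumes float32 vectors and counts just the raw vectors plus layer-0 graph links; the defaults for m and bytes_per_link are illustrative assumptions, and real engines add upper graph layers, metadata, and allocator overhead on top.

```python
def hnsw_ram_estimate_gb(
    num_vectors: int,
    dims: int,
    bytes_per_float: int = 4,   # float32
    m: int = 16,                # assumed HNSW connectivity parameter
    bytes_per_link: int = 8,    # assumed size of one neighbor reference
) -> dict:
    """Rough lower bound on RAM for a memory-resident HNSW index, in GB (10^9 bytes)."""
    raw_bytes = num_vectors * dims * bytes_per_float
    # Layer 0 typically allows up to 2 * m links per vector; upper layers are ignored here.
    graph_bytes = num_vectors * 2 * m * bytes_per_link
    return {
        "raw_gb": raw_bytes / 1e9,
        "graph_gb": graph_bytes / 1e9,
        "total_gb": (raw_bytes + graph_bytes) / 1e9,
    }


estimate = hnsw_ram_estimate_gb(num_vectors=1_000_000, dims=1536)
```

Under these assumptions, one million 1536-dimension vectors need about 6.1 GB for the raw floats alone and roughly 6.4 GB with layer-0 links, which is why halving precision or moving to a disk-native index changes the bill so much.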

Max dimensions is a migration hiding in your architecture. text-embedding-3-large uses 3072 dims, and research models keep pushing higher. Some managed services publish hard dimension caps; others document high caps or no practical cap for typical embedding models. Check current docs before committing. Pick something with headroom; migrating a vector index because you hit a dimension ceiling is a painful sprint.

The Landscape

DatabaseDeploymentLicenseHybrid SearchSparse VectorsSQL / SQL-likeMultimodalDisk IndexMax DimsSweet Spot
pgvectorSelf-host / managed (Supabase, Neon, RDS)OSS (PostgreSQL)Manual (RRF via SQL)✅ Full SQL✅ HNSW on disk16,000 storage; 2,000 indexed vectorAlready on Postgres; moderate vector counts
QdrantSelf-host / CloudApache 2.0✅ Native BM25✅ Mature support❌ (REST/gRPC)65,535Filtered queries at scale; complex metadata
WeaviateSelf-host / CloudBSD 3✅ Native BM25 + RRF❌ (GraphQL / gRPC)✅ via modules65,535GraphQL access patterns; built-in vectorization
PineconeCloud onlyProprietary✅ (added 2024)✅ (serverless)20,000Managed simplicity; no ops team
Milvus / ZillizSelf-host / Cloud (Zilliz)Apache 2.0✅ Native✅ SQL-like (Milvus Query Language)✅ DiskANN32,768Billion-scale; enterprise on-prem
ChromaEmbedded / self-hostApache 2.065,535Local dev and prototyping only
LanceDBEmbedded / CloudApache 2.0✅ SQL via DataFusion✅ Native✅ (Lance format)UnlimitedEdge / serverless; multimodal lakehouse
OramaEmbedded / CloudApache 2.0✅ Full-text + vectorVariesJS/edge apps; lightweight site/app search
TurbopufferCloud only (serverless)Proprietary✅ BM25 + vector✅ (object storage)16,000Multi-tenant SaaS; millions of namespaces
ElasticsearchSelf-host / Elastic CloudSSPL / AGPLv3✅ RRF + ELSER sparse✅ (ELSER)✅ Query DSL✅ DiskBBQ4,096Already on Elastic stack; hybrid enterprise search
OpenSearchSelf-host / AWS managedApache 2.0✅ RRF + Neural Search✅ Query DSL✅ FAISS + HNSW16,000AWS-native; open-source Elastic alternative
VespaSelf-host / CloudApache 2.0✅ Native✅ Tensors / lexical ranking✅ YQL✅ TensorsEffectively unboundedSearch + ranking + recommendation systems
ClickHouseSelf-host / CloudApache 2.0Manual✅ Full SQL✅ Columnar + HNSWVariesAnalytics/logs with vector search beside OLAP
MongoDB AtlasCloud / self-hostSSPL✅ Built-in✅ MQL + aggregation✅ HNSW8,192Already on MongoDB; document + vector in one
Redis (VSS)Self-host / Redis CloudRSALv2 / SSPL✅ (RediSearch)❌ RAM-only32,768Ultra-low latency; cache-layer vector search
MarqoCloud / self-hostApache 2.0✅ Native focusVariesEnd-to-end multimodal: image + text + video

A Few Things That Don’t Fit in the Table

Turbopuffer’s multi-tenancy is built around very high namespace counts. Its public positioning and customer stories emphasize workloads like Notion’s large, namespace-heavy corpus. If each user or organization needs isolated vector search, that architecture can change the economics, but still benchmark your own tenant shape.

LanceDB embedded mode is the closest thing to “SQLite for vector search.” It runs in-process, requires no server, and works in Lambda, Cloudflare Workers, and edge environments. The Lance columnar format makes embedded operation practical at real scale.

Chroma is strongest at dev/test and small app deployments. If you are aiming at very large corpora, HA, disk-heavy operation, or first-class hybrid search, evaluate a production-oriented store before promoting the prototype into infrastructure.

Vespa is what you reach for when retrieval is only half the product. It combines lexical retrieval, nearest-neighbor search, tensors, ranking expressions, grouping, and online serving. That power is real, but so is the operational and modeling complexity. It fits search/recommendation teams more than “add semantic search to my CRUD app.”

ClickHouse belongs in the conversation when search is attached to analytics. If your source of truth is events, logs, traces, or metrics, ClickHouse keeps vector distance, filtering, aggregation, and serious full-text indexing in one SQL engine. Not a purpose-built vector database, but often the boring-right answer for analytical retrieval.

Sparse vectors are how you get BM25-quality keyword matching inside a vector index — without running a separate full-text engine. Qdrant and Elasticsearch have especially mature implementations here. If hybrid search is critical and a two-system architecture is a deal-breaker, sparse vector support is what to look for.

Choosing When You’ve Outgrown pgvector


The One Thing to Not Do

Don’t use vector search as fuzzy text search for things that have correct answers.

“Find me the user with email dan@example.com” is not a vector search problem. “Find the order with ID ORD-12345” is not either. Embedding ORD-12345 and searching by cosine similarity will return something — but it may be wrong. An identifier has a correct answer. An approximate match on an identifier is a bug.

Vector search returns the most similar thing in your dataset, even when nothing is actually relevant. It doesn’t know when no good answer exists. That’s fine for related documents. It’s a serious problem for exact record lookup, where a confident wrong answer is worse than an empty result.

The same applies in the other direction: don’t use FTS for queries where the user is describing a concept. “articles about making hard decisions under uncertainty” contains no reliable keywords. FTS will either return noise or nothing. Use the right tool for the query shape.
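One way to enforce this in code is to route by query shape before any retrieval runs. The sketch below is a deliberately crude heuristic, not a recommended classifier: the regular expressions, the ORD-style ID format, and the layer names are all assumptions for illustration.

```python
import re

# Hypothetical patterns for queries that have exactly one correct answer.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ORDER_ID_RE = re.compile(r"^[A-Z]{2,5}-\d+$")  # e.g. ORD-12345


def choose_search_layer(query: str) -> str:
    """Return "exact" for identifier-shaped queries, "hybrid" for everything else."""
    q = query.strip()
    if EMAIL_RE.match(q) or ORDER_ID_RE.match(q):
        # Identifiers go to an indexed equality lookup; an empty result is a valid answer.
        return "exact"
    # Free-text queries go to lexical + vector retrieval with fused rankings.
    return "hybrid"
```

The important property is the asymmetry: an identifier that matches nothing returns nothing, instead of falling through to a nearest-neighbor guess.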


The Full Picture

Most production search systems need more than one layer:

These are not competing tools. They’re complementary. A well-built search system picks the right layer for each query shape — and when query shapes overlap, it runs multiple layers and fuses the results.

The teams that ship good search features understand the whole stack. The teams that don’t understand it tend to reach for a vector database, embed everything, and then wonder why exact lookups sometimes return the wrong record.