Semantic Vector Search and Other Topics to Win Friends and Lovers

The full search landscape: exact, fuzzy, semantic, hybrid — and when to layer all of them.

Search is not one thing, and semantic search is not a replacement for the rest of it.

“Find user with email dan@example.com” and “find me articles about debugging as a new engineer” are both described as search, but they have almost nothing in common as engineering problems. The first has a correct answer and an O(log n) index lookup. The second has no correct answer — only relevance — and requires understanding language, intent, and meaning.

The engineers who are most persuasive about search decisions — the ones who win the arguments and ship the right system — understand the whole landscape. They know which tool to reach for and why, and they can explain it clearly.

This article covers the semantic layer: what vector search actually does, when it wins, and where it should stay out of the way. The useful version is not “embed everything.” It is knowing when vectors belong beside lexical, fuzzy, and exact-match search in a hybrid architecture.

The lexical and fuzzy half of the picture — tsvector, pg_trgm, pg_search — is in Postgres Text Searching Guide 2026.

Terms at a Glance

Embedding — A dense list of floating-point numbers produced by a model, representing a piece of text (or image, audio, etc.) as a point in high-dimensional space. Semantically related content lands nearby; unrelated content lands far apart.

Lexical search — Search based on exact word and token matching. Fast, deterministic, and correct for known terms. Doesn’t understand synonyms, paraphrases, or cross-language equivalents.

Semantic search — Search based on meaning rather than tokens. A query for “how do I handle timeouts” can match a document titled “configuring retry policies” with no shared words, because their embeddings are geometrically close.

Vector — A list of numbers. In search contexts, the output of an embedding model. “Vector search” finds the vectors closest to a query vector by geometric distance.

FTS (Full-Text Search) — Postgres’s built-in lexical search, powered by tsvector / tsquery. Tokenizes, stems, and indexes text for keyword queries. Strong for prose and exact-term lookup; blind to meaning.

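BM25 — A ranking algorithm for lexical search (the default in Elasticsearch and OpenSearch; Qdrant supports it through sparse vectors). Scores results by term frequency, with diminishing returns for repeated terms and an adjustment for document length, weighted against how rare the term is across the corpus. Better than raw keyword matching; still lexical.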

HNSW (Hierarchical Navigable Small World) — The standard approximate nearest-neighbor index for vector search. Builds a layered proximity graph for fast, high-recall similarity queries. pgvector, Qdrant, Weaviate, and most others use it.

RRF (Reciprocal Rank Fusion) — An algorithm for merging ranked result lists from multiple retrieval systems. Uses rank position only — no score normalization needed. A result that ranks high in both FTS and vector lists gets a stronger combined score than one that dominates only one.

What Semantic Search Actually Does

Vector embeddings convert text (or images, audio, etc.) into a list of numbers — a point in high-dimensional space. An embedding model is trained so that semantically related text lands nearby in that space. “Dog” and “canine” end up close. “Running a marathon” and “running a Python script” end up far apart despite sharing a word.

Similarity search in that space finds documents whose meaning is closest to the query’s meaning, regardless of exact word overlap.
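Without an index, nearest-neighbor search just scores every document against the query and keeps the best. The sketch below does exactly that in plain Python. The 3-dimensional vectors are hand-written stand-ins for real embeddings, which have hundreds or thousands of dimensions; the axes, ids, and values are illustrative assumptions, not model output. An HNSW index approximates this same ranking without scanning every row.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query, docs, k=2):
    """Exact (brute-force) k-nearest neighbors: score every document, keep the top k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-D "embeddings" with hypothetical axes meaning roughly (animal, sport, programming)
docs = {
    "dog": [0.90, 0.10, 0.00],
    "canine": [0.85, 0.15, 0.05],
    "running a marathon": [0.05, 0.95, 0.10],
    "running a Python script": [0.00, 0.20, 0.95],
}

# A query vector standing in for something like "puppy"
print(nearest([0.88, 0.12, 0.02], docs))
```

Note that `nearest` always returns k results, whatever the query. Nothing in the math says "no good match exists," which is why the thresholds discussed later matter.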

This means:

“How do I configure request timeouts?” can match an article titled “Setting connection limits and retry policies” — no overlapping keywords, high conceptual relevance
“Something light for a summer evening” can match a wine recommendation without any keywords appearing in the product description
A query in English can match relevant documents in French, Spanish, or Japanese if the embedding model was trained multilingually

Lexical search (tsvector, pg_trgm) can’t do any of this. It operates on words and characters, not meaning. The tools are not interchangeable — they solve different problems.

When pgvector Wins

Building RAG. Retrieval-Augmented Generation retrieves the document chunks whose meaning is closest to the user’s question, then passes them to a language model as context. This retrieval step is a vector operation. FTS will miss paraphrases, synonyms, and conceptual matches that a relevant chunk might express differently. The pgvector advantage over a standalone vector store: it runs inside your existing Postgres instance — no separate service to deploy, operate, or sync data into.
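RAG retrieval runs over chunks, not whole documents. The article doesn't prescribe a chunking strategy, so this sketch assumes the simplest common one: fixed-size word windows with overlap, so text near a boundary lands in two neighboring chunks instead of being cut off in both. The function name and the 200/40 defaults are hypothetical placeholders to tune, and word counts are only a rough proxy for model tokens.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end; avoid a tail-only chunk
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
```

Each chunk is then embedded and stored as its own row, so the vector query returns passages small enough to fit several into the model's context.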

Users describe what they want, not what to search for. “Articles about building confidence as a new manager” has no keywords that reliably appear in the relevant posts. “A lightweight framework for handling side effects” may not use those exact words in the documentation. Vector search matches the intent, not the spelling.

Finding similar items. Related products, similar support tickets, duplicate bug reports, articles you might also like. “Find issues similar to this one” is a nearest-neighbor search: embed the item, find its geometric neighbors. One important caveat: vector search always returns results, even when nothing is genuinely similar. For dedup and recommendation use cases, filter by a minimum similarity threshold (e.g., cosine similarity ≥ 0.80) to avoid surfacing low-confidence matches as if they were meaningful. The right cutoff depends on the embedding model, since different models spread similarities over different ranges, so calibrate it on a sample of pairs you have labeled by hand.
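Applying the threshold is a one-line filter over the neighbors the database already returned. This sketch assumes rows arrive as (id, cosine_distance) pairs, the shape you get from ordering by embedding <=> $1 in pgvector, where <=> is cosine distance and similarity is 1 minus that. The function name and ticket ids are hypothetical; 0.80 is the example threshold from the paragraph above, not a universal constant.

```python
def confident_matches(rows, min_similarity=0.80):
    """Keep neighbors whose cosine similarity clears the threshold.

    rows: (id, cosine_distance) pairs, e.g. from ORDER BY embedding <=> $1 LIMIT k.
    pgvector's <=> returns cosine distance, so similarity = 1 - distance.
    """
    return [(row_id, 1 - distance) for row_id, distance in rows if 1 - distance >= min_similarity]

rows = [("TICKET-1", 0.05), ("TICKET-2", 0.18), ("TICKET-3", 0.40)]
print(confident_matches(rows))
```

An empty list is the useful outcome here: it lets the UI say "no similar tickets" instead of presenting the least-bad match as a real one.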

Semantic deduplication. Before indexing content for RAG or search, you often need to identify near-duplicates in the corpus — articles revised multiple times, support tickets filed twice, knowledge base entries that overlap significantly. Embed the documents and threshold-filter by cosine similarity to flag or merge near-duplicates before they pollute your index. This prevents retrieval from returning multiple near-identical chunks and diluting the context window.
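A minimal sketch of threshold-based near-duplicate detection, assuming embeddings are already computed and held in memory. It makes one greedy pass: each document joins the first group whose representative (the group's first member) is at least the threshold similar, otherwise it starts a new group. The 0.95 threshold, the function names, and the document ids are hypothetical placeholders. This compares every document against every group, which is fine for a sketch; at corpus scale you would use the HNSW index to fetch each document's neighbors instead.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def near_duplicate_groups(docs, threshold=0.95):
    """Group documents whose embeddings are within `threshold` cosine similarity of a
    group's first member. Returns only groups with 2+ members: the duplicates to review."""
    groups = []  # each entry: (representative_vector, member_ids)
    for doc_id, vec in docs.items():
        for rep_vec, members in groups:
            if cosine(vec, rep_vec) >= threshold:
                members.append(doc_id)
                break
        else:  # no existing group was close enough
            groups.append((vec, [doc_id]))
    return [members for _, members in groups if len(members) > 1]

corpus = {
    "kb-101": [1.0, 0.0, 0.0],
    "kb-101-rev2": [0.99, 0.05, 0.0],  # a lightly revised copy
    "kb-202": [0.0, 1.0, 0.0],
}
print(near_duplicate_groups(corpus))
```

Whether to keep the newest member, the longest, or merge them is a content decision; the sketch only finds the candidates.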

Multilingual search. Multilingual embedding models map semantically equivalent content across languages into nearby vectors. A query in Spanish for “perder peso” can match an English article on “sustainable weight loss habits” — no shared tokens, same underlying meaning. FTS requires per-language dictionary configuration and handles cross-language queries poorly. pg_trgm is language-agnostic but orthographic, not semantic.

Setting Up pgvector

From extension install to similarity query, the setup is a handful of SQL statements:

CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- HNSW is usually the first index to try for moderate-size datasets
CREATE INDEX documents_embedding_idx
  ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search query
SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

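<=> is cosine distance. 1 - cosine distance gives cosine similarity (1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite). ivfflat is the older alternative: faster to build and lighter on memory than HNSW, but with a worse speed/recall tradeoff. Create it after the table has data, and start with lists = rows / 1000 for up to 1M rows and lists = sqrt(rows) beyond that, as the pgvector docs suggest.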
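The pgvector README's starting heuristics for ivfflat are lists = rows / 1000 up to 1M rows, lists = sqrt(rows) above that, and ivfflat.probes = sqrt(lists) at query time (probes defaults to 1). This sketch just encodes those rules; rounding down with integer square roots is my choice, and the results are starting points to tune against measured recall, not tuned values.

```python
import math

def ivfflat_starting_params(row_count: int) -> tuple[int, int]:
    """Starting (lists, probes) for a pgvector ivfflat index, per the README heuristics."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)  # rows / 1000 for up to 1M rows
    else:
        lists = math.isqrt(row_count)      # sqrt(rows) above 1M rows
    probes = max(1, math.isqrt(lists))     # sqrt(lists) probes per query
    return lists, probes

lists, probes = ivfflat_starting_params(500_000)
print(f"CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = {lists});")
print(f"SET ivfflat.probes = {probes};")
```

More probes means more lists scanned per query: higher recall, higher latency.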

What pgvector Doesn’t Handle Well

Exact token matching — product SKUs, error codes, function names. ORD-12345 is not semantically similar to anything. An embedding-based search may return ORD-12344 or nothing relevant. Use FTS or a B-tree index.
Names and proper nouns. Embedding space organizes by meaning, not spelling. A user record for “Micheal Jordan” (misspelled) doesn’t necessarily land near a query for “Michael Jordan” in vector space.
Short strings where character-level similarity matters more than meaning. pg_trgm handles this.
Queries where the exact term must appear. BM25 and FTS are more reliable for known-term matching.
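pg_trgm's notion of similarity is purely about spelling, which is why it catches the “Micheal”/“Michael” case that embeddings may miss. This sketch approximates it in Python: lowercase the text, split on non-alphanumerics, pad each word with two leading spaces and one trailing space, collect distinct 3-character windows, then divide shared trigrams by total distinct trigrams. It is a simplification: it only treats ASCII letters and digits as word characters, whereas Postgres handles full Unicode. The % operator matches when this score clears pg_trgm.similarity_threshold, which defaults to 0.3.

```python
import re

def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (ASCII-only simplification)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after, as pg_trgm does
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams / total distinct trigrams, like pg_trgm's similarity()."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(similarity("Micheal Jordan", "Michael Jordan"))
print(similarity("Micheal Jordan", "Serena Williams"))
```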

Hybrid Search: The Case for Both

Technical documentation is the clearest example where neither tool is enough alone.

Users searching for “how to configure timeouts” need conceptual matching: an article titled “Setting retry policies and connection limits” has no overlapping keywords but is exactly what they need.

The same users also search for withRetry(), ECONNRESET, and ERR_SOCKET_TIMEOUT. These exact strings must appear — semantic matching may not find them reliably, and a false positive (conceptually similar but not the right API) is actively misleading.

Vector search handles the conceptual queries. FTS handles the exact terms. Neither handles both well alone.

The solution is hybrid search: run both and fuse the results.

Reciprocal Rank Fusion

Reciprocal Rank Fusion (RRF) is the standard algorithm for combining ranked lists from different retrieval systems. It doesn’t require normalizing scores across systems — it only uses rank positions. A result that appears high in both lists gets a stronger combined score than one that dominates only one.

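WITH fts_results AS (
  SELECT id,
    ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
  -- websearch_to_tsquery accepts raw user input; to_tsquery raises syntax errors on plain phrases
  FROM documents, websearch_to_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank  -- without this, LIMIT keeps an arbitrary 50 matches, not the top 50
  LIMIT 50
),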
vector_results AS (
  SELECT id,
    ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT
    COALESCE(f.id, v.id) AS id,
    COALESCE(1.0 / (60 + f.rank), 0) +
    COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM fts_results f
  FULL OUTER JOIN vector_results v ON f.id = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;

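The 60 in the denominator is the RRF constant, usually called k. Higher values dampen rank-position differences; lower values amplify them. 60 is the value used in the paper that introduced RRF and remains the common default; it is a reasonable starting point, but tune it against your own relevance judgments if you have them.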

RRF avoids the harder problem of normalizing ts_rank (a frequency-based score with no fixed range) against cosine distance (a geometric measure). They’re not comparable. RRF only asks: “how high did this result appear in each list?”
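The same fusion works in application code, which is what you need when the two ranked lists come from different systems (say, Postgres FTS and a dedicated vector store) and a single SQL CTE can't join them. A minimal sketch; the function name and document ids are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60, limit=10):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank), ranks starting at 1.
    A document missing from a list gets nothing from it (the COALESCE(..., 0) in the SQL)."""
    scores = defaultdict(float)
    for ranked_ids in ranked_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]

fts_ids = ["a", "b", "c"]     # ids from a keyword search, best first
vector_ids = ["b", "d", "a"]  # ids from a vector search, best first
print(rrf_fuse([fts_ids, vector_ids]))
```

Here "b" wins because it ranks well in both lists, and "a" (first and third) beats "d", which appears only once. Only ranks enter the formula, so ts_rank values and cosine distances never have to be put on a common scale.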

Hybrid Search with Trigrams Too

For user-facing search over mixed content, where users might search for a person’s name, a concept, or an exact term in the same session, three-way fusion handles all of them:

WITH trgm_results AS (
  SELECT id,
    ROW_NUMBER() OVER (ORDER BY similarity(title, $1) DESC) AS rank
  FROM documents
  WHERE title % $1
  ORDER BY rank  -- keep the top 50, not an arbitrary 50
  LIMIT 50
),
fts_results AS (
  SELECT id,
    ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
  FROM documents, websearch_to_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank
  LIMIT 50
),
vector_results AS (
  SELECT id,
    ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $2::vector
  LIMIT 50
),
rrf AS (
  SELECT
    COALESCE(t.id, f.id, v.id) AS id,
    COALESCE(1.0 / (60 + t.rank), 0) +
    COALESCE(1.0 / (60 + f.rank), 0) +
    COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
  FROM trgm_results t
  FULL OUTER JOIN fts_results f ON t.id = f.id
  FULL OUTER JOIN vector_results v ON COALESCE(t.id, f.id) = v.id
)
SELECT d.id, d.title, rrf.rrf_score
FROM rrf
JOIN documents d ON d.id = rrf.id
ORDER BY rrf_score DESC
LIMIT 10;

This handles: fuzzy name matches (trigrams), exact keyword matches (FTS), and conceptual queries (vector). A single search box can serve all three user intents.

Multi-Layer Hybrid Architectures

Real applications rarely have a single search surface. They have multiple, each with a different need:

Surface	What users query	Recommended layers
Blog / documentation search	Keywords + concepts	FTS + pgvector (RRF)
User/customer name lookup	Names with typos	`pg_trgm`
Product search	Names, descriptions, “similar to”	`pg_trgm` + FTS + pgvector
Support ticket dedup	“Issues similar to this one”	pgvector only
Internal SKU/order search	Exact identifiers	B-tree index
RAG over large knowledge base	Natural language questions	pgvector (chunked docs)
E-commerce “you might also like”	Behavioral + semantic similarity	pgvector
Autocomplete	Prefix, spelling-tolerant	`pg_trgm`

These aren’t hypothetical. Most content-heavy applications need at least two distinct search surfaces with different query shapes. The temptation is to pick one approach and use it everywhere — usually vector search now, since it’s the fashionable choice. That leads to expensive embeddings for problems where a trigram index would have been faster, cheaper, and more correct.

The Rule of Thumb

Add a layer when a failure mode appears that the current layer can’t fix:

Users complain about typos not matching → add pg_trgm
Users search by concept and miss relevant results → add pgvector
Users search for exact symbols or codes and get conceptual results instead → add FTS or check if you’re over-relying on vector search
Latency becomes a problem → evaluate pre-filtering, approximate indexes, or a dedicated store

If You Do Need a Dedicated Vector Store

pgvector handles a lot of application search before you need another database. The rough cutoff depends on vector count, index settings, write rate, filters, hardware, and concurrency, so treat any “under 10M vectors” rule as a starting assumption to benchmark, not a product limit. When you genuinely outgrow it — very high concurrency, very low p99 latency requirements, billions of vectors, or serious multi-tenant isolation needs — the dedicated vector database landscape is wide and worth understanding.

What the Matrix Columns Actually Mean

Hybrid search means BM25 keyword search and vector similarity run in one query, with the results merged by RRF or a weighted score fusion. Without it, you either pick one search mode or fuse two queries yourself.

Sparse vectors go further than BM25. A SPLADE sparse vector has ~30,000 dimensions (one per vocabulary term), and nearly all of them are zero. Non-zero positions tell you which terms matter and how much. Because the model learns term expansion, a query for “dogs” can also put weight on related terms such as “canine” and “pet”: BM25-like exact-term matching plus expansion, inside a vector index. If a database has neither sparse vectors nor native hybrid search, you need a separate FTS layer for exact-term queries.

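# SPLADE: ~30,000 dims (BERT vocabulary), almost all zero; only relevant vocabulary positions fire
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("naver/splade-cocondenser-ensembledistil")  # one public SPLADE checkpoint
model = AutoModelForMaskedLM.from_pretrained("naver/splade-cocondenser-ensembledistil").eval()

def encode_splade(text: str) -> dict:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**tokens).logits  # [1, seq_len, vocab_size]
    # log(1 + ReLU(logit)) per token, padding masked out, then max-pooled over tokens -> [vocab_size]
    vec = (torch.log1p(torch.relu(logits)) * tokens["attention_mask"].unsqueeze(-1)).max(dim=1).values.squeeze(0)
    idx = torch.nonzero(vec, as_tuple=True)[0]  # stays a list even with a single non-zero term
    return {"indices": idx.tolist(), "values": vec[idx].tolist()}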

SQL / SQL-like is really about filtering. Vector search without filtering is a demo. You still need tenant scope, date ranges, permissions, and category filters. Full SQL (pgvector, LanceDB) expresses this beside your existing joins. Purpose-built databases use JSON filter objects (Qdrant, Pinecone), a query DSL or filter-expression language (Elasticsearch, Milvus), or GraphQL (Weaviate). They work; SQL becomes more attractive as filter logic gets complex.

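-- pgvector: vector similarity is just another expression
SELECT id, title, 1 - (embedding <=> $1::vector) AS score
FROM documents
WHERE tenant_id = $2
  AND category = ANY($3::text[])
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- With an HNSW index, selective filters are applied to the index's candidates and can leave
-- fewer than 10 rows; pgvector 0.8+ can keep scanning: SET hnsw.iterative_scan = relaxed_order;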

# Qdrant: equivalent filter as a Python object; same result, more ceremony
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance
# query_embedding, tenant_id, categories, and cutoff (a datetime) come from your application
results = client.query_points(
    collection_name="documents", query=query_embedding,
    query_filter=models.Filter(must=[
        models.FieldCondition(key="tenant_id", match=models.MatchValue(value=tenant_id)),
        models.FieldCondition(key="category",  match=models.MatchAny(any=categories)),
        models.FieldCondition(key="created_at", range=models.DatetimeRange(gte=cutoff)),
    ]),
    limit=10,
)

Multimodal native means the database ships embedding models for non-text content. You hand it a raw image URL; it handles vectorization. Most databases are embedding-agnostic — you own the embedding pipeline. Marqo and Weaviate (via CLIP/ImageBind modules) close this loop.

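# Marqo: POST raw images, query with text; no external embedding step
import marqo

mq = marqo.Client(url="http://localhost:8882")  # assumes a local Marqo instance
# Assumes "products" was created with a CLIP model and image-URL handling enabled
mq.index("products").add_documents(
    [{"_id": "shoe-001", "image": "https://cdn.example.com/shoes/001.jpg"}],  # Marqo's id field is _id
    tensor_fields=["image"]
)
results = mq.index("products").search(q="lightweight shoes for summer")
# Can return shoe-001 despite zero keyword overlap; CLIP handles the cross-modal match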

Disk-based index is a cost lever. RAM-resident HNSW indexes can require several GB of RAM per million 1536-dimension vectors: the raw float32 values alone are about 6 GB (1M × 1,536 × 4 bytes), before graph links and metadata. Disk-native alternatives (Milvus DiskANN, Elasticsearch DiskBBQ, LanceDB’s Lance format, Turbopuffer’s object storage tier) often trade some query latency for lower infrastructure cost. For RAG workloads where model latency already dominates, that tradeoff is frequently worth benchmarking.
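A back-of-the-envelope estimate makes the RAM figure concrete. This sketch counts only raw vector bytes plus layer-0 graph links, assuming 2 × m neighbors per node and 4-byte neighbor ids as in hnswlib; m = 16 matches pgvector's HNSW default. Real engines add upper layers, per-node headers, metadata, and allocator overhead, so treat the result as a lower bound, not a sizing guarantee.

```python
def estimate_hnsw_ram_gb(n_vectors, dims, bytes_per_dim=4, m=16, bytes_per_link=4):
    """Rough lower bound on HNSW memory: raw vectors plus layer-0 neighbor links.
    Ignores upper layers, node headers, metadata, and allocator overhead."""
    per_vector = dims * bytes_per_dim + 2 * m * bytes_per_link
    return n_vectors * per_vector / 1e9

print(estimate_hnsw_ram_gb(1_000_000, 1536))                   # float32 vectors
print(estimate_hnsw_ram_gb(1_000_000, 1536, bytes_per_dim=2))  # half precision, e.g. pgvector halfvec
```

The vectors dominate: graph links add about 2% here, which is why half precision or quantization moves the bill far more than tuning m.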

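Max dimensions is a migration hiding in your architecture. OpenAI’s text-embedding-3-large produces 3,072 dimensions by default, and research models keep pushing higher. Storage limits and index limits can also differ: pgvector stores up to 16,000 dimensions but indexes plain vector columns only up to 2,000, so a 3,072-dimension model needs half precision or dimension reduction before it can use an HNSW index. Some managed services publish hard dimension caps; others document high caps or no practical cap for typical embedding models. Check current docs before committing. Pick something with headroom; migrating a vector index because you hit a dimension ceiling is a painful sprint.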
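A small pre-flight check can catch a dimension ceiling before it becomes a migration. This sketch hard-codes the indexable limits from pgvector's README at the time of writing (2,000 dimensions for vector, 4,000 for halfvec, for both HNSW and IVFFlat); verify them against current docs. The function and constant names are hypothetical.

```python
from typing import Optional

# pgvector indexable-dimension limits as documented at the time of writing; re-check before relying on them
MAX_INDEXED_DIMS = {"vector": 2_000, "halfvec": 4_000}

def pgvector_column_for(dims: int) -> Optional[str]:
    """Prefer full-precision vector if it can be indexed, fall back to halfvec, else None."""
    for column_type, limit in MAX_INDEXED_DIMS.items():  # dicts keep insertion order
        if dims <= limit:
            return f"{column_type}({dims})"
    return None  # too wide to index: reduce dimensions or choose another store

for dims in (1536, 3072, 8192):
    print(dims, pgvector_column_for(dims))
```

For a 3,072-dimension model you can also keep the full-precision column and index a cast, as pgvector documents: CREATE INDEX ON documents USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops), with queries ordering by the same cast expression. OpenAI's text-embedding-3 models also accept a dimensions parameter that returns shorter vectors.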

The Landscape

Database	Deployment	License	Hybrid Search	Sparse Vectors	SQL / SQL-like	Multimodal	Disk Index	Max Dims	Sweet Spot
pgvector	Self-host / managed (Supabase, Neon, RDS)	OSS (PostgreSQL)	Manual (RRF via SQL)	✅ sparsevec (0.7+)	✅ Full SQL	❌	✅ HNSW on disk	16,000 storage; 2,000 indexed `vector`	Already on Postgres; moderate vector counts
Qdrant	Self-host / Cloud	Apache 2.0	✅ Native BM25	✅ Mature support	❌ (REST/gRPC)	❌	✅	65,535	Filtered queries at scale; complex metadata
Weaviate	Self-host / Cloud	BSD 3	✅ Native BM25 + RRF	✅	❌ (GraphQL / gRPC)	✅ via modules	✅	65,535	GraphQL access patterns; built-in vectorization
Pinecone	Cloud only	Proprietary	✅ (sparse-dense)	✅	❌	❌	✅ (serverless)	20,000	Managed simplicity; no ops team
Milvus / Zilliz	Self-host / Cloud (Zilliz)	Apache 2.0	✅ Native	✅	✅ SQL-like (Milvus Query Language)	✅	✅ DiskANN	32,768	Billion-scale; enterprise on-prem
Chroma	Embedded / self-host	Apache 2.0	❌	❌	❌	❌	❌	65,535	Local dev and prototyping only
LanceDB	Embedded / Cloud	Apache 2.0	✅	❌	✅ SQL via DataFusion	✅ Native	✅ (Lance format)	Unlimited	Edge / serverless; multimodal lakehouse
Orama	Embedded / Cloud	Apache 2.0	✅ Full-text + vector	❌	❌	❌	❌	Varies	JS/edge apps; lightweight site/app search
Turbopuffer	Cloud only (serverless)	Proprietary	✅ BM25 + vector	❌	❌	❌	✅ (object storage)	16,000	Multi-tenant SaaS; millions of namespaces
Elasticsearch	Self-host / Elastic Cloud	ELv2 / SSPL / AGPLv3	✅ RRF + ELSER sparse	✅ (ELSER)	✅ Query DSL	❌	✅ DiskBBQ	4,096	Already on Elastic stack; hybrid enterprise search
OpenSearch	Self-host / AWS managed	Apache 2.0	✅ RRF + Neural Search	✅	✅ Query DSL	❌	✅ FAISS + HNSW	16,000	AWS-native; open-source Elastic alternative
Vespa	Self-host / Cloud	Apache 2.0	✅ Native	✅ Tensors / lexical ranking	✅ YQL	✅ Tensors	✅	Effectively unbounded	Search + ranking + recommendation systems
ClickHouse	Self-host / Cloud	Apache 2.0	Manual	❌	✅ Full SQL	❌	✅ Columnar + HNSW	Varies	Analytics/logs with vector search beside OLAP
MongoDB Atlas	Cloud / self-host	SSPL	✅ Built-in	❌	✅ MQL + aggregation	❌	✅ HNSW	8,192	Already on MongoDB; document + vector in one
Redis (VSS)	Self-host / Redis Cloud	RSALv2 / SSPL	✅ (RediSearch)	✅	❌	❌	❌ RAM-only	32,768	Ultra-low latency; cache-layer vector search
Marqo	Cloud / self-host	Apache 2.0	✅	❌	❌	✅ Native focus	✅	Varies	End-to-end multimodal: image + text + video

A Few Things That Don’t Fit in the Table

Turbopuffer’s multi-tenancy is built around very high namespace counts. Its public positioning and customer stories emphasize workloads like Notion’s large, namespace-heavy corpus. If each user or organization needs isolated vector search, that architecture can change the economics, but still benchmark your own tenant shape.

LanceDB embedded mode is the closest thing to “SQLite for vector search.” It runs in-process, requires no server, and suits serverless functions such as AWS Lambda; check runtime support before targeting edge platforms like Cloudflare Workers, which restrict native modules. The Lance columnar format is what makes embedded operation practical beyond toy datasets.

Chroma is strongest at dev/test and small app deployments. If you are aiming at very large corpora, HA, disk-heavy operation, or first-class hybrid search, evaluate a production-oriented store before promoting the prototype into infrastructure.

Vespa is what you reach for when retrieval is only half the product. It combines lexical retrieval, nearest-neighbor search, tensors, ranking expressions, grouping, and online serving. That power is real, but so is the operational and modeling complexity. It fits search/recommendation teams more than “add semantic search to my CRUD app.”

ClickHouse belongs in the conversation when search is attached to analytics. If your source of truth is events, logs, traces, or metrics, ClickHouse keeps vector distance, filtering, aggregation, and serious full-text indexing in one SQL engine. Not a purpose-built vector database, but often the boring-right answer for analytical retrieval.

Sparse vectors are how you get BM25-quality keyword matching inside a vector index — without running a separate full-text engine. Qdrant and Elasticsearch have especially mature implementations here. If hybrid search is critical and a two-system architecture is a deal-breaker, sparse vector support is what to look for.

Choosing When You’ve Outgrown pgvector

SaaS product with per-tenant isolation → Turbopuffer
Complex metadata filtering at scale → Qdrant
Already on Elastic/ELK stack → Elasticsearch with DiskBBQ
AWS shop that wants open-source → OpenSearch
Search/recommendation platform with serious ranking needs → Vespa
Analytics, observability, log/event search → ClickHouse
Billion-scale on-prem / self-hosted → Milvus
Edge / serverless / multimodal → LanceDB
Small JS app, docs site, or edge-native search UX → Orama
Zero ops, cost is secondary → Pinecone
Multimodal-first (images, video, audio) → Marqo
Already on MongoDB → Atlas Vector Search
Already on Postgres and want to stay on pgvector a while longer → managed Postgres such as Supabase Vector or Neon (still pgvector underneath, with the operations handled for you)

The One Thing to Not Do

Don’t use vector search as fuzzy text search for things that have correct answers.

“Find me the user with email dan@example.com” is not a vector search problem. “Find the order with ID ORD-12345” is not either. Embedding ORD-12345 and searching by cosine similarity will return something — but it may be wrong. An identifier has a correct answer. An approximate match on an identifier is a bug.

Vector search returns the most similar thing in your dataset, even when nothing is actually relevant. It doesn’t know when no good answer exists. That’s fine for related documents. It’s a serious problem for exact record lookup, where a confident wrong answer is worse than an empty result.

The same applies in the other direction: don’t use FTS for queries where the user is describing a concept. “articles about making hard decisions under uncertainty” contains no reliable keywords. FTS will either return noise or nothing. Use the right tool for the query shape.
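One practical guard is to route queries by shape before any search runs: identifier-shaped input goes to an exact lookup, and everything else goes to the hybrid pipeline. This sketch assumes the identifier formats; the email pattern is deliberately loose and ORD-12345 is just the article's example, so substitute the formats your system really uses. The function name and route labels are hypothetical.

```python
import re

# Hypothetical identifier formats for illustration; replace with your system's real ones.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ORDER_ID = re.compile(r"ORD-\d+", re.IGNORECASE)

def route_query(query: str) -> str:
    """Send identifier-shaped queries to exact lookup; everything else to hybrid search."""
    q = query.strip()
    if EMAIL.fullmatch(q):
        return "exact:email"     # unique B-tree lookup; no row means "not found", not "closest"
    if ORDER_ID.fullmatch(q):
        return "exact:order_id"  # an approximate order ID is a bug, so never fall back to vectors
    return "hybrid"              # FTS + vector (+ trigram), fused with RRF

for q in ("dan@example.com", "ORD-12345", "articles about making hard decisions under uncertainty"):
    print(q, "->", route_query(q))
```

The exact routes are allowed to return nothing. That empty result is the correct answer the vector path can't give.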

The Full Picture

Most production search systems need more than one layer:

pg_trgm for names, typos, autocomplete
FTS / pg_search for keyword-based prose search
pgvector for semantic and conceptual queries
RRF fusion for surfaces where users mix query types
Regular indexes for exact identifiers, filters, and sorted lists

These are not competing tools. They’re complementary. A well-built search system picks the right layer for each query shape — and when query shapes overlap, it runs multiple layers and fuses the results.

The teams that ship good search features understand the whole stack. The ones that don’t reach for a vector database, embed everything, and wonder why exact lookups sometimes return the wrong record.