DanLevy.net

Postgres Text Searching Guide 2026

The search tools already in your database, and when each one earns its keep.

Most teams use one Postgres search tool. Teams that know all three ship better search with less complexity — and avoid the costly detour to a dedicated search service they didn’t need yet.

This guide covers the full set of Postgres-native options: what each one does, when it’s the right fit, and how to layer them.


The Three Tools

Full-text search (tsvector / GIN index) is lexical. It tokenizes text into lexemes, stems them, and matches queries against the index. “Running” and “runs” collapse to the same lexeme. So do “dog” and “dogs.” The ranking function (ts_rank) rewards documents where query terms appear often or prominently.

Trigrams (pg_trgm) break strings into overlapping 3-character slices and measure how many slices two strings share. “Dan” → " da", "dan", "an ". “Micheal” and “Michael” share most of their trigrams, so similarity is high. This makes pg_trgm excellent at fuzzy name matching, typo tolerance, and autocomplete — the space where FTS performs poorly.

Exact-match indexes (B-tree, hash) handle primary keys, email addresses, IDs, SKUs, and anything where the answer is binary: it matches or it doesn’t. These don’t feel like “search,” but they belong in this conversation because the worst pattern is using fuzzy or semantic search for problems that have correct answers.

The choice isn’t about sophistication. It’s about matching the tool to the shape of the query.

Postgres search tool mapA comparison of pg_trgm, full-text search, pgvector, and hybrid search by input shape and query intent.Pick the search primitive by input shapeThe same Postgres table can support all four. The trick is matching the query to the text.Exact words matterMeaning mattersShort / structured textLong prose / chunksfuzzypg_trgmNames, addresses, titles, typos,autocomplete, partial strings.Orthographic similarity: spelling distance.similarpgvectorRelated items, duplicate tickets,recommendations from short descriptions.Embedding similarity: meaning distance.lexicalFull-text searchArticles, docs, logs, support contentwhere query words should appear.Lexemes, stemming, ranking, boolean filters.hybridFTS + pgvectorTechnical docs and RAG where users askconceptual questions plus exact symbols.Run both, fuse ranks with RRF.Start with query intent, then check text shape
The four Postgres search primitives mapped by query intent (exact vs. semantic) and text shape (structured vs. prose). The same table can carry all four indexes — the choice is per query, not per table.

When Full-Text Search Wins

Searching prose for keywords. Blog posts, documentation, product descriptions, support tickets, legal documents. FTS was designed for this shape of content: indexed, ranked retrieval over natural-language text.

Keyword-based user queries. Users type a search term, filter by tag, or browse by keyword. FTS handles that intent natively without any embedding infrastructure.

Ranked results without external dependencies. FTS indexes are fast, deterministic, and require no API calls. The relevance signal comes from term frequency weighted by field position.

Boolean filtering alongside search. FTS composes naturally with your existing query logic:

SELECT * FROM posts
WHERE search_vector @@ to_tsquery('english', 'postgres & performance')
AND category = 'tutorial'
AND published_at > NOW() - INTERVAL '6 months';

Setting Up FTS

-- Generated column keeps the index current automatically
ALTER TABLE posts ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
CREATE INDEX posts_search_idx ON posts USING GIN (search_vector);
-- Query
SELECT title, ts_rank(search_vector, query) AS rank
FROM posts, to_tsquery('english', 'postgres & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

setweight assigns importance: A (title) outranks B (body). That’s the entire relevance model for most content-search use cases.

What FTS Doesn’t Handle Well


When Trigrams Win (pg_trgm)

pg_trgm covers the awkward middle that FTS consistently fumbles.

FTS tokenizes text into lexemes and stems them. For prose this is correct. For names and short identifiers it often isn’t:

pg_trgm is also language-agnostic, which matters for names from diverse linguistic backgrounds. FTS requires dictionary configuration per language.

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING GIN (name gin_trgm_ops);
-- Finds "Micheal Jordan" when searching "Michael Jordan"
SELECT id, name, similarity(name, $1) AS score
FROM users
WHERE name % $1 -- % operator = similarity threshold (default 0.3)
ORDER BY score DESC
LIMIT 10;

The % operator uses pg_trgm.similarity_threshold (default 0.3, range 0–1). For name search, 0.3–0.4 catches typos while keeping noise low.

-- Prefix matching for autocomplete. A trigram GIN index can help,
-- but a B-tree pattern index may be better for pure left-anchored prefixes.
SELECT name FROM users
WHERE name ILIKE $1 || '%'
ORDER BY name
LIMIT 10;
-- word_similarity for partial matches within longer strings
-- ("Johnson" within "Andrew Johnson III")
SELECT id, name, word_similarity($1, name) AS score
FROM users
WHERE $1 <% name
ORDER BY score DESC
LIMIT 10;

The trigram GIN index is especially useful for ILIKE '%pattern%' contains queries and typo-tolerant matching — patterns that are usually full-table scans without a trigram index.

When to Reach for pg_trgm over FTS

ScenarioUse
Person/company name search with typospg_trgm
Autocomplete / prefix searchpg_trgm (or FTS with prefix queries)
Short strings, identifiers, codespg_trgm
Prose articles, documentation, ticketsFTS
Log messages for keywordsFTS
Multilingual name searchpg_trgm (language-agnostic)

When Exact-Match SQL Wins

Some “search” problems aren’t search at all.

“Find the user with email dan@example.com” is an equality check. “Find order ORD-12345” is a primary key lookup. “List posts in the tutorial category sorted by date” is a filtered query. These belong on B-tree or hash indexes.

Using FTS or trigrams here adds complexity without improving correctness — and for exact identifiers, a near-match is worse than no match.

CREATE INDEX users_email_idx ON users (email);
-- Exact lookup: fast and unambiguous
SELECT id, name FROM users WHERE email = $1;

The broader lesson: approximate search for problems with correct answers is a category error. It returns something — which may be confidently wrong.


Combining These Tools

These tools compose cleanly. You don’t pick exactly one.

FTS + pg_trgm for a search box that tolerates typos in keywords:

-- Trigram similarity on title catches typos; ts_rank handles body relevance
SELECT id, title,
ts_rank(search_vector, to_tsquery('simple', $1)) AS fts_rank,
similarity(title, $1) AS trgm_score
FROM posts
WHERE search_vector @@ to_tsquery('simple', $1)
OR title % $1
ORDER BY (ts_rank(search_vector, to_tsquery('simple', $1)) + similarity(title, $1)) DESC
LIMIT 10;

FTS + unaccent for international content:

-- Strip diacritical marks so "José" matches "Jose"
CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE TEXT SEARCH CONFIGURATION public.simple_unaccent (COPY = pg_catalog.simple);
ALTER TEXT SEARCH CONFIGURATION public.simple_unaccent
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, simple;
ALTER TABLE posts ADD COLUMN search_vector tsvector;
CREATE TRIGGER posts_search_vector_refresh
BEFORE INSERT OR UPDATE OF title, body ON posts
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'public.simple_unaccent', title, body);

unaccent + pg_trgm for international name search:

ALTER TABLE users ADD COLUMN name_search text;
CREATE FUNCTION users_name_search_refresh()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
NEW.name_search := unaccent(coalesce(NEW.name, ''));
RETURN NEW;
END;
$$;
CREATE TRIGGER users_name_search_refresh
BEFORE INSERT OR UPDATE OF name ON users
FOR EACH ROW EXECUTE FUNCTION users_name_search_refresh();
CREATE INDEX users_name_search_trgm_idx
ON users USING GIN (name_search gin_trgm_ops);
SELECT id, name
FROM users
WHERE name_search % unaccent($1)
ORDER BY similarity(name_search, unaccent($1)) DESC
LIMIT 10;

The trigger examples avoid using unaccent() inside generated-column or index expressions, where PostgreSQL’s immutability rules matter. If you wrap unaccent() in your own immutable function, document that you are accepting upgrade/configuration risk.


Noteworthy Extensions

pg_trgm is bundled with most Postgres distributions but requires explicit enablement. The foundation for fuzzy string matching in Postgres.

unaccent strips diacritical marks before indexing and querying. Pairs well with both pg_trgm and FTS for European-language content. Bundled with Postgres.

pg_bigm extends the trigram approach to bigrams (2-character slices), which significantly improves results for CJK (Chinese, Japanese, Korean) languages where pg_trgm underperforms. Must be installed separately; not bundled.

pg_search (from ParadeDB) replaces the standard GIN / tsvector stack with a Tantivy-based BM25 index. This gives you BM25 scoring (often better than ts_rank), fuzzy matching within FTS queries, faceted search, and dramatically faster indexing on large tables. It’s a drop-in upgrade path when standard FTS starts showing ranking or performance limits.

-- pg_search: BM25 full-text search with fuzzy matching
CREATE INDEX posts_bm25_idx ON posts
USING bm25 (id, title, body)
WITH (key_field = 'id', text_fields = '{"title": {}, "body": {}}');
-- Query with BM25 scoring + fuzzy matching (catches "javascipt")
SELECT id, title, paradedb.score(id) AS rank
FROM posts
WHERE posts @@@ paradedb.fuzzy_phrase(field => 'title', value => 'postgres performnce')
ORDER BY rank DESC
LIMIT 10;

pgvector adds dense vector storage and similarity search. It’s the right tool when users describe what they want rather than name it — semantic search, RAG, related-content recommendations, multilingual queries. Covered in depth in Semantic Vector Search and Hybrid Strategies.


Decision Table

What you’re searchingRecommended
Prose articles, docs, ticketsFTS
Person/company names with typospg_trgm
Autocomplete, prefix searchpg_trgm
Short codes, identifierspg_trgm
Log messages for keywordsFTS
International namespg_trgm + unaccent
Large content, better rankingpg_search (ParadeDB BM25)
Primary keys, exact emails, IDsB-tree index
Dates, ranges, sorted listsB-tree index
Permissions, categories, filtersRegular WHERE clause
Questions, paraphrases, conceptspgvector (see next article)

When in doubt: short strings with spelling variation → trigrams. Long prose for keyword queries → FTS. Structured identifiers → regular indexes. Conceptual or natural-language queries → pgvector.


Hybrid Search: Two Signals, One Rank

When a query like "withRetry timeout errors" hits a search box, it carries two kinds of intent: exact symbol names the user knows (withRetry) and a conceptual description (timeout errors). No single primitive covers both. Running FTS and vector search in parallel — then merging their ranked lists with Reciprocal Rank Fusion — does.

RRF scores each result as 1 / (60 + rank) in each list and sums across lists. The constant 60 dampens the advantage of top ranks, so a result that places second in both lists can beat a result that wins one list and misses the other entirely. Crucially, RRF never averages raw scores across methods — FTS rank and cosine distance are different currencies and cannot be combined arithmetically.

Hybrid search with Reciprocal Rank FusionA query fans out to full-text search and vector search, each produces ranks, and Reciprocal Rank Fusion combines them into one result list.Hybrid search is two honest signals, then one merged rankDo not average raw scores. FTS rank and cosine distance are different currencies.User query”withRetrytimeout errors”FTS / BM25Exact symbols and words1. API reference2. Retry guidepgvectorConceptual neighbors1. Network failures2. Retry guideRRF mergeGive each result credit forwhere it ranked in each list.1 / (60 + rank)Final resultsThe top hit is where exact termsand semantic meaning agree.
A query fans out to FTS and pgvector in parallel. Each produces its own ranked list. RRF scores every document by its position in each list and sums the scores — the result surfaces documents that both signals agree on.
-- Hybrid search: FTS + pgvector merged with RRF
WITH fts AS (
SELECT id, ts_rank(search_vector, query) AS score,
ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query) DESC) AS rank
FROM docs, to_tsquery('english', 'withRetry & timeout') query
WHERE search_vector @@ query
LIMIT 60
),
vec AS (
SELECT id,
ROW_NUMBER() OVER (ORDER BY embedding <=> $embedding) AS rank
FROM docs
ORDER BY embedding <=> $embedding
LIMIT 60
)
SELECT COALESCE(fts.id, vec.id) AS id,
(COALESCE(1.0 / (60 + fts.rank), 0) +
COALESCE(1.0 / (60 + vec.rank), 0)) AS rrf_score
FROM fts FULL JOIN vec ON fts.id = vec.id
ORDER BY rrf_score DESC
LIMIT 10;

The 60-document candidate pool per branch (LIMIT 60) is a common starting point. Widen it if recall is low; narrow it for speed.


What’s Next

Postgres text search covers a lot of ground, but it has a ceiling. When users describe what they want instead of naming it — “something to help me sleep on a flight,” “articles about debugging confidence as a new engineer” — lexical and trigram search both fail.

That’s the territory of vector embeddings, semantic search, and hybrid architectures. Covered in Semantic Vector Search and Hybrid Strategies.