
2025's Wave of Database Innovation

You can thank AI.

Not another Vector DB article

The database landscape has fundamentally shifted, driven by AI applications requiring vector-based RAG (Retrieval-Augmented Generation) and multi-modal search. While Postgres/PGVector and OpenSearch/Elasticsearch remain enterprise standards, they come with significant operational overhead. Only critical business features justify this complexity.

What if you could get enterprise-grade search features with zero servers and minimal engineering resources? A new class of databases built for object-based storage over HTTP (AWS S3, Cloudflare R2, Backblaze B2) makes this possible!

A database by any other name

These serverless, CDN-capable datastores are transforming search for mid-scale use cases (1,000-1,000,000 records, or several GBs) that previously couldn’t justify traditional database infrastructure: Pagefind, Orama, Chroma, LanceDB, and DuckDB-WASM.

Most of these tools leverage simple S3 API calls and existing IAM policies, eliminating traditional database complexity.
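
For example, LanceDB can talk to a bucket directly with nothing more than a connection string; a minimal sketch, assuming a hypothetical bucket name and AWS credentials already available via the usual environment variables or an attached IAM role:

import * as lancedb from "@lancedb/lancedb";
// Bucket and prefix are hypothetical; credentials and region come from the
// standard AWS environment variables or the instance's IAM role.
const db = await lancedb.connect("s3://my-search-bucket/lancedb");
console.log(await db.tableNames());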

Battle of the Checkboxes

| Feature | Pagefind | Orama | Chroma | LanceDB | DuckDB-WASM |
| --- | --- | --- | --- | --- | --- |
| Full-Text Search | ✅ Advanced stemming | ✅ BM25, 30 languages | ✅ SQLite FTS | ✅ Tantivy | ✅ Full SQL |
| Vector Search | None | ✅ Cosine similarity | ✅ HNSW | ✅ IVF_PQ, HNSW, GPU | ⚠️ Extensions |
| AI/RAG Integrations | None | ✅ Built-in pipeline | ✅ LangChain, LlamaIndex | ✅ Advanced reranking | ⚠️ Manual setup |
| Storage | Static JSON/WASM | Memory + S3 plugins | Server-based* | S3-compatible Lance | WASM + S3/HTTP |
| Write Support | Build-time only | Full CRUD | Full CRUD | Full CRUD | Full SQL CRUD |
| Performance | Sub-100ms | 0.0001ms - 100ms | Sub-100ms | 3-5ms vector, 50ms FTS | 10ms-1s (complex SQL) |

*Note: Chroma requires a server runtime and doesn’t support direct S3 object storage (issue #1736)

Implementation examples

The syntax differences reveal each tool’s philosophy and target use case:

Static site search with Pagefind

<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>new PagefindUI({ element: "#search" });</script>
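
Pagefind also exposes a headless JS API on top of the same generated index if you want to render results yourself; a minimal sketch, assuming the index has already been built (e.g. with npx pagefind --site public):

// Import the search bundle Pagefind generated at build time
const pagefind = await import("/pagefind/pagefind.js");
const search = await pagefind.search("static");
// Each result lazily loads its fragment data (url, excerpt, meta)
const pages = await Promise.all(search.results.slice(0, 5).map((r) => r.data()));
console.log(pages.map((p) => p.meta.title));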

Enterprise-grade multimodal with LanceDB

Code to create a LanceDB table with automatic OpenAI embeddings:

import * as lancedb from "@lancedb/lancedb";
import "@lancedb/lancedb/embedding/openai";
import { LanceSchema, getRegistry } from "@lancedb/lancedb/embedding";
import { Utf8 } from "apache-arrow";

const db = await lancedb.connect("data/multimodal-db");
const func = getRegistry()
  .get("openai")
  ?.create({ model: "text-embedding-ada-002" });

// Schema with automatic embedding generation
const documentsSchema = LanceSchema({
  text: func.sourceField(new Utf8()),
  vector: func.vectorField(),
  category: new Utf8()
});

const table = await db.createEmptyTable("documents", documentsSchema);
await table.add([
  { text: "machine learning concepts", category: "research" },
  { text: "deep learning fundamentals", category: "research" }
]);

Example of querying a LanceDB table:

import * as lancedb from "@lancedb/lancedb";
import "@lancedb/lancedb/embedding/openai";

// "Connect" to a URL path
const db = await lancedb.connect("data/multimodal-db");
const table = await db.openTable("documents");

// SQL + vector search combination
const results = await table
  .search("machine learning concepts")
  .where("category = 'research'")
  .limit(10)
  .toArray();
console.log(results);

Universal search with Orama

import { create, insert, search } from '@orama/orama'

const db = create({
  schema: {
    title: 'string',
    content: 'string',
    embedding: 'vector[1536]'
  }
})

await insert(db, {
  title: 'Getting Started',
  content: 'Learn the basics',
  embedding: await generateEmbedding('Learn the basics') // your own embedding helper (sketched below)
})

// Hybrid mode needs both a term and a query vector for the embedding field
const results = await search(db, {
  term: 'basics',
  mode: 'hybrid', // combines full-text + vector search
  vector: {
    value: await generateEmbedding('basics'),
    property: 'embedding'
  }
})
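
The generateEmbedding helper above is not part of Orama; here is a minimal sketch of one, assuming the official openai SDK and an OPENAI_API_KEY in the environment (text-embedding-ada-002 produces the 1,536-dimension vectors the schema expects):

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateEmbedding(text) {
  const res = await openai.embeddings.create({
    model: "text-embedding-ada-002", // 1536 dimensions, matching vector[1536] above
    input: text
  });
  return res.data[0].embedding;
}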

Edge SQL with DuckDB-WASM

import * as duckdb from "https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@latest/dist/duckdb-browser.mjs";

const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
// Wrap the cross-origin worker script in a same-origin blob URL
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: "text/javascript" })
);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

const conn = await db.connect();
await conn.query(`create table t as select * from (values (1,'hybrid search'),(2,'edge sql')) as v(id,txt);`);
// Optional full-text search via the FTS extension
await conn.query(`install fts; load fts;`);
await conn.query(`pragma create_fts_index('t', 'id', 'txt');`);
const hits = await conn.query(`select t.*, fts_main_t.match_bm25(id, 'hybrid') as score from t where score is not null order by score desc;`);
console.log(hits.toArray());

AI-native search with Chroma

import { ChromaClient } from "chromadb";

// Connects to a Chroma server (default: http://localhost:8000)
const client = new ChromaClient();
const collection = await client.createCollection({ name: "knowledge-base" });

await collection.add({
  documents: ["AI will transform software development"],
  metadatas: [{ source: "tech-blog", category: "AI" }],
  ids: ["doc1"]
});

const results = await collection.query({
  queryTexts: ["future of programming"],
  where: { category: "AI" },
  nResults: 5
});

Use Cases Guide

Choose Pagefind when: your content is static (docs, blogs, marketing sites) and the index can be generated at build time, with zero runtime infrastructure.

Choose Orama when: you want hybrid full-text + vector search that runs entirely in JavaScript, in the browser or at the edge.

Choose Chroma when: you’re building AI/RAG pipelines around LangChain or LlamaIndex and can run a server alongside your app.

Choose LanceDB when: you have large or multimodal embedding datasets and want fast vector + full-text search directly over S3-compatible storage.

Choose DuckDB-WASM when: you need full SQL (joins, aggregates, ad-hoc queries) over data in S3 or behind HTTP, executed client-side.

The bigger picture

These tools democratize capabilities previously exclusive to tech giants. In 2020, implementing semantic search required ML teams, GPU clusters, and months of infrastructure work. Today, it’s a few lines of code and a $5/month hosting bill.

Small teams can now ship search experiences that rival what hundred-person engineering teams build. The moat around “sophisticated search” is evaporating.

Even AWS recognizes this trend—their upcoming S3 Vector feature will enable native vector search directly within S3, further validating the object storage approach to modern databases.

That “nice-to-have” search feature you’ve been postponing? It’s probably easier to implement than your login system now. User expectations have shifted—every search box is compared to Google or Notion. These tools make those expectations achievable without enterprise budgets.

Start experimenting

  1. This weekend: Pick a tool matching your current project and build a prototype
  2. Think beyond text: Consider how semantic search might unlock new user experiences
  3. Start simple: Begin with static approaches, but architect for future migration to dynamic solutions (see the adapter sketch below)
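
One way to keep that migration path open is a thin adapter in front of whichever engine you start with; a minimal sketch in plain JavaScript, wrapping the Orama instance from earlier (the adapter shape and names here are hypothetical):

import { search as oramaSearch } from "@orama/orama";

// Hypothetical adapter boundary: app code only calls adapter.search(query, limit)
// and gets back [{ id, title, score }], so the backing engine can change later.
function createOramaAdapter(db) {
  return {
    async search(query, limit = 10) {
      const { hits } = await oramaSearch(db, { term: query, limit });
      return hits.map((h) => ({ id: String(h.id), title: h.document.title, score: h.score }));
    }
  };
}

// Swapping to Pagefind or LanceDB later means writing another small adapter
// with the same shape, not touching every call site.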

Check out my practical Pagefind guide for hands-on implementation, or explore the growing ecosystem of edge-native databases reshaping data at scale.

Disclaimer: I’ve used Pagefind for years and became a contributor in 2025. I’ve experimented with Orama and Chroma for smaller projects and am exploring LanceDB for larger AI applications. No financial ties to these projects—just keen interest in the evolving database landscape.
