2025's Wave of Database Innovation
You can thank AI.

Not another Vector DB article
The database landscape has fundamentally shifted, driven by AI applications that require vector-based RAG (Retrieval-Augmented Generation) and multimodal search. While Postgres/pgvector and OpenSearch/Elasticsearch remain the enterprise standards, they carry significant operational overhead, and that overhead is only justified when search is a business-critical feature.
What if you could get enterprise-grade search features with zero servers and minimal engineering resources? A new class of databases built for object-based storage over HTTP (AWS S3, Cloudflare R2, Backblaze B2) makes this possible!
A database by any other name
These serverless and CDN-capable datastores are transforming search for mid-scale use cases (1,000-1,000,000 records, or several GBs) that previously couldn’t justify traditional database infrastructure:
- Pagefind (2022, ~4.5K ⭐): Pure static approach - compile once, search forever, zero backend requirements
- Orama (2023, ~8K ⭐): Universal solution running everywhere from browsers to serverless functions
- Chroma (2022, ~14K ⭐): AI-native, purpose-built for RAG applications
- LanceDB (2023, ~4K ⭐): Enterprise multimodal capabilities with disk-based architecture
- DuckDB-WASM (2019, ~23K ⭐): Full SQL analytics database running in browsers via WebAssembly
Most of these tools leverage simple S3 API calls and existing IAM policies, eliminating traditional database complexity.
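In practice, "leveraging S3 API calls" means the client library reads and writes plain objects, so access control is just an IAM policy on a bucket. A minimal sketch using LanceDB's JavaScript client, assuming a hypothetical bucket name and credentials resolved from the standard AWS environment:
import * as lancedb from "@lancedb/lancedb";

// The "database" is just objects under this prefix; there is no server to run.
// Every read/write becomes an S3 GET/PUT governed by your existing IAM policy.
const db = await lancedb.connect("s3://my-search-bucket/lancedb");
console.log(await db.tableNames());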
Battle of the Checkboxes
| Feature | Pagefind | Orama | Chroma | LanceDB | DuckDB-WASM |
| --- | --- | --- | --- | --- | --- |
| Full-Text Search | ✅ Advanced stemming | ✅ BM25, 30 languages | ✅ SQLite FTS | ✅ Tantivy | ✅ Full SQL |
| Vector Search | ❌ | ✅ Cosine similarity | ✅ HNSW | ✅ IVF_PQ, HNSW, GPU | ⚠️ Extensions |
| AI/RAG Integrations | ❌ None | ✅ Built-in pipeline | ✅ LangChain, LlamaIndex | ✅ Advanced reranking | ⚠️ Manual setup |
| Storage | Static JSON/WASM | Memory + S3 plugins | Server-based* | S3-compatible Lance | WASM + S3/HTTP |
| Write Support | Build-time only | Full CRUD | Full CRUD | Full CRUD | Full SQL CRUD |
| Performance | Sub-100ms | 0.0001ms-100ms | Sub-100ms | 3-5ms vector, 50ms FTS | 10ms-1s (complex SQL) |
*Note: Chroma requires a server runtime and doesn’t support direct S3 object storage (issue #1736)
Implementation examples
The syntax differences reveal each tool’s philosophy and target use case:
Static site search with Pagefind
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>

<div id="search"></div>
<script>
  new PagefindUI({ element: "#search" });
</script>
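For headless setups, Pagefind also ships a small JavaScript API alongside the generated index. A quick sketch, assuming the site was indexed at build time with the Pagefind CLI (e.g. npx pagefind --site public):
// Load the search module that the Pagefind CLI emitted at build time
const pagefind = await import("/pagefind/pagefind.js");

// Query the static index; result fragments are fetched lazily over plain HTTP
const search = await pagefind.search("static");
const first = await search.results[0].data();
console.log(first.url, first.excerpt);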
Enterprise-grade multimodal with LanceDB
Code to create a LanceDB table with automatic OpenAI embeddings:
import * as lancedb from "@lancedb/lancedb";
import "@lancedb/lancedb/embedding/openai";
import { LanceSchema, getRegistry } from "@lancedb/lancedb/embedding";
import { Utf8 } from "apache-arrow";

const db = await lancedb.connect("data/multimodal-db");
const func = getRegistry()
  .get("openai")
  ?.create({ model: "text-embedding-ada-002" });

// Schema with automatic embedding generation
const documentsSchema = LanceSchema({
  text: func.sourceField(new Utf8()),
  vector: func.vectorField(),
  category: new Utf8(),
});

const table = await db.createEmptyTable("documents", documentsSchema);
await table.add([
  { text: "machine learning concepts", category: "research" },
  { text: "deep learning fundamentals", category: "research" },
]);
Example of querying a LanceDB table:
import * as lancedb from "@lancedb/lancedb";
import "@lancedb/lancedb/embedding/openai";

// "Connect" to a URL path
const db = await lancedb.connect("data/multimodal-db");
const table = await db.openTable("documents");

// SQL filter + vector search combination
const results = await table
  .search("machine learning concepts")
  .where("category = 'research'")
  .limit(10)
  .toArray();

console.log(results);
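At larger row counts you would normally add an approximate-nearest-neighbor index over the vector column (the IVF_PQ entry in the comparison table). A hedged sketch, assuming the table above has accumulated enough rows to train one:
// Build an ANN index on the embedding column (LanceDB defaults to IVF_PQ)
await table.createIndex("vector");

// Searches now hit the index instead of brute-force scanning every vector
const fast = await table.search("neural networks").limit(5).toArray();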
Universal search with Orama
import { create, insert, search } from '@orama/orama'

const db = create({
  schema: {
    title: 'string',
    content: 'string',
    embedding: 'vector[1536]',
  },
})

// generateEmbedding is your own helper (e.g. a call to an embeddings API)
await insert(db, {
  title: 'Getting Started',
  content: 'Learn the basics',
  embedding: await generateEmbedding('Learn the basics'),
})

const results = await search(db, {
  term: 'basics',
  mode: 'hybrid', // combines full-text + vector search
})
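The "Memory + S3 plugins" entry in the comparison table refers to Orama's persistence plugin: you can serialize an index, park it on object storage, and rehydrate it in a browser or edge function. A rough sketch with @orama/plugin-data-persistence (the index URL is hypothetical, and the upload step is whatever storage client you already use):
import { persist, restore } from '@orama/plugin-data-persistence'

// Serialize the populated index into a portable snapshot at build/ingest time
const snapshot = await persist(db, 'json')
// ...upload `snapshot` to S3/R2/B2 here...

// Later, at the edge: fetch the snapshot and restore a searchable index
const raw = await fetch('https://cdn.example.com/orama-index.json').then(r => r.text())
const restored = await restore('json', raw)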
In-browser SQL with DuckDB-WASM
import * as duckdb from "https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@latest/dist/duckdb-browser.mjs";

const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
const worker = new Worker(bundle.mainWorker);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

const conn = await db.connect();
await conn.query(`
  CREATE TABLE t AS
  SELECT * FROM (VALUES (1, 'hybrid search'), (2, 'edge sql')) AS v(id, txt);
`);

// Optional full-text search: the FTS extension needs an index before match_bm25 works
await conn.query(`INSTALL fts; LOAD fts;`);
await conn.query(`PRAGMA create_fts_index('t', 'id', 'txt');`);
const hits = await conn.query(`
  SELECT * FROM (
    SELECT t.*, fts_main_t.match_bm25(id, 'hybrid') AS score FROM t
  ) WHERE score IS NOT NULL;
`);
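The same connection can also read files straight off object storage, which is where DuckDB-WASM shines for analytics. A hedged sketch against a hypothetical public Parquet URL (the WASM build performs HTTP range reads, so only the bytes a query needs are fetched):
// Aggregate a remote Parquet file without downloading it wholesale
const result = await conn.query(`
  SELECT category, count(*) AS n
  FROM read_parquet('https://example.com/data/events.parquet')
  GROUP BY category
  ORDER BY n DESC;
`);
console.log(result.toArray());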
AI-native search with Chroma
import { ChromaClient } from "chromadb";

const client = new ChromaClient();
const collection = await client.createCollection({ name: "knowledge-base" });

await collection.add({
  documents: ["AI will transform software development"],
  metadatas: [{ source: "tech-blog", category: "AI" }],
  ids: ["doc1"],
});

const results = await collection.query({
  queryTexts: ["future of programming"],
  where: { category: "AI" },
  nResults: 5,
});
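Closing the RAG loop from here is mostly string assembly: the retrieved documents become context for whatever LLM client you use. A minimal sketch in which llm.complete is a hypothetical stand-in for your completion call:
// results.documents is grouped per query; take the hits for the first query
const context = results.documents[0].join("\n---\n");

// Feed the retrieved context plus the user question to your model of choice
const answer = await llm.complete(
  `Answer using only this context:\n${context}\n\nQuestion: What is the future of programming?`
);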
Use Cases Guide
Choose Pagefind when:
- Building documentation, blogs, or knowledge bases
- Content updates weekly or less
- Need zero operational overhead and perfect CDN caching
- Example: Company docs with 10K+ pages updating monthly
Choose Orama when:
- Building dashboards, e-commerce, or dynamic applications
- Need real-time updates and sub-100ms performance
- Want deployment flexibility from browsers to edge functions
- Example: SaaS with dynamic product catalogs
Choose Chroma when:
- Building RAG applications or AI knowledge bases
- Need LangChain/LlamaIndex integrations
- Semantic search is core functionality
- Example: AI customer support bot
Choose LanceDB when:
- Working with multimodal data (images, audio, video)
- Need enterprise performance at massive scale
- Complex analytics and reranking required
- Example: Media platform with semantic video search
Choose DuckDB-WASM when:
- Need full SQL capabilities in browsers or edge functions
- Working with analytical workloads and complex queries
- Want to process CSV/Parquet files directly from S3
- Example: Business intelligence dashboard with ad-hoc SQL queries
The bigger picture
These tools democratize capabilities previously exclusive to tech giants. In 2020, implementing semantic search required ML teams, GPU clusters, and months of infrastructure work. Today, it’s a few lines of code and a $5/month hosting bill.
Small teams can now ship search experiences rivaling hundred-person engineering teams. The moat around “sophisticated search” is evaporating.
Even AWS recognizes this trend: its upcoming S3 Vectors feature will enable native vector search directly within S3, further validating the object-storage approach to modern databases.
That “nice-to-have” search feature you’ve been postponing? It’s probably easier to implement than your login system now. User expectations have shifted—every search box is compared to Google or Notion. These tools make those expectations achievable without enterprise budgets.
Start experimenting
- This weekend: Pick a tool matching your current project and build a prototype
- Think beyond text: Consider how semantic search might unlock new user experiences
- Start simple: Begin with static approaches, but architect for future migration to dynamic solutions
Check out my practical Pagefind guide for hands-on implementation, or explore the growing ecosystem of edge-native databases reshaping data at scale.
Disclaimer: I’ve used Pagefind for years and became a contributor in 2025. I’ve experimented with Orama and Chroma for smaller projects and am exploring LanceDB for larger AI applications. No financial ties to these projects—just keen interest in the evolving database landscape.