
The JSONB Seduction

How your 'temporary' schema-less field became permanent tech debt.

“We’ll just throw it in a JSONB column for now. We can normalize it later when the schema stabilizes.”

Famous last words. I’ve heard this dozens of times. I’ve said it dozens of times. And you know how many times “later” actually came? About 15% of the time.

The other 85%? That “temporary” JSONB column is still there three years later, now storing 40 different variants of the same data structure, queried by 15 different services with competing assumptions about what fields exist, none of them documented, all of them critical to production.

JSONB is one of PostgreSQL’s most seductive features. It promises flexibility without sacrifice: keep your relational guarantees, but opt out of schema rigidity when you need to. The catch? You almost never need to as much as you think you do—and when you think you need to, you’re often creating future problems to solve present laziness.

For MySQL and SQL Server folks: PostgreSQL’s JSONB has equivalents in other databases—MySQL’s JSON type (since 5.7) and SQL Server’s JSON functions (since 2016). The performance characteristics differ: MySQL’s JSON is stored as binary but with different indexing options; SQL Server stores JSON as text without native indexing (you use computed columns). The tradeoffs discussed here apply broadly, but the specific benchmark numbers are PostgreSQL-centric.

Let’s talk about the JSONB seduction, what it costs you, and how to escape when you’ve gone too far.

The Appeal: Why JSONB Feels So Right

I get it. I really do. You’re building a feature with uncertain requirements. Your product manager keeps changing their mind. Your API is evolving. Your data model isn’t stable yet. JSONB feels like the obvious solution.

-- Instead of this nightmare:
CREATE TABLE user_profiles (
  user_id INTEGER PRIMARY KEY,
  avatar_url TEXT,
  bio TEXT,
  location TEXT,
  website TEXT,
  twitter_handle TEXT,
  linkedin_url TEXT,
  github_username TEXT,
  preferred_language TEXT,
  timezone TEXT,
  notification_email BOOLEAN,
  notification_sms BOOLEAN,
  -- ... 47 more columns that keep changing
);

-- You do this:
CREATE TABLE user_profiles (
  user_id INTEGER PRIMARY KEY,
  profile JSONB -- Everything goes here!
);

Clean. Simple. Flexible. Ship it!

And for version 1, it works beautifully. You iterate fast. Product is happy. Engineering is happy. Users are getting features shipped quickly.

Then reality sets in.

When JSONB Goes Wrong: A Timeline

Month 3: The First Performance Complaint

“Hey, why is our user search so slow?”

-- Innocent-looking query:
SELECT * FROM user_profiles
WHERE profile->>'location' = 'New York';
-- Actual execution: FULL TABLE SCAN
-- Index on JSONB? You forgot to add it.
-- And now you're scanning millions of documents.
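Don’t take the comment’s word for it; ask the planner. With no index on that expression, the plan comes back as a sequential scan with a row filter:

-- Confirm what the planner is doing; without an index on this expression,
-- expect a Seq Scan with a Filter on (profile ->> 'location').
EXPLAIN ANALYZE
SELECT * FROM user_profiles
WHERE profile->>'location' = 'New York';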

Sure, you can index JSONB fields:

CREATE INDEX idx_location ON user_profiles
USING gin (profile jsonb_path_ops);
-- Or for specific fields:
CREATE INDEX idx_location ON user_profiles
((profile->>'location'));

But here’s the thing: you need a separate index for every field you want to query efficiently. That JSONB column just cost you index space for a dozen fields instead of one.

And GIN indexes aren’t free:

| Operation | Regular B-tree Index | GIN Index on JSONB |
| --- | --- | --- |
| Index size | 100% of column size | 200-400% of column size |
| Insert overhead | Low | Moderate-High |
| Update overhead | Low | Very High |
| Query performance | Excellent | Good (with correct ops) |

You just tripled your index storage costs for slower performance. Congrats.

Month 6: The Schema Drift

Product shipped three features. Each one added fields to that JSONB column. Now you have:

// User A (old schema):
{
  "location": "New York",
  "bio": "Developer"
}

// User B (after a refactor):
{
  "profile_location": "San Francisco", // Note: different key
  "biography": "Designer",             // Note: different key
  "avatar_url": "https://..."
}

// User C (after the third feature):
{
  "location": {                        // Now it's nested
    "city": "Austin",
    "country": "US"
  },
  "bio": "Engineer",
  "preferences": {
    "notifications": true
  }
}

Same column. Three different schemas. None documented. All queries now need defensive code:

-- The horror:
SELECT
  user_id,
  COALESCE(
    profile->>'location',
    profile->>'profile_location',
    profile->'location'->>'city',
    'Unknown'
  ) AS location
FROM user_profiles;

As Sehrope Sarkuni, Postgres expert and author of multiple JSONB extensions, warns: “JSONB is schema-less at the database level, but your application still has a schema. You’ve just moved schema validation from the database to every query in your application.”

You didn’t eliminate schema complexity. You decentralized it.

Month 12: The Migration That Never Happens

Someone finally proposes normalizing the data:

-- The planned migration:
ALTER TABLE user_profiles
ADD COLUMN location TEXT,
ADD COLUMN bio TEXT,
ADD COLUMN avatar_url TEXT;
UPDATE user_profiles SET
location = profile->>'location',
bio = profile->>'bio',
avatar_url = profile->>'avatar_url';
ALTER TABLE user_profiles DROP COLUMN profile;

Except now you have millions of rows to backfill, 15 different services reading that column with competing assumptions about what fields exist, and all of it load-bearing in production.

The migration gets postponed. Then postponed again. Then someone leaves the company and everyone forgets it was even planned.

Three years later, that JSONB column is still there, still growing, still causing problems.

Year 2: The Referential Integrity Nightmare

You’ve been storing references to other tables in your JSONB:

{
  "related_posts": [12, 45, 67],
  "favorite_products": [234, 567],
  "team_members": [99, 102, 115]
}

Then someone deletes post 45. Or product 567. Your JSONB field? Still pointing to ghosts.

-- Can't do this:
ALTER TABLE user_profiles
ADD FOREIGN KEY (profile->'favorite_products')
REFERENCES products(id);
-- Error: syntax error
-- Your options:
-- 1. Write triggers to maintain integrity (complex, slow)
-- 2. Run periodic cleanup jobs (eventual consistency)
-- 3. Accept orphaned references (data corruption)
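If you land on option 2, the cleanup job is a recurring query along these lines — a sketch assuming a products table with an integer id and the favorite_products array shown above:

-- Find profile rows whose favorite_products array points at products
-- that no longer exist (candidates for cleanup or alerting).
SELECT up.user_id, ref.product_id
FROM user_profiles up
CROSS JOIN LATERAL jsonb_array_elements_text(up.profile->'favorite_products')
  AS ref(product_id)
LEFT JOIN products p ON p.id = ref.product_id::INTEGER
WHERE p.id IS NULL;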

Phil Karlton famously said there are only two hard problems in computer science: cache invalidation and naming things. He forgot to mention: maintaining referential integrity in JSONB columns.

Martin Kleppmann’s Designing Data-Intensive Applications (2017) addresses this directly: “Document databases sacrifice foreign keys and joins for flexibility… In practice, this means your application code must maintain referential integrity—a task databases handle far more reliably.”

Year 3: The Query Performance Cliff

Your data grew. What used to be instant now takes seconds:

-- Simple-looking query:
SELECT * FROM user_profiles
WHERE profile @> '{"preferences": {"notifications": true}}';
-- Execution time: 23 seconds on 5M rows
-- Why? GIN indexes work differently than you think

Here’s the thing about JSONB indexes: they’re amazing for existence checks (“does this key exist?”) and containment checks (“does this document contain these values?”). They’re terrible for range queries, sorting, aggregations, and joins on extracted values.

Every complex query becomes a table scan with a post-filter. Your database is working harder to give you worse performance.
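The usual escape hatch is a per-field expression index. It works, but notice what it costs: you’re back to one index per queried field, which is exactly the bookkeeping JSONB was supposed to spare you. A sketch, using a hypothetical numeric age key:

-- A GIN index can't help a range predicate on an extracted value:
SELECT user_id FROM user_profiles
WHERE (profile->>'age')::INTEGER BETWEEN 25 AND 35;

-- A per-field expression B-tree index fixes this one query, for this one field:
CREATE INDEX idx_profile_age
ON user_profiles (((profile->>'age')::INTEGER));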

Markus Winand, author of SQL Performance Explained (2012), puts it bluntly: “Putting multiple values in a single column is like putting multiple values in a single variable in your code. Yes, you can do it. No, you shouldn’t.”

The Real-World Numbers

Let’s look at actual query performance differences. These benchmarks are from our production PostgreSQL 15 instance (table with 2 million rows):

Query Performance: JSONB vs Regular Columns

| Query Type | JSONB Column | Regular Column | Difference |
| --- | --- | --- | --- |
| Exact match | 45ms | 0.8ms | 56x slower |
| Range query | 1,200ms | 12ms | 100x slower |
| Sort + limit | 2,800ms | 15ms | 187x slower |
| Aggregation | 3,400ms | 450ms | 7.5x slower |
| Join with another table | 8,900ms | 85ms | 105x slower |

None of these JSONB queries used indexes. With proper GIN indexes, the numbers improve:

| Query Type | JSONB (indexed) | Regular Column (indexed) | Difference |
| --- | --- | --- | --- |
| Exact match | 8ms | 0.8ms | 10x slower |
| Range query | 890ms | 12ms | 74x slower |
| Sort + limit | 2,100ms | 15ms | 140x slower |

Better, but still significantly worse. And you’re paying for those indexes:

Storage Overhead

| Metric | Regular Columns | JSONB Approach |
| --- | --- | --- |
| Base table size | 180 MB | 245 MB (+36%) |
| Index size | 95 MB | 412 MB (+334%) |
| Total | 275 MB | 657 MB (+139%) |

You’re paying more than double the storage for worse query performance.

Write Performance Impact

-- Benchmark: 100K inserts with 5 indexed fields
-- Regular columns:
-- Time: 28 seconds
-- Index maintenance: Distributed across 5 B-tree indexes
-- JSONB with GIN:
-- Time: 52 seconds (+86% slower)
-- Index maintenance: Single GIN index updating on every write

GIN indexes are expensive to maintain. Every write to a JSONB column potentially updates massive index structures.
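You can soften the write penalty, but not remove it. GIN’s fastupdate pending list batches index updates, trading steadier insert latency for periodic flush spikes; the settings below are illustrative, not recommendations:

-- fastupdate buffers new GIN entries in a pending list, flushed by VACUUM
-- or when gin_pending_list_limit (in kB) is exceeded.
CREATE INDEX idx_profile_gin ON user_profiles
USING gin (profile jsonb_path_ops)
WITH (fastupdate = on, gin_pending_list_limit = 4096);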

When JSONB Actually Makes Sense

I’m not saying never use JSONB. There are legitimate use cases:

1. Truly Unpredictable Data Structures

API responses from third-party services where you have zero control over the schema:

CREATE TABLE api_logs (
id BIGSERIAL PRIMARY KEY,
endpoint TEXT,
response JSONB, -- Unknown structure from external API
created_at TIMESTAMPTZ
);
-- You're not querying into this
-- You're just storing it for debugging/auditing

Key characteristic: you’re not querying into the JSONB, you’re just preserving it.

2. Sparse, Optional Attributes

Configuration or settings where 95% of records have null values:

-- Bad: 100 columns, 95% null
CREATE TABLE app_settings (
user_id INTEGER PRIMARY KEY,
dark_mode BOOLEAN,
color_scheme TEXT,
font_size INTEGER,
-- ... 97 more rarely-used settings
);
-- Better: Core settings + JSONB for sparse data
CREATE TABLE app_settings (
user_id INTEGER PRIMARY KEY,
dark_mode BOOLEAN, -- 98% of users set this
notifications BOOLEAN, -- 95% of users set this
advanced JSONB -- 5% of users set any of these
);

But here’s the critical distinction: sparse columns should stay sparse. If 50% of users start setting advanced.custom_theme, it’s time to promote that to a real column.
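That promotion decision doesn’t have to be a guess. A quick way to measure it against the app_settings table above:

-- How many users actually set each key under "advanced"?
-- Keys that show up in a large share of rows are candidates for real columns.
SELECT key,
       COUNT(*) AS users,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM app_settings), 1) AS pct
FROM app_settings,
     LATERAL jsonb_object_keys(advanced) AS key
GROUP BY key
ORDER BY users DESC;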

3. Audit Trails and Historical Snapshots

Storing a point-in-time snapshot of a complex object:

CREATE TABLE order_history (
id BIGSERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
snapshot JSONB, -- Entire order state at this moment
created_at TIMESTAMPTZ
);
-- This is historical data you're not querying into
-- It's for "show me what the order looked like on March 3rd"
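Retrieval here is by order and timestamp, not by digging into the document. Answering the March 3rd question looks something like this (the order id and date are made up):

-- Latest snapshot of order 42 as of the end of March 3rd:
SELECT snapshot
FROM order_history
WHERE order_id = 42
  AND created_at < DATE '2024-03-04'
ORDER BY created_at DESC
LIMIT 1;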

4. Rapid Prototyping (With a Plan to Normalize)

Yes, JSONB can be great for prototyping—if you actually follow through on normalizing later:

-- Phase 1: Prototype (acceptable for < 1 month)
CREATE TABLE features (
  id SERIAL PRIMARY KEY,
  data JSONB
);

-- Phase 2: Document actual schema
-- (Do this BEFORE moving to production)
{
  "name": "string",
  "enabled": "boolean",
  "settings": {
    "threshold": "number",
    "mode": "string"
  }
}

-- Phase 3: Normalize (before technical debt accumulates)
CREATE TABLE features (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  enabled BOOLEAN DEFAULT false,
  threshold INTEGER,
  mode TEXT
);

The key: set a deadline for normalization and actually meet it.

Escaping the JSONB Trap: A Migration Guide

So you’ve got a JSONB column that’s out of control. How do you fix it?

Step 1: Audit Your Current State

-- What keys exist?
SELECT DISTINCT jsonb_object_keys(profile) AS key
FROM user_profiles;

-- What's the distribution?
-- (Expand keys in FROM via LATERAL; set-returning functions aren't allowed in GROUP BY.)
SELECT key, COUNT(*) AS usage_count
FROM user_profiles,
     LATERAL jsonb_object_keys(profile) AS key
GROUP BY key
ORDER BY usage_count DESC;

-- What are the data types?
SELECT
  profile->>'email' AS email,
  jsonb_typeof(profile->'email') AS email_type
FROM user_profiles
LIMIT 100;
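It also pays to know which keys have drifted in type, since those are the ones that need an extraction function rather than a straight cast:

-- Keys whose values appear with more than one JSON type across rows:
SELECT key,
       jsonb_typeof(profile->key) AS value_type,
       COUNT(*) AS cnt
FROM user_profiles,
     LATERAL jsonb_object_keys(profile) AS key
GROUP BY key, value_type
ORDER BY key, cnt DESC;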

Document everything. You’ll need this for the migration.

Step 2: Identify High-Value Columns

Promote to regular columns anything that’s queried frequently, filtered on in WHERE clauses, joined against, or used for sorting and aggregation:

-- Add new typed columns
ALTER TABLE user_profiles
ADD COLUMN email TEXT,
ADD COLUMN location TEXT,
ADD COLUMN created_at TIMESTAMPTZ;

-- Create migration function to handle variants
CREATE OR REPLACE FUNCTION extract_email(data JSONB)
RETURNS TEXT AS $$
  SELECT COALESCE(
    data->>'email',
    data->>'user_email',
    data->'contact'->>'email',
    NULL
  )
$$ LANGUAGE SQL IMMUTABLE;

-- Backfill data
UPDATE user_profiles
SET
  email = extract_email(profile),
  location = profile->>'location',
  created_at = (profile->>'created_at')::TIMESTAMPTZ;

-- Add indexes
CREATE INDEX idx_email ON user_profiles(email);
CREATE INDEX idx_location ON user_profiles(location);
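One caveat on that backfill: a single UPDATE over millions of rows holds locks and generates bloat for its entire runtime. A batched variant (hypothetical batch size, shown for email only) that you rerun from a script until it touches zero rows:

-- Backfill in batches; repeat until no rows are updated.
WITH batch AS (
  SELECT user_id
  FROM user_profiles
  WHERE email IS NULL
    AND extract_email(profile) IS NOT NULL
  LIMIT 10000
  FOR UPDATE SKIP LOCKED
)
UPDATE user_profiles up
SET email = extract_email(up.profile)
FROM batch
WHERE up.user_id = batch.user_id;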

Step 3: Migrate Application Code Incrementally

Don’t change everything at once:

// Phase 1: Read from both, write to both
async function getUser(userId) {
  const row = await db.query(
    'SELECT id, email, profile FROM user_profiles WHERE id = $1',
    [userId]
  );
  return {
    id: row.id,
    email: row.email || row.profile?.email, // Fallback to JSONB while backfilling
    profile: row.profile
  };
}

// Phase 2: Read from columns, write to both
async function getUser(userId) {
  const row = await db.query(
    'SELECT id, email, location FROM user_profiles WHERE id = $1',
    [userId]
  );
  return {
    id: row.id,
    email: row.email,
    location: row.location
  };
}

// Phase 3: Once nothing reads the JSONB column, drop it (SQL):
// ALTER TABLE user_profiles DROP COLUMN profile;

Step 4: Clean Up JSONB Column

After extracting important fields, clean up the bloat:

-- Remove extracted keys from JSONB
UPDATE user_profiles
SET profile = profile - 'email' - 'location' - 'created_at';
-- Vacuum to reclaim space
VACUUM FULL user_profiles;
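Two notes: VACUUM FULL rewrites the table under an exclusive lock, so schedule it, and it’s worth measuring what the cleanup actually reclaimed:

-- Table + index footprint; compare before and after the cleanup.
SELECT pg_size_pretty(pg_total_relation_size('user_profiles')) AS total_size,
       pg_size_pretty(pg_indexes_size('user_profiles')) AS index_size;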

Step 5: Add Validation

Prevent future schema drift:

-- Application-level schema validation
CREATE OR REPLACE FUNCTION validate_profile_schema()
RETURNS TRIGGER AS $$
BEGIN
  -- Whitelist allowed keys
  IF EXISTS (
    SELECT 1 FROM jsonb_object_keys(NEW.profile) AS k
    WHERE k NOT IN ('preferences', 'metadata', 'custom_fields')
  ) THEN
    RAISE EXCEPTION 'Invalid keys in profile JSONB';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER validate_profile
BEFORE INSERT OR UPDATE ON user_profiles
FOR EACH ROW EXECUTE FUNCTION validate_profile_schema();

Lessons from the Field

After helping multiple teams migrate away from overused JSONB columns, here are the patterns I’ve seen:

1. JSONB is an escape hatch, not a strategy
If your first instinct is “throw it in JSONB,” ask why you can’t define a proper schema. Usually the answer is “we don’t want to think about it yet”—which is exactly when you should be thinking about it.

2. Schema-less is a lie
Your application always has a schema. JSONB just moves it from the database (where it’s documented and enforced) to your code (where it’s implicit and fragile).

3. Queries don’t lie
If you’re writing WHERE profile->>'field', that field should probably be a column. JSONB fields you query frequently aren’t “flexible metadata”—they’re poorly modeled data.

4. Set migration deadlines
“We’ll normalize it later” is worthless without a date. Set a deadline: “If this JSONB column still exists in 3 months, we migrate it.”

5. JSONB makes bad schema design easier
That’s not a feature. That’s a bug masquerading as flexibility.

The Uncomfortable Truth

JSONB is seductive because it lets you avoid difficult decisions about data modeling. But those decisions don’t go away—they just get deferred until they’re more expensive to fix.

The teams with the best database hygiene? They’re the ones who treat JSONB the way you should treat TypeScript’s any: a temporary marker for “I need to refactor this before shipping.”

Use JSONB when you genuinely need document storage. But be honest with yourself about whether you need it, or just want to skip the schema design work.

Your database is fighting entropy on your behalf. Every guardrail you disable—foreign keys, type checking, schema enforcement—is a wager that your application code will be more reliable than the database engine. That’s… rarely a good bet.


The bottom line: JSONB is a tool, not a substitute for data modeling. Use it surgically, not as a default. And when you’re tempted to throw something in a JSONB column “just for now,” ask yourself: what would you do if you couldn’t? The answer is probably the right long-term solution.

That JSONB column will outlive your tenure at the company. Design accordingly.
