The JSONB Seduction
How your 'temporary' schema-less field became permanent tech debt.
“We’ll just throw it in a JSONB column for now. We can normalize it later when the schema stabilizes.”
Famous last words. I’ve heard this dozens of times. I’ve said it dozens of times. And you know how many times “later” actually came? About 15% of the time.
The other 85%? That “temporary” JSONB column is still there three years later, now storing 40 different variants of the same data structure, queried by 15 different services with competing assumptions about what fields exist, none of them documented, all of them critical to production.
JSONB is one of PostgreSQL’s most seductive features. It promises flexibility without sacrifice: keep your relational guarantees, but opt out of schema rigidity when you need to. The catch? You almost never need that flexibility as much as you think you do, and when you reach for it anyway, you’re usually creating future problems to solve present laziness.
For MySQL and SQL Server folks: PostgreSQL’s JSONB has equivalents in other databases. MySQL has a native JSON type (since 5.7), and SQL Server has JSON functions (since 2016). The performance characteristics differ: MySQL’s JSON is stored in a binary format but with different indexing options; SQL Server stores JSON as text without native indexing (you index through computed columns). The tradeoffs discussed here apply broadly, but the specific benchmark numbers are PostgreSQL-centric.
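For SQL Server specifically, that computed-column pattern looks roughly like the sketch below. This is T-SQL, not PostgreSQL; the table and column names are hypothetical, and JSON_VALUE requires SQL Server 2016 or later.

```sql
-- SQL Server (T-SQL), not PostgreSQL: surface a JSON property as a computed
-- column, then index that column to get seekable access to the JSON field.
ALTER TABLE user_profiles
    ADD profile_location AS JSON_VALUE(profile, '$.location');

CREATE INDEX idx_profile_location
    ON user_profiles (profile_location);
```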
Let’s talk about the JSONB seduction, what it costs you, and how to escape when you’ve gone too far.
The Appeal: Why JSONB Feels So Right
I get it. I really do. You’re building a feature with uncertain requirements. Your product manager keeps changing their mind. Your API is evolving. Your data model isn’t stable yet. JSONB feels like the obvious solution.
```sql
-- Instead of this nightmare:
CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY,
    avatar_url TEXT,
    bio TEXT,
    location TEXT,
    website TEXT,
    twitter_handle TEXT,
    linkedin_url TEXT,
    github_username TEXT,
    preferred_language TEXT,
    timezone TEXT,
    notification_email BOOLEAN,
    notification_sms BOOLEAN
    -- ... 47 more columns that keep changing
);

-- You do this:
CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY,
    profile JSONB  -- Everything goes here!
);
```

Clean. Simple. Flexible. Ship it!
And for version 1, it works beautifully. You iterate fast. Product is happy. Engineering is happy. Users are getting features shipped quickly.
Then reality sets in.
When JSONB Goes Wrong: A Timeline
Month 3: The First Performance Complaint
“Hey, why is our user search so slow?”
```sql
-- Innocent-looking query:
SELECT * FROM user_profiles
WHERE profile->>'location' = 'New York';

-- Actual execution: FULL TABLE SCAN
-- Index on JSONB? You forgot to add it.
-- And now you're scanning millions of documents.
```

Sure, you can index JSONB fields:

```sql
CREATE INDEX idx_profile_gin ON user_profiles USING gin (profile jsonb_path_ops);

-- Or for specific fields:
CREATE INDEX idx_location ON user_profiles ((profile->>'location'));
```

But here’s the thing: you need a separate index for every field you want to query efficiently. That “one flexible column” just cost you a dozen expression indexes instead of a handful of well-chosen columns.
And GIN indexes aren’t free:
| Operation | Regular B-tree Index | GIN Index on JSONB |
|---|---|---|
| Index size | 100% of column size | 200-400% of column size |
| Insert overhead | Low | Moderate-High |
| Update overhead | Low | Very High |
| Query performance | Excellent | Good (with correct ops) |
You just tripled your index storage costs for slower performance. Congrats.
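If you want to see what those overheads look like on your own data, PostgreSQL’s built-in size functions report it directly. The names below reuse the example index and table from this section; substitute your own.

```sql
-- Compare the on-disk size of the GIN index against the table itself.
-- pg_relation_size() takes the index name; pg_table_size() includes TOAST.
SELECT pg_size_pretty(pg_relation_size('idx_profile_gin')) AS gin_index_size,
       pg_size_pretty(pg_table_size('user_profiles'))      AS table_size;
```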
Month 6: The Schema Drift
Product shipped three features. Each one added fields to that JSONB column. Now you have:
```
// User A (old schema):
{
  "location": "New York",
  "bio": "Developer"
}

// User B (after a refactor):
{
  "profile_location": "San Francisco",  // Note: different key
  "biography": "Designer",              // Note: different key
  "avatar_url": "https://..."
}

// User C (after the third feature):
{
  "location": {                         // Now it's nested
    "city": "Austin",
    "country": "US"
  },
  "bio": "Engineer",
  "preferences": {
    "notifications": true
  }
}
```

Same column. Three different schemas. None documented. All queries now need defensive code:
```sql
-- The horror:
SELECT user_id,
       COALESCE(
         profile->>'location',
         profile->>'profile_location',
         profile->'location'->>'city',
         'Unknown'
       ) AS location
FROM user_profiles;
```

As Sehrope Sarkuni, Postgres expert and author of multiple JSONB extensions, warns: “JSONB is schema-less at the database level, but your application still has a schema. You’ve just moved schema validation from the database to every query in your application.”
You didn’t eliminate schema complexity. You decentralized it.
Month 12: The Migration That Never Happens
Someone finally proposes normalizing the data:
```sql
-- The planned migration:
ALTER TABLE user_profiles
  ADD COLUMN location TEXT,
  ADD COLUMN bio TEXT,
  ADD COLUMN avatar_url TEXT;

UPDATE user_profiles
SET location   = profile->>'location',
    bio        = profile->>'bio',
    avatar_url = profile->>'avatar_url';

ALTER TABLE user_profiles DROP COLUMN profile;
```

Except now you have:
- 847,392 rows to migrate
- 15 different services with hardcoded JSONB queries
- 3 mobile app versions in the wild expecting JSONB responses
- No bandwidth to coordinate the migration across teams
The migration gets postponed. Then postponed again. Then someone leaves the company and everyone forgets it was even planned.
Three years later, that JSONB column is still there, still growing, still causing problems.
Year 2: The Referential Integrity Nightmare
You’ve been storing references to other tables in your JSONB:
{ "related_posts": [12, 45, 67], "favorite_products": [234, 567], "team_members": [99, 102, 115]}Then someone deletes post 45. Or product 567. Your JSONB field? Still pointing to ghosts.
-- Can't do this:ALTER TABLE user_profiles ADD FOREIGN KEY (profile->'favorite_products') REFERENCES products(id);-- Error: syntax error
-- Your options:-- 1. Write triggers to maintain integrity (complex, slow)-- 2. Run periodic cleanup jobs (eventual consistency)-- 3. Accept orphaned references (data corruption)Phil Karlton famously said there are only two hard problems in computer science: cache invalidation and naming things. He forgot to mention: maintaining referential integrity in JSONB columns.
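To make option 1 concrete, here is a minimal sketch of what such a trigger might look like, assuming the favorite_products key and products table from the example above. Note that it only validates writes to user_profiles; a DELETE on products would still leave ghosts behind unless you add a mirror-image trigger or cleanup job there too, which is exactly the bookkeeping a plain foreign key gives you for free.

```sql
-- A write-time guard for JSONB references (option 1 above).
-- Assumes profile->'favorite_products' is a JSON array of product ids when present.
CREATE OR REPLACE FUNCTION check_favorite_products()
RETURNS TRIGGER AS $$
BEGIN
  IF EXISTS (
    SELECT 1
    FROM jsonb_array_elements_text(NEW.profile->'favorite_products') AS fav(id)
    LEFT JOIN products p ON p.id = fav.id::INTEGER
    WHERE p.id IS NULL
  ) THEN
    RAISE EXCEPTION 'profile references a product that does not exist';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_favorites
  BEFORE INSERT OR UPDATE ON user_profiles
  FOR EACH ROW EXECUTE FUNCTION check_favorite_products();
```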
Martin Kleppmann’s Designing Data-Intensive Applications (2017) addresses this directly: “Document databases sacrifice foreign keys and joins for flexibility… In practice, this means your application code must maintain referential integrity—a task databases handle far more reliably.”
Year 3: The Query Performance Cliff
Your data grew. What used to be instant now takes seconds:
```sql
-- Simple-looking query:
SELECT * FROM user_profiles
WHERE profile @> '{"preferences": {"notifications": true}}';

-- Execution time: 23 seconds on 5M rows
-- Why? GIN indexes work differently than you think
```

Here’s the thing about JSONB indexes: they’re amazing for existence checks (“does this key exist?”) and containment checks (“does this document contain these values?”). They’re terrible for:
- Range queries (WHERE (profile->>'age')::int > 25)
- Sorting (ORDER BY profile->>'created_at')
- Aggregate functions (AVG((profile->>'rating')::numeric))
- Joins with other tables
Every complex query becomes a table scan with a post-filter. Your database is working harder to give you worse performance.
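As a rough illustration, reusing the user_profiles examples above and assuming a default jsonb_ops GIN index on the profile column:

```sql
-- These can use the GIN index: containment and key-existence operators.
SELECT * FROM user_profiles WHERE profile @> '{"location": "New York"}';
SELECT * FROM user_profiles WHERE profile ? 'avatar_url';

-- These cannot; expect a sequential scan (or add dedicated expression indexes).
SELECT * FROM user_profiles WHERE (profile->>'age')::int > 25;
SELECT * FROM user_profiles ORDER BY profile->>'created_at' DESC LIMIT 10;
```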
Markus Winand, author of SQL Performance Explained (2012), puts it bluntly: “Putting multiple values in a single column is like putting multiple values in a single variable in your code. Yes, you can do it. No, you shouldn’t.”
The Real-World Numbers
Let’s look at actual query performance differences. These benchmarks are from our production PostgreSQL 15 instance (table with 2 million rows):
Query Performance: JSONB vs Regular Columns
| Query Type | JSONB Column | Regular Column | Difference |
|---|---|---|---|
| Exact match | 45ms | 0.8ms | 56x slower |
| Range query | 1,200ms | 12ms | 100x slower |
| Sort + limit | 2,800ms | 15ms | 187x slower |
| Aggregation | 3,400ms | 450ms | 7.5x slower |
| Join with another table | 8,900ms | 85ms | 105x slower |
None of these JSONB queries used indexes. With proper GIN indexes, the numbers improve:
| Query Type | JSONB (indexed) | Regular Column (indexed) | Difference |
|---|---|---|---|
| Exact match | 8ms | 0.8ms | 10x slower |
| Range query | 890ms | 12ms | 74x slower |
| Sort + limit | 2,100ms | 15ms | 140x slower |
Better, but still significantly worse. And you’re paying for those indexes:
Storage Overhead
| Metric | Regular Columns | JSONB Approach |
|---|---|---|
| Base table size | 180 MB | 245 MB (+36%) |
| Index size | 95 MB | 412 MB (+334%) |
| Total | 275 MB | 657 MB (+139%) |
You’re paying more than double the storage for worse query performance.
Write Performance Impact
```sql
-- Benchmark: 100K inserts with 5 indexed fields

-- Regular columns:
-- Time: 28 seconds
-- Index maintenance: Distributed across 5 B-tree indexes

-- JSONB with GIN:
-- Time: 52 seconds (+86% slower)
-- Index maintenance: Single GIN index updating on every write
```

GIN indexes are expensive to maintain. Every write to a JSONB column potentially updates massive index structures.
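Your absolute numbers will differ, but the shape of the comparison is easy to reproduce in psql with \timing on. A minimal sketch using hypothetical bench tables:

```sql
-- JSONB variant: one GIN index over the whole document.
CREATE TABLE bench_jsonb (id BIGSERIAL PRIMARY KEY, doc JSONB);
CREATE INDEX ON bench_jsonb USING gin (doc);

INSERT INTO bench_jsonb (doc)
SELECT jsonb_build_object(
         'email',    'user' || g || '@example.com',
         'location', 'City ' || (g % 100),
         'age',      g % 80
       )
FROM generate_series(1, 100000) AS g;

-- Regular-column variant: the same data in typed columns with B-tree indexes.
CREATE TABLE bench_columns (
  id BIGSERIAL PRIMARY KEY,
  email TEXT,
  location TEXT,
  age INTEGER
);
CREATE INDEX ON bench_columns (email);
CREATE INDEX ON bench_columns (location);
CREATE INDEX ON bench_columns (age);

INSERT INTO bench_columns (email, location, age)
SELECT 'user' || g || '@example.com', 'City ' || (g % 100), g % 80
FROM generate_series(1, 100000) AS g;
```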
When JSONB Actually Makes Sense
I’m not saying never use JSONB. There are legitimate use cases:
1. Truly Unpredictable Data Structures
API responses from third-party services where you have zero control over the schema:
```sql
CREATE TABLE api_logs (
    id BIGSERIAL PRIMARY KEY,
    endpoint TEXT,
    response JSONB,  -- Unknown structure from external API
    created_at TIMESTAMPTZ
);

-- You're not querying into this
-- You're just storing it for debugging/auditing
```

Key characteristic: you’re not querying into the JSONB, you’re just preserving it.
2. Sparse, Optional Attributes
Configuration or settings where 95% of records have null values:
```sql
-- Bad: 100 columns, 95% null
CREATE TABLE app_settings (
    user_id INTEGER PRIMARY KEY,
    dark_mode BOOLEAN,
    color_scheme TEXT,
    font_size INTEGER
    -- ... 97 more rarely-used settings
);

-- Better: Core settings + JSONB for sparse data
CREATE TABLE app_settings (
    user_id INTEGER PRIMARY KEY,
    dark_mode BOOLEAN,      -- 98% of users set this
    notifications BOOLEAN,  -- 95% of users set this
    advanced JSONB          -- 5% of users set any of these
);
```

But here’s the critical distinction: sparse columns should stay sparse. If 50% of users start setting advanced.custom_theme, it’s time to promote that to a real column.
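A quick way to keep yourself honest about that threshold (custom_theme here is just an example key):

```sql
-- What fraction of rows actually set this key?
SELECT round(
         100.0 * count(*) FILTER (WHERE advanced ? 'custom_theme')
               / NULLIF(count(*), 0),
         1
       ) AS pct_with_custom_theme
FROM app_settings;
```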
3. Audit Trails and Historical Snapshots
Storing a point-in-time snapshot of a complex object:
```sql
CREATE TABLE order_history (
    id BIGSERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    snapshot JSONB,  -- Entire order state at this moment
    created_at TIMESTAMPTZ
);

-- This is historical data you're not querying into
-- It's for "show me what the order looked like on March 3rd"
```

4. Rapid Prototyping (With a Plan to Normalize)
Yes, JSONB can be great for prototyping—if you actually follow through on normalizing later:
```sql
-- Phase 1: Prototype (acceptable for < 1 month)
CREATE TABLE features (
    id SERIAL PRIMARY KEY,
    data JSONB
);
```

```
-- Phase 2: Document actual schema
-- (Do this BEFORE moving to production)
{
  "name": "string",
  "enabled": "boolean",
  "settings": {
    "threshold": "number",
    "mode": "string"
  }
}
```

```sql
-- Phase 3: Normalize (before technical debt accumulates)
CREATE TABLE features (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    enabled BOOLEAN DEFAULT false,
    threshold INTEGER,
    mode TEXT
);
```

The key: set a deadline for normalization and actually meet it.
Escaping the JSONB Trap: A Migration Guide
So you’ve got a JSONB column that’s out of control. How do you fix it?
Step 1: Audit Your Current State
```sql
-- What keys exist?
SELECT DISTINCT jsonb_object_keys(profile) AS key
FROM user_profiles;

-- What's the distribution?
-- (jsonb_object_keys is a set-returning function, so it belongs in FROM when grouping)
SELECT k.key, COUNT(*) AS usage_count
FROM user_profiles,
     LATERAL jsonb_object_keys(profile) AS k(key)
GROUP BY k.key
ORDER BY usage_count DESC;

-- What are the data types?
SELECT profile->>'email' AS email,
       jsonb_typeof(profile->'email') AS email_type
FROM user_profiles
LIMIT 100;
```

Document everything. You’ll need this for the migration.
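It’s also worth checking how consistent each field’s type actually is across rows; email here is just one of the keys the first query surfaced:

```sql
-- Rows where 'email' is a string vs. a nested object vs. missing entirely (NULL)
SELECT jsonb_typeof(profile->'email') AS email_type,
       COUNT(*) AS row_count
FROM user_profiles
GROUP BY 1
ORDER BY 2 DESC;
```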
Step 2: Identify High-Value Columns
Promote to regular columns anything that’s:
- Used in WHERE clauses frequently
- Used in ORDER BY or JOIN operations
- Referenced by more than 50% of rows
- Consistently typed across rows
```sql
-- Add new typed columns
ALTER TABLE user_profiles
  ADD COLUMN email TEXT,
  ADD COLUMN location TEXT,
  ADD COLUMN created_at TIMESTAMPTZ;

-- Create migration function to handle variants
CREATE OR REPLACE FUNCTION extract_email(data JSONB)
RETURNS TEXT AS $$
  SELECT COALESCE(
    data->>'email',
    data->>'user_email',
    data->'contact'->>'email',
    NULL
  )
$$ LANGUAGE SQL IMMUTABLE;

-- Backfill data
UPDATE user_profiles
SET email      = extract_email(profile),
    location   = profile->>'location',
    created_at = (profile->>'created_at')::TIMESTAMPTZ;

-- Add indexes
CREATE INDEX idx_email ON user_profiles(email);
CREATE INDEX idx_location ON user_profiles(location);
```

Step 3: Migrate Application Code Incrementally
Don’t change everything at once:
```javascript
// Phase 1: Read from both, write to both
async function getUser(userId) {
  const { rows: [row] } = await db.query(
    'SELECT id, email, profile FROM user_profiles WHERE id = $1',
    [userId]
  );

  return {
    id: row.id,
    email: row.email || row.profile?.email, // Fallback to the JSONB value
    profile: row.profile
  };
}

// Phase 2: Read from columns, write to both
async function getUser(userId) {
  const { rows: [row] } = await db.query(
    'SELECT id, email, location FROM user_profiles WHERE id = $1',
    [userId]
  );

  return {
    id: row.id,
    email: row.email,
    location: row.location
  };
}
```

```sql
-- Phase 3: Remove JSONB column entirely
ALTER TABLE user_profiles DROP COLUMN profile;
```

Step 4: Clean Up JSONB Column
After extracting important fields, clean up the bloat:
```sql
-- Remove extracted keys from JSONB
UPDATE user_profiles
SET profile = profile - 'email' - 'location' - 'created_at';

-- Vacuum to reclaim space
-- (Note: VACUUM FULL rewrites the table under an exclusive lock; schedule a maintenance window)
VACUUM FULL user_profiles;
```

Step 5: Add Validation
Prevent future schema drift:
```sql
-- Database-side schema validation for the JSONB column
CREATE OR REPLACE FUNCTION validate_profile_schema()
RETURNS TRIGGER AS $$
BEGIN
  -- Whitelist allowed keys
  IF EXISTS (
    SELECT 1
    FROM jsonb_object_keys(NEW.profile) AS k
    WHERE k NOT IN ('preferences', 'metadata', 'custom_fields')
  ) THEN
    RAISE EXCEPTION 'Invalid keys in profile JSONB';
  END IF;

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER validate_profile
  BEFORE INSERT OR UPDATE ON user_profiles
  FOR EACH ROW EXECUTE FUNCTION validate_profile_schema();
```

Lessons from the Field
After helping multiple teams migrate away from overused JSONB columns, here are the patterns I’ve seen:
1. JSONB is an escape hatch, not a strategy
If your first instinct is “throw it in JSONB,” ask why you can’t define a proper schema. Usually the answer is “we don’t want to think about it yet”—which is exactly when you should be thinking about it.
2. Schema-less is a lie
Your application always has a schema. JSONB just moves it from the database (where it’s documented and enforced) to your code (where it’s implicit and fragile).
3. Queries don’t lie
If you’re writing WHERE profile->>'field', that field should probably be a column. JSONB fields you query frequently aren’t “flexible metadata”—they’re poorly modeled data.
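One way to ground that judgment in data is to ask the database which queries actually hit the JSONB column. This assumes the pg_stat_statements extension is enabled (column names as of PostgreSQL 13+), and the LIKE pattern is just a crude filter for JSONB accessors:

```sql
-- Which statements pull fields out of the profile column most often?
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%profile->>%'
ORDER BY calls DESC
LIMIT 20;
```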
4. Set migration deadlines
“We’ll normalize it later” is worthless without a date. Set a deadline: “If this JSONB column still exists in 3 months, we migrate it.”
5. JSONB makes bad schema design easier
That’s not a feature. That’s a bug masquerading as flexibility.
The Uncomfortable Truth
JSONB is seductive because it lets you avoid difficult decisions about data modeling. But those decisions don’t go away—they just get deferred until they’re more expensive to fix.
The teams with the best database hygiene? They’re the ones who view JSONB the way you should view the any type in TypeScript: a temporary marker for “I need to refactor this before shipping.”
Use JSONB when you genuinely need document storage. But be honest with yourself about whether you need it, or just want to skip the schema design work.
Your database is fighting entropy on your behalf. Every guardrail you disable—foreign keys, type checking, schema enforcement—is a wager that your application code will be more reliable than the database engine. That’s… rarely a good bet.
Further Reading
- PostgreSQL Documentation: JSON Types and JSON Functions
- Markus Winand: SQL Performance Explained (2012), Section on JSON and document columns
- Martin Kleppmann: Designing Data-Intensive Applications (2017), Chapter 2 on data models
- Bruce Momjian: “Postgres JSONB: The Triumph of Flexibility Over Rigidity” (2021) - A balanced look at JSONB tradeoffs
- Sehrope Sarkuni: Numerous PostgreSQL/JSONB blog posts and Stack Overflow answers
- Brandur Leach: “Using JSONB in PostgreSQL: How to effectively store and query JSON data” - Real-world patterns
- PostgreSQL Wiki: JSONB Performance
The bottom line: JSONB is a tool, not a substitute for data modeling. Use it surgically, not as a default. And when you’re tempted to throw something in a JSONB column “just for now,” ask yourself: what would you do if you couldn’t? The answer is probably the right long-term solution.
That JSONB column will outlive your tenure at the company. Design accordingly.



