DanLevy.net

JSONB: The Best Way to Ruin Your Database

JSONB is powerful, useful, and very easy to misuse when you let a blob become your real schema.

PostgreSQL added JSONB to let you store semi-structured data without defining rigid schemas upfront. The idea was sound: sometimes you genuinely don’t know what the data will look like, or it changes too frequently for traditional columns to make sense.

That matters because JSONB is not a mistake. In plenty of systems it is the cleanest representation of the problem space. If you’re storing third-party webhook payloads, versioned event bodies, feature flags, or LLM configuration objects where every provider and model exposes a slightly different and constantly changing option set, forcing everything into first-class columns can be more awkward than helpful.

The problem is that JSONB is also the easiest way to defer schema decisions without admitting you’re deferring them. Somewhere between intention and implementation, it became the database equivalent of “I’ll clean my room later.” That temporary solution you reached for six months ago? It’s still there, and now production depends on it.

I keep seeing the same pattern. A team adds a JSONB column because they’re unsure about requirements. They promise themselves they’ll normalize it once things settle down. Three years later, that column contains forty different versions of what was supposed to be a user profile, queried by fifteen services that each make different assumptions about what’s inside.

The technical debt isn’t the JSONB itself. It’s the gap between what you told yourself you were building and what you actually built: an undocumented schema-on-read system.

What Usually Happens

You’re adding a feature and you’re not sure whether users need a twitter_handle or a bluesky_handle or something else entirely. Rather than think through the schema, you do this:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  profile JSONB
);

It works. You ship the feature, move on to the next one, then the next one. The JSONB column grows quietly in the background.

This is the fork in the road. If profile stays an opaque blob fetched by user.id, you’re probably fine. If it starts becoming the primary place business data lives, the tradeoffs change fast.

Product asks: “How many users are in New York?”

You write:

SELECT count(*) FROM users WHERE profile->>'location' = 'New York';

Postgres performs a full table scan. Every single row.

So you add a GIN index. Maybe that’s still acceptable. Sometimes it is. But now you’re paying real complexity and storage cost because a field that behaves like first-class relational data never became a first-class column.

Year 1: Schema Drift

A year in, the same column holds three generations of the same data: some rows use location, some use city, some nest it under address.

Your application code now looks like this:

const city = user.location || user.city || user.address?.city || "Unknown";

You didn’t remove the schema. You just moved validation and consistency checks from the database into scattered application code.
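If you're stuck with that drift for now, at least centralize it. A minimal sketch using the same hypothetical fields: one reader owns the fallback chain, instead of every call site repeating it.

```typescript
// Hypothetical shapes the profile blob has taken over time.
type RawProfile = {
  location?: string;
  city?: string;
  address?: { city?: string };
};

// Resolve the city across the three historical shapes, newest first.
// This is the one place in the codebase that knows the drift exists.
function readCity(profile: RawProfile): string {
  return profile.location ?? profile.city ?? profile.address?.city ?? "Unknown";
}

console.log(readCity({ location: "New York" })); // "New York"
console.log(readCity({ address: { city: "Austin" } })); // "Austin"
console.log(readCity({})); // "Unknown"
```

This doesn't fix the drift, but it makes the eventual promotion to a real column a one-file change instead of a fifteen-service hunt.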


When to Actually Use JSONB

JSONB has valid use cases. Often it's perfectly fine, and sometimes it's the best choice available.

The critical distinction is not “structured good, JSON bad.” It’s closer to this:

Legitimate JSONB Use Cases

  1. Webhook Payloads: You receive data from Stripe, Slack, or GitHub. You have zero control over the schema. You may never query it. You just need to store it for debugging or replay. Perfect for JSONB.

  2. Logging & Event Streams: Application logs, audit trails, error contexts. These are write-heavy, rarely queried by specific fields, and often analyzed in bulk or exported to analytics platforms. JSONB is fine here.

  3. User Preferences & Settings: Settings objects where you have 100+ boolean flags, most are false, and you’re always fetching the entire blob by user ID. You’re not running WHERE preferences->>'theme' = 'dark'. JSONB works.

  4. LLM Provider / Model Config: This is one of the clearest modern examples. OpenAI, Anthropic, Gemini, open-weight local models, and vendor-specific gateways all expose overlapping but different parameters. Even within one provider, model capabilities and option names evolve. A JSONB config blob is often much more honest than pretending temperature, top_p, reasoning_effort, json_schema, tool_choice, and twenty other knobs should all be universal columns. JSONB is often the right abstraction here.

  5. API Response Caching: You’re caching entire API responses. The database is just a faster Redis. You fetch by cache key, never by nested properties. JSONB is appropriate.

  6. Event Sourcing: You’re storing immutable event payloads. Your queries are always “give me all events for aggregate X” ordered by time. You never run WHERE clauses on event properties. JSONB fits.

  7. Extensibility Surfaces: Integrations, plugin settings, per-tenant overrides, marketplace metadata, provider capabilities, or “extras” fields where you explicitly expect the shape to vary by subtype. JSONB can be the right contract, not a compromise.

Rule of thumb: if the application fetches the document by a known key and understands how to validate/version it, JSONB can be excellent. If the business keeps asking relational questions about nested keys, those fields are trying to become columns.
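The preferences case (use case 3) works precisely because the app always reads the whole blob by key. A minimal sketch, with invented flag names: stored overrides are merged over code-level defaults, so the blob only ever holds what the user actually changed.

```typescript
// Hypothetical preferences shape: many flags, most absent in storage.
type Preferences = Record<string, boolean | string>;

// Defaults live in code, not in the database.
const DEFAULTS: Preferences = {
  theme: "light",
  compactMode: false,
  betaFeatures: false,
};

// The stored blob contains only the keys the user overrode;
// merging keeps reads cheap and the blob sparse.
function loadPreferences(stored: Preferences): Preferences {
  return { ...DEFAULTS, ...stored };
}

console.log(loadPreferences({ theme: "dark" }));
// { theme: "dark", compactMode: false, betaFeatures: false }
```

Note that nothing here runs a WHERE clause against a nested key, which is exactly why JSONB stays comfortable.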

The Best Pattern Is Often Hybrid

A lot of mature systems land here:

CREATE TABLE llm_requests (
  id UUID PRIMARY KEY,
  provider TEXT NOT NULL,
  model TEXT NOT NULL,
  status TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  config JSONB NOT NULL
);

This is usually better than either extreme.

That is not “failing to normalize.” That is drawing the line in the right place.
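The same boundary shows up in application types. A sketch, where the status values and example field contents are assumptions: the promoted columns get precise types, while config stays an open document that application code validates.

```typescript
// Application-side view of the hybrid row. Queryable fields are
// strongly typed; config is deliberately opaque to the database.
type LlmRequest = {
  id: string; // UUID
  provider: string;
  model: string;
  status: "queued" | "running" | "succeeded" | "failed"; // assumed states
  createdAt: string; // ISO form of TIMESTAMPTZ
  config: Record<string, unknown>; // provider-specific knobs live here
};

const req: LlmRequest = {
  id: "00000000-0000-0000-0000-000000000000",
  provider: "openai",
  model: "gpt-4o",
  status: "queued",
  createdAt: new Date().toISOString(),
  config: { temperature: 0.2, top_p: 0.9 }, // opaque to the database
};
```

Dashboards and filters hit the typed columns; only the code that talks to a specific provider reaches into config.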

At Scale: Object Versioning > Normalization

Here’s where it gets interesting. At sufficiently large scale, the “right” solution isn’t normalization—it’s object versioning.

If you have billions of rows and frequent schema evolution, migrating columns becomes expensive. Companies like Stripe, GitHub, and Netflix don’t normalize everything. Instead:

CREATE TABLE entities (
  id UUID PRIMARY KEY,
  version INT NOT NULL,
  data JSONB NOT NULL
);

Your application knows how to read version: 1, version: 2, version: 3. No database migrations for new fields. Code handles backward compatibility.
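A versioned reader is the piece that makes this safe. A sketch with invented shapes: each stored version is upgraded to one canonical shape at read time, so old rows never need a database migration.

```typescript
// Hypothetical document shapes across three schema versions.
type V1 = { version: 1; city: string };
type V2 = { version: 2; location: string };
type V3 = { version: 3; address: { city: string } };
type AnyVersion = V1 | V2 | V3;

// The one canonical shape the rest of the application consumes.
type Profile = { city: string };

// Upgrade any stored version to the canonical shape at read time.
// Adding version 4 means adding a case here, not running ALTER TABLE.
function readProfile(doc: AnyVersion): Profile {
  switch (doc.version) {
    case 1:
      return { city: doc.city };
    case 2:
      return { city: doc.location };
    case 3:
      return { city: doc.address.city };
  }
}
```

The difference between this and the Year 1 fallback chain is intent: versions are explicit, exhaustively handled, and upgraded in exactly one place.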

This is an architectural decision, not laziness. It trades database complexity for application complexity. Sometimes that’s exactly the right trade, especially when the document is naturally versioned and the app is the canonical interpreter.

The failure mode is not “using JSONB.” The failure mode is using JSONB without versioning, validation, promotion rules, or a clear boundary between document data and relational data.

The Questions That Actually Matter

Before you add a JSONB column, ask:

  1. Will we query nested fields in WHERE, JOIN, GROUP BY, or ORDER BY regularly?
  2. Do we control this schema, or is it externally defined and volatile?
  3. Is the shape intentionally heterogeneous across records?
  4. Do we have application-level validation and versioning?
  5. Which fields are likely to become operational dimensions later?

If the answer to #1 is “yes, constantly,” that’s a strong signal for columns.

If the answers to #2 and #3 are “yes,” JSONB is probably doing real work for you.


Escaping the Trap

If you are already in this hole, stop digging.

  1. Audit: Run jsonb_object_keys and inspect actual shape drift, not the shape you assume exists.
  2. Promote: Identify the fields you filter, join, sort, or report on most often. Make those real columns.
  3. Validate: Add application or database-level validation for whatever remains in JSONB.
  4. Version: If the blob is real domain data, version it explicitly.
  5. Trim: Remove duplicated keys from the blob once the promoted columns are established.

Don’t tell yourself every blob must be normalized. Also don’t tell yourself a blob with permanent business semantics is “temporary.”

JSONB is great when the document is genuinely document-shaped. It’s dangerous when it’s a relational schema wearing a fake mustache.
