Quiz: Deep Postgres: Pt. 2

Are you SQL Steele? Or a SQL Softie?

Part 2 of 2. Back to Part 1

Welcome to Part 2 of my Deep Postres Quiz!

This quiz covers a mix of familiar and lesser-known PostgreSQL features and gotchas: JOIN, NULL, ANSI SQL, Transactions, Internals, Indexing, Data Sampling… and more!

Good luck! 🍀

Warmup: Joins

Which is not ❌ a valid PostgreSQL JOIN expression?

You knew about CROSS JOIN, right?

Did the JOIN ALL get you thinking about another operation, UNION ALL?

The answer here is JOIN ALL ❌.

The correct JOIN types are:

INNER JOIN (default JOIN)
LEFT JOIN (or LEFT OUTER JOIN)
RIGHT JOIN (or RIGHT OUTER JOIN)
FULL JOIN (or FULL OUTER JOIN)
CROSS JOIN (aka “The Self Join,” aka “The Lonely JOIN,” famously known-at-parties as the Cartesian product!)

JOIN ALL INNER JOIN CROSS JOIN LEFT OUTER JOIN FULL OUTER JOIN

Identity Columns

What is the (SQL Standard) syntax for an auto-incrementing IDENTITY column in PostgreSQL (v10+)?

While SERIAL is still commonly used, the SQL Standard way is:

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY

id INT IDENTITY(1,1) is SQL Server syntax.
id INT AUTO_INCREMENT is MySQL syntax.
id INTEGER AUTO_INCREMENT PRIMARY KEY is MySQL syntax.

This has several advantages over SERIAL:

It’s SQL Standard compliant
Better handling of sequence ownership
Clearer semantics around manual inserts
Works better with replication

SERIAL is actually just shorthand for creating a sequence and setting a default value, which can lead to issues with dumps and restores.

id SERIAL PRIMARY KEY id INT IDENTITY(1,1)id INT AUTO_INCREMENT id NUMBER GENERATED ALWAYS AS IDENTITY id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY

ANSI SQL

Which of these PostgreSQL features is NOT ❌ part of the ANSI SQL standard?

Examples of PostgreSQL extensions to ANSI SQL:

RETURNING clause.
ILIKE for case-insensitive matching.
Array types and operations.
JSON/JSONB types.

The original ANSI SQL standard includes:

WITH (Common Table Expressions).
LIKE for pattern matching.
LATERAL joins.
Window functions.

INTO WITH LATERAL RETURNING

Transactions

Except for “Depends on side effects” What happens after an error?

BEGIN;
UPDATE students
  SET status = 'alumni'
  WHERE grade = 12;
UPDATE students
  SET graduation_date = CURRENT_DATE
  WHERE grade = 12;
UPDATE students
  SET grade = 'GRADUATED'; -- 🚨 Error: type mismatch
COMMIT;

When an error occurs here:

The entire transaction is rolled back
No changes are committed
The transaction is marked as failed

To partially commit:

BEGIN;
UPDATE ...;
SAVEPOINT my_save;
UPDATE ...; -- Error
ROLLBACK TO my_save;
COMMIT;

Depends on side effects Rolls back all statements Commits first statement only Commits all successful statements

SQL Internals

Which ON condition triggers a HASH JOIN in PostgreSQL?

SELECT *
FROM students a
JOIN grades b
ON /* What goes here? ⁉ */;

A hash join is triggered by an equality (=) condition in the ON clause.

= is the most common hash join trigger
> and >= trigger a merge join
HASH JOIN is not a valid syntax in PostgreSQL

Hash joins are efficient for large tables, but require enough memory to build the hash table.

ON a.id = b.id ON a.id > b.id ON a.id >= b.id HASH JOIN ON a.id = b.id ...

Indexing

Which index is better for this query?

-- Query:
SELECT * FROM students
WHERE grade_level = 42
  AND last_name IN ('Levy', 'Smith');

To me, the surprising thing here is that a multi-column index can be used for queries that only use a subset of the columns; and that the order of columns in the query DO NOT need to match the index definition!

Note: For faster queries, put the most selective column(s) first. In this case, grade_level is more selective than last_name - meaning it will likely filter out more rows.

The faster SQL can select the fewest rows, the better the performance.

For example, if you have a task table, and you know that status is more selective than created_at, you should put status first in the query. This will make querying status = 'done' AND created_at > '2024-01-01' faster than querying created_at > '2024-01-01' AND status = 'done'.

A B+Tree index can be used for:

Any subset of columns, =, >, <, BETWEEN, IN.
Prefix matching, LIKE 'prefix%'.

Thanks to u/mwdb2 for pointing out an earlier mistake in this explanation!

CREATE INDEX ON students.grade_level; CREATE ... students.last_name;CREATE INDEX ON students(grade_level);CREATE INDEX ON students(last_name, age);CREATE INDEX ON students(first_name, last_name, grade_level);

Quoting 1

Which is correct for case-sensitive column names?

CREATE TABLE grades (
  "Student Name" VARCHAR(100),
  'Student Grade' INTEGER
);

In PostgreSQL:

Double quotes (") are for identifiers (table names, column names)
Single quotes (') are for string literals

Best practices:

Avoid mixed case or spaces in names
Use snake_case for identifiers
Only use quoted identifiers when absolutely necessary

Both work the same Double quotes are required Single quotes are required Syntax error: Invalid user literal All names MUST be lowercase

Quoting 2

How do you specify a column name containing a period?

CREATE TABLE teachers (
  id INT GENERATED BY DEFAULT AS IDENTITY,
  first.name TEXT,
  -- How do you include the column: first.name?
  salary NUMERIC
);

PostgreSQL uses double quotes for identifier quoting:

Double quotes "..." for identifiers (column names, table names)
Single quotes '...' for string literals
The form first.name is interpreted as table_name.column_name

Without quotes, a period in a column name would be interpreted as a schema/table separator! Additionally, all unquoted identifiers are lower-cased by default.

In order to reduce surprises, it’s best to use snake_case and avoid using special characters.

first.name "first.name"'first.name'[first.name][first].[name]

Data Sampling

Which is valid syntax to randomly sample 10% of students?

This one is tricky! It features roughly 6 traps, some include:

High level description + fragments of syntax. Tests how you adapt & assemble incomplete info.
Requires familiar knowledge of TABLESAMPLE syntax, since none of the TABLESAMPLE options use the correct syntax! (Sinister! I know! Keep going, I believe in you!!! ❤️)
The ROW_NUMBER() OVER (ORDER BY RAND()) and ORDER BY RANDOM() > 10 are more red herrings. They look like they could be correct, but they’re not. RAND() isn’t a thing. And RANDOM() > 10 is a boolean expression that will always be false since RANDOM() always is in the range 0.0-1.0. One more problem, the ORDER BY RANDOM() > 10 would also sort on a boolean value, not a random number. But that > 10 sure felt like it could be a percentage, right?
BERNOULLI is valid, however SAMPLETABLE is not.
WHERE RANDOM() >= 0.1 is a trick! RANDOM() returns a value between 0 and 1, so >= 0.1 would return 90% of the rows, not 10%.

As for the TABLESAMPLE method, PostgreSQL supports two sampling methods:

BERNOULLI: Each row has equal probability
SYSTEM: Block-level sampling (faster but less random)

To learn more about sampling, check out this article from Render.

Example:

-- Sample 10% of rows
SELECT * FROM students TABLESAMPLE BERNOULLI (0.1);

-- For repeatable results, use REPEATABLE:
SELECT * FROM students
TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

Also worth mentioning, TABLESAMPLE has been reported to not be very random. If you need a more random sample, consider using ORDER BY RANDOM() LIMIT 10. But be aware that this can be slow on large tables.

There are also some third-party extensions that provide better random sampling methods, but that’s out of scope for this! 😅

...students SAMPLETABLE BERNOULLI (0.10)...students TABLESAMPLE USER (10)...students TABLESAMPLE RANDOM (10)...students SAMPLE 10 PERCENT ROW_NUMBER() OVER (ORDER BY RAND())ORDER BY RANDOM() >= 10 WHERE RANDOM() >= 0.1 WHERE RANDOM() >= 0.9

10.

Indexing

What does this incantation do?

CREATE INDEX idx_active_students
ON students(last_name)
WHERE status = 'active';

This trick is called a Partial Index.

They are useful for reducing disk use & speeding queries when only a subset of rows are typically queried.

Postgres will only use the index when the status = 'active' condition is present in the query.

Reminder: Postgres doesn’t support ‘query hints’ like other RDBMSs. The query planner will decide when to use any appropriate index(es).

Invalid syntax Creates duplicate indexes Only indexes active students Indexes active students first Provides a hint to the query planner

11.

NULL Behavior

How many rows will this return?

SELECT COUNT(*)
FROM students
WHERE grade = NULL;

Ok, this was a little tricky. The gotcha is that x = NULL will never match any rows, and it will ALWAYS return one row with a count of 0.

Since NULL is an unknown value, it can’t be compared to anything, not even itself.

The correct way to check for NULL values is with IS NULL.

Example:

postgres=# SELECT count(*)
  FROM students
  WHERE age = null;
count
-------
    0
(1 row)

postgres=#
postgres=# SELECT count(*)
  FROM students
  WHERE age is null;
count
-------
    3
(1 row)

100 90 10 1 0

12.

Performance

What does EXPLAIN ANALYZE do for UPDATE statements?

EXPLAIN ANALYZE
UPDATE students
SET grade = grade + 1
WHERE status = 'active';

EXPLAIN ANALYZE actually executes the query! Be careful analyzing a write operation, specifically UPDATE, DELETE, and INSERT statements. You may have to run cleanup operations after each EXPLAIN ANALYZE to get repeatable results.

Best practices:

Use EXPLAIN (without ANALYZE) for UPDATE/DELETE/INSERT
Wrap modifications in transactions:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Lite plan estimate Only works on SELECT Modifies data & emits plan Simulates execution to get a plan

Quiz Score:

Congrats! Quiz completed.

All Quizzes

Well done! You went deep on several areas of PostgreSQL! 🐘

I hope you learned something new, or at least got a score to gloat about! 🏆

Want more thrills in life? Check out my Quiz Collection for endless* fun!

DanLevy.net

Quiz: Deep Postgres: Pt. 2

Are you SQL Steele? Or a SQL Softie?

Warmup: Joins

Identity Columns

ANSI SQL

Transactions

SQL Internals

Indexing

Quoting 1

Quoting 2

Data Sampling

Indexing

NULL Behavior

Performance

Congrats! Quiz completed.

Read More Shit, by Dan Levy ✨

One weird trick to speed up feature teams!

Quiz: Do You know Modern CSS? (for 2025)

How to lose a billion dollars super fast

The Unassuming Power of Multiple-Choice Questions