Quiz: Deep Postgres: Pt. 2
Are you SQL Steele? Or a SQL Softie?
Part 2 of 2. Back to Part 1
Welcome to Part 2 of my Deep Postres Quiz!
This quiz covers a mix of familiar and lesser-known PostgreSQL features and gotchas: JOIN
, NULL
, ANSI SQL, Transactions, Internals, Indexing, Data Sampling… and more!
Good luck! 🍀
Warmup: Joins
Which is not ❌ a valid PostgreSQL JOIN expression?
You knew about CROSS JOIN
, right?
Did the JOIN ALL
get you thinking about another operation, UNION ALL
?
The answer here is JOIN ALL
❌.
The correct JOIN types are:
INNER JOIN
(default JOIN)LEFT JOIN
(orLEFT OUTER JOIN
)RIGHT JOIN
(orRIGHT OUTER JOIN
)FULL JOIN
(orFULL OUTER JOIN
)CROSS JOIN
(aka “The Self Join,” aka “The Lonely JOIN,” famously known-at-parties as theCartesian product
!)
Identity Columns
What is the SQL Standard compliant way to create an auto-incrementing primary key in PostgreSQL 10+?
We’re creating a table to track students:
While SERIAL
is still commonly used, the SQL Standard way is:
This has several advantages over SERIAL
:
- It’s SQL Standard compliant
- Better handling of sequence ownership
- Clearer semantics around manual inserts
- Works better with replication
SERIAL
is actually just shorthand for creating a sequence and setting a default value, which can lead to issues with dumps and restores.
ANSI SQL
Which of these PostgreSQL features is NOT ❌ part of the ANSI SQL standard?
Examples of PostgreSQL extensions to ANSI SQL:
RETURNING
clause.ILIKE
for case-insensitive matching.- Array types and operations.
- JSON/JSONB types.
The original ANSI SQL standard includes:
WITH
(Common Table Expressions).LIKE
for pattern matching.LATERAL
joins.- Window functions.
Transactions
Except for “Depends on side effects” What happens after an error?
When an error occurs here:
- The entire transaction is rolled back
- No changes are committed
- The transaction is marked as failed
To partially commit:
SQL Internals
Which ON
condition triggers a HASH JOIN
in PostgreSQL?
A hash join is triggered by an equality (=
) condition in the ON
clause.
=
is the most common hash join trigger>
and>=
trigger a merge joinHASH JOIN
is not a valid syntax in PostgreSQL
Hash joins are efficient for large tables, but require enough memory to build the hash table.
Indexing
Which index is better for this query?
Put the most selective column first in multi-column indexes. Selectivity
is roughly referring to the order the columns appear in the query.
A B+Tree index can be used for:
- Any subset of columns.
- Right-hand-side wildcard searches (
LIKE 'Levy%'
).
Consider query patterns and column cardinality when designing indexes!
Quoting 1
Which is correct for case-sensitive column names?
In PostgreSQL:
- Double quotes (
"
) are for identifiers (table names, column names) - Single quotes (
'
) are for string literals
Best practices:
- Avoid mixed case or spaces in names
- Use snake_case for identifiers
- Only use quoted identifiers when absolutely necessary
Quoting 2
How do you specify a column name containing a period?
PostgreSQL uses double quotes for identifier quoting:
- Double quotes
"..."
for identifiers (column names, table names) - Single quotes
'...'
for string literals - The form
first.name
is interpreted astable_name.column_name
Without quotes, a period in a column name would be interpreted as a schema/table separator! Additionally, all unquoted identifiers are lower-cased by default.
In order to reduce surprises, it’s best to use snake_case
and avoid using special characters.
Data Sampling
Which is valid syntax to randomly sample 10% of students?
This one is tricky! It features roughly 6 traps, some include:
- High level description + fragments of syntax. Tests how you adapt & assemble incomplete info.
- Requires familiar knowledge of
TABLESAMPLE
syntax, since none of theTABLESAMPLE
options use the correct syntax! (Sinister! I know! Keep going, I believe in you!!! ❤️) - The
ROW_NUMBER() OVER (ORDER BY RAND())
andORDER BY RANDOM() > 10
are more red herrings. They look like they could be correct, but they’re not.RAND()
isn’t a thing. AndRANDOM() > 10
is a boolean expression that will always be false sinceRANDOM()
always is in the range0.0-1.0
. One more problem, theORDER BY RANDOM() > 10
would also sort on a boolean value, not a random number. But that> 10
sure felt like it could be a percentage, right? BERNOULLI
is valid, howeverBERNOULLI(0.10)
is not what we want. It should beBERNOULLI(10)
for 10% sample rate.
As for the TABLESAMPLE
method, PostgreSQL supports two sampling methods:
BERNOULLI
: Each row has equal probabilitySYSTEM
: Block-level sampling (faster but less random)
Example:
Also worth mentioning, TABLESAMPLE
has been reported to not be very random. If you need a more random sample, consider using ORDER BY RANDOM() LIMIT 10
. But be aware that this can be slow on large tables.
There are also some third-party extensions that provide better random sampling methods, but that’s out of scope for this! 😅
Indexing
What does this incantation do?
This trick is called a Partial Index
.
They are useful for reducing disk use & speeding queries when only a subset of rows are typically queried.
Postgres will only use the index when the status = 'active'
condition is present in the query.
Reminder: Postgres doesn’t support ‘query hints’ like other RDBMSs. The query planner will decide when to use any appropriate index(es).
NULL Behavior
How many rows will this return?
This returns 0 because:
NULL = NULL
is never true- Use
IS NULL
orIS NOT NULL
to check for NULL values
Correct query would be:
Performance
What does EXPLAIN ANALYZE
do for UPDATE
statements?
EXPLAIN ANALYZE
actually executes the query! Be careful analyzing a write operation, specifically UPDATE
, DELETE
, and INSERT
statements.
Best practices:
- Use
EXPLAIN
(without ANALYZE) for UPDATE/DELETE/INSERT - Wrap modifications in transactions:
Well done! You went deep on several areas of PostgreSQL! 🐘
I hope you learned something new, or at least got a score to gloat about! 🏆
Want more thrills in life? Check out my Quiz Collection for endless* fun!