Quiz: Deep Postgres: Pt. 2
Are you SQL Steele? Or a SQL Softie?
data:image/s3,"s3://crabby-images/c3279/c3279ad12bebc623854b20361e343648aebc0883" alt="Quiz: Deep Postgres: Pt. 2"
Part 2 of 2. Back to Part 1
Welcome to Part 2 of my Deep Postres Quiz!
This quiz covers a mix of familiar and lesser-known PostgreSQL features and gotchas: JOIN
, NULL
, ANSI SQL, Transactions, Internals, Indexing, Data Sampling… and more!
Good luck! 🍀
Which is not ❌ a valid PostgreSQL JOIN expression?
You knew about CROSS JOIN
, right?
Did the JOIN ALL
get you thinking about another operation, UNION ALL
?
The answer here is JOIN ALL
❌.
The correct JOIN types are:
INNER JOIN
(default JOIN)LEFT JOIN
(orLEFT OUTER JOIN
)RIGHT JOIN
(orRIGHT OUTER JOIN
)FULL JOIN
(orFULL OUTER JOIN
)CROSS JOIN
(aka “The Self Join,” aka “The Lonely JOIN,” famously known-at-parties as theCartesian product
!)
What is the (SQL Standard) syntax for an auto-incrementing IDENTITY column in PostgreSQL (v10+)?
While SERIAL
is still commonly used, the SQL Standard way is:
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
id INT IDENTITY(1,1)
is SQL Server syntax.id INT AUTO_INCREMENT
is MySQL syntax.id INTEGER AUTO_INCREMENT PRIMARY KEY
is MySQL syntax.
This has several advantages over SERIAL
:
- It’s SQL Standard compliant
- Better handling of sequence ownership
- Clearer semantics around manual inserts
- Works better with replication
SERIAL
is actually just shorthand for creating a sequence and setting a default value, which can lead to issues with dumps and restores.
Which of these PostgreSQL features is NOT ❌ part of the ANSI SQL standard?
Examples of PostgreSQL extensions to ANSI SQL:
RETURNING
clause.ILIKE
for case-insensitive matching.- Array types and operations.
- JSON/JSONB types.
The original ANSI SQL standard includes:
WITH
(Common Table Expressions).LIKE
for pattern matching.LATERAL
joins.- Window functions.
Except for “Depends on side effects” What happens after an error?
BEGIN;UPDATE students SET status = 'alumni' WHERE grade = 12;UPDATE students SET graduation_date = CURRENT_DATE WHERE grade = 12;UPDATE students SET grade = 'GRADUATED'; -- 🚨 Error: type mismatchCOMMIT;
When an error occurs here:
- The entire transaction is rolled back
- No changes are committed
- The transaction is marked as failed
To partially commit:
BEGIN;UPDATE ...;SAVEPOINT my_save;UPDATE ...; -- ErrorROLLBACK TO my_save;COMMIT;
Which ON
condition triggers a HASH JOIN
in PostgreSQL?
SELECT *FROM students aJOIN grades bON /* What goes here? ⁉ */;
A hash join is triggered by an equality (=
) condition in the ON
clause.
=
is the most common hash join trigger>
and>=
trigger a merge joinHASH JOIN
is not a valid syntax in PostgreSQL
Hash joins are efficient for large tables, but require enough memory to build the hash table.
Which index is better for this query?
-- Query:SELECT * FROM studentsWHERE grade_level = 42 AND last_name IN ('Levy', 'Smith');
To me, the surprising thing here is that a multi-column index can be used for queries that only use a subset of the columns; and that the order of columns in the query DO NOT need to match the index definition!
Note: For faster queries, put the most selective column(s) first. In this case, grade_level
is more selective than last_name
- meaning it will likely filter out more rows.
The faster SQL can select the fewest rows, the better the performance.
For example, if you have a task
table, and you know that status
is more selective than created_at
, you should put status
first in the query. This will make querying status = 'done' AND created_at > '2024-01-01'
faster than querying created_at > '2024-01-01' AND status = 'done'
.
A B+Tree index can be used for:
- Any subset of columns,
=
,>
,<
,BETWEEN
,IN
. - Prefix matching,
LIKE 'prefix%'
.
Thanks to u/mwdb2 for pointing out an earlier mistake in this explanation!
Which is correct for case-sensitive column names?
CREATE TABLE grades ( "Student Name" VARCHAR(100), 'Student Grade' INTEGER);
In PostgreSQL:
- Double quotes (
"
) are for identifiers (table names, column names) - Single quotes (
'
) are for string literals
Best practices:
- Avoid mixed case or spaces in names
- Use snake_case for identifiers
- Only use quoted identifiers when absolutely necessary
How do you specify a column name containing a period?
CREATE TABLE teachers ( id INT GENERATED BY DEFAULT AS IDENTITY, first.name TEXT, -- How do you include the column: first.name? salary NUMERIC);
PostgreSQL uses double quotes for identifier quoting:
- Double quotes
"..."
for identifiers (column names, table names) - Single quotes
'...'
for string literals - The form
first.name
is interpreted astable_name.column_name
Without quotes, a period in a column name would be interpreted as a schema/table separator! Additionally, all unquoted identifiers are lower-cased by default.
In order to reduce surprises, it’s best to use snake_case
and avoid using special characters.
Which is valid syntax to randomly sample 10% of students?
This one is tricky! It features roughly 6 traps, some include:
- High level description + fragments of syntax. Tests how you adapt & assemble incomplete info.
- Requires familiar knowledge of
TABLESAMPLE
syntax, since none of theTABLESAMPLE
options use the correct syntax! (Sinister! I know! Keep going, I believe in you!!! ❤️) - The
ROW_NUMBER() OVER (ORDER BY RAND())
andORDER BY RANDOM() > 10
are more red herrings. They look like they could be correct, but they’re not.RAND()
isn’t a thing. AndRANDOM() > 10
is a boolean expression that will always be false sinceRANDOM()
always is in the range0.0-1.0
. One more problem, theORDER BY RANDOM() > 10
would also sort on a boolean value, not a random number. But that> 10
sure felt like it could be a percentage, right? BERNOULLI
is valid, howeverSAMPLETABLE
is not.WHERE RANDOM() >= 0.1
is a trick!RANDOM()
returns a value between 0 and 1, so>= 0.1
would return 90% of the rows, not 10%.
As for the TABLESAMPLE
method, PostgreSQL supports two sampling methods:
BERNOULLI
: Each row has equal probabilitySYSTEM
: Block-level sampling (faster but less random)
To learn more about sampling, check out this article from Render.
Example:
-- Sample 10% of rowsSELECT * FROM students TABLESAMPLE BERNOULLI (0.1);
-- For repeatable results, use REPEATABLE:SELECT * FROM studentsTABLESAMPLE BERNOULLI (10) REPEATABLE (42);
Also worth mentioning, TABLESAMPLE
has been reported to not be very random. If you need a more random sample, consider using ORDER BY RANDOM() LIMIT 10
. But be aware that this can be slow on large tables.
There are also some third-party extensions that provide better random sampling methods, but that’s out of scope for this! 😅
What does this incantation do?
CREATE INDEX idx_active_studentsON students(last_name)WHERE status = 'active';
This trick is called a Partial Index
.
They are useful for reducing disk use & speeding queries when only a subset of rows are typically queried.
Postgres will only use the index when the status = 'active'
condition is present in the query.
Reminder: Postgres doesn’t support ‘query hints’ like other RDBMSs. The query planner will decide when to use any appropriate index(es).
How many rows will this return?
SELECT COUNT(*)FROM studentsWHERE grade = NULL;
Ok, this was a little tricky. The gotcha is that x = NULL
will never match any rows, and it will ALWAYS return one row with a count of 0
.
Since NULL
is an unknown value, it can’t be compared to anything, not even itself.
The correct way to check for NULL
values is with IS NULL
.
Example:
postgres=# SELECT count(*) FROM students WHERE age = null;count------- 0(1 row)
postgres=#postgres=# SELECT count(*) FROM students WHERE age is null;count------- 3(1 row)
What does EXPLAIN ANALYZE
do for UPDATE
statements?
EXPLAIN ANALYZEUPDATE studentsSET grade = grade + 1WHERE status = 'active';
EXPLAIN ANALYZE
actually executes the query! Be careful analyzing a write operation, specifically UPDATE
, DELETE
, and INSERT
statements. You may have to run cleanup operations after each EXPLAIN ANALYZE
to get repeatable results.
Best practices:
- Use
EXPLAIN
(without ANALYZE) for UPDATE/DELETE/INSERT - Wrap modifications in transactions:
BEGIN;EXPLAIN ANALYZE ...;ROLLBACK;
Well done! You went deep on several areas of PostgreSQL! 🐘
I hope you learned something new, or at least got a score to gloat about! 🏆
Want more thrills in life? Check out my Quiz Collection for endless* fun!