Quiz: Deep Postgres: Pt. 2
Are you SQL Steele? Or a SQL Softie?
Part 2 of 2. Back to Part 1
Welcome to Part 2 of my Deep Postres Quiz!
This quiz covers a mix of familiar and lesser-known PostgreSQL features and gotchas: JOIN, NULL, ANSI SQL, Transactions, Internals, Indexing, Data Sampling… and more!
Good luck! 🍀
Which is not ❌ a valid PostgreSQL JOIN expression?
You knew about CROSS JOIN, right?
Did the JOIN ALL get you thinking about another operation, UNION ALL?
The answer here is JOIN ALL ❌.
The correct JOIN types are:
INNER JOIN(default JOIN)LEFT JOIN(orLEFT OUTER JOIN)RIGHT JOIN(orRIGHT OUTER JOIN)FULL JOIN(orFULL OUTER JOIN)CROSS JOIN(the Cartesian product, unless a laterWHEREclause filters it)
What is the (SQL Standard) syntax for an auto-incrementing IDENTITY column in PostgreSQL (v10+)?
While SERIAL is still commonly used, the SQL Standard way is:
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEYid INT IDENTITY(1,1)is SQL Server syntax.id INT AUTO_INCREMENTis MySQL syntax.id INTEGER AUTO_INCREMENT PRIMARY KEYis MySQL syntax.
This has several advantages over SERIAL:
- It’s SQL Standard compliant
- Better handling of sequence ownership
- Clearer semantics around manual inserts
- Works better with replication
SERIAL is actually just shorthand for creating a sequence and setting a default value, which can lead to issues with dumps and restores.
Which of these PostgreSQL features is NOT ❌ part of the ANSI SQL standard?
Examples of PostgreSQL extensions to ANSI SQL:
RETURNINGclause.ILIKEfor case-insensitive matching.- Array types and operations.
- JSON/JSONB types.
The original ANSI SQL standard includes:
WITH(Common Table Expressions).LIKEfor pattern matching.LATERALjoins.- Window functions.
Except for “Depends on side effects” What happens after an error?
BEGIN;UPDATE students SET status = 'alumni' WHERE grade = 12;UPDATE students SET graduation_date = CURRENT_DATE WHERE grade = 12;UPDATE students SET grade = 'GRADUATED'; -- 🚨 Error: type mismatchCOMMIT;When an error occurs here:
- The entire transaction is rolled back
- No changes are committed
- The transaction is marked as failed
To partially commit:
BEGIN;UPDATE ...;SAVEPOINT my_save;UPDATE ...; -- ErrorROLLBACK TO my_save;COMMIT;Which ON condition can PostgreSQL use for a HASH JOIN?
SELECT *FROM students aJOIN grades bON /* What goes here? ⁉ */;A hash join is possible with an equality (=) condition in the ON clause.
=is the key predicate shape for hash joins>and>=are not hash-join predicatesHASH JOINis not a valid syntax in PostgreSQL
The planner may still choose a nested loop or merge join depending on table size, statistics, indexes, and memory settings.
Which index is better for this query?
-- Query:SELECT * FROM studentsWHERE grade_level = 42 AND last_name IN ('Levy', 'Smith');To me, the surprising thing here is that a multi-column index can be used even when the column order in the query text doesn’t match the index definition.
Note: With B-tree indexes, the leading column matters. Since this query filters by grade_level and last_name, an index beginning with grade_level and then last_name is a good fit.
The faster SQL can select the fewest rows, the better the performance.
For example, if you have a task table, and you know that status is more selective than created_at, you might choose an index beginning with status for a query like status = 'done' AND created_at > '2024-01-01'. The order of conditions in the WHERE clause text is not what matters; the index column order and predicate shapes do.
A B+Tree index can be used for:
- The leading column, or a leftmost prefix of columns.
- Operators such as
=,>,<,BETWEEN, andIN, depending on the query shape. - Prefix matching,
LIKE 'prefix%'.
Thanks to u/mwdb2 for pointing out an earlier mistake in this explanation!
Which is correct for case-sensitive column names?
CREATE TABLE grades ( "Student Name" VARCHAR(100), 'Student Grade' INTEGER);In PostgreSQL:
- Double quotes (
") are for identifiers (table names, column names) - Single quotes (
') are for string literals
Best practices:
- Avoid mixed case or spaces in names
- Use snake_case for identifiers
- Only use quoted identifiers when absolutely necessary
How do you specify a column name containing a period?
CREATE TABLE teachers ( id INT GENERATED BY DEFAULT AS IDENTITY, first.name TEXT, -- How do you include the column: first.name? salary NUMERIC);PostgreSQL uses double quotes for identifier quoting:
- Double quotes
"..."for identifiers (column names, table names) - Single quotes
'...'for string literals - The form
first.nameis interpreted astable_name.column_name
Without quotes, a period in a column name would be interpreted as a schema/table separator! Additionally, all unquoted identifiers are lower-cased by default.
In order to reduce surprises, it’s best to use snake_case and avoid using special characters.
Which is valid syntax to randomly sample 10% of students?
This one is tricky! It features roughly 6 traps, some include:
- High level description + fragments of syntax. Tests how you adapt & assemble incomplete info.
- Requires familiar knowledge of
TABLESAMPLEsyntax, since none of theTABLESAMPLEoptions use the correct syntax! (Sinister! I know! Keep going, I believe in you!!! ❤️) - The
ROW_NUMBER() OVER (ORDER BY RAND())andORDER BY RANDOM() > 10are more red herrings. They look like they could be correct, but they’re not.RAND()isn’t a thing. AndRANDOM() > 10is a boolean expression that will always be false sinceRANDOM()always is in the range0.0-1.0. One more problem, theORDER BY RANDOM() > 10would also sort on a boolean value, not a random number. But that> 10sure felt like it could be a percentage, right? BERNOULLIis valid, howeverSAMPLETABLEis not.WHERE RANDOM() >= 0.1is a trick!RANDOM()returns a value in the range0.0 <= x < 1.0, so>= 0.1would return about 90% of the rows, not 10%.
As for the TABLESAMPLE method, PostgreSQL supports two sampling methods:
BERNOULLI: Each row has equal probabilitySYSTEM: Block-level sampling (faster but less random)
To learn more about sampling, check out this article from Render.
Example:
-- Sample 10% of rowsSELECT * FROM students TABLESAMPLE BERNOULLI (10);
-- For repeatable results, use REPEATABLE:SELECT * FROM studentsTABLESAMPLE BERNOULLI (10) REPEATABLE (42);Also worth mentioning, TABLESAMPLE has been reported to not be very random. If you need a more random fixed-size sample, consider using ORDER BY RANDOM() LIMIT 10. But be aware that this returns 10 rows, not 10%, and can be slow on large tables.
There are also some third-party extensions that provide better random sampling methods, but that’s out of scope for this! 😅
What does this incantation do?
CREATE INDEX idx_active_studentsON students(last_name)WHERE status = 'active';This trick is called a Partial Index.
They are useful for reducing disk use & speeding queries when only a subset of rows are typically queried.
Postgres will only use the index when the status = 'active' condition is present in the query.
Reminder: Postgres doesn’t support ‘query hints’ like other RDBMSs. The query planner will decide when to use any appropriate index(es).
How many rows will this return?
SELECT COUNT(*)FROM studentsWHERE grade = NULL;Ok, this was a little tricky. The gotcha is that x = NULL will never match any rows, and it will ALWAYS return one row with a count of 0.
Since NULL is an unknown value, it can’t be compared to anything, not even itself.
The correct way to check for NULL values is with IS NULL.
Example:
postgres=# SELECT count(*) FROM students WHERE age = null;count------- 0(1 row)
postgres=#postgres=# SELECT count(*) FROM students WHERE age is null;count------- 3(1 row)What does EXPLAIN ANALYZE do for UPDATE statements?
EXPLAIN ANALYZEUPDATE studentsSET grade = grade + 1WHERE status = 'active';EXPLAIN ANALYZE actually executes the query! Be careful analyzing a write operation, specifically UPDATE, DELETE, and INSERT statements. You may have to run cleanup operations after each EXPLAIN ANALYZE to get repeatable results.
Best practices:
- Use
EXPLAIN(without ANALYZE) for UPDATE/DELETE/INSERT - Wrap modifications in transactions:
BEGIN;EXPLAIN ANALYZE ...;ROLLBACK;Well done! You went deep on several areas of PostgreSQL! 🐘
I hope you learned something new, or at least got a score to gloat about! 🏆
Want more thrills in life? Check out my Quiz Collection for endless* fun!