Dan Levy's Avatar DanLevy.net

Quiz: Deep Postgres: Pt. 2

Are you SQL Steele? Or a SQL Softie?

Quiz: Deep Postgres: Pt. 2

Part 2 of 2. Back to Part 1

Welcome to Part 2 of my Deep Postres Quiz!

This quiz covers a mix of familiar and lesser-known PostgreSQL features and gotchas: JOIN, NULL, ANSI SQL, Transactions, Internals, Indexing, Data Sampling… and more!

Good luck! 🍀

1. 

Warmup: Joins

Which is not ❌ a valid PostgreSQL JOIN expression?

You knew about CROSS JOIN, right?

Did the JOIN ALL get you thinking about another operation, UNION ALL?

The answer here is JOIN ALL ❌.

The correct JOIN types are:

  • INNER JOIN (default JOIN)
  • LEFT JOIN (or LEFT OUTER JOIN)
  • RIGHT JOIN (or RIGHT OUTER JOIN)
  • FULL JOIN (or FULL OUTER JOIN)
  • CROSS JOIN (aka “The Self Join,” aka “The Lonely JOIN,” famously known-at-parties as the Cartesian product!)

2. 

Identity Columns

What is the SQL Standard compliant way to create an auto-incrementing primary key in PostgreSQL 10+?

We’re creating a table to track students:

CREATE TABLE students (
-- Which syntax should we use here? 🤔
name VARCHAR(100) NOT NULL
);

While SERIAL is still commonly used, the SQL Standard way is:

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY

This has several advantages over SERIAL:

  • It’s SQL Standard compliant
  • Better handling of sequence ownership
  • Clearer semantics around manual inserts
  • Works better with replication

SERIAL is actually just shorthand for creating a sequence and setting a default value, which can lead to issues with dumps and restores.

3. 

ANSI SQL

Which of these PostgreSQL features is NOT ❌ part of the ANSI SQL standard?

Examples of PostgreSQL extensions to ANSI SQL:

  • RETURNING clause.
  • ILIKE for case-insensitive matching.
  • Array types and operations.
  • JSON/JSONB types.

The original ANSI SQL standard includes:

  • WITH (Common Table Expressions).
  • LIKE for pattern matching.
  • LATERAL joins.
  • Window functions.

4. 

Transactions

Except for “Depends on side effects” What happens after an error?

BEGIN;
UPDATE students
SET status = 'alumni'
WHERE grade = 12;
UPDATE students
SET graduation_date = CURRENT_DATE
WHERE grade = 12;
UPDATE students
SET grade = 'GRADUATED'; -- 🚨 Error: type mismatch
COMMIT;

When an error occurs here:

  • The entire transaction is rolled back
  • No changes are committed
  • The transaction is marked as failed

To partially commit:

BEGIN;
UPDATE ...;
SAVEPOINT my_save;
UPDATE ...; -- Error
ROLLBACK TO my_save;
COMMIT;

5. 

SQL Internals

Which ON condition triggers a HASH JOIN in PostgreSQL?

SELECT *
FROM students a
JOIN grades b
ON /* What goes here? ⁉ */;

A hash join is triggered by an equality (=) condition in the ON clause.

  • = is the most common hash join trigger
  • > and >= trigger a merge join
  • HASH JOIN is not a valid syntax in PostgreSQL

Hash joins are efficient for large tables, but require enough memory to build the hash table.

6. 

Indexing

Which index is better for this query?

-- Query:
SELECT * FROM students
WHERE grade_level = 42
AND last_name LIKE 'Levy%';

Put the most selective column first in multi-column indexes. Selectivity is roughly referring to the order the columns appear in the query.

A B+Tree index can be used for:

  • Any subset of columns.
  • Right-hand-side wildcard searches (LIKE 'Levy%').

Consider query patterns and column cardinality when designing indexes!

7. 

Quoting 1

Which is correct for case-sensitive column names?

CREATE TABLE grades (
"Student Name" VARCHAR(100),
'Student Grade' INTEGER
);

In PostgreSQL:

  • Double quotes (") are for identifiers (table names, column names)
  • Single quotes (') are for string literals

Best practices:

  • Avoid mixed case or spaces in names
  • Use snake_case for identifiers
  • Only use quoted identifiers when absolutely necessary

8. 

Quoting 2

How do you specify a column name containing a period?

CREATE TABLE teachers (
id INT GENERATED BY DEFAULT AS IDENTITY,
first.name TEXT,
-- How do you include the column: first.name?
salary NUMERIC
);

PostgreSQL uses double quotes for identifier quoting:

  • Double quotes "..." for identifiers (column names, table names)
  • Single quotes '...' for string literals
  • The form first.name is interpreted as table_name.column_name

Without quotes, a period in a column name would be interpreted as a schema/table separator! Additionally, all unquoted identifiers are lower-cased by default.

In order to reduce surprises, it’s best to use snake_case and avoid using special characters.

9. 

Data Sampling

Which is valid syntax to randomly sample 10% of students?

This one is tricky! It features roughly 6 traps, some include:

  1. High level description + fragments of syntax. Tests how you adapt & assemble incomplete info.
  2. Requires familiar knowledge of TABLESAMPLE syntax, since none of the TABLESAMPLE options use the correct syntax! (Sinister! I know! Keep going, I believe in you!!! ❤️)
  3. The ROW_NUMBER() OVER (ORDER BY RAND()) and ORDER BY RANDOM() > 10 are more red herrings. They look like they could be correct, but they’re not. RAND() isn’t a thing. And RANDOM() > 10 is a boolean expression that will always be false since RANDOM() always is in the range 0.0-1.0. One more problem, the ORDER BY RANDOM() > 10 would also sort on a boolean value, not a random number. But that > 10 sure felt like it could be a percentage, right?
  4. BERNOULLI is valid, however BERNOULLI(0.10) is not what we want. It should be BERNOULLI(10) for 10% sample rate.

As for the TABLESAMPLE method, PostgreSQL supports two sampling methods:

  • BERNOULLI: Each row has equal probability
  • SYSTEM: Block-level sampling (faster but less random)

Example:

-- Sample 10% of rows
SELECT * FROM students TABLESAMPLE BERNOULLI (10);
-- For repeatable results, use REPEATABLE:
SELECT * FROM students
TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

Also worth mentioning, TABLESAMPLE has been reported to not be very random. If you need a more random sample, consider using ORDER BY RANDOM() LIMIT 10. But be aware that this can be slow on large tables.

There are also some third-party extensions that provide better random sampling methods, but that’s out of scope for this! 😅

10. 

Indexing

What does this incantation do?

CREATE INDEX idx_active_students
ON students(last_name)
WHERE status = 'active';

This trick is called a Partial Index.

They are useful for reducing disk use & speeding queries when only a subset of rows are typically queried.

Postgres will only use the index when the status = 'active' condition is present in the query.

Reminder: Postgres doesn’t support ‘query hints’ like other RDBMSs. The query planner will decide when to use any appropriate index(es).

11. 

NULL Behavior

How many rows will this return?

SELECT COUNT(*)
FROM students
WHERE grade = NULL;

This returns 0 because:

  1. NULL = NULL is never true
  2. Use IS NULL or IS NOT NULL to check for NULL values

Correct query would be:

SELECT COUNT(*)
FROM students
WHERE grade IS NULL;

12. 

Performance

What does EXPLAIN ANALYZE do for UPDATE statements?

EXPLAIN ANALYZE
UPDATE students
SET grade = grade + 1
WHERE status = 'active';

EXPLAIN ANALYZE actually executes the query! Be careful analyzing a write operation, specifically UPDATE, DELETE, and INSERT statements.

Best practices:

  • Use EXPLAIN (without ANALYZE) for UPDATE/DELETE/INSERT
  • Wrap modifications in transactions:
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Quiz Score:

Congrats! Quiz completed.

Well done! You went deep on several areas of PostgreSQL! 🐘

I hope you learned something new, or at least got a score to gloat about! 🏆

Want more thrills in life? Check out my Quiz Collection for endless* fun!

Edit on GitHubGitHub