SQL Interview Essentials for Product and Data Roles
SQL interviews are less about syntax trivia and more about structured thinking. Interviewers want to see that you can translate an ambiguous business question into a correct metric definition, a clean query plan, and a sanity-checked result. This guide focuses on the patterns that appear repeatedly across product analytics, data engineering, and data science screens — with a practical framework for avoiding the most common failure mode: wrong joins and wrong grain.
What are the highest-frequency SQL patterns in interviews?
Across roles, the same building blocks come back again and again: joins, aggregation, CTEs, window functions, and time-based logic. Interviewers pick these because they test correctness under ambiguity, not memorization.
A strong answer is usually a composition of 2–3 patterns: a CTE to define a clean base table, a join/aggregation to compute the metric, and a window function to rank or compute time deltas. A sketch of this composition follows the list below.
- Joins + aggregation: revenue per user, orders per region, conversion rate by channel
- Top-N per group: most recent event per user, highest spend per category
- Time windows: 7-day rolling average, week-over-week deltas, cohort retention
- De-duplication: pick first/last row, remove duplicates, handle late-arriving data
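Here is a minimal sketch of that composition. The schema is hypothetical, not from any specific problem: users(user_id, region) and orders(order_id, user_id, amount, created_at).

```sql
-- Hypothetical schema: users(user_id, region), orders(order_id, user_id, amount, created_at)
WITH user_revenue AS (      -- CTE: clean base table at the user grain
    SELECT
        u.user_id,
        u.region,
        COALESCE(SUM(o.amount), 0) AS total_revenue
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id, u.region
)
SELECT
    user_id,
    region,
    total_revenue,
    -- window function: rank users by revenue within their region
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
FROM user_revenue;
```

The CTE fixes the grain at one row per user before the window function runs, which is the structure most interview answers reduce to.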
How do I avoid the #1 SQL interview mistake (wrong grain)?
Grain is what one row represents in a dataset (one user, one order, one event, one session). Most “mysteriously wrong” SQL answers come from mixing grains — for example, joining users (1 row/user) to orders (many rows/user) and then counting users without de-duplicating.
A reliable approach is: define grain first, then define metric, then write the query. Say it out loud in the interview: “I’m going to compute this at the user grain” or “This intermediate result is at the order grain.” Interviewers love this because it prevents accidental double counting. A minimal example of the pre-aggregation fix follows the checklist below.
- Check 1: What is the grain of each table I’m touching?
- Check 2: After this join, did I multiply rows (is the relationship one-to-many or many-to-many)?
- Check 3: Should I aggregate before joining, or join before aggregating?
- Check 4: Do I need DISTINCT, ROW_NUMBER, or a pre-aggregation CTE?
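A sketch of Checks 3 and 4 in action, again assuming the hypothetical users and orders tables above: aggregating orders to the user grain before joining avoids the row multiplication that double counts revenue.

```sql
-- The trap: joining users (1 row/user) to orders (many rows/user) and then
-- counting or summing multiplies rows for anyone with several orders.
-- The fix (Check 3): aggregate orders to the user grain before joining.
WITH orders_per_user AS (   -- exactly one row per user_id
    SELECT user_id, SUM(amount) AS total_spend
    FROM orders
    GROUP BY user_id
)
SELECT
    u.region,
    COUNT(*) AS users,                               -- safe: still one row per user
    SUM(COALESCE(opu.total_spend, 0)) AS region_revenue
FROM users u
LEFT JOIN orders_per_user opu ON opu.user_id = u.user_id
GROUP BY u.region;
```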
When should I use GROUP BY vs window functions?
GROUP BY collapses rows into summaries. Window functions keep the original rows and add analytics columns. The trick is knowing which shape you need downstream.
If you need “top 1 order per user” and also need order details, window functions are usually the cleanest approach. If you only need “total orders per user,” GROUP BY is simpler. Both shapes are sketched after the list below.
- Use GROUP BY when you want fewer rows (summaries).
- Use window functions when you need rankings, running totals, or “pick one row” logic without losing columns.
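The two shapes, sketched against a hypothetical orders(order_id, user_id, amount) table:

```sql
-- Summary shape: GROUP BY collapses to one row per user
SELECT user_id, COUNT(*) AS total_orders
FROM orders
GROUP BY user_id;

-- Detail shape: keep the full order row, pick the top order per user
WITH ranked AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.amount DESC) AS rn
    FROM orders o
)
SELECT *
FROM ranked
WHERE rn = 1;   -- one row per user, with all order columns intact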
Which window functions should I master first (and what are they used for)?
Window functions are one of the highest-signal SQL topics because they test whether you can express complex logic cleanly. Start with a small set and practice them in patterns.
Common uses include: picking the most recent row per entity, calculating deltas between events, and computing running totals without collapsing the dataset (sketched after the list below).
- ROW_NUMBER(): pick latest row per user, de-dupe by ordering
- RANK() vs DENSE_RANK(): how ties are handled (RANK() leaves gaps in the sequence after ties, DENSE_RANK() does not)
- LAG()/LEAD(): compare current row to previous/next (time deltas)
- SUM() OVER (PARTITION BY ... ORDER BY ...): running totals / cumulative metrics
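A short sketch of LAG() and a running count over a hypothetical events(user_id, event_time) table; the timestamp subtraction is Postgres-flavored, and other dialects would use a date-diff function instead.

```sql
-- Hypothetical events(user_id, event_time), Postgres-style timestamp arithmetic
SELECT
    user_id,
    event_time,
    -- LAG(): gap since this user's previous event
    event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS time_since_prev,
    -- cumulative count of this user's events up to and including this row
    COUNT(*) OVER (PARTITION BY user_id ORDER BY event_time) AS running_event_count
FROM events;
```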
How do I solve cohort retention questions without getting lost?
Retention questions feel hard because the data spans many time periods. Reduce complexity by creating two clean tables: (1) a cohort assignment table (user -> cohort_week) and (2) an activity table (user -> activity_week). Then join them and compute week offsets (a sketch follows the steps below).
The biggest pitfall is cohort leakage: accidentally including users who weren’t in the cohort, or counting activity outside the intended window.
- Step 1: Define cohort anchor event (signup, first purchase, first session).
- Step 2: Define activity event (login, purchase, message, session start).
- Step 3: Convert timestamps to week/month buckets consistently.
- Step 4: Compute week_offset and aggregate retention by offset.
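Putting the four steps together, here is one way the query could look, assuming a single hypothetical events(user_id, event_type, event_time) table and Postgres-style date functions:

```sql
-- Hypothetical events(user_id, event_type, event_time), Postgres-style date functions
WITH cohorts AS (            -- one row per user: user -> cohort_week
    SELECT user_id, DATE_TRUNC('week', MIN(event_time)) AS cohort_week
    FROM events
    WHERE event_type = 'signup'
    GROUP BY user_id
),
activity AS (                -- one row per user per active week
    SELECT DISTINCT user_id, DATE_TRUNC('week', event_time) AS activity_week
    FROM events
)
SELECT
    c.cohort_week,
    (a.activity_week::date - c.cohort_week::date) / 7 AS week_offset,
    COUNT(DISTINCT a.user_id) AS retained_users
FROM cohorts c
JOIN activity a
  ON a.user_id = c.user_id
 AND a.activity_week >= c.cohort_week   -- no activity before the cohort anchor (prevents leakage)
GROUP BY 1, 2
ORDER BY 1, 2;
```

To turn counts into retention rates, divide retained_users by each cohort’s size (the week_offset = 0 row, or a separate count over the cohorts CTE).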
How do I debug a SQL query like an engineer (not a student)?
In interviews, being correct matters — but demonstrating a debugging method matters too. When you get stuck, don’t thrash: validate in small steps.
A simple workflow: validate row counts, validate key uniqueness, validate join multipliers, then validate metric math. This mirrors how you’d debug a production analytics issue; each check below can be run as a tiny query (examples after the list).
- Sanity-check counts before and after joins (did rows explode?)
- Check key uniqueness with COUNT(*) vs COUNT(DISTINCT key)
- Inspect a few sample entities (one user, one order_id) end-to-end
- Validate null behavior: COALESCE, NULLIF, filtering before aggregation
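A sketch of the first three checks against the same hypothetical users and orders tables (the sample user_id is made up):

```sql
-- Is order_id actually unique? (key uniqueness)
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT order_id) AS distinct_keys
FROM orders;

-- Did the join multiply rows? Compare counts before and after.
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM users u JOIN orders o ON o.user_id = u.user_id;

-- Trace one sample entity end-to-end before trusting the aggregate.
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at;
```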
What should I practice to get interview-ready in 2 weeks?
Practice beats reading. Focus on small drills that target common patterns. Aim for correctness and explanation, not speed on day 1.
A good practice loop: attempt → compare against expected shape → fix → write a 3-line debrief of the mistake.
- Drill 1: Top 1 row per user by timestamp (ROW_NUMBER + filter via CTE)
- Drill 2: 7-day rolling average (window frame + ORDER BY date; sketched after this list)
- Drill 3: Cohort retention by week (cohort CTE + activity CTE + offsets)
- Drill 4: Revenue per user without double counting (pre-aggregate orders then join)
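For example, Drill 2 reduces to a single window function with an explicit frame, assuming a hypothetical daily_revenue(revenue_date, revenue) table with one row per day:

```sql
-- Hypothetical daily_revenue(revenue_date, revenue), one row per calendar day
SELECT
    revenue_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY revenue_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- this day plus the 6 previous rows
    ) AS rolling_7d_avg
FROM daily_revenue
ORDER BY revenue_date;
```

If days can be missing, the frame covers the last 7 rows rather than the last 7 calendar days, so you would join to a date spine first.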
Final Takeaway
Strong SQL candidates are precise communicators. Define the metric, define the grain, then build the query with clean intermediate steps. If you can consistently avoid grain mistakes, rank/filter correctly with window functions, and validate results with sanity checks, you’ll outperform candidates with “more SQL trivia” almost every time.