This article covers practical, experience-based SQL Scenario-Based Questions for 2025. It is written with the interview setting in mind to give you the most support in your preparation. Work through these SQL scenario-based questions to the end, as every scenario has its own importance and learning value.
Disclaimer:
These solutions are based on my experience and best effort. Actual results may vary depending on your setup, and the code may need some tweaking.
1) You inherit a reporting query on a 200M-row Orders table that runs for minutes. How do you decide between adding an index, partitioning, or pre-aggregating?
- I start by profiling the access pattern: filters, joins, and grouping columns to see if a targeted composite index would cut most of the cost (see the sketch after this list).
- If the workload is range-based on date or region, I weigh table partitioning to prune scans and simplify maintenance windows.
- When analysts repeat the same rollups, I consider pre-aggregating into summary tables or materialized views to avoid recomputing heavy group-bys.
- I estimate cardinality and selectivity to ensure any new index actually helps rather than bloating writes.
- I check write rates; if inserts/updates are high, too many indexes or tiny partitions can hurt DML performance.
- I validate expected gains with an execution plan and a realistic A/B test on a production-like copy.
- I choose the lightest change that meets the SLA and document the trade-offs for future growth.
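A minimal sketch of how that index-versus-pre-aggregation decision might be validated, assuming a hypothetical Orders table with order_date, region, and amount columns; the syntax is PostgreSQL-flavored, and other engines expose equivalent plan and index tools:

```sql
-- Inspect the estimated plan first (EXPLAIN ANALYZE would actually execute the query).
EXPLAIN
SELECT region, SUM(amount) AS total_sales
FROM   Orders
WHERE  order_date >= DATE '2025-01-01'
GROUP  BY region;

-- If the plan shows a full scan driven by the date filter, a composite index on the
-- filter column (leading) plus the grouping column may remove most of the cost.
CREATE INDEX idx_orders_date_region
    ON Orders (order_date, region)
    INCLUDE (amount);   -- covering column; supported in PostgreSQL 11+ / SQL Server
```

If the same rollup is still heavy after indexing, that is usually the point where a summary table or materialized view earns its keep.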
2) A product team wants “real-time” dashboards, but your OLTP database is slowing down. How do you balance freshness vs load?
- I clarify “real-time” in seconds/minutes and set a freshness SLA that the business accepts.
- I propose read replicas or a change-data-capture feed so analytics don’t hit the primary.
- I batch micro-loads (e.g., every 1–5 minutes) into a warehouse, trading tiny latency for big stability.
- I cache hot metrics in materialized views that refresh incrementally instead of full rebuilds.
- I rate-limit ad-hoc queries and add guardrails like query timeouts on BI tools.
- I publish a lightweight semantic layer so analysts reuse efficient, vetted queries.
- I monitor replica lag and alert if dashboards drift beyond the agreed freshness window.
3) Your nightly ETL that loads dimension tables occasionally duplicates rows. How do you prevent and detect duplicate keys?
- I define natural vs surrogate keys clearly and enforce unique constraints where possible.
- I build idempotent merges (upsert logic) so reruns don’t create second copies, as sketched after this list.
- I add staging checksums or hash keys to detect incoming dupes before the final merge.
- I maintain a quarantine table to store and audit rejects for quick triage.
- I schedule data-quality rules (row counts, distinct counts) with thresholds and alerts.
- I keep a rerun-safe ETL design: immutable staging and deterministic merge conditions.
- I report duplicate trends to source owners to fix the problem at the origin.
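A minimal sketch of an idempotent dimension merge, assuming a hypothetical staging table stg_customer and a target dim_customer keyed on the natural key customer_id; the MERGE syntax shown works on SQL Server and PostgreSQL 15+, and other engines have close equivalents:

```sql
-- Rerunning this statement does not create duplicates: matched keys are updated in
-- place and only genuinely new keys are inserted. A unique constraint on
-- dim_customer(customer_id) acts as the final guardrail. Assumes the staging feed
-- carries one version per key after the DISTINCT.
MERGE INTO dim_customer AS tgt
USING (
    SELECT DISTINCT customer_id, customer_name, tier   -- drop exact duplicate rows first
    FROM   stg_customer
) AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET customer_name = src.customer_name,
               tier          = src.tier
WHEN NOT MATCHED THEN
    INSERT (customer_id, customer_name, tier)
    VALUES (src.customer_id, src.customer_name, src.tier);
```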
4) You’re asked to “speed up everything” in a legacy database with unknown hot paths. What’s your first 1–2 weeks plan?
- I enable lightweight query logging/top-N capture to find the true heavy hitters.
- I baseline key KPIs: P95 latency, QPS, CPU, memory, IO, and lock wait times.
- I group costly queries by patterns (filters/joins) and target fixes that benefit many.
- I check that statistics are fresh, and rebuild or drop only the indexes that are genuinely stale or harmful.
- I separate OLTP vs reporting traffic and add read scaling if needed.
- I adopt a change budget: make small, reversible changes with before/after metrics.
- I publish a simple performance scorecard so stakeholders see progress and trade-offs.
5) A vendor suggests adding many indexes to “fix slow queries.” What risks do you explain before agreeing?
- Extra indexes speed reads but slow writes; high-volume inserts can degrade badly.
- Index bloat consumes storage, memory, and backup time, raising overall costs.
- Too many overlapping indexes confuse the optimizer and complicate maintenance.
- Fragmentation and stats drift can make the initial win disappear later.
- Each index is a contract: you must test, monitor, and periodically prune it.
- I prefer query pattern analysis to consolidate into a few high-impact composite indexes.
- We commit to a review date to remove any index that doesn’t earn its keep.
6) Your team debates normalization vs denormalization for a new feature. How do you decide the right level?
- I align with workload: OLTP favors 3NF for data integrity; analytics leans denormalized for speed.
- I measure read/write ratios and the cost of joins vs the cost of duplicate storage.
- I consider update anomalies—if attributes change often, duplicating them is risky.
- I denormalize only where it removes proven hot joins or reduces complex query logic.
- I protect integrity with constraints or background reconciliation jobs when denormalizing.
- I document the intentional duplication and the refresh rules to avoid drift.
- I re-evaluate after real usage; structure can evolve as access patterns harden.
7) A critical query flips from fast to slow randomly. What intermittent causes do you check first?
- Out-of-date or skewed statistics leading to bad plan choices.
- Parameter sniffing causing the optimizer to reuse a plan unsuitable for other inputs.
- Cache eviction or memory pressure forcing re-compiles and extra IO.
- Locking/blocking from long transactions or noisy neighbors.
- Plan regression after a schema/index change or engine patch.
- Data skew growth—“big customer” or “end-of-month” effects changing selectivity.
- I capture the exact plan when slow, then compare with the fast baseline to pinpoint the shift.
8) Product wants pagination for a heavy listing. Why might OFFSET/LIMIT hurt, and what are better options?
- OFFSET forces the engine to scan and discard many rows as pages grow.
- It leads to inconsistent pages when underlying data changes quickly.
- I prefer keyset pagination (the seek method) using a stable, indexed sort column, as sketched after this list.
- I may cap page depth and provide filters to reduce the working set.
- I add “Load more” UX with last-seen keys to keep queries cheap.
- For analytics, pre-aggregate to smaller result sets before paging.
- I monitor page latency and abandonment to justify further tuning.
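A minimal sketch contrasting OFFSET with keyset pagination, assuming a hypothetical products listing sorted by an indexed (created_at, id) pair; the row-value comparison is PostgreSQL/MySQL-style, and :last_created_at / :last_id are bind placeholders supplied by the app:

```sql
-- OFFSET paging: the engine still reads and throws away the first 100,000 rows.
SELECT id, name, created_at
FROM   products
ORDER  BY created_at DESC, id DESC
LIMIT  20 OFFSET 100000;

-- Keyset (seek) paging: the client passes the last-seen key pair and the engine
-- seeks straight to it through the index, so deep pages stay cheap and stable.
SELECT id, name, created_at
FROM   products
WHERE  (created_at, id) < (:last_created_at, :last_id)
ORDER  BY created_at DESC, id DESC
LIMIT  20;
```

On engines without row-value comparisons, the same seek is written as an explicit `(created_at < x) OR (created_at = x AND id < y)` predicate.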
9) Your database hits deadlocks after a new feature launch. How do you reduce them without big rewrites?
- I review transaction scopes to keep them short, and make sure they touch tables in a consistent order.
- I ensure queries access rows in the same key order across the codebase.
- I reduce round trips and avoid user prompts inside open transactions.
- I add the right index to minimize lock footprints on searched rows.
- I retry deadlocked transactions with backoff where business allows.
- I consider a lower isolation or row-versioning if correctness still holds.
- I instrument and publish a deadlock graph to spot the worst offenders.
10) Leadership asks for “zero-downtime” schema changes. What’s realistic and how do you manage risk?
- I explain that most changes can be made online with the right engine features, but not all.
- I rank changes as safe (add nullable column), risky (type change), or invasive (PK rebuild).
- I use backward-compatible migrations: expand, dual-write, verify, then contract later.
- I schedule online operations during low-traffic windows and throttle if needed.
- I have a rollback plan with feature flags or shadow tables.
- I test migration duration on production-scale copies to avoid surprises.
- I communicate the blast radius and share a dry-run report so everyone understands the trade-offs.
11) You must choose between vertical scaling, read replicas, or sharding. How do you pick?
- If the working set fits in memory and latency is CPU/IO bound, vertical scaling might be simplest.
- If reads dominate and can tolerate slight lag, read replicas soak up traffic well.
- If data size or write throughput exceeds a single node’s limits, sharding becomes necessary.
- I prefer options that minimize app complexity and operational burden.
- I model cost vs headroom for 6–12 months of growth before committing.
- I plan for operational playbooks: failover, rebalance, and consistency checks.
- I pilot with a single critical workload before broad rollout.
12) A compliance team asks for “right to erasure” support. How do you design safe deletions?
- I avoid hard deletes in OLTP paths; I use soft-delete flags and retention policies.
- I separate identifying attributes into isolated tables for targeted purges.
- I propagate deletes to downstream systems via CDC/queues to keep copies consistent.
- I mask or tokenize where business use doesn’t need raw values.
- I validate backups and snapshots respect the retention and purge schedule.
- I log deletions with request IDs for auditability without storing the removed data.
- I test end-to-end to ensure derived aggregates don’t keep re-identifiable remnants.
13) Your warehouse’s materialized views refresh slowly and block users. How do you optimize refresh design?
- I switch to incremental refresh keyed on high-watermark columns like dates.
- I partition both base tables and materializations for parallel, targeted refresh.
- I pre-compute only high-value aggregates; push long-tail queries to ad hoc.
- I schedule refreshes just after upstream loads to reduce overlap and locking.
- I add staleness windows so readers can use slightly older, non-blocking snapshots.
- I track refresh cost per view and retire low-value materializations.
- I communicate freshness SLAs on each dashboard to set clear expectations.
14) Two teams dispute whether to store rich JSON in a relational table. What do you evaluate?
- Query patterns: if access is mostly whole-document, JSON can be fine; selective queries need indexes.
- Schema evolution needs: JSON helps evolve, but you trade strict validation.
- Index support on JSON paths vs the complexity of maintaining them.
- Data quality controls—do we need strong types and constraints upfront?
- Interoperability with downstream tools expecting columns.
- Storage and vacuum/GC implications for large documents.
- I often land core fields as columns and keep volatile extras in JSON, as sketched below.
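A minimal sketch of that hybrid layout, PostgreSQL-flavored and with illustrative names: typed columns for the core fields, a JSONB column for volatile extras, and an expression index only on the JSON path that is actually filtered on:

```sql
CREATE TABLE product (
    product_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku         TEXT    NOT NULL UNIQUE,                  -- core fields: typed and constrained
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
    attributes  JSONB   NOT NULL DEFAULT '{}'::jsonb      -- volatile, evolving extras
);

-- Index only the JSON paths that real queries filter on.
CREATE INDEX idx_product_brand ON product ((attributes ->> 'brand'));

-- A selective query that can use the expression index above.
SELECT product_id, sku
FROM   product
WHERE  attributes ->> 'brand' = 'Acme';
```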
15) You see growing lock wait times around monthly billing. What short-term mitigations help?
- I spread batch jobs across off-peak windows instead of one big monolith.
- I chunk processing into small, committed batches to release locks faster.
- I ensure hot queries have supporting indexes to avoid large lock footprints.
- I move long reports to replicas or a warehouse during the crunch period.
- I enforce statement timeouts to kill runaway queries and preserve headroom.
- I review isolation levels; snapshot/row-versioning can reduce blocking.
- I add application-level backoff to smooth spikes.
16) Your team wants to adopt window functions widely. What pitfalls do you flag early?
- Without proper partitioning, window frames can explode work and memory.
- ORDER BY on non-selective columns can cause huge sorts and temp spills.
- Some windows are not sargable; indexes may not help like they do for filters.
- I encourage pre-aggregation to shrink inputs before heavy windows.
- I test boundary windows (UNBOUNDED vs range rows) for correctness and cost.
- I watch tempdb/spill metrics and set sensible work-mem limits.
- I provide patterns for ranking, deduping, and time-based gaps that scale (see the sketch below).
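A minimal sketch of two of those reusable patterns, in standard SQL with hypothetical orders columns: deduplication with ROW_NUMBER and a running total with an explicit ROWS frame:

```sql
-- Dedupe: keep only the latest row per customer; partitioning keeps each window small.
WITH ranked AS (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date DESC) AS rn
    FROM   orders o
)
SELECT *
FROM   ranked
WHERE  rn = 1;

-- Running total: state the frame explicitly. ROWS avoids the default RANGE frame,
-- which groups ties and can be noticeably more expensive.
SELECT customer_id,
       order_date,
       SUM(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM   orders;
```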
17) A microservice team wants their own DB for agility. How do you protect cross-service analytics?
- I agree on autonomy but require shared IDs and event schemas for joins later.
- I set standards for CDC or event streams so changes flow to a central lake/warehouse.
- I publish a lightweight contract for slowly changing attributes.
- I define data ownership and SLAs for corrections/backfills.
- I encourage read-only replicas or federated views for emergency cross-reads.
- I keep a catalog so analysts know canonical sources across services.
- I review the total cost of many small databases vs shared ops tooling.
18) You must migrate a core workload from one SQL engine to another. What are your top risk controls?
- I inventory engine differences: data types, null handling, collation, and functions.
- I port critical queries and compare plans/latency with real data samples.
- I run dual-write or shadow read tests to validate correctness under load.
- I freeze risky schema changes during the migration window.
- I set clear RPO/RTO and practice failback to the old system.
- I monitor like-for-like KPIs before cutting over any traffic.
- I schedule phased cutovers starting with low-risk consumers.
19) Leaders want “one golden customer record,” but data arrives from many systems. How do you design identity resolution?
- I define match rules from strict (exact keys) to fuzzy (email/phone/name + geography).
- I track confidence scores and avoid merging when ambiguity is high.
- I store survivorship rules for conflicting attributes and keep source lineage.
- I keep a reversible merge design, allowing future unmerge when mistakes surface.
- I expose a stewardship UI/workflow for human review of edge cases.
- I publish a stable surrogate ID for downstream joins across domains.
- I measure duplicate rate and precision/recall to improve match quality over time.
20) Your backups are successful, but restores are untested. How do you build a trustworthy recovery practice?
- I define RPO/RTO targets and test restores against them quarterly.
- I automate restore drills to a sandbox and verify data integrity checks.
- I validate point-in-time recovery by simulating recent failure scenarios.
- I track full backup size, duration, and restore throughput to predict downtime.
- I store backups in separate fault domains and validate access/permissions.
- I document the runbook with named owners and on-call steps.
- I report recovery test results to leadership to keep investment on track.
21) An exec asks for “consistent analytics numbers” across teams. How do you fix metric drift?
- I define canonical metrics with formulas, filters, and time windows in a central spec.
- I implement metrics in one shared semantic layer instead of ad-hoc SQL scattered around.
- I tag each dashboard with the metric version to spot drift immediately.
- I add data-quality checks for late data and backfills that change denominators.
- I build certification badges so users prefer vetted datasets.
- I run periodic recon jobs to compare team-specific outputs vs the canonical ones.
- I keep a change log so stakeholders know when a metric definition changed.
22) A batch job times out due to a big JOIN between a fact table and a slowly changing dimension. What’s your approach?
- I check if the join keys are clean, typed the same, and indexed appropriately.
- I pre-filter the fact to the minimal slice before joining.
- I replace row-by-row lookups with a set-based merge when possible.
- I denormalize stable attributes into the fact if they’re used constantly.
- I consider snapshotting the dimension to match the fact’s load window.
- I partition both sides by date/tenant to improve locality.
- I capture the plan to confirm we’re avoiding massive broadcasts or cartesian joins.
23) Your team debates using query hints to force plans. When, if ever, do you allow them?
- Hints are last-resort seatbelts for known optimizer blind spots.
- I use them only with documented evidence and plan drift history.
- I prefer statistics fixes, indexes, or rewriting the query first.
- I scope hints narrowly and review them after data/engine upgrades.
- I add comments explaining why the hint exists and the removal criteria.
- I monitor hinted queries to ensure they still win over time.
- I keep a small, controlled list rather than normalizing hints across the codebase.
24) A tenant-heavy SaaS system has noisy neighbors. How do you isolate and protect shared resources?
- I partition data and indexes by tenant where feasible to keep scans localized.
- I add resource governance: connection caps, query timeouts, and workload groups.
- I route read-heavy tenants to replicas and keep OLTP tenants on primaries.
- I apply per-tenant caching and rate limiting on hot endpoints.
- I monitor top tenants by CPU/IO and engage them before they grow too hot.
- I consider sharding by tenant once a subset dominates the workload.
- I publish fair-use policies so customers know the rules ahead of time.
25) Your analytics team loves CTEs; DBAs prefer temp tables. How do you choose pragmatically?
- If an intermediate result is reused multiple times, a temp table can save recomputation and improve plans.
- For single-use pipelines, CTEs keep logic readable without storage overhead.
- I weigh statistics: temp tables can have stats; CTEs depend on inlining and may misestimate.
- I check concurrency impact—temp tables may add tempdb pressure under heavy load.
- I benchmark both patterns on production-like data before standardizing.
- I document team-wide guidance with a few clear, tested examples.
- I stay flexible; different engines optimize these constructs differently.
26) A BI tool issues “SELECT *” everywhere. What’s the real impact and how do you change it?
- It pulls unnecessary data, inflating IO and network costs.
- It can break silently when columns are added or reordered.
- It bypasses indexes tuned for narrow projections, slowing queries.
- I enforce column whitelists via views or a semantic layer.
- I add lints or query governors to reject SELECT *.
- I educate analysts with before/after performance examples.
- I measure saved bytes and time to celebrate the change.
27) You notice frequent plan cache evictions. What knobs and practices help stability?
- I ensure statistics are fresh and representative to reduce re-compiles.
- I parameterize queries correctly to avoid unique plan per literal.
- I limit ad-hoc generated SQL by using prepared statements in the app.
- I cap max memory per query to reduce large plan spills.
- I watch cache size and bump it only if evidence shows memory pressure.
- I track plan reuse rate and chase the patterns with lowest reuse first.
- I pin only the rare, mission-critical plans with caution.
28) You’re designing time-series storage for telemetry. What choices matter most?
- Partitioning by time for fast retention management and pruning.
- Sparse, narrow tables vs wide rows depending on read patterns.
- Compression and columnstore features for scan-heavy analytics.
- Downsampling old data to keep costs in check without losing trends.
- Hot vs cold storage tiers to match access frequency.
- A write path that handles bursts without locking storms.
- Clear retention policy so growth doesn’t surprise the business.
29) A merge job causes massive temp spills. How do you reduce memory pressure?
- I ensure join keys are the same type and indexed to avoid big sorts.
- I break the job into sorted chunks that merge efficiently.
- I pre-aggregate to shrink the dataset before the final join.
- I increase work-mem only after confirming algorithmic fixes.
- I offload cold columns to a later enrichment step.
- I monitor spill metrics and iterate until spills drop near zero.
- I schedule the job away from other memory-intensive tasks.
30) An auditor asks for end-to-end lineage of a KPI. How do you make it reliable and usable?
- I map the KPI from report back to source tables, including transforms and filters.
- I store lineage metadata in a catalog with dataset owners and refresh cadences.
- I attach tests to each hop: row counts, ranges, and referential checks.
- I link dashboards to certified datasets and version the KPI definition.
- I build a simple UI so analysts can self-serve lineage questions.
- I keep lineage updated via CI/CD so it evolves with the code.
- I review lineage quarterly with business owners for accuracy.
31) Your team wants to add a text search feature to SQL. What do you consider first?
- Whether native full-text indexes meet relevance needs or we need a search engine.
- The update frequency and index refresh cost for near-real-time search.
- Ranking and linguistic features that vanilla SQL may not handle well.
- Operational complexity of running external search vs staying inside the DB.
- Data duplication risks and sync guarantees between systems.
- Security alignment so search honors row-level access rules.
- I prototype both paths with real queries and judge by quality vs cost.
32) You’re asked to enforce row-level access for analytics. How do you design it without killing performance?
- I prefer security-filtered views using indexed predicates on tenant/region keys, as sketched after this list.
- I cache user-to-filter mappings to avoid repeated lookups.
- I apply predicate pushdown early so engines scan less.
- I avoid complex UDFs in filters; they block index usage.
- I pre-aggregate data by access domains when feasible.
- I test with worst-case users who see the most rows.
- I document the model so BI tools don’t bypass it accidentally.
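A minimal sketch of a security-filtered view, assuming hypothetical sales and user_tenant_map tables; engines with native row-level security (PostgreSQL policies, SQL Server security predicates) can attach the same predicate as a policy instead:

```sql
-- Small, indexed mapping of login -> tenants that login may read.
CREATE TABLE user_tenant_map (
    login_name TEXT   NOT NULL,
    tenant_id  BIGINT NOT NULL,
    PRIMARY KEY (login_name, tenant_id)
);

-- The view filters on an indexed tenant key, so the predicate can push down to a seek.
CREATE VIEW sales_secure AS
SELECT s.*
FROM   sales s
JOIN   user_tenant_map m
       ON  m.tenant_id  = s.tenant_id
       AND m.login_name = CURRENT_USER;

-- Supporting index so the tenant filter stays index-driven on the big table.
CREATE INDEX idx_sales_tenant ON sales (tenant_id);
```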
33) A vendor proposes “select count(*)” for health checks every 5 seconds. What’s a safer alternative?
- I suggest a lightweight, indexed existence probe on a tiny heartbeat table, as sketched after this list.
- I cap check frequency and add jitter to avoid synchronized spikes.
- I use connection pool pings instead of heavy queries.
- I set strict timeouts so health checks don’t pile up.
- I monitor the overhead and prove the win with metrics.
- I run checks on replicas where possible, not primaries.
- I ensure checks don’t participate in transactions.
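A minimal sketch of that probe, assuming a hypothetical single-row heartbeat table maintained by a scheduled job; the check reads one indexed row instead of counting a large table:

```sql
-- One tiny table, one row, touched by a scheduled writer.
CREATE TABLE heartbeat (
    id           INT       PRIMARY KEY,
    last_beat_at TIMESTAMP NOT NULL
);

-- Health probe: a single-row primary-key read, run with a strict client-side
-- timeout, outside any application transaction, and on a replica where possible.
SELECT last_beat_at
FROM   heartbeat
WHERE  id = 1;
```

Comparing last_beat_at against the current time also tells the monitor whether the write path is alive, not just the read path.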
34) Your team wants “one big table” for analytics simplicity. What trade-offs do you explain?
- Simpler queries but higher duplication and update anomalies.
- Slower writes and bulky storage without column pruning.
- Harder governance: unclear ownership and quality controls.
- Indexing becomes tricky when all access patterns live together.
- I propose a star schema or a curated layer that balances simplicity and performance.
- I quantify cost by comparing typical queries on both designs.
- I evolve incrementally, not a big-bang redesign.
35) An ML team needs features from multiple sources daily. How do you guarantee consistent cuts?
- I define a daily snapshot time with late-arriving data policy.
- I materialize feature tables with versioned timestamps.
- I track training vs serving schemas to avoid training-serving skew.
- I add data-quality gates on missingness and drift.
- I backfill consistently by re-running the same cut logic.
- I surface lineage so experiments are reproducible.
- I pre-compute heavy features to keep serving latency low.
36) The app uses UUIDs as primary keys and performance drops. How do you respond?
- Random UUIDs fragment clustered indexes and hurt inserts.
- I switch to ordered identifiers (e.g., ULIDs/time-based) or a surrogate sequence.
- If change is hard, I decouple: keep UUID externally, use an internal numeric key.
- I verify foreign keys and joins still work with the new key strategy.
- I test write latency and page splits before and after to show gains.
- I plan a careful migration with dual keys for a period.
- I document the rationale so we don’t re-introduce the same issue.
37) Your OLTP DB hosts long-running reports. What containment strategies work?
- Move reports to a read replica or warehouse designed for scans.
- Add query governors: max runtime, row limits, and concurrency caps.
- Provide curated reporting views with pre-joins and narrow columns.
- Schedule heavy jobs off-peak and isolate them in separate pools.
- Educate teams on using extracts instead of live hits for huge pulls.
- Track top costly reports and partner with owners to optimize.
- Celebrate wins with before/after latency and load graphs.
38) A table grows by 5% daily. How do you keep maintenance predictable?
- Partition by time for painless purges and maintenance on active slices only.
- Automate index maintenance based on real fragmentation thresholds.
- Compress colder partitions to save space and IO.
- Monitor growth vs storage headroom with alerts.
- Cap ad-hoc query costs to protect OLTP during growth spikes.
- Re-forecast hardware needs quarterly using real trends.
- Document retention policies so everyone understands the plan.
39) Analysts ask for near-duplicate detection (e.g., similar product names). How do you approach it in SQL?
- I define “similar” explicitly: exact, phonetic, or distance-based matching.
- I try deterministic cleanup first (trim, case, standard forms).
- I use helper keys (soundex/metaphone) or trigram indexes where supported, as sketched after this list.
- I bound candidate pairs with blocking keys to avoid N² comparisons.
- I quarantine and review borderline matches before merging.
- I measure precision/recall and tune thresholds iteratively.
- I document limits—no method is perfect; stewardship is part of the design.
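A minimal sketch of the trigram approach, PostgreSQL-flavored (pg_trgm extension) with a hypothetical product table; the first-character blocking key and the similarity threshold are illustrative and need tuning per data set:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index so the % similarity operator can be index-assisted.
CREATE INDEX idx_product_name_trgm
    ON product USING gin (lower(name) gin_trgm_ops);

-- Candidate near-duplicates: same blocking key, similar names.
-- % uses pg_trgm.similarity_threshold (default 0.3).
SELECT a.product_id  AS product_a,
       b.product_id  AS product_b,
       similarity(lower(a.name), lower(b.name)) AS score
FROM   product a
JOIN   product b
       ON  a.product_id < b.product_id
       AND left(lower(a.name), 1) = left(lower(b.name), 1)   -- blocking key
WHERE  lower(a.name) % lower(b.name)
ORDER  BY score DESC;
```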
40) You need to enforce business-hour SLAs in queries (e.g., turnaround in working hours). How do you model time?
- I keep a calendar table with holidays and working intervals.
- I compute business durations via joins against that calendar, not ad-hoc date math in every query (see the sketch after this list).
- I pre-calculate common intervals to keep reports fast.
- I support region-specific calendars for global teams.
- I validate edge cases like overnight shifts and DST changes.
- I expose a simple function/view for analysts to reuse safely.
- I test outputs against real tickets to build trust.
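A minimal sketch of the calendar-table approach with hypothetical names; EXTRACT(EPOCH FROM ...) is PostgreSQL-flavored, but the join shape carries over to other engines:

```sql
-- One row per calendar day; non-working days simply carry no business window.
CREATE TABLE business_calendar (
    cal_date   DATE    PRIMARY KEY,
    is_working BOOLEAN NOT NULL,
    work_start TIMESTAMP,   -- e.g. cal_date 09:00 local
    work_end   TIMESTAMP    -- e.g. cal_date 17:00 local
);

-- Business seconds between open and close: sum the overlap with every working
-- day the ticket spans, instead of hand-rolled date math in each report.
SELECT t.ticket_id,
       SUM(EXTRACT(EPOCH FROM
               LEAST(t.closed_at, c.work_end) - GREATEST(t.opened_at, c.work_start)
           )) AS business_seconds
FROM   tickets t
JOIN   business_calendar c
       ON  c.is_working
       AND c.work_end   > t.opened_at
       AND c.work_start < t.closed_at
GROUP  BY t.ticket_id;
```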
41) A critical KPI is off by 2–3% intermittently. Where do you look first?
- Late-arriving data and out-of-order events changing counts after the fact.
- Double-counting due to joins that multiply rows silently.
- Filters drifting between teams—slightly different WHERE clauses.
- Time zone conversions or daylight saving mismatches.
- COUNT(DISTINCT key) vs plain COUNT when deduplication isn’t applied consistently.
- Backfills that didn’t re-compute downstream aggregates.
- I add reconciliation jobs and unit tests at key hops.
42) You suspect parameter sniffing hurts a stored query. What mitigations work without full rewrites?
- Use recompile for truly skewed one-off calls, sparingly.
- Add a better index that narrows the plan choice across inputs.
- Introduce plan guides or option hints only with clear justification.
- Use local variables to decouple compile-time assumptions when acceptable, as sketched after this list.
- Split paths for small vs large inputs when the distribution is bimodal.
- Refresh stats with full-scan where sampling misses skew.
- Capture before/after performance to prove the fix.
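A minimal sketch of two of those mitigations in SQL Server syntax, inside a hypothetical procedure that takes @customer_id; other engines have their own levers (for example PostgreSQL’s plan_cache_mode):

```sql
-- Option 1: recompile just this statement so each call is planned for its own value.
SELECT order_id, amount
FROM   dbo.Orders
WHERE  customer_id = @customer_id
OPTION (RECOMPILE);

-- Option 2: copy the parameter into a local variable; the optimizer then falls back
-- to average-density estimates instead of the first sniffed value.
DECLARE @cust INT = @customer_id;

SELECT order_id, amount
FROM   dbo.Orders
WHERE  customer_id = @cust;
```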
43) The business wants a soft-delete feature. What design avoids messy queries?
- Add a clear IsDeleted flag and filtered unique indexes to ignore deleted rows (see the sketch after this list).
- Use views that filter out deleted rows by default for most consumers.
- Enforce deletes via stored procedures to centralize logic.
- Archive details to a separate table/partition for compliance.
- Periodically hard-purge truly expired rows to control bloat.
- Ensure foreign keys and constraints still behave predictably.
- Provide an “include deleted” path only for admins.
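A minimal sketch of the soft-delete pieces, PostgreSQL-flavored (partial/filtered index plus a default view) with hypothetical names:

```sql
ALTER TABLE customer ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE;
ALTER TABLE customer ADD COLUMN deleted_at TIMESTAMP;

-- Uniqueness only needs to hold among live rows, so a re-created customer can
-- reuse the email of a soft-deleted one.
CREATE UNIQUE INDEX ux_customer_email_live
    ON customer (email)
    WHERE is_deleted = FALSE;

-- Default consumption path: most readers never see deleted rows.
CREATE VIEW customer_active AS
SELECT *
FROM   customer
WHERE  is_deleted = FALSE;
```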
44) You must calculate rolling 12-month metrics efficiently. What’s your plan?
- Pre-aggregate daily facts so rolling windows scan fewer rows.
- Partition by date and prune aggressively in queries.
- Use window functions with proper partition and frame definitions, as sketched after this list.
- Materialize common windows (30/90/365 days) when heavily reused.
- Validate differences between calendar vs fiscal periods.
- Cache the latest window and refresh incrementally each day.
- Compare results with a slow but simple reference to ensure correctness.
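A minimal sketch of a rolling 12-month metric over a pre-aggregated daily table (a hypothetical daily_sales with one row per day); the RANGE frame with an interval offset is PostgreSQL 11+ syntax, and engines without it typically use a date-range self-join instead:

```sql
-- Rolling 12-month revenue as of each day. The frame covers the current day plus
-- everything within the preceding year, and stays correct even if days are missing.
SELECT sales_date,
       SUM(revenue) OVER (
           ORDER BY sales_date
           RANGE BETWEEN INTERVAL '1 year' PRECEDING AND CURRENT ROW
       ) AS rolling_12m_revenue
FROM   daily_sales
ORDER  BY sales_date;
```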
45) A data feed sometimes reprocesses yesterday’s files. How do you stay idempotent?
- I design merges keyed by natural/business keys plus a load date.
- I reject or overwrite duplicates based on a deterministic rule.
- I compute row hashes and update only when content changes.
- I log file manifests with checksums to spot repeats.
- I separate staging (immutable) from final (mergeable) layers.
- I write rerun-safe jobs—no deletes based only on today’s file.
- I alert on unexpected replays to fix the upstream schedule.
46) A client asks for cross-region failover. What DB considerations do you raise?
- Replication lag and the impact on read-your-writes guarantees.
- Conflict resolution if writes can occur in multiple regions.
- Data residency and compliance for specific countries.
- Network costs and latency for chatty OLTP patterns.
- Runbook complexity for failover/failback and regular drills.
- Testing under chaos scenarios, not just happy paths.
- A realistic RTO/RPO that the business funds and accepts.
47) Your BI queries timeout when users add many filters. How do you keep them responsive?
- I provide curated, indexed dimensions and avoid free-text on huge columns.
- I push selective filters first and limit cardinality explosions.
- I pre-compute common combinations in aggregates.
- I set sensible row limits and progressive reveal (top-N then drill-down).
- I cache stable dimensions in the BI layer.
- I analyze the worst dashboards and coach owners to simplify.
- I enable query watchdogs to abort runaway requests gracefully.
48) A table’s foreign keys were disabled “for speed.” What’s your remediation plan?
- I assess data damage: orphan counts and incorrect references.
- I backfill and correct with controlled repair scripts.
- I re-enable FK constraints with not-for-replication/online options if available.
- I add tests in CI/CD so future migrations can’t disable them silently.
- I measure net impact—constraints often save more than they cost.
- I communicate why integrity rules prevent downstream disasters.
- I monitor for violation attempts to catch app bugs early.
49) Stakeholders want to log everything for audits. How do you avoid a write-path meltdown?
- I scope audit to high-risk tables and critical columns only.
- I batch or stream audit writes asynchronously when safe.
- I use concise, structured formats with retention and tiering.
- I separate hot OLTP tables from audit tables to reduce contention.
- I sample low-value events instead of full capture.
- I monitor audit growth and enforce purge policies.
- I prove compliance needs are met without drowning the system.
50) Your team wants to toggle features by config stored in SQL. What patterns keep it safe?
- I store flags in a small, indexed table and cache aggressively in the app.
- I include rollout metadata: owner, expiry date, and target cohort.
- I avoid reading flags inside long transactions to reduce blocking.
- I log flag evaluations for debugging unusual behavior.
- I restrict who can change flags and require approvals for risky ones.
- I clean up retired flags to avoid permanent complexity creep.
- I document the intended behavior and fallback for each flag.
51) A partner API delivers late and sometimes out-of-order events. How do you model correctness?
- I accept events with event_time and process_time, not just arrival order.
- I maintain idempotent merges keyed by natural IDs and version numbers.
- I apply watermarks and hold-back windows before finalizing aggregates.
- I support backfills and retractions to fix historical metrics.
- I expose event lineage so consumers know the data’s confidence.
- I test edge cases with simulated late events.
- I set SLAs with the partner to reduce surprises.
52) Product asks for case-insensitive search everywhere. What design avoids surprises?
- I standardize collations or store normalized (lower-cased) search keys.
- I index the normalized forms to keep queries sargable.
- I document how accents and locales are handled.
- I test sort orders and comparisons across languages.
- I avoid hidden implicit conversions that kill indexes.
- I provide language-aware search only where it truly matters.
- I validate results with user acceptance tests in key locales.
53) You need to track slowly changing attributes like customer tier. Which approach fits most apps?
- For auditability, SCD-Type 2 gives history with start/end timestamps.
- If only the latest value matters, Type 1 keeps it simple and fast.
- I avoid mixing types in the same table without clear rules.
- I expose a “current snapshot” view for easy consumption (see the sketch after this list).
- I index on current_flag and date ranges for efficient lookups.
- I budget storage for history growth and prune where allowed.
- I choose the simplest type that meets regulatory and reporting needs.
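A minimal sketch of a Type 2 dimension and its current-snapshot view, PostgreSQL-flavored with hypothetical names:

```sql
CREATE TABLE dim_customer_history (
    customer_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
    customer_id BIGINT    NOT NULL,                               -- natural/business key
    tier        TEXT      NOT NULL,
    valid_from  TIMESTAMP NOT NULL,
    valid_to    TIMESTAMP,                                        -- NULL = still current
    is_current  BOOLEAN   NOT NULL DEFAULT TRUE
);

-- Lookups by business key on the open-ended row stay index-driven.
CREATE INDEX idx_dim_customer_current
    ON dim_customer_history (customer_id)
    WHERE is_current;

-- Most consumers only ever need the latest version.
CREATE VIEW dim_customer_current AS
SELECT customer_id, tier, valid_from
FROM   dim_customer_history
WHERE  is_current;
```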
54) Your app sees connection storms during traffic spikes. How do you stabilize?
- I use a connection pool with sane caps per app instance.
- I add exponential backoff and circuit breakers on failures.
- I prefer fewer, long-lived connections over many short ones.
- I tune server limits and queueing to avoid thundering herds.
- I pre-warm pools before known peaks.
- I monitor wait times and pool saturation to scale early.
- I coordinate releases to avoid cold-start pileups.
55) Finance asks for “as-of” reporting (state at a past timestamp). How do you implement it?
- I keep valid_from/valid_to columns or system-versioned tables (see the sketch after this list).
- I guarantee uniqueness per timeline so lookups are deterministic.
- I expose a parameterized view to query at time T.
- I snapshot critical aggregates daily to speed common as-of dates.
- I set retention and archiving for old versions.
- I validate results against a trusted monthly close.
- I educate users on the difference between as-of and current state.
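A minimal sketch of an as-of lookup against a valid_from/valid_to model, with a hypothetical account_history table and :as_of as the requested timestamp; engines with system-versioned tables can answer the same question with FOR SYSTEM_TIME AS OF:

```sql
-- State of every account exactly as it was at :as_of.
SELECT account_id, balance, status
FROM   account_history
WHERE  valid_from <= :as_of
  AND  (valid_to IS NULL OR valid_to > :as_of);

-- SQL Server temporal-table equivalent (where system versioning is enabled):
-- SELECT account_id, balance, status
-- FROM   account FOR SYSTEM_TIME AS OF @as_of;
```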
56) A hot table mixes OLTP writes and analytical scans. How do you reduce contention?
- I separate concerns: move analytics to replicas or a warehouse.
- I use covering indexes for hot transactional paths.
- I apply row-versioning to reduce reader/writer blocking when supported.
- I consider columnstore on a copy for analytics, not on the OLTP table.
- I throttle heavy scans and add off-peak windows.
- I cache or precompute analytics for the most requested slices.
- I track blocking graphs to target the worst offenders.
57) A junior proposes triggers for many side effects. When are triggers acceptable?
- I allow them for strict integrity/enforcement that must be close to the data.
- I avoid triggers for business workflows that add hidden coupling.
- I keep them small, deterministic, and well-documented.
- I measure their overhead under peak write load.
- I prefer explicit app-layer actions or CDC for downstream effects.
- I add tests to prevent recursion and unintended cascades.
- I audit changes so debugging isn’t guesswork.
58) You’re asked to compress data to cut costs. What do you check before flipping the switch?
- Read vs write profile—compression helps scans, can slow heavy writes.
- Data types and patterns—repetitive columns compress better.
- CPU headroom to pay the compression/decompression cost.
- Impact on index size and maintenance jobs.
- Storage tiering; cold partitions may benefit most.
- Real benchmarks on production-like data, not guesses.
- A staged rollout with rollback if latency spikes.
59) A vendor table arrives with inconsistent time zones. How do you enforce a standard?
- I store timestamps in UTC with explicit time zone columns for display.
- I convert at ingestion with a clear, audited mapping.
- I validate offsets and reject impossible times (DST gaps).
- I expose helper views that handle conversion for consumers.
- I add tests for known tricky dates to avoid regressions.
- I educate teams to never compare raw local times across regions.
- I monitor for new offending sources and fix at the boundary.
60) Leadership wants a monthly “DB health” score. What should it include to drive action?
- P95/99 latencies for top 20 queries and their trend.
- Index effectiveness: usage vs size, and candidates for drop.
- Blocking/deadlock incidents and mean time to resolve.
- Stats freshness and percent of tables with drift.
- Backup/restore drill results vs RPO/RTO targets.
- Capacity trends: storage, CPU, memory, and replica lag.
- A prioritized, owner-assigned improvement backlog with due dates.