This article presents practical, scenario-based SQL interview questions for 2025. It is written with the interview setting in mind to give you the strongest possible preparation. Work through these SQL interview questions to the end, as every scenario carries its own lesson.
Disclaimer:
These solutions are based on my experience and best effort. Actual results may vary depending on your setup. Code may need some tweaking.
1) What is the real business value of normalization and when would you deliberately denormalize?
- Normalization reduces duplicate data and update effort across teams.
- It lowers storage and improves consistency for reporting.
- Joins can get heavy, so read‑paths may slow at scale.
- Denormalize when read speed matters more than write simplicity.
- Use it for dashboards, feed pages, or high‑traffic lookups.
- Keep a single source of truth and document the duplication.
- Add checks or jobs to reconcile drift between copies.
2) How do you decide between a natural key and a surrogate key for a table?
- Natural keys feel intuitive but can change in the real world.
- Surrogate keys give stable joins and simpler migrations.
- Natural keys save a surrogate column but can bloat joins if they are wide.
- Surrogate keys help with privacy and masking strategies.
- For integration, keep natural key as a unique alternate key.
- Choose a surrogate primary key and index the natural key for searches, as sketched below.
- Revisit if partner systems enforce a natural key contract.
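A minimal sketch of that recommended pattern (PostgreSQL-style syntax; table and column names are illustrative):

```sql
-- Surrogate primary key for stable joins; the natural key stays
-- enforced as a unique alternate key for integration and search.
CREATE TABLE customer (
    customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- surrogate
    tax_number  VARCHAR(20)  NOT NULL,                           -- natural key
    full_name   VARCHAR(200) NOT NULL,
    CONSTRAINT uq_customer_tax_number UNIQUE (tax_number)
);
```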
3) When do indexes hurt more than they help?
- On heavy write tables where each insert must update many indexes.
- When index selectivity is low and scans are cheaper.
- With overlapping indexes that duplicate work.
- When statistics are stale, causing poor plans.
- On frequently rebuilt or churned staging data.
- When memory is limited, leading to cache churn.
- If queries rarely use the indexed columns in predicates.
4) How do you explain “sargability” to a non‑DB stakeholder?
- It means the query lets the index actually do its job.
- Functions wrapped around a column often block index usage (see the sketch after this list).
- Range, equality, and proper data types keep it sargable.
- Sargable queries cut CPU and IO costs directly.
- It’s cheaper than buying bigger hardware.
- A single rewrite can accelerate many reports.
- Measure before/after with the same parameters.
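A before/after sketch of that function point (SQL Server/MySQL-style `YEAR()`; the `orders` table is illustrative):

```sql
-- Non-sargable: wrapping the column in a function hides it from the index.
SELECT order_id FROM orders WHERE YEAR(order_date) = 2024;

-- Sargable rewrite: a range on the bare column lets the index seek.
SELECT order_id FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
```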
5) What are the trade‑offs of using wide composite indexes?
- Faster reads when the full pattern matches.
- Risk of write overhead and larger index size.
- Column order matters for filtering and sorting, as the example below shows.
- Too many included columns waste memory.
- Might mask the need for query refactoring.
- Great for hot, stable access paths.
- Reconfirm usefulness after major releases.
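One possible shape (SQL Server syntax; names are illustrative): the equality column first, the sort column second, and a covering column carried via INCLUDE rather than widening the key.

```sql
-- Serves: WHERE customer_id = ? ORDER BY order_date DESC
-- without making the index key wider than it needs to be.
CREATE INDEX ix_orders_customer_date
    ON orders (customer_id, order_date)
    INCLUDE (total_amount);
```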
6) How do isolation levels impact real users and SLAs?
- Higher isolation protects correctness but can block.
- Lower isolation improves concurrency with some risk.
- Read Committed is a balanced default for OLTP.
- Snapshot/MVCC reduces blocking but needs storage.
- Serializable gives strongest guarantees with cost.
- Pick per workload, not globally for everything.
- Track wait times in SLO dashboards.
7) How do you prevent and resolve deadlocks in a busy system?
- Keep transactions short and focused.
- Touch objects in a consistent order across code.
- Use proper indexes to shorten lock durations.
- Break big writes into smaller, predictable batches.
- Capture deadlock graphs and fix the root cause.
- Consider retry logic for idempotent operations.
- Educate teams on transaction boundaries.
8) What’s the practical difference between IN, EXISTS, and JOIN for filtering?
- EXISTS checks for presence, often optimizer‑friendly.
- IN works well for small, fixed lists.
- JOINs can multiply rows if not carefully written.
- EXISTS usually stops early once one match is found (see the sketch below).
- Choose the shape that reads simplest for maintainers.
- Test with realistic data volumes and skew.
- Confirm plan stability under parameter changes.
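A small sketch of the EXISTS shape (illustrative tables): it never multiplies customer rows and can stop at the first matching order.

```sql
SELECT c.customer_id, c.full_name
FROM customer AS c
WHERE EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.customer_id = c.customer_id   -- presence check, not a row-multiplying join
);
```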
9) When would DISTINCT be the wrong tool?
- When duplicates come from a missing join condition.
- When GROUP BY with aggregates communicates intent better, as in the example below.
- When a unique index can prevent duplicates at source.
- DISTINCT hides data quality issues temporarily.
- It can add unnecessary sorts and memory pressure.
- Fix the model or join, then remove DISTINCT.
- Keep a data quality alert to catch regressions.
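For instance, when the real question is "one row per customer, with a count", GROUP BY states the intent that DISTINCT would hide (illustrative schema):

```sql
-- Says what it means: one row per customer, with an aggregate,
-- instead of deduplicating an accidentally multiplied result.
SELECT o.customer_id, COUNT(*) AS order_count
FROM orders AS o
GROUP BY o.customer_id;
```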
10) Why do query plans change, and how do you manage that risk?
- Statistics updates change cardinality estimates.
- Parameter sniffing shifts plan choices.
- Schema or index changes alter costs.
- Engine upgrades tweak optimizer rules.
- Use baselines or hints cautiously when needed.
- Monitor plan regressions after deployments.
- Keep a rollback path ready for hot fixes.
11) What is parameter sniffing and when is it harmful?
- Engine builds a plan using the first parameter values.
- Skewed data makes a one‑size plan suboptimal.
- “Tiny input fast, big input slow” symptoms appear.
- Options include recompile, hints, or plan guides (sketched below).
- Sometimes a rewrite into two paths is cleaner.
- Validate across production‑like data slices.
- Watch CPU spikes aligned with specific values.
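One of those options, sketched in SQL Server syntax (names are illustrative): trade compile CPU for a plan that fits the actual value.

```sql
DECLARE @customer_id INT = 42;

SELECT order_id, total_amount
FROM orders
WHERE customer_id = @customer_id
OPTION (RECOMPILE);   -- fresh plan per execution; use sparingly on hot paths
```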
12) How do you choose between a CTE and a temporary table?
- CTE is for readability and single‑pass logic.
- Temp table can persist intermediate results.
- Temp tables help when you reuse data multiple times.
- CTEs don’t materialize by default; plans may repeat work.
- Temp tables can enable better stats for next steps.
- Consider memory, logging, and contention costs.
- Profile both if the query runs often.
13) What’s your approach to pagination at scale?
- Avoid OFFSET with huge skips; later pages get slower and slower.
- Prefer keyset pagination using a stable sort column, as sketched below.
- Return lightweight payloads and precompute counts.
- Cache page windows that users revisit often.
- Expose next/prev cursors to the client.
- Keep indexes aligned with the sort/filter fields.
- Load test jump‑to‑page patterns from real behavior.
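A minimal keyset sketch (PostgreSQL-style row comparison; assumes an index on `(created_at, id)` and that the client sends back the last row it saw as the placeholders):

```sql
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < (:last_created_at, :last_id)  -- seek past the last page, don't skip-count
ORDER BY created_at DESC, id DESC
LIMIT 20;
```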
14) How do you think about NULLs in design and reporting?
- NULL means “unknown,” not zero or empty string.
- Too many NULLable columns hint at model smells.
- Aggregations over NULLs can surprise stakeholders (see the example below).
- Make defaults explicit for business rules.
- Use CHECK constraints to limit ambiguous states.
- Educate analysts to handle NULLs consistently.
- Document what “unknown” practically implies.
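A quick illustration of that aggregation surprise (the `orders.discount` column is illustrative):

```sql
SELECT COUNT(*)        AS all_rows,              -- counts every row
       COUNT(discount) AS rows_with_discount,    -- silently skips NULLs
       AVG(discount)   AS avg_of_non_null_only   -- NULLs are excluded, not treated as zero
FROM orders;
```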
15) What’s the business impact of poor collation choices?
- Sorting and comparisons may differ by locale.
- Case and accent sensitivity affect search behavior.
- Inconsistent collations break cross‑database joins.
- Unicode storage needs should be planned early.
- Choose a standard per organization and enforce it.
- Test search features with real language samples.
- Align with product markets and legal names.
16) How do you evaluate partitioning as a scaling strategy?
- Great for large tables with time‑based access.
- Helps with maintenance tasks like purge or rebuild.
- Can prune scans and reduce IO when aligned to queries.
- Comes with complexity in keys and data movement.
- Over‑partitioning adds overhead and small files.
- Monitor skew and hot partitions regularly.
- Combine with indexing that matches partition key.
17) When is sharding justified over vertical scaling?
- When a single node can’t meet storage or throughput.
- When latency is bound by geographic distance.
- When team growth needs independent deployments.
- Sharding complicates joins and transactions.
- Good fit for tenant‑isolated SaaS models.
- Requires solid routing and rebalancing tools.
- Start with clear shard key and growth plan.
18) How do you set RPO and RTO targets with stakeholders?
- Translate data loss tolerance into minutes or hours.
- Map recovery time to actual downtime cost.
- Choose backups, replicas, and failover accordingly.
- Test restores, not just backup completion.
- Keep runbooks version‑controlled and rehearsed.
- Align budgets to the promised protection level.
- Report on drills like any other KPI.
19) What’s your philosophy on backup verification?
- A backup isn’t real until it’s restored.
- Automate periodic restore tests in lower envs.
- Validate row counts and key business checks.
- Time the recovery to check RTO feasibility.
- Keep offsite copies and immutable storage where possible.
- Document restore procedures step by step.
- Track failures and tune frequency.
20) How do you decide between read replicas and caching for scale?
- Replicas offload reads but add replication lag.
- Caches are faster but risk staleness.
- Use replicas for consistent analytical reads.
- Use cache for hot, simple key lookups.
- Invalidate cache on critical updates.
- Monitor hit‑rate and replica latency.
- Combine both for high‑traffic endpoints.
21) What are common anti‑patterns in reporting queries?
- SELECT * against huge fact tables.
- Unbounded date ranges hurting partitions.
- DISTINCT used to hide modeling issues.
- Scalar functions wrapping indexed columns.
- Mixing OLTP and OLAP on the same database.
- Missing filters for required dimensions.
- Lack of summary tables for heavy dashboards.
22) How do you explain OLTP vs OLAP to a product manager?
- OLTP is for quick, reliable transactions.
- OLAP answers complex questions over large history.
- Mixing them slows both and hurts SLAs.
- Design paths for each workload separately.
- Use ETL/ELT to feed analytics safely.
- SLA and schema choices differ by goal.
- Plan resource isolation between them.
23) What makes a good star schema in practice?
- Clear business facts and conformed dimensions.
- Narrow, numeric measures for performance.
- Surrogate keys that are stable over time.
- Slowly changing dimensions handled explicitly.
- Grain is documented and consistent everywhere.
- Indexing supports common slice‑and‑dice paths.
- Naming matches business language, not IT jargon.
24) How do you handle Slowly Changing Dimensions (conceptually)?
- Agree on when history matters to the business.
- Type 1 overwrites; Type 2 tracks history by rows.
- Use surrogate keys to separate versions cleanly.
- Keep effective dates and current‑row flags, as in the sketch below.
- Manage storage and query complexity honestly.
- Train analysts on how to join to the right row.
- Audit changes for compliance questions.
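A minimal Type 2 shape (PostgreSQL-style, illustrative names): every change closes the current row and inserts a new version.

```sql
CREATE TABLE dim_customer (
    customer_sk BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- version (surrogate) key
    customer_id BIGINT      NOT NULL,          -- business/natural key
    segment     VARCHAR(50) NOT NULL,          -- the tracked attribute
    valid_from  DATE        NOT NULL,
    valid_to    DATE,                          -- NULL while the row is open
    is_current  BOOLEAN     NOT NULL DEFAULT TRUE
);
```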
25) What’s your stance on constraints vs application checks?
- Constraints are the first line of defense on data.
- Apps can miss checks under load or bugs.
- Constraints keep integrity during ad‑hoc loads.
- They document business rules right in the schema.
- Too many complex checks may slow writes.
- Keep constraints simple and essential.
- Pair with monitoring for failed attempts.
26) How do you approach soft delete vs hard delete?
- Soft delete preserves history and recovery options.
- Hard delete saves space and protects privacy.
- Soft delete needs filtered unique indexes (see the sketch below).
- Queries must exclude deleted rows by default.
- Hard delete requires careful backup retention.
- Choose per table based on legal and business needs.
- Communicate behavior to every consumer.
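The filtered-index point, sketched as a PostgreSQL partial index (names illustrative): uniqueness applies only to live rows, so a soft-deleted email can be registered again.

```sql
CREATE UNIQUE INDEX uq_app_user_email_live
    ON app_user (email)
    WHERE deleted_at IS NULL;   -- deleted rows no longer block re-registration
```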
27) What are practical steps to avoid hot spots?
- Avoid monotonically increasing clustered keys.
- Distribute writes across partitions or shards.
- Batch writes in predictable, smaller chunks.
- Use randomization only when reads allow it.
- Monitor page splits and latch waits.
- Revisit index fillfactor and autogrowth.
- Test with production‑like concurrency.
28) How do you think about statistics maintenance?
- Good stats fuel accurate cardinality estimates.
- Auto updates may lag on big, uneven tables.
- Schedule targeted refresh for critical objects.
- Watch for ascending keys and skewed distributions.
- Validate impact on CPU and plans after refresh.
- Tie jobs to release windows when possible.
- Keep an audit of stats changes and outcomes.
29) When is a materialized view worth it?
- For heavy aggregations needed frequently (see the sketch after this list).
- When precompute beats runtime cost by a lot.
- If data freshness can be slightly delayed.
- When indexes on the view target hot queries.
- Monitor refresh times and blocking impact.
- Document staleness behavior to stakeholders.
- Reassess as workload patterns evolve.
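A PostgreSQL-flavoured sketch (names illustrative); the unique index is what permits a non-blocking refresh.

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, SUM(total_amount) AS revenue
FROM orders
GROUP BY order_date;

CREATE UNIQUE INDEX uq_daily_sales ON daily_sales (order_date);

-- Periodic refresh that doesn't block readers of the view.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```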
30) What makes an effective indexing strategy for a new feature?
- Start from the top three critical queries.
- Design for predicates, joins, and sort order.
- Prefer narrow keys and selective columns.
- Avoid duplicates across existing indexes.
- Measure before/after with production‑like data.
- Recheck after a month of real usage.
- Clean up unused indexes proactively.
31) How do you secure data at rest and in transit conceptually?
- Encrypt storage with database‑level features.
- Use TLS for connections end to end.
- Limit who holds keys and rotate regularly.
- Don’t expose secrets in application logs.
- Apply least privilege on every role.
- Monitor access and alert on anomalies.
- Test restores with encrypted backups.
32) What is row‑level security and when is it essential?
- It filters rows based on user identity or role (sketched below).
- Essential in multi‑tenant or sensitive datasets.
- Reduces app logic and centralizes policy.
- Adds planning for predicate performance.
- Requires strict auditing of policy changes.
- Keep emergency bypass roles tightly controlled.
- Test with complex joins and subqueries.
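A minimal PostgreSQL sketch (assumes the application sets `app.tenant_id` on each session, and that `invoices` carries a `tenant_id` column):

```sql
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Every query against invoices is silently filtered to the caller's tenant.
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);
```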
33) What is the business risk of over‑privileged database roles?
- Accidental data leaks from broad SELECTs.
- Costly writes or deletes by mistake.
- Harder audits and failed compliance checks.
- Insider threat surface increases.
- Slower incident response due to ambiguity.
- Principle of least privilege reduces blast radius.
- Review access quarterly with owners.
34) How do you explain SQL injection risk without code?
- It happens when user input is treated as commands.
- Attackers can read or change unintended data.
- Parameterization and allow‑lists mitigate it.
- Logging and alerts help catch attempts early.
- Security tests should mirror real inputs.
- Shared libraries enforce safer patterns.
- Education is cheaper than breach recovery.
35) What’s your approach to data masking for non‑prod?
- Replace direct identifiers with realistic fakes.
- Keep referential integrity intact across tables.
- Mask consistently so joins still work.
- Exclude fields truly needed for testing.
- Log transformations for traceability.
- Automate refresh from trusted pipelines.
- Revalidate before each major release.
36) How do you decide between stored procedures and ad‑hoc queries?
- Procedures centralize logic and permissions.
- They improve plan stability and auditing.
- Ad‑hoc is flexible but hard to govern.
- For hot paths, procedures usually win.
- For exploration, ad‑hoc is acceptable.
- Balance developer speed with production safety.
- Keep procedures small and testable.
37) What’s your stance on ORMs from the database side?
- Great for productivity on simple CRUD.
- Can produce inefficient queries at scale.
- Encourage use of query hints only when needed.
- Expose views or procedures for complex paths.
- Review generated SQL in performance reviews.
- Provide samples of sargable filters.
- Pair program to fix worst offenders.
38) How do you plan schema changes with zero or minimal downtime?
- Backward‑compatible changes go first, as sketched below.
- Dual‑write or backfill strategies for new columns.
- Feature flags to switch reads gradually.
- Online index rebuilds where supported.
- Strict rollout order across services.
- Rollback script prepared in advance.
- Post‑deploy monitoring for errors and waits.
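A hedged sketch of the expand/backfill/contract order (PostgreSQL-style; the `channel` column is illustrative):

```sql
-- Expand: a NULLable add is metadata-only, so old code keeps working.
ALTER TABLE orders ADD COLUMN channel VARCHAR(20);

-- Backfill: in production this would run in keyed batches, not one pass.
UPDATE orders SET channel = 'web' WHERE channel IS NULL;

-- Contract: tighten only after every writer populates the column.
ALTER TABLE orders ALTER COLUMN channel SET NOT NULL;
```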
39) What’s the impact of time zones and clocks on data quality?
- Mixed time zones cause reporting confusion.
- Store timestamps in UTC; convert at the edge (see the sketch below).
- Keep user‑visible time zone preferences.
- Daylight savings can break naive logic.
- Audit logs must include time zone context.
- Agree on SLA timestamps in UTC.
- Test around DST transitions deliberately.
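The "store UTC, convert at the edge" rule in PostgreSQL terms (illustrative table):

```sql
CREATE TABLE app_event (
    event_id    BIGINT PRIMARY KEY,
    occurred_at TIMESTAMPTZ NOT NULL   -- normalized to UTC on write
);

-- Convert for display only, per the user's stored time zone preference.
SELECT occurred_at AT TIME ZONE 'America/New_York' AS local_time
FROM app_event;
```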
40) How do you approach numeric precision decisions?
- Choose DECIMAL for money to avoid rounding surprises (see the sketch below).
- Floats are fine for scientific ranges.
- Wrong choice leads to audit headaches.
- Align precision with invoicing rules.
- Validate with real ledger scenarios.
- Indexing and storage size also matter.
- Document assumptions in the model.
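A tiny demonstration of why (PostgreSQL syntax): binary floats cannot represent 0.1 exactly, while DECIMAL stores it as written.

```sql
SELECT CAST(0.1 AS DOUBLE PRECISION)
     + CAST(0.2 AS DOUBLE PRECISION) AS float_sum,   -- 0.30000000000000004
       CAST(0.1 AS DECIMAL(10,2))
     + CAST(0.2 AS DECIMAL(10,2))    AS exact_sum;   -- 0.30
```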
41) What’s your strategy for batch vs row‑by‑row processing?
- Batches reduce round trips and locks (sketched below).
- Row‑by‑row is simpler but slower overall.
- Size batches to fit log and memory limits.
- Use staging tables for complex merges.
- Capture per‑batch checkpoints for retries.
- Monitor deadlocks and adjust sizes.
- Keep business idempotency in mind.
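A common batch-delete sketch in SQL Server syntax (the batch size and retention window are illustrative tuning knobs):

```sql
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM event_log                 -- small, predictable chunk
    WHERE created_at < DATEADD(DAY, -90, GETDATE());

    IF @@ROWCOUNT = 0 BREAK;                         -- done when nothing is left
END;
```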
42) How do you safely ingest third‑party files into SQL?
- Validate schema, types, and expected ranges.
- Land raw data unchanged for lineage.
- Stage cleansed data with strict constraints.
- Reject and log bad rows separately.
- Never grant direct write to core tables.
- Share data quality metrics with partners.
- Automate alerts for drift over time.
43) What makes a good CDC (Change Data Capture) design?
- Clear publisher and consumer boundaries.
- Idempotent downstream processing.
- Ordering guarantees for dependent events.
- Retention tuned to consumer delays.
- Visibility into lag and replay windows.
- Schema versioning for event evolution.
- Security applied at the change stream.
44) How do you decide between ELT and ETL?
- ELT leverages database power for transforms.
- ETL offloads work to external engines.
- Consider data gravity and tool skills.
- Cost and governance differ between paths.
- ELT suits cloud warehouses well.
- ETL suits complex, multi‑source joins upstream.
- Hybrid models are common in practice.
45) What’s your process to retire unused indexes?
- Capture usage over a full business cycle.
- Tag candidate indexes for review.
- Drop in stages with rollback plans.
- Recheck critical queries after removal.
- Communicate with teams before changes.
- Keep documentation synced with reality.
- Celebrate reclaimed IO and storage.
46) How do you explain query plan caching to developers?
- The engine saves a plan to reuse next time.
- Good for speed, risky if parameters vary a lot.
- Clear the cache when ad‑hoc plans bloat memory.
- Use parameterization where appropriate.
- Consider hints or recompiles sparingly.
- Monitor cache hit rate and eviction causes.
- Educate on sniffing and skew impacts.
47) When do you prefer application‑side caching over database tuning?
- When reads are repetitive and predictable.
- When data freshness can be brief.
- When database CPU is already near limits.
- When architecture supports edge caching.
- Still fix obviously bad SQL first.
- Keep cache invalidation simple and safe.
- Measure total cost, not query time only.
48) How do you approach concurrency testing for SQL changes?
- Simulate realistic read and write mixes.
- Include long‑running reports in the plan.
- Add jitter and randomness to mimic users.
- Track waits, deadlocks, and queue depth.
- Compare latency percentiles, not averages.
- Rehearse rollbacks under pressure.
- Share results with non‑DB stakeholders.
49) What’s your philosophy on audit tables?
- Capture who, what, when, and why.
- Keep a compact schema to reduce storage.
- Partition by time for lifecycle management.
- Protect from casual access and edits.
- Expose curated views for analysts.
- Align retention with legal needs.
- Don’t let audits slow OLTP paths.
50) How do you design for multi‑tenant databases?
- Decide schema‑per‑tenant vs shared with keys.
- Enforce tenant isolation with constraints.
- Index by tenant keys for common access.
- Plan for noisy neighbor effects.
- Provide tenant‑scoped backups and exports.
- Monitor per‑tenant usage and costs.
- Keep migration tooling tenant‑aware.
51) What are the limits of full‑text search inside SQL engines?
- Great for basic relevance and stemming.
- Not a replacement for dedicated search tech.
- Ranking and faceting features are limited.
- Heavy use can compete with OLTP resources.
- Good for small to medium catalogs.
- For large search, integrate external engines.
- Sync strategy becomes the real challenge.
52) How do you balance cost on managed cloud databases?
- Right‑size compute based on peak windows.
- Use storage tiers for cold data.
- Turn on autoscale with guardrails.
- Avoid chatty queries that block scaling.
- Use read replicas for heavy analytics.
- Monitor spend per team or feature.
- Kill unused test instances quickly.
53) What’s your view on JSON storage inside relational tables?
- Useful for flexible, semi‑structured fields.
- Keep core relational data strongly typed.
- Index specific JSON paths if queried often, as sketched below.
- Beware of oversized rows and IO churn.
- Validate shape at ingestion boundaries.
- Document contracts for downstream consumers.
- Don’t let JSON become a dumping ground.
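A PostgreSQL-style sketch of that balance (names illustrative): typed core columns, JSONB for the flexible extras, and an expression index on one hot path.

```sql
CREATE TABLE product (
    product_id BIGINT PRIMARY KEY,
    name       VARCHAR(200) NOT NULL,  -- core data stays strongly typed
    attributes JSONB                   -- flexible, semi-structured extras
);

-- Index only the path that is actually queried often.
CREATE INDEX ix_product_brand ON product ((attributes ->> 'brand'));
```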
54) How do you keep test data realistic yet safe?
- Start from production statistics and shapes.
- Mask identifiers and sensitive fields.
- Preserve correlations so queries behave similarly.
- Include edge cases and skewed distributions.
- Refresh frequently to avoid drift.
- Automate provisioning for developer speed.
- Track defects tied to data gaps.
55) What’s your approach to archiving and purging data?
- Align policy to legal and business needs.
- Partition or separate archive tables.
- Keep indexes lean for archive access.
- Purge in controlled, small time windows.
- Prove that reports still reconcile after moves.
- Communicate schedules to dependent teams.
- Test restores that include archived data.
56) How do you defend against accidental destructive queries?
- Use role‑based permissions with least privilege.
- Require change approvals for risky operations.
- Add safety rails like WHERE guards in tooling.
- Limit direct prod access for most users.
- Keep time‑boxed elevated access workflows.
- Alert on high‑impact statements in logs.
- Practice incident drills with realistic scenarios.
57) What are common mistakes in data migration projects?
- Assuming perfect source data quality.
- Skipping cutover rehearsal and timing.
- Missing stakeholder sign‑offs on mappings.
- Underestimating index rebuild windows.
- Ignoring downstream report rewiring.
- Forgetting rollback and dual‑run plans.
- Not measuring performance on day one.
58) How do you choose between unique indexes and application‑side checks?
- Unique indexes guarantee integrity always.
- App checks can miss under concurrency.
- DB constraint errors are fast to detect.
- Use app checks for user feedback only.
- Combine both for high‑value entities.
- Monitor violations to catch design drifts.
- Keep keys as narrow as possible.
59) What are the signs you need a warehouse separate from OLTP?
- Reports regularly block OLTP users.
- Complex joins need long history.
- Data comes from multiple source systems.
- Different SLAs for freshness vs stability.
- Security and access patterns diverge.
- Cost of scale is lower in warehouse tech.
- Teams want semantic, curated layers.
60) How do you turn production incidents into SQL improvements?
- Capture a blameless post‑incident timeline.
- Identify the exact query and plan changes.
- Add tests to lock in the fix behavior.
- Improve monitors and alerts for early signals.
- Share learnings with developers and analysts.
- Track mean time to recovery metrics.
- Budget time for preventative refactors.