SQL DISTINCT Explained — Remove Duplicates with Examples
The SQL DISTINCT keyword is a simple but powerful tool for removing duplicate rows from query results. When you want unique values (for example, a list of unique countries from a customer table), DISTINCT is your friend. This guide explains how DISTINCT works, common use cases, how it interacts with COUNT() and grouping, and when you might prefer alternatives for performance.
What Does DISTINCT Do?
At its core, DISTINCT filters duplicate rows out of the result set. It considers all columns in the SELECT list when determining duplicates.
SELECT DISTINCT Country FROM Customers;
This returns a list of the countries found in your Customers table, with each country appearing only once.
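To see the effect end to end, here is a minimal sketch that runs the query above with Python's built-in sqlite3 module. The Customers table and its sample rows are made up for illustration. ORDER BY is added only so the output order is predictable, because DISTINCT on its own does not guarantee any order.

```python
import sqlite3

# In-memory SQLite database with a hypothetical Customers table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (Country) VALUES (?)",
    [("USA",), ("Canada",), ("USA",), ("Mexico",), ("Canada",)],
)

# Without DISTINCT: one row per customer, duplicates included.
all_rows = conn.execute("SELECT Country FROM Customers").fetchall()

# With DISTINCT: each country appears once; ORDER BY only fixes the output order.
unique_rows = conn.execute(
    "SELECT DISTINCT Country FROM Customers ORDER BY Country"
).fetchall()

print(len(all_rows))  # 5
print(unique_rows)    # [('Canada',), ('Mexico',), ('USA',)]
```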
DISTINCT on Multiple Columns
If you include multiple columns after DISTINCT, the combination of those columns determines uniqueness.
SELECT DISTINCT Country, State FROM Customers;
Rows are considered duplicates only if both Country and State match, which is useful when you want unique pairs.
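The following sketch uses SQLite through Python's sqlite3 module and invented sample rows to show the difference. DISTINCT on Country alone collapses to two rows. DISTINCT on (Country, State) keeps every unique pair.

```python
import sqlite3

# Hypothetical Customers table with only the two columns of interest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Country TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("USA", "CA"), ("USA", "CA"), ("USA", "NY"), ("Canada", "ON"), ("Canada", "ON")],
)

# One column: uniqueness is decided by Country only.
countries = conn.execute("SELECT DISTINCT Country FROM Customers").fetchall()

# Two columns: a row is a duplicate only if BOTH Country and State match.
pairs = conn.execute(
    "SELECT DISTINCT Country, State FROM Customers ORDER BY Country, State"
).fetchall()

print(len(countries))  # 2
print(pairs)           # [('Canada', 'ON'), ('USA', 'CA'), ('USA', 'NY')]
```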
DISTINCT with COUNT()
A very common pattern is counting unique values:
SELECT COUNT(DISTINCT CustomerID) AS UniqueCustomers FROM Orders;
Use this when you want to know how many unique customers placed orders (not how many orders total).
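This sketch contrasts the total order count with the unique-customer count, using SQLite via Python and a made-up Orders table. It also includes one order with a NULL CustomerID. Like other aggregates except COUNT(*), COUNT(DISTINCT ...) ignores NULLs, so that row is counted in the total but not as a customer.

```python
import sqlite3

# Hypothetical Orders table: customers 1 and 3 ordered twice, and one order has no customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Orders (CustomerID) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (None,)],
)

# COUNT(*) counts rows; COUNT(DISTINCT ...) counts unique non-NULL values.
total_orders, unique_customers = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT CustomerID) FROM Orders"
).fetchone()

print(total_orders, unique_customers)  # 6 3
```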
When DISTINCT and GROUP BY Overlap
Sometimes DISTINCT and GROUP BY produce similar results, but they serve different purposes.
-- DISTINCT
SELECT DISTINCT Category FROM Products;

-- GROUP BY (also returns one row per category)
SELECT Category FROM Products GROUP BY Category;
Use GROUP BY when you need aggregation (SUM, COUNT, AVG) per group; use DISTINCT when you only need unique rows.
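A small sketch of that distinction, run on SQLite through Python's sqlite3 module. The Products table and its integer Price column are assumptions made for the example. The first two queries return the same category list. Only GROUP BY can also attach per-category aggregates.

```python
import sqlite3

# Hypothetical Products table; prices are whole numbers to keep sums exact.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Category TEXT, Price INTEGER)"
)
conn.executemany(
    "INSERT INTO Products (Category, Price) VALUES (?, ?)",
    [("Books", 10), ("Books", 15), ("Games", 40), ("Toys", 5), ("Toys", 7), ("Toys", 8)],
)

# Uniqueness only: both forms return one row per category.
distinct_rows = conn.execute(
    "SELECT DISTINCT Category FROM Products ORDER BY Category"
).fetchall()
grouped_rows = conn.execute(
    "SELECT Category FROM Products GROUP BY Category ORDER BY Category"
).fetchall()

# Aggregation per group: this is where GROUP BY is required.
per_category = conn.execute(
    "SELECT Category, COUNT(*), SUM(Price) FROM Products "
    "GROUP BY Category ORDER BY Category"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(per_category)  # [('Books', 2, 25), ('Games', 1, 40), ('Toys', 3, 20)]
```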
Performance Considerations
DISTINCT typically requires a sort or a hash step under the hood, which can be expensive on large datasets. Here are tips to improve performance:
- Only select the columns you need — fewer columns means fewer comparisons.
- Index the columns you're selecting where possible; an index that already orders those values can let the engine skip a separate sort or hash step.
- Consider GROUP BY with aggregates if you also need counts or sums.
- For large tables, test the alternatives (DISTINCT vs. GROUP BY vs. window functions) to see which is faster in your DBMS.
Example: DISTINCT vs GROUP BY Performance
-- DISTINCT
SELECT DISTINCT CustomerID FROM Orders;

-- GROUP BY
SELECT CustomerID FROM Orders GROUP BY CustomerID;
Both return unique customer IDs; performance depends on indexes and DB engine optimization.
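One way to compare the two on your own engine is to check that they return the same rows and then inspect the plans. The sketch below does this in SQLite via Python's sqlite3 module. The table, the generated data, and the index name idx_orders_customer are all hypothetical. It prints SQLite's EXPLAIN QUERY PLAN descriptions without assuming what they will say, because the chosen strategy can differ between SQLite versions. Other databases have their own EXPLAIN syntax.

```python
import sqlite3

# Hypothetical Orders table: 1000 orders spread over 50 customers (IDs 0..49).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Orders (CustomerID) VALUES (?)", [(i % 50,) for i in range(1000)]
)
# Hypothetical index name; it gives the optimizer an ordered path over CustomerID.
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")

queries = {
    "DISTINCT": "SELECT DISTINCT CustomerID FROM Orders",
    "GROUP BY": "SELECT CustomerID FROM Orders GROUP BY CustomerID",
}

results = {}
for name, sql in queries.items():
    # Sort in Python so the comparison does not depend on row order.
    results[name] = sorted(row[0] for row in conn.execute(sql))
    # The 4th column of each EXPLAIN QUERY PLAN row is the human-readable step.
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, plan)

print(results["DISTINCT"] == results["GROUP BY"])  # True
print(len(results["DISTINCT"]))                    # 50
```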
Common Uses of DISTINCT
- Lists of unique values (countries, categories, tags)
- Removing duplicates before exporting data
- Counting unique entities with COUNT(DISTINCT ...)
- When data contains accidental duplicates and you need a clean list
Examples — Real Queries
1. Unique customer emails
SELECT DISTINCT Email FROM Customers WHERE Email IS NOT NULL;
2. Unique combinations (city + state)
SELECT DISTINCT City, State FROM Customers;
3. Count unique products ordered
SELECT COUNT(DISTINCT ProductID) AS UniqueProducts FROM OrderLines;
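The three queries above can be tried together against a throwaway SQLite database from Python. The Customers and OrderLines schemas and rows below are invented for the sketch. ORDER BY is added to the first two queries only to make the output order stable. Note that the WHERE clause in the first query is what keeps the NULL email out of the list.

```python
import sqlite3

# Hypothetical schemas and sample rows for the three example queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Email TEXT, City TEXT, State TEXT);
CREATE TABLE OrderLines (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
INSERT INTO Customers (Email, City, State) VALUES
  ('a@example.com', 'Austin', 'TX'),
  ('a@example.com', 'Dallas', 'TX'),
  (NULL, 'Austin', 'TX'),
  ('b@example.com', 'Portland', 'OR'),
  ('c@example.com', 'Portland', 'ME');
INSERT INTO OrderLines VALUES (1, 100, 2), (1, 101, 1), (2, 100, 5), (3, 102, 1);
""")

# 1. Unique customer emails (NULL filtered out explicitly).
emails = conn.execute(
    "SELECT DISTINCT Email FROM Customers WHERE Email IS NOT NULL ORDER BY Email"
).fetchall()

# 2. Unique City + State combinations (Portland appears in two states).
city_state = conn.execute(
    "SELECT DISTINCT City, State FROM Customers ORDER BY City, State"
).fetchall()

# 3. Number of unique products ordered.
unique_products = conn.execute(
    "SELECT COUNT(DISTINCT ProductID) AS UniqueProducts FROM OrderLines"
).fetchone()[0]

print(emails)           # [('a@example.com',), ('b@example.com',), ('c@example.com',)]
print(city_state)       # [('Austin', 'TX'), ('Dallas', 'TX'), ('Portland', 'ME'), ('Portland', 'OR')]
print(unique_products)  # 3
```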
When NOT to Use DISTINCT
Sometimes DISTINCT hides underlying data issues. Consider these alternatives:
- Fix data duplicates at the source if they are accidental.
- Use the ROW_NUMBER() window function to pick a single row per group when additional logic is needed (for example, the latest record per customer).
- Use GROUP BY when you need aggregated metrics in addition to uniqueness.
Alternative: ROW_NUMBER() approach
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CreatedAt DESC) AS rn
  FROM CustomerLogins
)
SELECT * FROM ranked WHERE rn = 1;
This returns the most recent login per customer. Unlike a simple DISTINCT, it keeps the other columns of the chosen row instead of returning one row per unique combination.
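Here is a minimal sketch of the ROW_NUMBER() query running in SQLite from Python. It assumes a SQLite build with window-function support (version 3.25 or later, which recent Python releases bundle). The CustomerLogins columns and rows are invented, and CreatedAt is stored as ISO-style text so that string order matches time order. For contrast, it also shows that DISTINCT on CustomerID and Device returns every unique pair rather than one row per customer.

```python
import sqlite3

# Hypothetical CustomerLogins table; timestamps are ISO-style strings so they sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerLogins (CustomerID INTEGER, CreatedAt TEXT, Device TEXT)")
conn.executemany(
    "INSERT INTO CustomerLogins VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:00", "laptop"),
        (1, "2024-03-15 18:30", "phone"),
        (2, "2024-02-10 12:00", "tablet"),
        (2, "2024-02-09 08:00", "laptop"),
    ],
)

# Rank each customer's logins newest-first, then keep rank 1.
latest = conn.execute("""
    WITH ranked AS (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY CreatedAt DESC) AS rn
      FROM CustomerLogins
    )
    SELECT CustomerID, CreatedAt, Device FROM ranked WHERE rn = 1 ORDER BY CustomerID
""").fetchall()

# DISTINCT cannot express "latest per customer": it returns every unique pair.
distinct_pairs = conn.execute(
    "SELECT DISTINCT CustomerID, Device FROM CustomerLogins"
).fetchall()

print(latest)               # [(1, '2024-03-15 18:30', 'phone'), (2, '2024-02-10 12:00', 'tablet')]
print(len(distinct_pairs))  # 4
```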
Edge Cases & Gotchas
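- NULLs: DISTINCT treats NULLs as equal to one another (standard SQL behavior), so all NULLs in a column collapse into a single row in the result.
- Whitespace and case: whether 'USA' and 'usa' count as duplicates depends on the column's collation. Under a case-sensitive collation they are different, so normalize case (for example, with UPPER() or LOWER()) or use a case-insensitive collation if they should match. Stray leading or trailing spaces can also produce near-duplicates; TRIM() helps.
- Large text columns: DISTINCT on large VARCHAR/TEXT columns may be slow.
- Unintended duplicates: joins (especially one-to-many joins) can multiply rows. Check the join conditions first; if you still need DISTINCT, apply it only to the columns you actually need.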
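The sketch below shows the NULL and case/whitespace gotchas in SQLite, using Python's sqlite3 module and made-up data. SQLite's default BINARY collation is case-sensitive, so 'USA', 'usa', and ' USA ' stay separate until they are normalized. The two NULLs collapse into one row. Other databases may use different default collations, so results there can differ.

```python
import sqlite3

# Hypothetical Customers table with inconsistent country spellings and missing values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (Country) VALUES (?)",
    [("USA",), ("usa",), (" USA ",), (None,), (None,), ("Canada",)],
)

# Raw DISTINCT: three spellings of USA survive, and the two NULLs become one row.
raw = conn.execute("SELECT DISTINCT Country FROM Customers").fetchall()
null_rows = sum(1 for (country,) in raw if country is None)

# Normalized DISTINCT: trim spaces and uppercase before comparing, and drop NULLs.
normalized = conn.execute(
    "SELECT DISTINCT UPPER(TRIM(Country)) FROM Customers "
    "WHERE Country IS NOT NULL ORDER BY 1"
).fetchall()

print(len(raw), null_rows)  # 5 1
print(normalized)           # [('CANADA',), ('USA',)]
```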
Practical Exercises
Try these:
- Return the list of distinct product categories from the Products table.
- Count how many unique customers ordered in the last 30 days.
- Using a join, return distinct product names sold to a specific customer.
FAQ — DISTINCT
Q: Does DISTINCT remove duplicate rows entirely?
A: It removes duplicate rows from the query's result set only; the rows in the underlying table are not changed. Duplicates are determined by all selected columns.
Q: Is DISTINCT the same as GROUP BY?
A: They can produce the same output for simple uniqueness queries, but GROUP BY is used for aggregation. DISTINCT is simpler when you only want unique rows.
Q: How does DISTINCT interact with ORDER BY?
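A: You can combine them. DISTINCT is applied conceptually before ORDER BY, so you get unique rows, which you can then sort. In many databases (for example, PostgreSQL and SQL Server), the ORDER BY columns must also appear in the SELECT list when you use DISTINCT.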
Conclusion
DISTINCT is an essential SQL tool for cleaning and summarizing results quickly. Use it for unique lists, and combine it with COUNT() to measure unique users or products. For complex de-duplication or performance-sensitive scenarios, consider window functions or grouping as more flexible alternatives. Practice the examples above and use the exercises to build confidence.