In this article, we will learn how to retrieve duplicate rows based on specific column values using SQL Server.
If you're seeking to identify duplicate rows in a database table based on certain columns, you've come to the right place.
I've been recently working on an e-commerce website project where I need to find duplicates in a SQL Server table.
There are various methods for identifying duplicate rows based on specific columns, each with different levels of efficiency, depending on the size of your database tables. In this post, we'll explore some efficient techniques to accomplish this task.
Let's consider an example: I have a table named Company_Customer in my SQL database. This table contains records of customers. Now, what I want is to select all duplicate records from the Company_Customer table.
select CustomerName,CustomerCity,CustomerSalary, count(*)
from Company_Customer
group by CustomerName,CustomerCity,CustomerSalary
having count(*) > 1
Generic Query
select column1,column2,column3, count(*)
from TableName
group by column1,column2,column3
having count(*) > 1
Query Reseult:
This query retrieves the CustomerName, CustomerCity, and CustomerSalary columns from the Company_Customer table and counts how many times each unique combination of these columns appears. It then filters the results to only include combinations that appear more than once, effectively identifying duplicates based on these columns. This query is useful for finding duplicate records or identifying potential data entry errors where multiple records have the same values for these columns.
If you have SQL Server 2005 or lastest version, then you can use row_number() functions to return the duplicate records from the database table.
select *
from(select CustomerName
, CustomerCity
, CustomerSalary
, row_number() over (partition by CustomerName
, CustomerCity
, CustomerSalary
order by Id ) as occurrence
from Company_Customer) x
where occurrence > 1
Generic Query
select *
from(select colm1
, colm2
, colm3
, row_number() over (partition by colm1
, colm2
, colm3
order by colm1) as occurrence
from TableName) x
where occurrence > 1
Query Reseult:
Above SQL query is querying the Company_Customer table, but it's utilizing a window function (ROW_NUMBER()) to assign a sequential number to each row within partitions defined by the combination of CustomerName, CustomerCity, and CustomerSalary.
In simpler words, this query identifies duplicate records in the Company_Customer table based on the combination of CustomerName, CustomerCity, and CustomerSalary. It assigns a sequential number to each occurrence of these combinations and then filters the results to only include occurrences where the sequential number is greater than 1, indicating duplicates. This approach is useful for identifying and handling duplicate data within the table.
Above is the easiest solution with SQL Server 2005. it will return all records record except for the first one if there are multiple occurrences.
SELECT Id
, CustomerName
, CustomerCity
, CustomerSalary
FROM
(
SELECT Id
, CustomerName
, CustomerCity
, CustomerSalary
, RANK() OVER (PARTITION BY CustomerName, CustomerCity, CustomerSalary ORDER BY Id ASC) AS [rank]
FROM Company_Customer
) a
WHERE [rank] > 1
Generic Query
SELECT Id
, cloumn1
, cloumn2
, cloumn3
FROM
(
SELECT Id
, cloumn1
, cloumn2
, cloumn3
, RANK() OVER (PARTITION BY cloumn1, cloumn2, cloumn3 ORDER BY Id ASC) AS [rank]
FROM Company_Customer
) a
WHERE [rank] > 1
This query identifies duplicate records in the Company_Customer table based on the combination of CustomerName, CustomerCity, and CustomerSalary. It assigns a ranking to each occurrence of these combinations and then filters the results to only include occurrences where the rank is greater than 1, indicating duplicates. This approach is useful for identifying and handling duplicate data within the table.
with MYCTE as (
select row_number() over ( partition by CustomerName,CustomerCity,CustomerSalary order by Id) rown, *
from Company_Customer
)
select * from MYCTE where rown >1
Above Query identifies duplicate records in the Company_Customer table based on the combination of CustomerName, CustomerCity, and CustomerSalary. It assigns a row number to each occurrence of these combinations and then selects only the rows where the row number is greater than 1, indicating duplicates. This approach is useful for identifying and handling duplicate data within the table.
CREATE TABLE [dbo].[Company_Customer](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CustomerName] [nvarchar](150) NOT NULL,
[CustomerCity] [nvarchar](500) NOT NULL,
[CustomerSalary] [decimal](18, 2) NOT NULL,
)
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (4, 'John', 'California', CAST(1000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (5, 'Marry', 'Washington dc', CAST(2000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1012, 'Ashok', 'New Delhi', CAST(2000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1014, 'John', 'California', CAST(1000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1015, 'Marry', 'Washington dc', CAST(2000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1016, 'Pankaj Patel', 'Kanpur', CAST(5000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1019, 'Neeraj Singh', 'Banglore', CAST(6000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1020, 'Aakankshi Gupta', 'London', CAST(40000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1022, 'Ashish Thapliyal', 'Paris', CAST(5000.00 AS Decimal(18, 2)))
GO
INSERT [dbo].[Company_Customer] ([Id], [CustomerName], [CustomerCity], [CustomerSalary]) VALUES (1023, 'Marry', 'Washington dc', CAST(2000.00 AS Decimal(18, 2)))
GO
If you have any doubt or query then comment.