Code Answer: SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?

For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).

Sample data:

1 01/01/2008 stuff@stuff.com

2 02/01/2008 stuff@stuff.com

3 01/12/2008 noone@stuff.com

From stackoverflow $pageUsers[$entry.posts[0].author].name

Couldn't you join the list on the e-mail field and then see what nulls you get in your result?

Or better yet, count the instances of each e-mail address? And only return the ones with count > 1

Or even take the email and id fields. And return the entries where the e-mail is the same, and the IDs are different. (To avoid duplicates don't use != but rather either < or >.)

From Rob Rolnick

something like this

select email ,max(receiveddate) as MaxDate
from YourTable
group by email 
having count(email) > 1

From SQLMenace

select * from table a, table b where a.email = b.email

From Michael Sharek

SELECT [id], [receivedate], [email]
FROM [mytable]
WHERE [email] IN ( SELECT [email]
    FROM [myTable]
    GROUP BY [email]
    HAVING COUNT([email]) > 1 )

From palehorse

Do you want a list of the last items? If so you could use:
```
SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE t.date > tCheck.date)
```
If you want a list of all duplicate email address use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:
```
SELECT [info] FROM [table] GROUP BY [email] HAVING Count(*) > 1 DESC
```
If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":
```
SELECT TOP 1 [info] FROM [table] GROUP BY [email] HAVING Count(*) > 1 ORDER BY Date DESC
```
From enigmatic
If you have surrogate key, it is relatively easy to use the group by syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.

Example pseudo-code to delete duplicate records.
```
Create table people (ID(PK), Name, Address, DOB)

Delete from people where id not in (
Select min(ID) from people group by name, address, dob
)
```
From Brian

Code Answer

Friday, February 4, 2011

SQL Server Duplicate Checking

0 comments:

Post a Comment

Blog Archive