Friday, February 4, 2011

SQL Server Duplicate Checking

What is the best way to determine duplicate records in a SQL Server table?

For instance, I want to find the last duplicate email received in a table (table has primary key, receiveddate and email fields).

Sample data:

1 01/01/2008 stuff@stuff.com

2 02/01/2008 stuff@stuff.com

3 01/12/2008 noone@stuff.com

  • Couldn't you join the list on the e-mail field and then see what nulls you get in your result?

    Or better yet, count the instances of each e-mail address? And only return the ones with count > 1

    Or even take the email and id fields. And return the entries where the e-mail is the same, and the IDs are different. (To avoid duplicates don't use != but rather either < or >.)

  • something like this

    select email ,max(receiveddate) as MaxDate
    from YourTable
    group by email 
    having count(email) > 1
    
    From SQLMenace
  • select * from table a, table b where a.email = b.email

  • SELECT [id], [receivedate], [email]
    FROM [mytable]
    WHERE [email] IN ( SELECT [email]
        FROM [myTable]
        GROUP BY [email]
        HAVING COUNT([email]) > 1 )
    
    From palehorse
  • Do you want a list of the last items? If so you could use:

    SELECT [info] FROM [table] t WHERE NOT EXISTS (SELECT * FROM [table] tCheck WHERE t.date > tCheck.date)
    

    If you want a list of all duplicate email address use GROUP BY to collect similar data, then a HAVING clause to make sure the quantity is more than 1:

    SELECT [info] FROM [table] GROUP BY [email] HAVING Count(*) > 1 DESC
    

    If you want the last duplicate e-mail (a single result) you simply add a "TOP 1" and "ORDER BY":

    SELECT TOP 1 [info] FROM [table] GROUP BY [email] HAVING Count(*) > 1 ORDER BY Date DESC
    
    From enigmatic
  • If you have surrogate key, it is relatively easy to use the group by syntax mentioned in SQLMenance's post. Essentially, group by all the fields that make two or more rows 'the same'.

    Example pseudo-code to delete duplicate records.

    Create table people (ID(PK), Name, Address, DOB)
    
    Delete from people where id not in (
    Select min(ID) from people group by name, address, dob
    )
    
    From Brian

0 comments:

Post a Comment