Tuesday, April 5, 2011

Finding unmatched records with SQL

I'm trying write a query to find records which don't have a matching record in another table.

For example, I have a two tables whose structures looks something like this:

Table1
    State | Product | Distributor | other fields
    CA    | P1      |  A          | xxxx
    OR    | P1      |  A          | xxxx
    OR    | P1      |  B          | xxxx
    OR    | P1      |  X          | xxxx
    WA    | P1      |  X          | xxxx
    VA    | P2      |  A          | xxxx

Table2
    State | Product | Version | other fields
    CA    | P1      |  1.0    | xxxx
    OR    | P1      |  1.5    | xxxx
    WA    | P1      |  1.0    | xxxx
    VA    | P2      |  1.2    | xxxx

(State/Product/Distributor together form the key for Table1. State/Product is the key for Table2)

I want to find all the State/Product/Version combinations which are Not using distributor X. (So the result in this example is CA-P1-1.0, and VA-P2-1.2.)

Any suggestions on a query to do this?

From stackoverflow
  • SELECT
        *
    FROM
        Table2 T2
    WHERE
        NOT EXISTS (SELECT *
            FROM
               Table1 T1
            WHERE
               T1.State = T2.State AND
               T1.Product = T2.Product AND
               T1.Distributor = 'X')
    

    Edit: Damn: beaten to it. This should be ANSI compliant though.

    Jonathan Leffler : This works on most SQL systems; EXCEPT won't work everywhere (though it is more elegant where it does work).
    Tomalak : Using "SELECT 1 FROM" in the sub-query (instead of "SELECT * FROM") may prevent an unnecessary table scan. Though I would expect the DBMS to be smart enough to figure it out on it's own, upon seeing "EXISTS".
    gbn : I have this with my SQL Server MVP colleague all the time :-) The * is expanded at compile time, bt collapses trivially, but not runtime he says. He showed me an article once. I saw Itzak Ben-Gan a while ago and he said the * is quicker. The choice is yours...
    Boofus McGoofus : Worked well. Thanks.
    Dems : When using EXISTS there should be no difference between SELECT * and SELECT 1.
  • select * from table1 where state not in (select state from table1 where distributor = 'X')

    Probably not the most clever but that should work.

    gbn : IN is not as good as EXISTS, and does not handle the composite key on state/product
  • In T-SQL:

    SELECT DISTINCT Table2.State, Table2.Product, Table2.Version
    FROM Table2 
      LEFT JOIN Table1 ON Table1.State = Table2.State AND Table1.Product = Table2.Product AND Table1.Distributor = 'X'
    WHERE Table1.Distributor IS NULL
    

    No subqueries required.

    Edit: As the comments indicate, the DISTINCT is not necessary. Thanks!

    HLGEM : I wouldn't use distinct, but otherwise this is what you want.
    gbn : The distinct will probably make the query less efficient, but never more efficient. It depends on the relative table row counts. The distcnt also forces an aggregate that sub query does not need.
  • SELECT DISTINCT t2.State, t2.Product, t2.Version
    FROM table2 t2
    JOIN table1 t1 ON t1.State = t2.State AND t1.Product = t2.Product
                    AND t1.Distributor <> 'X'
    
  • In Oracle:

    SELECT t2.State, t2.Product, t2.Version
    FROM Table2 t2, Table t1
    WHERE t1.State(+) = t2.State
      AND t1.Product(+) = t2.Product
      AND t1.Distributor(+) = :distributor
      AND t1.State IS NULL
    

0 comments:

Post a Comment