Friday, April 15, 2011

SQL Inserting data from staging table to two other tables.

I have a situation in which a CSV is uploaded to my application and each line essentially needs to be put into the database. I read each line and build a data table and then SqlBulkCopy that table up to a staging table.

The staging table looks like this:

UserID, GroupID, FirstName, LastName, EmailAddress

I have three other relevant tables in the database. A contacts table, a groups table and a contacts to groups mapping table.

Contacts:
ID, UserID, FirstName, LastName, EmailAddress

Groups
ID, UserID, Name, Description

ContactGroupMapping
ID, ContactID, GroupID

The ContactGroupMapping table simply maps contacts to groups. Hopefully the staging table now makes sense, it holds the details of each imported contact plus the group they should also be mapped to.

My plan was to run a query against the database after the SqlBulkCopy to move the data from the staging table to the Contacts and ContactGroupMapping tables. Currently, I have a query looking something like this:

INSERT INTO Contacts (UserID, FirstName, LastName, EmailAddress)

SELECT DISTINCT [t1].UserID, [t1].EmailAddress, [t1].FirstName, [t1].LastName FROM ContactImportStaging as [t1]
WHERE NOT EXISTS 
(
    SELECT UserID, EmailAddress, FirstName, LastName FROM Contacts
    WHERE UserID = [t1].UserID AND EmailAddress = [t1].EmailAddress AND FirstName = [t1].FirstName AND LastName = [t1].LastName
)

So, my problem is that while this inserts all the distinct contacts into my contacts table, I then have no way to add the associated row to the mapping table for each newly inserted contact.

The only solution (probably because I suck at SQL) I can come up with is to have an extra nullable field in the contacts table identifying the group that the contact is to be associated with and insert this too. Then I could run a second query to select all contacts with a value in this column and insert into the mappings table.

Any thoughts on how this sort of thing should be achieved most efficiently?

Thanks.

Edit: To elaborate on the object model: There are any number of Contacts and any number of Groups. A contact can be in a group by way of an entry in the ContactGroupMapping table. One contact can be in any number of groups. At the database level, this model is the concern of the three tables Contacts, Groups and ContactGroupMapping. I'm needing to move one row from the staging table and create two rows; one in the Contacts table and one in the ContactGroupMapping table.

From stackoverflow
  • I'd just bulk copy the CSV into a staging database table. You can delete the table later, if space is a concern.

    After copying the CSV into, say, StagingTable, the trick is to split it into the other tables. Since your CSV can now be referenced in the database, you can join the other tables to retrieve the new table's ID:

    insert into ContactGroupMapping
    select cont.ID, grp.ID
    from StagingTable stag
    inner join Contacts cont on cont.UserID = stag.UserID
    inner join Groups grp on grp.GroupID = stag.GroupID
    

    An even easier solution would be to keep the original GroupID and UserID. Filling the GroupContactMapping would then be as easy as:

    insert into ContactGroupMapping
    select UserID, GroupID
    from StagingTable stag
    

0 comments:

Post a Comment