I am running a web site that helps manage lots of information for medical clinics. Part of the application needs to import patient records from an Excel spreadsheet. The patient table has about 1 million records, and an Excel import needs to insert or update 10k, 20k, or 30k patient records at a time, all while other customers are pounding the table. Processing time is less important than reducing contention on the database. What strategies would you recommend?
I know other sites do this effectively; Salesforce, for example, lets you upload large amounts of data at once.
-
If each row update is independent, run a loop that gets a row, updates the table, gets another row, and so on.
You can then put a delay in the loop to slow it down and avoid impacting the main site (some sort of load metric could be used to adjust the delay on the fly). Some sort of token-ring-like setup could be used to make several concurrent uploads throttle together.
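A minimal T-SQL sketch of that kind of throttled loop, assuming the uploaded rows have already landed in a work table on the server; every table, column, and flag name here is a placeholder, and the fixed one-second delay stands in for whatever load-based throttle you would actually use:

-- Placeholder objects: dbo.Patient (production), dbo.PatientUpload (uploaded rows
-- with a Processed flag), both keyed on PatientID.
SET NOCOUNT ON;

DECLARE @BatchSize int = 500;
DECLARE @Keys TABLE (PatientID int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Keys;

    -- Pick the next small batch of unprocessed upload rows.
    INSERT INTO @Keys (PatientID)
    SELECT TOP (@BatchSize) PatientID
    FROM   dbo.PatientUpload
    WHERE  Processed = 0
    ORDER  BY PatientID;

    IF NOT EXISTS (SELECT 1 FROM @Keys)
        BREAK;  -- nothing left to do

    -- Apply just this batch to the production table.
    UPDATE p
    SET    p.FirstName = u.FirstName,
           p.LastName  = u.LastName
    FROM   dbo.Patient       AS p
    JOIN   dbo.PatientUpload AS u ON u.PatientID = p.PatientID
    JOIN   @Keys             AS k ON k.PatientID = u.PatientID;

    -- Mark the batch as done so the next iteration moves on.
    UPDATE u
    SET    u.Processed = 1
    FROM   dbo.PatientUpload AS u
    JOIN   @Keys             AS k ON k.PatientID = u.PatientID;

    -- Throttle: pause between batches so other sessions get at the table.
    WAITFOR DELAY '00:00:01';
END;

Keeping each batch small means each statement holds its locks only briefly, which is what matters for the other customers hitting the table.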
-
Load the Excel sheet into a staging table first, then decide whether to update/insert the rows in a single batch or in smaller chunks.
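A minimal sketch of the staging-table load, with hypothetical table names and file path; it assumes the sheet has been exported to CSV, since BULK INSERT cannot read .xlsx directly (SSIS, or OPENROWSET with the ACE OLE DB provider, are alternatives if you need to read the workbook itself):

-- Placeholder staging table; columns should mirror the spreadsheet layout.
IF OBJECT_ID('dbo.PatientStaging') IS NULL
    CREATE TABLE dbo.PatientStaging
    (
        PatientID   int           NOT NULL,
        FirstName   nvarchar(100) NULL,
        LastName    nvarchar(100) NULL,
        DateOfBirth date          NULL
    );

TRUNCATE TABLE dbo.PatientStaging;

-- Load the exported CSV (path is a placeholder); FIRSTROW = 2 skips the header.
BULK INSERT dbo.PatientStaging
FROM 'C:\imports\patient_upload.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);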
Typically, inserting a million rows from one table to another should be quick enough to run while the server is under load. You will have a lock during the insert, but it should be a matter of seconds. Unless you are loading billions of records a minute, or your upsert operation is very intensive, I don't see it being a problem.
If your upsert is very complex, there are a number of ways to handle it. You can insert everything in a single batch but mark the production records as incomplete until their subordinate records are updated, or you can mark the staging rows as unprocessed and work through them in batches.
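For the simple case, here is a sketch of the single-batch upsert from that staging table into production, again with placeholder names. MERGE keeps it to one set-based statement so the lock on the production table is held only briefly; an UPDATE followed by an INSERT ... WHERE NOT EXISTS works just as well if you prefer to avoid MERGE:

-- Update existing patients, insert new ones, all in one statement.
MERGE dbo.Patient AS p
USING dbo.PatientStaging AS s
      ON p.PatientID = s.PatientID
WHEN MATCHED THEN
    UPDATE SET p.FirstName   = s.FirstName,
               p.LastName    = s.LastName,
               p.DateOfBirth = s.DateOfBirth
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PatientID, FirstName, LastName, DateOfBirth)
    VALUES (s.PatientID, s.FirstName, s.LastName, s.DateOfBirth);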
Brent Ozar: If I could award multiple upvotes to this, I would. MonkeyBrother, this is the one you want to focus on.