What's the best way to compare / insert / update products in a MySQL db from a .CSV file What's the best way to compare / insert / update products in a MySQL db from a .CSV file codeigniter codeigniter

What's the best way to compare / insert / update products in a MySQL db from a .CSV file


We had a similar situation. After a lot of attempts at making the script better, we decided that we needed another approach to make our import work and not take ~10 hours.

What we did was dump all the PHP code, and instead use mysqlimport to load the contents of the CSV file directly into a table. That table now contains everything we need, but not in a form that's useful for us (no structure, some fields need some processing, etc.)

However, because everything is now in the database, we can do everything we want with a query.For example, deleting all data that is no longer in the import file, thats just DELETE FROM structured_table AS st LEFT JOIN unstructured_table AS ut ON st.someField = ut.someField WHERE ut.someField IS NULL;, updating existing records is just UPDATE structured_table AS st INNER JOIN unstructured_table AS ut ON st.someField = ut.someField SET st.anotherField = CONCAT(ut.aField, ' ', ut.yetAnotherField);.

Obviously, for a complex import script, your queries will be more complex and you'll need more of them. You might even need to throw some stored procedures in to do processing on individual fields. But if you can take this kind of approach you'll end up with a process that can handle a lot of data and is very scalable.


I have a similar situation... Compare around 20M records every day to update a few records with changes and add / remove the delta. Data source is CSV as well. I use perl, while I think php also work.

  1. Each record must have a linking key, SKU of product? Or something like that. May already be the primary key /unique key in your DB table.
  2. You know the lst of fields that you want to compare and update.

Step 1: read ALL records from DB, store in an array using the linking key as named index.

1.1: value is concat of all fields need to compare, or md5() of the concat result to save memory.

Step 2: loop through the CSV file, extract the linking key and new values per row.

2.1: if linking key is NOT in the array, INSERT action to DB.

2.2: isset() return true so compare the values (or md5() of the value concat), if different, UPDATE action to DB.

2.3: delete this entry from the array.

Step 3: by the end of reading the CSV, the entries remains in the array were records to DELETE.

In my case, it use less than 2GB RAM for the process and runs around 3 minutes, which should be feasible and acceptable.