Gotchas while bulk loading CouchDB
So far I have done a few conversions from legacy SQL databases to CouchDB, and each time I took a somewhat different approach.
- I used the primary key of the SQL database as the document ID. This allowed me to re-run the import over and over without fear of creating duplicate documents.
- I did row-by-row imports instead of a bulk import, which makes debugging easier. I saw between 5 and 10 inserts per second over an Internet connection. While this is not lightning fast, it was fast enough for me; my biggest database is 600,000 documents totaling 20 GB. Row-by-row importing bloats the database, so run compaction occasionally during the import. Then again, unless your rows are huge, 15,000 rows does not sound like much.
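If row-by-row turns out to be too slow, CouchDB also accepts many documents per request through its `_bulk_docs` endpoint (exposed in couchdb-python as `Database.update`). A minimal sketch of the batching side, with a helper name of my own choosing:

```python
def chunked(rows, size):
    """Yield lists of at most `size` rows, suitable as _bulk_docs batches."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch
```

Each batch could then be sent with `db.update(batch)`; note that with bulk updates you have to inspect the per-document results for conflicts yourself, which is exactly the debugging convenience row-by-row gives up.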
My importing code usually looks like this:
```python
import couchdb.client

def main():
    options = parse_commandline()
    server = couchdb.client.Server(options.couch)
    db = server[options.db]
    for kdnnr in get_kundennummers():
        data = vars(get_kunde(kdnnr))
        doc = {'name1': data.get('name1', ''),
               'strasse': data.get('strasse', ''),
               'plz': data.get('plz', ''),
               'ort': data.get('ort', ''),
               'tel': data.get('tel', ''),
               'kundennr': data.get('kundennr', '')}
        # update the existing document or insert a new one
        olddoc = db.get(kdnnr, {})
        newdoc = dict(olddoc)
        newdoc.update(doc)
        # only write when something actually changed
        if newdoc != olddoc:
            db[kdnnr] = newdoc
```
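The compare-before-write step at the end is what makes re-running the import cheap: unchanged documents are never PUT, so no new revisions pile up. Pulled out into a standalone function (the name is mine, not part of the script above), the logic looks like this:

```python
def needs_write(existing, incoming):
    """Merge incoming fields over the existing doc; report whether a PUT is needed."""
    merged = dict(existing)
    merged.update(incoming)
    return merged != existing, merged

# re-importing identical data causes no write
changed, doc = needs_write({'_id': '10001', 'name1': 'ACME'},
                           {'name1': 'ACME'})
assert changed is False
```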