Thursday, 15 April 2010

python - Django: how to wrap a bulk update/insert operation in transaction?

This is the use case:

I have multiple celery tasks that run in parallel, and each task may bulk create or update many objects. For this I'm using django-bulk.

So I'm using its convenient function insert_or_update_many:

It works in three steps:

1. It first performs a SELECT to find the existing objects
2. It updates the objects that were found
3. It inserts the remaining objects
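For reference, here is a minimal sketch of how I call it (the import path, model, and keys argument are placeholders of mine; check the django-bulk source for the real signature):

    # Hypothetical example: Counter is a stand-in model with a unique
    # "name" field; the import path is assumed, not verified.
    from bulkops import insert_or_update_many
    from myapp.models import Counter

    rows = [Counter(name='clicks', value=1), Counter(name='views', value=2)]
    # keys names the fields used for the SELECT in step 1
    insert_or_update_many(Counter, rows, keys=['name'])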

But this introduces a problem of concurrency. For example: if an object did not exist during step 1, it is added to the list of objects to be inserted later. During that window another celery task may have created the same object, and when the first task tries to perform the bulk insert (step 3) I get a duplicate-entry error.

I guess I need to wrap the three steps in a 'blocking' block. I've read around about transactions, and I've tried to wrap steps 1, 2, 3 within a with transaction.commit_on_success: block:

    with transaction.commit_on_success():
        cursor.execute(sql, parameters)
        existing = set(cursor.fetchall())
        if not skip_update:
            # Find the objects that need to be updated
            update_objects = [o for (o, k) in object_keys if k in existing]
            _update_many(model, update_objects, keys=keys, using=using)
        # Find the objects that need to be inserted
        insert_objects = [o for (o, k) in object_keys if k not in existing]
        # Filter out any duplicates within the insertion batch
        filtered_objects = _filter_objects(con, insert_objects, key_fields)
        _insert_many(model, filtered_objects, using=using)

But that does not work for me. I'm not sure I have a full understanding of transactions. Basically I need a block where I can put several operations, being sure that no other process or thread is accessing (in write) my db resources.

"I need a block where I can put several operations, being sure no other process or thread is accessing (in write) my db resources"

Django transactions do not, in general, guarantee that for you. If you're coming from other areas of computer science you might naturally think of a transaction as blocking in that way, but in the database world there are different kinds of locks, at different isolation levels, and they vary for each database. So to ensure that your transactions do this, you're going to have to learn about transactions, about locks and their performance characteristics, and about the mechanisms your database supplies for controlling them.
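For instance, on PostgreSQL you could serialize the competing tasks with an explicit table lock taken inside the transaction. A rough sketch (the table name and lock mode are placeholders, and this deliberately gives up all write concurrency):

    from django.db import connection, transaction

    with transaction.atomic():  # commit_on_success on Django < 1.6
        cursor = connection.cursor()
        # PostgreSQL-specific: blocks other writers to this table until
        # the transaction commits or rolls back.
        cursor.execute('LOCK TABLE myapp_counter IN EXCLUSIVE MODE')
        # ... perform the select / update / insert steps here ...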

However, having a bunch of processes all trying to lock the table in order to carry out competing inserts does not sound like a good idea. If collisions were rare you could do a form of optimistic locking and just retry the transaction if it fails. Or perhaps you can direct all of these celery tasks to a single process (there's no performance advantage to parallelizing if you're going to acquire a table lock anyway).
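If you take the single-process route, one way to do it with Celery is to send these tasks to a dedicated queue consumed by exactly one worker process. A sketch (queue, task, and app names are made up; the setting is CELERY_ROUTES on Celery 3.x, task_routes on 4.x+):

    # settings / celeryconfig: route the bulk task to its own queue
    CELERY_ROUTES = {
        'myapp.tasks.bulk_upsert': {'queue': 'serial'},
    }

    # Then run a single one-process worker for that queue:
    #   celery -A myapp worker -Q serial --concurrency=1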

My suggestion would be to start out by forgetting the bulk operations and just doing one row at a time using Django's update_or_create (new in 1.7, but not hard to copy and implement yourself). As long as your database has constraints that prevent duplicate entries (which it sounds like it does), this should be free of the race conditions you describe above. If the performance really does turn out to be unacceptable, then look into the more complex options.
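A sketch of that row-at-a-time approach (Counter and its unique name field are placeholders):

    from myapp.models import Counter  # hypothetical model with a unique "name"

    incoming = [('clicks', 1), ('views', 2)]
    for name, value in incoming:
        # get() then update, or insert if missing; the unique constraint
        # on "name" prevents concurrent tasks from creating duplicates.
        Counter.objects.update_or_create(name=name, defaults={'value': value})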

Update: the idea behind optimistic concurrency is that, rather than, say, acquiring a table lock to prevent conflicts, you proceed as normal and retry the operation if it turns out you've hit a conflict. In your case it might look something like this (using Django 1.6+ transactions):

    from django.db import IntegrityError, transaction

    while True:
        try:
            with transaction.atomic():
                ...  # bulk insert / update operation
        except IntegrityError:
            pass
        else:
            break

So if you run into the race condition, the resulting IntegrityError will cause the transaction.atomic() block to roll back any changes that have been made, and the while loop will force a retry of the transaction (where presumably the bulk operation will now see the newly-existing row and mark it for updating rather than insertion).

This kind of approach can work really well if collisions are rare, and really badly if they are frequent.
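If frequent collisions are a worry, a variation that caps the retries instead of looping forever (bulk_upsert() is a placeholder for your operation, and the cap is arbitrary):

    from django.db import IntegrityError, transaction

    MAX_RETRIES = 5  # arbitrary; tune for your workload

    for attempt in range(MAX_RETRIES):
        try:
            with transaction.atomic():
                bulk_upsert()  # placeholder: your bulk insert / update
        except IntegrityError:
            continue  # conflict rolled back inside atomic(); retry
        else:
            break
    else:
        raise RuntimeError('bulk upsert still conflicting after %d tries' % MAX_RETRIES)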

python sql django transactions django-database
