Wednesday, 15 May 2013

sql - Huge amount of updates and postgresql -




I currently have a situation that makes me scared. I have 20k rows in the DB, and that isn't even 1% of the data I will have in the next 3 months. Each row represents an object's (let's call it object1) data. I also have a stats table for each object1, let's call it object1stats, located in MongoDB. There is an object1stats record for each day, so to get, for example, the total stats, I have to sum every object1stats for that object1.

The problem is: I need this info precalculated. For example, to give the user the ability to sort the object1 collection by stats when it is displayed. Loading and sorting in code would be too expensive with, for example, 5 million object1 rows.

So I came to the thought of precalculating the stats every hour (object1stats are updated twice an hour) for each object1. This process makes me afraid of the time it would take to perform everything... I would have to take each object1, send a query to MongoDB to sum its object1stats, and create an SQL update for that object1. And repeat that at least 3 million times.

I have 2 bottlenecks here: the calculation of the sum (map-reduce) in MongoDB and the SQL update queries in PostgreSQL. I can't speed up the map-reduce right now (I assume it's good enough), so I'm thinking about the SQL updates.

Any thoughts or suggestions? I'll take anything, including suggestions to use a different DB or a different approach.

Also, I can't just append new stats data to the object, because the last day's stats can change often, and previous days' stats can change too.

Some ideas on the PostgreSQL end:

Use COPY to load the fresh data into a temporary table, then update the objects with a single query. That is much faster than issuing every update separately. See this answer. (If your driver allows it, besides the COPY and multi-valued INSERT options there is also the pipeline alternative.)
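
A minimal sketch of that pattern, assuming a hypothetical object1 table with an id and a total_stats column (the names are made up for illustration):

    -- staging table for the freshly calculated sums
    CREATE TEMP TABLE object1_stats_load (
        object1_id  bigint PRIMARY KEY,
        total_stats bigint
    );

    -- bulk-load the precalculated sums; COPY is far cheaper than row-by-row inserts
    COPY object1_stats_load (object1_id, total_stats) FROM STDIN WITH (FORMAT csv);

    -- apply everything in one statement instead of millions of single-row updates
    UPDATE object1 o
    SET    total_stats = l.total_stats
    FROM   object1_stats_load l
    WHERE  o.id = l.object1_id;

The COPY ... FROM STDIN line assumes the data is streamed through your client driver's COPY support.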

Keep the frequently updated part of the object (the stats) in a separate table.

If you are sure all objects get updated, you might want to load the updated stats with COPY and simply switch the tables (drop table stats; alter table new_stats rename to stats).
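
Roughly, and again with hypothetical table and column names, a full rebuild-and-swap could look like this:

    -- build the replacement table with the same structure
    CREATE TABLE new_stats (LIKE stats INCLUDING ALL);

    -- bulk-load the complete, freshly computed stats
    COPY new_stats (object1_id, total_stats) FROM STDIN WITH (FORMAT csv);

    -- swap the tables atomically; DDL is transactional in PostgreSQL
    BEGIN;
    DROP TABLE stats;
    ALTER TABLE new_stats RENAME TO stats;
    COMMIT;

Keep in mind that anything referencing stats (views, foreign keys) would have to be recreated after the swap.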

If, on the other hand, you are updating the stats in well-defined batches (e.g. first update the stats of objects 1..99999, then objects 100000..199999, and so on), you might partition the stats table according to these batches.
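
A sketch of that layout using old-style inheritance partitioning (the batch boundaries and column names are just assumptions):

    CREATE TABLE stats (
        object1_id  bigint,
        total_stats bigint
    );

    -- one child table per update batch
    CREATE TABLE stats_batch_1 (
        CHECK (object1_id BETWEEN 1 AND 99999)
    ) INHERITS (stats);

    CREATE TABLE stats_batch_2 (
        CHECK (object1_id BETWEEN 100000 AND 199999)
    ) INHERITS (stats);

    -- refreshing one batch then only touches its own child table
    TRUNCATE stats_batch_1;
    COPY stats_batch_1 (object1_id, total_stats) FROM STDIN WITH (FORMAT csv);

With constraint_exclusion enabled, the planner can also skip the batches a query doesn't need.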

Another angle is to load the stats directly from MongoDB, on demand, using a foreign data wrapper. You might want to use a stored procedure for accessing the stats that caches them in a local table; updating the stats then amounts to truncating the cache. The downside of this approach is that PostgreSQL will issue a separate MongoDB request for every stat it fetches, so if your queries need to touch a lot of stats it might be worse than the hourly batch update.
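
A rough sketch with the mongo_fdw extension (the exact option names depend on the FDW version, and the column list is an assumption about the object1stats documents):

    CREATE EXTENSION mongo_fdw;

    CREATE SERVER mongo_server
        FOREIGN DATA WRAPPER mongo_fdw
        OPTIONS (address '127.0.0.1', port '27017');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER mongo_server
        OPTIONS (username 'stats_reader', password 'secret');

    -- expose the object1stats collection as a local (foreign) table
    CREATE FOREIGN TABLE object1stats_remote (
        _id        name,
        object1_id bigint,
        day        date,
        value      bigint
    )
    SERVER mongo_server
    OPTIONS (database 'statsdb', collection 'object1stats');

    -- cache the sums locally; refreshing the stats is then just TRUNCATE + re-INSERT
    CREATE TABLE stats_cache AS
        SELECT object1_id, sum(value) AS total_stats
        FROM   object1stats_remote
        GROUP  BY object1_id;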

Yet another way is to create a MongoDB "river": a driver that pushes stat changes to PostgreSQL as they occur in MongoDB. That way you only pay for what you use, updating in PostgreSQL only the objects that actually changed in MongoDB, and the load is spread more evenly. IMO this is the preferred way, but I don't know how hard such a "river" driver would be to create.

P.S. Here's a blog post on using NOTIFY to update Elasticsearch: http://evol-monkey.blogspot.ru/2014/08/postgresql-and-elasticsearch.html
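
A minimal sketch of that NOTIFY mechanism (the trigger, channel and table names are hypothetical):

    -- raise a notification on a channel every time a row's stats change
    CREATE FUNCTION notify_stats_change() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('stats_changed', NEW.object1_id::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER stats_change_notify
        AFTER INSERT OR UPDATE ON stats
        FOR EACH ROW EXECUTE PROCEDURE notify_stats_change();

    -- an external worker then does LISTEN stats_changed; and re-indexes the changed rows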

sql database mongodb postgresql optimization
