Cassandra: Is this a proper schema for the data model?
In a sensor-based application, 300k objects are being monitored per hour on 30 metrics, each having success and failure counters.
My schema:

create table measurements( objid int, hr timestamp, metric text, succ int, fail int, primary key (objid, hr, metric));
The data retention period is 1 year, so the table will have 300k rows, each holding roughly 24*360*30*2 columns (cells).
The usual queries are counter values aggregated over a specified time interval (which could be days, weeks, or months) and over a specified set of objects (ranging from 1 to hundreds).
Time slicing is fine via column slicing, but retrieval of multiple objects is a bit of a pain: since rows are keyed per object by objid, it leads to a multiget.
The most general query I can think of is:

select * from measurements where objid in (id1, id2, id3...idn) and hr >= <starttime> and hr < <endtime>;
Of course, the aggregation has to be done manually in the application.
Q: What is the optimal way to structure the data given this query pattern?
The worst case is an 'overall' result over a period, which means taking all objects into account. From a storage perspective, that amounts to a full table scan. Is there a recommended practice for performing such a task without resorting to MapReduce?
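One option for the 'overall' case (my suggestion, not something stated in the question) is to denormalize a second table that accumulates totals across all objects at write time, keyed by time alone. The table and column names below are hypothetical; a coarse per-day text bucket serves as the partition key so that hours within a day can be range-sliced on the clustering column:

```sql
-- Hypothetical rollup across ALL objects, maintained alongside the
-- per-object writes (e.g. by updating both tables from the app).
-- 'day' is a coarse partition bucket; 'hr' is a clustering column,
-- so hours within a day can be range-sliced.
create table measurements_all (
    day    text,       -- e.g. '2013-01-15'
    hr     timestamp,
    metric text,
    succ   int,
    fail   int,
    primary key (day, hr, metric)
);

-- An 'overall' query then reads one partition per day instead of
-- scanning 300k object rows:
-- select * from measurements_all
--   where day = '2013-01-15' and hr >= '2013-01-15 08:00:00';
```

If many writers increment the same cells concurrently, Cassandra's counter column type may be a better fit than plain ints updated via read-modify-write, at the cost of the restrictions counter tables carry.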
If you know you will typically be restricting to a subset of time, and the set of objects present within each hour may be sparse, you might consider reversing the index order and making time the first dimension. That way you are picking out columns from a restricted set of rows; you will still need a multi-get, but if the objects you query are common, the number of rows may be smaller.
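A sketch of that reversed ordering (table name is mine), with the hour as the partition key and objid as a clustering column:

```sql
-- Hypothetical time-first layout: one partition per hour, objects as
-- clustering columns. A time slice touches few partitions; a restricted
-- object set is picked out of each hour's row.
create table measurements_by_time (
    hr     timestamp,
    objid  int,
    metric text,
    succ   int,
    fail   int,
    primary key (hr, objid, metric)
);

-- All objects for one hour:
-- select * from measurements_by_time where hr = '2013-01-15 10:00:00';
-- A restricted object set within that hour:
-- select * from measurements_by_time
--   where hr = '2013-01-15 10:00:00' and objid in (1, 2, 3);
```

Note the trade-off: at 300k objects times 30 metrics, each hourly partition holds roughly 9M clustered rows, which is wide; whether that is acceptable depends on your Cassandra version and hardware.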
If you typically query/aggregate at different granularities of time, you could also store duplicate data at higher granularities of time (per day, week, month, etc.) to speed up queries over larger time scales. De-normalization is your friend in Cassandra!
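For example, a per-day rollup alongside the hourly table (names hypothetical, fed either at write time or by a periodic aggregation job) lets a month-long query read about 30 values per object per metric instead of about 720:

```sql
-- Hypothetical daily rollup, denormalized from the hourly data.
create table measurements_daily (
    objid  int,
    day    timestamp,   -- truncated to midnight
    metric text,
    succ   int,
    fail   int,
    primary key (objid, day, metric)
);
```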
It's also possible to maintain indices in both orderings and choose which one to use based on the type of query being performed.