MongoDB MapReduce: is there an Aggregation alternative?
I have a collection of documents using the following schema (some members redacted):
{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [ 2, 3, 5 ],
            "activity" : [ 4, 4, 3 ]
        }
    },
    "media" : [
        ObjectId("537ea185df872bb71e4df270"),
        ObjectId("537ea185df872bb71e4df275"),
        ObjectId("537ea185df872bb71e4df272")
    ]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like, however, to accomplish this with the Aggregation pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously and end up with, for example, the following three documents based on the previous example:
[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 2, "activity" : 4 } },
        "media" : ObjectId("537ea185df872bb71e4df270")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 3, "activity" : 4 } },
        "media" : ObjectId("537ea185df872bb71e4df275")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 5, "activity" : 3 } },
        "media" : ObjectId("537ea185df872bb71e4df272")
    }
]
Is there a way to accomplish this?
The current aggregation framework does not allow this. Being able to unwind multiple arrays that you know are the same size, creating a document with the ith value of each, would be a useful feature.
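To make the requested behaviour concrete, here is a minimal client-side sketch in plain JavaScript. It is not an aggregation stage: the helper name `unwindTogether` is hypothetical, plain strings stand in for the ObjectId values, and it assumes the three arrays always have the same length.

```javascript
// Hypothetical helper: emit one document per index i, pairing the i-th
// media entry with the i-th positivity and activity ratings.
// Assumes all three arrays have the same length.
function unwindTogether(doc) {
  var ratings = doc.answers.ratings;
  return doc.media.map(function (mediaId, i) {
    return {
      _id: doc._id,
      answers: {
        ratings: {
          positivity: ratings.positivity[i],
          activity: ratings.activity[i]
        }
      },
      media: mediaId
    };
  });
}

// Strings replace ObjectId values so the sketch runs outside the mongo shell.
var sample = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: [
    "537ea185df872bb71e4df270",
    "537ea185df872bb71e4df275",
    "537ea185df872bb71e4df272"
  ]
};

var unwound = unwindTogether(sample); // three documents, one per media entry
```

This is the pairing the question asks for; the rest of this answer approximates it inside the pipeline.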
If you want to use the aggregation framework, you need to change your schema a little. For example, take the following document schema:
{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [ {k:1, v:2}, {k:2, v:3}, {k:3, v:5} ],
            "activity" : [ {k:1, v:4}, {k:2, v:4}, {k:3, v:3} ]
        }
    },
    "media" : [
        {k:1, v:ObjectId("537ea185df872bb71e4df270")},
        {k:2, v:ObjectId("537ea185df872bb71e4df275")},
        {k:3, v:ObjectId("537ea185df872bb71e4df272")}
    ]
}
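One possible way to derive this shape from the original documents is sketched below in plain JavaScript. The helper names `addKeys` and `toKeyedSchema` are hypothetical, strings stand in for ObjectId values, and only the fields shown in the question are copied, so any redacted members would need to be carried over separately.

```javascript
// Hypothetical helper: wrap each array element as {k: position, v: value},
// using 1-based keys to match the example schema.
function addKeys(arr) {
  return arr.map(function (value, i) {
    return { k: i + 1, v: value };
  });
}

// Hypothetical helper: rebuild a document in the keyed schema.
// Copies only _id, the two ratings arrays, and media.
function toKeyedSchema(doc) {
  return {
    _id: doc._id,
    answers: {
      ratings: {
        positivity: addKeys(doc.answers.ratings.positivity),
        activity: addKeys(doc.answers.ratings.activity)
      }
    },
    media: addKeys(doc.media)
  };
}

// Strings replace ObjectId values so the sketch runs outside the mongo shell.
var keyed = toKeyedSchema({
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: [
    "537ea185df872bb71e4df270",
    "537ea185df872bb71e4df275",
    "537ea185df872bb71e4df272"
  ]
});
```

In the mongo shell, a function like this could be applied to each document while iterating a cursor with `forEach`, before writing the result back.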
With this schema, each array element carries its own index (k) alongside its value (v). After that, it's just a matter of unwinding all of the arrays and matching on the key.
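db.test.aggregate([
    // Unwind all three arrays; this yields every combination of their elements.
    {$unwind: "$media"},
    {$unwind: "$answers.ratings.positivity"},
    {$unwind: "$answers.ratings.activity"},
    // Flag the combinations in which all three keys agree.
    {$project: {
        "media": 1,
        "answers.ratings.positivity": 1,
        "answers.ratings.activity": 1,
        include: {$and: [
            {$eq: ["$media.k", "$answers.ratings.positivity.k"]},
            {$eq: ["$media.k", "$answers.ratings.activity.k"]}
        ]}
    }},
    // Keep only the matching combinations.
    {$match: {include: true}}
])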
And the output is:
[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : { "k" : 1, "v" : 2 }, "activity" : { "k" : 1, "v" : 4 } } },
        "media" : { "k" : 1, "v" : ObjectId("537ea185df872bb71e4df270") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : { "k" : 2, "v" : 3 }, "activity" : { "k" : 2, "v" : 4 } } },
        "media" : { "k" : 2, "v" : ObjectId("537ea185df872bb71e4df275") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : { "k" : 3, "v" : 5 }, "activity" : { "k" : 3, "v" : 3 } } },
        "media" : { "k" : 3, "v" : ObjectId("537ea185df872bb71e4df272") },
        "include" : true
    }
]
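Doing this creates a lot of document overhead and may be slower than your current MapReduce implementation. You would need to run tests to check this. The number of intermediate documents grows cubically with the size of the three arrays, because each $unwind multiplies the document count by the length of its array. This should be kept in mind as well.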
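The cubic growth can be checked with a small counting sketch in plain JavaScript. It only simulates the three unwinds and the key match for a single document whose arrays all have length n; the function name `countUnwoundDocs` is hypothetical and nothing here talks to MongoDB.

```javascript
// Hypothetical helper: count the documents produced by three successive
// unwinds over arrays of length n, and how many survive the key match.
function countUnwoundDocs(n) {
  var intermediate = 0;
  var kept = 0;
  for (var m = 1; m <= n; m++) {          // media key
    for (var p = 1; p <= n; p++) {        // positivity key
      for (var a = 1; a <= n; a++) {      // activity key
        intermediate++;
        if (m === p && m === a) {
          kept++;
        }
      }
    }
  }
  return { intermediate: intermediate, kept: kept };
}

var counts = countUnwoundDocs(3);
// counts.intermediate is 27 and counts.kept is 3
```

For the three-element arrays in the example, 27 intermediate documents are built to keep 3; with arrays of length 10 it would be 1000 to keep 10.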
mongodb mapreduce aggregation-framework