Wednesday, 15 April 2015

MongoDB MapReduce--is there an Aggregation alternative?



I've got a collection of documents using the following schema (some members redacted):

{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [ 2, 3, 5 ],
            "activity" : [ 4, 4, 3 ]
        }
    },
    "media" : [
        ObjectId("537ea185df872bb71e4df270"),
        ObjectId("537ea185df872bb71e4df275"),
        ObjectId("537ea185df872bb71e4df272")
    ]
}

In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true of the activity ratings. I need to calculate statistics on the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like, however, to accomplish it with the aggregation pipeline.

Ideally, I'd $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously and end up with, for example, the following three documents based on the previous example:

[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 2, "activity" : 4 } },
        "media" : ObjectId("537ea185df872bb71e4df270")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 3, "activity" : 4 } },
        "media" : ObjectId("537ea185df872bb71e4df275")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : { "ratings" : { "positivity" : 5, "activity" : 3 } },
        "media" : ObjectId("537ea185df872bb71e4df272")
    }
]

Is there a way to accomplish this?

The current aggregation framework does not allow this. Being able to unwind multiple arrays that you know are the same size, creating one document from the ith value of each, would be a good feature.
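To make the wished-for behaviour concrete, here is a minimal sketch in plain JavaScript of what a lockstep unwind would produce. It is not an aggregation operator: `zipUnwind` is a hypothetical helper name, and the ObjectIds are written as plain strings so the sketch runs without a MongoDB driver.

```javascript
// Hypothetical helper: "unwind" three same-length arrays in lockstep,
// producing one output document per array position.
function zipUnwind(doc) {
  const media = doc.media;
  const positivity = doc.answers.ratings.positivity;
  const activity = doc.answers.ratings.activity;

  // The approach only makes sense when the arrays line up one-to-one.
  if (positivity.length !== media.length || activity.length !== media.length) {
    throw new Error("arrays must have the same length");
  }

  return media.map((m, i) => ({
    _id: doc._id,
    answers: { ratings: { positivity: positivity[i], activity: activity[i] } },
    media: m,
  }));
}

// Sample document from the question; ids are plain strings (an assumption).
const sampleDoc = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: [
    "537ea185df872bb71e4df270",
    "537ea185df872bb71e4df275",
    "537ea185df872bb71e4df272",
  ],
};

const unwound = zipUnwind(sampleDoc);
console.log(JSON.stringify(unwound, null, 2));
```

This is the shape the question asks for: three documents, each pairing one media id with its own positivity and activity rating.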

If you want to use the aggregation framework, you need to change your schema a little. For example, take the following document schema:

{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [ {k:1, v:2}, {k:2, v:3}, {k:3, v:5} ],
            "activity" : [ {k:1, v:4}, {k:2, v:4}, {k:3, v:3} ]
        }
    },
    "media" : [
        {k:1, v:ObjectId("537ea185df872bb71e4df270")},
        {k:2, v:ObjectId("537ea185df872bb71e4df275")},
        {k:3, v:ObjectId("537ea185df872bb71e4df272")}
    ]
}

By doing this, you are adding an index to each object inside the arrays. After that, it's just a matter of unwinding all the arrays and matching on the key.
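If the existing documents need converting, the keyed form could be derived along these lines. This is only a sketch in plain JavaScript: `withKeys` and `toKeyedSchema` are hypothetical names, the keys start at 1 to match the schema above, and ObjectIds are stood in for by plain strings.

```javascript
// Hypothetical helper: wrap each element as { k: position, v: value },
// with k starting at 1 to match the keyed schema.
function withKeys(values) {
  return values.map((v, i) => ({ k: i + 1, v: v }));
}

// Hypothetical helper: rewrite one document from the original schema
// into the keyed schema, leaving the input document untouched.
function toKeyedSchema(doc) {
  return {
    _id: doc._id,
    answers: {
      ratings: {
        positivity: withKeys(doc.answers.ratings.positivity),
        activity: withKeys(doc.answers.ratings.activity),
      },
    },
    media: withKeys(doc.media),
  };
}

// Document in the original schema; ids are plain strings (an assumption).
const original = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: [
    "537ea185df872bb71e4df270",
    "537ea185df872bb71e4df275",
    "537ea185df872bb71e4df272",
  ],
};

const keyed = toKeyedSchema(original);
console.log(JSON.stringify(keyed, null, 2));
```

How the converted documents get written back (a one-off migration script, or doing it at insert time) is left open here.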

db.test.aggregate([
    {$unwind: "$media"},
    {$unwind: "$answers.ratings.positivity"},
    {$unwind: "$answers.ratings.activity"},
    {$project: {
        "media": 1,
        "answers.ratings.positivity": 1,
        "answers.ratings.activity": 1,
        include: {$and: [
            {$eq: ["$media.k", "$answers.ratings.positivity.k"]},
            {$eq: ["$media.k", "$answers.ratings.activity.k"]}
        ]}
    }},
    {$match: {include: true}}
])

And the output is:

[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 1, "v" : 2 },
                "activity" : { "k" : 1, "v" : 4 }
            }
        },
        "media" : { "k" : 1, "v" : ObjectId("537ea185df872bb71e4df270") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 2, "v" : 3 },
                "activity" : { "k" : 2, "v" : 4 }
            }
        },
        "media" : { "k" : 2, "v" : ObjectId("537ea185df872bb71e4df275") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 3, "v" : 5 },
                "activity" : { "k" : 3, "v" : 3 }
            }
        },
        "media" : { "k" : 3, "v" : ObjectId("537ea185df872bb71e4df272") },
        "include" : true
    }
]

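Doing this creates a lot of document overhead and may be slower than your current MapReduce implementation. You would need to run tests to check this. The work required grows cubically with the size of the three arrays: unwinding three arrays of n elements each produces n × n × n intermediate documents per input document, of which only n survive the $match. This should be kept in mind as well.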
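To get a feel for that growth, the following plain-JavaScript sketch counts how many combinations three successive unwinds generate for arrays of length n, and how many pass the key-equality filter. It only simulates the counting and does not touch MongoDB; `countIntermediate` is a hypothetical name.

```javascript
// Hypothetical helper: simulate three chained unwinds over arrays whose
// elements carry keys 1..n, then count the combinations where all keys match.
function countIntermediate(n) {
  const keys = Array.from({ length: n }, (_, i) => i + 1);
  let produced = 0; // combinations emitted by the three unwinds
  let kept = 0;     // combinations that pass the key-equality filter

  for (const mediaKey of keys) {
    for (const positivityKey of keys) {
      for (const activityKey of keys) {
        produced += 1;
        if (mediaKey === positivityKey && mediaKey === activityKey) {
          kept += 1;
        }
      }
    }
  }
  return { produced, kept };
}

const small = countIntermediate(3);   // the example document
const larger = countIntermediate(10); // a longer array, for comparison
console.log(small, larger);
```

For the three-element example that is 27 intermediate documents to keep 3; at ten elements it is 1000 to keep 10, per input document.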

mongodb mapreduce aggregation-framework
