Breeding: MongoDB MapReduce--is there an Aggregation alternative?

Wednesday, 15 April 2015


I have a collection of documents using the following schema (some members redacted):

{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [ 2, 3, 5 ],
            "activity" : [ 4, 4, 3 ]
        }
    },
    "media" : [
        ObjectId("537ea185df872bb71e4df270"),
        ObjectId("537ea185df872bb71e4df275"),
        ObjectId("537ea185df872bb71e4df272")
    ]
}

In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like, however, to accomplish it with the aggregation pipeline.
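For reference, the per-media calculation that the MapReduce job performs can be sketched in plain JavaScript, outside the database. This is a minimal sketch, assuming the statistic wanted is a count and mean per media object, and that ObjectIds are represented as hex strings; `statsByMedia` is a hypothetical name, not part of any MongoDB API. The loop plays the role of the map step (one value per media position), the accumulator plays the role of reduce, and the last loop plays the role of finalize.

```javascript
// Hypothetical helper modelling map / reduce / finalize in memory.
// Assumes ObjectIds are plain hex strings and the wanted statistic is the mean.
function statsByMedia(docs) {
  const acc = new Map(); // media id -> running totals

  for (const doc of docs) {
    const { positivity, activity } = doc.answers.ratings;
    doc.media.forEach((mediaId, i) => {
      // "map": the i-th ratings belong to the i-th media entry
      const value = { count: 1, positivity: positivity[i], activity: activity[i] };
      // "reduce": fold the value into the totals for this media id
      const prev = acc.get(mediaId) || { count: 0, positivity: 0, activity: 0 };
      acc.set(mediaId, {
        count: prev.count + value.count,
        positivity: prev.positivity + value.positivity,
        activity: prev.activity + value.activity,
      });
    });
  }

  // "finalize": turn totals into means
  const result = {};
  for (const [mediaId, t] of acc) {
    result[mediaId] = {
      count: t.count,
      avgPositivity: t.positivity / t.count,
      avgActivity: t.activity / t.count,
    };
  }
  return result;
}

const docs = [
  {
    _id: "539f41a95d1887b57ab78bea",
    answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
    media: ["537ea185df872bb71e4df270", "537ea185df872bb71e4df275", "537ea185df872bb71e4df272"],
  },
];

console.log(statsByMedia(docs));
```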

Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously and end up with, for example, the following three documents based on the previous example:

[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : { "positivity" : 2, "activity" : 4 }
        },
        "media" : ObjectId("537ea185df872bb71e4df270")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : { "positivity" : 3, "activity" : 4 }
        },
        "media" : ObjectId("537ea185df872bb71e4df275")
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : { "positivity" : 5, "activity" : 3 }
        },
        "media" : ObjectId("537ea185df872bb71e4df272")
    }
]
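What is being asked for is essentially a "zip" of parallel arrays followed by an unwind. Outside the database, that transformation can be sketched in a few lines; `zipUnwind` is a hypothetical name, ObjectIds are shown as hex strings, and the sketch assumes the three arrays have the same length (it throws otherwise).

```javascript
// Hypothetical helper: unwind three parallel arrays at once, producing one
// document per index i that holds the i-th element of each array.
function zipUnwind(doc) {
  const { positivity, activity } = doc.answers.ratings;
  const media = doc.media;
  if (positivity.length !== media.length || activity.length !== media.length) {
    throw new Error("arrays must have the same length");
  }
  return media.map((mediaId, i) => ({
    _id: doc._id,
    answers: { ratings: { positivity: positivity[i], activity: activity[i] } },
    media: mediaId,
  }));
}

const example = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: ["537ea185df872bb71e4df270", "537ea185df872bb71e4df275", "537ea185df872bb71e4df272"],
};

console.log(JSON.stringify(zipUnwind(example), null, 2));
```

Run on the example document, this yields the three documents listed above, one per array position.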

Is there a way to accomplish this?

The current aggregation framework does not allow this. Being able to unwind multiple arrays that you know are the same size, creating one document with the ith value of each array, would be a useful feature.

If you want to use the aggregation framework, you will need to change your schema a little. For example, take the following document schema:

{
    "_id" : ObjectId("539f41a95d1887b57ab78bea"),
    "answers" : {
        "ratings" : {
            "positivity" : [
                {k:1, v:2},
                {k:2, v:3},
                {k:3, v:5}
            ],
            "activity" : [
                {k:1, v:4},
                {k:2, v:4},
                {k:3, v:3}
            ]
        }
    },
    "media" : [
        {k:1, v:ObjectId("537ea185df872bb71e4df270")},
        {k:2, v:ObjectId("537ea185df872bb71e4df275")},
        {k:3, v:ObjectId("537ea185df872bb71e4df272")}
    ]
}

By doing this, you add an index (k) to each object within the arrays. After that, it's just a matter of unwinding the arrays and matching on the key.
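Existing documents can be converted to the keyed schema with a small transformation. A minimal sketch in plain JavaScript, assuming keys start at 1 as in the schema above; `withKeys` and `toKeyed` are hypothetical names. It returns a new document and copies any other (redacted) members through unchanged. Writing the converted documents back to the collection is left out, since that depends on your driver and setup.

```javascript
// Hypothetical migration helper: store each element's position as k (1-based)
// next to its value v, so the position survives $unwind.
function withKeys(arr) {
  return arr.map((v, i) => ({ k: i + 1, v }));
}

// Returns a new document; other members are copied through unchanged.
function toKeyed(doc) {
  const ratings = doc.answers.ratings;
  return {
    ...doc,
    answers: {
      ...doc.answers,
      ratings: {
        ...ratings,
        positivity: withKeys(ratings.positivity),
        activity: withKeys(ratings.activity),
      },
    },
    media: withKeys(doc.media),
  };
}

const original = {
  _id: "539f41a95d1887b57ab78bea",
  answers: { ratings: { positivity: [2, 3, 5], activity: [4, 4, 3] } },
  media: ["537ea185df872bb71e4df270", "537ea185df872bb71e4df275", "537ea185df872bb71e4df272"],
};

console.log(JSON.stringify(toKeyed(original), null, 2));
```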

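db.test.aggregate([
    { $unwind: "$media" },
    { $unwind: "$answers.ratings.positivity" },
    { $unwind: "$answers.ratings.activity" },
    { $project: {
        "media": 1,
        "answers.ratings.positivity": 1,
        "answers.ratings.activity": 1,
        // true only when all three unwound elements came from the same position
        include: { $and: [
            { $eq: ["$media.k", "$answers.ratings.positivity.k"] },
            { $eq: ["$media.k", "$answers.ratings.activity.k"] }
        ] }
    } },
    // drop the mismatched combinations produced by the three $unwind stages
    { $match: { include: true } }
])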

And the output is:

[
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 1, "v" : 2 },
                "activity" : { "k" : 1, "v" : 4 }
            }
        },
        "media" : { "k" : 1, "v" : ObjectId("537ea185df872bb71e4df270") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 2, "v" : 3 },
                "activity" : { "k" : 2, "v" : 4 }
            }
        },
        "media" : { "k" : 2, "v" : ObjectId("537ea185df872bb71e4df275") },
        "include" : true
    },
    {
        "_id" : ObjectId("539f41a95d1887b57ab78bea"),
        "answers" : {
            "ratings" : {
                "positivity" : { "k" : 3, "v" : 5 },
                "activity" : { "k" : 3, "v" : 3 }
            }
        },
        "media" : { "k" : 3, "v" : ObjectId("537ea185df872bb71e4df272") },
        "include" : true
    }
]

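Doing this creates a lot of document overhead and may be slower than your current MapReduce implementation; you would need to run tests to check. Keep in mind that the computation grows cubically with the size of the three arrays: three arrays of n elements produce n × n × n documents after the three $unwind stages, and the $match then keeps only n of them.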
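To make the cubic cost concrete, the pipeline can be modelled in memory on one keyed document: three nested loops stand in for the three $unwind stages, and a key comparison stands in for the $project/$match pair. This is a hypothetical model for illustration, not how MongoDB executes the pipeline; `simulatePipeline` and `makeKeyedDoc` are invented names, and the test data is synthetic.

```javascript
// Hypothetical in-memory model of the pipeline above (not MongoDB's engine).
function simulatePipeline(doc) {
  const { positivity, activity } = doc.answers.ratings;
  const unwound = [];
  for (const m of doc.media) {          // $unwind: "$media"
    for (const p of positivity) {       // $unwind: "$answers.ratings.positivity"
      for (const a of activity) {       // $unwind: "$answers.ratings.activity"
        unwound.push({
          _id: doc._id,
          answers: { ratings: { positivity: p, activity: a } },
          media: m,
          include: m.k === p.k && m.k === a.k, // $project: include
        });
      }
    }
  }
  // $match: { include: true }
  return { unwoundCount: unwound.length, matched: unwound.filter((d) => d.include) };
}

// Synthetic test data: a keyed document whose three arrays each have n entries.
function makeKeyedDoc(n) {
  const keyed = (valueAt) => Array.from({ length: n }, (_, i) => ({ k: i + 1, v: valueAt(i) }));
  return {
    _id: "example",
    answers: { ratings: { positivity: keyed((i) => (i % 5) + 1), activity: keyed((i) => (i % 5) + 1) } },
    media: keyed((i) => "media" + i),
  };
}

for (const n of [3, 10, 30]) {
  const { unwoundCount, matched } = simulatePipeline(makeKeyedDoc(n));
  console.log(`n = ${n}: ${unwoundCount} documents after $unwind, ${matched.length} after $match`);
}
```

For n = 3, 10 and 30 this reports 27, 1000 and 27000 intermediate documents, of which only 3, 10 and 30 survive the match.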
