MongoDB Map Reduce
2026-06-19 | MongoDB
Map-Reduce is a computational model. Simply put, a large batch of work (data) is broken down into smaller tasks that are executed independently (MAP), and their results are then merged into a final outcome (REDUCE).
The Map-Reduce provided by MongoDB is very flexible and quite practical for large-scale data analysis.
### MapReduce Command
Here is the basic syntax for MapReduce:

    > db.collection.mapReduce(
        function() { emit(key, value); },                  // map function
        function(key, values) { return reduceFunction },   // reduce function
        {
          out: collection,
          query: document,
          sort: document,
          limit: number
        }
      )
To use MapReduce, you need to implement two functions: the Map function and the Reduce function. MongoDB calls the Map function once for each record in the collection, with `this` referring to the current document. The Map function calls emit(key, value), and the emitted values are then grouped by key and passed to the Reduce function for processing.
The Map function must call emit(key, value) to return key-value pairs.
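The flow described above can be sketched in plain JavaScript, without a database. This is only an illustration under stated assumptions: the sample documents are made up, and `emit` is passed to the map function as an argument here, whereas MongoDB provides it to the map function directly.

```javascript
// Sample documents standing in for a collection (made-up data).
const docs = [
  { user_name: "mark", status: "active" },
  { user_name: "tutorial", status: "active" },
  { user_name: "mark", status: "active" },
];

// Map function: `this` is the current document, as in MongoDB.
// Assumption of this sketch: emit arrives as an argument.
function map(emit) {
  emit(this.user_name, 1);
}

// Reduce function: collapse the values array for one key into a single value.
function reduce(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Step 1: call map once per document and collect the emitted pairs.
const emitted = [];
for (const doc of docs) {
  map.call(doc, (key, value) => emitted.push([key, value]));
}

// Step 2: group the emitted values by key.
const grouped = {};
for (const [key, value] of emitted) {
  (grouped[key] = grouped[key] || []).push(value);
}

// Step 3: hand each key and its values array to reduce.
// (Simplification: MongoDB skips this call for keys with a single value.)
const results = Object.keys(grouped).map((key) => ({
  _id: key,
  value: reduce(key, grouped[key]),
}));

console.log(emitted); // [ [ 'mark', 1 ], [ 'tutorial', 1 ], [ 'mark', 1 ] ]
console.log(grouped); // { mark: [ 1, 1 ], tutorial: [ 1 ] }
console.log(results); // [ { _id: 'mark', value: 2 }, { _id: 'tutorial', value: 1 } ]
```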
Parameter Description:
* **map**: The mapping function (generates a sequence of key-value pairs as arguments for the reduce function).
* **reduce**: The aggregation function. For each key, it receives the array of values emitted for that key and converts it into a single value.
* **out**: The collection where the statistical results are stored. Current MongoDB releases require this option: pass a collection name, or `{ inline: 1 }` to return the results directly instead of storing them. (In very old releases, if it was not specified, a temporary collection was used, which was automatically deleted after the client disconnected.)
* **query**: A filter condition. Only documents that meet the condition will have the map function called. (query, limit, and sort can be combined freely).
* **sort**: A sort parameter used in conjunction with limit (it sorts documents before sending them to the map function), which can optimize the grouping mechanism.
* **limit**: The upper limit on the number of documents sent to the map function (if there is no limit, using sort alone is not very useful).
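One practical consequence of the **reduce** description is that MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values. The sketch below, in plain JavaScript with made-up values, checks whether a reduce function gives the same answer when applied in batches. `sumReduce` and `countReduce` are hypothetical names used only for illustration.

```javascript
// Sum reducer, written with plain JavaScript instead of the shell's Array.sum.
function sumReduce(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// A reducer that counts array entries instead of summing them.
function countReduce(key, values) {
  return values.length;
}

const values = [1, 1, 1, 1];

// Reducing everything at once...
const allAtOnce = sumReduce("mark", values);

// ...should match reducing two batches and then reducing the partial results.
const reReduced = sumReduce("mark", [
  sumReduce("mark", values.slice(0, 2)),
  sumReduce("mark", values.slice(2)),
]);

// The counting reducer breaks under the same treatment:
// the second pass sees two partial results and returns 2.
const brokenReReduced = countReduce("mark", [
  countReduce("mark", values.slice(0, 2)),
  countReduce("mark", values.slice(2)),
]);

console.log(allAtOnce, reReduced); // 4 4
console.log(brokenReReduced);      // 2
```

This is why the counting examples in this article emit `1` per document and sum the values, rather than returning the length of the values array.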
The following example searches for data with status:"A" in the `orders` collection, groups it by `cust_id`, and calculates the sum of `amount`.
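A minimal sketch of such an `orders` computation, written in plain JavaScript so it runs without a database. The sample documents are made up, `simulate` is a hypothetical stand-in for what the server does, and `emit` is passed to the map function as an argument, which differs from MongoDB.

```javascript
// Sample documents standing in for the `orders` collection (made-up data).
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Map: emit the customer id as the key and the order amount as the value.
function mapOrder(emit) {
  emit(this.cust_id, this.amount);
}

// Reduce: sum all amounts emitted for one customer.
function reduceAmounts(key, values) {
  return values.reduce((total, amount) => total + amount, 0);
}

// Hypothetical stand-in for the server: query -> map -> group -> reduce.
// Only simple equality matching on query fields is supported here.
function simulate(docs, map, reduce, options) {
  const matches = docs.filter((doc) =>
    Object.entries(options.query).every(([field, expected]) => doc[field] === expected)
  );

  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  matches.forEach((doc) => map.call(doc, emit));

  // Keys with a single value are passed through without calling reduce.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const orderTotals = simulate(orders, mapOrder, reduceAmounts, {
  query: { status: "A" },
});

console.log(orderTotals);
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```

In the mongo shell, the corresponding call would look roughly like `db.orders.mapReduce(map, reduce, { query: { status: "A" }, out: "order_totals" })`, where `order_totals` is an assumed name for the output collection and the map function calls `emit` directly.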
* * *
## Using MapReduce
Consider the following document structure storing user posts. The document stores the user's `user_name` and the post's `status` field:

    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "mark", "status": "active" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "mark", "status": "active" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "mark", "status": "active" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "mark", "status": "active" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "mark", "status": "disabled" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "tutorial", "status": "disabled" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "tutorial", "status": "disabled" })
    WriteResult({ "nInserted" : 1 })
    > db.posts.insert({ "post_text": ", the most comprehensive technical documentation.", "user_name": "tutorial", "status": "active" })
    WriteResult({ "nInserted" : 1 })
Now, we will use the mapReduce function on the `posts` collection to select published posts (status:"active"), group them by `user_name`, and calculate the number of posts per user:

    > db.posts.mapReduce(
        function() { emit(this.user_name, 1); },
        function(key, values) { return Array.sum(values) },
        {
          query: { status: "active" },
          out: "post_total"
        }
      )
The output of the above mapReduce is:

    {
      "result" : "post_total",
      "timeMillis" : 23,
      "counts" : {
        "input" : 5,
        "emit" : 5,
        "reduce" : 1,
        "output" : 2
      },
      "ok" : 1
    }
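The result indicates that 5 documents met the query condition (status:"active"). The map function emitted 5 key-value pairs, which were grouped by key into 2 groups, one per `user_name`. The reduce function was called only once, because only `mark` has more than one emitted value; a key with a single value is written to the output without calling reduce.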
Description of the output fields:
* result: The name of the collection storing the results (here `post_total`, as set by the `out` option).
* timeMillis: The time taken to execute, in milliseconds.
* input: The number of documents meeting the condition that were sent to the map function.
* emit: The number of times emit was called in the map function, which is the total number of key-value pairs produced.
* reduce: The number of times the reduce function was called.
* output: The number of documents in the result collection **(counts are very helpful for debugging)**.
* ok: Whether it was successful. 1 indicates success.
* err: If it failed, the reason for the failure may be here. However, from experience, the reason is often vague and not very useful.
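To see where the `counts` values come from, the following plain JavaScript sketch replays the example by hand on the eight posts inserted earlier (with `post_text` omitted for brevity). It is a simplification: on large inputs MongoDB may call reduce in several batches, so the real `reduce` count can be higher than this sketch would suggest.

```javascript
// The eight posts from the example, without post_text.
const posts = [
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "disabled" },
  { user_name: "tutorial", status: "disabled" },
  { user_name: "tutorial", status: "disabled" },
  { user_name: "tutorial", status: "active" },
];

const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
const groups = new Map(); // key -> array of emitted values

for (const post of posts) {
  if (post.status !== "active") continue; // query: { status: "active" }
  counts.input += 1;

  // map: emit(this.user_name, 1)
  if (!groups.has(post.user_name)) groups.set(post.user_name, []);
  groups.get(post.user_name).push(1);
  counts.emit += 1;
}

const postTotal = [];
for (const [key, values] of groups) {
  let value = values[0];
  if (values.length > 1) {
    // reduce is only needed when a key has more than one value
    value = values.reduce((total, v) => total + v, 0);
    counts.reduce += 1;
  }
  postTotal.push({ _id: key, value });
  counts.output += 1;
}

console.log(counts);    // { input: 5, emit: 5, reduce: 1, output: 2 }
console.log(postTotal); // [ { _id: 'mark', value: 4 }, { _id: 'tutorial', value: 1 } ]
```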
Use the `find` method to view the results stored by mapReduce:

    > var map = function() { emit(this.user_name, 1); }
    > var reduce = function(key, values) { return Array.sum(values) }
    > var options = { query: { status: "active" }, out: "post_total" }
    > db.posts.mapReduce(map, reduce, options)
    { "result" : "post_total", "ok" : 1 }
    > db.post_total.find();
The above query displays the following results:

    { "_id" : "mark", "value" : 4 }
    { "_id" : "tutorial", "value" : 1 }
In a similar way, MapReduce can be used to build large, complex aggregation queries.
The Map and Reduce functions can be implemented using JavaScript, making MapReduce very flexible and powerful to use.