MongoDB's aggregation (aggregate) is primarily used for processing data (such as calculating averages, sums, etc.) and returning the computed results.
It is somewhat similar to the count(*) statement in SQL.
aggregate() Method
In MongoDB, the aggregation method uses aggregate().
Syntax
The basic syntax format of the aggregate() method is as follows:
db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Example
The data in the collection is as follows:
{
_id: ObjectId(7df78ad8902c),
title: 'MongoDB Overview',
description: 'MongoDB is no sql database',
by_user: 'example.com',
url: '
tags: ['mongodb', 'database', 'NoSQL'],
likes: 100
},
{
_id: ObjectId(7df78ad8902d),
title: 'NoSQL Overview',
description: 'No sql database is very fast',
by_user: 'example.com',
url: '
tags: ['mongodb', 'database', 'NoSQL'],
likes: 10
},
{
_id: ObjectId(7df78ad8902e),
title: 'Neo4j Overview',
description: 'Neo4j is no sql database',
by_user: 'Neo4j',
url: 'http://www.neo4j.com',
tags: ['neo4j', 'database', 'NoSQL'],
likes: 750
}
Now, we use the above collection to calculate the number of articles written by each author. The result using aggregate() is as follows:
db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{ "result" : [ { "_id" : "example.com", "num_tutorial" : 2 }, { "_id" : "Neo4j", "num_tutorial" : 1 } ], "ok" : 1 }
The above example is similar to the SQL statement:
select by_user, count(*) from mycol group by by_user
In the example above, we group the data by the by_user field and calculate the sum of identical values in the by_user field.
The following table shows some aggregation expressions:
| Expression | Description | Example |
|---|---|---|
$sum |
Calculates the sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) |
$avg |
Calculates the average. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}]) |
$min |
Gets the minimum value of the corresponding field from all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}]) |
$max |
Gets the maximum value of the corresponding field from all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}]) |
$push |
Adds a value to an array. Does not check for duplicate values. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}]) |
$addToSet |
Adds a value to an array, checking for duplicates. If the same value already exists in the array, it is not added. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}]) |
$first |
Gets the first document's data based on the resource document's sort order. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}]) |
$last |
Gets the last document's data based on the resource document's sort order. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}]) |
The Concept of Pipelines
In Unix and Linux, pipelines are generally used to pass the output of one command as the input to the next command.
MongoDB's aggregation pipeline processes MongoDB documents through a series of stages. The result of one stage is passed to the next. Pipeline operations can be repeated.
Expressions: Process input documents and output results. Expressions are stateless; they can only process documents within the current aggregation pipeline stage and cannot process other documents.
Here, we introduce some commonly used operations in the aggregation framework:
$project: Modifies the structure of the input document. It can be used to rename, add, or remove fields, and also to create computed results or nested documents.$match: Used to filter data, outputting only documents that meet the criteria.$matchuses MongoDB's standard query operations.$limit: Used to limit the number of documents returned by the MongoDB aggregation pipeline.$skip: Skips a specified number of documents in the aggregation pipeline and returns the remaining documents.$unwind: Splits an array field in a document into multiple documents, each containing one value from the array.$group: Groups documents in a collection, which can be used for statistical results.$sort: Sorts the input documents before outputting them.$geoNear: Outputs ordered documents that are close to a specific geographic location.
Pipeline Operator Examples
1. $project Example
db.article.aggregate( { $project : { title : 1 , author : 1 , }} );
This way, the result will only contain the _id, title, and author fields. By default, the _id field is included. If you want to exclude _id, you can do this:
db.article.aggregate( { $project : { _id : 0 , title : 1 , author : 1 }});
2. $match Example
db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] );
$match is used to get records with a score greater than 70 and less than or equal to 90, and then passes the matching records to the next stage, the $group pipeline operator, for processing.
3. $skip Example
db.article.aggregate( { $skip : 5 });
After processing by the $skip pipeline operator, the first five documents are "filtered" out.
YouTip