MongoDB Aggregation

MongoDB provides a powerful aggregation framework that allows us to perform complex query and analysis operations on data. Aggregation operations can help us calculate statistics, group data, sort data, and more.

Basic Concepts

Aggregation Pipeline

MongoDB's aggregation operations use a pipeline model. Documents pass through an ordered series of stages. Each stage transforms its input and hands the result to the next stage, and the output of the last stage is the result of the aggregation.
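
As a rough illustration of the pipeline idea, the sketch below writes a pipeline as an ordinary array of stage objects. The `orders` collection and its `status`, `customerId`, and `amount` fields are hypothetical, and the `db` call only runs inside mongosh.

```javascript
// Hypothetical pipeline: each array element is one stage, applied in order.
const orderTotalsPipeline = [
  { $match: { status: "completed" } },                            // 1. keep completed orders
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }, // 2. total per customer
  { $sort: { total: -1 } }                                        // 3. largest totals first
];

// Every stage object has exactly one key: the stage operator.
const stageNames = orderTotalsPipeline.map((stage) => Object.keys(stage)[0]);

// `db` exists only in mongosh; the guard lets the snippet load elsewhere too.
if (typeof db !== "undefined") {
  db.orders.aggregate(orderTotalsPipeline).forEach((doc) => printjson(doc));
}
```

Keeping the pipeline in a variable makes it easy to inspect, reuse, or pass to `explain()` before running it.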

Common Aggregation Stages

  1. $match: Filters documents, using the same query syntax as find()
  2. $group: Groups documents by a key and computes accumulated values for each group
  3. $sort: Sorts documents
  4. $limit: Limits the number of documents passed to the next stage
  5. $skip: Skips a specified number of documents
  6. $project: Reshapes documents by including, excluding, or computing fields
  7. $unwind: Outputs one document per element of an array field
  8. $lookup: Performs a left outer join with another collection

Basic Aggregation Operations

Using $match and $group Stages

javascript
// Count the number of users for each status
db.users.aggregate([
  { $match: { status: { $in: ["active", "pending", "inactive"] } } },
  { $group: { _id: "$status", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

Using $project Stage

javascript
// Build fullName from firstName and lastName, and return only the listed fields
// Note: $concat returns null if either name field is missing or null
db.users.aggregate([
  {
    $project: {
      fullName: { $concat: ["$firstName", " ", "$lastName"] },
      age: 1,
      email: 1,
      _id: 0
    }
  }
])

Using $unwind Stage

javascript
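// Output one document per element of the tags array
// (by default, documents with a missing, null, or empty tags field are dropped)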
db.users.aggregate([
  { $unwind: "$tags" }
])
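
On its own, $unwind is rarely the end goal. A common follow-up, sketched here under the assumption that `tags` holds plain string values, is to group the unwound documents to count how often each tag occurs. The output field name `count` is an arbitrary choice.

```javascript
// Hypothetical follow-up: count how often each tag occurs across all users.
const tagCountPipeline = [
  { $unwind: "$tags" },                             // one document per tag element
  { $group: { _id: "$tags", count: { $sum: 1 } } }, // count documents per tag value
  { $sort: { count: -1 } }                          // most frequent tags first
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.aggregate(tagCountPipeline).forEach((doc) => printjson(doc));
}
```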

Using $lookup Stage

javascript
// Left outer join: attach each user's matching orders as an array field named orders
// (users without orders get an empty array)
db.users.aggregate([
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "userId",
      as: "orders"
    }
  }
])
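
Because $lookup stores the joined documents in an array, later stages can work on that array directly. The sketch below, which assumes the same `users` and `orders` collections plus an `email` field on users, adds a hypothetical `orderCount` field using $size.

```javascript
// Sketch: join orders onto users, then report how many orders each user has.
const userOrderCountPipeline = [
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "userId",
      as: "orders"
    }
  },
  {
    $project: {
      email: 1,
      orderCount: { $size: "$orders" } // 0 for users with no matching orders
    }
  }
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.aggregate(userOrderCountPipeline).forEach((doc) => printjson(doc));
}
```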

Common Aggregation Operators

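Accumulator Operators

These operators are most often used inside $group, where they compute one value per group.

  1. $sum: Calculate the sum
  2. $avg: Calculate the average
  3. $min: Return the minimum value
  4. $max: Return the maximum value
  5. $push: Collect values into an array, keeping duplicates
  6. $addToSet: Collect values into an array, keeping each distinct value only once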
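
The operators listed above are typically combined in a single $group stage. The following is a minimal sketch, assuming a hypothetical `orders` collection with `userId`, `amount`, and `productId` fields. The output field names are illustrative only.

```javascript
// Sketch: per-user order statistics using all six operators in one $group stage.
const orderStatsPipeline = [
  {
    $group: {
      _id: "$userId",
      orderCount: { $sum: 1 },                      // adds 1 per document
      totalSpent: { $sum: "$amount" },              // sum of a field
      averageOrder: { $avg: "$amount" },
      smallestOrder: { $min: "$amount" },
      largestOrder: { $max: "$amount" },
      allProducts: { $push: "$productId" },         // keeps duplicates
      distinctProducts: { $addToSet: "$productId" } // no duplicates, order not guaranteed
    }
  }
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.orders.aggregate(orderStatsPipeline).forEach((doc) => printjson(doc));
}
```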

String Operators

  1. $concat: Concatenate strings
  2. $toUpper: Convert strings to uppercase
  3. $toLower: Convert strings to lowercase
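  4. $substrCP: Extract a substring by code point (use this or $substrBytes instead of $substr, which has been deprecated since MongoDB 3.4)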
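
A short sketch of these string operators inside $project follows. It assumes the `firstName`, `lastName`, and `email` fields used earlier, and it uses $substrCP in place of $substr, which has been deprecated since MongoDB 3.4. The output field names are hypothetical.

```javascript
// Sketch: format names and normalize email addresses.
const nameFormatPipeline = [
  {
    $project: {
      _id: 0,
      // e.g. "LOVELACE, Ada"
      displayName: { $concat: [{ $toUpper: "$lastName" }, ", ", "$firstName"] },
      emailLower: { $toLower: "$email" },
      // $substrCP takes [string, start index, number of code points]
      initial: { $substrCP: ["$firstName", 0, 1] }
    }
  }
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.aggregate(nameFormatPipeline).forEach((doc) => printjson(doc));
}
```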

Date Operators

  1. $year: Extract the year
  2. $month: Extract the month
  3. $dayOfMonth: Extract the day of the month
  4. $hour: Extract the hour
  5. $minute: Extract the minute
  6. $second: Extract the second
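
Date operators are often used to build a grouping key. The sketch below counts users per calendar month, assuming each user document has a hypothetical `createdAt` field stored as a BSON date. These operators work in UTC unless a timezone is supplied.

```javascript
// Sketch: count sign-ups per year and month.
const signupsPerMonthPipeline = [
  {
    $group: {
      _id: {
        year: { $year: "$createdAt" },
        month: { $month: "$createdAt" } // 1 = January ... 12 = December
      },
      signups: { $sum: 1 }
    }
  },
  { $sort: { "_id.year": 1, "_id.month": 1 } } // chronological order
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.aggregate(signupsPerMonthPipeline).forEach((doc) => printjson(doc));
}
```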

Optimizing Aggregation Operations

Pipeline Order

The order of stages affects performance. In general, place filtering stages such as $match as early as possible, so that later stages process fewer documents.
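
To make the ordering advice concrete, the sketch below builds two pipelines that should return the same total for a single user. The `orders` collection, its fields, and the user id `"u123"` are hypothetical. The first version groups every order before filtering, while the second filters first and groups only the matching documents.

```javascript
// Shared grouping stage: total order amount per user.
const totalPerUserStage = { $group: { _id: "$userId", total: { $sum: "$amount" } } };

// Filter late: every order in the collection is grouped, then one group is kept.
const filterLatePipeline = [
  totalPerUserStage,
  { $match: { _id: "u123" } }
];

// Filter early: only this user's orders reach $group.
const filterEarlyPipeline = [
  { $match: { userId: "u123" } },
  totalPerUserStage
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(filterEarlyPipeline).toArray());
}
```

The early filter can also use an index on `userId`, which the late filter on the computed `_id` cannot.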

Using Indexes

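$match and $sort can generally use an index only when they run at the start of the pipeline, before stages such as $project, $unwind, or $group have transformed the documents. Create indexes on the fields that these early stages filter and sort on.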
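
One way to check index usage is to create an index and inspect the query plan. The sketch below assumes hypothetical `status` and `createdAt` fields on `users`. The `createIndex` and `explain` calls only run inside mongosh.

```javascript
// Sketch: a pipeline whose leading $match and $sort can be served by one compound index.
const activeUsersPipeline = [
  { $match: { status: "active" } }, // equality filter on the first index field
  { $sort: { createdAt: -1 } },     // sort on the second index field
  { $limit: 20 }
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // Compound index: equality field first, then the sort field.
  db.users.createIndex({ status: 1, createdAt: -1 });

  // Inspect the plan; an IXSCAN stage indicates that the index was used.
  printjson(db.users.explain("executionStats").aggregate(activeUsersPipeline));
}
```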

Limiting Returned Data

We can use the $limit and $skip stages to limit the amount of returned data, which reduces network transfer and client-side memory usage. For pagination, apply $sort first so that the order of documents is the same on every request.
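
A minimal pagination sketch using these two stages is shown below. The helper `pageStages` and the `createdAt` field are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: build the $skip and $limit stages for a 1-based page number.
function pageStages(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }];
}

// Page 3 with 10 documents per page: skip 20, return up to 10.
const thirdPagePipeline = [
  { $sort: { createdAt: -1, _id: 1 } }, // _id as tie-breaker keeps the order stable
  ...pageStages(3, 10)
];

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.users.aggregate(thirdPagePipeline).forEach((doc) => printjson(doc));
}
```

Note that $skip still has to walk past the skipped documents, so very deep pages get slower as the offset grows.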

Comparison with MapReduce

MongoDB also provides MapReduce, but the aggregation framework generally performs better and is easier to use. Map-reduce has been deprecated since MongoDB 5.0, so new code should use the aggregation framework instead.

Performance Considerations

  1. Data Size: Aggregation time grows with the number and size of the documents that each stage has to process, so large collections benefit most from early filtering.
  2. Index Usage: Creating appropriate indexes for query fields and sorting fields can improve query performance.
  3. Memory Limits: Each pipeline stage is limited to 100 MB of RAM by default. Before MongoDB 6.0, a stage that exceeds this limit fails with an error unless the operation sets allowDiskUse: true. From MongoDB 6.0 onward, such stages write temporary files to disk by default. In both cases, spilling to disk slows the operation down.
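
For the memory limit in particular, disk use can be requested explicitly through the `allowDiskUse` option of `aggregate()`. This matters mainly on server versions where spilling to disk is not the default. The sketch below assumes a hypothetical `orders` collection with `userId` and `amount` fields.

```javascript
// Sketch: a group-and-sort over a large collection that may exceed the per-stage memory limit.
const largeSortPipeline = [
  { $group: { _id: "$userId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
];

// Second argument to aggregate(): lets stages write temporary files to disk.
const aggregateOptions = { allowDiskUse: true };

// Runs only in mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.orders.aggregate(largeSortPipeline, aggregateOptions).forEach((doc) => printjson(doc));
}
```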

Summary

MongoDB's aggregation framework is a powerful tool for querying and analyzing data, allowing us to perform complex operations on data. By using different aggregation stages and operators, we can implement various query requirements. When using aggregation operations, we need to pay attention to the order of operations, index usage, and performance optimization to ensure efficient query execution.
