Aggregation
Pipeline
Turn raw, messy data into clean, useful answers - one step at a time.
You use this every day
- you just didn't know it
Think about the last time you used Swiggy. You open it and see "Top Restaurants Near You." That list isn't magic - somewhere in a database, Swiggy is looking at thousands of restaurants, filtering by your city, grouping by rating, sorting by delivery time, and showing you only the top 10. That is aggregation.
Zomato showing top-rated restaurants · Amazon showing "Most Bought in Electronics" · Your bank showing "Total spent this month" · YouTube showing trending videos · A sales dashboard showing city-wise revenue
The technical
explanation
MongoDB stores data as documents - like JSON objects. When you want more than just "find me this document," you use the Aggregation Pipeline.
A pipeline is an ordered list of stages. Each stage receives documents from the previous stage, transforms them, and passes the result forward. The final stage produces the output. It runs entirely inside MongoDB - your application only receives the final answer.
// The structure - just a list of stages db.collectionName.aggregate([ { $stage1: { ... } }, // step 1 - transforms data { $stage2: { ... } }, // step 2 - uses step 1's output { $stage3: { ... } } // step 3 - final transformation ])
Each stage does
one job
Think of stages like workers on an assembly line. Worker 1 picks only the right parts. Worker 2 groups them. Worker 3 sorts them. Worker 4 packages them. Each person has a clear, single responsibility. Together they produce something a single person couldn't.
MongoDB has many stages. We'll walk through the most important ones - with one simple, clear example for each.
Every example below uses this same collection. It has orders with fields: customer, city, product, amount, and status. Simple.
{ customer: "Arjun", city: "Mumbai", product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Priya", city: "Delhi", product: "Phone", amount: 32000, status: "delivered" }
{ customer: "Arjun", city: "Mumbai", product: "Chair", amount: 8000, status: "cancelled" }
{ customer: "Sneha", city: "Bangalore", product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Rohit", city: "Mumbai", product: "Phone", amount: 32000, status: "delivered" }
$matchIt's like a security guard at the gate. He checks every person. If you don't match the criteria, you're not allowed in. Only documents that pass the condition move forward in the pipeline.
$match filters documents using the same conditions as a regular MongoDB find(). Use it early in your pipeline to reduce the number of documents the later stages have to process.
{ $match: { status: "delivered" } }
$groupImagine you have all your monthly bills scattered on the table. You sort them into piles - electricity, rent, food - and add up each pile. You started with 20 bills and ended with 3 totals. That's $group.
$group groups documents by a field (the _id) and computes values using accumulators like $sum, $avg, $min, $max. Many documents go in - fewer summary documents come out.
{ $group: {
_id: "$city", // group by city
totalRevenue: { $sum: "$amount" }, // add up amounts
orderCount: { $sum: 1 } // count orders
}}
| Accumulator | What it does |
|---|---|
| $sum | Add values up - use $sum: 1 to count documents |
| $avg | Calculate the average |
| $min / $max | Smallest or largest value |
| $push | Collect all values into an array |
| $addToSet | Collect only unique values into an array |
| $first / $last | First or last value in the group |
$sort
Like a leaderboard on a game. You look at everyone's score and arrange them from highest to lowest. $sort does the same for your documents - rank them by any field, in any direction.
{ $sort: { totalRevenue: -1 } }
// -1 = descending (highest first)
// 1 = ascending (lowest first)
$project
Like cropping a photo. You have a full image but you only want to show a part of it. $project lets you keep the fields you want, hide the ones you don't, and even rename or create new fields.
{ $project: {
_id: 0, // 0 = hide
city: "$_id", // rename _id → city
totalRevenue: 1 // 1 = keep
}}
$limit & $skip$limit says: give me only the top N. $skip says: skip the first N, then start. Together they're how you build pagination - like page 1 shows results 1–10, page 2 shows 11–20.
// Show only the top 2 cities { $sort: { totalRevenue: -1 } }, { $limit: 2 } // Page 2 - skip first 10, show next 10 { $skip: 10 }, { $limit: 10 }
$unwindSay one order has 3 products in it: ["Laptop", "Mouse", "Keyboard"]. Before you can count how many times "Laptop" was ordered across all orders, you need to break that array open - one document per product. That's $unwind.
// One order with an array { order: "#001", products: ["Laptop", "Mouse", "Keyboard"] } // After $unwind: "$products" { order: "#001", products: "Laptop" } { order: "#001", products: "Mouse" } { order: "#001", products: "Keyboard" } // Now you can group by products and count
$lookup
Your orders table has a customer name. Your customers table has their phone number and city. $lookup is like a bank clerk who goes to two filing cabinets, pulls the right file from each, and staples them together into one combined record.
{ $lookup: {
from: "customers", // other collection
localField: "customer", // field in orders
foreignField: "name", // field in customers
as: "info" // name for the joined data
}}
// Result: each order now has an "info" array with the matching customer doc
$addFieldsLike a doctor who adds a single note to your file without rewriting the whole thing. You keep all existing fields - you just get one (or more) new ones added based on a calculation.
{ $addFields: {
isHighValue: { $gte: ["$amount", 20000] }
}}
// Arjun 45000 → isHighValue: true
// Rohit 8000 → isHighValue: false
// All original fields stay untouched
$count
Sometimes you don't need the details. You just need to know: how many? Like a teacher who takes attendance and says "24 present today" - one number, no details. $count returns exactly that.
db.orders.aggregate([ { $match: { status: "delivered" } }, { $count: "totalDelivered" } ]) // → { totalDelivered: 4 }
$facet
Think of a shopping mall. One entrance, but inside there's a food court, a clothing store, and an electronics shop - all running independently. $facet is the same: one query, multiple independent sub-pipelines, one combined response.
{ $facet: {
// sub-pipeline 1
byCity: [
{ $group: { _id: "$city", total: { $sum: "$amount" } } }
],
// sub-pipeline 2
topProduct: [
{ $group: { _id: "$product", sales: { $sum: "$amount" } } },
{ $sort: { sales: -1 } },
{ $limit: 1 }
]
}}
$bucket
Like a teacher who grades exams: 0–40 is Fail, 40–70 is Pass, 70–100 is Distinction. $bucket lets you define ranges and MongoDB will put each document into the right bucket. Want MongoDB to auto-decide the ranges? Use $bucketAuto.
{ $bucket: {
groupBy: "$amount",
boundaries: [0, 10000, 30000, 60000],
output: { count: { $sum: 1 } }
}}
// → { _id: 0, count: 1 } ← below 10k
// → { _id: 10000, count: 1 } ← 10k–30k
// → { _id: 30000, count: 3 } ← 30k–60k
Why not just process
data in your app?
Fair question. You could fetch all orders into Node.js and write JavaScript to filter and count. For 100 records, fine. But what happens when you have 2 million orders?
Don't ship the entire sugarcane field to the city so someone there can extract the juice. Build the factory next to the farm. Process data where it lives.
A complete pipeline
— start to finish
The manager asks: "Show me total revenue per city — only from delivered orders, highest city first." That's four steps. Filter → Group → Sort → Clean. Here's the full story.
{ name: "Arjun", city: "Mumbai", amount: 45000, status: "delivered" }
{ name: "Priya", city: "Delhi", amount: 32000, status: "delivered" }
{ name: "Arjun", city: "Mumbai", amount: 8000, status: "cancelled" }
{ name: "Sneha", city: "Bangalore", amount: 45000, status: "delivered" }
{ name: "Rohit", city: "Mumbai", amount: 32000, status: "delivered" }
{ name: "Priya", city: "Delhi", amount: 12000, status: "pending" }
Step 1 — $match
Drop the cancelled and pending orders right away. Only delivered ones move forward. 6 docs → 4 docs.
{ $match: { status: "delivered" } }
// Arjun cancelled + Priya pending are gone. 4 orders remain.
Step 2 — $group
Group the 4 orders by city and add up the amounts. 4 individual orders → 3 city totals.
{ $group: {
_id: "$city",
revenue: { $sum: "$amount" },
orders: { $sum: 1 }
}}
// Mumbai → 45000 + 32000 = 77000
// Delhi → 32000
// Bangalore → 45000
Step 3 — $sort
Put the highest revenue city at the top.
{ $sort: { revenue: -1 } }
// Mumbai (77000) → Bangalore (45000) → Delhi (32000)
Step 4 — $project
Rename _id to city and hide the internal _id field. Clean output for the frontend.
{ $project: {
_id: 0,
city: "$_id",
revenue: 1,
orders: 1
}}
This is exactly what you write in your code. MongoDB runs all four stages in sequence on the server.
db.orders.aggregate([ { $match: { status: "delivered" } }, { $group: { _id: "$city", revenue: { $sum: "$amount" }, orders: { $sum: 1 } }}, { $sort: { revenue: -1 } }, { $project: { _id: 0, city: "$_id", revenue: 1, orders: 1 } } ])
{ city: "Mumbai", revenue: 77000, orders: 2 }
{ city: "Bangalore", revenue: 45000, orders: 1 }
{ city: "Delhi", revenue: 32000, orders: 1 }
We started with 6 raw order documents. After four stages — filter, group, sort, shape — we ended with 3 clean city summaries. The manager gets exactly what they asked for. Nothing more, nothing less.
All stages
at a glance
-1 descending, 1 ascending.