AGG
MongoDB

Aggregation
Pipeline

Turn raw, messy data into clean, useful answers - one step at a time.

Presented by Kishan Vyas
$match
Filter
$group
Summarise
$sort
Rank
$project
Shape
$lookup
Join
$facet
Multi-report
Where it's used

You use this every day
- you just didn't know it

Think about the last time you used Swiggy. You open it and see "Top Restaurants Near You." That list isn't magic - somewhere in a database, Swiggy is looking at thousands of restaurants, filtering by your city, grouping by rating, sorting by delivery time, and showing you only the top 10. That is aggregation.

Real-world examples where aggregation runs

Zomato showing top-rated restaurants  ·  Amazon showing "Most Bought in Electronics"  ·  Your bank showing "Total spent this month"  ·  YouTube showing trending videos  ·  A sales dashboard showing city-wise revenue

How it works

The technical
explanation

MongoDB stores data as documents - like JSON objects. When you want more than just "find me this document," you use the Aggregation Pipeline.

Technical definition

A pipeline is an ordered list of stages. Each stage receives documents from the previous stage, transforms them, and passes the result forward. The final stage produces the output. It runs entirely inside MongoDB - your application only receives the final answer.

aggregation syntax
// The structure - just a list of stages
db.collectionName.aggregate([
  { $stage1: { ... } },   // step 1 - transforms data
  { $stage2: { ... } },   // step 2 - uses step 1's output
  { $stage3: { ... } }    // step 3 - final transformation
])
The stages

Each stage does
one job

Think of stages like workers on an assembly line. Worker 1 picks only the right parts. Worker 2 groups them. Worker 3 sorts them. Worker 4 packages them. Each person has a clear, single responsibility. Together they produce something a single person couldn't.

MongoDB has many stages. We'll walk through the most important ones - with one simple, clear example for each.

Our data - an orders collection

Every example below uses this same collection. It has orders with fields: customer, city, product, amount, and status. Simple.

orders collection
{ customer: "Arjun",  city: "Mumbai",    product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Priya",   city: "Delhi",     product: "Phone",  amount: 32000, status: "delivered" }
{ customer: "Arjun",  city: "Mumbai",    product: "Chair",  amount: 8000,  status: "cancelled" }
{ customer: "Sneha",   city: "Bangalore", product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Rohit",   city: "Mumbai",    product: "Phone",  amount: 32000, status: "delivered" }
01
$match
Filter - keep only what you need
In plain English

It's like a security guard at the gate. He checks every person. If you don't match the criteria, you're not allowed in. Only documents that pass the condition move forward in the pipeline.

Technically

$match filters documents using the same conditions as a regular MongoDB find(). Use it early in your pipeline to reduce the number of documents the later stages have to process.

$match - keep only delivered orders
{ $match: { status: "delivered" } }
5 orders in
Arjun    delivered
Priya    delivered
Arjun    cancelled
Sneha    delivered
Rohit    delivered
$match
4 orders pass
Arjun    delivered
Priya    delivered
Sneha    delivered
Rohit    delivered
02
$group
Summarise - collapse many into few
In plain English

Imagine you have all your monthly bills scattered on the table. You sort them into piles - electricity, rent, food - and add up each pile. You started with 20 bills and ended with 3 totals. That's $group.

Technically

$group groups documents by a field (the _id) and computes values using accumulators like $sum, $avg, $min, $max. Many documents go in - fewer summary documents come out.

$group - total revenue per city
{ $group: {
  _id:          "$city",         // group by city
  totalRevenue: { $sum: "$amount" }, // add up amounts
  orderCount:   { $sum: 1 }          // count orders
}}
4 individual orders
Arjun   Mumbai   45000
Priya   Delhi    32000
Sneha   Bangalore 45000
Rohit   Mumbai   32000
$group
3 city summaries
Mumbai     77000
Delhi      32000
Bangalore 45000
AccumulatorWhat it does
$sumAdd values up  -  use $sum: 1 to count documents
$avgCalculate the average
$min / $maxSmallest or largest value
$pushCollect all values into an array
$addToSetCollect only unique values into an array
$first / $lastFirst or last value in the group
03
$sort
Rank - put things in order
In plain English

Like a leaderboard on a game. You look at everyone's score and arrange them from highest to lowest. $sort does the same for your documents - rank them by any field, in any direction.

$sort - highest revenue city first
{ $sort: { totalRevenue: -1 } }
// -1 = descending (highest first)
//  1 = ascending  (lowest first)
Unsorted
Mumbai     77000
Delhi      32000
Bangalore 45000
$sort -1
Ranked
#1  Mumbai     77000
#2  Bangalore 45000
#3  Delhi      32000
04
$project
Shape - choose what the output looks like
In plain English

Like cropping a photo. You have a full image but you only want to show a part of it. $project lets you keep the fields you want, hide the ones you don't, and even rename or create new fields.

$project - rename _id to city, hide internal fields
{ $project: {
  _id:          0,       // 0 = hide
  city:         "$_id", // rename _id → city
  totalRevenue: 1        // 1 = keep
}}
Before
_id: "Mumbai"
totalRevenue: 77000
orderCount: 2
internalRef: "MH-01"
$project
After
city: "Mumbai"
totalRevenue: 77000
05
$limit & $skip
Paginate - control how many you get
In plain English

$limit says: give me only the top N. $skip says: skip the first N, then start. Together they're how you build pagination - like page 1 shows results 1–10, page 2 shows 11–20.

$limit & $skip - top 2 cities / page 2
// Show only the top 2 cities
{ $sort:  { totalRevenue: -1 } },
{ $limit: 2 }

// Page 2 - skip first 10, show next 10
{ $skip:  10 },
{ $limit: 10 }
06
$unwind
Flatten - open arrays into individual documents
In plain English

Say one order has 3 products in it: ["Laptop", "Mouse", "Keyboard"]. Before you can count how many times "Laptop" was ordered across all orders, you need to break that array open - one document per product. That's $unwind.

$unwind - flatten products array
// One order with an array
{ order: "#001", products: ["Laptop", "Mouse", "Keyboard"] }

// After $unwind: "$products"
{ order: "#001", products: "Laptop"   }
{ order: "#001", products: "Mouse"    }
{ order: "#001", products: "Keyboard" }
// Now you can group by products and count
07
$lookup
Join - combine data from two collections
In plain English

Your orders table has a customer name. Your customers table has their phone number and city. $lookup is like a bank clerk who goes to two filing cabinets, pulls the right file from each, and staples them together into one combined record.

$lookup - attach customer details to each order
{ $lookup: {
  from:         "customers", // other collection
  localField:   "customer",  // field in orders
  foreignField: "name",       // field in customers
  as:           "info"        // name for the joined data
}}
// Result: each order now has an "info" array with the matching customer doc
08
$addFields
Enrich - add new computed fields
In plain English

Like a doctor who adds a single note to your file without rewriting the whole thing. You keep all existing fields - you just get one (or more) new ones added based on a calculation.

$addFields - flag orders above ₹20,000
{ $addFields: {
  isHighValue: { $gte: ["$amount", 20000] }
}}
// Arjun  45000  → isHighValue: true
// Rohit  8000   → isHighValue: false
// All original fields stay untouched
09
$count
Count - just give me the number
In plain English

Sometimes you don't need the details. You just need to know: how many? Like a teacher who takes attendance and says "24 present today" - one number, no details. $count returns exactly that.

$count - how many delivered orders?
db.orders.aggregate([
  { $match: { status: "delivered" } },
  { $count: "totalDelivered" }
])
// → { totalDelivered: 4 }
10
$facet
Multi-report - run several pipelines in one shot
In plain English

Think of a shopping mall. One entrance, but inside there's a food court, a clothing store, and an electronics shop - all running independently. $facet is the same: one query, multiple independent sub-pipelines, one combined response.

$facet - revenue by city + top product in one query
{ $facet: {

  // sub-pipeline 1
  byCity: [
    { $group: { _id: "$city", total: { $sum: "$amount" } } }
  ],

  // sub-pipeline 2
  topProduct: [
    { $group: { _id: "$product", sales: { $sum: "$amount" } } },
    { $sort:  { sales: -1 } },
    { $limit: 1 }
  ]

}}
11
$bucket
Range groups - classify into buckets
In plain English

Like a teacher who grades exams: 0–40 is Fail, 40–70 is Pass, 70–100 is Distinction. $bucket lets you define ranges and MongoDB will put each document into the right bucket. Want MongoDB to auto-decide the ranges? Use $bucketAuto.

$bucket - group orders by spending range
{ $bucket: {
  groupBy:    "$amount",
  boundaries: [0, 10000, 30000, 60000],
  output: { count: { $sum: 1 } }
}}
// → { _id: 0,     count: 1 }  ← below 10k
// → { _id: 10000, count: 1 }  ← 10k–30k
// → { _id: 30000, count: 3 }  ← 30k–60k
Why use it

Why not just process
data in your app?

Fair question. You could fetch all orders into Node.js and write JavaScript to filter and count. For 100 records, fine. But what happens when you have 2 million orders?

Doing it in Node.js
2 million documents travel the network
Server RAM spikes holding them all
JavaScript loop runs for 30 seconds
Users stare at a loading spinner
Cloud costs go up
Using Aggregation Pipeline
MongoDB processes everything internally
Uses indexes - extremely fast
Only the final result crosses the network
Response in milliseconds
Scales without breaking a sweat
The principle

Don't ship the entire sugarcane field to the city so someone there can extract the juice. Build the factory next to the farm. Process data where it lives.

Putting it together

A complete pipeline
— start to finish

The manager asks: "Show me total revenue per city — only from delivered orders, highest city first." That's four steps. Filter → Group → Sort → Clean. Here's the full story.

orders collection
{ name: "Arjun", city: "Mumbai",    amount: 45000, status: "delivered" }
{ name: "Priya", city: "Delhi",     amount: 32000, status: "delivered" }
{ name: "Arjun", city: "Mumbai",    amount: 8000,  status: "cancelled" }
{ name: "Sneha", city: "Bangalore", amount: 45000, status: "delivered" }
{ name: "Rohit", city: "Mumbai",    amount: 32000, status: "delivered" }
{ name: "Priya", city: "Delhi",     amount: 12000, status: "pending"   }

Step 1 — $match

Drop the cancelled and pending orders right away. Only delivered ones move forward. 6 docs → 4 docs.

step 1
{ $match: { status: "delivered" } }
// Arjun cancelled + Priya pending are gone. 4 orders remain.

Step 2 — $group

Group the 4 orders by city and add up the amounts. 4 individual orders → 3 city totals.

step 2
{ $group: {
  _id:     "$city",
  revenue: { $sum: "$amount" },
  orders:  { $sum: 1 }
}}
// Mumbai → 45000 + 32000 = 77000
// Delhi  → 32000
// Bangalore → 45000

Step 3 — $sort

Put the highest revenue city at the top.

step 3
{ $sort: { revenue: -1 } }
// Mumbai (77000) → Bangalore (45000) → Delhi (32000)

Step 4 — $project

Rename _id to city and hide the internal _id field. Clean output for the frontend.

step 4
{ $project: {
  _id:     0,
  city:    "$_id",
  revenue: 1,
  orders:  1
}}
All four steps combined into one query

This is exactly what you write in your code. MongoDB runs all four stages in sequence on the server.

full pipeline
db.orders.aggregate([

  { $match:   { status: "delivered" } },

  { $group:   {
      _id:     "$city",
      revenue: { $sum: "$amount" },
      orders:  { $sum: 1 }
  }},

  { $sort:    { revenue: -1 } },

  { $project: { _id: 0, city: "$_id", revenue: 1, orders: 1 } }

])
output
{ city: "Mumbai",    revenue: 77000, orders: 2 }
{ city: "Bangalore", revenue: 45000, orders: 1 }
{ city: "Delhi",     revenue: 32000, orders: 1 }
What just happened

We started with 6 raw order documents. After four stages — filter, group, sort, shape — we ended with 3 clean city summaries. The manager gets exactly what they asked for. Nothing more, nothing less.

Quick reference

All stages
at a glance

$match
Filter documents. Only matching ones move forward.
Security guard at the gate
$group
Group and compute summaries - sum, avg, count, min, max.
Sorting bills into piles and totalling each
$sort
Order results. -1 descending, 1 ascending.
Game leaderboard
$project
Keep, hide, rename, or compute fields in the output.
Cropping a photo
$limit
Keep only the first N documents.
Showing only top 10 results
$skip
Skip the first N documents. Used with $limit for pagination.
Jumping to page 2
$unwind
Flatten an array field - one document per element.
Opening a box to sort items individually
$lookup
Join data from another collection (like a SQL JOIN).
Fetching files from two separate offices
$addFields
Add new computed fields without removing existing ones.
Doctor adding a note to your file
$count
Return the total count of documents as a single number.
Teacher taking attendance
$facet
Run multiple sub-pipelines simultaneously on the same data.
A shopping mall - multiple stores, one entrance
$bucket
Group documents into defined ranges. $bucketAuto picks ranges automatically.
Exam grading - Fail, Pass, Distinction