AGG

MongoDB

Aggregation
Pipeline

Turn raw, messy data into clean, useful answers - one step at a time.

Presented by Kishan Vyas

$match

Filter

$group

Summarise

$sort

Rank

$project

Shape

$lookup

Join

$facet

Multi-report

Where it's used

You use this every day
- you just didn't know it

Think about the last time you used Swiggy. You open it and see "Top Restaurants Near You." That list isn't magic - somewhere in a database, Swiggy is looking at thousands of restaurants, filtering by your city, grouping by rating, sorting by delivery time, and showing you only the top 10. That is aggregation.

Real-world examples where aggregation runs

Zomato showing top-rated restaurants · Amazon showing "Most Bought in Electronics" · Your bank showing "Total spent this month" · YouTube showing trending videos · A sales dashboard showing city-wise revenue

How it works

The technical
explanation

MongoDB stores data as documents - like JSON objects. When you want more than just "find me this document," you use the Aggregation Pipeline.

Technical definition

A pipeline is an ordered list of stages. Each stage receives documents from the previous stage, transforms them, and passes the result forward. The final stage produces the output. It runs entirely inside MongoDB - your application only receives the final answer.

// The structure - just a list of stages
db.collectionName.aggregate([
  { $stage1: { ... } },   // step 1 - transforms data
  { $stage2: { ... } },   // step 2 - uses step 1's output
  { $stage3: { ... } }    // step 3 - final transformation
])

The stages

Each stage does
one job

Think of stages like workers on an assembly line. Worker 1 picks only the right parts. Worker 2 groups them. Worker 3 sorts them. Worker 4 packages them. Each person has a clear, single responsibility. Together they produce something a single person couldn't.

MongoDB has many stages. We'll walk through the most important ones - with one simple, clear example for each.

Our data - an orders collection

Every example below uses this same collection. It has orders with fields: customer, city, product, amount, and status. Simple.

{ customer: "Arjun",  city: "Mumbai",    product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Priya",   city: "Delhi",     product: "Phone",  amount: 32000, status: "delivered" }
{ customer: "Arjun",  city: "Mumbai",    product: "Chair",  amount: 8000,  status: "cancelled" }
{ customer: "Sneha",   city: "Bangalore", product: "Laptop", amount: 45000, status: "delivered" }
{ customer: "Rohit",   city: "Mumbai",    product: "Phone",  amount: 32000, status: "delivered" }

$match

Filter - keep only what you need

In plain English

It's like a security guard at the gate. He checks every person. If you don't match the criteria, you're not allowed in. Only documents that pass the condition move forward in the pipeline.

Technically

$match filters documents using the same conditions as a regular MongoDB find(). Use it early in your pipeline to reduce the number of documents the later stages have to process.

{ $match: { status: "delivered" } }

5 orders in

Arjun delivered

Priya delivered

Arjun cancelled

Sneha delivered

Rohit delivered

$match

→

4 orders pass

Arjun delivered

Priya delivered

Sneha delivered

Rohit delivered

$group

Summarise - collapse many into few

In plain English

Imagine you have all your monthly bills scattered on the table. You sort them into piles - electricity, rent, food - and add up each pile. You started with 20 bills and ended with 3 totals. That's $group.

Technically

$group groups documents by a field (the _id) and computes values using accumulators like $sum, $avg, $min, $max. Many documents go in - fewer summary documents come out.

{ $group: {
  _id:          "$city",         // group by city
  totalRevenue: { $sum: "$amount" }, // add up amounts
  orderCount:   { $sum: 1 }          // count orders
}}

4 individual orders

Arjun Mumbai 45000

Priya Delhi 32000

Sneha Bangalore 45000

Rohit Mumbai 32000

$group

→

3 city summaries

Mumbai 77000

Delhi 32000

Bangalore 45000

Accumulator	What it does
$sum	Add values up - use `$sum: 1` to count documents
$avg	Calculate the average
$min / $max	Smallest or largest value
$push	Collect all values into an array
$addToSet	Collect only unique values into an array
$first / $last	First or last value in the group

$sort

Rank - put things in order

In plain English

Like a leaderboard on a game. You look at everyone's score and arrange them from highest to lowest. $sort does the same for your documents - rank them by any field, in any direction.

{ $sort: { totalRevenue: -1 } }
// -1 = descending (highest first)
//  1 = ascending  (lowest first)

Unsorted

Mumbai 77000

Delhi 32000

Bangalore 45000

$sort -1

→

Ranked

#1 Mumbai 77000

#2 Bangalore 45000

#3 Delhi 32000

$project

Shape - choose what the output looks like

In plain English

Like cropping a photo. You have a full image but you only want to show a part of it. $project lets you keep the fields you want, hide the ones you don't, and even rename or create new fields.

{ $project: {
  _id:          0,       // 0 = hide
  city:         "$_id", // rename _id → city
  totalRevenue: 1        // 1 = keep
}}

Before

_id: "Mumbai"

totalRevenue: 77000

orderCount: 2

internalRef: "MH-01"

$project

→

After

city: "Mumbai"

totalRevenue: 77000

$limit & $skip

Paginate - control how many you get

In plain English

$limit says: give me only the top N. $skip says: skip the first N, then start. Together they're how you build pagination - like page 1 shows results 1–10, page 2 shows 11–20.

// Show only the top 2 cities
{ $sort:  { totalRevenue: -1 } },
{ $limit: 2 }

// Page 2 - skip first 10, show next 10
{ $skip:  10 },
{ $limit: 10 }

$unwind

Flatten - open arrays into individual documents

In plain English

Say one order has 3 products in it: ["Laptop", "Mouse", "Keyboard"]. Before you can count how many times "Laptop" was ordered across all orders, you need to break that array open - one document per product. That's $unwind.

// One order with an array
{ order: "#001", products: ["Laptop", "Mouse", "Keyboard"] }

// After $unwind: "$products"
{ order: "#001", products: "Laptop"   }
{ order: "#001", products: "Mouse"    }
{ order: "#001", products: "Keyboard" }
// Now you can group by products and count

$lookup

Join - combine data from two collections

In plain English

Your orders table has a customer name. Your customers table has their phone number and city. $lookup is like a bank clerk who goes to two filing cabinets, pulls the right file from each, and staples them together into one combined record.

{ $lookup: {
  from:         "customers", // other collection
  localField:   "customer",  // field in orders
  foreignField: "name",       // field in customers
  as:           "info"        // name for the joined data
}}
// Result: each order now has an "info" array with the matching customer doc

$addFields

Enrich - add new computed fields

In plain English

Like a doctor who adds a single note to your file without rewriting the whole thing. You keep all existing fields - you just get one (or more) new ones added based on a calculation.

{ $addFields: {
  isHighValue: { $gte: ["$amount", 20000] }
}}
// Arjun  45000  → isHighValue: true
// Rohit  8000   → isHighValue: false
// All original fields stay untouched

$count

Count - just give me the number

In plain English

Sometimes you don't need the details. You just need to know: how many? Like a teacher who takes attendance and says "24 present today" - one number, no details. $count returns exactly that.

db.orders.aggregate([
  { $match: { status: "delivered" } },
  { $count: "totalDelivered" }
])
// → { totalDelivered: 4 }

$bucket

Range groups - classify into buckets

In plain English

Like a teacher who grades exams: 0–40 is Fail, 40–70 is Pass, 70–100 is Distinction. $bucket lets you define ranges and MongoDB will put each document into the right bucket. Want MongoDB to auto-decide the ranges? Use $bucketAuto.

{ $bucket: {
  groupBy:    "$amount",
  boundaries: [0, 10000, 30000, 60000],
  output: { count: { $sum: 1 } }
}}
// → { _id: 0,     count: 1 }  ← below 10k
// → { _id: 10000, count: 1 }  ← 10k–30k
// → { _id: 30000, count: 3 }  ← 30k–60k

Why use it

Why not just process
data in your app?

Fair question. You could fetch all orders into Node.js and write JavaScript to filter and count. For 100 records, fine. But what happens when you have 2 million orders?

Doing it in Node.js

✗2 million documents travel the network

✗Server RAM spikes holding them all

✗JavaScript loop runs for 30 seconds

✗Users stare at a loading spinner

✗Cloud costs go up

Using Aggregation Pipeline

✓MongoDB processes everything internally

✓Uses indexes - extremely fast

✓Only the final result crosses the network

✓Response in milliseconds

✓Scales without breaking a sweat

The principle

Don't ship the entire sugarcane field to the city so someone there can extract the juice. Build the factory next to the farm. Process data where it lives.

Putting it together

A complete pipeline
— start to finish

The manager asks: "Show me total revenue per city — only from delivered orders, highest city first." That's four steps. Filter → Group → Sort → Clean. Here's the full story.

{ name: "Arjun", city: "Mumbai",    amount: 45000, status: "delivered" }
{ name: "Priya", city: "Delhi",     amount: 32000, status: "delivered" }
{ name: "Arjun", city: "Mumbai",    amount: 8000,  status: "cancelled" }
{ name: "Sneha", city: "Bangalore", amount: 45000, status: "delivered" }
{ name: "Rohit", city: "Mumbai",    amount: 32000, status: "delivered" }
{ name: "Priya", city: "Delhi",     amount: 12000, status: "pending"   }

Step 1 — $match

Drop the cancelled and pending orders right away. Only delivered ones move forward. 6 docs → 4 docs.

{ $match: { status: "delivered" } }
// Arjun cancelled + Priya pending are gone. 4 orders remain.

Step 2 — $group

Group the 4 orders by city and add up the amounts. 4 individual orders → 3 city totals.

{ $group: {
  _id:     "$city",
  revenue: { $sum: "$amount" },
  orders:  { $sum: 1 }
}}
// Mumbai → 45000 + 32000 = 77000
// Delhi  → 32000
// Bangalore → 45000

Step 3 — $sort

Put the highest revenue city at the top.

{ $sort: { revenue: -1 } }
// Mumbai (77000) → Bangalore (45000) → Delhi (32000)

Step 4 — $project

Rename _id to city and hide the internal _id field. Clean output for the frontend.

{ $project: {
  _id:     0,
  city:    "$_id",
  revenue: 1,
  orders:  1
}}

All four steps combined into one query

This is exactly what you write in your code. MongoDB runs all four stages in sequence on the server.

db.orders.aggregate([

  { $match:   { status: "delivered" } },

  { $group:   {
      _id:     "$city",
      revenue: { $sum: "$amount" },
      orders:  { $sum: 1 }
  }},

  { $sort:    { revenue: -1 } },

  { $project: { _id: 0, city: "$_id", revenue: 1, orders: 1 } }

])

{ city: "Mumbai",    revenue: 77000, orders: 2 }
{ city: "Bangalore", revenue: 45000, orders: 1 }
{ city: "Delhi",     revenue: 32000, orders: 1 }

What just happened

We started with 6 raw order documents. After four stages — filter, group, sort, shape — we ended with 3 clean city summaries. The manager gets exactly what they asked for. Nothing more, nothing less.

Quick reference

All stages
at a glance

$match

Filter documents. Only matching ones move forward.

Security guard at the gate

$group

Group and compute summaries - sum, avg, count, min, max.

Sorting bills into piles and totalling each

$sort

Order results. -1 descending, 1 ascending.

Game leaderboard

$project

Keep, hide, rename, or compute fields in the output.

Cropping a photo

$limit

Keep only the first N documents.

Showing only top 10 results

$skip

Skip the first N documents. Used with $limit for pagination.

Jumping to page 2

$unwind

Flatten an array field - one document per element.

Opening a box to sort items individually

$lookup

Join data from another collection (like a SQL JOIN).

Fetching files from two separate offices

$addFields

Add new computed fields without removing existing ones.

Doctor adding a note to your file

$count

Return the total count of documents as a single number.

Teacher taking attendance

$facet

Run multiple sub-pipelines simultaneously on the same data.

A shopping mall - multiple stores, one entrance

$bucket

Group documents into defined ranges. $bucketAuto picks ranges automatically.

Exam grading - Fail, Pass, Distinction

AggregationPipeline

You use this every day- you just didn't know it

The technicalexplanation

Each stage doesone job

Why not just processdata in your app?

A complete pipeline— start to finish

All stagesat a glance

Aggregation
Pipeline

You use this every day
- you just didn't know it

The technical
explanation

Each stage does
one job

Why not just process
data in your app?

A complete pipeline
— start to finish

All stages
at a glance