MongoDB Data Modeling

Data modeling is a critical aspect of MongoDB application development. A good data model can improve query performance, simplify application logic, and make data consistency easier to maintain.

Data Modeling Principles

1. Model Based on Application Requirements

Understand application data access patterns
Identify frequently queried data
Determine read/write ratios

2. Prioritize Query Performance

Keep data that is queried together in the same document
Avoid excessive document references
Use indexes appropriately

3. Balance Flexibility and Consistency

Embedded documents improve read performance: related data comes back in a single query
Referenced documents make consistency easier: shared data is stored once and updated in one place

Relationship Modeling Strategies

One-to-One Relationships

Embedded Approach (Recommended)

javascript

// User and user profile one-to-one
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "birthDate": ISODate("1990-01-01"),
    "phone": "555-0123"
  }
}

Referenced Approach

javascript

// users collection
{ "_id": 1, "username": "johndoe", "profile_id": 101 }

// profiles collection
{ "_id": 101, "firstName": "John", "lastName": "Doe" }
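To make the trade-off concrete, here is a minimal sketch in plain Node.js with no MongoDB connection: arrays stand in for the users and profiles collections, and getUserWithProfile and getEmbeddedUser are hypothetical helpers. The referenced form needs a second lookup by profile_id, which is roughly what a $lookup stage does on the server; the embedded form returns everything in one read.

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const users = [{ _id: 1, username: "johndoe", profile_id: 101 }];
const profiles = [{ _id: 101, firstName: "John", lastName: "Doe" }];

// Referenced approach: read the user, then follow profile_id to the profile.
// Roughly comparable to:
//   db.users.aggregate([{ $lookup: { from: "profiles",
//     localField: "profile_id", foreignField: "_id", as: "profile" } }])
// except that $lookup always returns an array in "profile".
function getUserWithProfile(username) {
  const user = users.find((u) => u.username === username);
  if (!user) return null;
  const profile = profiles.find((p) => p._id === user.profile_id) ?? null;
  return { ...user, profile };
}

// Embedded approach: the profile already lives inside the user document,
// so a single read returns everything.
const embeddedUsers = [
  { _id: 1, username: "johndoe", profile: { firstName: "John", lastName: "Doe" } },
];

function getEmbeddedUser(username) {
  return embeddedUsers.find((u) => u.username === username) ?? null;
}
```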

One-to-Many Relationships

One-to-Few (Embedded)

javascript

// User and addresses (users typically have few addresses)
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "addresses": [
    {
      "type": "home",
      "city": "New York",
      "street": "123 Main St"
    },
    {
      "type": "work",
      "city": "Boston",
      "street": "456 Office Ave"
    }
  ]
}
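Embedding works for "one-to-few" only while the array stays small. One way to keep it bounded is MongoDB's $push with $each and a negative $slice, which appends and then keeps only the last N elements. The sketch below emulates that update in plain Node.js; MAX_ADDRESSES and pushAddress are hypothetical names, and silently dropping the oldest entry is just one policy (rejecting the write is another).

```javascript
// Hypothetical upper bound for the embedded addresses array.
const MAX_ADDRESSES = 5;

// Emulates:
//   db.users.updateOne({ _id: userId },
//     { $push: { addresses: { $each: [address], $slice: -MAX_ADDRESSES } } })
// Append the new element, then keep only the last `max` elements.
// Returns a new user object; the input is not mutated.
function pushAddress(user, address, max = MAX_ADDRESSES) {
  const addresses = [...(user.addresses ?? []), address].slice(-max);
  return { ...user, addresses };
}
```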

One-to-Many (Referenced)

javascript

// Author and books (one author has many books)
// authors collection
{
  "_id": ObjectId("..."),
  "name": "John Smith",
  "nationality": "American"
}

// books collection
{
  "_id": ObjectId("..."),
  "title": "The Great Novel",
  "author_id": ObjectId("..."),  // Reference to author
  "publishYear": 2020
}
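With the reference stored on the book side, both directions are a single query: book to author follows author_id, and author to books filters the books collection by author_id. A minimal in-memory sketch, assuming plain Node.js, string IDs in place of ObjectId values, and hypothetical helper names and sample titles:

```javascript
// Hypothetical sample data; string IDs stand in for ObjectId values.
const authors = [{ _id: "a1", name: "John Smith", nationality: "American" }];
const books = [
  { _id: "b1", title: "The Great Novel", author_id: "a1", publishYear: 2020 },
  { _id: "b2", title: "A Second Novel", author_id: "a1", publishYear: 2022 },
  { _id: "b3", title: "Another Author's Book", author_id: "a2", publishYear: 2021 },
];

// Book -> author: follow the reference stored in the book.
// mongosh equivalent: db.authors.findOne({ _id: book.author_id })
function authorOf(book) {
  return authors.find((a) => a._id === book.author_id) ?? null;
}

// Author -> books: query books by the reference field.
// mongosh equivalent: db.books.find({ author_id: author._id })
// (an index on { author_id: 1 } keeps this efficient)
function booksBy(author) {
  return books.filter((b) => b.author_id === author._id);
}
```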

One-to-Many (Parent Referencing)

javascript

// Device and logs (one device has many logs)
// devices collection
{
  "_id": ObjectId("..."),
  "deviceName": "Sensor-A001",
  "location": "Building 1"
}

// logs collection
{
  "_id": ObjectId("..."),
  "device_id": ObjectId("..."),  // Reference to device
  "timestamp": ISODate("2024-01-20T10:00:00Z"),
  "temperature": 77.9,
  "humidity": 60
}
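The typical read for parent-referenced logs is "latest N entries for one device", which in mongosh is a find on device_id sorted by timestamp descending with a limit, ideally served by a compound index such as { device_id: 1, timestamp: -1 }. The sketch below emulates that query in plain Node.js; latestLogs is a hypothetical helper and the readings are made-up sample data.

```javascript
// Hypothetical sample logs for two devices (string IDs stand in for ObjectId).
const logs = [
  { device_id: "d1", timestamp: new Date("2024-01-20T10:00:00Z"), temperature: 77.9 },
  { device_id: "d1", timestamp: new Date("2024-01-20T12:00:00Z"), temperature: 78.4 },
  { device_id: "d2", timestamp: new Date("2024-01-20T11:00:00Z"), temperature: 70.1 },
  { device_id: "d1", timestamp: new Date("2024-01-20T11:00:00Z"), temperature: 78.0 },
];

// Emulates db.logs.find({ device_id: deviceId }).sort({ timestamp: -1 }).limit(n).
// filter() returns a new array, so sorting does not reorder `logs` itself.
function latestLogs(deviceId, n) {
  return logs
    .filter((log) => log.device_id === deviceId)
    .sort((a, b) => b.timestamp - a.timestamp) // newest first
    .slice(0, n);
}
```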

Many-to-Many Relationships

Bidirectional Embedding (Small Data)

javascript

// Students and courses (when quantities are small)
// students collection
{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "courses": [
    { "course_id": ObjectId("..."), "name": "Math" },
    { "course_id": ObjectId("..."), "name": "English" }
  ]
}

// courses collection
{
  "_id": ObjectId("..."),
  "name": "Math",
  "students": [
    { "student_id": ObjectId("..."), "name": "John Doe" },
    { "student_id": ObjectId("..."), "name": "Jane Smith" }
  ]
}
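The cost of bidirectional embedding is that one enrollment must update two documents. In MongoDB that means two updateOne calls (for example with $addToSet on each side), optionally wrapped in a multi-document transaction if a half-applied enrollment is unacceptable. A minimal in-memory sketch, assuming plain Node.js, string IDs, and a hypothetical enroll helper:

```javascript
// In-memory stand-ins; string IDs replace ObjectId values (hypothetical data).
const students = [{ _id: "s1", name: "John Doe", courses: [] }];
const courses = [{ _id: "c1", name: "Math", students: [] }];

// One enrollment touches BOTH documents, so both sides must be kept in sync.
function enroll(studentId, courseId) {
  const student = students.find((s) => s._id === studentId);
  const course = courses.find((c) => c._id === courseId);
  if (!student || !course) throw new Error("unknown student or course");

  // Skip duplicates on each side, similar in spirit to $addToSet.
  if (!student.courses.some((c) => c.course_id === courseId)) {
    student.courses.push({ course_id: courseId, name: course.name });
  }
  if (!course.students.some((s) => s.student_id === studentId)) {
    course.students.push({ student_id: studentId, name: student.name });
  }
}
```

Note that the copied name fields are also denormalized data: renaming a course would require updating every student document that embeds it.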

Junction Table Approach (Recommended)

javascript

// students collection
{ "_id": 1, "name": "John Doe" }

// courses collection
{ "_id": 101, "name": "Math" }

// enrollments collection (junction table)
{
  "_id": ObjectId("..."),
  "student_id": 1,
  "course_id": 101,
  "enrollmentDate": ISODate("2024-01-15"),
  "grade": 85
}
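With a junction collection, a student's courses are found by matching enrollments and joining them to courses, which in MongoDB is an aggregation with $match followed by $lookup. A minimal in-memory sketch in plain Node.js; transcript is a hypothetical helper, and the second course and its grade are made-up sample data.

```javascript
// In-memory stand-ins for the courses and enrollments collections.
const courses = [{ _id: 101, name: "Math" }, { _id: 102, name: "English" }];
const enrollments = [
  { student_id: 1, course_id: 101, grade: 85 },
  { student_id: 1, course_id: 102, grade: 91 },
];

// Emulates an aggregation such as:
//   db.enrollments.aggregate([
//     { $match: { student_id: studentId } },
//     { $lookup: { from: "courses", localField: "course_id",
//                  foreignField: "_id", as: "course" } },
//   ])
// and flattens each result to { course, grade }.
function transcript(studentId) {
  return enrollments
    .filter((e) => e.student_id === studentId)
    .map((e) => ({
      course: courses.find((c) => c._id === e.course_id)?.name ?? null,
      grade: e.grade,
    }));
}
```

Because per-enrollment data such as enrollmentDate and grade lives on the junction document, neither students nor courses grows as enrollments accumulate.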

Real-World Modeling Examples

E-commerce System

Products Collection

javascript

{
  "_id": ObjectId("..."),
  "sku": "PHONE-001",
  "name": "iPhone 15 Pro",
  "category": {
    "id": ObjectId("..."),
    "name": "Phones"
  },
  "price": 999,
  "specifications": {
    "color": "Space Black",
    "storage": "256GB",
    "screen": "6.1 inches"
  },
  "inventory": {
    "quantity": 100,
    "warehouse": "Warehouse A"
  },
  "reviews": [
    {
      "user_id": ObjectId("..."),
      "rating": 5,
      "comment": "Excellent product!",
      "createdAt": ISODate("2024-01-20")
    }
  ],
  "createdAt": ISODate("2024-01-01")
}
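Because inventory is embedded in the product document, stock can be reserved with a single-document update, and MongoDB applies single-document writes atomically. Putting the stock check in the filter ("inventory.quantity": { $gte: qty }) means the $inc only happens when enough stock exists. The sketch below emulates that behavior in plain Node.js; reserveStock is a hypothetical helper whose return shape mirrors the matchedCount and modifiedCount fields of an update result.

```javascript
// Emulates a conditional, atomic stock decrement. In mongosh:
//   db.products.updateOne(
//     { sku: "PHONE-001", "inventory.quantity": { $gte: qty } },
//     { $inc: { "inventory.quantity": -qty } }
//   )
// If the filter does not match (not enough stock), nothing is modified,
// so the quantity never goes negative.
function reserveStock(product, qty) {
  if (product.inventory.quantity < qty) {
    return { matchedCount: 0, modifiedCount: 0 };
  }
  product.inventory.quantity -= qty;
  return { matchedCount: 1, modifiedCount: 1 };
}

// Hypothetical product document, trimmed to the fields used here.
const phone = { sku: "PHONE-001", inventory: { quantity: 100, warehouse: "Warehouse A" } };
```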

Orders Collection

javascript

{
  "_id": ObjectId("..."),
  "orderNo": "ORD202401200001",
  "customer": {
    "user_id": ObjectId("..."),
    "name": "John Doe",
    "phone": "555-0123"
  },
  "items": [
    {
      "product_id": ObjectId("..."),
      "sku": "PHONE-001",
      "name": "iPhone 15 Pro",
      "price": 999,
      "quantity": 1
    }
  ],
  "shipping": {
    "address": "123 Main St, New York...",
    "status": "shipped",
    "trackingNo": "TRK123456789"
  },
  "payment": {
    "method": "Credit Card",
    "amount": 999,
    "status": "paid"
  },
  "status": "completed",
  "createdAt": ISODate("2024-01-20T10:00:00Z")
}
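Since the order embeds its line items, the order total can be derived from the document itself and checked against payment.amount. A minimal sketch in plain Node.js; orderTotal and paymentMatches are hypothetical helpers, and the second line item is made-up sample data. Whole-number prices keep the arithmetic exact here; for real currency amounts, integer cents or BSON Decimal128 avoid floating-point rounding.

```javascript
// Sum of price * quantity over the embedded line items.
function orderTotal(order) {
  return order.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

// Hypothetical consistency check: the paid amount should equal the item total.
function paymentMatches(order) {
  return orderTotal(order) === order.payment.amount;
}
```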

Blog System

Articles Collection

javascript

{
  "_id": ObjectId("..."),
  "title": "MongoDB Data Modeling Guide",
  "slug": "mongodb-data-modeling-guide",
  "content": "Article content...",
  "author": {
    "user_id": ObjectId("..."),
    "name": "Tech Blogger",
    "avatar": "https://..."
  },
  "tags": ["MongoDB", "Database", "NoSQL"],
  "category": "Technology",
  "status": "published",
  "views": 1250,
  "likes": 89,
  "comments_count": 15,
  "publishedAt": ISODate("2024-01-20T08:00:00Z"),
  "updatedAt": ISODate("2024-01-20T10:00:00Z")
}
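The slug field lets an article be looked up by URL, so it is usually backed by a unique index, for example db.articles.createIndex({ slug: 1 }, { unique: true }). One simple way to derive a slug from a title is sketched below in plain Node.js; slugify is a hypothetical helper, and dropping non-ASCII characters is a simplification a real site may not want.

```javascript
// Hypothetical slug helper: lowercase, replace each run of characters other
// than a-z and 0-9 with a single "-", then trim leading/trailing hyphens.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Usage:
// slugify("MongoDB Data Modeling Guide") returns "mongodb-data-modeling-guide"
```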

Comments Collection (Separate storage due to potentially large quantity)

javascript

{
  "_id": ObjectId("..."),
  "article_id": ObjectId("..."),
  "user": {
    "user_id": ObjectId("..."),
    "name": "Reader A",
    "avatar": "https://..."
  },
  "content": "Great article!",
  "parent_id": null,  // ID of replied comment, null for top-level
  "likes": 5,
  "createdAt": ISODate("2024-01-20T09:00:00Z")
}
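Storing only parent_id keeps each comment small, and the reply tree is rebuilt in the application after fetching an article's comments (for example with db.comments.find({ article_id: articleId }).sort({ createdAt: 1 })). A minimal sketch in plain Node.js; buildThreads is a hypothetical helper that attaches a replies array to each comment.

```javascript
// Builds nested threads from flat comment documents using parent_id
// (null marks a top-level comment). Input order is preserved.
function buildThreads(comments) {
  const byId = new Map();
  for (const c of comments) byId.set(c._id, { ...c, replies: [] });

  const roots = [];
  for (const node of byId.values()) {
    const parent = node.parent_id == null ? null : byId.get(node.parent_id);
    if (parent) parent.replies.push(node);
    else roots.push(node); // top-level, or parent not in this batch
  }
  return roots;
}
```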

Denormalization

What is Denormalization

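Denormalization means deliberately storing copies of data in more than one document. In MongoDB, this controlled redundancy can improve read performance and avoid frequent join operations ($lookup), at the cost of extra work to keep the copies in sync.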

Denormalization Scenarios

1. Frequently Accessed Related Data

javascript

// Embed basic product info in orders to avoid querying products collection
{
  "items": [
    {
      "product_id": ObjectId("..."),
      "name": "iPhone 15 Pro",  // Denormalized
      "price": 999               // Denormalized: price at the time of purchase
    }
  ]
}
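The copied fields act as a snapshot: the order keeps the name and price that applied at purchase time, even if the product document changes later. A minimal sketch in plain Node.js; toOrderItem is a hypothetical helper and the string _id stands in for an ObjectId.

```javascript
// Copies the product fields an order line needs at purchase time.
function toOrderItem(product, quantity) {
  return { product_id: product._id, name: product.name, price: product.price, quantity };
}

const product = { _id: "p1", name: "iPhone 15 Pro", price: 999 };
const item = toOrderItem(product, 1);

// A later price change in the products collection does not affect the order.
product.price = 899;
```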

2. Computed Fields

javascript

// Store comment count to avoid real-time calculation
{
  "title": "Article Title",
  "content": "Article content",
  "comments_count": 15,  // Denormalized field
  "views": 1250          // Denormalized field
}

Maintaining Denormalized Data

javascript

// Use $inc for an atomic counter update after inserting a comment
// (articleId is the _id of the article that received the comment)
db.articles.updateOne(
  { _id: articleId },
  { $inc: { comments_count: 1 } }
)
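Inserting the comment and incrementing the counter are two separate writes, so a counter can drift if one of them fails. A common safeguard is a periodic job that recomputes the count from the comments collection (in mongosh, db.comments.countDocuments({ article_id: articleId })) and corrects mismatches. The sketch below does this in memory with plain Node.js; reconcileCommentCounts is a hypothetical helper.

```javascript
// Recomputes comments_count from the source of truth (the comments)
// and fixes any article whose stored counter disagrees.
// Returns the number of articles that were corrected.
function reconcileCommentCounts(articles, comments) {
  const counts = new Map();
  for (const c of comments) {
    counts.set(c.article_id, (counts.get(c.article_id) ?? 0) + 1);
  }

  let fixed = 0;
  for (const a of articles) {
    const actual = counts.get(a._id) ?? 0;
    if (a.comments_count !== actual) {
      a.comments_count = actual; // would be a $set update in MongoDB
      fixed += 1;
    }
  }
  return fixed;
}
```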

Index Design

Index Design Principles

Create indexes for frequently queried fields
Create indexes for sort fields
Order compound index fields by the ESR rule: Equality fields first, then Sort fields, then Range fields
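As an illustration of the ESR rule, consider a query such as db.orders.find({ status: "shipped", "payment.amount": { $gt: 500 } }).sort({ createdAt: -1 }): status is an equality match, createdAt is the sort key, and payment.amount is a range filter. The hypothetical helper below, in plain Node.js, produces an index key document in that order, relying on JavaScript preserving insertion order for string keys.

```javascript
// Hypothetical helper: builds a compound-index key document in ESR order.
// Equality fields first (ascending), then sort fields with their direction,
// then range fields (ascending) not already present.
function esrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, dir] of Object.entries(sort)) keys[field] = dir;
  for (const field of range) if (!(field in keys)) keys[field] = 1;
  return keys;
}

// Usage: the result could be passed to db.orders.createIndex(...)
// esrIndexKeys({ equality: ["status"], sort: { createdAt: -1 }, range: ["payment.amount"] })
// gives { status: 1, createdAt: -1, "payment.amount": 1 }
```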

Examples

javascript

// Users collection indexes
db.users.createIndex({ "email": 1 }, { unique: true })  // Unique index
db.users.createIndex({ "username": 1 })                 // Single field index
db.users.createIndex({ "createdAt": -1 })               // Time descending index

// Orders collection indexes
db.orders.createIndex({ "customer.user_id": 1, "createdAt": -1 })  // Compound index
db.orders.createIndex({ "orderNo": 1 }, { unique: true })          // Order number unique

Summary

Embedded vs Referenced Decision Tree

Data relationship type?
├── One-to-one → Embed
├── One-to-few → Embed
├── One-to-many → Reference (child document ID array) or Parent reference
└── Many-to-many → Junction table

Query pattern?
├── Frequently queried together → Embed
└── Independently queried → Reference

Data update frequency?
├── Rarely updated → Can embed (denormalize)
└── Frequently updated → Reference (avoid multiple updates)

Best Practices

Prefer embedding unless there's a clear reason to use references
Avoid deeply nested documents (as a rule of thumb, no more than 3 levels)
Keep the 16 MB maximum document size in mind, especially for embedded arrays that can grow without bound
Create appropriate indexes for common queries
Use denormalization appropriately to improve read performance
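To catch documents that are growing toward the 16 MB limit before a write fails, an application can estimate document size. The sketch below, in plain Node.js, uses the UTF-8 byte length of the JSON form as a rough proxy; this is an assumption for illustration only, since BSON encodes types, numbers, and dates differently and its size will not match exactly. approxSizeBytes and nearSizeLimit are hypothetical helpers.

```javascript
// Maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough estimate only: JSON byte length is NOT the exact BSON size.
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical guard: flag documents past a chosen fraction of the limit.
function nearSizeLimit(doc, fraction = 0.5) {
  return approxSizeBytes(doc) > MAX_BSON_BYTES * fraction;
}
```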

In the next chapter, we will learn about MongoDB User Management.
