MongoDB Data Modeling

Data modeling is a critical aspect of MongoDB application development. A good data model improves query performance, simplifies application logic, and makes data consistency easier to maintain.

Data Modeling Principles

1. Model Based on Application Requirements

  • Understand application data access patterns
  • Identify frequently queried data
  • Determine read/write ratios

2. Prioritize Query Performance

  • Keep data that is queried together in the same document
  • Avoid excessive document references, since each one costs an extra query or a $lookup
  • Use indexes appropriately

3. Balance Flexibility and Consistency

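  • Embedded documents improve read performance because related data is returned in a single query
  • Referenced documents keep one copy of each piece of data, which makes consistency easier to maintain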

Relationship Modeling Strategies

One-to-One Relationships

Embedded Approach

javascript
// User and user profile one-to-one
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "birthDate": ISODate("1990-01-01"),
    "phone": "555-0123"
  }
}

Referenced Approach

javascript
// users collection
{ "_id": 1, "username": "johndoe", "profile_id": 101 }

// profiles collection
{ "_id": 101, "firstName": "John", "lastName": "Doe" }
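
With the referenced approach, reading a user together with its profile takes either two queries or a join. The following is a minimal sketch of such a join as an aggregation pipeline, assuming the `users` and `profiles` collections shown above; the variable name `userWithProfilePipeline` is only illustrative.

```javascript
// Aggregation pipeline that joins a user to its profile document.
// Collection and field names follow the example above.
const userWithProfilePipeline = [
  { $match: { username: "johndoe" } },
  {
    $lookup: {
      from: "profiles",          // collection to join
      localField: "profile_id",  // field in users
      foreignField: "_id",       // field in profiles
      as: "profile"              // output array field
    }
  },
  // $lookup always produces an array; unwind it into a single subdocument.
  { $unwind: "$profile" }
];

// In mongosh: db.users.aggregate(userWithProfilePipeline)
```

The embedded approach returns the same shape from a plain `find`, which is why it is usually preferred for one-to-one data that is read together.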

One-to-Many Relationships

One-to-Few (Embedded)

javascript
// User and addresses (users typically have few addresses)
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "addresses": [
    {
      "type": "home",
      "city": "New York",
      "street": "123 Main St"
    },
    {
      "type": "work",
      "city": "Boston",
      "street": "456 Office Ave"
    }
  ]
}
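
Embedded arrays can still be updated element by element. As a rough sketch, the helper below (a hypothetical name, not part of any driver) builds the filter and update for changing the city of one address, using the positional `$` operator.

```javascript
// Build the filter and update for changing the city of one embedded address.
// The positional operator `$` targets the first array element matched by the filter.
function updateAddressCityOp(username, addressType, newCity) {
  return {
    filter: { username: username, "addresses.type": addressType },
    update: { $set: { "addresses.$.city": newCity } }
  };
}

const op = updateAddressCityOp("johndoe", "work", "Chicago");
// In mongosh: db.users.updateOne(op.filter, op.update)
```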

One-to-Many (Referenced)

javascript
// Author and books (one author has many books)
// authors collection
{
  "_id": ObjectId("..."),
  "name": "John Smith",
  "nationality": "American"
}

// books collection
{
  "_id": ObjectId("..."),
  "title": "The Great Novel",
  "author_id": ObjectId("..."),  // Reference to author
  "publishYear": 2020
}

One-to-Many (Parent Referencing)

javascript
// Device and logs (one device has many logs)
// devices collection
{
  "_id": ObjectId("..."),
  "deviceName": "Sensor-A001",
  "location": "Building 1"
}

// logs collection
{
  "_id": ObjectId("..."),
  "device_id": ObjectId("..."),  // Reference to device
  "timestamp": ISODate("2024-01-20T10:00:00Z"),
  "temperature": 77.9,
  "humidity": 60
}
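
Parent referencing works well only when the child collection is indexed on the parent ID. The sketch below shows one plausible query and a matching compound index for the `logs` collection above; the helper name and the string device ID are placeholders (a real document would hold an ObjectId).

```javascript
// Build the filter for "all logs of one device since a given time".
function recentLogsFilter(deviceId, since) {
  return { device_id: deviceId, timestamp: { $gte: since } };
}

// Compound index that supports the filter and a newest-first sort.
const logsIndexKeys = { device_id: 1, timestamp: -1 };

// "device-1" is a placeholder for the device's ObjectId.
const filter = recentLogsFilter("device-1", new Date("2024-01-20T00:00:00Z"));

// In mongosh:
//   db.logs.createIndex(logsIndexKeys)
//   db.logs.find(filter).sort({ timestamp: -1 }).limit(100)
```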

Many-to-Many Relationships

Bidirectional Embedding (Small Data)

javascript
// Students and courses (when quantities are small)
// students collection
{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "courses": [
    { "course_id": ObjectId("..."), "name": "Math" },
    { "course_id": ObjectId("..."), "name": "English" }
  ]
}

// courses collection
{
  "_id": ObjectId("..."),
  "name": "Math",
  "students": [
    { "student_id": ObjectId("..."), "name": "John Doe" },
    { "student_id": ObjectId("..."), "name": "Jane Smith" }
  ]
}

Junction Collection (Large Data)

javascript
// students collection
{ "_id": 1, "name": "John Doe" }

// courses collection
{ "_id": 101, "name": "Math" }

// enrollments collection (junction collection, the counterpart of a relational junction table)
{
  "_id": ObjectId("..."),
  "student_id": 1,
  "course_id": 101,
  "enrollmentDate": ISODate("2024-01-15"),
  "grade": 85
}
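
Reading through a junction collection requires a join back to the related collection. A minimal sketch, assuming the `enrollments` and `courses` collections above (the function name is illustrative), lists one student's courses with their grades:

```javascript
// List the courses (with grades) of one student via the enrollments collection.
function coursesForStudentPipeline(studentId) {
  return [
    { $match: { student_id: studentId } },
    {
      $lookup: {
        from: "courses",
        localField: "course_id",
        foreignField: "_id",
        as: "course"
      }
    },
    // One enrollment references exactly one course, so flatten the array.
    { $unwind: "$course" },
    { $project: { _id: 0, course: "$course.name", grade: 1, enrollmentDate: 1 } }
  ];
}

// In mongosh: db.enrollments.aggregate(coursesForStudentPipeline(1))
```

An index on `student_id` (and another on `course_id` for the reverse lookup) keeps the initial `$match` from scanning the whole collection.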

Real-World Modeling Examples

E-commerce System

Products Collection

javascript
{
  "_id": ObjectId("..."),
  "sku": "PHONE-001",
  "name": "iPhone 15 Pro",
  "category": {
    "id": ObjectId("..."),
    "name": "Phones"
  },
  "price": 999,
  "specifications": {
    "color": "Space Black",
    "storage": "256GB",
    "screen": "6.1 inches"
  },
  "inventory": {
    "quantity": 100,
    "warehouse": "Warehouse A"
  },
  "reviews": [
    {
      "user_id": ObjectId("..."),
      "rating": 5,
      "comment": "Excellent product!",
      "createdAt": ISODate("2024-01-20")
    }
  ],
  "createdAt": ISODate("2024-01-01")
}

Orders Collection

javascript
{
  "_id": ObjectId("..."),
  "orderNo": "ORD202401200001",
  "customer": {
    "user_id": ObjectId("..."),
    "name": "John Doe",
    "phone": "555-0123"
  },
  "items": [
    {
      "product_id": ObjectId("..."),
      "sku": "PHONE-001",
      "name": "iPhone 15 Pro",
      "price": 999,
      "quantity": 1
    }
  ],
  "shipping": {
    "address": "123 Main St, New York...",
    "status": "shipped",
    "trackingNo": "TRK123456789"
  },
  "payment": {
    "method": "Credit Card",
    "amount": 999,
    "status": "paid"
  },
  "status": "completed",
  "createdAt": ISODate("2024-01-20T10:00:00Z")
}
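
Because the order embeds a snapshot of each item's price and quantity, the application can verify an order without touching the products collection. The following sketch (with a hypothetical `orderTotal` helper and a trimmed-down order document) checks that the paid amount matches the item total:

```javascript
// Sum price * quantity over the embedded items of an order document.
function orderTotal(order) {
  return order.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

const order = {
  items: [{ sku: "PHONE-001", name: "iPhone 15 Pro", price: 999, quantity: 1 }],
  payment: { method: "Credit Card", amount: 999, status: "paid" }
};

const isConsistent = orderTotal(order) === order.payment.amount;
console.log(isConsistent); // true
```

The example uses whole-number prices; fractional prices are usually stored as integer minor units or as Decimal128 to avoid floating-point rounding.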

Blog System

Articles Collection

javascript
{
  "_id": ObjectId("..."),
  "title": "MongoDB Data Modeling Guide",
  "slug": "mongodb-data-modeling-guide",
  "content": "Article content...",
  "author": {
    "user_id": ObjectId("..."),
    "name": "Tech Blogger",
    "avatar": "https://..."
  },
  "tags": ["MongoDB", "Database", "NoSQL"],
  "category": "Technology",
  "status": "published",
  "views": 1250,
  "likes": 89,
  "comments_count": 15,
  "publishedAt": ISODate("2024-01-20T08:00:00Z"),
  "updatedAt": ISODate("2024-01-20T10:00:00Z")
}

Comments Collection (Separate storage due to potentially large quantity)

javascript
{
  "_id": ObjectId("..."),
  "article_id": ObjectId("..."),
  "user": {
    "user_id": ObjectId("..."),
    "name": "Reader A",
    "avatar": "https://..."
  },
  "content": "Great article!",
  "parent_id": null,  // ID of replied comment, null for top-level
  "likes": 5,
  "createdAt": ISODate("2024-01-20T09:00:00Z")
}
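
Since replies are linked only by `parent_id`, the application has to assemble the thread after loading the comments of an article. One possible way to do this in application code is sketched below; plain string IDs stand in for ObjectId values, and `buildCommentTree` is an illustrative name.

```javascript
// Turn a flat list of comment documents into a reply tree using parent_id.
function buildCommentTree(comments) {
  const byId = new Map();
  for (const c of comments) {
    byId.set(String(c._id), { ...c, replies: [] });
  }
  const roots = [];
  for (const node of byId.values()) {
    const parent =
      node.parent_id == null ? undefined : byId.get(String(node.parent_id));
    if (parent) {
      parent.replies.push(node);
    } else {
      roots.push(node); // top-level comment, or parent not in this result set
    }
  }
  return roots;
}

const tree = buildCommentTree([
  { _id: "c1", parent_id: null, content: "Great article!" },
  { _id: "c2", parent_id: "c1", content: "Agreed." },
  { _id: "c3", parent_id: null, content: "Thanks for sharing." }
]);
console.log(tree.length);            // 2
console.log(tree[0].replies.length); // 1
```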

Denormalization

What is Denormalization

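Denormalization means deliberately storing a copy of the same data in more than one document. In MongoDB, appropriate data redundancy improves read performance because it avoids frequent join ($lookup) operations; the cost is that every copy must be updated when the source data changes.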

Denormalization Scenarios

1. Duplicated Fields

javascript
// Embed basic product info in orders to avoid querying products collection
{
  "items": [
    {
      "product_id": ObjectId("..."),
      "name": "iPhone 15 Pro",  // Denormalized
      "price": 999               // Denormalized (historical price)
    }
  ]
}

2. Computed Fields

javascript
// Store comment count to avoid real-time calculation
{
  "title": "Article Title",
  "content": "Article content",
  "comments_count": 15,  // Denormalized field
  "views": 1250          // Denormalized field
}

Maintaining Denormalized Data

javascript
// Use $inc for atomic counter updates
db.articles.updateOne(
  { _id: articleId },
  { $inc: { comments_count: 1 } }
)
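
Counters are the easy case. Duplicated fields such as an author's name need a fan-out update whenever the source value changes. The sketch below assumes the `articles` and `comments` schemas from the blog example; `renameUserOps` is a hypothetical helper that only describes the updates to run.

```javascript
// Describe the updates needed when a user changes their display name.
// Each entry maps to db.<collection>.updateMany(filter, update) in mongosh.
function renameUserOps(userId, newName) {
  return [
    {
      collection: "articles",
      filter: { "author.user_id": userId },
      update: { $set: { "author.name": newName } }
    },
    {
      collection: "comments",
      filter: { "user.user_id": userId },
      update: { $set: { "user.name": newName } }
    }
  ];
}

// In mongosh:
//   for (const op of renameUserOps(userId, "New Name")) {
//     db.getCollection(op.collection).updateMany(op.filter, op.update);
//   }
```

The more collections a value is copied into, the longer this list grows, which is the practical reason to denormalize only fields that rarely change.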

Index Design

Index Design Principles

  1. Create indexes for frequently queried fields
  2. Create indexes for sort fields
  3. Order the keys of a compound index by the ESR rule: Equality fields first, then Sort fields, then Range fields

Examples

javascript
// Users collection indexes
db.users.createIndex({ "email": 1 }, { unique: true })  // Unique index
db.users.createIndex({ "username": 1 })                 // Single field index
db.users.createIndex({ "createdAt": -1 })               // Time descending index

// Orders collection indexes
db.orders.createIndex({ "customer.user_id": 1, "createdAt": -1 })  // Compound index
db.orders.createIndex({ "orderNo": 1 }, { unique: true })          // Order number unique
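
To make the ESR ordering concrete, here is a small sketch that assembles an index key document in Equality, Sort, Range order. The helper `esrIndexKeys` and the sample query (completed orders above a payment amount, newest first) are assumptions made for illustration.

```javascript
// Build a compound index key document in Equality, Sort, Range order.
function esrIndexKeys(equality, sort, range) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // skip fields already covered
  }
  return keys;
}

// Query: db.orders.find({ status: "completed", "payment.amount": { $gte: 500 } })
//                 .sort({ createdAt: -1 })
const keys = esrIndexKeys(["status"], { createdAt: -1 }, ["payment.amount"]);
console.log(JSON.stringify(keys));
// {"status":1,"createdAt":-1,"payment.amount":1}

// In mongosh: db.orders.createIndex(keys)
```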

Summary

Embedded vs Referenced Decision Tree

Data relationship type?
├── One-to-one → Embed
├── One-to-few → Embed
├── One-to-many → Reference (array of child IDs in the parent, or a parent ID in each child)
└── Many-to-many → Junction collection

Query pattern?
├── Frequently queried together → Embed
└── Independently queried → Reference

Data update frequency?
├── Rarely updated → Can embed (denormalize)
└── Frequently updated → Reference (avoid multiple updates)

Best Practices

  1. Prefer embedding unless there's a clear reason to use references
  2. Avoid overly deep document nesting (no more than 3 levels is recommended)
  3. Be aware of the 16 MB BSON document size limit
  4. Create appropriate indexes for common queries
  5. Use denormalization appropriately to improve read performance
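
The nesting guideline can be checked mechanically. The following sketch counts nesting levels for plain JSON-like values and Dates; it counts an array as one level, which is an assumption of this example rather than a MongoDB rule.

```javascript
// Return the maximum nesting depth of embedded documents and arrays.
// A flat document has depth 1; scalars and Dates add no depth.
function nestingDepth(value) {
  if (value === null || typeof value !== "object" || value instanceof Date) {
    return 0;
  }
  let deepest = 0;
  for (const child of Object.values(value)) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

const user = {
  username: "johndoe",
  addresses: [{ type: "home", city: "New York" }]
};
console.log(nestingDepth(user)); // 3 (document -> array -> embedded document)
```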

In the next chapter, we will learn about MongoDB User Management.