MongoDB Data Modeling
Data modeling is a critical aspect of MongoDB application development. A good data model can improve query performance, simplify application logic, and help keep data consistent.
Data Modeling Principles
1. Model Based on Application Requirements
- Understand application data access patterns
- Identify frequently queried data
- Determine read/write ratios
2. Prioritize Query Performance
- Keep data that is queried together in the same document
- Avoid excessive document references
- Use indexes appropriately
3. Balance Flexibility and Consistency
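- Embedded documents improve read performance, because related data arrives in a single query, but they can duplicate data
- Referenced documents keep a single copy of each piece of data, which makes consistency easier at the cost of extra queries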
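As a rough illustration of this trade-off, the sketch below uses plain in-memory arrays as stand-ins for collections. This is an assumption made so the example runs without a database, and all variable and function names are hypothetical. It shows that an embedded profile is read in one lookup, while a referenced profile needs a second lookup but exists in only one place.

```javascript
// In-memory stand-ins for collections (hypothetical, for illustration only).
const profiles = [{ _id: 101, firstName: "John", lastName: "Doe" }];
const usersReferenced = [{ _id: 1, username: "johndoe", profile_id: 101 }];
const usersEmbedded = [
  { _id: 1, username: "johndoe", profile: { firstName: "John", lastName: "Doe" } },
];

// Embedded: a single lookup returns everything that is read together.
function readEmbedded(username) {
  const user = usersEmbedded.find((u) => u.username === username);
  return user ? user.profile : null;
}

// Referenced: a second lookup resolves the reference, but the profile is
// stored in exactly one place, so an update touches a single document.
function readReferenced(username) {
  const user = usersReferenced.find((u) => u.username === username);
  if (!user) return null;
  return profiles.find((p) => p._id === user.profile_id) || null;
}
```

Against a real deployment, each `find` on an array here would correspond to a query, so the referenced read costs one extra round trip or a join.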
Relationship Modeling Strategies
One-to-One Relationships
Embedded Approach (Recommended)
javascript
// User and user profile one-to-one
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "birthDate": ISODate("1990-01-01"),
    "phone": "555-0123"
  }
}
Referenced Approach
javascript
// users collection
{ "_id": 1, "username": "johndoe", "profile_id": 101 }
// profiles collection
{ "_id": 101, "firstName": "John", "lastName": "Doe" }
One-to-Many Relationships
One-to-Few (Embedded)
javascript
// User and addresses (users typically have few addresses)
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "addresses": [
    {
      "type": "home",
      "city": "New York",
      "street": "123 Main St"
    },
    {
      "type": "work",
      "city": "Boston",
      "street": "456 Office Ave"
    }
  ]
}
One-to-Many (Referenced)
javascript
// Author and books (one author has many books)
// authors collection
{
  "_id": ObjectId("..."),
  "name": "John Smith",
  "nationality": "American"
}
// books collection
{
  "_id": ObjectId("..."),
  "title": "The Great Novel",
  "author_id": ObjectId("..."), // Reference to author
  "publishYear": 2020
}
One-to-Many (Parent Referencing)
javascript
// Device and logs (one device has many logs)
// devices collection
{
  "_id": ObjectId("..."),
  "deviceName": "Sensor-A001",
  "location": "Building 1"
}
// logs collection
{
  "_id": ObjectId("..."),
  "device_id": ObjectId("..."), // Reference to device
  "timestamp": ISODate("2024-01-20T10:00:00Z"),
  "temperature": 77.9,
  "humidity": 60
}
Many-to-Many Relationships
Bidirectional Embedding (Small Data)
javascript
// Students and courses (when quantities are small)
// students collection
{
  "_id": ObjectId("..."),
  "name": "John Doe",
  "courses": [
    { "course_id": ObjectId("..."), "name": "Math" },
    { "course_id": ObjectId("..."), "name": "English" }
  ]
}
// courses collection
{
  "_id": ObjectId("..."),
  "name": "Math",
  "students": [
    { "student_id": ObjectId("..."), "name": "John Doe" },
    { "student_id": ObjectId("..."), "name": "Jane Smith" }
  ]
}
Junction Table Approach (Recommended)
javascript
// students collection
{ "_id": 1, "name": "John Doe" }
// courses collection
{ "_id": 101, "name": "Math" }
// enrollments collection (junction table)
{
  "_id": ObjectId("..."),
  "student_id": 1,
  "course_id": 101,
  "enrollmentDate": ISODate("2024-01-15"),
  "grade": 85
}
Real-World Modeling Examples
E-commerce System
Products Collection
javascript
{
  "_id": ObjectId("..."),
  "sku": "PHONE-001",
  "name": "iPhone 15 Pro",
  "category": {
    "id": ObjectId("..."),
    "name": "Phones"
  },
  "price": 999,
  "specifications": {
    "color": "Space Black",
    "storage": "256GB",
    "screen": "6.1 inches"
  },
  "inventory": {
    "quantity": 100,
    "warehouse": "Warehouse A"
  },
  "reviews": [
    {
      "user_id": ObjectId("..."),
      "rating": 5,
      "comment": "Excellent product!",
      "createdAt": ISODate("2024-01-20")
    }
  ],
  "createdAt": ISODate("2024-01-01")
}
Orders Collection
javascript
{
  "_id": ObjectId("..."),
  "orderNo": "ORD202401200001",
  "customer": {
    "user_id": ObjectId("..."),
    "name": "John Doe",
    "phone": "555-0123"
  },
  "items": [
    {
      "product_id": ObjectId("..."),
      "sku": "PHONE-001",
      "name": "iPhone 15 Pro",
      "price": 999,
      "quantity": 1
    }
  ],
  "shipping": {
    "address": "123 Main St, New York...",
    "status": "shipped",
    "trackingNo": "TRK123456789"
  },
  "payment": {
    "method": "Credit Card",
    "amount": 999,
    "status": "paid"
  },
  "status": "completed",
  "createdAt": ISODate("2024-01-20T10:00:00Z")
}
Blog System
Articles Collection
javascript
{
  "_id": ObjectId("..."),
  "title": "MongoDB Data Modeling Guide",
  "slug": "mongodb-data-modeling-guide",
  "content": "Article content...",
  "author": {
    "user_id": ObjectId("..."),
    "name": "Tech Blogger",
    "avatar": "https://..."
  },
  "tags": ["MongoDB", "Database", "NoSQL"],
  "category": "Technology",
  "status": "published",
  "views": 1250,
  "likes": 89,
  "comments_count": 15,
  "publishedAt": ISODate("2024-01-20T08:00:00Z"),
  "updatedAt": ISODate("2024-01-20T10:00:00Z")
}
Comments Collection (Separate storage due to potentially large quantity)
javascript
{
  "_id": ObjectId("..."),
  "article_id": ObjectId("..."),
  "user": {
    "user_id": ObjectId("..."),
    "name": "Reader A",
    "avatar": "https://..."
  },
  "content": "Great article!",
  "parent_id": null, // ID of replied comment, null for top-level
  "likes": 5,
  "createdAt": ISODate("2024-01-20T09:00:00Z")
}
Denormalization
What is Denormalization
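Denormalization means deliberately storing redundant copies of data. In MongoDB, appropriate redundancy can improve read performance by avoiding frequent join operations ($lookup) or extra queries, at the cost of keeping the copies in sync.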
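To make the cost concrete, the following sketch shows the kind of join a fully normalized order might need if product details were not copied into the order. It assumes `orders` and `products` collections shaped like the e-commerce examples above; the variable name `orderWithProductsPipeline` is hypothetical.

```javascript
// Hypothetical aggregation pipeline: resolve each order item against the
// products collection. Field names follow the order example shown earlier.
const orderWithProductsPipeline = [
  // Select one order by its order number.
  { $match: { orderNo: "ORD202401200001" } },
  // Emit one document per order item.
  { $unwind: "$items" },
  // Join each item to its product document.
  {
    $lookup: {
      from: "products",
      localField: "items.product_id",
      foreignField: "_id",
      as: "product",
    },
  },
];

// In mongosh this would be run as:
//   db.orders.aggregate(orderWithProductsPipeline)
```

When the product name and price are stored in the order itself, a plain query on `orders` returns the same information without the `$unwind` and `$lookup` stages.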
Denormalization Scenarios
1. Frequently Accessed Related Data
javascript
// Embed basic product info in orders to avoid querying products collection
{
  "items": [
    {
      "product_id": ObjectId("..."),
      "name": "iPhone 15 Pro", // Denormalized
      "price": 999 // Denormalized (historical price)
    }
  ]
}
2. Computed Fields
javascript
// Store comment count to avoid real-time calculation
{
  "title": "Article Title",
  "content": "Article content",
  "comments_count": 15, // Denormalized field
  "views": 1250 // Denormalized field
}
Maintaining Denormalized Data
javascript
// Use $inc for atomic counter updates
db.articles.updateOne(
  { _id: articleId },
  { $inc: { comments_count: 1 } }
)
Index Design
Index Design Principles
- Create indexes for frequently queried fields
- Create indexes for sort fields
- Order the keys of compound indexes by the ESR rule: Equality fields first, then Sort fields, then Range fields
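The ESR ordering can be sketched as a small helper that builds an index specification from the three kinds of fields. The helper name `buildEsrIndex` and the example query on `payment.amount` are hypothetical and only illustrate the key order; the result is the kind of object that would be passed to `createIndex`.

```javascript
// Hypothetical helper: build a compound index spec in ESR order.
// equality: field names matched exactly
// sort:     object mapping field name to direction (1 or -1)
// range:    field names filtered with operators such as $gte or $lt
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in spec)) spec[field] = direction;
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;
  }
  // Key order is preserved for ordinary (non-numeric) field names.
  return spec;
}

// Example: one customer's orders, newest first, above a minimum amount.
const esrIndex = buildEsrIndex({
  equality: ["customer.user_id"],
  sort: { createdAt: -1 },
  range: ["payment.amount"],
});

// In mongosh this would be used as: db.orders.createIndex(esrIndex)
```

Placing the sort field before the range field is what lets the index return documents already in sort order for that query shape.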
Examples
javascript
// Users collection indexes
db.users.createIndex({ "email": 1 }, { unique: true }) // Unique index
db.users.createIndex({ "username": 1 }) // Single field index
db.users.createIndex({ "createdAt": -1 }) // Time descending index
// Orders collection indexes
db.orders.createIndex({ "customer.user_id": 1, "createdAt": -1 }) // Compound index
db.orders.createIndex({ "orderNo": 1 }, { unique: true }) // Order number unique
Summary
Embedded vs Referenced Decision Tree
Data relationship type?
├── One-to-one → Embed
├── One-to-few → Embed
├── One-to-many → Reference (array of child IDs in the parent, or a parent reference in each child)
└── Many-to-many → Junction table
Query pattern?
├── Frequently queried together → Embed
└── Independently queried → Reference
Data update frequency?
├── Rarely updated → Can embed (denormalize)
└── Frequently updated → Reference (avoid multiple updates)
Best Practices
- Prefer embedding unless there's a clear reason to use references
- Avoid overly deep document nesting (recommend no more than 3 levels)
- Be aware of the 16 MB document size limit, which unbounded embedded arrays can eventually reach
- Create appropriate indexes for common queries
- Use denormalization appropriately to improve read performance
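The nesting guideline above can be checked with a small helper such as the one below. This is only a sketch: the function name `nestingDepth` is hypothetical, and counting the document itself as level 1 and each embedded document or array as one further level is an assumption about how "levels" are measured.

```javascript
// Hypothetical helper: count nesting levels in a plain JavaScript document.
// Scalars (including null and Date values) have depth 0; a flat document
// has depth 1; every embedded document or array adds one level.
function nestingDepth(value) {
  if (value === null || typeof value !== "object") return 0;
  if (value instanceof Date) return 0;
  let deepest = 0;
  for (const child of Object.values(value)) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

// A trimmed-down order document for illustration.
const order = {
  orderNo: "ORD202401200001",
  customer: { name: "John Doe" },
  items: [{ sku: "PHONE-001", quantity: 1 }],
};

const orderDepth = nestingDepth(order);
```

With this way of counting, `orderDepth` is 3: the order document, the `items` array, and the item documents inside it.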
In the next chapter, we will learn about MongoDB User Management.