MongoDB Concepts
MongoDB has its own set of concepts and terminology. This chapter explains the core ones to give you a solid theoretical foundation.
Core Concept Comparison
MongoDB vs Relational Databases
| Relational DB | MongoDB | Description |
|---|---|---|
| Database | Database | Physical container holding multiple collections |
| Table | Collection | Container for documents, no fixed schema |
| Row | Document | Basic data unit in BSON format |
| Column | Field | Key-value pairs in documents |
| Primary Key | _id | Auto-generated unique identifier |
| Index | Index | Data structure for query performance |
| Table Join | Embedded Documents / References | Relate data through nesting or references |
Database
Definition
A database is a physical container for collections, with each database having its own set of files on the file system.
Characteristics
- A MongoDB instance can contain multiple databases
- Databases are independent with separate access control
- Naming convention: lowercase letters; names must be shorter than 64 bytes
// Show all databases
show dbs
// Switch/create database
use mydb
// Show current database
db
// Drop current database
db.dropDatabase()
Reserved Database Names
- admin: Authentication and authorization database
- local: Stores local server data, not replicated
- config: Sharded cluster configuration information
Collection
Definition
A collection is a group of MongoDB documents, similar to a table in relational databases but without a predefined schema.
Characteristics
- Documents in a collection can have different fields
- Dynamic schema, flexible for data changes
- Names cannot start with "system." (system reserved)
// Create collection
db.createCollection("users")
// Show collection list
show collections
// Drop collection
db.users.drop()
// Show collection statistics
db.users.stats()
Collection Types
- Standard Collection: Regular document collection
- Capped Collection: Fixed-size collection with circular overwrite
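The circular overwrite of a capped collection can be pictured with a small in-memory sketch. This is plain JavaScript rather than MongoDB code: the class name `CappedBuffer` and its `maxDocs` limit are hypothetical, and a real capped collection is bounded primarily by a size in bytes, not only by a document count.

```javascript
// Illustrative only: models insertion-order storage with a document-count cap.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, the oldest document is discarded.
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  find() {
    // Documents come back in insertion order.
    return [...this.docs];
  }
}

const log = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i });
}
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

In the shell, a capped collection is created by passing options to `createCollection`, for example `db.createCollection("log", { capped: true, size: 100000 })`, where the collection name and size are placeholders.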
Document
Definition
A document is the basic unit of data in MongoDB, stored in BSON (Binary JSON) format.
BSON Characteristics
- Supports more data types than JSON (Date, ObjectId, Binary, etc.)
- Binary encoding for efficient parsing and traversal
- Supports nested documents and arrays
Document Structure
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "John Doe",
  "age": 25,
  "email": "john@example.com",
  "isActive": true,
  "createdAt": ISODate("2024-01-15T08:30:00Z"),
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "zipCode": "10001"
  },
  "hobbies": ["reading", "swimming", "programming"],
  "metadata": {
    "loginCount": 15,
    "lastLogin": ISODate("2024-01-20T10:00:00Z")
  }
}
Document Limits
- Maximum document size: 16MB
- Maximum nesting level: 100 levels
- Field names cannot contain the $ and . characters (MongoDB 5.0 and later relax this, with restrictions)
Field
Definition
A field is a key-value pair in a document, equivalent to a column in relational databases.
Naming Rules
- Field names are strings
- Cannot contain the $ character (reserved for operators)
- Cannot start or end with .
- Cannot contain null characters
- _id is reserved as the primary key
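The naming rules above can be expressed as a small check. This is a minimal sketch in plain JavaScript that mirrors the list in this section only; the helper name `checkFieldName` is hypothetical, and what the server actually accepts differs between MongoDB versions.

```javascript
// Hypothetical helper: returns a list of rule violations for a field name,
// following the naming rules listed in this section.
function checkFieldName(name) {
  if (typeof name !== "string") {
    return ["field name must be a string"];
  }
  const problems = [];
  if (name.includes("$")) problems.push("contains $");
  if (name.startsWith(".") || name.endsWith(".")) {
    problems.push("starts or ends with .");
  }
  if (name.includes("\0")) problems.push("contains a null character");
  return problems;
}

console.log(checkFieldName("email"));   // []
console.log(checkFieldName("$price"));  // [ 'contains $' ]
console.log(checkFieldName(".hidden")); // [ 'starts or ends with .' ]
```

An empty array means the name passes every rule in the list.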
Data Types
| Type | Example | Description |
|---|---|---|
| String | "hello" | UTF-8 string |
| Integer | 42 | Integer value |
| Double | 3.14 | Floating point |
| Boolean | true | Boolean value |
| Date | ISODate(...) | Date and time |
| Null | null | Null value |
| Array | [1, 2, 3] | Array |
| Object | {a: 1} | Embedded document |
| ObjectId | ObjectId(...) | 12-byte unique ID |
| Binary Data | BinData(...) | Binary data |
| Timestamp | Timestamp(...) | Timestamp |
_id Field
Definition
Every document must have an _id field as the primary key to uniquely identify the document.
Characteristics
- Automatically creates a unique index
- If not specified, MongoDB auto-generates an ObjectId
- Can be customized but must be unique
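An auto-generated ObjectId is a 12-byte value, usually written as 24 hex characters, whose first 4 bytes hold the creation time in seconds since the Unix epoch. As a minimal sketch, that time can be recovered from the hex string alone in plain JavaScript. The helper name `objectIdToDate` is hypothetical and is not part of any driver API.

```javascript
// Hypothetical helper (not a driver API): reads the creation time from the
// first 4 bytes (8 hex characters) of an ObjectId string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // seconds since the Unix epoch
  return new Date(seconds * 1000);
}

console.log(objectIdToDate("507f1f77bcf86cd799439011").toISOString());
// 2012-10-17T21:13:27.000Z
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones to within one-second resolution.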
ObjectId Structure
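507f1f77  bcf86c   d799     439011
\______/  \____/   \__/     \____/
  Time    Machine  PID      Counter
4 bytes   3 bytes  2 bytes  3 bytes
This is the legacy layout. Current MongoDB versions use a 5-byte random value (generated once per process) in place of the machine identifier and process ID, followed by a 3-byte incrementing counter.
// Get timestamp from ObjectId
ObjectId("507f1f77bcf86cd799439011").getTimestamp()
// Output: ISODate("2012-10-17T21:13:27Z")
Index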
Definition
An index is a special data structure that stores a sorted subset of data from a collection to improve query efficiency.
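To see why a sorted structure helps, here is a minimal sketch in plain JavaScript. It models an index as a sorted array searched with binary search; MongoDB's own indexes are B-tree based, so this is an analogy rather than the actual implementation, and `buildIndex` and `findByIndex` are hypothetical names.

```javascript
// Illustrative only: a single field index as a sorted array of
// { key, pos } entries, where pos is the document's position in the collection.
const users = [
  { name: "Mia", age: 31 },
  { name: "Alex", age: 25 },
  { name: "Zoe", age: 42 },
  { name: "Ben", age: 19 },
];

function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search: O(log n) comparisons instead of scanning every document.
function findByIndex(docs, index, value) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === value) return docs[index[mid].pos];
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

const nameIndex = buildIndex(users, "name");
console.log(nameIndex.map((e) => e.key)); // [ 'Alex', 'Ben', 'Mia', 'Zoe' ]
console.log(findByIndex(users, nameIndex, "Mia")); // { name: 'Mia', age: 31 }
```

The index holds only the indexed field and a pointer back to the document, which is why it is a subset of the collection's data and why each extra index adds work on every write.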
Index Types
- Single Field Index: Index on a single field
- Compound Index: Index on multiple fields combined
- Multikey Index: Index on array fields
- Text Index: For full-text search
- Geospatial Index: For location-based queries
- Hashed Index: Based on field value hash
// Create single field index
db.users.createIndex({ name: 1 })
// Create compound index
db.users.createIndex({ age: 1, name: -1 })
// Create unique index
db.users.createIndex({ email: 1 }, { unique: true })
// Show indexes
db.users.getIndexes()
// Drop index
db.users.dropIndex("name_1")
Replica Set
Definition
A replica set is a group of MongoDB instances that maintain the same data set, providing redundancy and high availability.
Member Roles
- Primary: Handles all write operations
- Secondary: Replicates primary data, can handle read operations
- Arbiter: Participates in elections, doesn't store data
Election Mechanism
- When the primary fails, the secondaries automatically elect a new primary
- An election can succeed only while a majority of the voting members can reach each other
- The election usually completes within seconds
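The majority requirement is simple arithmetic, sketched here in plain JavaScript. The helper name `hasMajority` is hypothetical; it only counts votes and does not model priorities, timeouts, or any other part of the election protocol.

```javascript
// Hypothetical helper: can the reachable voting members elect a primary?
function hasMajority(votingMembers, reachableMembers) {
  const needed = Math.floor(votingMembers / 2) + 1; // strict majority
  return reachableMembers >= needed;
}

console.log(hasMajority(3, 2)); // true
console.log(hasMajority(3, 1)); // false
console.log(hasMajority(4, 2)); // false
console.log(hasMajority(5, 3)); // true
```

A 4-member set tolerates one failure, the same as a 3-member set, which is why an odd number of voting members is the usual choice and why an arbiter is sometimes added to break ties.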
Sharding
Definition
Sharding is the process of distributing data across multiple servers to handle large-scale data and high-throughput operations.
Sharded Cluster Components
- mongos: Query router, handles client requests
- config servers: Store cluster metadata
- shards: Store actual data
Sharding Strategies
- Range Sharding: Based on shard key value ranges
- Hashed Sharding: Based on shard key hash values
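The difference between the two strategies can be sketched in plain JavaScript. Everything here is an assumption made for illustration: the shard names, the range boundaries, and the FNV-1a hash, which is a stand-in and not the hash function MongoDB uses. The modulo step is also a simplification, since MongoDB assigns ranges of hashed values to shards.

```javascript
// Illustrative only: two ways to map a shard key value to a shard.
const SHARDS = ["shardA", "shardB", "shardC"];

// Range sharding: each shard owns a contiguous range of key values.
// The boundaries below are made up for this example.
const RANGES = [
  { max: 1000, shard: "shardA" },     // key < 1000
  { max: 2000, shard: "shardB" },     // 1000 <= key < 2000
  { max: Infinity, shard: "shardC" }, // key >= 2000
];

function rangeShard(key) {
  return RANGES.find((r) => key < r.max).shard;
}

// Hashed sharding: the key is hashed first, so adjacent keys tend to scatter.
// FNV-1a (32-bit) is a stand-in here; it is not the hash MongoDB uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function hashedShard(key) {
  return SHARDS[fnv1a(String(key)) % SHARDS.length];
}

// Prints the key, its range-based shard, and its hash-based shard.
for (const key of [1001, 1002, 1003]) {
  console.log(key, rangeShard(key), hashedShard(key));
}
```

With range sharding, the three adjacent keys all land on `shardB`, which suits range queries but can concentrate writes for monotonically increasing keys. Hashing trades efficient range queries for a more even spread of writes.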
Data Model Design
Embedding vs Referencing
Embedded Documents
Advantages:
- Single query retrieves all data
- Atomic updates (a write to a single document is atomic)
- Better read performance
Use cases:
- One-to-one relationships
- One-to-few relationships (few sub-documents)
- Data frequently queried together
// Embedding example
{
  "user": "John Doe",
  "address": {
    "city": "New York",
    "street": "123 Main St"
  }
}
Referencing
Advantages:
- Avoids data duplication
- Data consistency
- Controllable document size
Use cases:
- One-to-many relationships (many sub-documents)
- Many-to-many relationships
- Data frequently queried independently
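The read-path difference between the two models can be sketched with in-memory data in plain JavaScript. The arrays stand in for collections, order 102 and the helper `ordersForUser` are hypothetical additions for illustration, and no database is involved.

```javascript
// Illustrative in-memory data shaped like the two models in this chapter.
const embeddedUser = {
  _id: 1,
  name: "John Doe",
  address: { city: "New York", street: "123 Main St" },
};

const users = [{ _id: 1, name: "John Doe" }];
const orders = [
  { _id: 101, user_id: 1, amount: 100 },
  { _id: 102, user_id: 1, amount: 250 }, // hypothetical second order
];

// Embedded: one lookup returns the user together with the address.
console.log(embeddedUser.address.city); // New York

// Referenced: find the user, then follow user_id in a second lookup.
function ordersForUser(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return null;
  const userOrders = orders.filter((o) => o.user_id === user._id);
  return { ...user, orders: userOrders };
}

console.log(ordersForUser(1).orders.map((o) => o.amount)); // [ 100, 250 ]
```

The second lookup is the cost of referencing. On the server, the `$lookup` aggregation stage can perform a comparable join so that the application issues a single request.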
// Referencing example
// users collection
{ "_id": 1, "name": "John Doe" }
// orders collection
{ "_id": 101, "user_id": 1, "amount": 100 }
Summary
Understanding MongoDB's core concepts is fundamental to mastering MongoDB:
- Database: Container for collections
- Collection: Container for documents, no fixed schema
- Document: Basic data unit in BSON format
- Field: Key-value pairs in documents
- _id: Unique identifier for documents
- Index: Data structure for query efficiency
- Replica Set: Provides high availability
- Sharding: Supports horizontal scaling
In the next chapter, we will learn about MongoDB Data Modeling.