MongoDB Concepts
MongoDB has its own set of concepts and terminology. This chapter explains MongoDB's core concepts to give you a solid theoretical foundation.
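Core Concept Comparison
MongoDB vs Relational Databases
- Database: Database
- Collection: Table
- Document: Row
- Field: Column
- _id: Primary key
- Index: Index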
Database
Definition
A database is a physical container for collections, with each database having its own set of files on the file system.
Characteristics
- A MongoDB instance can contain multiple databases
- Databases are independent with separate access control
- Naming convention: lowercase letters, max 64 bytes
Reserved Database Names
- admin: Authentication and authorization database
- local: Stores local server data, not replicated
- config: Sharded cluster configuration information
Collection
Definition
A collection is a group of MongoDB documents, similar to a table in relational databases but without a predefined schema.
Characteristics
- Documents in a collection can have different fields
- Dynamic schema, flexible for data changes
- Names cannot start with "system." (system reserved)
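To make the dynamic schema concrete, here is a minimal sketch that models a collection as a plain Python list of dicts. The documents, field names and the `is_valid_collection_name` helper are hypothetical; the helper only checks the "system." rule above plus the obvious non-empty requirement, not every restriction the server enforces.

```python
# A collection modeled as a list of dicts: documents need not share fields.
users = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "age": 30, "tags": ["admin"]},
]

def is_valid_collection_name(name: str) -> bool:
    """Hypothetical check: non-empty and not using the reserved "system." prefix."""
    if not name:
        return False
    return not name.startswith("system.")

# The union of field names across documents shows there is no fixed schema.
all_fields = sorted({key for doc in users for key in doc})
print(all_fields)  # ['_id', 'age', 'email', 'name', 'tags']
print(is_valid_collection_name("users"), is_valid_collection_name("system.profile"))  # True False
```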
Collection Types
- Standard Collection: Regular document collection
- Capped Collection: Fixed-size collection; once it is full, new documents overwrite the oldest ones
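The circular-overwrite behavior of a capped collection can be approximated with a bounded `collections.deque`. This is an analogy only: a real capped collection is bounded by a size in bytes (optionally also by a maximum document count), whereas this sketch bounds by document count.

```python
from collections import deque

# Analogy: bound by document count; a real capped collection is bounded by bytes.
capped_log = deque(maxlen=3)

for i in range(5):
    capped_log.append({"_id": i, "msg": f"event {i}"})

# The two oldest documents were overwritten; insertion order is preserved.
print([doc["_id"] for doc in capped_log])  # [2, 3, 4]
```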
Document
Definition
A document is the basic unit of data in MongoDB, stored in BSON (Binary JSON) format.
BSON Characteristics
- Supports more data types (Date, ObjectId, Binary, etc.)
- Binary encoding for higher parsing efficiency
- Supports nested documents and arrays
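To show what "binary encoding" means in practice, here is a minimal encoder for a tiny subset of BSON (int32, UTF-8 string and embedded document only). It is a teaching sketch following the published BSON layout, not a replacement for a real driver's encoder, and the function names are our own.

```python
import struct

def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a single 0x00 terminator.
    return s.encode("utf-8") + b"\x00"

def encode_document(doc: dict) -> bytes:
    """Encode a dict using only three BSON element types."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("bool is not supported in this sketch")
        if isinstance(value, int):
            # 0x10 = int32, stored as a little-endian signed 4-byte integer.
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, str):
            # 0x02 = string: int32 length (including the terminator), bytes, 0x00.
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, dict):
            # 0x03 = embedded document, encoded recursively.
            body += b"\x03" + _cstring(key) + encode_document(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The length prefix counts itself (4 bytes), the elements and the final 0x00.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode_document({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

Because every document and string carries its length up front, a parser can skip over fields without scanning them byte by byte, which is part of why BSON is efficient to traverse.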
Document Structure
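A document combines scalar fields, embedded documents and arrays. The sketch below uses a hypothetical user document written as a Python dict, plus a helper that measures nesting depth; how levels are counted here (top-level document = 1) is our own convention, chosen only to relate the example to the nesting limit listed next.

```python
# Hypothetical document: scalars, an embedded document and an array.
user = {
    "_id": "u1001",
    "name": "Alice",
    "age": 28,
    "address": {"city": "Berlin", "zip": "10115"},
    "hobbies": ["reading", "cycling"],
}

def nesting_depth(value) -> int:
    """Count document/array levels; the top-level document counts as 1."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars add no level

print(nesting_depth(user))  # 2
```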
Document Limits
- Maximum document size: 16MB
- Maximum nesting level: 100 levels
- Field names cannot contain the $ and . characters
Field
Definition
A field is a key-value pair in a document, equivalent to a column in relational databases.
Naming Rules
- Field names are strings
- Cannot contain the $ character (reserved for operators)
- Cannot start or end with .
- Cannot contain null characters
- _id is reserved as the primary key
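The naming rules above can be expressed as a small validator. This is a sketch that applies exactly the rules in this list; the exact restrictions enforced by the server differ between MongoDB versions, and `is_valid_field_name` is a hypothetical helper, not a driver API.

```python
def is_valid_field_name(name) -> bool:
    """Apply the naming rules listed above (a sketch, not the server's check)."""
    if not isinstance(name, str):             # field names are strings
        return False
    if "$" in name:                           # $ is reserved for operators
        return False
    if name.startswith(".") or name.endswith("."):
        return False
    if "\x00" in name:                        # no null characters
        return False
    # "_id" passes: it is a valid name, reserved for the primary key.
    return True

for candidate in ["name", "$set", ".hidden", "a\x00b", "_id"]:
    print(repr(candidate), is_valid_field_name(candidate))
```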
Data Types
_id Field
Definition
Every document must have an _id field as the primary key to uniquely identify the document.
Characteristics
- MongoDB automatically creates a unique index on _id
- If _id is not specified, MongoDB auto-generates an ObjectId
- Can be set to a custom value, but it must be unique within the collection
ObjectId Structure
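An ObjectId is 12 bytes: a 4-byte timestamp (seconds since the Unix epoch, big-endian), a 5-byte random value fixed per process, and a 3-byte incrementing counter that starts at a random value. The sketch below builds and decodes values with that layout; it is an illustration of the structure, not the generator used by any MongoDB driver, and the function names are hypothetical.

```python
import os
import struct
import time
from datetime import datetime, timezone
from typing import Optional

# 5-byte random value, fixed for the lifetime of this process.
_PROCESS_RANDOM = os.urandom(5)
# 3-byte counter that starts at a random value.
_counter = int.from_bytes(os.urandom(3), "big")

def new_object_id(ts: Optional[int] = None) -> str:
    """Build a 12-byte ObjectId-style value and return it as 24 hex characters."""
    global _counter
    if ts is None:
        ts = int(time.time())
    _counter = (_counter + 1) % (1 << 24)  # wrap within 3 bytes
    raw = struct.pack(">I", ts) + _PROCESS_RANDOM + _counter.to_bytes(3, "big")
    return raw.hex()

def object_id_timestamp(oid: str) -> datetime:
    """Recover the creation time from the first 4 bytes."""
    seconds = int.from_bytes(bytes.fromhex(oid)[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

oid = new_object_id(ts=1700000000)
print(oid[:8], len(oid))         # 6553f100 24
print(object_id_timestamp(oid))  # 2023-11-14 22:13:20+00:00
```

Because the timestamp comes first, ObjectIds generated later generally sort after earlier ones, and the creation time can be read back without storing a separate field.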
Index
Definition
An index is a special data structure that stores a sorted subset of data from a collection to improve query efficiency.
Index Types
- Single Field Index: Index on a single field
- Compound Index: Index on multiple fields combined
- Multikey Index: Index on array fields
- Text Index: For full-text search
- Geospatial Index: For location-based queries
- Hashed Index: Indexes a hash of the field's value; supports hashed sharding
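Why a sorted structure speeds up queries can be sketched with a sorted list and binary search. MongoDB itself stores indexes as B-trees; the sorted list of `(key, record position)` pairs below is a simplified stand-in, and the documents are hypothetical. The compound index shows that entries are ordered by the first field, then by the second.

```python
import bisect

# Hypothetical collection; list position plays the role of a record id.
docs = [
    {"_id": 1, "name": "Carol", "age": 35},
    {"_id": 2, "name": "Alice", "age": 28},
    {"_id": 3, "name": "Bob", "age": 28},
    {"_id": 4, "name": "Dave", "age": 42},
]

# Single field index on "age": sorted (key, position) pairs.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(docs))

def range_query(index, low, high):
    """Positions with low <= key <= high, found by binary search, not a full scan."""
    start = bisect.bisect_left(index, (low, -1))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [pos for _, pos in index[start:end]]

# Compound index on (age, name): sorted by age, ties broken by name.
compound_index = sorted(((doc["age"], doc["name"]), pos) for pos, doc in enumerate(docs))

print([docs[p]["_id"] for p in range_query(age_index, 28, 35)])  # [2, 3, 1]
print([docs[p]["name"] for _, p in compound_index])  # ['Alice', 'Bob', 'Carol', 'Dave']
```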
Replica Set
Definition
A replica set is a group of MongoDB instances that maintain the same data set, providing redundancy and high availability.
Member Roles
- Primary: Handles all write operations
- Secondary: Replicates the primary's data; can serve read operations
- Arbiter: Votes in elections but holds no copy of the data
Election Mechanism
- When primary fails, secondaries automatically elect a new primary
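- A new primary can be elected only if a majority of the voting members are reachable
- With default settings, an election typically completes within about 12 seconds, including the time needed to detect the failure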
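The majority rule explains common replica set sizing advice. This sketch computes the majority for a given number of voting members (arbiters count as voting members) and how many failures each size tolerates; the function names are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1

def can_elect_primary(voting_members: int, reachable: int) -> bool:
    return reachable >= majority(voting_members)

for n in (3, 4, 5):
    print(n, "members: majority", majority(n), "tolerates", n - majority(n), "failure(s)")
```

The output shows that 4 members tolerate only one failure, the same as 3, which is why an odd number of voting members is usually recommended.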
Sharding
Definition
Sharding is the process of distributing data across multiple servers to handle large-scale data and high-throughput operations.
Sharded Cluster Components
- mongos: Query router that forwards client requests to the appropriate shards
- config servers: Store cluster metadata
- shards: Store actual data
Sharding Strategies
- Range Sharding: Based on shard key value ranges
- Hashed Sharding: Based on shard key hash values
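The difference between the two strategies shows up in how a shard key value is routed. The sketch below uses hypothetical shard names and split points; the MD5-plus-modulo routing is only an illustration of hashing, not MongoDB's actual hash function or chunk assignment (MongoDB assigns ranges of hashed values to chunks).

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]
# Hypothetical split points for a numeric shard key:
# (-inf, 1000) -> shard0, [1000, 2000) -> shard1, [2000, +inf) -> shard2
RANGE_BOUNDS = [1000, 2000]

def route_by_range(key: int) -> str:
    """Range sharding: nearby keys land on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]

def route_by_hash(key: int) -> str:
    """Hashed sharding (illustrative hash): nearby keys are scattered."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

print([route_by_range(k) for k in (5, 1000, 2500)])  # ['shard0', 'shard1', 'shard2']
print(sorted({route_by_hash(k) for k in range(1000)}))
```

Range sharding keeps range queries on few shards but can concentrate monotonically increasing keys on one shard; hashing spreads such keys evenly at the cost of range queries touching every shard.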
Data Model Design
Embedding vs Referencing
Embedded Documents
Advantages:
- Single query retrieves all data
- Atomic updates (a write to a single document is atomic)
- Better read performance
Use cases:
- One-to-one relationships
- One-to-few relationships (few sub-documents)
- Data frequently queried together
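An embedded design can be sketched with a hypothetical blog post whose author (one-to-one) and few comments (one-to-few) live inside the same document, so a single lookup returns everything. Plain dicts stand in for a collection here.

```python
# Hypothetical post with its author and a few comments embedded.
post = {
    "_id": 1,
    "title": "Intro to MongoDB",
    "author": {"name": "Alice", "email": "alice@example.com"},  # one-to-one
    "comments": [                                               # one-to-few
        {"user": "bob", "text": "Nice post"},
        {"user": "carol", "text": "Thanks!"},
    ],
}
posts = {post["_id"]: post}  # stand-in for a collection keyed by _id

# One lookup returns the post together with its author and comments.
doc = posts[1]
print(doc["author"]["name"], len(doc["comments"]))  # Alice 2

# Adding a comment modifies only this one document.
doc["comments"].append({"user": "dave", "text": "Helpful"})
```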
Referencing
Advantages:
- Avoids data duplication
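- Easier consistency: each piece of data is stored once, so an update touches one place
- Bounded document size (helps stay well under the 16MB limit)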
Use cases:
- One-to-many relationships (many sub-documents)
- Many-to-many relationships
- Data frequently queried independently
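A referencing design stores `_id` values instead of copies. In this hypothetical many-to-many example, books hold author ids, and a second lookup resolves them; in MongoDB itself this is typically done with a second query or the `$lookup` aggregation stage. Renaming an author changes one document, and every book sees the update.

```python
# Hypothetical many-to-many data: books reference authors by _id.
authors = {
    "a1": {"_id": "a1", "name": "Alice"},
    "a2": {"_id": "a2", "name": "Bob"},
}
books = [
    {"_id": "b1", "title": "Data Modeling", "author_ids": ["a1", "a2"]},
    {"_id": "b2", "title": "Indexing", "author_ids": ["a1"]},
]

def resolve_authors(book: dict) -> list:
    """Second lookup: replace author references with author names."""
    return [authors[a]["name"] for a in book["author_ids"]]

# Renaming an author updates a single document.
authors["a1"]["name"] = "Alice Smith"
print(resolve_authors(books[0]))  # ['Alice Smith', 'Bob']
```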
Summary
Understanding MongoDB's core concepts is fundamental to mastering MongoDB:
- Database: Container for collections
- Collection: Container for documents, no fixed schema
- Document: Basic data unit in BSON format
- Field: Key-value pairs in documents
- _id: Unique identifier for documents
- Index: Data structure for query efficiency
- Replica Set: Provides high availability
- Sharding: Supports horizontal scaling
In the next chapter, we will learn about MongoDB Data Modeling.