MongoDB Concepts

MongoDB has its own concepts and terminology, many of which map onto familiar relational ideas. This chapter explains each core concept to give you a solid theoretical foundation.

Core Concept Comparison

MongoDB vs Relational Databases

Relational DB | MongoDB            | Description
--------------|--------------------|------------------------------------------------
Database      | Database           | Physical container holding multiple collections
Table         | Collection         | Container for documents, no fixed schema
Row           | Document           | Basic data unit in BSON format
Column        | Field              | Key-value pairs in documents
Primary Key   | _id                | Auto-generated unique identifier
Index         | Index              | Data structure for query performance
Table Join    | Embedded Documents | Relate data through nesting or references

Database

Definition

A database is a physical container for collections, with each database having its own set of files on the file system.

Characteristics

  • A MongoDB instance can contain multiple databases
  • Databases are independent with separate access control
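  • Naming rules: a name must be non-empty and fewer than 64 bytes, and cannot contain / \ . " $, a space, or the null character; names that differ only in case are not allowed, so lowercase is the usual convention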
javascript
// Show all databases
show dbs

// Switch to a database (it is created on first write if it does not exist)
use mydb

// Show current database
db

// Drop current database
db.dropDatabase()
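
The naming rules can be checked in application code before a database name is used. The following is a minimal sketch in plain JavaScript, assuming the documented restrictions for Unix-like systems (non-empty, fewer than 64 bytes, none of / \ . " $, space, or the null character). `isValidDatabaseName` is a hypothetical helper, not part of the MongoDB API, and Windows deployments forbid a few additional characters.

```javascript
// Hypothetical helper: checks a name against MongoDB's documented
// database naming restrictions for Unix-like systems.
function isValidDatabaseName(name) {
  if (typeof name !== "string" || name.length === 0) return false;
  // The limit is measured in bytes, not characters.
  if (new TextEncoder().encode(name).length >= 64) return false;
  // Forbidden: / \ . " $ space and the null character.
  return !/[\/\\. "$\0]/.test(name);
}

console.log(isValidDatabaseName("mydb"));         // true
console.log(isValidDatabaseName("my.db"));        // false
console.log(isValidDatabaseName("a".repeat(64))); // false
```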

Reserved Database Names

  • admin: Authentication and authorization database
  • local: Stores local server data, not replicated
  • config: Sharded cluster configuration information

Collection

Definition

A collection is a group of MongoDB documents, similar to a table in relational databases but without a predefined schema.

Characteristics

  • Documents in a collection can have different fields
  • Dynamic schema, flexible for data changes
  • Names cannot start with "system." (system reserved)
javascript
// Create collection
db.createCollection("users")

// Show collection list
show collections

// Drop collection
db.users.drop()

// Show collection statistics
db.users.stats()

Collection Types

  • Standard Collection: Regular document collection
  • Capped Collection: Fixed-size collection with circular overwrite
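
The circular overwrite behaviour of a capped collection can be pictured with a small in-memory model. This is only a sketch: `CappedBuffer` is a hypothetical class, and it limits by document count for simplicity, whereas a real capped collection is created with `db.createCollection(name, { capped: true, size: <bytes> })` and is limited primarily by size in bytes.

```javascript
// Hypothetical in-memory model of a capped collection limited by document count.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document is discarded to make room.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }

  find() {
    // Documents come back in insertion order.
    return [...this.docs];
  }
}

const log = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```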

Document

Definition

A document is the basic unit of data in MongoDB, stored in BSON (Binary JSON) format.

BSON Characteristics

  • Supports more data types than JSON (Date, ObjectId, Binary, etc.)
  • Binary encoding for higher parsing efficiency
  • Supports nested documents and arrays

Document Structure

javascript
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "John Doe",
  "age": 25,
  "email": "john@example.com",
  "isActive": true,
  "createdAt": ISODate("2024-01-15T08:30:00Z"),
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "zipCode": "10001"
  },
  "hobbies": ["reading", "swimming", "programming"],
  "metadata": {
    "loginCount": 15,
    "lastLogin": ISODate("2024-01-20T10:00:00Z")
  }
}

Document Limits

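  • Maximum document size: 16 MB (larger content is typically stored with GridFS)
  • Maximum nesting depth: 100 levels
  • Field names cannot contain the null character; names that start with $ or contain . are restricted (see Naming Rules under Field)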
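
To get a feel for the nesting limit, the depth of a document can be measured in application code. The sketch below is plain JavaScript; `nestingDepth` is a hypothetical helper, and counting a flat document as depth 1 is a convention assumed here rather than MongoDB's exact accounting.

```javascript
// Hypothetical helper: counts levels of nested plain objects and arrays.
// A flat document counts as depth 1 (a convention assumed for this sketch).
function nestingDepth(value) {
  const isContainer =
    Array.isArray(value) ||
    (value !== null &&
      typeof value === "object" &&
      Object.getPrototypeOf(value) === Object.prototype);
  if (!isContainer) return 0; // scalars, dates, etc. add no level

  let deepest = 0;
  for (const child of Object.values(value)) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

const user = {
  name: "John Doe",
  address: { city: "New York", geo: { lat: 40.7, lng: -74.0 } },
  hobbies: ["reading", "swimming"],
};

console.log(nestingDepth(user));        // 3
console.log(nestingDepth(user) <= 100); // true
```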

Field

Definition

A field is a key-value pair in a document, equivalent to a column in relational databases.

Naming Rules

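  • Field names are strings
  • Cannot contain the null character
  • Should not start with $ (reserved for operators)
  • Should not contain . (used by dot notation to address nested fields)
  • MongoDB 5.0 and later can store names that start with $ or contain ., but querying them requires operators such as $getField, so they are best avoided
  • _id is reserved as the primary key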

Data Types

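Type        | Example        | Description
------------|----------------|------------------------------------------------
String      | "hello"        | UTF-8 string
Integer     | 42             | 32-bit or 64-bit integer
Double      | 3.14           | 64-bit floating point
Boolean     | true           | Boolean value
Date        | ISODate(...)   | Date and time
Null        | null           | Null value
Array       | [1, 2, 3]      | Array
Object      | {a: 1}         | Embedded document
ObjectId    | ObjectId(...)  | 12-byte unique ID
Binary Data | BinData(...)   | Binary data
Timestamp   | Timestamp(...) | Internal timestamp, used mainly by replication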

_id Field

Definition

Every document must have an _id field as the primary key to uniquely identify the document.

Characteristics

  • MongoDB automatically creates a unique index on _id
  • If not specified, MongoDB auto-generates an ObjectId
  • Can be set to a custom value, which must be unique within the collection

ObjectId Structure

507f1f77bcf86cd799439011
\______/\________/\____/
  Time    Random  Counter
 4 bytes  5 bytes 3 bytes

Older MongoDB versions split the middle 5 bytes into a 3-byte machine identifier and a 2-byte process ID.
javascript
// Get timestamp from ObjectId
ObjectId("507f1f77bcf86cd799439011").getTimestamp()
// Output: ISODate("2012-10-17T21:13:27Z")
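
The same decoding can be done by hand, which shows how the 12 bytes are laid out. This sketch is plain JavaScript and assumes the layout used by current MongoDB versions (4-byte timestamp, 5-byte random value, 3-byte counter); `parseObjectId` is a hypothetical helper, not a driver function.

```javascript
// Hypothetical helper: splits a 24-character hex ObjectId into its parts.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("ObjectId must be 24 hex characters");
  }
  // First 4 bytes: seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    timestamp: new Date(seconds * 1000),
    random: hex.slice(8, 18),                 // 5-byte random value
    counter: parseInt(hex.slice(18, 24), 16), // 3-byte counter
  };
}

const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.timestamp.toISOString()); // 2012-10-17T21:13:27.000Z
console.log(parts.random);                  // bcf86cd799
console.log(parts.counter);                 // 4427793
```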

Index

Definition

An index is a special data structure that stores the values of one or more fields, sorted by value, so that queries can locate matching documents without scanning the whole collection.

Index Types

  • Single Field Index: Index on a single field
  • Compound Index: Index on multiple fields combined
  • Multikey Index: Index on array fields
  • Text Index: For full-text search
  • Geospatial Index: For location-based queries
  • Hashed Index: Based on field value hash
javascript
// Create single field index
db.users.createIndex({ name: 1 })

// Create compound index
db.users.createIndex({ age: 1, name: -1 })

// Create unique index
db.users.createIndex({ email: 1 }, { unique: true })

// Show indexes
db.users.getIndexes()

// Drop index
db.users.dropIndex("name_1")
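
Why a sorted structure speeds up queries can be seen in a small in-memory model. This is a simplified sketch in plain JavaScript: MongoDB indexes are B-tree based, while the hypothetical `buildIndex` and `findByIndex` helpers below use a sorted array with binary search, and the sample documents are invented for illustration.

```javascript
// Sample documents (invented for this sketch), stored in insertion order.
const people = [
  { _id: 1, name: "Mia" },
  { _id: 2, name: "Alex" },
  { _id: 3, name: "Zoe" },
  { _id: 4, name: "John" },
];

// Build the "index": field values sorted ascending, each pointing at a document position.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search over the sorted keys; returns the matching document or null.
function findByIndex(docs, index, value) {
  let low = 0;
  let high = index.length - 1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (index[mid].key === value) return docs[index[mid].position];
    if (index[mid].key < value) low = mid + 1;
    else high = mid - 1;
  }
  return null;
}

const nameIndex = buildIndex(people, "name");
console.log(nameIndex.map((e) => e.key));            // [ 'Alex', 'John', 'Mia', 'Zoe' ]
console.log(findByIndex(people, nameIndex, "John")); // { _id: 4, name: 'John' }
```

Without the index, finding "John" means checking every document in turn; with it, the search halves the candidate range at each step.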

Replica Set

Definition

A replica set is a group of MongoDB instances that maintain the same data set, providing redundancy and high availability.

Member Roles

  • Primary: Handles all write operations
  • Secondary: Replicates the primary's data; can serve reads when the client's read preference allows it
  • Arbiter: Votes in elections but stores no data

Election Mechanism

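  • When the primary becomes unavailable, the remaining members automatically elect a new primary
  • A majority of voting members must be reachable for an election to succeed
  • With default settings, failover (failure detection plus election) typically completes in about 12 seconds or less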
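
The majority requirement determines how many members a replica set can lose and still elect a primary. A minimal sketch of the arithmetic in plain JavaScript follows; `majority` and `faultTolerance` are hypothetical helper names.

```javascript
// Votes needed to elect a primary in a set with the given number of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of members that can be unavailable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
// 3 2 1
// 4 3 1
// 5 3 2
```

Going from three to four members raises the required majority without raising fault tolerance, which is why replica sets are usually deployed with an odd number of voting members.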

Sharding

Definition

Sharding is the process of distributing data across multiple servers to handle large-scale data and high-throughput operations.

Sharded Cluster Components

  • mongos: Query router, handles client requests
  • config servers: Store cluster metadata
  • shards: Store actual data

Sharding Strategies

  • Range Sharding: Based on shard key value ranges
  • Hashed Sharding: Based on shard key hash values
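
The difference between the two strategies shows up when consecutive shard key values are routed. The sketch below is a simplified model in plain JavaScript: the shard names, range boundaries, and `toyHash` function are all assumptions made for illustration. MongoDB uses its own hash function and assigns ranges of hash values to shards rather than taking a modulo.

```javascript
// Range sharding: each shard owns a contiguous range [min, max) of shard key values.
const ranges = [
  { shard: "shardA", min: -Infinity, max: 1000 },
  { shard: "shardB", min: 1000, max: 2000 },
  { shard: "shardC", min: 2000, max: Infinity },
];

function routeByRange(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// Hashed sharding: a hash of the key decides placement.
// Toy hash for illustration only; not the function MongoDB uses.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

const shards = ["shardA", "shardB", "shardC"];
function routeByHash(key) {
  return shards[toyHash(key) % shards.length];
}

const keys = [1000, 1001, 1002];
console.log(keys.map(routeByRange)); // [ 'shardB', 'shardB', 'shardB' ]
console.log(keys.map(routeByHash));  // [ 'shardB', 'shardC', 'shardA' ]
```

Range sharding keeps neighbouring keys together, which suits range queries but can concentrate writes on one shard when keys increase monotonically. Hashed sharding spreads them out at the cost of efficient range scans.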

Data Model Design

Embedding vs Referencing

Embedded Documents

Advantages:

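  • A single query retrieves all related data
  • Updates to a single document are atomic, so embedded data changes together
  • Better read performance, since no second lookup is needed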

Use cases:

  • One-to-one relationships
  • One-to-few relationships (few sub-documents)
  • Data frequently queried together
javascript
// Embedding example
{
  "user": "John Doe",
  "address": {
    "city": "New York",
    "street": "123 Main St"
  }
}

Referencing

Advantages:

  • Avoids data duplication
  • Keeps data consistent, since each piece of data is stored in one place
  • Keeps document size under control

Use cases:

  • One-to-many relationships (many sub-documents)
  • Many-to-many relationships
  • Data frequently queried independently
javascript
// Referencing example
// users collection
{ "_id": 1, "name": "John Doe" }

// orders collection
{ "_id": 101, "user_id": 1, "amount": 100 }
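
With references, the relationship has to be resolved at read time. A minimal sketch of an application-side join in plain JavaScript follows, using in-memory arrays in place of the two collections; the second order is sample data added for illustration, and the variable names are hypothetical. On the server, the `$lookup` aggregation stage performs a comparable join.

```javascript
// In-memory stand-ins for the users and orders collections.
const userDocs = [{ _id: 1, name: "John Doe" }];
const orderDocs = [
  { _id: 101, user_id: 1, amount: 100 },
  { _id: 102, user_id: 1, amount: 250 }, // sample data for this sketch
];

// Index users by _id so each reference resolves with a single map lookup.
const usersById = new Map(userDocs.map((u) => [u._id, u]));

// Attach the referenced user to each order (an application-side join).
const ordersWithUser = orderDocs.map((order) => ({
  ...order,
  user: usersById.get(order.user_id) ?? null,
}));

console.log(ordersWithUser[0].user.name); // John Doe
console.log(ordersWithUser.length);       // 2
```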

Summary

These core concepts are the foundation for everything else in MongoDB:

  1. Database: Container for collections
  2. Collection: Container for documents, no fixed schema
  3. Document: Basic data unit in BSON format
  4. Field: Key-value pairs in documents
  5. _id: Unique identifier for documents
  6. Index: Data structure for query efficiency
  7. Replica Set: Provides high availability
  8. Sharding: Supports horizontal scaling

In the next chapter, we will learn about MongoDB Data Modeling.
