MongoDB Replication (Replica Sets)
MongoDB's replication feature allows us to keep multiple copies of the same data on different servers to improve availability and reliability. A replica set is a group of MongoDB servers that maintain the same data set: one primary server and one or more secondary servers. The primary server handles all write operations, while the secondary servers replicate data from the primary server.
Basic Concepts
Replica Set Structure
A replica set can contain three types of members:
- Primary: Handles all write operations and records them in the operation log, from which the secondary servers replicate.
- Secondary: Replicates data from the primary server and can serve read operations when the read preference allows it.
- Arbiter: Optional. Stores no data and only votes in the election of a new primary server.
Data Synchronization
When the primary server receives a write operation, it applies the write and records it in the operation log (oplog). Secondary servers continuously copy new oplog entries from the primary server and apply them asynchronously, which keeps their data in sync with the primary. Oplog entries are idempotent, so applying an entry more than once produces the same result.
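The flow described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not MongoDB's actual implementation: the Primary and Secondary classes, the pullFrom method, and the integer timestamps are all hypothetical names invented for the illustration.

```javascript
// Toy model of oplog-based replication; NOT MongoDB's real implementation.
// Primary, Secondary, pullFrom and the integer "ts" are hypothetical names.

class Primary {
  constructor() {
    this.data = new Map(); // key -> value
    this.oplog = [];       // ordered entries: { ts, op, key, value }
    this.clock = 0;        // logical timestamp, incremented on every write
  }

  // Apply a write locally, then record it in the oplog.
  set(key, value) {
    this.data.set(key, value);
    this.clock += 1;
    this.oplog.push({ ts: this.clock, op: "set", key, value });
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.lastAppliedTs = 0; // timestamp of the newest entry applied so far
  }

  // Fetch every oplog entry newer than lastAppliedTs and replay it in order.
  // Returns the number of entries that were applied.
  pullFrom(primary) {
    const pending = primary.oplog.filter((e) => e.ts > this.lastAppliedTs);
    for (const entry of pending) {
      this.data.set(entry.key, entry.value); // "set" is idempotent
      this.lastAppliedTs = entry.ts;
    }
    return pending.length;
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.set("a", 1);
primary.set("b", 2);
secondary.pullFrom(primary); // applies both entries
primary.set("a", 3);
secondary.pullFrom(primary); // applies only the newest entry
console.log(secondary.data.get("a"), secondary.lastAppliedTs);
```

Because the secondary remembers the timestamp of the last entry it applied, each pull only transfers what is new, and replaying a "set" entry twice would leave the data unchanged.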
Failover
When the primary server becomes unavailable, the remaining members hold an election and the replica set fails over to a new primary server automatically. A candidate needs votes from a majority of the voting members to win. Arbiters vote in elections but can never become primary themselves.
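The majority requirement is simple arithmetic, and working it through shows why replica sets are usually given an odd number of voting members. The helper names majority and faultTolerance below are made up for this sketch.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(
    `${n} voting members: majority ${majority(n)}, tolerates ${faultTolerance(n)} down`
  );
}
```

Going from three to four voting members raises the required majority from two to three but still tolerates only one failure, so the fourth member adds no extra fault tolerance.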
Configuring a Replica Set
Initializing a Replica Set
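// Connect to the member that will become the first primary (run from the system shell; newer MongoDB versions ship mongosh instead of mongo)
mongo --host primary.example.com
// Initialize the replica set (run inside the MongoDB shell)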
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "primary.example.com:27017" },
    { _id: 1, host: "secondary1.example.com:27017" },
    { _id: 2, host: "secondary2.example.com:27017" }
  ]
})
Adding Secondary Servers
// Add a secondary server
rs.add("secondary3.example.com:27017")
// Add an arbiter
rs.addArb("arbiter.example.com:27017")
Checking Replica Set Status
// Check the status of the replica set
rs.status()
Using a Replica Set
Connecting to a Replica Set
// Connect to the replica set
mongo --host myReplicaSet/primary.example.com,secondary1.example.com,secondary2.example.com
Read Operations
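By default, all read operations are handled by the primary server. We can route reads elsewhere by setting a read preference, for example with the readPreference connection option or, in the shell, with cursor.readPref() as shown here. Because replication is asynchronous, reads from secondary servers may return stale data.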
// Read from the primary server (default)
db.collection.find().readPref("primary")
// Read from the primary server, or from a secondary if the primary is unavailable
db.collection.find().readPref("primaryPreferred")
// Read from secondary servers only
db.collection.find().readPref("secondary")
// Read from secondary servers, or primary if no secondary is available
db.collection.find().readPref("secondaryPreferred")
// Read from the member with the lowest network latency, whether primary or secondary
db.collection.find().readPref("nearest")
Write Operations
All write operations are handled by the primary server. When the primary server processes a write operation, it records the operation in the operation log, and the secondary servers then replicate it from there.
Disaster Recovery
Manual Failover
// Perform a manual failover: the primary steps down and is not eligible for re-election for 300 seconds
rs.stepDown(300)
Recovering the Primary Server
When the original primary server recovers, it rejoins the replica set as a secondary server and catches up by replicating the operations it missed.
Performance Considerations
- Network Latency: The performance of a replica set depends on network latency. If network latency is high, data synchronization and failover times will be longer.
- Server Resources: The performance of a replica set depends on the resources of the servers. We should ensure that the servers have sufficient memory, CPU, and disk space.
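- Oplog Size: The operation log has a fixed maximum size, and its oldest entries are overwritten once it is full. If a secondary server falls so far behind that the entries it needs have already been overwritten, it can no longer catch up and must be resynchronized. We can set the size of the operation log using the oplogSizeMB option.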
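To reason about the oplog size, it helps to estimate how many hours of writes the oplog can hold before old entries are overwritten. The sketch below is a rough back-of-the-envelope calculation, not a MongoDB API: the function name estimateOplogWindowHours and the oplogMBPerHour input are hypothetical, and the write rate is assumed to be constant, which real workloads rarely are.

```javascript
// Rough estimate of the oplog window: how many hours of writes fit in the oplog.
// oplogSizeMB    - configured oplog size in megabytes
// oplogMBPerHour - measured oplog growth per hour (hypothetical input,
//                  assumed constant for this estimate)
function estimateOplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  if (oplogMBPerHour <= 0) {
    return Infinity; // no writes, so nothing is ever overwritten
  }
  return oplogSizeMB / oplogMBPerHour;
}

console.log(estimateOplogWindowHours(2048, 128)); // 16
```

With these example numbers, a secondary server that stays offline for more than about 16 hours would probably need a full resynchronization. On a running member, rs.printReplicationInfo() reports the actual oplog size and the time range it currently covers.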
Common Issues
Replication Lag
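Replication lag is the delay between a write being applied on the primary server and the same write being applied on a secondary server. While a secondary server lags, reads served from it can return stale data, and if the lag grows beyond what the operation log still holds, the secondary server has to be resynchronized.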
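One way to measure lag is to compare the optimeDate of each secondary with that of the primary in the output of rs.status(). The sketch below works on a plain object shaped like that output, so it runs without a server. The function name replicationLagSeconds is hypothetical, and the sample data is written by hand rather than taken from a real deployment.

```javascript
// Compute the lag of every secondary, in seconds, from an object shaped like
// the output of rs.status(). Only the name, stateStr and optimeDate fields
// of each member are used.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return null; // no primary at the moment, so lag cannot be computed
  }
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Hand-written sample data, not real server output.
const sampleStatus = {
  members: [
    { name: "primary.example.com:27017", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T12:00:10Z") },
    { name: "secondary1.example.com:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T12:00:08Z") },
    { name: "secondary2.example.com:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T12:00:10Z") },
  ],
};
console.log(replicationLagSeconds(sampleStatus));
```

In the shell, the same function could be called as replicationLagSeconds(rs.status()).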
Election Failures
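A replica set cannot elect a primary server unless a majority of its voting members can reach each other, and without a primary server it cannot accept writes. We should use an odd number of voting members (adding an arbiter if necessary) and make sure that enough of them stay reachable to form a majority.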
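The condition for a successful election can be sketched as a check over a member list. This is a simplified model, not MongoDB's election protocol: the function name canElectPrimary and the reachable flag are hypothetical, while votes and arbiterOnly mirror fields of the replica set configuration. Member priorities and other eligibility rules are ignored.

```javascript
// Simplified check: can the reachable members elect a primary?
// members: list of { host, votes, arbiterOnly, reachable }
//   votes, arbiterOnly - mirror fields of the replica set configuration
//   reachable          - hypothetical flag invented for this sketch
function canElectPrimary(members) {
  const totalVotes = members.reduce((sum, m) => sum + m.votes, 0);
  const reachable = members.filter((m) => m.reachable);
  const reachableVotes = reachable.reduce((sum, m) => sum + m.votes, 0);
  // An arbiter can vote but cannot become primary, so at least one
  // data-bearing member must be reachable as a candidate.
  const hasCandidate = reachable.some((m) => !m.arbiterOnly);
  return hasCandidate && reachableVotes > totalVotes / 2;
}

const members = [
  { host: "primary.example.com:27017", votes: 1, arbiterOnly: false, reachable: false },
  { host: "secondary1.example.com:27017", votes: 1, arbiterOnly: false, reachable: true },
  { host: "arbiter.example.com:27017", votes: 1, arbiterOnly: true, reachable: true },
];
console.log(canElectPrimary(members)); // true: 2 of 3 votes and one data-bearing candidate
```

If secondary1 were unreachable as well, only the arbiter would remain, and the check would fail both for lack of a majority and for lack of a candidate.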
Summary
MongoDB's replication feature allows us to create multiple copies of data to improve availability and reliability. A replica set is a group of MongoDB servers that includes a primary server and multiple secondary servers. By using replica sets, we can achieve high data availability and disaster recovery. At the same time, we need to pay attention to performance factors such as network latency, server resources, and operation log size to ensure the efficient operation of the replica set.