Best practices for performance
Use this information to apply best practices to your Databases for MongoDB deployment running on IBM Cloud.
Performance troubleshooting flowchart
Use the following checks, in order, to troubleshoot a performance issue. Each check either points to a remedy or sends you to the next check.
1. Check IBM Cloud Monitoring. If CPU utilization is above 80%, memory utilization is above 80%, or disk latency is high, scale resources. Otherwise, continue to step 2.
2. Check for slow queries in db.system.profile. If you find slow queries, optimize the queries and indexes. Otherwise, continue to step 3.
3. Check for locks with db.currentOp(). If operations are blocked, kill or optimize the blocking operations. Otherwise, continue to step 4.
4. Check the cache hit ratio. If it is below 95%, scale memory. Otherwise, continue to step 5.
5. Check replication. If secondaries are lagging, scale resources or fix the cause of the lag. Otherwise, continue to step 6.
6. Contact IBM Support.
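The lock check uses db.currentOp(). The following minimal sketch lists operations that have run longer than a threshold or are waiting for a lock, with the longest-running first. It assumes you pass in the inprog array that db.currentOp() returns and reads its opid, op, ns, secs_running, and waitingForLock fields. The 5-second default is an arbitrary example value, not an IBM recommendation.

```javascript
// Select operations from db.currentOp().inprog that are long-running or
// waiting for a lock. thresholdSecs is an example value; tune it.
function findBlockingCandidates(inprog, thresholdSecs = 5) {
  return inprog
    .filter(op => (op.secs_running ?? 0) >= thresholdSecs || op.waitingForLock === true)
    // Longest-running operations first
    .sort((a, b) => (b.secs_running ?? 0) - (a.secs_running ?? 0))
    .map(op => ({
      opid: op.opid,
      op: op.op,
      ns: op.ns,
      secsRunning: op.secs_running ?? 0,
      waitingForLock: op.waitingForLock === true,
    }));
}
```

In mongosh you might call findBlockingCandidates(db.currentOp().inprog). After you confirm that an operation is safe to stop, pass its opid to db.killOp().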
Common anti-patterns
Avoid these common mistakes that lead to performance issues.
Query anti-patterns
1. Missing indexes
Problem:
// No index on 'email' field
db.users.find({ email: "user@example.com" })
Solution:
// Create index
db.users.createIndex({ email: 1 })
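Before you create indexes, you can find the queries that need them by looking for slow operations that scanned the whole collection. The following minimal sketch assumes profiler documents shaped like those in db.system.profile and reads their ns, op, millis, planSummary, and command.filter fields. The 100 ms default mirrors the query execution time warning threshold in the appendix.

```javascript
// Given documents read from db.system.profile, return slow operations whose
// plan was a full collection scan (planSummary "COLLSCAN"), slowest first.
function slowCollectionScans(profileDocs, minMillis = 100) {
  return profileDocs
    .filter(doc => doc.millis >= minMillis && doc.planSummary === "COLLSCAN")
    .sort((a, b) => b.millis - a.millis)
    .map(doc => ({
      ns: doc.ns,
      op: doc.op,
      millis: doc.millis,
      // For find operations, the query filter is recorded under command.filter
      filter: doc.command?.filter,
    }));
}
```

If profiling is enabled on the database, you might call slowCollectionScans(db.system.profile.find().toArray()). The fields in each returned filter are candidates for an index.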
2. Inefficient regex queries
Problem:
// Unanchored, case-insensitive regex: cannot use an index efficiently
db.users.find({ name: /john/i })
Solution:
// Use a text index (matches whole words, not substrings) or an exact match
db.users.createIndex({ name: "text" })
db.users.find({ $text: { $search: "john" } })
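When you need prefix matching rather than word search, a case-sensitive regex anchored with ^ can use an index on the field and scan only the matching range of keys. The following minimal sketch builds such a regex from user input. It escapes regex metacharacters so that input such as "a.b" is matched literally. The function name prefixRegex is hypothetical.

```javascript
// Build a case-sensitive, anchored prefix regex from untrusted input.
// Escaping regex metacharacters keeps input such as "a.b" literal.
function prefixRegex(prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped);
}
```

For example, db.users.find({ name: prefixRegex("john") }) can use an index on name. Unlike /john/i, it is case-sensitive and only matches names that start with "john".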
3. Large skip() operations
Problem:
// Skipping thousands of documents
db.collection.find().skip(10000).limit(10)
Solution:
// Use a range query on an indexed field, sorted by that field (keyset pagination)
db.collection.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(10)
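With the range-query approach, each page remembers the last _id it returned, and the next page starts after that value. The following minimal in-memory sketch shows the loop over documents that are already sorted by _id. The function names are hypothetical. Against MongoDB, fetchPage would be the find/sort/limit call, and the database would use the _id index to jump straight to lastSeenId instead of filtering an array.

```javascript
// Stand-in for find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(pageSize)
// over documents already sorted ascending by _id.
function fetchPage(sortedDocs, lastSeenId, pageSize) {
  return sortedDocs
    .filter(doc => lastSeenId === undefined || doc._id > lastSeenId)
    .slice(0, pageSize);
}

// Walk every page, carrying the last _id forward instead of using skip().
function allPages(sortedDocs, pageSize) {
  const pages = [];
  let lastSeenId; // undefined means "start from the beginning"
  for (;;) {
    const page = fetchPage(sortedDocs, lastSeenId, pageSize);
    if (page.length === 0) break;
    pages.push(page.map(doc => doc._id));
    lastSeenId = page[page.length - 1]._id;
  }
  return pages;
}
```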
4. Selecting unnecessary fields
Problem:
// Fetching entire documents
db.users.find({ status: "active" })
Solution:
// Use a projection to return only the fields you need (_id is returned unless you exclude it)
db.users.find({ status: "active" }, { name: 1, email: 1 })
5. Inefficient aggregation pipelines
Problem:
// $match after $lookup
db.orders.aggregate([
{ $lookup: { ... } },
{ $match: { status: "completed" } }
])
Solution:
// $match first to reduce documents
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $lookup: { ... } }
])
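The following sketch is a fuller version of the reordered pipeline. The customers collection and the customerId and customer field names are hypothetical. The point is the order of the stages: $match runs first, so it can use an index on status, and $lookup joins only the orders that remain.

```javascript
// Build an aggregation pipeline that filters before joining.
// Collection and field names (customers, customerId, customer) are hypothetical.
function completedOrdersWithCustomers() {
  return [
    // 1. Filter first: can use an index on status and shrinks the input.
    { $match: { status: "completed" } },
    // 2. Join only the remaining orders to their customer documents.
    {
      $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "_id",
        as: "customer",
      },
    },
  ];
}
```

Usage: db.orders.aggregate(completedOrdersWithCustomers()).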
Schema design issues
1. Unbounded arrays
Problem:
// Array grows indefinitely
{
userId: 123,
activities: [/* thousands of items */]
}
Solution:
// Use a separate collection, or bucket items (for example, one document per user per month)
{
userId: 123,
month: "2024-01",
activities: [/* limited items */]
}
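One way to write to the bucketed layout is an upsert that pushes each activity into the document for that user and month. The following sketch assumes UTC month boundaries and the userId, month, and activities fields from the example. The count field and the 1,000-item cap are hypothetical additions: when a bucket is full, the upsert finds no match and starts a new document for the same month.

```javascript
// UTC year and zero-padded month, for example "2024-01".
function monthKey(date) {
  return `${date.getUTCFullYear()}-${String(date.getUTCMonth() + 1).padStart(2, "0")}`;
}

// Build updateOne() arguments that add one activity to a per-user,
// per-month bucket. maxPerBucket is a hypothetical cap.
function bucketUpsert(userId, activity, date, maxPerBucket = 1000) {
  return {
    // Match this user's bucket for the month that still has room.
    filter: { userId, month: monthKey(date), count: { $lt: maxPerBucket } },
    update: { $push: { activities: activity }, $inc: { count: 1 } },
    // If no bucket matches (new month or full bucket), insert a new one.
    options: { upsert: true },
  };
}
```

With a hypothetical activities collection: const { filter, update, options } = bucketUpsert(123, { type: "login" }, new Date()); db.activities.updateOne(filter, update, options).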
2. Excessive embedding
Problem:
// Deeply nested documents
{
user: {
profile: {
settings: {
preferences: {
// many levels deep
}
}
}
}
}
Solution:
// Flatten or use references
{
userId: 123,
profileId: 456
}
3. Large documents
Problem:
// Documents approaching the 16 MB BSON document size limit
{
data: "very large string...",
attachments: [/* large binary data */]
}
Solution:
// Store large data separately (GridFS or object storage)
{
dataRef: "s3://bucket/key",
attachments: [{ ref: "gridfs://id" }]
}
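Before it inserts a document, an application can move oversized values elsewhere and keep only a reference, which matches the dataRef naming in the example. The following minimal sketch assumes a hypothetical storeBlob callback that uploads a value to GridFS or object storage and returns a reference string. The 1 MB threshold is an arbitrary example. Buffer.byteLength measures only string fields, not the document's full BSON size.

```javascript
// Replace large string fields with references before inserting the document.
// storeBlob is a hypothetical async function (key, value) => reference string
// that saves the value elsewhere, for example in GridFS or object storage.
async function externalizeLargeFields(doc, storeBlob, maxBytes = 1024 * 1024) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value === "string" && Buffer.byteLength(value, "utf8") > maxBytes) {
      // Store the large value elsewhere and keep only its reference.
      result[key + "Ref"] = await storeBlob(key, value);
    } else {
      result[key] = value;
    }
  }
  return result;
}
```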
Connection management mistakes
1. Not using connection pooling
Problem:
// Creating new connection per request
app.get('/api/users', async (req, res) => {
const client = await MongoClient.connect(uri);
// ...
await client.close();
});
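Solution:
// Create one client, and one connection pool, when the application starts
import express from "express";
import { MongoClient } from "mongodb";

const uri = process.env.MONGODB_URI; // hypothetical variable holding your connection string
const client = new MongoClient(uri, { maxPoolSize: 50 });
await client.connect(); // top-level await requires an ES module

const app = express();
app.get('/api/users', async (req, res) => {
  const db = client.db();
  // ...
});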
2. Not closing cursors
Problem:
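// Cursor read partially, then abandoned
const cursor = db.collection.find();
const firstDoc = await cursor.next();
// Never closed: the server keeps the cursor open until it times out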
Solution:
// Close the cursor in finally so that it is released even if processing throws
const cursor = db.collection.find();
try {
await cursor.forEach(doc => { /* process */ });
} finally {
await cursor.close();
}
3. Too many connections
Problem:
// One MongoClient, and therefore one connection pool, per user session
const connections = new Map();
users.forEach(user => {
connections.set(user.id, new MongoClient(uri));
});
Solution:
// Share connection pool across application
const client = new MongoClient(uri);
// All users share the same pool
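Because each MongoClient keeps its own pool, the deployment can see roughly (number of clients) × maxPoolSize connections at peak. The following sketch does that arithmetic and compares the result with the 80% and 95% active-connection thresholds in the appendix. It ignores the few monitoring connections that the driver also opens. The connection limit is an input you take from your plan; no value is assumed here.

```javascript
// Estimate peak connections from application instances and pool size, then
// classify against the appendix thresholds (80% warning, 95% critical).
function connectionHeadroom({ appInstances, clientsPerInstance = 1, maxPoolSize }, connectionLimit) {
  const peak = appInstances * clientsPerInstance * maxPoolSize;
  const usage = peak / connectionLimit;
  let status = "ok";
  if (usage > 0.95) status = "critical";
  else if (usage > 0.8) status = "warning";
  return { peak, usage, status };
}
```

For example, 4 application instances with maxPoolSize 50 against a limit of 500 gives a peak of 200 (40%). By contrast, 100 per-user clients in one process at the driver's default pool size of 100 could reach 10,000 connections.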
Indexing pitfalls
1. Too many indexes
Problem:
// Index on every field
db.collection.createIndex({ field1: 1 })
db.collection.createIndex({ field2: 1 })
db.collection.createIndex({ field3: 1 })
// ... 20+ indexes
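Impact: Every write must also update each index, so extra indexes slow down writes and use additional storage and memory.
Solution: Keep only the indexes that your queries use. A compound index also serves queries on its leading fields, so it can replace some single-field indexes.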
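To find candidates for removal, you can use the $indexStats aggregation stage, which reports an access count for each index. The following minimal sketch assumes the documents returned by db.collection.aggregate([{ $indexStats: {} }]). The counters reset when the server restarts and are reported by the member you are connected to, so treat a zero count as a reason to investigate, not as proof that an index is unused.

```javascript
// From $indexStats output, list indexes with no recorded accesses.
// The mandatory _id index is always skipped. Number() accepts both plain
// numbers and BSON Long values for accesses.ops.
function unusedIndexes(indexStats) {
  return indexStats
    .filter(stat => stat.name !== "_id_" && Number(stat.accesses.ops) === 0)
    .map(stat => ({ name: stat.name, key: stat.key, since: stat.accesses.since }));
}
```

After you confirm that an index is not needed, you can drop it with db.collection.dropIndex(name). On MongoDB 4.4 and later, you can first hide it with db.collection.hideIndex(name) to check the impact.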
2. Wrong index order in compound indexes
Problem:
// Query: { status: "active", createdAt: { $gt: date } }
// Index: { createdAt: 1, status: 1 } // Wrong order
Solution:
// Correct order: equality first, range second
db.collection.createIndex({ status: 1, createdAt: 1 })
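The equality-first rule is commonly extended to equality, sort, range (ESR): equality fields first, then the fields you sort on, then range fields. The following sketch builds an index key pattern in that order. The function name esrIndex is hypothetical, and so is the priority sort field in the test. The status and createdAt fields come from the example.

```javascript
// Build a compound index key pattern in equality, sort, range (ESR) order.
// sort entries are [field, direction] pairs so that the index matches the sort.
function esrIndex({ equality = [], sort = [], range = [] }) {
  const key = {};
  for (const field of equality) key[field] = 1;
  for (const [field, direction] of sort) key[field] = direction;
  // Skip range fields already placed as equality or sort fields.
  for (const field of range) if (!(field in key)) key[field] = 1;
  return key;
}
```

For the query in the example, db.collection.createIndex(esrIndex({ equality: ["status"], range: ["createdAt"] })) creates { status: 1, createdAt: 1 }.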
3. Not using covered queries
Problem:
// Index exists, but the query is not covered: name is not in the index,
// and _id is returned by default
db.users.createIndex({ email: 1 })
db.users.find({ email: "user@example.com" }, { name: 1, email: 1 })
// Still fetches documents
Solution:
// Include every queried and projected field in the index, and exclude _id
db.users.createIndex({ email: 1, name: 1 })
db.users.find({ email: "user@example.com" }, { name: 1, email: 1, _id: 0 })
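You can confirm a covered query from its explain output. A covered plan reads an index (IXSCAN), has no FETCH stage, and examines zero documents. The following minimal sketch assumes the object returned by find(...).explain("executionStats"). It walks both classic plan trees (inputStage and inputStages) and the queryPlan wrapper that newer versions can add to the winning plan.

```javascript
// Collect stage names from an explain plan tree.
function planStages(node, stages = []) {
  if (!node) return stages;
  if (node.stage) stages.push(node.stage);
  planStages(node.queryPlan, stages);
  planStages(node.inputStage, stages);
  for (const child of node.inputStages ?? []) planStages(child, stages);
  return stages;
}

// Covered: the plan reads an index, never fetches documents,
// and execution stats show zero documents examined.
function isCoveredQuery(explainResult) {
  const stages = planStages(explainResult.queryPlanner.winningPlan);
  return stages.includes("IXSCAN") && !stages.includes("FETCH")
    && explainResult.executionStats.totalDocsExamined === 0;
}
```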
Appendix: metrics thresholds
Recommended thresholds for key performance metrics.
| Metric | Warning threshold | Critical threshold | Recommended action |
|---|---|---|---|
| CPU utilization | > 75% | > 90% | Scale CPU cores |
| Memory utilization | > 80% | > 95% | Scale memory allocation |
| Disk utilization | > 80% | > 90% | Scale disk space |
| Disk IOPS | > 80% of limit | > 95% of limit | Increase disk size for more IOPS |
| Active connections | > 80% of limit | > 95% of limit | Scale plan or optimize connection pooling |
| Replication lag | > 5 seconds | > 30 seconds | Investigate and scale if needed |
| Cache hit ratio | < 95% | < 90% | Scale memory or optimize queries |
| Query execution time | > 100ms (avg) | > 1000ms (avg) | Optimize queries and indexes |
| Lock wait time | > 100ms | > 1000ms | Optimize operations and kill long-running queries |
| Page faults | > 100/sec | > 1000/sec | Scale memory |
| Network latency | > 10ms | > 50ms | Check network configuration |
| Backup duration | > 1 hour | > 4 hours | Consider scaling or optimization |
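The cache hit ratio is not reported as a single value. One common approach derives it from the WiredTiger cache counters in db.serverStatus(): the share of page requests that did not require reading a page into the cache. The counters are cumulative, so the following sketch uses the difference between two snapshots. It assumes the counter names "pages requested from the cache" and "pages read into cache" under wiredTiger.cache; check that your version reports them. It then applies the 95% and 90% thresholds from the table.

```javascript
// Cache hit ratio between two db.serverStatus() snapshots:
// 1 - (pages read into cache / pages requested from the cache).
function cacheHitRatio(before, after) {
  const counter = (status, name) => Number(status.wiredTiger.cache[name]);
  const requested = counter(after, "pages requested from the cache") - counter(before, "pages requested from the cache");
  const readIn = counter(after, "pages read into cache") - counter(before, "pages read into cache");
  // No requests in the interval: treat as a perfect hit ratio.
  return requested > 0 ? 1 - readIn / requested : 1;
}

// Classify using the table: warning below 95%, critical below 90%.
function classifyCacheHitRatio(ratio) {
  if (ratio < 0.9) return "critical";
  if (ratio < 0.95) return "warning";
  return "ok";
}
```

To use it, call db.serverStatus() twice, for example a minute apart, and pass both results to cacheHitRatio.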
Monitoring frequency recommendations
| Metric category | Check frequency | Retention period |
|---|---|---|
| Resource utilization | Every 1 minute | 30 days |
| Query performance | Every 5 minutes | 14 days |
| Replication status | Every 1 minute | 30 days |
| Connection statistics | Every 5 minutes | 14 days |
| Backup status | Every 1 hour | 90 days |
| Disk growth | Every 1 hour | 90 days |
Alert configuration examples
CPU alert
Condition: CPU > 80% for 10 consecutive minutes
Action: Send notification to ops team
Escalation: Page on-call if > 90% for 15 minutes
Memory alert
Condition: Memory > 85% for 15 consecutive minutes
Action: Send notification to ops team
Escalation: Auto-scale if > 95% for 10 minutes
Replication lag alert
Condition: Lag > 10 seconds
Action: Send notification immediately
Escalation: Page on-call if > 60 seconds
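To check this condition yourself, you can estimate each secondary's lag from rs.status() as the difference between the primary's optimeDate and the secondary's optimeDate. The following minimal sketch assumes the members array from rs.status() and reads its name, stateStr, and optimeDate fields. The 10-second and 60-second defaults come from the alert example.

```javascript
// Estimate replication lag in seconds for each secondary from rs.status().members,
// and map it to the alert example's actions (notify above 10 s, page above 60 s).
function replicationLag(members, notifySecs = 10, pageSecs = 60) {
  const primary = members.find(m => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY member in rs.status() output");
  return members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => {
      const lagSecs = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
      const action = lagSecs > pageSecs ? "page" : lagSecs > notifySecs ? "notify" : "none";
      return { name: m.name, lagSecs, action };
    });
}
```

In mongosh: replicationLag(rs.status().members).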
Disk space alert
Condition: Disk > 80%
Action: Send notification to ops team
Escalation: Create incident if > 90%
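The CPU and memory alerts fire only when a condition holds for a number of consecutive minutes. The following minimal sketch shows that check. It assumes one sample per minute, oldest first, and does not cover how you collect the samples (for example, from IBM Cloud Monitoring). cpuAlertLevel encodes the CPU alert example.

```javascript
// True when the most recent `minutes` samples (one per minute,
// oldest first) all exceed `threshold`.
function sustainedAbove(samples, threshold, minutes) {
  if (samples.length < minutes) return false;
  return samples.slice(-minutes).every(value => value > threshold);
}

// CPU alert example: notify at > 80% for 10 minutes,
// page on-call at > 90% for 15 minutes.
function cpuAlertLevel(cpuSamples) {
  if (sustainedAbove(cpuSamples, 90, 15)) return "page";
  if (sustainedAbove(cpuSamples, 80, 10)) return "notify";
  return "none";
}
```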
Best practices summary
| Area | Recommendation |
|---|---|
| Indexing | Regularly review and remove unused indexes |
| Monitoring | Configure alerts for CPU, memory, disk, and replication lag |
| Capacity planning | Keep disk usage below 80% and scale proactively |
| Query design | Use explain plans during development |
| Scaling | Scale proactively before saturation |
| Connection pooling | Use connection pools and avoid per-request connections |
| Read preferences | Route reads to secondaries (for example, secondaryPreferred) for read-heavy workloads that can tolerate slightly stale data |
| Write concern | Balance durability with performance needs |
| Schema design | Avoid unbounded arrays and excessive embedding |
| Backup planning | Schedule during low-traffic periods |
| Network | Use private endpoints for IBM Cloud workloads |
| Security | Rotate credentials regularly and use IP allowlisting |
| Documentation | Document baseline metrics and normal patterns |
| Testing | Test performance changes in non-production first |
| Support | Gather diagnostics before contacting support |