IBM Cloud Docs
Troubleshooting performance for Databases for MongoDB

Use this guide to identify and resolve performance issues in your IBM Cloud Databases for MongoDB deployment.

If your applications are experiencing slow responses, timeouts, or inconsistent database performance, work through the following steps and information.

Symptoms of performance issues

You might observe some of the following symptoms that indicate problems with performance:

  • Increased application latency
  • Slow query log entries
  • High CPU or memory utilization
  • Increased disk latency
  • Replication lag
  • Connection timeouts

Complete the following steps to determine the cause of the issues:

Step 1: Check resource utilization

  1. Log in to the IBM Cloud console and navigate to your MongoDB deployment.

  2. Review the Monitoring section for:

    • CPU utilization
    • Memory usage
    • Disk IOPS and latency
    • Active connections

What to look for:

  • CPU consistently above 75%
  • Memory consistently above 80%
  • Disk latency increasing over time
  • Connections approaching plan limits

Recommended actions:

  • Increase storage or IOPS if disk latency is high.
  • Review workload spikes in your application.

If resource usage remains elevated for sustained periods, scale your deployment.
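The thresholds above can be applied programmatically to a metrics snapshot. The following sketch assumes an illustrative field layout (`cpuPercent`, `connectionLimit`, and so on); the actual IBM Cloud monitoring export uses its own schema, so adapt the field names accordingly.

```javascript
// Sketch: flag the resource-pressure conditions listed above from a
// metrics snapshot. The field names are illustrative assumptions,
// not the exact IBM Cloud monitoring API schema.
function flagResourcePressure(metrics) {
  const flags = [];
  if (metrics.cpuPercent > 75) flags.push("cpu");
  if (metrics.memoryPercent > 80) flags.push("memory");
  // "increasing over time" approximated as 2x a recorded baseline
  if (metrics.diskLatencyMs > metrics.baselineDiskLatencyMs * 2) flags.push("disk-latency");
  // "approaching plan limits" approximated as > 90% of the limit
  if (metrics.connections / metrics.connectionLimit > 0.9) flags.push("connections");
  return flags;
}

// Hypothetical snapshot:
const flags = flagResourcePressure({
  cpuPercent: 82,
  memoryPercent: 71,
  diskLatencyMs: 14,
  baselineDiskLatencyMs: 5,
  connections: 185,
  connectionLimit: 200,
});
// flags -> ["cpu", "disk-latency", "connections"]
```

The 2x-baseline and 90%-of-limit cutoffs are illustrative choices; tune them to your own alerting policy.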

Step 2: Identify slow queries

Slow queries are one of the most common causes of degraded performance.

  1. Enable profiling:

    db.setProfilingLevel(1, { slowms: 100 })
    
  2. Review recent slow operations:

    db.system.profile.find().sort({ ts: -1 }).limit(20)
    
  3. Analyze query execution:

    db.collection.find({ ... }).explain("executionStats")
    

What to look for:

  • COLLSCAN (collection scan instead of index usage)
  • High totalDocsExamined compared to nReturned

Recommended actions:

  • Create appropriate indexes.
  • Use compound indexes for multi-field queries.
  • Ensure aggregation pipelines begin with $match.
  • Avoid large skip() pagination.
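The two warning signs above can be checked mechanically against an `explain("executionStats")` result. The following sketch uses the standard explain output shape; the 10x docs-examined-to-returned threshold is an illustrative assumption.

```javascript
// Sketch: scan an explain("executionStats") result for a COLLSCAN
// stage and for a high totalDocsExamined-to-nReturned ratio.
function analyzeExplain(explainResult) {
  const stats = explainResult.executionStats;
  const plan = explainResult.queryPlanner.winningPlan;
  const issues = [];
  // COLLSCAN can appear at any depth of the winning plan tree
  if (JSON.stringify(plan).includes('"COLLSCAN"')) issues.push("collection scan");
  if (stats.totalDocsExamined > 10 * Math.max(stats.nReturned, 1)) {
    issues.push("examines far more docs than it returns");
  }
  return issues;
}

// Hypothetical explain output for an unindexed query:
const issues = analyzeExplain({
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 20, totalDocsExamined: 50000 },
});
// issues -> ["collection scan", "examines far more docs than it returns"]
```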

Step 3: Review connection usage

High or poorly managed connections can impact performance.

Check connection statistics:

db.serverStatus().connections

Recommended actions:

  • Use connection pooling in your application.
  • Avoid opening a new connection for each request.
  • Close unused cursors.

Connection limits are determined by your deployment plan.
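To gauge how close you are to that limit, you can derive utilization from the `db.serverStatus().connections` document, whose documented shape includes `current`, `available`, and `totalCreated`. A minimal sketch:

```javascript
// Sketch: compute connection headroom from db.serverStatus().connections.
function connectionUtilization(conn) {
  const limit = conn.current + conn.available; // effective limit for this node
  return { limit, usedPercent: Math.round((conn.current / limit) * 100) };
}

// Hypothetical serverStatus output:
const usage = connectionUtilization({ current: 150, available: 50, totalCreated: 98000 });
// usage -> { limit: 200, usedPercent: 75 }
```

A rapidly growing `totalCreated` alongside a modest `current` count often indicates that the application is opening a new connection per request instead of pooling.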

Step 4: Check replication health

Replication lag can affect read performance and data freshness.

Check replication status:

rs.printSecondaryReplicationInfo()

Common causes of lag:

  • High write throughput
  • Disk bottlenecks
  • Network latency

Recommended actions:

  • Scale storage performance.
  • Review write concern settings.
  • Scale to a higher plan if lag is persistent.
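If you prefer to compute lag yourself rather than read the formatted `rs.printSecondaryReplicationInfo()` output, you can derive it from the `members` array of `rs.status()`, comparing each secondary's `optimeDate` to the primary's. A sketch with hypothetical member data:

```javascript
// Sketch: per-secondary replication lag in seconds, from rs.status().members.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  return members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      host: m.name,
      // Date subtraction yields milliseconds
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

const lag = replicationLagSeconds([
  { name: "m0", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:30Z") },
  { name: "m1", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:28Z") },
]);
// lag -> [{ host: "m1", lagSeconds: 2 }]
```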

Step 5: Sharded cluster considerations (if applicable)

You might need sharding in the following situations:

  • Working set is greater than RAM
  • Single-node IOPS maxed out even after scaling
  • Horizontal write scaling is required
  • Collections exceed 1–2 TB

For more information, see performance tuning and sharding.

If your deployment uses sharding, run:

sh.status()

Check for:

  • Uneven chunk distribution
  • Jumbo chunks
  • Traffic concentrated on a single shard

Recommended actions:

  • Review shard key selection.
  • Avoid monotonically increasing shard keys.
  • Consider hashed shard keys.

Improper shard key selection can significantly affect performance at scale.
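Uneven chunk distribution can be quantified from the chunks-per-shard tally that `sh.status()` reports. The following sketch uses a 1.5x max-to-min skew threshold, which is an illustrative choice rather than a MongoDB default:

```javascript
// Sketch: check chunk balance across shards from a chunks-per-shard tally.
function chunkSkew(chunksPerShard) {
  const counts = Object.values(chunksPerShard);
  const max = Math.max(...counts);
  const min = Math.min(...counts);
  return { max, min, balanced: max <= 1.5 * Math.max(min, 1) };
}

// Hypothetical tally: shard2 carries far fewer chunks than its peers.
const skew = chunkSkew({ shard0: 120, shard1: 118, shard2: 40 });
// skew -> { max: 120, min: 40, balanced: false }
```

Note that even chunk counts do not guarantee even traffic; a monotonically increasing shard key still concentrates writes on one shard.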

Step 6: After large data deletions

Deleting a significant percentage of data does not immediately reduce disk usage at the operating system level.

Possible impacts:

  • Internal fragmentation
  • High disk utilization
  • Reduced performance

Recommended actions:

  • Plan compaction operations carefully.
  • Consider dump and restore for severe fragmentation.
  • Keep disk utilization below 80–85%.

Schedule maintenance activities appropriately.

Step 7: Check for lock contention

Lock contention can severely impact concurrent operations and overall throughput.

  • Check global lock statistics:

    db.serverStatus().locks
    
  • Check current operations for locks:

    db.currentOp({
      $or: [
        { waitingForLock: true },
        { "locks.Global": "w" }
      ]
    })
    
  • Analyze lock wait time:

    db.serverStatus().globalLock
    

What to look for:

  • High currentQueue values (readers or writers).
  • Operations with waitingForLock: true.
  • Long-running operations holding locks.
  • Index builds that block operations.

Common causes:

  • Long-running queries without proper indexes.
  • Large write operations.
  • Index builds on large collections.
  • Administrative commands (for example, compact).

Recommended actions:

  • Kill long-running operations if necessary:
    db.killOp(opid)
    
  • Build indexes with minimal blocking. MongoDB 4.2+ index builds hold exclusive locks only briefly at the start and end of the build; on earlier versions, use the background option:
    db.collection.createIndex({ field: 1 }, { background: true })
    
  • Break large operations into smaller batches.
  • Schedule maintenance operations during low-traffic periods.
  • Use read concern and write concern appropriately.

Step 8: Analyze workload patterns

Understanding your workload patterns helps identify optimization opportunities.

  • Check operation counters:

    db.serverStatus().opcounters
    
  • Check replicated operation counters:

    db.serverStatus().opcountersRepl
    
  • Identify hot collections:

    db.adminCommand({ top: 1 })
    
  • Check the read ratio compared to the write ratio:

    var stats = db.serverStatus().opcounters;
    print("Read ratio: " + (stats.query + stats.getmore) / (stats.query + stats.getmore + stats.insert + stats.update + stats.delete));
    

What to look for:

  • Disproportionate operations on specific collections
  • High read-to-write or write-to-read ratios
  • Sudden spikes in operation counts
  • Time-based patterns (peak hours)

Recommended actions:

  • Optimize frequently accessed collections first.
  • Consider read replicas for read-heavy workloads.
  • Use appropriate read preferences.
  • Implement caching for frequently read data.
  • Review indexing strategy for hot collections.
  • Consider sharding for write-heavy collections.
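The inline ratio calculation above can be generalized into a small helper that summarizes `db.serverStatus().opcounters` into read and write shares:

```javascript
// Sketch: summarize opcounters into a read/write mix. Reads are counted
// as query + getmore; writes as insert + update + delete; the command
// counter is excluded because it mixes both kinds of work.
function workloadMix(op) {
  const reads = op.query + op.getmore;
  const writes = op.insert + op.update + op.delete;
  const total = reads + writes;
  return {
    readPercent: Math.round((reads / total) * 100),
    writePercent: Math.round((writes / total) * 100),
  };
}

// Hypothetical counters for a read-heavy deployment:
const mix = workloadMix({ query: 700, getmore: 100, insert: 100, update: 80, delete: 20, command: 500 });
// mix -> { readPercent: 80, writePercent: 20 }
```

A read share this high suggests secondaries and caching will pay off; a high write share points toward indexing review and, eventually, sharding.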

Step 9: Investigate memory pressure and cache efficiency

MongoDB's WiredTiger storage engine relies heavily on cache efficiency.

  • Check WiredTiger cache statistics:

    db.serverStatus().wiredTiger.cache
    
  • Review key metrics:

    var cache = db.serverStatus().wiredTiger.cache;
    print("Cache size: " + cache["bytes currently in the cache"]);
    print("Max cache size: " + cache["maximum bytes configured"]);
    print("Pages read into cache: " + cache["pages read into cache"]);
    print("Pages written from cache: " + cache["pages written from cache"]);
    print("Cache hit ratio: " + (1 - cache["pages read into cache"] / cache["pages requested from the cache"]));
    
  • Check for eviction pressure:

    db.serverStatus().wiredTiger.cache["pages evicted by application threads"]
    

What to look for:

  • Cache hit ratio below 95%
  • High eviction rates
  • Cache size consistently at maximum
  • Application threads performing evictions

Check dirty data in the cache:

db.serverStatus().wiredTiger.cache["tracked dirty bytes in the cache"]

Dirty bytes track modified data that is not yet flushed; sustained high values indicate that checkpoints and eviction are not keeping up with the write load.

Recommended actions:

  • Scale to a plan with more memory if the cache is consistently full.
  • Review and optimize indexes (remove unused indexes).
  • Limit result set sizes in queries.
  • Use projections to reduce document size.
  • Consider archiving old data.
  • Monitor working set size trends.
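The checks above can be combined into one helper over the `db.serverStatus().wiredTiger.cache` document. The statistic key names below match WiredTiger's output; the 95% threshold echoes the guidance above.

```javascript
// Sketch: derive cache hit ratio and fill level from WiredTiger stats.
// Hit ratio: "pages read into cache" counts misses, "pages requested
// from the cache" counts all requests.
function cacheHealth(cache) {
  const requested = cache["pages requested from the cache"];
  const read = cache["pages read into cache"];
  const hitRatio = 1 - read / requested;
  return {
    hitRatio,
    fillPercent: Math.round((cache["bytes currently in the cache"] / cache["maximum bytes configured"]) * 100),
    healthy: hitRatio >= 0.95,
  };
}

// Hypothetical statistics for a 1 GiB cache:
const health = cacheHealth({
  "pages requested from the cache": 1000000,
  "pages read into cache": 20000,
  "bytes currently in the cache": 900 * 1024 * 1024,
  "maximum bytes configured": 1024 * 1024 * 1024,
});
// health -> hitRatio ~0.98, fillPercent 88, healthy true
```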

Memory allocation best practices

  • By default, the WiredTiger cache size is the larger of 50% of (RAM - 1 GB) or 256 MB.
  • Leave sufficient memory for other processes.
  • Monitor swap usage, which should be minimal.

Step 10: Review write concern and read preference settings

Write concern and read preference settings significantly impact performance and consistency.

  • Check current write concern:

    db.getWriteConcern()
    
  • Check replica set configuration:

    rs.conf()
    
  • Write concern options:

    Write concern    Durability   Performance  Use case
    w: 1             Low          High         Non-critical data, high throughput
    w: "majority"    High         Medium       Default, balanced approach
    w: <number>      Medium-High  Medium-Low   Specific replica count
    j: true          Highest      Lowest       Critical data requiring journal sync
  • Read preference options:

    Read preference      Consistency  Performance  Use case
    primary              Highest      Medium       Default, strong consistency
    primaryPreferred     High         Medium-High  Fallback to secondary
    secondary            Eventual     High         Analytics, reporting
    secondaryPreferred   Eventual     High         Read scaling
    nearest              Eventual     Highest      Lowest latency
  • Check read preference in your application:

    // Example in the Node.js driver: pass readPreference in the options object
    db.collection('users').find({}, { readPreference: 'secondary' })
    

What to look for:

  • Overly strict write concerns for non-critical data
  • Using primary read preference when eventual consistency is acceptable
  • Not leveraging secondaries for read-heavy workloads

Recommended actions:

  • Use w: 1 for high-throughput, non-critical writes.
  • Use w: "majority" for important data (default).
  • Use secondary or secondaryPreferred for analytics queries.
  • Consider nearest for geographically distributed applications.
  • Balance consistency requirements with performance needs.
  • Test different configurations under load.
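The tables and recommendations above can be condensed into a settings picker. The returned option objects mirror the MongoDB driver's `writeConcern` and `readPreference` options, but the workload categories and the mapping itself are an illustrative policy, not an official recommendation:

```javascript
// Sketch: map a workload category to driver-level consistency settings.
function settingsFor(workload) {
  switch (workload) {
    case "critical-write": // durability over throughput
      return { writeConcern: { w: "majority", j: true }, readPreference: "primary" };
    case "high-throughput-write": // throughput over durability
      return { writeConcern: { w: 1 }, readPreference: "primary" };
    case "analytics-read": // offload reads, eventual consistency is fine
      return { writeConcern: { w: "majority" }, readPreference: "secondaryPreferred" };
    default: // balanced defaults
      return { writeConcern: { w: "majority" }, readPreference: "primary" };
  }
}

// e.g. pass the result as options when creating a collection handle:
//   client.db("app").collection("orders", settingsFor("critical-write"))
```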

Step 11: Monitor backup and maintenance impact

Backup operations and maintenance tasks can temporarily affect performance.

IBM Cloud backup schedule

Databases for MongoDB takes automatic backups of your deployment. Check your backup schedule in the IBM Cloud console under Backups.

Check for ongoing backup operations:

db.currentOp({ op: "command", "command.backup": { $exists: true } })

What to look for:

  • Performance degradation during backup windows
  • Increased disk I/O during backups
  • Replication lag during backups

Recommended actions:

  • Monitor performance metrics during backup times.
  • Consider scaling if backups consistently impact performance.
  • Review backup retention policies.
  • Plan for increased resource usage during restore operations.
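To correlate degradation with backups, you can tag metric samples that fall inside a known backup window. The window times below are hypothetical; read your actual schedule from the Backups tab in the console.

```javascript
// Sketch: check whether a timestamp falls inside a fixed daily UTC
// backup window (start inclusive, end exclusive).
function inBackupWindow(date, startHourUtc = 2, endHourUtc = 4) {
  const h = date.getUTCHours();
  return h >= startHourUtc && h < endHourUtc;
}

inBackupWindow(new Date("2024-01-01T02:30:00Z")); // -> true
inBackupWindow(new Date("2024-01-01T05:00:00Z")); // -> false
```

If slow-query spikes consistently land inside the window, backups are a likely contributor and scaling (or schedule review) is warranted.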

Maintenance operation best practices

  • Schedule index builds during low-traffic periods.
  • Use background index builds when possible.
  • Monitor replication lag during maintenance.
  • Test maintenance operations in non-production first.
  • Coordinate with IBM Cloud maintenance windows.