IBM Cloud Docs
Troubleshooting performance for Databases for MongoDB

Use this guide to identify and resolve performance issues in your IBM Cloud Databases for MongoDB deployment.

If your applications are experiencing slow responses, timeouts, or inconsistent database performance, work through the following steps and information.

Symptoms of performance issues

You might observe some of the following symptoms that indicate problems with performance:

  • Increased application latency
  • Slow query log entries
  • High CPU or memory utilization
  • Increased disk latency
  • Replication lag
  • Connection timeouts

Complete the following steps to determine the cause of the issues:

Step 1: Check resource utilization

  1. Log in to the IBM Cloud console and navigate to your MongoDB deployment.

  2. Review the Monitoring section for:

    • CPU utilization
    • Memory usage
    • Disk IOPS and latency
    • Active connections

What to look for:

  • CPU consistently above 75%
  • Memory consistently above 80%
  • Disk latency increasing over time
  • Connections approaching plan limits

Recommended actions:

  • Increase storage or IOPS if disk latency is high.
  • Review workload spikes in your application.

If resource usage remains elevated for sustained periods, scale your deployment.
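The thresholds above can be applied programmatically to a metrics snapshot. The following sketch assumes an illustrative field layout (`cpuPercent`, `connectionLimit`, and so on); the actual IBM Cloud monitoring export uses its own schema, so adapt the field names accordingly.

```javascript
// Sketch: flag the resource-pressure conditions listed above from a
// metrics snapshot. The field names are illustrative assumptions,
// not the exact IBM Cloud monitoring API schema.
function flagResourcePressure(metrics) {
  const flags = [];
  if (metrics.cpuPercent > 75) flags.push("cpu");
  if (metrics.memoryPercent > 80) flags.push("memory");
  // "increasing over time" approximated as 2x a recorded baseline
  if (metrics.diskLatencyMs > metrics.baselineDiskLatencyMs * 2) flags.push("disk-latency");
  // "approaching plan limits" approximated as > 90% of the limit
  if (metrics.connections / metrics.connectionLimit > 0.9) flags.push("connections");
  return flags;
}

// Hypothetical snapshot:
const flags = flagResourcePressure({
  cpuPercent: 82,
  memoryPercent: 71,
  diskLatencyMs: 14,
  baselineDiskLatencyMs: 5,
  connections: 185,
  connectionLimit: 200,
});
// flags -> ["cpu", "disk-latency", "connections"]
```

The 2x-baseline and 90%-of-limit cutoffs are illustrative choices; tune them to your own alerting policy.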

Step 2: Identify slow queries

Slow queries are one of the most common causes of degraded performance.

  1. Enable profiling:

    db.setProfilingLevel(1, { slowms: 100 })
    
  2. Review recent slow operations:

    db.system.profile.find().sort({ ts: -1 }).limit(20)
    
  3. Analyze query execution:

    db.collection.find({ ... }).explain("executionStats")
    

What to look for:

  • COLLSCAN (collection scan instead of index usage)
  • High totalDocsExamined compared to nReturned

Recommended actions:

  • Create appropriate indexes.
  • Use compound indexes for multi-field queries.
  • Ensure aggregation pipelines begin with $match.
  • Avoid large skip() pagination.
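The two warning signs above can be checked mechanically against an `explain("executionStats")` result. The following sketch uses the standard explain output shape; the 10x docs-examined-to-returned threshold is an illustrative assumption.

```javascript
// Sketch: scan an explain("executionStats") result for a COLLSCAN
// stage and for a high totalDocsExamined-to-nReturned ratio.
function analyzeExplain(explainResult) {
  const stats = explainResult.executionStats;
  const plan = explainResult.queryPlanner.winningPlan;
  const issues = [];
  // COLLSCAN can appear at any depth of the winning plan tree
  if (JSON.stringify(plan).includes('"COLLSCAN"')) issues.push("collection scan");
  if (stats.totalDocsExamined > 10 * Math.max(stats.nReturned, 1)) {
    issues.push("examines far more docs than it returns");
  }
  return issues;
}

// Hypothetical explain output for an unindexed query:
const issues = analyzeExplain({
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 20, totalDocsExamined: 50000 },
});
// issues -> ["collection scan", "examines far more docs than it returns"]
```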

Step 3: Review connection usage

High or poorly managed connections can impact performance.

Check connection statistics:

db.serverStatus().connections

Recommended actions:

  • Use connection pooling in your application.
  • Avoid opening a new connection for each request.
  • Close unused cursors.

Connection limits are determined by your deployment plan.
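To gauge how close you are to that limit, you can derive utilization from the `db.serverStatus().connections` document, whose documented shape includes `current`, `available`, and `totalCreated`. A minimal sketch:

```javascript
// Sketch: compute connection headroom from db.serverStatus().connections.
function connectionUtilization(conn) {
  const limit = conn.current + conn.available; // effective limit for this node
  return { limit, usedPercent: Math.round((conn.current / limit) * 100) };
}

// Hypothetical serverStatus output:
const usage = connectionUtilization({ current: 150, available: 50, totalCreated: 98000 });
// usage -> { limit: 200, usedPercent: 75 }
```

A rapidly growing `totalCreated` alongside a modest `current` count often indicates that the application is opening a new connection per request instead of pooling.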

Step 4: Check replication health

Replication lag can affect read performance and data freshness.

Check replication status:

rs.printSecondaryReplicationInfo()

Common causes of lag:

  • High write throughput
  • Disk bottlenecks
  • Network latency

Recommended actions:

  • Scale storage performance.
  • Review write concern settings.
  • Scale to a higher plan if lag is persistent.
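If you prefer to compute lag yourself rather than read the formatted `rs.printSecondaryReplicationInfo()` output, you can derive it from the `members` array of `rs.status()`, comparing each secondary's `optimeDate` to the primary's. A sketch with hypothetical member data:

```javascript
// Sketch: per-secondary replication lag in seconds, from rs.status().members.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  return members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      host: m.name,
      // Date subtraction yields milliseconds
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

const lag = replicationLagSeconds([
  { name: "m0", stateStr: "PRIMARY", optimeDate: new Date("2024-01-01T00:00:30Z") },
  { name: "m1", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:28Z") },
]);
// lag -> [{ host: "m1", lagSeconds: 2 }]
```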

Step 5: Sharded cluster considerations (if applicable)

You might need sharding in the following situations:

  • Working set is greater than RAM
  • Single-node IOPS maxed out even after scaling
  • Horizontal write scaling is required
  • Collections exceed 1–2 TB

For more information, see performance tuning and sharding.

If your deployment uses sharding, run:

sh.status()

Check for:

  • Uneven chunk distribution
  • Jumbo chunks
  • Traffic concentrated on a single shard

Recommended actions:

  • Review shard key selection.
  • Avoid monotonically increasing shard keys.
  • Consider hashed shard keys.

Improper shard key selection can significantly affect performance at scale.
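Uneven chunk distribution can be quantified from the chunks-per-shard tally that `sh.status()` reports. The following sketch uses a 1.5x max-to-min skew threshold, which is an illustrative choice rather than a MongoDB default:

```javascript
// Sketch: check chunk balance across shards from a chunks-per-shard tally.
function chunkSkew(chunksPerShard) {
  const counts = Object.values(chunksPerShard);
  const max = Math.max(...counts);
  const min = Math.min(...counts);
  return { max, min, balanced: max <= 1.5 * Math.max(min, 1) };
}

// Hypothetical tally: shard2 carries far fewer chunks than its peers.
const skew = chunkSkew({ shard0: 120, shard1: 118, shard2: 40 });
// skew -> { max: 120, min: 40, balanced: false }
```

Note that even chunk counts do not guarantee even traffic; a monotonically increasing shard key still concentrates writes on one shard.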

Step 6: After large data deletions

Deleting a significant percentage of data does not immediately reduce disk usage at the operating system level.

Possible impacts:

  • Internal fragmentation
  • High disk utilization
  • Reduced performance

Recommended actions:

  • Plan compaction operations carefully.
  • Consider dump and restore for severe fragmentation.
  • Keep disk utilization below 80–85%.

Schedule maintenance activities appropriately.

Step 7: Check for lock contention

Lock contention can severely impact concurrent operations and overall throughput.

  • Check global lock statistics:

    db.serverStatus().locks
    
  • Check current operations for locks:

    db.currentOp({
      $or: [
        { waitingForLock: true },
        { "locks.Global": "w" }
      ]
    })
    
  • Analyze lock wait time:

    db.serverStatus().globalLock
    

What to look for:

  • High currentQueue values (readers or writers).
  • Operations with waitingForLock: true.
  • Long-running operations holding locks.
  • Index builds that block operations.

Common causes:

  • Long-running queries without proper indexes.
  • Large write operations.
  • Index builds on large collections.
  • Administrative commands (for example, compact).

Recommended actions:

  • Kill long-running operations if necessary:
    db.killOp(opid)
    
  • Build indexes with minimal blocking. MongoDB 4.2+ index builds hold exclusive locks only briefly at the start and end of the build; on earlier versions, use the background option:
    db.collection.createIndex({ field: 1 }, { background: true })
    
  • Break large operations into smaller batches.
  • Schedule maintenance operations during low-traffic periods.
  • Use read concern and write concern appropriately.

Step 8: Analyze workload patterns

Understanding your workload patterns helps identify optimization opportunities.

  • Check operation counters:

    db.serverStatus().opcounters
    
  • Check replicated operation counters:

    db.serverStatus().opcountersRepl
    
  • Identify hot collections:

    db.adminCommand({ top: 1 })
    
  • Check the read ratio compared to the write ratio:

    var stats = db.serverStatus().opcounters;
    print("Read ratio: " + (stats.query + stats.getmore) / (stats.query + stats.getmore + stats.insert + stats.update + stats.delete));
    

What to look for:

  • Disproportionate operations on specific collections
  • High read-to-write or write-to-read ratios
  • Sudden spikes in operation counts
  • Time-based patterns (peak hours)

Recommended actions:

  • Optimize frequently accessed collections first.
  • Consider read replicas for read-heavy workloads.
  • Use appropriate read preferences.
  • Implement caching for frequently read data.
  • Review indexing strategy for hot collections.
  • Consider sharding for write-heavy collections.
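The inline ratio calculation above can be generalized into a small helper that summarizes `db.serverStatus().opcounters` into read and write shares:

```javascript
// Sketch: summarize opcounters into a read/write mix. Reads are counted
// as query + getmore; writes as insert + update + delete; the command
// counter is excluded because it mixes both kinds of work.
function workloadMix(op) {
  const reads = op.query + op.getmore;
  const writes = op.insert + op.update + op.delete;
  const total = reads + writes;
  return {
    readPercent: Math.round((reads / total) * 100),
    writePercent: Math.round((writes / total) * 100),
  };
}

// Hypothetical counters for a read-heavy deployment:
const mix = workloadMix({ query: 700, getmore: 100, insert: 100, update: 80, delete: 20, command: 500 });
// mix -> { readPercent: 80, writePercent: 20 }
```

A read share this high suggests secondaries and caching will pay off; a high write share points toward indexing review and, eventually, sharding.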

Step 9: Investigate memory pressure and cache efficiency

MongoDB's WiredTiger storage engine relies heavily on cache efficiency.

  • Check WiredTiger cache statistics:

    db.serverStatus().wiredTiger.cache
    
  • Review key metrics:

    var cache = db.serverStatus().wiredTiger.cache;
    print("Cache size: " + cache["bytes currently in the cache"]);
    print("Max cache size: " + cache["maximum bytes configured"]);
    print("Pages read into cache: " + cache["pages read into cache"]);
    print("Pages written from cache: " + cache["pages written from cache"]);
    print("Cache hit ratio: " + (1 - cache["pages read into cache"] / cache["pages requested from the cache"]));
    
  • Check for eviction pressure:

    db.serverStatus().wiredTiger.cache["pages evicted by application threads"]
    

What to look for:

  • Cache hit ratio below 95%
  • High eviction rates
  • Cache size consistently at maximum
  • Application threads performing evictions

Check dirty data in the cache:

db.serverStatus().wiredTiger.cache["tracked dirty bytes in the cache"]

Dirty bytes track modified data that is not yet flushed; sustained high values indicate that checkpoints and eviction are not keeping up with the write load.

Recommended actions:

  • Scale to a plan with more memory if the cache is consistently full.
  • Review and optimize indexes (remove unused indexes).
  • Limit result set sizes in queries.
  • Use projections to reduce document size.
  • Consider archiving old data.
  • Monitor working set size trends.
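The checks above can be combined into one helper over the `db.serverStatus().wiredTiger.cache` document. The statistic key names below match WiredTiger's output; the 95% threshold echoes the guidance above.

```javascript
// Sketch: derive cache hit ratio and fill level from WiredTiger stats.
// Hit ratio: "pages read into cache" counts misses, "pages requested
// from the cache" counts all requests.
function cacheHealth(cache) {
  const requested = cache["pages requested from the cache"];
  const read = cache["pages read into cache"];
  const hitRatio = 1 - read / requested;
  return {
    hitRatio,
    fillPercent: Math.round((cache["bytes currently in the cache"] / cache["maximum bytes configured"]) * 100),
    healthy: hitRatio >= 0.95,
  };
}

// Hypothetical statistics for a 1 GiB cache:
const health = cacheHealth({
  "pages requested from the cache": 1000000,
  "pages read into cache": 20000,
  "bytes currently in the cache": 900 * 1024 * 1024,
  "maximum bytes configured": 1024 * 1024 * 1024,
});
// health -> hitRatio ~0.98, fillPercent 88, healthy true
```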

Memory allocation best practices

  • By default, the WiredTiger cache size is the larger of 50% of (RAM - 1 GB) or 256 MB.
  • Leave sufficient memory for other processes.
  • Monitor swap usage, which should be minimal.

Step 10: Review write concern and read preference settings

Write concern and read preference settings significantly impact performance and consistency.

  • Check current write concern:

    db.getWriteConcern()
    
  • Check replica set configuration:

    rs.conf()
    
  • Write concern options:

    Write concern    Durability   Performance  Use case
    w: 1             Low          High         Non-critical data, high throughput
    w: "majority"    High         Medium       Default, balanced approach
    w: <number>      Medium-High  Medium-Low   Specific replica count
    j: true          Highest      Lowest       Critical data requiring journal sync
  • Read preference options:

    Read preference      Consistency  Performance  Use case
    primary              Highest      Medium       Default, strong consistency
    primaryPreferred     High         Medium-High  Fallback to secondary
    secondary            Eventual     High         Analytics, reporting
    secondaryPreferred   Eventual     High         Read scaling
    nearest              Eventual     Highest      Lowest latency
  • Check read preference in your application:

    // Example in the Node.js driver: pass readPreference in the options object
    db.collection('users').find({}, { readPreference: 'secondary' })
    

What to look for:

  • Overly strict write concerns for non-critical data
  • Using primary read preference when eventual consistency is acceptable
  • Not leveraging secondaries for read-heavy workloads

Recommended actions:

  • Use w: 1 for high-throughput, non-critical writes.
  • Use w: "majority" for important data (default).
  • Use secondary or secondaryPreferred for analytics queries.
  • Consider nearest for geographically distributed applications.
  • Balance consistency requirements with performance needs.
  • Test different configurations under load.
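The tables and recommendations above can be condensed into a settings picker. The returned option objects mirror the MongoDB driver's `writeConcern` and `readPreference` options, but the workload categories and the mapping itself are an illustrative policy, not an official recommendation:

```javascript
// Sketch: map a workload category to driver-level consistency settings.
function settingsFor(workload) {
  switch (workload) {
    case "critical-write": // durability over throughput
      return { writeConcern: { w: "majority", j: true }, readPreference: "primary" };
    case "high-throughput-write": // throughput over durability
      return { writeConcern: { w: 1 }, readPreference: "primary" };
    case "analytics-read": // offload reads, eventual consistency is fine
      return { writeConcern: { w: "majority" }, readPreference: "secondaryPreferred" };
    default: // balanced defaults
      return { writeConcern: { w: "majority" }, readPreference: "primary" };
  }
}

// e.g. pass the result as options when creating a collection handle:
//   client.db("app").collection("orders", settingsFor("critical-write"))
```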

Step 11: Monitor backup and maintenance impact

Backup operations and maintenance tasks can temporarily affect performance.

IBM Cloud backup schedule

Databases for MongoDB takes automatic backups of your deployment. Check your backup schedule in the IBM Cloud console under Backups.

Check for ongoing backup operations:

db.currentOp({ op: "command", "command.backup": { $exists: true } })

What to look for:

  • Performance degradation during backup windows
  • Increased disk I/O during backups
  • Replication lag during backups

Recommended actions:

  • Monitor performance metrics during backup times.
  • Consider scaling if backups consistently impact performance.
  • Review backup retention policies.
  • Plan for increased resource usage during restore operations.
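To correlate degradation with backups, you can tag metric samples that fall inside a known backup window. The window times below are hypothetical; read your actual schedule from the Backups tab in the console.

```javascript
// Sketch: check whether a timestamp falls inside a fixed daily UTC
// backup window (start inclusive, end exclusive).
function inBackupWindow(date, startHourUtc = 2, endHourUtc = 4) {
  const h = date.getUTCHours();
  return h >= startHourUtc && h < endHourUtc;
}

inBackupWindow(new Date("2024-01-01T02:30:00Z")); // -> true
inBackupWindow(new Date("2024-01-01T05:00:00Z")); // -> false
```

If slow-query spikes consistently land inside the window, backups are a likely contributor and scaling (or schedule review) is warranted.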

Maintenance operation best practices

  • Schedule index builds during low-traffic periods.
  • Use background index builds when possible.
  • Monitor replication lag during maintenance.
  • Test maintenance operations in non-production first.
  • Coordinate with IBM Cloud maintenance windows.