
Amazon EBS latency: understanding, measuring, and fixing slow storage performance

Your database queries are crawling. Application logs show intermittent timeouts. Yet CPU utilization looks healthy; the time is being lost on the storage side. EBS latency doesn’t announce itself with clear error messages; it quietly degrades user experience, inflates infrastructure costs, and forces you to overprovision just to maintain acceptable performance.

Understanding EBS latency means understanding where your storage actually spends time. Matching IOPS limits, queue depth, burst credits, and EC2 instance capabilities to your workload’s I/O pattern requires systematic measurement and targeted remediation. This guide walks you through the CloudWatch metrics that reveal latency bottlenecks, the root causes behind them, and the step-by-step fixes that restore performance without wasting AWS spend.

What EBS latency actually measures

EBS latency is the time between when your application issues an I/O request and when that request completes. Unlike throughput (measured in MiB/s) or IOPS (operations per second), latency captures the real-world delay your users experience. A database query waiting 50 milliseconds for a single read operation feels sluggish, even if your volume delivers thousands of IOPS on paper.

Blackboard-style diagram of EBS latency: I/O request to EBS volume with queue depth and IOPS limit indicator.

Latency varies significantly by volume type. General Purpose SSD (gp3) targets single-digit millisecond latency for typical workloads, while Provisioned IOPS SSD (io2) is designed for consistently low, single-digit millisecond latency at high percentiles (sub-millisecond with io2 Block Express). Throughput Optimized HDD (st1) and Cold HDD (sc1) sacrifice latency for cost, often seeing tens of milliseconds per operation. That is acceptable for large sequential reads but painful for the random I/O patterns common in transactional databases.

The complication: latency isn’t static. It fluctuates based on queue depth, burst credit depletion, IOPS limits, and even the EC2 instance type you’ve chosen. A volume performing beautifully at 1,000 IOPS can suddenly lag when demand spikes to 5,000 IOPS if you haven’t provisioned enough capacity. Understanding these dynamics is crucial for maintaining consistent performance, which our EBS performance insights explore in depth.

CloudWatch metrics that expose EBS latency

AWS CloudWatch publishes several metrics that reveal EBS latency behavior. These metrics are uniquely defined by name, namespace, and dimensions, with data points aggregated over specified time periods. Understanding which metrics matter—and how to interpret them—separates effective troubleshooting from blind guesswork.

Blackboard-style chart of CloudWatch EBS metrics: latency stopwatch, IOPS (ops/sec) and throughput (MiB/s).

VolumeReadBytes and VolumeWriteBytes report the total bytes read from or written to your EBS volume during the period. Dividing by the period duration gives you throughput in bytes per second. If your application performs large sequential reads (like video streaming or data analytics), high throughput with acceptable latency indicates healthy performance. Low throughput paired with high latency suggests a bottleneck.

VolumeReadOps and VolumeWriteOps count the number of completed read and write operations in the specified period. To calculate IOPS, divide the total operations by the number of seconds: for basic monitoring (5-minute intervals), divide by 300; for detailed monitoring (1-minute intervals), divide by 60. CloudWatch metric math offers a cleaner approach with the formula m1/(DIFF_TIME(m1)) where m1 represents the graphed metric. Comparing observed IOPS against your provisioned limits tells you if you’re hitting a ceiling. A gp3 volume provisioned for 3,000 IOPS that consistently reports 3,000 IOPS is maxed out—any additional I/O requests will queue, increasing latency.
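
As a hedged sketch of that calculation with the AWS CLI, the metric math form can be run through get-metric-data; the volume ID and time range below are placeholders, and PERIOD(m1) is used here as an equivalent of the DIFF_TIME expression mentioned above:

aws cloudwatch get-metric-data \
  --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T06:00:00Z \
  --metric-data-queries '[
    {"Id": "m1", "ReturnData": false,
     "MetricStat": {"Metric": {"Namespace": "AWS/EBS", "MetricName": "VolumeWriteOps",
       "Dimensions": [{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}]},
       "Period": 300, "Stat": "Sum"}},
    {"Id": "write_iops", "Expression": "m1/PERIOD(m1)", "Label": "Average write IOPS"}
  ]'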

VolumeThroughputPercentage applies to Provisioned IOPS SSD (io1/io2) volumes and reports the percentage of provisioned IOPS actually delivered during the period. Sustained values below 100% under load mean the volume is not delivering its provisioned performance, often because the attached instance’s EBS bandwidth is the real bottleneck. VolumeConsumedReadWriteOps, also specific to Provisioned IOPS volumes, reports the total operations consumed, normalized to 256 KiB capacity units, so one large I/O counts as several operations. For gp2, st1, and sc1 volumes, the throttling signal to watch is BurstBalance, covered next.

VolumeQueueLength is one of the most direct indicators of latency trouble. It measures the number of pending I/O requests waiting to be serviced by your EBS volume. A consistently high queue length—say, above 10 for a gp3 or io2 volume—suggests your application is issuing I/O faster than the volume can handle. As the queue grows, each request waits longer, and average latency climbs. For databases and latency-sensitive applications, you want queue length to hover near 1 during normal load. Spikes during traffic bursts are expected, but sustained high queue length means you need to either increase provisioned IOPS or reduce the application’s I/O intensity.

BurstBalance (applicable only to gp2, st1, and sc1 volumes) tracks your remaining burst credits. These volume types allow short bursts above baseline performance by consuming credits. When credits run low, performance drops to the baseline, which can be significantly lower than burst capacity. A gp2 volume under 1 TiB might burst to 3,000 IOPS but sustain only a few hundred IOPS once credits deplete. This sudden performance cliff translates directly to increased latency. Monitoring BurstBalance helps you predict when performance will degrade. If BurstBalance trends toward zero during peak hours, migrating to gp3—which eliminates burst credits and provides consistent baseline performance—prevents latency spikes tied to credit exhaustion.

CloudWatch doesn’t publish a single “latency” metric by default, but you can derive average I/O size and infer latency characteristics by combining metrics. For example, on Nitro-based instances, the average read size formula is (Sum(VolumeReadBytes) / Sum(VolumeReadOps)) / 1024, which reveals whether your workload performs small random reads (typically higher latency per operation) or large sequential reads (lower latency per byte transferred). AWS also exposes latency histograms for certain volume types, showing the distribution of I/O completion times. These histograms reveal tail latency—the worst-case delays that affect a small percentage of requests but disproportionately impact user experience.
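
As a rough illustration, the same average-read-size formula can be scripted with the CLI. This is a sketch that assumes GNU date and bc are installed and uses a placeholder volume ID:

START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ); END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Sum of bytes read and read operations over the last hour
BYTES=$(aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name VolumeReadBytes \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time "$START" --end-time "$END" --period 3600 --statistics Sum \
  --query 'Datapoints[0].Sum' --output text)
OPS=$(aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time "$START" --end-time "$END" --period 3600 --statistics Sum \
  --query 'Datapoints[0].Sum' --output text)
# Average read size in KiB: (Sum(VolumeReadBytes) / Sum(VolumeReadOps)) / 1024
echo "scale=1; ($BYTES / $OPS) / 1024" | bc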

Common causes of high EBS latency

EBS latency spikes rarely have a single cause. They emerge from the interaction of workload behavior, volume configuration, EC2 instance limits, and architectural choices. Identifying the root cause requires correlating multiple signals.

Hitting provisioned IOPS or throughput limits is the most straightforward bottleneck. Every EBS volume type has defined IOPS and throughput ceilings. A gp3 volume defaults to 3,000 IOPS and 125 MiB/s, but you can provision up to 16,000 IOPS and 1,000 MiB/s independently. If your application consistently demands 10,000 IOPS but you’ve only provisioned 3,000, the excess I/O requests queue and wait, increasing latency. Check your CloudWatch metrics: if VolumeReadOps or VolumeWriteOps divided by the period duration approaches your provisioned IOPS, and VolumeQueueLength climbs, you’re saturating the volume. The fix is straightforward—provision more IOPS—but it comes with cost implications. Balancing performance and spend means provisioning just enough IOPS to keep queue length low during normal load, with headroom for traffic spikes.

Burst credit depletion affects legacy gp2 volumes and HDD-based st1/sc1 volumes. Gp2 volumes earn burst credits based on volume size. Smaller gp2 volumes (under 1 TiB) earn credits slowly; their baseline is 3 IOPS per GiB, with a floor of 100 IOPS. During intensive I/O, say a database rebuild or batch ETL job, you can burn through credits in minutes. Once depleted, performance drops to baseline, and latency jumps. If BurstBalance correlates inversely with latency, credits are your problem. Migrating to gp3 eliminates burst mechanics entirely, delivering predictable performance regardless of workload intensity.

Insufficient queue depth prevents your application from utilizing available IOPS. Queue depth is the number of outstanding I/O requests your application submits to the storage layer simultaneously. If your application issues one I/O request at a time and waits for each to complete before issuing the next, you’re effectively running at queue depth 1. This serialized pattern prevents the volume from reaching its full IOPS potential, leaving provisioned capacity idle while latency remains high due to round-trip delays. For high-IOPS workloads, you need queue depth proportional to your target IOPS. A rough guideline: queue depth should be at least your target IOPS divided by 1,000, but not so high that it overwhelms the volume. Tuning queue depth involves application-level changes (increasing concurrency or async I/O), OS-level tuning (adjusting I/O scheduler settings), and ensuring your EC2 instance can handle the parallelism.
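
To see the queue depth your application is actually achieving, iostat from the sysstat package is a quick check; the device name below is a placeholder for your attached EBS volume (on Nitro instances, EBS volumes appear as NVMe devices):

# aqu-sz (avgqu-sz on older sysstat) is the average queue depth the device sees;
# r_await/w_await show per-request latency in milliseconds, refreshed every 5 seconds
iostat -dx 5 nvme1n1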

EC2 instance EBS bandwidth limits create hidden ceilings. Your EBS volume’s performance doesn’t exist in isolation. The EC2 instance type determines the maximum EBS bandwidth available. A t3.medium instance, for example, tops out at 2,085 Mbps of EBS throughput (roughly 260 MB/s), and that figure is a burst limit rather than a sustained baseline. Even if you attach an io2 volume provisioned for 64,000 IOPS and 1,000 MiB/s, the instance bottleneck limits actual performance. Always verify your instance’s EBS bandwidth capacity matches or exceeds your volume’s provisioned throughput. Larger instances, and EBS-optimized instances in particular, provide dedicated bandwidth for storage, preventing network traffic from competing with storage I/O, as discussed in our AWS EC2 performance tuning guide.
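
One way to check those instance-level ceilings is the describe-instance-types API; the instance types below are placeholders, and the query fields follow the EbsOptimizedInfo structure the API returns:

aws ec2 describe-instance-types --instance-types t3.medium c5.2xlarge --output table \
  --query 'InstanceTypes[].[InstanceType, EbsInfo.EbsOptimizedInfo.BaselineThroughputInMBps, EbsInfo.EbsOptimizedInfo.MaximumThroughputInMBps, EbsInfo.EbsOptimizedInfo.MaximumIops]'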

Cross-AZ latency and placement introduce network overhead. EBS volumes exist within a single Availability Zone and must be attached to instances in the same AZ for optimal performance. If you’re accessing data across AZs—perhaps through an application-level replication layer—network latency adds to storage latency. While AWS’s network fabric is fast, cross-AZ hops introduce a few milliseconds of overhead per request. Co-locating your EC2 instance and EBS volume in the same AZ minimizes this overhead. For applications requiring low tail latency, placement groups can further reduce network jitter, though they don’t directly affect storage latency.

Noisy neighbors and resource contention, while rare, can cause unexplained spikes. Although AWS abstracts much of the underlying infrastructure, EBS volumes share physical hardware. During periods of high demand on the shared substrate, you may experience temporary latency increases. These effects are uncommon with modern Provisioned IOPS volumes (io2), which offer 99.999% durability and more consistent latency guarantees. If you observe unexplained latency spikes that don’t correlate with your own workload or CloudWatch metrics, consider testing at different times or moving to a higher-tier volume type designed for predictable performance.

Step-by-step troubleshooting workflow

When latency becomes a problem, a methodical approach prevents wasted effort. This workflow isolates the cause and applies targeted fixes.

Blackboard-style comparison of queue length 1 vs 10 showing lower vs higher EBS latency.

Establish your baseline. Before you can identify abnormal latency, you need to know what “normal” looks like for your workload. Use CloudWatch to review VolumeReadOps, VolumeWriteOps, VolumeQueueLength, and VolumeThroughputPercentage over the past week. Note typical IOPS levels, average queue length, and throughput during both peak and off-peak hours. This baseline tells you whether your current issue is a sudden anomaly or a gradual degradation. It also helps you avoid overprovisioning—paying for 16,000 IOPS when your workload consistently uses only 2,000.

Check IOPS and throughput utilization. Compare your observed IOPS (calculated from VolumeReadOps and VolumeWriteOps) against your provisioned limits. If you’re at or near 100% utilization, you’ve found your bottleneck. Compare throughput derived from VolumeReadBytes and VolumeWriteBytes against the provisioned throughput limit as well; some workloads hit throughput limits before IOPS limits, especially large sequential I/O patterns. On io1/io2 volumes, a VolumeThroughputPercentage that dips below 100% under load is another sign of throttling. Use CloudWatch metric math to create custom views that divide total ops by period duration for a clear IOPS-per-second value. This makes it easier to spot patterns and correlate with application logs.

Monitor VolumeQueueLength. A high queue length is the smoking gun for capacity issues. If queue length consistently exceeds 10 and correlates with user-reported slowness, your volume can’t keep up with demand. The solution depends on the cause: if you’re hitting IOPS limits, provision more IOPS; if you’re at your EC2 instance’s bandwidth limit, upgrade the instance. Queue length also reveals whether the problem is sustained or bursty. Occasional spikes during traffic peaks may be acceptable if they resolve quickly. Sustained high queue length demands immediate action.
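
A quick way to pull recent queue-length behavior from the CLI (placeholder volume ID; assumes GNU date):

aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name VolumeQueueLength \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time "$(date -u -d '3 hours ago' +%Y-%m-%dT%H:%M:%SZ)" --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Average Maximum \
  --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average,Maximum]' --output table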

Verify burst credits if applicable. For gp2, st1, or sc1 volumes, check BurstBalance. If balance hovers near zero during the latency issue, burst credit depletion is your culprit. The quickest fix is migrating to gp3, which offers consistent baseline performance without burst mechanics. The alternative—increasing volume size to earn more credits—is less cost-effective and still subject to burst exhaustion under heavy load.

Review EC2 instance bandwidth and optimize settings. Navigate to your instance type’s specifications and confirm the maximum EBS bandwidth. If your volume’s provisioned throughput exceeds the instance’s capability, you’re bottlenecked at the instance level. Upgrading to a larger instance or one with EBS-optimized bandwidth eliminates this constraint. Also verify that EBS-optimized mode is enabled for your instance. Older instance types require explicitly enabling this feature to gain dedicated storage bandwidth. On modern Nitro-based instances, it’s enabled by default, but confirming prevents overlooking a simple fix.
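
A minimal check of whether EBS optimization is enabled on a given instance (the instance ID is a placeholder); on older instance families the attribute may only be changeable while the instance is stopped:

aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 --attribute ebsOptimized
# Enable it on instance types where it is optional
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --ebs-optimized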

Test with synthetic workloads. If CloudWatch metrics don’t clearly point to a bottleneck, run controlled tests using fio (Flexible I/O Tester). This tool lets you simulate various I/O patterns and measure latency directly. For example, a random read test with a queue depth matching your provisioned IOPS should achieve latencies near the volume type’s expected range. Compare fio results against your provisioned IOPS and instance bandwidth. Significant discrepancies indicate misconfiguration or unrealistic expectations.
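
A hedged fio example, run against a scratch file on the target volume rather than the raw device; the path, block size, and iodepth are placeholders to adapt to your workload:

# 60-second random-read test at queue depth 32; check the clat percentiles in the output
fio --name=randread-test --filename=/data/fio-testfile --size=4G \
    --rw=randread --bs=16k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based --group_reporting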

Correlate with application-level metrics. CloudWatch reveals storage-layer performance, but application-level observability is essential for understanding user impact. Use distributed tracing (AWS X-Ray) or APM tools to identify which application requests trigger high-latency storage I/O. You may discover that a slow query or inefficient code path generates excessive I/O, amplifying latency problems. Combining cloud native application monitoring with CloudWatch metrics gives you the full picture—from user request to storage operation.

Remediation strategies for reducing EBS latency

Once you’ve identified the root cause, apply the appropriate fix. Some solutions are quick configuration changes; others require architectural adjustments.

Increase provisioned IOPS or throughput when you’re hitting volume limits. For gp3, you can independently adjust IOPS and throughput without changing volume size using the AWS CLI:

aws ec2 modify-volume --volume-id vol-xxxxxx --iops 10000 --throughput 500

Monitor the modification state until it reaches “optimizing” and then “completed.” Performance improvements are often immediate, but the volume remains online throughout the process. This approach incurs additional cost—each provisioned IOPS adds to your monthly bill—so provision only what your workload requires, with a buffer for spikes.
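
You can watch that modification state from the CLI as well (placeholder volume ID):

aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0 \
  --query 'VolumesModifications[].[ModificationState,Progress,TargetIops,TargetThroughput]' --output table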

Migrate from gp2 to gp3 if burst credit depletion is driving latency spikes. Gp3 provides 3,000 IOPS and 125 MiB/s as a baseline regardless of volume size, and you can scale IOPS and throughput independently. This eliminates the performance cliff associated with gp2 burst credit exhaustion. The migration is straightforward: modify the existing volume’s type from gp2 to gp3 using the AWS console or CLI. Many workloads see immediate latency improvements and cost reductions, as gp3 is typically ~20% cheaper per GiB than gp2.
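
A minimal sketch of the conversion with the CLI, using a placeholder volume ID; IOPS and throughput above the gp3 baseline can be set in the same call or adjusted later:

aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3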

Upgrade to io2 for mission-critical workloads that demand single-digit millisecond latency at high percentiles. Io2 volumes support up to 64,000 IOPS and 1,000 MiB/s per volume, with 99.999% durability and consistent low latency. The trade-off is cost: io2 charges per GiB plus per provisioned IOPS. It’s justified for transactional databases (OLTP), high-frequency trading platforms, or any workload where tail latency directly impacts revenue or user experience. Understanding AWS performance SLAs helps you determine when io2’s guarantees are worth the investment.

Tune queue depth and I/O patterns at the application layer. If your workload runs at queue depth 1, increasing concurrency can dramatically improve IOPS utilization and reduce effective latency. This often requires application changes: batching queries, enabling asynchronous I/O, or parallelizing data access. At the OS level, choosing a lighter I/O scheduler (none or mq-deadline on modern multiqueue kernels, noop or deadline on older ones) can reduce latency for SSD-backed volumes. Also adjust read-ahead settings and mount options (like noatime to reduce unnecessary writes) to optimize performance.
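
A sketch of those OS-level checks on a Nitro instance; the device name and mount point are placeholders, and changes should be validated before persisting them in /etc/fstab:

# Show the active scheduler (in brackets); 'none' is typically fine for NVMe-attached EBS
cat /sys/block/nvme1n1/queue/scheduler
echo mq-deadline | sudo tee /sys/block/nvme1n1/queue/scheduler
# Remount with noatime to avoid access-time writes on every read
sudo mount -o remount,noatime /data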

Right-size EC2 instances for EBS bandwidth. If your EC2 instance is the bottleneck, upgrading to a larger instance type or enabling EBS optimization provides more bandwidth. Within the c5 family, for example, a c5.large offers roughly 650 Mbps of baseline EBS bandwidth, while a c5.9xlarge provides 9,500 Mbps of dedicated bandwidth, raising the storage throughput ceiling by more than an order of magnitude. Consider whether your workload benefits from Nitro-based instances, which offer lower latency and higher throughput for EBS operations.

Implement multi-volume striping (RAID 0) when a single volume can’t meet your needs. If even maximum provisioning falls short, you can stripe multiple EBS volumes together using software RAID. This aggregates IOPS and throughput across volumes, effectively multiplying capacity. For example, striping four gp3 volumes each provisioned for 10,000 IOPS yields a 40,000 IOPS logical volume. The downside is increased complexity (more volumes to manage, higher risk if one fails) and cost. It’s a solution for extreme performance requirements, not a general-purpose recommendation.
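
A minimal striping sketch with mdadm, assuming four identical gp3 volumes are already attached; device names and the mount point are placeholders:

# RAID 0 has no redundancy: losing any member volume loses the whole array
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
sudo mkfs.xfs /dev/md0
sudo mkdir -p /data && sudo mount /dev/md0 /data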

Use caching layers to reduce I/O demand. Not all I/O needs to hit EBS. Implementing an in-memory cache (Amazon ElastiCache, application-level caching) can absorb read-heavy workloads, reducing the number of I/O requests that reach your EBS volume. This lowers queue length, reduces IOPS consumption, and improves overall latency. For write-heavy workloads, consider batching writes or using asynchronous replication to smooth out I/O spikes. These architectural changes require careful design but can yield significant latency and cost improvements, as explored in our guide to cloud latency reduction techniques.

Continuous monitoring and optimization

Fixing EBS latency isn’t a one-time task. Workloads evolve, traffic patterns shift, and what works today may not suffice next quarter. Continuous monitoring and automated optimization prevent latency from creeping back.

Set up CloudWatch alarms for proactive alerting. Configure CloudWatch alarms to notify you when VolumeQueueLength exceeds acceptable thresholds, when IOPS or throughput utilization approaches your provisioned limits, or when BurstBalance (for gp2) drops below a critical level. Proactive alerts let you address issues before they impact users. For example, set an alarm if queue length stays above 10 for more than five consecutive minutes during business hours. This triggers investigation while the issue is still manageable, rather than after users start reporting slow response times.
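
A hedged CLI version of that queue-length alarm; the volume ID and SNS topic ARN are placeholders:

aws cloudwatch put-metric-alarm --alarm-name ebs-queue-length-high \
  --namespace AWS/EBS --metric-name VolumeQueueLength \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --statistic Average --period 60 --evaluation-periods 5 \
  --threshold 10 --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts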

Regularly review CloudWatch dashboards. Create custom CloudWatch dashboards that display EBS metrics alongside application-level metrics (query latency, error rates, user session duration). This holistic view reveals how storage performance affects business outcomes. Schedule monthly or quarterly reviews to identify trends—gradual increases in IOPS utilization, shifting workload patterns, or opportunities to downsize over-provisioned volumes. These practices align with cloud performance benchmarking methodologies that establish baselines and track improvements over time.

Automate right-sizing and cost optimization. Manual tuning is tedious and error-prone. Automate volume modifications using AWS Lambda functions triggered by CloudWatch metrics or scheduled reviews. For instance, a Lambda function can analyze utilization data and automatically adjust IOPS provisioning within predefined boundaries. Hykell’s automated cloud cost optimization services take this further, continuously analyzing your EBS usage and applying best practices without manual intervention. By correlating performance metrics with cost data, you can optimize for both latency and spend—achieving sub-10ms response times without overprovisioning.
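
If you roll your own automation first, a rough illustrative sketch (not a production policy) might look like the scheduled script below, which raises gp3 IOPS when the daily average crosses 80% of the provisioned value, capped at 16,000. All IDs and thresholds are placeholders, and a daily average will hide short bursts:

VOL=vol-0123456789abcdef0
PROVISIONED=$(aws ec2 describe-volumes --volume-ids "$VOL" --query 'Volumes[0].Iops' --output text)
START=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ); END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
READS=$(aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value="$VOL" --start-time "$START" --end-time "$END" \
  --period 86400 --statistics Sum --query 'Datapoints[0].Sum' --output text)
WRITES=$(aws cloudwatch get-metric-statistics --namespace AWS/EBS --metric-name VolumeWriteOps \
  --dimensions Name=VolumeId,Value="$VOL" --start-time "$START" --end-time "$END" \
  --period 86400 --statistics Sum --query 'Datapoints[0].Sum' --output text)
AVG_IOPS=$(( (${READS%.*} + ${WRITES%.*}) / 86400 ))   # daily average, hides bursts
if [ "$AVG_IOPS" -gt $(( PROVISIONED * 80 / 100 )) ] && [ "$PROVISIONED" -lt 16000 ]; then
  aws ec2 modify-volume --volume-id "$VOL" --iops $(( PROVISIONED * 120 / 100 ))
fi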

Integrate EBS optimization into your broader performance strategy. EBS latency doesn’t exist in isolation. It interacts with EC2 instance performance, network configuration, application architecture, and database query patterns. Effective optimization requires a comprehensive approach that addresses each layer, as outlined in our framework for cloud resource utilization analysis. Consider how EBS performance fits into your overall strategy: reducing storage latency by 20ms matters less if your application spends 500ms on an inefficient database query.

Balancing performance and cost

Every IOPS you provision, every instance upgrade, and every volume type change carries a cost. The goal isn’t to achieve the lowest possible latency at any expense—it’s to deliver the performance your users require while controlling spend.

Start by defining acceptable latency thresholds for your application. A real-time trading platform may require sub-5ms latency at the 99th percentile; a batch ETL job may tolerate 50ms without issue. Provisioning io2 volumes for the ETL job wastes money; underprovisioning the trading platform risks losing revenue. Use CloudWatch metrics to measure actual latency, not assumptions. Provision just enough capacity to meet your SLAs with a reasonable buffer for traffic spikes. Regularly review utilization—if you’re consistently using only 60% of provisioned IOPS, downsize and save the difference.

Hykell specializes in this balance. Our automated optimization identifies underutilized EBS volumes, recommends right-sized configurations, and implements changes on autopilot—all while monitoring performance to ensure no degradation. This cost-aware approach to EBS latency management aligns with the FinOps framework, turning cloud spending into a strategic investment rather than a runaway expense.

Achieving predictable EBS performance

Amazon EBS latency emerges from the interplay of volume type, provisioned capacity, instance bandwidth, burst credits, queue depth, and application behavior. Effective troubleshooting starts with understanding the CloudWatch metrics that reveal bottlenecks: VolumeQueueLength, IOPS and throughput utilization, and burst balance. Once you’ve identified the root cause, apply targeted remediation: provision more IOPS, migrate to gp3, upgrade to io2, tune queue depth, or right-size your EC2 instance.

Don’t stop at a one-time fix. Continuous monitoring, automated alerting, and regular reviews ensure latency stays within acceptable bounds as your workload evolves. Remember, optimization isn’t just about performance—it’s about maximizing value. Overprovisioned IOPS waste money; underprovisioned volumes frustrate users. The right approach balances latency, cost, and reliability.

Ready to optimize your EBS performance without the guesswork? Hykell’s automated cloud cost optimization analyzes your storage utilization, identifies savings opportunities, and implements best practices to reduce your AWS costs by up to 40%—all while keeping latency in check. Discover how much you can save and let us handle the heavy lifting so you can focus on building great applications.