Step-by-Step BenLTscale Benchmarking: Real-World Example and Results
Overview
This article walks through a practical benchmarking run using BenLTscale (a hypothetical distributed load-testing tool), covering setup, test design, execution, analysis, and conclusions. Assumed defaults: a REST API endpoint that returns JSON, a target environment hosted on AWS, a peak of 100 concurrent users, roughly 30,000 total requests, and a 10-minute test duration.
Goals
- Primary: Measure throughput (RPS), average latency, and error rate under load.
- Secondary: Identify bottlenecks, validate autoscaling, and collect resource metrics (CPU, memory).
Test environment
- SUT (system under test): API server behind an Application Load Balancer (ALB), autoscaling group (2–10 instances), t3.medium equivalent.
- BenLTscale controller: single manager node (m5.large) coordinating workers.
- BenLTscale workers: 5 distributed workers (c5.large equivalent) in same region.
- Monitoring: Prometheus + Grafana for metrics, CloudWatch for autoscaling events, and server-side logs.
Test design
- Endpoint: POST /v1/orders (JSON payload ~2 KB).
- Authentication: Bearer token header.
- Warm-up: 60 seconds at 10% load.
- Ramp-up: linear increase to 100 concurrent users over 4 minutes.
- Steady state: maintain 100 concurrent users for 4 minutes.
- Ramp-down: 60 seconds.
- Total duration: ~10 minutes.
- Assertions: error rate <1%, 95th percentile latency <500 ms.
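The load profile above (warm-up, linear ramp-up, steady state, ramp-down) can be sketched as a function mapping elapsed time to target concurrency. This is an illustrative sketch, not BenLTscale internals; the phase boundaries follow the plan above:

```python
def target_users(t_s: float, peak: int = 100) -> int:
    """Target concurrent users at t_s seconds into the 10-minute plan.

    Phases: 60 s warm-up at 10% load, 240 s linear ramp-up,
    240 s steady state, 60 s ramp-down.
    """
    warmup, ramp_up, steady = 60, 240, 240
    if t_s < warmup:
        return peak // 10                        # warm-up: 10% of peak
    if t_s < warmup + ramp_up:
        frac = (t_s - warmup) / ramp_up          # linear ramp 10% -> 100%
        return round(peak * (0.1 + 0.9 * frac))
    if t_s < warmup + ramp_up + steady:
        return peak                              # steady state
    remaining = max(0.0, 600 - t_s)              # ramp-down over final 60 s
    return round(peak * remaining / 60)
```

A worker scheduler would poll this function every few seconds and spawn or retire virtual users to match the target.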
BenLTscale configuration (example)
- Test plan: 5 workers, each spawning 20 virtual users, total 100.
- Think time: random 200–500 ms between requests.
- Request timeout: 10s.
- Payload generator: fixed sample order JSON.
- Metrics export: push to Prometheus gateway every 5s.
Example BenLTscale YAML snippet:
```yaml
test:
  name: orders-load-test
  duration: 10m
  warmup: 1m
  ramp_up: 4m
  ramp_down: 1m
  workers: 5
  users_per_worker: 20
  request:
    endpoint: https://api.example.com/v1/orders
    method: POST
    headers:
      Authorization: "Bearer <token>"
    body_file: order_sample.json
    think_time: [200, 500]
  metrics:
    prometheus_push_interval: 5s
    export_tags: [instance_id, region]
```
Execution steps
- Provision BenLTscale controller and workers in same region as SUT.
- Upload test plan and payload to controller.
- Start Prometheus and Grafana dashboards; ensure CloudWatch export is enabled.
- Run a short smoke test (10 users, 1 minute) to validate authentication and payload.
- Execute the full test plan.
- Collect BenLTscale logs, Prometheus metrics, server logs, and autoscaling events.
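The smoke test in step 4 boils down to a pass/fail check on each response. A minimal sketch of that check, where the 2xx requirement and the latency threshold are assumptions drawn from the plan's assertions rather than BenLTscale features:

```python
def check_response(status: int, latency_ms: float,
                   max_latency_ms: float = 500) -> bool:
    """Pass/fail check applied to each smoke-test response:
    a 2xx status and latency under the assertion threshold."""
    return 200 <= status < 300 and latency_ms < max_latency_ms

# In a real run you would POST order_sample.json with the bearer token
# to the /v1/orders endpoint and feed each response's status and timing
# into check_response; (status, latency_ms) pairs below are illustrative.
sample = [(201, 180.0), (201, 230.5), (503, 90.0)]
passed = sum(check_response(s, lat) for s, lat in sample)
print(f"{passed}/{len(sample)} smoke requests passed")
```

If any smoke request fails, fix authentication or the payload before launching the full plan.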
Real-world results (example run)
Summary metrics:
- Total requests: ~30,000
- Average throughput: 100 RPS (during steady state)
- Average latency: 210 ms
- Median latency (50th): 180 ms
- 95th percentile latency: 470 ms
- 99th percentile latency: 820 ms
- Error rate: 0.8% (~240 errors, mostly 503s clustered around an instance scale-in)
- CPU average (instances): 68% during steady state
- Memory average: 54%
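Percentile figures like those above are computed from the raw per-request latencies rather than from averages. A minimal nearest-rank sketch (the sample data is illustrative, not from this run):

```python
import math

def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    a fraction q of the samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered))       # 1-based rank
    return ordered[max(rank, 1) - 1]

latencies = [float(ms) for ms in range(1, 101)]  # 1..100 ms, illustrative
print(percentile(latencies, 0.50))  # 50.0
print(percentile(latencies, 0.95))  # 95.0
print(percentile(latencies, 0.99))  # 99.0
```

Tail percentiles (p95/p99) surface problems that averages hide, which is why the assertions target p95 rather than mean latency.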
Grafana snapshot highlights:
- Smooth ramp-up in RPS with small spikes at 3:30 and 7:10.
- A latency spike correlated with a scale-in event at 6:45 that caused 503s for ~30 s.
- Request queue length briefly increased from 0 to 15 during scale event.
Analysis
- Performance met the 95th percentile latency goal (470 ms < 500 ms).
- 99th percentile exceeded target due to transient errors during autoscaling.
- Error rate under 1%, but the errors were concentrated around a scale-in event. The likely root cause is that graceful shutdown did not drain in-flight traffic before instances were terminated.
- CPU at 68% indicates healthy utilization; reducing instance size might risk higher latency under spikes.
Actionable recommendations
- Implement connection draining with a longer timeout during instance termination to avoid 503s.
- Add a brief cool-down before scale-in or adjust autoscaling policy to scale earlier using CPU + request queue metrics.
- Reduce think time slightly or increase worker count for more realistic pacing if production traffic has shorter intervals.
- Retest after changes, adding a longer steady-state run (30m) and higher peak concurrency to validate stability.
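For the first recommendation, "connection draining" on an ALB corresponds to the target group's deregistration delay. A hedged boto3 sketch; the target group ARN is a placeholder and the 120 s value is an assumption to tune per your traffic (the AWS default is 300 s):

```python
def raise_drain_timeout(target_group_arn: str, seconds: int = 120) -> None:
    """Raise the ALB target group's deregistration delay so in-flight
    requests can finish before a terminating instance is removed."""
    import boto3  # assumed available where this runs

    elbv2 = boto3.client("elbv2")
    elbv2.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=[{
            "Key": "deregistration_delay.timeout_seconds",
            "Value": str(seconds),
        }],
    )
```

Pair this with an instance-termination lifecycle hook so the instance stays registered until the delay elapses.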
Conclusion
This BenLTscale benchmark showed the system meets primary latency targets but revealed autoscaling-induced tail-latency issues. Apply connection draining and autoscaling tuning, then rerun the test with an extended steady state and higher load to confirm improvements.