High-Performance Load Testing Suite for Web Durability and Speed
Resilio is a professional-grade performance engineering toolkit designed for QA Engineers, Developers, and DevOps practitioners. It provides a structured, technology-agnostic methodology to measure the speed, endurance, and scalability of web applications and APIs.
By leveraging the reliability of ApacheBench and adding layers of statistical analysis, automated hypothesis testing, and research-based methodologies, Resilio transforms raw network data into high-fidelity performance intelligence.
- Research-Based Methodology: Implements ISO 25010 standards and academic frameworks (Jain, 1991; Welch, 1947; Mann & Whitney, 1947)
- Advanced Statistical Testing: Automatic selection between parametric (Welch's t-test) and non-parametric (Mann-Whitney U) methods
- Intelligent Test Selection: Automatically chooses the best statistical test based on data distribution
- Technology-Agnostic: Tests any web application via HTTP protocol (PHP, Node.js, Python, Go, Java, Ruby, .NET, Rust)
- Automated Regression Detection: Compare against baselines with statistical hypothesis testing
- Hybrid Baseline Management: Git-integrated for production, local-only for development
- Comprehensive Metrics: RPS, percentiles (P50/P95/P99), latency, stability (CV), and error rates
v6.3 introduces a new configurable parameter for slt.sh to control the pacing of your load tests.
- Controlled Test Pacing: Prevent overwhelming target systems by introducing configurable pauses between test cycles.
- Reduced System Load: Space out test requests to simulate more realistic user behavior or to comply with system capacity limits.
- Improved Stability: Help maintain the stability of the system under test during prolonged load testing by giving it time to recover between iterations.
You can configure the delay by setting the ITERATION_DELAY_SECONDS environment variable before running slt.sh:
```bash
ITERATION_DELAY_SECONDS=5 ./bin/slt.sh
```

This will introduce a 5-second pause after all scenarios within a single iteration have completed, before the next iteration begins.
✅ 100% compatible with v6.2 usage:
- All v6.2 commands work identically
- Baseline format unchanged
- Report structure preserved
- CLI interface identical
- Only enhancement: Addition of iteration delay for SLT.
Migration: Simply use v6.3 - no configuration changes needed!
The SLT engine is optimized for agile development cycles and rapid feedback. Perfect for:
- Quick performance checks during development
- Smoke testing before deployments
- CI/CD pipeline integration
- Endpoint comparison and basic benchmarking
Key Features:
- Configurable iterations (default: 1000)
- Concurrent user simulation (default: 10)
- Percentile analysis (P50, P95, P99)
- Stability measurement (Coefficient of Variation)
- Error tracking without breaking calculations
- Comprehensive summary reports in Markdown
The DLT engine is a research-grade powerhouse designed for rigorous statistical analysis. Perfect for:
- Production baseline establishment
- Statistical hypothesis testing with automatic test selection
- Regression detection with effect size analysis
- Capacity planning and SLA validation
- Performance trending over releases
- Tail latency analysis (P95/P99)
Key Features:
- Python-powered backend - Extremely fast calculations for any data volume.
- Automatic test selection - Chooses best method for your data.
- Mann-Whitney U test - Robust for non-normal distributions ($O(n \log n)$).
- Welch's t-test - Powerful for normal distributions.
- Normality checking - Skewness and kurtosis analysis.
- Effect size calculation - Cohen's d and rank-biserial correlation.
- 95% confidence intervals - Statistical accuracy bounds.
- Three-phase execution (Warm-up → Ramp-up → Sustained)
- Realistic workload simulation (2-second think time)
- System resource monitoring (CPU, memory, disk I/O)
- Automated regression detection
- Git-integrated baseline management
- Production vs development modes
- Metadata tracking with Git commits
- Automatic baseline comparison
| Scenario | Use SLT | Use DLT |
|---|---|---|
| Quick performance check | ✅ | ❌ |
| CI/CD integration | ✅ | ✅ |
| Compare endpoints | ✅ | ❌ |
| Initial benchmarking | ✅ | ❌ |
| Production baseline | ❌ | ✅ |
| Statistical validation | ❌ | ✅ |
| Tail latency testing (P95/P99) | ❌ | ✅ (v6.2 excels!) |
| Regression detection | ❌ | ✅ |
| Capacity planning | ❌ | ✅ |
| SLA validation | ❌ | ✅ |
| Memory leak detection | ❌ | ✅ |
Resilio works with any web technology because it tests via HTTP protocol:
| Technology | Framework Examples | Status |
|---|---|---|
| PHP | Laravel, Symfony, WordPress, Slim | ✅ Fully Supported |
| JavaScript | Node.js, Express, Next.js, Nest.js | ✅ Fully Supported |
| Python | Django, Flask, FastAPI, Pyramid | ✅ Fully Supported |
| Go | Gin, Echo, Fiber, Chi | ✅ Fully Supported |
| Ruby | Rails, Sinatra, Hanami | ✅ Fully Supported |
| Java | Spring Boot, Micronaut, Quarkus | ✅ Fully Supported |
| .NET | ASP.NET Core, Nancy | ✅ Fully Supported |
| Rust | Actix-web, Rocket, Axum | ✅ Fully Supported |
Why it works: Resilio operates at the HTTP protocol layer, measuring request/response cycles exactly as end-users experience them—regardless of backend implementation.
- Python 3.10+ (mandatory for the DLT math engine)
- ApacheBench (`ab`) (provided by `apache2-utils`)
- Bash 4.4+
- `bc` (arbitrary-precision calculator)
- GNU Coreutils (`awk`, `grep`, `sed`, `sort`, `uniq`)
- Git (for baseline version control)
- `curl` (for system metric validation)
- `iostat` (for system monitoring, part of `sysstat`)
Installation:

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install apache2-utils bc gawk grep coreutils sysstat

# CentOS/RHEL/Fedora
sudo yum install httpd-tools bc gawk grep coreutils sysstat

# macOS
brew install apache2
# bc, awk, grep are pre-installed
```

Verify Installation:

```bash
ab -V && bc --version && awk --version && grep --version
```

```bash
# 1. Clone or download the repository
git clone https://github.com/cakmoel/resilio.git
cd resilio

# 2. Make scripts executable
chmod +x bin/slt.sh bin/dlt.sh

# 3. Configure test scenarios (edit the SCENARIOS section)
nano bin/dlt.sh  # or bin/slt.sh
```

Simple Load Testing (SLT):
```bash
# Default: 1000 iterations, 100 requests/test, 10 concurrent users
./bin/slt.sh

# Custom parameters
ITERATIONS=500 AB_REQUESTS=50 AB_CONCURRENCY=5 ./bin/slt.sh

# With iteration delay
ITERATION_DELAY_SECONDS=5 ITERATIONS=100 AB_REQUESTS=10 AB_CONCURRENCY=2 ./bin/slt.sh
```

Deep Load Testing (DLT):
```bash
# Research-based three-phase test with automatic statistical test selection
./bin/dlt.sh

# Results include hypothesis testing against baseline
cat load_test_reports_*/hypothesis_testing_*.md
```

Resilio is not a basic wrapper for ApacheBench; it is a framework that implements rigorous statistical controls to ensure performance data is actionable and scientifically sound.
Average response times mask the "long tail" of user dissatisfaction. Resilio focuses on P95 and P99 latencies to identify worst-case scenarios caused by:
- Resource contention
- Garbage collection pauses
- Network jitter
- Database query variance
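The percentile computation itself is straightforward. As a point of reference, here is a minimal Python sketch (illustrative only, not Resilio's actual implementation) of how P50/P95/P99 can be derived from raw response times:

```python
# Minimal sketch: derive P50/P95/P99 from raw response times in milliseconds.
# statistics.quantiles(n=100) returns the 99 percentile cut points P1..P99.
import statistics

def tail_percentiles(latencies_ms):
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"P50": qs[49], "P95": qs[94], "P99": qs[98]}

# A single 250 ms outlier barely moves P50 but dominates P99:
print(tail_percentiles([12, 14, 13, 15, 11, 250, 13, 14, 12, 13]))
```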
New in v6.1: The Mann-Whitney U test is specifically designed for tail latency metrics, providing more accurate detection of regressions in P95/P99 values.
The CV metric reveals system consistency:
- CV < 10%: Excellent stability
- CV < 20%: Good stability
- CV < 30%: Moderate stability
- CV ≥ 30%: Poor stability (investigate)
A low average RPS is acceptable if CV is low (consistency), but high RPS with high CV indicates instability.
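For reference, CV is simply the standard deviation expressed as a percentage of the mean. A minimal sketch of the formula (illustrative only; Resilio computes this internally):

```python
# Minimal sketch of the CV formula: sample standard deviation / mean, as a percentage
import statistics

def cv_percent(samples):
    return 100 * statistics.stdev(samples) / statistics.mean(samples)

rps = [112.4, 108.9, 119.2, 95.1, 110.7]  # hypothetical per-iteration RPS samples
print(f"CV = {cv_percent(rps):.1f}%")     # under 10% would indicate excellent stability
```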
Adheres to the USE Method (Utilization, Saturation, Errors):
- Warm-up Phase (50 iterations): Primes JIT compilers, connection pools, and caches
- Ramp-up Phase (100 iterations): Gradually increases load to observe the "Knee of the Curve"
- Sustained Load (850 iterations): Collects primary dataset for statistical analysis
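In outline, the phase split looks like this (a schematic sketch only; the iteration counts are those listed above, and `run_one_iteration` is a hypothetical stand-in for one ApacheBench measurement cycle, not a Resilio function):

```python
# Schematic of the three-phase execution described above (dlt.sh implements this in Bash)
def run_one_iteration():
    """Hypothetical placeholder: one measurement cycle (e.g., ab runs for each scenario)."""

PHASES = [("warm-up", 50), ("ramp-up", 100), ("sustained", 850)]

for phase, iterations in PHASES:
    for _ in range(iterations):
        run_one_iteration()  # the sustained phase supplies the primary dataset for analysis
```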
New in v6.1: Automatic test selection between two methods:
Used when: Data is approximately normal (|skewness| < 1.0 AND |kurtosis| < 2.0)
Best for:
- Mean RPS (requests per second)
- Average response time
- Throughput metrics
Advantages: More statistical power (better at detecting true differences)
Used when: Data is non-normal (|skewness| ≥ 1.0 OR |kurtosis| ≥ 2.0)
Best for:
- P95/P99 latencies (long tails)
- Error rates (heavily skewed)
- Cache hit rates (bimodal)
Advantages: Robust to outliers, no distribution assumptions
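Put together, the selection rule can be sketched as follows (a hedged approximation using SciPy; the thresholds are the ones stated above, and Resilio's actual Python backend may implement this differently):

```python
# Sketch of the automatic test selection rule described above (not Resilio's source code)
from scipy import stats

def is_approx_normal(samples):
    # Normality heuristic from the text: |skewness| < 1.0 AND |excess kurtosis| < 2.0
    return abs(stats.skew(samples)) < 1.0 and abs(stats.kurtosis(samples)) < 2.0

def compare(baseline, candidate):
    if is_approx_normal(baseline) and is_approx_normal(candidate):
        stat, p = stats.ttest_ind(baseline, candidate, equal_var=False)  # Welch's t-test
        return "Welch's t-test", stat, p
    stat, p = stats.mannwhitneyu(baseline, candidate, alternative="two-sided")
    return "Mann-Whitney U test", stat, p
```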
- Null Hypothesis (H₀): No significant difference exists
- Alternative Hypothesis (H₁): Significant difference detected
- Significance Level: α = 0.05 (95% confidence)
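In code terms, the decision reduces to a p-value threshold (a minimal sketch; the verdict labels mirror those in the sample report below):

```python
# Minimal decision rule: reject H0 when p < alpha (alpha = 0.05, i.e., 95% confidence)
ALPHA = 0.05

def verdict(p_value, candidate_is_worse):
    if p_value >= ALPHA:
        return "NO SIGNIFICANT CHANGE"  # fail to reject H0
    return "SIGNIFICANT REGRESSION" if candidate_is_worse else "SIGNIFICANT IMPROVEMENT"
```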
Effect Size:
- Cohen's d (for Welch's t-test): Standardized mean difference
- Rank-biserial r (for Mann-Whitney U): Interpreted analogously to Cohen's d
Interpretation (both metrics):
- < 0.2: Negligible
- 0.2 - 0.5: Small
- 0.5 - 0.8: Medium
- > 0.8: Large
This ensures decisions are based on both statistical significance and practical importance.
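Both effect sizes have simple closed forms. For reference (standard textbook formulas, not Resilio's source; the rank-biserial form follows Kerby, 2014, and its sign convention depends on which sample's U statistic is used):

```python
# Illustrative effect-size formulas: pooled-SD Cohen's d and rank-biserial r from U
import statistics
from scipy import stats

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b))
                 / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def rank_biserial_r(a, b):
    u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
    return 1 - 2 * u / (len(a) * len(b))  # Kerby (2014) simple difference formula
```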
All Mean RPS values include confidence intervals, ensuring results represent true system capacity—not lucky runs.
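A normal-approximation version of such an interval is easy to state (an illustrative sketch; the actual engine may use a t-distribution rather than z = 1.96):

```python
# Minimal sketch: 95% confidence interval for mean RPS (normal approximation)
import math
import statistics

def mean_rps_ci_95(samples):
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width

low, high = mean_rps_ci_95([112.4, 108.9, 119.2, 95.1, 110.7])
print(f"Mean RPS 95% CI: [{low:.1f}, {high:.1f}]")
```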
```text
load_test_results_YYYYMMDD_HHMMSS/
├── summary_report.md    # Main performance report
├── console_output.log   # Real-time test output
├── execution.log        # Detailed execution log
├── error.log            # Error tracking
└── raw_*.txt            # Raw ApacheBench outputs
```
Key Metrics:
- Average RPS: Mean throughput
- Median RPS: Less affected by outliers
- Standard Deviation: Consistency indicator
- P50/P95/P99: Percentile response times
- CV (Coefficient of Variation): Stability score
- Success/Error Rate: Reliability metrics
```text
load_test_reports_YYYYMMDD_HHMMSS/
├── research_report_*.md       # Comprehensive analysis
├── hypothesis_testing_*.md    # Statistical comparison (enhanced in v6.1)
├── system_metrics.csv         # CPU, memory, disk I/O
├── error_log.txt              # Error tracking
├── execution.log              # Phase-by-phase log
├── raw_data/                  # All ApacheBench outputs
└── charts/                    # Reserved for visualizations
```
Key Metrics:
- Mean with 95% CI: Statistical accuracy bounds
- Statistical Test Used: Shows which test was automatically selected (v6.2)
- Test Statistic: t-value (Welch's) or U-value (Mann-Whitney)
- p-value: Statistical significance
- Effect Size: Cohen's d or rank-biserial r
- Verdict: Regression/Improvement/No Change
- Distribution Characteristics: Skewness and kurtosis (v6.2)
### API_Endpoint
**Test Used**: Mann-Whitney U test (non-parametric)
**Reason**: Non-normal distribution detected
| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Test Statistic** | 1247 | U-value |
| **p-value** | 0.032 | Statistically significant ★ |
| **Effect Size** | -0.34 | Rank-biserial r |
| **Effect Magnitude** | small | - |
| **Verdict** | ⚠️ SIGNIFICANT REGRESSION | - |
#### Distribution Characteristics
- **Baseline**: non_normal|skew=2.34|kurt=8.91
- **Candidate**: non_normal|skew=1.87|kurt=6.23
Mann-Whitney U test was used because at least one sample showed
non-normal distribution. This test is more robust to outliers and
skewed data, making it ideal for tail latency metrics (P95/P99).
- **Strong evidence** against H₀ (95% confidence)
- Effect size is **small** (Rank-biserial r = -0.34)
- **Practical significance**: Change is statistically detectable but may not be practically important

Both scripts use a `SCENARIOS` associative array:
```bash
# Edit bin/slt.sh or bin/dlt.sh
declare -A SCENARIOS=(
    ["Homepage"]="http://localhost:8000/"
    ["API_Users"]="http://localhost:8000/api/users"
    ["Product_Page"]="http://localhost:8000/products/123"
)
```

Tunable environment variables:

```bash
ITERATIONS=1000      # Number of test iterations
AB_REQUESTS=100      # Requests per test
AB_CONCURRENCY=10    # Concurrent users
AB_TIMEOUT=30        # Timeout in seconds
```

Example:
```bash
ITERATIONS=500 AB_CONCURRENCY=20 ./bin/slt.sh
```

Production Mode (Git-tracked baselines):
```bash
# Create .env file
echo "APP_ENV=production" > .env

# Configure URLs
echo 'STATIC_PAGE=https://prod.example.com/' >> .env
echo 'DYNAMIC_PAGE=https://prod.example.com/api/users' >> .env

./bin/dlt.sh
```

Baselines saved to: `./baselines/` (Git-tracked)
Local Development Mode (local-only baselines):
echo "APP_ENV=local" > .env
./bin/dlt.shBaselines saved to: ./.dlt_local/ (not Git-tracked)
```yaml
name: Performance Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Need baselines from history

      - name: Install Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y apache2-utils bc sysstat

      - name: Run Load Test (v6.3 with automatic test selection)
        run: |
          chmod +x bin/dlt.sh
          ./bin/dlt.sh

      - name: Check for Regressions
        run: |
          REPORT=$(cat load_test_reports_*/hypothesis_testing_*.md)

          # Check for significant regressions
          if echo "$REPORT" | grep -q "SIGNIFICANT REGRESSION"; then
            echo "⚠️ Performance regression detected!"
            echo "$REPORT"
            exit 1
          fi

          # v6.2: Also check which test was used
          echo "Statistical Test Summary:"
          echo "$REPORT" | grep "Test Used:"

      - name: Upload Reports
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: performance-reports
          path: load_test_reports_*/**
```

- Never test production without authorization
- Warm up your application before recording metrics
- Check resource limits: `ulimit -n 10000`
- Monitor application logs during test execution
- Focus on percentiles: P95/P99 matter more than averages
- Check CV first: High CV = unstable system
- Compare against baselines: Use DLT for trend analysis
- Consider both p-value AND effect size: Statistical significance ≠ practical importance
- Review test selection (v6.1): Check if Mann-Whitney U was used for tail latencies
- Inspect distribution characteristics (v6.1): High skewness/kurtosis indicates need for non-parametric tests
- Document test conditions: Note system state, data volume, background jobs
Mann-Whitney U test is more reliable than Welch's t-test when:
- Testing P95/P99 latencies (almost always non-normal)
- Data has outliers (e.g., occasional 5-second response times)
- Error rates (many zeros, few spikes)
- Cache performance (bimodal distribution: hit vs miss)
Check your report: Look for "Test Used: Mann-Whitney U test" in the hypothesis testing report.
```bash
# 1. Establish baseline during stable period
echo "APP_ENV=production" > .env
./bin/dlt.sh

# 2. Commit baselines to Git
git add baselines/
git commit -m "chore: establish performance baseline for release v2.0"
git push

# 3. Future tests automatically compare against this baseline
./bin/dlt.sh
# v6.3 automatically selects the best statistical test!

# 4. Check results
cat load_test_reports_*/hypothesis_testing_*.md
```

1. "bc incompatible with current locale"
```bash
# Solution A: Use C locale
LC_NUMERIC=C ./bin/dlt.sh

# Solution B: Install en_US.UTF-8
sudo locale-gen en_US.UTF-8
```

2. Connection Refused
```bash
# Verify application is running
curl http://localhost:8000/

# Check firewall
sudo ufw status
```

3. Timeout Errors
```bash
# Increase timeout or reduce concurrency
AB_TIMEOUT=60 AB_CONCURRENCY=5 ./bin/slt.sh
```

4. Too Many Open Files
```bash
# Increase file descriptor limit
ulimit -n 10000
```

5. Unexpected Test Selection (v6.1)
```bash
# If Mann-Whitney U is used when you expect Welch's t-test:
# Check the distribution characteristics in the report
# Example:
#   Distribution: non_normal|skew=2.34|kurt=8.91
#                            ^^^^^^^^^
# High skewness (2.34 > 1.0) triggered Mann-Whitney U
# This is CORRECT behavior - your data is skewed!
```

Zero-Risk Upgrade - 100% Backward Compatible
```bash
# 1. Backup v6.2 (optional but recommended)
cp bin/slt.sh bin/slt_v6.2_backup.sh

# 2. Replace with v6.3
# Download new slt.sh from the repository
chmod +x bin/slt.sh

# 3. Test (works identically to v6.2)
./bin/slt.sh

# 4. Try the new iteration delay feature
ITERATION_DELAY_SECONDS=5 ./bin/slt.sh
```

Same (100% compatible):
- All CLI commands for both SLT and DLT
- Baseline file format
- Environment variables
- Report locations
- All v6.2 functionality
Enhanced (SLT only):
- Iteration delay support for rate limiting
- Configurable pacing between test cycles
- Better control for system under test stability
- Improved simulation of realistic user behavior
No configuration changes needed!
- USAGE_GUIDE.md - Comprehensive usage guide with real-world scenarios
- REFERENCES.md - Academic and research references (updated for v6.2)
- CHANGELOG.md - Version history and release notes
- Performance Methodology - Mathematical formulas and ISO 25010 compliance
Resilio v6.3 implements methodologies from:
- Jain, R. (1991) - Statistical methods for performance measurement
- Welch, B. L. (1947) - Unequal variance t-test
- Cohen, J. (1988) - Effect size interpretation
- ISO/IEC 25010:2011 - Performance efficiency metrics
- Barford & Crovella (1998) - Workload characterization
- Gunther, N. J. (2007) - Queueing theory and capacity planning
- Mann, H. B., & Whitney, D. R. (1947) - Non-parametric rank-based comparison
- Wilcoxon, F. (1945) - Rank-sum test theoretical foundation
- D'Agostino, R. B. (1971) - Normality testing via skewness and kurtosis
- Kerby, D. S. (2014) - Rank-biserial correlation for effect size
- Ruxton, G. D. (2006) - The unequal variance t-test is an underused substitution for Student's t-test and the Mann-Whitney U test.
| Feature | v6.0 | v6.1 | v6.2 | v6.3 |
|---|---|---|---|---|
| Welch's t-test | ✅ | ✅ | ✅ | ✅ |
| Mann-Whitney U | ❌ | ✅ | ✅ | ✅ |
| Automatic test selection | ❌ | ✅ | ✅ | ✅ |
| Normality checking | ❌ | ✅ | ✅ | ✅ |
| Cohen's d | ✅ | ✅ | ✅ | ✅ |
| Rank-biserial r | ❌ | ✅ | ✅ | ✅ |
| Baseline management | ✅ | ✅ | ✅ | ✅ |
| Smart locale detection | ✅ | ✅ | ✅ | ✅ |
| Python Math Engine (40x) | ❌ | ❌ | ✅ | ✅ |
| Iteration Delay (Rate Limiting) | ❌ | ❌ | ❌ | ✅ |
| Best for tail latencies | ❌ | ✅ | ✅ | ✅ |
| Handles outliers | ❌ | ✅ | ✅ | ✅ |
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Include tests for new functionality
- Update documentation (including REFERENCES.md for new methods)
- Submit a pull request
Planned enhancements:
- Multiple comparison correction (Bonferroni/Holm)
- Sequential Probability Ratio Test (SPRT) for early stopping
- Bayesian A/B testing as an alternative approach
- Visualization dashboards for trends
- Integration with monitoring tools (Prometheus, Grafana)
This project is licensed under the MIT License.
Copyright © 2025 M.Noermoehammad
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: alanmoehammad@gmail.com
If you use Resilio in academic research, please cite:
```bibtex
@software{resilio2026,
  author  = {Noermoehammad, M.},
  title   = {Resilio: Research-Based Performance Testing Suite},
  year    = {2026},
  version = {6.3.0},
  url     = {https://github.com/cakmoel/resilio}
}
```

Resilio v6.3: Built for Speed, Tested for Durability, Proven by Science
Now with iteration delay control for realistic load testing.