High-Performance Load Testing Suite for Web Durability and Speed
Resilio is a professional-grade performance engineering toolkit designed for QA Engineers, Developers, and DevOps practitioners. It provides a structured, technology-agnostic methodology to measure the speed, endurance, and scalability of web applications and APIs.
By leveraging the reliability of ApacheBench and adding layers of statistical analysis, automated hypothesis testing, and research-based methodologies, Resilio transforms raw network data into high-fidelity performance intelligence.
- Research-Based Methodology: Implements ISO 25010 standards and academic frameworks (Jain, 1991; Welch, 1947; Mann & Whitney, 1947)
- Advanced Statistical Testing: Automatic selection between parametric (Welch's t-test) and non-parametric (Mann-Whitney U) methods
- Intelligent Test Selection: Automatically chooses the best statistical test based on data distribution
- Technology-Agnostic: Tests any web application via HTTP protocol (PHP, Node.js, Python, Go, Java, Ruby, .NET, Rust)
- Automated Regression Detection: Compare against baselines with statistical hypothesis testing
- Hybrid Baseline Management: Git-integrated for production, local-only for development
- Comprehensive Metrics: RPS, percentiles (P50/P95/P99), latency, stability (CV), and error rates
v6.3 introduces a new configurable parameter for slt.sh to control the pacing of your load tests.
- Controlled Test Pacing: Prevent overwhelming target systems by introducing configurable pauses between test cycles.
- Reduced System Load: Space out test requests to simulate more realistic user behavior or to comply with system capacity limits.
- Improved Stability: Help maintain the stability of the system under test during prolonged load testing by giving it time to recover between iterations.
You can configure the delay by setting the ITERATION_DELAY_SECONDS environment variable before running slt.sh:
```bash
ITERATION_DELAY_SECONDS=5 ./bin/slt.sh
```

This will introduce a 5-second pause after all scenarios within a single iteration have completed, before the next iteration begins.
✅ 100% compatible with v6.2 usage:
- All v6.2 commands work identically
- Baseline format unchanged
- Report structure preserved
- CLI interface identical
- Only enhancement: Addition of iteration delay for SLT.
Migration: Simply use v6.3 - no configuration changes needed!
The SLT engine is optimized for agile development cycles and rapid feedback. Perfect for:
- Quick performance checks during development
- Smoke testing before deployments
- CI/CD pipeline integration
- Endpoint comparison and basic benchmarking
Key Features:
- Configurable iterations (default: 1000)
- Concurrent user simulation (default: 10)
- Percentile analysis (P50, P95, P99)
- Stability measurement (Coefficient of Variation)
- Error tracking without breaking calculations
- Comprehensive summary reports in Markdown
The DLT engine is a research-grade powerhouse designed for rigorous statistical analysis. Perfect for:
- Production baseline establishment
- Statistical hypothesis testing with automatic test selection
- Regression detection with effect size analysis
- Capacity planning and SLA validation
- Performance trending over releases
- Tail latency analysis (P95/P99)
Key Features:
- Python-powered backend - Extremely fast calculations for any data volume.
- Automatic test selection - Chooses best method for your data.
- Mann-Whitney U test - Robust for non-normal distributions ($O(n \log n)$).
- Welch's t-test - Powerful for normal distributions.
- Normality checking - Skewness and kurtosis analysis.
- Effect size calculation - Cohen's d and rank-biserial correlation.
- 95% confidence intervals - Statistical accuracy bounds.
- Three-phase execution (Warm-up → Ramp-up → Sustained)
- Realistic workload simulation (2-second think time)
- System resource monitoring (CPU, memory, disk I/O)
- Automated regression detection
- Git-integrated baseline management
- Production vs development modes
- Metadata tracking with Git commits
- Automatic baseline comparison
| Scenario | Use SLT | Use DLT |
|---|---|---|
| Quick performance check | ✅ | ❌ |
| CI/CD integration | ✅ | ✅ |
| Compare endpoints | ✅ | ❌ |
| Initial benchmarking | ✅ | ❌ |
| Production baseline | ❌ | ✅ |
| Statistical validation | ❌ | ✅ |
| Tail latency testing (P95/P99) | ❌ | ✅ (v6.2 excels!) |
| Regression detection | ❌ | ✅ |
| Capacity planning | ❌ | ✅ |
| SLA validation | ❌ | ✅ |
| Memory leak detection | ❌ | ✅ |
Resilio works with any web technology because it tests via HTTP protocol:
| Technology | Framework Examples | Status |
|---|---|---|
| PHP | Laravel, Symfony, WordPress, Slim | ✅ Fully Supported |
| JavaScript | Node.js, Express, Next.js, Nest.js | ✅ Fully Supported |
| Python | Django, Flask, FastAPI, Pyramid | ✅ Fully Supported |
| Go | Gin, Echo, Fiber, Chi | ✅ Fully Supported |
| Ruby | Rails, Sinatra, Hanami | ✅ Fully Supported |
| Java | Spring Boot, Micronaut, Quarkus | ✅ Fully Supported |
| .NET | ASP.NET Core, Nancy | ✅ Fully Supported |
| Rust | Actix-web, Rocket, Axum | ✅ Fully Supported |
Why it works: Resilio operates at the HTTP protocol layer, measuring request/response cycles exactly as end-users experience them—regardless of backend implementation.
- Python 3.10+ (mandatory for the DLT math engine)
- ApacheBench (`ab`) (provided by `apache2-utils`)
- Bash 4.4+
- `bc` (arbitrary-precision calculator)
- GNU Coreutils (`awk`, `grep`, `sed`, `sort`, `uniq`)
- Git (for baseline version control)
- `curl` (for system metric validation)
- `iostat` (for system monitoring, part of `sysstat`)
Installation:

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install apache2-utils bc gawk grep coreutils sysstat

# CentOS/RHEL/Fedora
sudo yum install httpd-tools bc gawk grep coreutils sysstat

# macOS
brew install apache2
# bc, awk, grep are pre-installed
```

Verify Installation:

```bash
ab -V && bc --version && awk --version && grep --version
```

```bash
# 1. Clone or download the repository
git clone https://github.com/cakmoel/resilio.git
cd resilio

# 2. Make scripts executable
chmod +x bin/slt.sh bin/dlt.sh

# 3. Configure test scenarios (edit the SCENARIOS section)
nano bin/dlt.sh  # or bin/slt.sh
```

Simple Load Testing (SLT):
```bash
# Default: 1000 iterations, 100 requests/test, 10 concurrent users
./bin/slt.sh

# Custom parameters
ITERATIONS=500 AB_REQUESTS=50 AB_CONCURRENCY=5 ./bin/slt.sh

# With iteration delay
ITERATION_DELAY_SECONDS=5 ITERATIONS=100 AB_REQUESTS=10 AB_CONCURRENCY=2 ./bin/slt.sh
```

Deep Load Testing (DLT):
```bash
# Research-based three-phase test with automatic statistical test selection
./bin/dlt.sh

# Results include hypothesis testing against baseline
cat load_test_reports_*/hypothesis_testing_*.md
```

Resilio is not a basic wrapper for ApacheBench; it is a framework that implements rigorous statistical controls to ensure performance data is actionable and scientifically sound.
Average response times mask the "long tail" of user dissatisfaction. Resilio focuses on P95 and P99 latencies to identify worst-case scenarios caused by:
- Resource contention
- Garbage collection pauses
- Network jitter
- Database query variance
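The percentile computation itself is straightforward. As a point of reference, here is a minimal Python sketch (illustrative only, not Resilio's actual implementation) of how P50/P95/P99 can be derived from raw response times:

```python
# Minimal sketch: derive P50/P95/P99 from raw response times in milliseconds.
# statistics.quantiles(n=100) returns the 99 percentile cut points P1..P99.
import statistics

def tail_percentiles(latencies_ms):
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"P50": qs[49], "P95": qs[94], "P99": qs[98]}

# A single 250 ms outlier barely moves P50 but dominates P99:
print(tail_percentiles([12, 14, 13, 15, 11, 250, 13, 14, 12, 13]))
```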
New in v6.1: The Mann-Whitney U test is specifically designed for tail latency metrics, providing more accurate detection of regressions in P95/P99 values.
The CV metric reveals system consistency:
- CV < 10%: Excellent stability
- CV < 20%: Good stability
- CV < 30%: Moderate stability
- CV ≥ 30%: Poor stability (investigate)
A low average RPS is acceptable if CV is low (consistency), but high RPS with high CV indicates instability.
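For reference, CV is simply the standard deviation expressed as a percentage of the mean. A minimal sketch of the formula (illustrative only; Resilio computes this internally):

```python
# Minimal sketch of the CV formula: sample standard deviation / mean, as a percentage
import statistics

def cv_percent(samples):
    return 100 * statistics.stdev(samples) / statistics.mean(samples)

rps = [112.4, 108.9, 119.2, 95.1, 110.7]  # hypothetical per-iteration RPS samples
print(f"CV = {cv_percent(rps):.1f}%")     # under 10% would indicate excellent stability
```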
Adheres to the USE Method (Utilization, Saturation, Errors):
- Warm-up Phase (50 iterations): Primes JIT compilers, connection pools, and caches
- Ramp-up Phase (100 iterations): Gradually increases load to observe the "Knee of the Curve"
- Sustained Load (850 iterations): Collects primary dataset for statistical analysis
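In outline, the phase split looks like this (a schematic sketch only; the iteration counts are those listed above, and `run_one_iteration` is a hypothetical stand-in for one ApacheBench measurement cycle, not a Resilio function):

```python
# Schematic of the three-phase execution described above (dlt.sh implements this in Bash)
def run_one_iteration():
    """Hypothetical placeholder: one measurement cycle (e.g., ab runs for each scenario)."""

PHASES = [("warm-up", 50), ("ramp-up", 100), ("sustained", 850)]

for phase, iterations in PHASES:
    for _ in range(iterations):
        run_one_iteration()  # the sustained phase supplies the primary dataset for analysis
```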
New in v6.1: Automatic test selection between two methods:
Used when: Data is approximately normal (|skewness| < 1.0 AND |kurtosis| < 2.0)
Best for:
- Mean RPS (requests per second)
- Average response time
- Throughput metrics
Advantages: More statistical power (better at detecting true differences)
Used when: Data is non-normal (|skewness| ≥ 1.0 OR |kurtosis| ≥ 2.0)
Best for:
- P95/P99 latencies (long tails)
- Error rates (heavily skewed)
- Cache hit rates (bimodal)
Advantages: Robust to outliers, no distribution assumptions
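Put together, the selection rule can be sketched as follows (a hedged approximation using SciPy; the thresholds are the ones stated above, and Resilio's actual Python backend may implement this differently):

```python
# Sketch of the automatic test selection rule described above (not Resilio's source code)
from scipy import stats

def is_approx_normal(samples):
    # Normality heuristic from the text: |skewness| < 1.0 AND |excess kurtosis| < 2.0
    return abs(stats.skew(samples)) < 1.0 and abs(stats.kurtosis(samples)) < 2.0

def compare(baseline, candidate):
    if is_approx_normal(baseline) and is_approx_normal(candidate):
        stat, p = stats.ttest_ind(baseline, candidate, equal_var=False)  # Welch's t-test
        return "Welch's t-test", stat, p
    stat, p = stats.mannwhitneyu(baseline, candidate, alternative="two-sided")
    return "Mann-Whitney U test", stat, p
```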
- Null Hypothesis (H₀): No significant difference exists
- Alternative Hypothesis (H₁): Significant difference detected
- Significance Level: α = 0.05 (95% confidence)
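In code terms, the decision reduces to a p-value threshold (a minimal sketch; the verdict labels mirror those in the sample report below):

```python
# Minimal decision rule: reject H0 when p < alpha (alpha = 0.05, i.e., 95% confidence)
ALPHA = 0.05

def verdict(p_value, candidate_is_worse):
    if p_value >= ALPHA:
        return "NO SIGNIFICANT CHANGE"  # fail to reject H0
    return "SIGNIFICANT REGRESSION" if candidate_is_worse else "SIGNIFICANT IMPROVEMENT"
```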
Effect Size:
- Cohen's d (for Welch's t-test): Standardized mean difference
- Rank-biserial r (for Mann-Whitney U): Interpreted analogously to Cohen's d
Interpretation (both metrics):
- < 0.2: Negligible
- 0.2 - 0.5: Small
- 0.5 - 0.8: Medium
- > 0.8: Large
This ensures decisions are based on both statistical significance and practical importance.
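Both effect sizes have simple closed forms. For reference (standard textbook formulas, not Resilio's source; the rank-biserial form follows Kerby, 2014, and its sign convention depends on which sample's U statistic is used):

```python
# Illustrative effect-size formulas: pooled-SD Cohen's d and rank-biserial r from U
import statistics
from scipy import stats

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b))
                 / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def rank_biserial_r(a, b):
    u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
    return 1 - 2 * u / (len(a) * len(b))  # Kerby (2014) simple difference formula
```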
All Mean RPS values include confidence intervals, ensuring results represent true system capacity—not lucky runs.
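A normal-approximation version of such an interval is easy to state (an illustrative sketch; the actual engine may use a t-distribution rather than z = 1.96):

```python
# Minimal sketch: 95% confidence interval for mean RPS (normal approximation)
import math
import statistics

def mean_rps_ci_95(samples):
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width

low, high = mean_rps_ci_95([112.4, 108.9, 119.2, 95.1, 110.7])
print(f"Mean RPS 95% CI: [{low:.1f}, {high:.1f}]")
```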
```text
load_test_results_YYYYMMDD_HHMMSS/
├── summary_report.md    # Main performance report
├── console_output.log   # Real-time test output
├── execution.log        # Detailed execution log
├── error.log            # Error tracking
└── raw_*.txt            # Raw ApacheBench outputs
```
Key Metrics:
- Average RPS: Mean throughput
- Median RPS: Less affected by outliers
- Standard Deviation: Consistency indicator
- P50/P95/P99: Percentile response times
- CV (Coefficient of Variation): Stability score
- Success/Error Rate: Reliability metrics
```text
load_test_reports_YYYYMMDD_HHMMSS/
├── research_report_*.md       # Comprehensive analysis
├── hypothesis_testing_*.md    # Statistical comparison (enhanced in v6.1)
├── system_metrics.csv         # CPU, memory, disk I/O
├── error_log.txt              # Error tracking
├── execution.log              # Phase-by-phase log
├── raw_data/                  # All ApacheBench outputs
└── charts/                    # Reserved for visualizations
```
Key Metrics:
- Mean with 95% CI: Statistical accuracy bounds
- Statistical Test Used: Shows which test was automatically selected (v6.2)
- Test Statistic: t-value (Welch's) or U-value (Mann-Whitney)
- p-value: Statistical significance
- Effect Size: Cohen's d or rank-biserial r
- Verdict: Regression/Improvement/No Change
- Distribution Characteristics: Skewness and kurtosis (v6.2)
### API_Endpoint
**Test Used**: Mann-Whitney U test (non-parametric)
**Reason**: Non-normal distribution detected
| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Test Statistic** | 1247 | U-value |
| **p-value** | 0.032 | Statistically significant ★ |
| **Effect Size** | -0.34 | Rank-biserial r |
| **Effect Magnitude** | small | - |
| **Verdict** | ⚠️ SIGNIFICANT REGRESSION | - |
#### Distribution Characteristics
- **Baseline**: non_normal|skew=2.34|kurt=8.91
- **Candidate**: non_normal|skew=1.87|kurt=6.23
Mann-Whitney U test was used because at least one sample showed
non-normal distribution. This test is more robust to outliers and
skewed data, making it ideal for tail latency metrics (P95/P99).
- **Strong evidence** against H₀ (95% confidence)
- Effect size is **small** (Rank-biserial r = -0.34)
- **Practical significance**: Change is statistically detectable but may not be practically important

Both scripts use a `SCENARIOS` associative array:
```bash
# Edit bin/slt.sh or bin/dlt.sh
declare -A SCENARIOS=(
    ["Homepage"]="http://localhost:8000/"
    ["API_Users"]="http://localhost:8000/api/users"
    ["Product_Page"]="http://localhost:8000/products/123"
)
```

Tunable environment variables:

```bash
ITERATIONS=1000      # Number of test iterations
AB_REQUESTS=100      # Requests per test
AB_CONCURRENCY=10    # Concurrent users
AB_TIMEOUT=30        # Timeout in seconds
```

Example:
```bash
ITERATIONS=500 AB_CONCURRENCY=20 ./bin/slt.sh
```

Production Mode (Git-tracked baselines):
```bash
# Create .env file
echo "APP_ENV=production" > .env

# Configure URLs
echo 'STATIC_PAGE=https://prod.example.com/' >> .env
echo 'DYNAMIC_PAGE=https://prod.example.com/api/users' >> .env

./bin/dlt.sh
```

Baselines saved to: `./baselines/` (Git-tracked)
Local Development Mode (local-only baselines):
echo "APP_ENV=local" > .env
./bin/dlt.shBaselines saved to: ./.dlt_local/ (not Git-tracked)
```yaml
name: Performance Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Need baselines from history

      - name: Install Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y apache2-utils bc sysstat

      - name: Run Load Test (v6.3 with automatic test selection)
        run: |
          chmod +x bin/dlt.sh
          ./bin/dlt.sh

      - name: Check for Regressions
        run: |
          REPORT=$(cat load_test_reports_*/hypothesis_testing_*.md)

          # Check for significant regressions
          if echo "$REPORT" | grep -q "SIGNIFICANT REGRESSION"; then
            echo "⚠️ Performance regression detected!"
            echo "$REPORT"
            exit 1
          fi

          # v6.2: Also check which test was used
          echo "Statistical Test Summary:"
          echo "$REPORT" | grep "Test Used:"

      - name: Upload Reports
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: performance-reports
          path: load_test_reports_*/**
```

- Never test production without authorization
- Warm up your application before recording metrics
- Check resource limits: `ulimit -n 10000`
- Monitor application logs during test execution
- Focus on percentiles: P95/P99 matter more than averages
- Check CV first: High CV = unstable system
- Compare against baselines: Use DLT for trend analysis
- Consider both p-value AND effect size: Statistical significance ≠ practical importance
- Review test selection (v6.1): Check if Mann-Whitney U was used for tail latencies
- Inspect distribution characteristics (v6.1): High skewness/kurtosis indicates need for non-parametric tests
- Document test conditions: Note system state, data volume, background jobs
Mann-Whitney U test is more reliable than Welch's t-test when:
- Testing P95/P99 latencies (almost always non-normal)
- Data has outliers (e.g., occasional 5-second response times)
- Error rates (many zeros, few spikes)
- Cache performance (bimodal distribution: hit vs miss)
Check your report: Look for "Test Used: Mann-Whitney U test" in the hypothesis testing report.
```bash
# 1. Establish baseline during stable period
echo "APP_ENV=production" > .env
./bin/dlt.sh

# 2. Commit baselines to Git
git add baselines/
git commit -m "chore: establish performance baseline for release v2.0"
git push

# 3. Future tests automatically compare against this baseline
./bin/dlt.sh
# v6.3 automatically selects the best statistical test!

# 4. Check results
cat load_test_reports_*/hypothesis_testing_*.md
```

1. "bc incompatible with current locale"
```bash
# Solution A: Use C locale
LC_NUMERIC=C ./bin/dlt.sh

# Solution B: Install en_US.UTF-8
sudo locale-gen en_US.UTF-8
```

2. Connection Refused
```bash
# Verify application is running
curl http://localhost:8000/

# Check firewall
sudo ufw status
```

3. Timeout Errors
```bash
# Increase timeout or reduce concurrency
AB_TIMEOUT=60 AB_CONCURRENCY=5 ./bin/slt.sh
```

4. Too Many Open Files
```bash
# Increase file descriptor limit
ulimit -n 10000
```

5. Unexpected Test Selection (v6.1)
```bash
# If Mann-Whitney U is used when you expect Welch's t-test:
# Check the distribution characteristics in the report
# Example:
#   Distribution: non_normal|skew=2.34|kurt=8.91
#                            ^^^^^^^^^
# High skewness (2.34 > 1.0) triggered Mann-Whitney U
# This is CORRECT behavior - your data is skewed!
```

Zero-Risk Upgrade - 100% Backward Compatible
```bash
# 1. Backup v6.2 (optional but recommended)
cp bin/slt.sh bin/slt_v6.2_backup.sh

# 2. Replace with v6.3
# Download new slt.sh from the repository
chmod +x bin/slt.sh

# 3. Test (works identically to v6.2)
./bin/slt.sh

# 4. Try the new iteration delay feature
ITERATION_DELAY_SECONDS=5 ./bin/slt.sh
```

Same (100% compatible):
- All CLI commands for both SLT and DLT
- Baseline file format
- Environment variables
- Report locations
- All v6.2 functionality
Enhanced (SLT only):
- Iteration delay support for rate limiting
- Configurable pacing between test cycles
- Better control for system under test stability
- Improved simulation of realistic user behavior
No configuration changes needed!
- USAGE_GUIDE.md - Comprehensive usage guide with real-world scenarios
- REFERENCES.md - Academic and research references (updated for v6.2)
- CHANGELOG.md - Version history and release notes
- Performance Methodology - Mathematical formulas and ISO 25010 compliance
Resilio v6.3 implements methodologies from:
- Jain, R. (1991) - Statistical methods for performance measurement
- Welch, B. L. (1947) - Unequal variance t-test
- Cohen, J. (1988) - Effect size interpretation
- ISO/IEC 25010:2011 - Performance efficiency metrics
- Barford & Crovella (1998) - Workload characterization
- Gunther, N. J. (2007) - Queueing theory and capacity planning
- Mann, H. B., & Whitney, D. R. (1947) - Non-parametric rank-based comparison
- Wilcoxon, F. (1945) - Rank-sum test theoretical foundation
- D'Agostino, R. B. (1971) - Normality testing via skewness and kurtosis
- Kerby, D. S. (2014) - Rank-biserial correlation for effect size
- Ruxton, G. D. (2006) - The unequal variance t-test is an underused substitution for Student's t-test and the Mann-Whitney U test.
| Feature | v6.0 | v6.1 | v6.2 | v6.3 |
|---|---|---|---|---|
| Welch's t-test | ✅ | ✅ | ✅ | ✅ |
| Mann-Whitney U | ❌ | ✅ | ✅ | ✅ |
| Automatic test selection | ❌ | ✅ | ✅ | ✅ |
| Normality checking | ❌ | ✅ | ✅ | ✅ |
| Cohen's d | ✅ | ✅ | ✅ | ✅ |
| Rank-biserial r | ❌ | ✅ | ✅ | ✅ |
| Baseline management | ✅ | ✅ | ✅ | ✅ |
| Smart locale detection | ✅ | ✅ | ✅ | ✅ |
| Python Math Engine (40x) | ❌ | ❌ | ✅ | ✅ |
| Iteration Delay (Rate Limiting) | ❌ | ❌ | ❌ | ✅ |
| Best for tail latencies | ❌ | ✅ | ✅ | ✅ |
| Handles outliers | ❌ | ✅ | ✅ | ✅ |
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Include tests for new functionality
- Update documentation (including REFERENCES.md for new methods)
- Submit a pull request
Planned enhancements:
- Multiple comparison correction (Bonferroni/Holm)
- Sequential Probability Ratio Test (SPRT) for early stopping
- Bayesian A/B testing as an alternative approach
- Visualization dashboards for trends
- Integration with monitoring tools (Prometheus, Grafana)
This project is licensed under the MIT License.
Copyright © 2025 M.Noermoehammad
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: alanmoehammad@gmail.com
If you use Resilio in academic research, please cite:
```bibtex
@software{resilio2026,
  author  = {Noermoehammad, M.},
  title   = {Resilio: Research-Based Performance Testing Suite},
  year    = {2026},
  version = {6.3.0},
  url     = {https://github.com/cakmoel/resilio}
}
```

Resilio v6.3: Built for Speed, Tested for Durability, Proven by Science
Now with iteration delay control for realistic load testing.