# Benchmark Methodology
This page describes how walrust benchmarks are conducted, ensuring transparency and reproducibility.
## Environment

### Local Development

Benchmarks can be run locally using MinIO as an S3-compatible storage backend:
```sh
make bench-minio   # Start MinIO container
make bench-all     # Run full suite
```

This ensures consistent results regardless of network conditions.
### CI Environment

GitHub Actions runs benchmarks on each release with:

- Runner: `ubuntu-latest` (2-core CPU, 7 GB RAM)
- Storage: MinIO service container
- Litestream: v0.3.13 for comparison
## Benchmark Categories

### Memory Comparison (bench/compare.py)

Measures RSS memory usage of walrust vs litestream when watching N databases:
- Create N SQLite databases with sample data
- Start walrust/litestream watching all databases
- Wait for initial sync to complete
- Measure RSS memory via `psutil`
- (Optional) Generate write load and measure “active” memory
Parameters:
- Database counts: 1, 10, 100 (configurable)
- Sample size: 100 rows per database
- Measurement delay: 5 seconds after startup
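The sampling step above can be sketched as follows. This reads VmRSS from `/proc` directly rather than through `psutil`, but it samples the same quantity; the settle delay mirrors the 5-second measurement delay listed above. The function name and `/proc` parsing are illustrative, not the harness's actual code.

```python
import os
import time

def rss_mb(pid: int, settle_s: float = 5.0) -> float:
    """Sample a process's resident set size in MiB after a settle delay.

    Illustrative stand-in for the psutil-based measurement in
    bench/compare.py; reads VmRSS from /proc/<pid>/status (Linux).
    """
    time.sleep(settle_s)
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is in kB
    raise RuntimeError("VmRSS not found")
```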
### Multi-Database Performance (bench/multidb.py)

#### Startup Time

Time from process start to ready state:
- Create N pre-synced databases
- Start walrust, measure time until watching
- Repeat 3 times, report mean
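The steps above reduce to a small timing loop. This is a sketch that assumes walrust prints a readiness line containing "watching" on stdout; the exact marker is an assumption, so adjust it to the real log output.

```python
import statistics
import subprocess
import time

def mean_startup_s(cmd: list[str], ready_marker: str = "watching",
                   runs: int = 3) -> float:
    """Mean wall-clock seconds from spawn until `ready_marker` appears
    on stdout, over `runs` repetitions."""
    samples = []
    for _ in range(runs):
        t0 = time.monotonic()
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            if ready_marker in line:
                samples.append(time.monotonic() - t0)
                break
        proc.terminate()
        proc.wait()
    return statistics.mean(samples)
```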
#### Change Detection Latency

Time from SQLite commit to walrust detection:
- Start walrust watching N databases
- Insert row, record timestamp
- Monitor walrust output for detection
- Calculate latency, collect 100 samples per database count
- Report p50, p95, p99 percentiles
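The write side of one sample might look like this; the table name and schema are placeholders. The harness then matches this commit timestamp against the detection event in walrust's output to get one latency sample.

```python
import sqlite3
import time

def commit_and_timestamp(db_path: str) -> float:
    """Insert one row, commit, and return the commit time on the
    monotonic clock. Detection latency is the watcher's detection
    time minus this value."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS samples (v INTEGER)")
    conn.execute("INSERT INTO samples (v) VALUES (1)")
    conn.commit()
    ts = time.monotonic()
    conn.close()
    return ts
```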
#### CPU Scaling

CPU usage under load as database count increases:
- Start walrust watching N databases
- Generate concurrent write load
- Sample CPU usage over 10 seconds
- Report average CPU percentage
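CPU sampling can be sketched from `/proc` tick counters; the harness uses `psutil`, whose `cpu_percent` does equivalent bookkeeping. The 10-second window above becomes the interval argument here. Linux-only and illustrative.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second

def cpu_percent(pid: int, interval_s: float = 10.0) -> float:
    """Average CPU % of `pid` over `interval_s`, from /proc/<pid>/stat
    utime+stime deltas (Linux)."""
    def cpu_ticks() -> int:
        with open(f"/proc/{pid}/stat") as f:
            # Split after the "(comm)" field; utime and stime are then
            # at indices 11 and 12 of the remaining fields.
            rest = f.read().rsplit(") ", 1)[1].split()
        return int(rest[11]) + int(rest[12])
    t0, c0 = time.monotonic(), cpu_ticks()
    time.sleep(interval_s)
    t1, c1 = time.monotonic(), cpu_ticks()
    return 100.0 * (c1 - c0) / CLK_TCK / (t1 - t0)
```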
### Real-World Benchmarks (bench/realworld.py)

#### Sync Latency

End-to-end time from write to S3:
- Insert row into database
- Wait for S3 object to appear
- Measure total latency
- Collect 50 samples, report percentiles
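The poll-until-visible step can be expressed generically. Here `exists` stands in for an S3 existence check (e.g. a boto3 `head_object` wrapper pointed at MinIO), which is outside this sketch.

```python
import time

def wait_for_object(exists, timeout_s: float = 30.0,
                    poll_s: float = 0.05) -> float:
    """Poll until `exists()` returns True and report elapsed seconds.

    `exists` is any zero-argument callable that checks whether the
    replicated object has appeared in S3/MinIO.
    """
    t0 = time.monotonic()
    while time.monotonic() - t0 < timeout_s:
        if exists():
            return time.monotonic() - t0
        time.sleep(poll_s)
    raise TimeoutError("object never appeared")
```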
#### Restore Performance

Time to restore database from S3:
- Sync database with sample data
- Delete local database
- Run `walrust restore`
- Measure total time
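Timing the restore reduces to wall-clocking the command to completion; the argument list passed in would be whatever `walrust restore` invocation the benchmark uses.

```python
import subprocess
import time

def time_command_s(cmd: list[str]) -> float:
    """Run `cmd` to completion and return elapsed wall-clock seconds.
    The benchmark would pass its `walrust restore` invocation here."""
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - t0
```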
#### LTX Throughput

Speed of LTX file generation:
- Generate WAL frames programmatically
- Convert to LTX format
- Measure throughput in MB/s
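Throughput here is input bytes over elapsed conversion time. In the sketch below, `convert` is a placeholder for the WAL-frame-to-LTX conversion step, which this example does not implement.

```python
import time

def throughput_mb_per_s(convert, payload: bytes) -> float:
    """Time `convert(payload)` and report MB/s based on input size."""
    t0 = time.monotonic()
    convert(payload)
    elapsed = max(time.monotonic() - t0, 1e-9)  # guard divide-by-zero
    return len(payload) / (1024 * 1024) / elapsed
```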
## Statistical Methods

### Percentile Calculation

We use linear interpolation for percentiles (same as NumPy):

```python
def percentile(data, p):
    """Linearly interpolated p-th percentile (0-100) of `data`."""
    sorted_data = sorted(data)
    k = (len(sorted_data) - 1) * p / 100
    f = int(k)
    c = f + 1
    if c >= len(sorted_data):
        return sorted_data[-1]
    return sorted_data[f] + (k - f) * (sorted_data[c] - sorted_data[f])
```

### Outlier Handling
Section titled “Outlier Handling”- No outliers are removed from latency measurements
- Results report both mean and percentiles for transparency
- Standard deviation is included where applicable
## Reproducibility

### Running Locally
Section titled “Running Locally”# Clone and buildgit clone https://github.com/russellromney/walrust.gitcd walrustcargo build --release
# Run with MinIOmake bench-miniomake bench-allVerification
Section titled “Verification”Compare your results with CI results:
- CI results are uploaded as artifacts on each release
- JSON output allows programmatic comparison
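One way to compare the two JSON artifacts programmatically is to compute local/CI ratios for every numeric field they share. The traversal below is a sketch, not part of the bench suite, and the example field path in the docstring is hypothetical.

```python
def ratios(local: dict, ci: dict, prefix: str = "") -> dict:
    """Local/CI ratios for numeric fields both payloads share, keyed
    by dotted path (e.g. a hypothetical "startup_time.mean_s")."""
    out = {}
    for key, lv in local.items():
        cv = ci.get(key)
        path = f"{prefix}{key}"
        if isinstance(lv, dict) and isinstance(cv, dict):
            out.update(ratios(lv, cv, path + "."))
        elif isinstance(lv, (int, float)) and isinstance(cv, (int, float)) and cv:
            out[path] = lv / cv
    return out
```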
## Known Limitations

- CI variability: GitHub Actions runners have variable performance
- Memory measurement: RSS includes shared libraries
- Network latency: MinIO runs locally, eliminating S3 network latency; real S3 will be slower
- Warm cache: Multiple runs may benefit from OS file cache
## Output Format

All benchmarks output JSON with this schema:

```json
{
  "version": "1.0",
  "environment": {
    "platform": "linux",
    "python_version": "3.12.0",
    "walrust_version": "0.3.0",
    "litestream_version": "0.3.13",
    "storage_backend": "minio",
    "timestamp": "2024-01-15T12:00:00Z"
  },
  "benchmarks": {
    "memory_scaling": [...],
    "startup_time": [...],
    ...
  }
}
```