# Benchmark Methodology
This page describes how walrust benchmarks are conducted, ensuring transparency and reproducibility.
## Environment

### Local Development

Benchmarks can be run locally using MinIO as an S3-compatible storage backend:
```sh
make bench-minio   # Start MinIO container
make bench-all     # Run full suite
```

This ensures consistent results regardless of network conditions.
### CI Environment

GitHub Actions runs benchmarks on each release with:

- Runner: `ubuntu-latest` (2-core CPU, 7 GB RAM)
- Storage: MinIO service container
- Litestream: v0.3.13 for comparison
## Benchmark Categories

### Memory Comparison (bench/compare.py)

Measures RSS memory usage of walrust vs litestream when watching N databases:
- Create N SQLite databases with sample data
- Start walrust/litestream watching all databases
- Wait for initial sync to complete
- Measure RSS memory via `psutil`
- (Optional) Generate write load and measure “active” memory
Parameters:
- Database counts: 1, 10, 100 (configurable)
- Sample size: 100 rows per database
- Measurement delay: 5 seconds after startup
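The sampling step above can be sketched as follows. This reads VmRSS from `/proc` directly rather than through `psutil`, but it samples the same quantity; the settle delay mirrors the 5-second measurement delay listed above. The function name and `/proc` parsing are illustrative, not the harness's actual code.

```python
import os
import time

def rss_mb(pid: int, settle_s: float = 5.0) -> float:
    """Sample a process's resident set size in MiB after a settle delay.

    Illustrative stand-in for the psutil-based measurement in
    bench/compare.py; reads VmRSS from /proc/<pid>/status (Linux).
    """
    time.sleep(settle_s)
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is in kB
    raise RuntimeError("VmRSS not found")
```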
### Multi-Database Performance (bench/multidb.py)

#### Startup Time

Time from process start to ready state:
- Create N pre-synced databases
- Start walrust, measure time until watching
- Repeat 3 times, report mean
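The steps above reduce to a small timing loop. This is a sketch that assumes walrust prints a readiness line containing "watching" on stdout; the exact marker is an assumption, so adjust it to the real log output.

```python
import statistics
import subprocess
import time

def mean_startup_s(cmd: list[str], ready_marker: str = "watching",
                   runs: int = 3) -> float:
    """Mean wall-clock seconds from spawn until `ready_marker` appears
    on stdout, over `runs` repetitions."""
    samples = []
    for _ in range(runs):
        t0 = time.monotonic()
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            if ready_marker in line:
                samples.append(time.monotonic() - t0)
                break
        proc.terminate()
        proc.wait()
    return statistics.mean(samples)
```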
#### Change Detection Latency

Time from SQLite commit to walrust detection:
- Start walrust watching N databases
- Insert row, record timestamp
- Monitor walrust output for detection
- Calculate latency, collect 100 samples per database count
- Report p50, p95, p99 percentiles
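The write side of one sample might look like this; the table name and schema are placeholders. The harness then matches this commit timestamp against the detection event in walrust's output to get one latency sample.

```python
import sqlite3
import time

def commit_and_timestamp(db_path: str) -> float:
    """Insert one row, commit, and return the commit time on the
    monotonic clock. Detection latency is the watcher's detection
    time minus this value."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS samples (v INTEGER)")
    conn.execute("INSERT INTO samples (v) VALUES (1)")
    conn.commit()
    ts = time.monotonic()
    conn.close()
    return ts
```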
#### CPU Scaling

CPU usage under load as database count increases:
- Start walrust watching N databases
- Generate concurrent write load
- Sample CPU usage over 10 seconds
- Report average CPU percentage
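CPU sampling can be sketched from `/proc` tick counters; the harness uses `psutil`, whose `cpu_percent` does equivalent bookkeeping. The 10-second window above becomes the interval argument here. Linux-only and illustrative.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second

def cpu_percent(pid: int, interval_s: float = 10.0) -> float:
    """Average CPU % of `pid` over `interval_s`, from /proc/<pid>/stat
    utime+stime deltas (Linux)."""
    def cpu_ticks() -> int:
        with open(f"/proc/{pid}/stat") as f:
            # Split after the "(comm)" field; utime and stime are then
            # at indices 11 and 12 of the remaining fields.
            rest = f.read().rsplit(") ", 1)[1].split()
        return int(rest[11]) + int(rest[12])
    t0, c0 = time.monotonic(), cpu_ticks()
    time.sleep(interval_s)
    t1, c1 = time.monotonic(), cpu_ticks()
    return 100.0 * (c1 - c0) / CLK_TCK / (t1 - t0)
```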
### Real-World Benchmarks (bench/realworld.py)

#### Sync Latency

End-to-end time from write to S3:
- Insert row into database
- Wait for S3 object to appear
- Measure total latency
- Collect 50 samples, report percentiles
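The poll-until-visible step can be expressed generically. Here `exists` stands in for an S3 existence check (e.g. a boto3 `head_object` wrapper pointed at MinIO), which is outside this sketch.

```python
import time

def wait_for_object(exists, timeout_s: float = 30.0,
                    poll_s: float = 0.05) -> float:
    """Poll until `exists()` returns True and report elapsed seconds.

    `exists` is any zero-argument callable that checks whether the
    replicated object has appeared in S3/MinIO.
    """
    t0 = time.monotonic()
    while time.monotonic() - t0 < timeout_s:
        if exists():
            return time.monotonic() - t0
        time.sleep(poll_s)
    raise TimeoutError("object never appeared")
```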
#### Restore Performance

Time to restore database from S3:
- Sync database with sample data
- Delete local database
- Run `walrust restore`
- Measure total time
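Timing the restore reduces to wall-clocking the command to completion; the argument list passed in would be whatever `walrust restore` invocation the benchmark uses.

```python
import subprocess
import time

def time_command_s(cmd: list[str]) -> float:
    """Run `cmd` to completion and return elapsed wall-clock seconds.
    The benchmark would pass its `walrust restore` invocation here."""
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - t0
```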
#### LTX Throughput

Speed of LTX file generation:
- Generate WAL frames programmatically
- Convert to LTX format
- Measure throughput in MB/s
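Throughput here is input bytes over elapsed conversion time. In the sketch below, `convert` is a placeholder for the WAL-frame-to-LTX conversion step, which this example does not implement.

```python
import time

def throughput_mb_per_s(convert, payload: bytes) -> float:
    """Time `convert(payload)` and report MB/s based on input size."""
    t0 = time.monotonic()
    convert(payload)
    elapsed = max(time.monotonic() - t0, 1e-9)  # guard divide-by-zero
    return len(payload) / (1024 * 1024) / elapsed
```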
## Statistical Methods

### Percentile Calculation

We use linear interpolation for percentiles (same as NumPy):

```python
def percentile(data, p):
    """Linearly interpolated p-th percentile (0-100) of `data`."""
    sorted_data = sorted(data)
    k = (len(sorted_data) - 1) * p / 100
    f = int(k)
    c = f + 1
    if c >= len(sorted_data):
        return sorted_data[-1]
    return sorted_data[f] + (k - f) * (sorted_data[c] - sorted_data[f])
```

### Outlier Handling
Section titled “Outlier Handling”- No outliers are removed from latency measurements
- Results report both mean and percentiles for transparency
- Standard deviation is included where applicable
## Reproducibility

### Running Locally
Section titled “Running Locally”# Clone and buildgit clone https://github.com/russellromney/walrust.gitcd walrustcargo build --release
# Run with MinIOmake bench-miniomake bench-allVerification
Section titled “Verification”Compare your results with CI results:
- CI results are uploaded as artifacts on each release
- JSON output allows programmatic comparison
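One way to compare the two JSON artifacts programmatically is to compute local/CI ratios for every numeric field they share. The traversal below is a sketch, not part of the bench suite, and the example field path in the docstring is hypothetical.

```python
def ratios(local: dict, ci: dict, prefix: str = "") -> dict:
    """Local/CI ratios for numeric fields both payloads share, keyed
    by dotted path (e.g. a hypothetical "startup_time.mean_s")."""
    out = {}
    for key, lv in local.items():
        cv = ci.get(key)
        path = f"{prefix}{key}"
        if isinstance(lv, dict) and isinstance(cv, dict):
            out.update(ratios(lv, cv, path + "."))
        elif isinstance(lv, (int, float)) and isinstance(cv, (int, float)) and cv:
            out[path] = lv / cv
    return out
```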
## Known Limitations

- CI variability: GitHub Actions runners have variable performance
- Memory measurement: RSS includes shared libraries
- Network latency: MinIO runs locally, eliminating S3 network latency; real S3 will be slower
- Warm cache: Multiple runs may benefit from OS file cache
## Output Format

All benchmarks output JSON with this schema:

```json
{
  "version": "1.0",
  "environment": {
    "platform": "linux",
    "python_version": "3.12.0",
    "walrust_version": "0.3.0",
    "litestream_version": "0.3.13",
    "storage_backend": "minio",
    "timestamp": "2024-01-15T12:00:00Z"
  },
  "benchmarks": {
    "memory_scaling": [...],
    "startup_time": [...],
    ...
  }
}
```