Performance Benchmarks
Measure and understand the performance characteristics of the filter rule compilers, including parallel chunking speedups and cross-language comparisons.
Overview
The repository includes comprehensive benchmarking tools to help you understand performance across the different compilers and optimize compilation workflows. All compilers (TypeScript, .NET, Python, Rust) support parallel chunking to improve performance on large filter lists.
Benchmarking Tools
🚀 Quick Synthetic Benchmark
File: benchmarks/quick_benchmark.py
Fast simulation showing expected speedups without requiring a full compilation setup. Demonstrates:
- How rules are split into chunks (illustrated in the sketch after this list)
- Simulated parallel processing time
- Expected speedup ratios
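The chunk-splitting step itself is straightforward. As a minimal illustrative sketch (the `chunk_rules` helper below is hypothetical, not the compilers' actual implementation), splitting a rule list into per-worker chunks might look like:

```python
# Hypothetical sketch: split a flat rule list into roughly equal chunks,
# one per worker. The real compilers' chunking logic may differ.
def chunk_rules(rules: list[str], workers: int) -> list[list[str]]:
    chunk_size = max(1, -(-len(rules) // workers))  # ceiling division
    return [rules[i:i + chunk_size] for i in range(0, len(rules), chunk_size)]

rules = [f"||ads{i}.example.com^" for i in range(10)]
for chunk in chunk_rules(rules, workers=4):
    print(len(chunk), chunk[0])  # chunks of 3, 3, 3, 1
```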
📊 Full Benchmark Suite
Files: benchmarks/run_benchmarks.py, benchmarks/generate_synthetic_data.py
Complete benchmarking across all compilers with real compilation. Compares sequential vs chunked/parallel performance using synthetic test data.
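At its core, the comparison amounts to timing the same workload sequentially and in parallel. A self-contained sketch of that pattern, using a stand-in `compile_chunk` function rather than real compilation:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def compile_chunk(chunk):
    # Stand-in for real rule compilation: just normalize each rule.
    return [rule.strip().lower() for rule in chunk]

def benchmark(chunks, workers):
    start = time.perf_counter()
    if workers == 1:
        [compile_chunk(c) for c in chunks]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(compile_chunk, chunks))
    return time.perf_counter() - start

if __name__ == "__main__":  # guard required for process pools on spawn platforms
    chunks = [[f"||ads{i}-{j}.example.com^" for j in range(20_000)] for i in range(8)]
    seq = benchmark(chunks, workers=1)
    par = benchmark(chunks, workers=8)
    print(f"sequential {seq:.3f}s  parallel {par:.3f}s  speedup {seq / par:.2f}x")
```

Real runs add download, parse, and merge/deduplication stages on top of this pattern, which is where much of the overhead discussed below comes from.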
Running Benchmarks
Quick Synthetic Benchmark
Run a quick simulation to see expected speedups:
```bash
cd benchmarks

# Run comparison suite (recommended)
python quick_benchmark.py --suite

# Run parallel scaling test
python quick_benchmark.py --scaling

# Custom benchmark
python quick_benchmark.py --rules 500000 --parallel 8

# Interactive mode
python quick_benchmark.py --interactive
```
Full Benchmark with Real Compilation
Generate synthetic test data and run actual compilation benchmarks:
```bash
cd benchmarks

# Generate test data (small, medium, large, xlarge filter lists)
python generate_synthetic_data.py --all

# Run benchmarks across all compilers
python run_benchmarks.py

# Run a specific compiler only
python run_benchmarks.py --compiler python --iterations 5

# Run a specific size only
python run_benchmarks.py --size large
```
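The exact format of the generated data is defined by generate_synthetic_data.py; as a rough illustration of what synthetic adblock-style rule generation can look like (the file names, size tiers, and rule patterns here are assumptions, not the script's actual output):

```python
import random

# Assumed mapping of size tiers to rule counts, mirroring the table below.
SIZES = {"small": 10_000, "medium": 50_000, "large": 200_000, "xlarge": 500_000}

def generate_rules(count: int, seed: int = 42) -> list[str]:
    rng = random.Random(seed)  # fixed seed for reproducible lists
    rules = []
    for i in range(count):
        domain = f"tracker{rng.randrange(count)}.example{i}.com"
        prefix = "@@||" if rng.random() < 0.05 else "||"  # a few exception rules
        rules.append(f"{prefix}{domain}^")
    return rules

for name, count in SIZES.items():
    with open(f"synthetic_{name}.txt", "w") as f:
        f.write("\n".join(generate_rules(count)))
    print(f"wrote synthetic_{name}.txt ({count} rules)")
```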
Expected Performance
Performance varies by hardware, I/O speed, and network latency, but here are typical results from synthetic benchmarks:
| Rule Count | Sequential | 4 Workers | 8 Workers | Speedup (8 workers) |
|---|---|---|---|---|
| 10,000 | ~150ms | ~60ms | ~40ms | 3.75x |
| 50,000 | ~600ms | ~200ms | ~120ms | 5.0x |
| 200,000 | ~2.5s | ~800ms | ~400ms | 6.25x |
| 500,000 | ~6s | ~1.8s | ~900ms | 6.67x |
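Speedup is sequential time divided by parallel time, and efficiency is speedup divided by worker count. Recomputing the 8-worker column of the table above:

```python
# (sequential_ms, parallel_ms) for the 8-worker column of the table above
timings = {"10K": (150, 40), "50K": (600, 120), "200K": (2500, 400), "500K": (6000, 900)}

for size, (seq_ms, par_ms) in timings.items():
    speedup = seq_ms / par_ms
    print(f"{size} rules: speedup {speedup:.2f}x, efficiency {speedup / 8:.0%}")
```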
Parallel Scaling
Speedup scales with CPU cores, but with diminishing returns:
| Workers | Theoretical Max | Typical Efficiency |
|---|---|---|
| 2 | 2.0x | 90-100% |
| 4 | 4.0x | 85-95% |
| 8 | 8.0x | 75-90% |
| 16 | 16.0x | 60-80% |
Efficiency decreases due to process startup overhead, merge/deduplication time, memory bandwidth limits, and I/O contention.
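One way to model this is Amdahl's law plus a per-worker overhead term. The toy model below uses illustrative constants (a 3% serial fraction and 0.2% overhead per worker are assumptions, not measurements), so it won't exactly reproduce the tables, but it shows the same diminishing-returns shape:

```python
# Toy model: normalized time = serial part + parallel part / workers + startup overhead.
# The constants are illustrative assumptions, not measured values.
def modeled_speedup(workers, serial_fraction=0.03, overhead_per_worker=0.002):
    parallel_time = (serial_fraction
                     + (1 - serial_fraction) / workers
                     + overhead_per_worker * workers)
    return 1.0 / parallel_time  # sequential time is normalized to 1.0

for w in (2, 4, 8, 16):
    s = modeled_speedup(w)
    print(f"{w:>2} workers: {s:.2f}x speedup, {s / w:.0%} efficiency")
```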
When to Use Chunking
Parallel chunking provides the most benefit for large filter lists with multiple sources; a heuristic sketch follows these lists:
✅ Enable Chunking
- 6+ filter sources
- Large combined filter lists (100K+ rules)
- Multi-core systems (4+ cores)
- Build/CI pipelines
❌ Disable Chunking
- 1-5 filter sources
- Small filter lists (<50K rules)
- Memory-constrained systems
- Network-bound scenarios (slow downloads)
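If you script your builds, those guidelines can be folded into a simple heuristic. This sketch is not part of any compiler's API; the function name and thresholds just mirror the lists above:

```python
import os

def should_enable_chunking(num_sources: int, total_rules: int,
                           memory_constrained: bool = False,
                           network_bound: bool = False) -> bool:
    """Hypothetical heuristic mirroring the guidelines above."""
    cores = os.cpu_count() or 1
    if memory_constrained or network_bound or cores < 4:
        return False
    return num_sources >= 6 or total_rules >= 100_000

# Example: a CI build merging 8 sources into ~250K rules on a multi-core runner
print(should_enable_chunking(num_sources=8, total_rules=250_000))  # True on 4+ cores
```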
Example Output
Here's what you might see from the quick benchmark suite:
```text
CHUNKING PERFORMANCE COMPARISON SUITE
======================================================================
CPU cores available: 8
Max parallel workers: 8

Size         Sequential    Parallel    Speedup    Efficiency
----------------------------------------------------------------------
10K rules        150 ms       70 ms      2.14x           27%
50K rules        570 ms      130 ms      4.38x           55%
200K rules     2,350 ms      350 ms      6.71x           84%
500K rules     5,400 ms      800 ms      6.75x           84%
----------------------------------------------------------------------
Average speedup: 5.00x
Maximum speedup: 6.75x
```
Learn More
Chunking Guide
Complete documentation on parallel chunking including configuration, API reference, and best practices.
Compiler Comparison
Compare the different compiler implementations and choose the one that best fits your needs.
View Benchmark Code
Explore the benchmark scripts on GitHub to understand the implementation details.
💡 Tip
Run benchmarks on your actual hardware to get accurate performance data for your specific use case. Results vary with CPU cores, memory, I/O speed, and network latency.