Performance Benchmarks

Measure and understand the performance characteristics of the filter rule compilers, including parallel chunking speedups and cross-language comparisons.

Overview

The repository includes comprehensive benchmarking tools to help understand performance across different compilers and optimize compilation workflows. All compilers (TypeScript, .NET, Python, Rust) support parallel chunking for improved performance with large filter lists.

Benchmarking Tools

🚀 Quick Synthetic Benchmark

File: benchmarks/quick_benchmark.py

Fast simulation showing expected speedups without requiring a full compilation setup. Demonstrates (a sketch follows this list):

  • How rules are split into chunks
  • Simulated parallel processing time
  • Expected speedup ratios
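
The model being simulated is straightforward: split the rule list into roughly equal chunks, process the chunks concurrently, and pay a fixed overhead for worker startup and merging. A minimal sketch of that model in Python (the per-rule cost and overhead constants are illustrative assumptions, not values taken from quick_benchmark.py):

import math

def split_into_chunks(rules, num_chunks):
    # Divide the rule list into num_chunks roughly equal slices.
    chunk_size = math.ceil(len(rules) / num_chunks)
    return [rules[i:i + chunk_size] for i in range(0, len(rules), chunk_size)]

def simulated_time_ms(num_rules, workers, per_rule_us=1.0, overhead_ms=20.0):
    # Sequential cost grows linearly with rule count; the parallel model
    # divides the work across workers and adds a fixed startup/merge overhead.
    sequential = num_rules * per_rule_us / 1000.0
    parallel = sequential / workers + overhead_ms
    return sequential, parallel

rules = [f"||example{i}.com^" for i in range(500_000)]
chunks = split_into_chunks(rules, num_chunks=8)
seq, par = simulated_time_ms(len(rules), workers=8)
print(f"{len(chunks)} chunks; sequential ~{seq:.0f} ms, "
      f"parallel ~{par:.0f} ms, speedup {seq / par:.2f}x")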

📊 Full Benchmark Suite

Files: benchmarks/run_benchmarks.py, benchmarks/generate_synthetic_data.py

Complete benchmarking across all compilers with real compilation. Compares sequential vs chunked/parallel performance using synthetic test data.
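
At its core, a harness like this times repeated compiler invocations and reports a robust statistic such as the median. A minimal sketch, assuming each compiler is driven as an external command (the command shown is a placeholder, not the actual compiler CLI):

import statistics
import subprocess
import time

def time_compile(cmd, iterations=3):
    # Run the compiler several times and report the median wall-clock time.
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Placeholder invocation; substitute the real compiler command and flags.
# median_s = time_compile(["python", "compile.py", "--input", "rules.txt"])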

Running Benchmarks

Quick Synthetic Benchmark

Run a quick simulation to see expected speedups:

cd benchmarks

# Run comparison suite (recommended)
python quick_benchmark.py --suite

# Run parallel scaling test
python quick_benchmark.py --scaling

# Custom benchmark
python quick_benchmark.py --rules 500000 --parallel 8

# Interactive mode
python quick_benchmark.py --interactive

Full Benchmark with Real Compilation

Generate synthetic test data and run actual compilation benchmarks:

cd benchmarks

# Generate test data (small, medium, large, xlarge filter lists)
python generate_synthetic_data.py --all

# Run benchmarks across all compilers
python run_benchmarks.py

# Run specific compiler only
python run_benchmarks.py --compiler python --iterations 5

# Run specific size only
python run_benchmarks.py --size large
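
For timing purposes, synthetic data only needs to resemble real filter rules in shape and volume. A rough sketch of this kind of generator (the rule pattern and output file name are illustrative; the script's actual output format may differ):

import random

SIZES = {"small": 10_000, "medium": 50_000, "large": 200_000, "xlarge": 500_000}

def generate_rules(count, seed=42):
    # Emit adblock-style domain-blocking rules with pseudo-random hostnames.
    rng = random.Random(seed)
    for i in range(count):
        yield f"||ads{rng.randrange(10_000)}.example{i % 997}.com^\n"

with open("synthetic_large.txt", "w") as f:
    f.writelines(generate_rules(SIZES["large"]))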

Expected Performance

Performance varies with hardware, I/O speed, and network latency, but these are typical results from the synthetic benchmarks (speedup is sequential time divided by 8-worker time):

Rule Count      Sequential      4 Workers       8 Workers       Speedup (8w)
10,000          ~150ms          ~60ms           ~40ms           3.75x
50,000          ~600ms          ~200ms          ~120ms          5.0x
200,000         ~2.5s           ~800ms          ~400ms          6.25x
500,000         ~6s             ~1.8s           ~900ms          6.67x

Parallel Scaling

Speedup scales with the number of CPU cores, but with diminishing returns:

Workers         Theoretical Max     Typical Efficiency
2               2.0x                90-100%
4               4.0x                85-95%
8               8.0x                75-90%
16              16.0x               60-80%

Efficiency decreases due to process startup overhead, merge/deduplication time, memory bandwidth limits, and I/O contention.
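
Efficiency in these tables is simply speedup divided by worker count, so the figures can be reproduced directly; the values below are taken from the 200K-rule row of the example output further down:

def parallel_efficiency(sequential_ms, parallel_ms, workers):
    # Speedup = sequential / parallel; efficiency = speedup / worker count.
    speedup = sequential_ms / parallel_ms
    return speedup, speedup / workers

speedup, eff = parallel_efficiency(2350, 350, workers=8)
print(f"{speedup:.2f}x speedup, {eff:.0%} efficiency")  # 6.71x speedup, 84% efficiency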

When to Use Chunking

Parallel chunking provides the most benefit for large filter lists with multiple sources (a heuristic sketch follows these lists):

✅ Enable Chunking

  • 6+ filter sources
  • Large combined filter lists (100K+ rules)
  • Multi-core systems (4+ cores)
  • Build/CI pipelines

❌ Disable Chunking

  • 1-5 filter sources
  • Small filter lists (<50K rules)
  • Memory-constrained systems
  • Network-bound scenarios (slow downloads)
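
Those guidelines condense into a simple decision rule. The thresholds below mirror the lists above, but the function itself is a hypothetical helper, not part of any compiler's API:

import os

def should_enable_chunking(num_sources, total_rules, network_bound=False):
    # Hypothetical heuristic mirroring the guidelines above; not a real API.
    cores = os.cpu_count() or 1
    if network_bound or cores < 4:
        return False
    return num_sources >= 6 and total_rules >= 100_000

print(should_enable_chunking(num_sources=8, total_rules=250_000))  # True on a 4+ core machine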

Example Output

Here's what you might see from the quick benchmark suite:

CHUNKING PERFORMANCE COMPARISON SUITE
======================================================================
CPU cores available: 8
Max parallel workers: 8

Size            Sequential      Parallel        Speedup      Efficiency
----------------------------------------------------------------------
10K rules       150 ms          70 ms           2.14x        27%
50K rules       570 ms          130 ms          4.38x        55%
200K rules      2,350 ms        350 ms          6.71x        84%
500K rules      5,400 ms        800 ms          6.75x        84%
----------------------------------------------------------------------

Average speedup: 5.00x
Maximum speedup: 6.75x

Learn More

Chunking Guide

Complete documentation on parallel chunking including configuration, API reference, and best practices.

Compiler Comparison

Compare the different compiler implementations and choose the best one based on specific needs.

View Benchmark Code

Explore the benchmark scripts on GitHub to understand the implementation details.

💡 Tip

Run benchmarks on your own hardware to get accurate performance data for your use case. Results vary with CPU core count, memory, I/O speed, and network latency.