Performance Benchmarks

Measure and understand the performance characteristics of the filter rule compilers, including parallel chunking speedups and cross-language comparisons.

Overview

The repository includes comprehensive benchmarking tools to help understand performance across different compilers and optimize compilation workflows. All compilers (TypeScript, .NET, Python, Rust) support parallel chunking for improved performance with large filter lists.

Benchmarking Tools

🚀 Quick Synthetic Benchmark

File: benchmarks/quick_benchmark.py

Fast simulation showing expected speedups without requiring a full compilation setup. Demonstrates (a sketch follows this list):

  • How rules are split into chunks
  • Simulated parallel processing time
  • Expected speedup ratios
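
The model being simulated is straightforward: split the rule list into roughly equal chunks, process the chunks concurrently, and pay a fixed overhead for worker startup and merging. A minimal sketch of that model in Python (the per-rule cost and overhead constants are illustrative assumptions, not values taken from quick_benchmark.py):

import math

def split_into_chunks(rules, num_chunks):
    # Divide the rule list into num_chunks roughly equal slices.
    chunk_size = math.ceil(len(rules) / num_chunks)
    return [rules[i:i + chunk_size] for i in range(0, len(rules), chunk_size)]

def simulated_time_ms(num_rules, workers, per_rule_us=1.0, overhead_ms=20.0):
    # Sequential cost grows linearly with rule count; the parallel model
    # divides the work across workers and adds a fixed startup/merge overhead.
    sequential = num_rules * per_rule_us / 1000.0
    parallel = sequential / workers + overhead_ms
    return sequential, parallel

rules = [f"||example{i}.com^" for i in range(500_000)]
chunks = split_into_chunks(rules, num_chunks=8)
seq, par = simulated_time_ms(len(rules), workers=8)
print(f"{len(chunks)} chunks; sequential ~{seq:.0f} ms, "
      f"parallel ~{par:.0f} ms, speedup {seq / par:.2f}x")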

📊 Full Benchmark Suite

Files: benchmarks/run_benchmarks.py, benchmarks/generate_synthetic_data.py

Complete benchmarking across all compilers with real compilation. Compares sequential vs chunked/parallel performance using synthetic test data.
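
At its core, a harness like this times repeated compiler invocations and reports a robust statistic such as the median. A minimal sketch, assuming each compiler is driven as an external command (the command shown is a placeholder, not the actual compiler CLI):

import statistics
import subprocess
import time

def time_compile(cmd, iterations=3):
    # Run the compiler several times and report the median wall-clock time.
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Placeholder invocation; substitute the real compiler command and flags.
# median_s = time_compile(["python", "compile.py", "--input", "rules.txt"])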

Running Benchmarks

Quick Synthetic Benchmark

Run a quick simulation to see expected speedups:

cd benchmarks

# Run comparison suite (recommended)
python quick_benchmark.py --suite

# Run parallel scaling test
python quick_benchmark.py --scaling

# Custom benchmark
python quick_benchmark.py --rules 500000 --parallel 8

# Interactive mode
python quick_benchmark.py --interactive

Full Benchmark with Real Compilation

Generate synthetic test data and run actual compilation benchmarks:

cd benchmarks

# Generate test data (small, medium, large, xlarge filter lists)
python generate_synthetic_data.py --all

# Run benchmarks across all compilers
python run_benchmarks.py

# Run specific compiler only
python run_benchmarks.py --compiler python --iterations 5

# Run specific size only
python run_benchmarks.py --size large
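
For timing purposes, synthetic data only needs to resemble real filter rules in shape and volume. A rough sketch of this kind of generator (the rule pattern and output file name are illustrative; the script's actual output format may differ):

import random

SIZES = {"small": 10_000, "medium": 50_000, "large": 200_000, "xlarge": 500_000}

def generate_rules(count, seed=42):
    # Emit adblock-style domain-blocking rules with pseudo-random hostnames.
    rng = random.Random(seed)
    for i in range(count):
        yield f"||ads{rng.randrange(10_000)}.example{i % 997}.com^\n"

with open("synthetic_large.txt", "w") as f:
    f.writelines(generate_rules(SIZES["large"]))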

Expected Performance

Performance varies with hardware, I/O speed, and network latency, but these are typical results from the synthetic benchmarks (speedup is sequential time divided by 8-worker time):

Rule Count      Sequential      4 Workers       8 Workers       Speedup (8w)
10,000          ~150ms          ~60ms           ~40ms           3.75x
50,000          ~600ms          ~200ms          ~120ms          5.0x
200,000         ~2.5s           ~800ms          ~400ms          6.25x
500,000         ~6s             ~1.8s           ~900ms          6.67x

Parallel Scaling

Speedup scales with the number of CPU cores, but with diminishing returns:

Workers         Theoretical Max     Typical Efficiency
2               2.0x                90-100%
4               4.0x                85-95%
8               8.0x                75-90%
16              16.0x               60-80%

Efficiency decreases due to process startup overhead, merge/deduplication time, memory bandwidth limits, and I/O contention.
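
Efficiency in these tables is simply speedup divided by worker count, so the figures can be reproduced directly; the values below are taken from the 200K-rule row of the example output further down:

def parallel_efficiency(sequential_ms, parallel_ms, workers):
    # Speedup = sequential / parallel; efficiency = speedup / worker count.
    speedup = sequential_ms / parallel_ms
    return speedup, speedup / workers

speedup, eff = parallel_efficiency(2350, 350, workers=8)
print(f"{speedup:.2f}x speedup, {eff:.0%} efficiency")  # 6.71x speedup, 84% efficiency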

When to Use Chunking

Parallel chunking provides the most benefit for large filter lists with multiple sources (a heuristic sketch follows these lists):

✅ Enable Chunking

  • 6+ filter sources
  • Large combined filter lists (100K+ rules)
  • Multi-core systems (4+ cores)
  • Build/CI pipelines

❌ Disable Chunking

  • 1-5 filter sources
  • Small filter lists (<50K rules)
  • Memory-constrained systems
  • Network-bound scenarios (slow downloads)
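
Those guidelines condense into a simple decision rule. The thresholds below mirror the lists above, but the function itself is a hypothetical helper, not part of any compiler's API:

import os

def should_enable_chunking(num_sources, total_rules, network_bound=False):
    # Hypothetical heuristic mirroring the guidelines above; not a real API.
    cores = os.cpu_count() or 1
    if network_bound or cores < 4:
        return False
    return num_sources >= 6 and total_rules >= 100_000

print(should_enable_chunking(num_sources=8, total_rules=250_000))  # True on a 4+ core machine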

Example Output

Here's what you might see from the quick benchmark suite:

CHUNKING PERFORMANCE COMPARISON SUITE
======================================================================
CPU cores available: 8
Max parallel workers: 8

Size            Sequential      Parallel        Speedup      Efficiency
----------------------------------------------------------------------
10K rules       150 ms          70 ms           2.14x        27%
50K rules       570 ms          130 ms          4.38x        55%
200K rules      2,350 ms        350 ms          6.71x        84%
500K rules      5,400 ms        800 ms          6.75x        84%
----------------------------------------------------------------------

Average speedup: 5.00x
Maximum speedup: 6.75x

Learn More

Chunking Guide

Complete documentation on parallel chunking including configuration, API reference, and best practices.

Compiler Comparison

Compare the different compiler implementations and choose the best one based on specific needs.

View Benchmark Code

Explore the benchmark scripts on GitHub to understand the implementation details.

💡 Tip

Run benchmarks on your own hardware to get accurate performance data for your use case. Results vary with CPU core count, memory, I/O speed, and network latency.