# Parallel Chunking for Large Filter Lists
This guide explains the parallel chunking feature available in the rules compilers for improved performance when processing large filter lists.
## Overview

When compiling filter lists with many sources or millions of rules, the single-threaded nature of `@adguard/hostlist-compiler` can become a bottleneck. Chunking addresses this in three phases, sketched below:

- **Splitting sources into chunks**: distributes sources across multiple parallel workers
- **Compiling chunks in parallel**: uses multiple CPU cores simultaneously
- **Merging results**: combines chunk outputs with deduplication
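A minimal sketch of the full pipeline, using the Python function names from the API Reference later in this guide. The `load_config` helper is hypothetical, and merging is assumed to happen inside `compile_chunks_async`, as the Rust example below suggests:

```python
# Sketch only: function names follow the Python API Reference below;
# load_config is a hypothetical stand-in for however the configuration
# is parsed in your setup.
import asyncio

from rules_compiler.chunking import (
    ChunkingOptions,
    compile_chunks_async,
    should_enable_chunking,
    split_into_chunks,
)

async def main() -> None:
    config = load_config("config.yaml")  # hypothetical loader
    options = ChunkingOptions.for_large_lists()

    if should_enable_chunking(config, options):
        chunks = split_into_chunks(config, options)           # 1. split
        result = await compile_chunks_async(chunks, options)  # 2. compile (and merge)
        print(f"{result.final_rule_count} rules, "
              f"{result.duplicates_removed} duplicates removed")

asyncio.run(main())
```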
## Performance Benefits
| Scenario | Sources | Rules | Sequential Time | Chunked Time (4 cores) | Speedup |
|---|---|---|---|---|---|
| Small | 10 | ~50k | 15s | 12s | 1.25x |
| Medium | 50 | ~250k | 75s | 25s | 3x |
| Large | 200 | ~1M | 300s | 85s | 3.5x |
*Times are approximate and depend on source download speed and hardware.*
## Supported Compilers
| Compiler | Chunking Support | Status |
|---|---|---|
| TypeScript | Full | Production |
| .NET | Full | Production |
| Python | Full | Production |
| Rust | Full | Production |
## Configuration

### TypeScript (Deno)

#### Configuration File

```json
{
  "name": "My Filter List",
  "sources": [...],
  "chunking": {
    "enabled": true,
    "chunkSize": 100000,
    "maxParallel": 4,
    "strategy": "source"
  }
}
```

#### CLI Flags

```bash
deno task compile -- --enable-chunking --chunk-size 100000 --max-parallel 4
```
### .NET

#### Programmatic Usage

```csharp
var options = new CompilerOptions
{
    ConfigPath = "config.yaml",
    Chunking = new ChunkingOptions
    {
        Enabled = true,
        ChunkSize = 100_000,
        MaxParallel = Environment.ProcessorCount,
        Strategy = ChunkingStrategy.Source
    }
};

var result = await compiler.CompileAsync(options);
```

#### Using Presets

```csharp
// For small lists (chunking disabled)
var options = CompilerOptions.Default;

// For large lists (chunking enabled with optimal settings)
var options = CompilerOptions.ForLargeLists;
```

#### Dependency Injection

```csharp
services.AddRulesCompiler();

// The IChunkingService is automatically registered
var chunkingService = serviceProvider.GetRequiredService<IChunkingService>();
```
### Python

#### Programmatic Usage

```python
import os

from rules_compiler import RulesCompiler
from rules_compiler.chunking import ChunkingOptions, ChunkingStrategy

# Create chunking options
chunking_options = ChunkingOptions(
    enabled=True,
    chunk_size=100_000,
    max_parallel=os.cpu_count() or 4,
    strategy=ChunkingStrategy.SOURCE,
)

# Or use the preset for large lists
chunking_options = ChunkingOptions.for_large_lists()

# Compile with chunking
compiler = RulesCompiler(chunking=chunking_options)
result = await compiler.compile_async("config.yaml")
```

#### CLI Usage

```bash
rules-compiler -c config.yaml --chunking --max-parallel 4
```
### Rust

#### Programmatic Usage

```rust
use rules_compiler::{
    ChunkingOptions, ChunkingStrategy, CompilerConfig,
    should_enable_chunking, split_into_chunks, compile_chunks_async, merge_chunks,
};

// Create chunking options
let options = ChunkingOptions::new()
    .with_enabled(true)
    .with_chunk_size(100_000)
    .with_max_parallel(8)
    .with_strategy(ChunkingStrategy::Source);

// Or use the preset for large lists
let options = ChunkingOptions::for_large_lists();

// Split, compile, and merge
if should_enable_chunking(&config, Some(&options)) {
    let chunks = split_into_chunks(&config, &options);
    let result = compile_chunks_async(chunks, &options, false).await?;
    println!("Speedup: {:.2}x", result.estimated_speedup());
}
```

#### CLI Usage

```bash
rules-compiler -c config.yaml --chunking --max-parallel 4
```
## Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable parallel chunking |
| `chunkSize` | number | `100000` | Maximum estimated rules per chunk |
| `maxParallel` | number | CPU cores | Maximum parallel workers |
| `strategy` | string | `"source"` | Chunking strategy |
## Chunking Strategies

| Strategy | Description | Best For |
|---|---|---|
| `source` | Distributes sources evenly across chunks | Most use cases |
| `line-count` | Balances by estimated line count | (Planned) |
## How It Works

### Source Strategy

1. **Calculate chunks**: sources are distributed evenly

   ```text
   Total sources: 20
   Max parallel: 4
   → 4 chunks with 5 sources each
   ```

2. **Batch processing**: chunks run in parallel batches

   ```text
   Batch 1: Chunks 1-4 (parallel)
   Batch 2: Chunks 5-8 (parallel) [if needed]
   ```

3. **Merge results**: all outputs are combined with deduplication

   ```text
   Chunk 1: 25,000 rules
   Chunk 2: 30,000 rules
   Chunk 3: 28,000 rules
   Chunk 4: 27,000 rules
   ─────────────────────
   Total: 110,000 rules
   After dedup: 95,000 rules (removed 15,000 duplicates)
   ```
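For illustration, here is a self-contained sketch of the even distribution in step 1. Round-robin assignment is one way to achieve it; the real `split_into_chunks` also attaches `ChunkMetadata` to each chunk:

```python
# Illustrative only: round-robin assignment is one way to distribute
# sources evenly; the real split_into_chunks also builds ChunkMetadata.
def split_sources(sources: list[str], max_parallel: int) -> list[list[str]]:
    chunk_count = min(max_parallel, len(sources)) or 1
    chunks: list[list[str]] = [[] for _ in range(chunk_count)]
    for i, source in enumerate(sources):
        chunks[i % chunk_count].append(source)  # keeps chunk sizes within 1 of each other
    return chunks

# 20 sources, 4 workers → 4 chunks of 5 sources each
sizes = [len(c) for c in split_sources([f"list{i}.txt" for i in range(20)], 4)]
print(sizes)  # [5, 5, 5, 5]
```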
### Automatic Enablement

When `enabled` is not explicitly set:

- Multiple sources + source strategy → chunking is enabled automatically
- Single source → chunking is disabled (no benefit)
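The rule is simple enough to state in code. A sketch, treating "not explicitly set" as `None`; the real tri-state handling may differ in detail:

```python
# Sketch of the automatic-enablement rule; enabled=None models
# "not explicitly set". The shipped implementations may differ.
def auto_enable(source_count: int, enabled: bool | None, strategy: str = "source") -> bool:
    if enabled is not None:
        return enabled                                 # explicit setting always wins
    return strategy == "source" and source_count > 1   # automatic path

assert auto_enable(20, None) is True    # multiple sources → enabled
assert auto_enable(1, None) is False    # single source → no benefit
assert auto_enable(1, True) is True     # explicit opt-in still honored
```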
### Merge Behavior

The merge process:

- Flattens all chunk outputs into a single list
- Deduplicates actual filter rules while preserving order
- Preserves comments (`!` and `#` prefixed lines)
- Preserves empty lines for readability
- Reports the duplicate count in logs

```text
[INFO] Merging 4 chunks...
[DEBUG] Total rules before deduplication: 110000
[INFO] Merged to 95000 rules (removed 15000 duplicates)
```
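A sketch of a merge with this behavior, matching the documented `merge_chunks(chunk_results) -> (rules, duplicates_removed)` shape; the shipped implementations may differ internally:

```python
# Order-preserving merge sketch: comments and blank lines pass through,
# rules are kept on first occurrence and counted as duplicates afterwards.
def merge_chunks(chunk_results: list[list[str]]) -> tuple[list[str], int]:
    seen: set[str] = set()
    merged: list[str] = []
    duplicates = 0
    for chunk in chunk_results:
        for line in chunk:
            if not line.strip() or line.startswith(("!", "#")):
                merged.append(line)      # comments / blank lines: never deduplicated
            elif line in seen:
                duplicates += 1          # repeated rule: dropped
            else:
                seen.add(line)
                merged.append(line)
    return merged, duplicates

rules, dupes = merge_chunks([
    ["! chunk 1", "||ads.example^"],
    ["! chunk 2", "||ads.example^", "||tracker.example^"],
])
print(dupes)  # 1: the second ||ads.example^ was removed
```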
### Result Metrics

The compilation result includes chunking metrics:

```csharp
var result = await compiler.CompileAsync(options);

// ChunkedCompilationResult properties:
result.TotalRules         // Sum of all chunk rules
result.FinalRuleCount     // After deduplication
result.DuplicatesRemoved  // Number of duplicates removed
result.TotalElapsedMs     // Wall-clock time
result.EstimatedSpeedup   // Ratio of sequential to parallel time
result.Chunks             // Individual chunk metadata
```
## Best Practices

### When to Enable Chunking

| Sources | Recommendation |
|---|---|
| 1-5 | Disable chunking (overhead not worth it) |
| 6-20 | Enable with default settings |
| 20+ | Enable with `maxParallel` matching CPU cores |
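The same table expressed as a helper; the thresholds come straight from the recommendations above, while the function itself is only an illustration, not part of any compiler API:

```python
# Illustration of the thresholds above; not part of any compiler API.
import os

def recommended_chunking(source_count: int) -> dict:
    if source_count <= 5:
        return {"enabled": False}      # overhead not worth it
    if source_count <= 20:
        return {"enabled": True}       # defaults are fine
    return {"enabled": True, "max_parallel": os.cpu_count() or 4}  # match cores

print(recommended_chunking(3))    # {'enabled': False}
print(recommended_chunking(50))   # {'enabled': True, 'max_parallel': ...}
```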
### Optimal Settings

```csharp
// Recommended for most large filter lists
var options = new ChunkingOptions
{
    Enabled = true,
    ChunkSize = 100_000,
    MaxParallel = Math.Max(2, Environment.ProcessorCount),
    Strategy = ChunkingStrategy.Source
};
```
### Memory Considerations

- Each chunk runs a separate `hostlist-compiler` process
- Memory usage scales with `maxParallel`
- For memory-constrained systems, reduce `maxParallel` to 2-4 (see the sketch below)
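One way to pick `maxParallel` on memory-constrained systems is to cap it by available RAM as well as cores. A sketch; the ~512 MB per-worker figure is an assumption for illustration, not a measured number:

```python
# Sketch: cap workers by both CPU count and available memory.
# per_worker_mb = 512 is an illustrative assumption, not a measurement.
import os

def capped_max_parallel(available_ram_mb: int, per_worker_mb: int = 512) -> int:
    by_cpu = os.cpu_count() or 4
    by_memory = max(2, available_ram_mb // per_worker_mb)  # floor at the 2-worker minimum
    return min(by_cpu, by_memory)

print(capped_max_parallel(2048))  # 2 GB → at most 4 workers
```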
## Limitations

- **Network-bound sources**: chunking helps less when sources are slow to download
- **Single large source**: a single source file cannot be parallelized
- **Transformation order**: global transformations run after the merge, not per chunk
## Troubleshooting

### Chunking not enabled

Check that:

- `enabled: true` is set in the configuration
- Multiple sources exist (for automatic enablement)
- `IChunkingService` is registered (DI scenarios)

### Poor speedup

Possible causes:

- Sources are network-bound (download time dominates)
- Too few sources to benefit from parallelism
- `maxParallel` set too low

### High memory usage

Solutions:

- Reduce `maxParallel` to 2-4
- Ensure sufficient RAM (2 GB+ recommended for large lists)
## API Reference

### .NET Types

```csharp
// Options
public class ChunkingOptions
{
    public bool Enabled { get; set; }
    public int ChunkSize { get; set; }
    public int MaxParallel { get; set; }
    public ChunkingStrategy Strategy { get; set; }
}

// Result
public class ChunkedCompilationResult
{
    public bool Success { get; set; }
    public long TotalElapsedMs { get; set; }
    public List<ChunkMetadata> Chunks { get; set; }
    public int TotalRules { get; set; }
    public int FinalRuleCount { get; set; }
    public int DuplicatesRemoved { get; set; }
    public double EstimatedSpeedup { get; }
}

// Service interface
public interface IChunkingService
{
    bool ShouldEnableChunking(CompilerConfiguration config, ChunkingOptions? options);
    List<(CompilerConfiguration Config, ChunkMetadata Metadata)> SplitIntoChunks(...);
    Task<ChunkedCompilationResult> CompileChunksAsync(...);
    (string[] Rules, int DuplicatesRemoved) MergeChunks(List<string[]> chunkResults);
    double EstimateSpeedup(int totalRules, ChunkingOptions options);
}
```
### Python Types

```python
import os
from dataclasses import dataclass, field
from enum import Enum

class ChunkingStrategy(Enum):
    SOURCE = "source"
    LINE_COUNT = "line_count"

@dataclass
class ChunkingOptions:
    enabled: bool = False
    chunk_size: int = 100_000
    max_parallel: int = os.cpu_count() or 4
    strategy: ChunkingStrategy = ChunkingStrategy.SOURCE

    @classmethod
    def default(cls) -> "ChunkingOptions": ...

    @classmethod
    def for_large_lists(cls) -> "ChunkingOptions": ...

@dataclass
class ChunkMetadata:
    index: int
    total: int
    estimated_rules: int = 0
    actual_rules: int | None = None
    sources: list[FilterSource] = field(default_factory=list)
    elapsed_ms: int | None = None
    success: bool = False

@dataclass
class ChunkedCompilationResult:
    success: bool = False
    total_elapsed_ms: int = 0
    chunks: list[ChunkMetadata] = field(default_factory=list)
    total_rules: int = 0
    final_rule_count: int = 0
    duplicates_removed: int = 0

    @property
    def estimated_speedup(self) -> float: ...

# Functions
def should_enable_chunking(config: CompilerConfiguration, options: ChunkingOptions | None) -> bool: ...
def split_into_chunks(config: CompilerConfiguration, options: ChunkingOptions) -> list[tuple[CompilerConfiguration, ChunkMetadata]]: ...
async def compile_chunks_async(chunks: list, options: ChunkingOptions, debug: bool = False) -> ChunkedCompilationResult: ...
def merge_chunks(chunk_results: list[list[str]]) -> tuple[list[str], int]: ...
def estimate_speedup(total_rules: int, options: ChunkingOptions) -> float: ...
```
### Rust Types

```rust
// Strategy enum
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ChunkingStrategy {
    #[default]
    Source,
    LineCount,
}

// Options
pub struct ChunkingOptions {
    pub enabled: bool,
    pub chunk_size: usize,
    pub max_parallel: usize,
    pub strategy: ChunkingStrategy,
}

impl ChunkingOptions {
    pub fn new() -> Self;
    pub fn for_large_lists() -> Self;
    pub fn with_enabled(self, enabled: bool) -> Self;
    pub fn with_chunk_size(self, chunk_size: usize) -> Self;
    pub fn with_max_parallel(self, max_parallel: usize) -> Self;
    pub fn with_strategy(self, strategy: ChunkingStrategy) -> Self;
}

// Metadata
pub struct ChunkMetadata {
    pub index: usize,
    pub total: usize,
    pub estimated_rules: usize,
    pub actual_rules: Option<usize>,
    pub sources: Vec<FilterSource>,
    pub elapsed_ms: Option<u64>,
    pub success: bool,
    pub error_message: Option<String>,
    pub output_path: Option<PathBuf>,
}

// Result
pub struct ChunkedCompilationResult {
    pub success: bool,
    pub total_elapsed_ms: u64,
    pub chunks: Vec<ChunkMetadata>,
    pub total_rules: usize,
    pub final_rule_count: usize,
    pub duplicates_removed: usize,
    pub merged_rules: Option<Vec<String>>,
    pub errors: Vec<String>,
}

impl ChunkedCompilationResult {
    pub fn estimated_speedup(&self) -> f64;
}

// Functions
pub fn should_enable_chunking(config: &CompilerConfig, options: Option<&ChunkingOptions>) -> bool;
pub fn split_into_chunks(config: &CompilerConfig, options: &ChunkingOptions) -> Vec<(CompilerConfig, ChunkMetadata)>;
pub async fn compile_chunks_async(chunks: Vec<(CompilerConfig, ChunkMetadata)>, options: &ChunkingOptions, debug: bool) -> Result<ChunkedCompilationResult>;
pub fn merge_chunks(chunk_results: &[Vec<String>]) -> (Vec<String>, usize);
pub fn estimate_speedup(total_rules: usize, options: &ChunkingOptions) -> f64;
```
## Benchmarking

The repository includes a comprehensive benchmark suite to measure chunking performance.

### Quick Synthetic Benchmark

Run a quick simulation to see expected speedups on your system:

```bash
cd benchmarks

# Run comparison suite (recommended)
python quick_benchmark.py --suite

# Run parallel scaling test
python quick_benchmark.py --scaling

# Custom benchmark
python quick_benchmark.py --rules 500000 --parallel 8

# Interactive mode
python quick_benchmark.py --interactive
```
Example output:
```text
======================================================================
CHUNKING PERFORMANCE COMPARISON SUITE
======================================================================
CPU cores available: 8
Max parallel workers: 8

Size         Sequential    Parallel    Speedup    Efficiency
----------------------------------------------------------------------
10K rules        150 ms       70 ms      2.14x       27%
50K rules        570 ms      130 ms      4.38x       55%
200K rules     2,350 ms      350 ms      6.71x       84%
500K rules     5,400 ms      800 ms      6.75x       84%
----------------------------------------------------------------------
Average speedup: 5.00x
Maximum speedup: 6.75x
```
### Full Benchmark with Real Compilation

Generate synthetic test data and run actual compilation benchmarks:

```bash
cd benchmarks

# Generate test data (small, medium, large, xlarge filter lists)
python generate_synthetic_data.py --all

# Run benchmarks across all compilers
python run_benchmarks.py

# Run a specific compiler only
python run_benchmarks.py --compiler python --iterations 5

# Run a specific size only
python run_benchmarks.py --size large
```
### Expected Performance
Based on synthetic benchmarks:
| Rule Count | Sequential | 4 Workers | 8 Workers | Speedup (8w) |
|---|---|---|---|---|
| 10,000 | ~150ms | ~60ms | ~40ms | 3.75x |
| 50,000 | ~600ms | ~200ms | ~120ms | 5.0x |
| 200,000 | ~2.5s | ~800ms | ~400ms | 6.25x |
| 500,000 | ~6s | ~1.8s | ~900ms | 6.67x |
*Actual times vary by hardware, I/O speed, and network latency for remote sources.*
### Parallel Scaling
Speedup scales with CPU cores but with diminishing returns:
| Workers | Theoretical Max | Typical Efficiency |
|---|---|---|
| 2 | 2.0x | 90-100% |
| 4 | 4.0x | 85-95% |
| 8 | 8.0x | 75-90% |
| 16 | 16.0x | 60-80% |
Efficiency decreases due to:
- Process startup overhead
- Merge/deduplication time
- Memory bandwidth limits
- I/O contention
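Efficiency in the tables above is simply the measured speedup divided by the worker count, which makes it easy to sanity-check benchmark output:

```python
# Efficiency = speedup / workers, as used in the tables above.
def efficiency(sequential_ms: float, parallel_ms: float, workers: int) -> float:
    return (sequential_ms / parallel_ms) / workers

# 500K-rule run from the quick benchmark: 5,400 ms → 800 ms on 8 workers
print(f"{efficiency(5400, 800, 8):.0%}")  # 84%
```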
## Future Enhancements

- **Line-count strategy**: balance chunks by estimated rule count
- **Streaming merge**: reduce memory usage for very large outputs
- **Source caching**: cache downloaded sources across chunks
- **Progress callbacks**: real-time progress reporting