Benchmarking in hpfracc — layout, coupling, and maintenance

1. Three different “benchmark” surfaces

| Location | Role | Typical entry |
| --- | --- | --- |
| `hpfracc/benchmarks/benchmark_runner.py` | General numerical benchmarks (array sizes, synthetic test functions, timing/memory, optional plots + CSV export). | `BenchmarkRunner`, `BenchmarkConfig`, `BenchmarkResult`; used by `scripts/run_benchmarks.py` and `examples/benchmarks/benchmark_demo.py`. |
| `hpfracc/benchmarks/ml_performance_benchmark.py` | Torch-heavy ML layer benchmarks (`FractionalNeuralNetwork`, conv/LSTM/transformer, etc.). | Canonical: `MLPerformanceBenchmark`, `MLBenchmarkConfig`, `MLBenchmarkResult`. Deprecated aliases: `BenchmarkConfig` / `BenchmarkResult` (subclasses that emit `DeprecationWarning` on construction). |
| `hpfracc/validation/benchmarks.py` | Validation-oriented `PerformanceBenchmark` helpers (warmup, repeat timing) for numerical method comparison. | Used by validation workflows; not the same API as `BenchmarkRunner`. |
| Repo root `benchmarks/` | Standalone scripts (e.g. intelligent backend timing). | Run as scripts; not necessarily imported as a package. |

Numerical (`benchmark_runner.py`): `BenchmarkConfig`, `BenchmarkResult`, `BenchmarkRunner`. These names belong only to the numerical runner.
ML (`ml_performance_benchmark.py`): use `MLBenchmarkConfig` and `MLBenchmarkResult` for new code. The old names `BenchmarkConfig` / `BenchmarkResult` in this module remain as deprecated subclasses (same constructor shape, `DeprecationWarning` emitted from `__post_init__`).
`hpfracc/benchmarks/__init__.py`: imports the numerical runner eagerly; exposes `MLPerformanceBenchmark`, `MLBenchmarkConfig`, `MLBenchmarkResult` via `__getattr__`, so `import hpfracc.benchmarks` does not load PyTorch until you touch an ML symbol.
`validation.method_benchmarks.BenchmarkResult` is a third, unrelated dataclass (includes `BenchmarkType`, `success`, etc.).
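
The deprecated-alias arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the field names (`batch_size`, `num_runs`) and the helper `construct_and_collect` are hypothetical, and only the subclass-plus-`__post_init__` warning shape is taken from the text.

```python
import warnings
from dataclasses import dataclass


@dataclass
class MLBenchmarkConfig:
    """Canonical config. The fields are hypothetical placeholders."""
    batch_size: int = 32
    num_runs: int = 10


@dataclass
class BenchmarkConfig(MLBenchmarkConfig):
    """Deprecated alias: same constructor shape, warns on construction."""

    def __post_init__(self):
        # stacklevel=3 attributes the warning to the code that called the
        # generated __init__, not to __post_init__ or __init__ themselves.
        warnings.warn(
            "BenchmarkConfig is deprecated; use MLBenchmarkConfig instead.",
            DeprecationWarning,
            stacklevel=3,
        )


def construct_and_collect(cls, **kwargs):
    """Build cls(**kwargs) and return (instance, list of warning categories)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        instance = cls(**kwargs)
    return instance, [w.category for w in caught]
```

Because the alias is a subclass, `isinstance(old_config, MLBenchmarkConfig)` still holds, so code that accepts the canonical type keeps working during the deprecation window.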

benchmark_runner.py

Core: NumPy, `psutil`, `json`, `logging` (no root `logging.basicConfig`; configure logging in applications or scripts).
Matplotlib is imported only inside `_plot_performance_results`, `_plot_accuracy_results`, and `_plot_memory_results`, so `import hpfracc.benchmarks.benchmark_runner` does not load pyplot for CSV-only or non-plotting paths.
pandas is likewise imported lazily, inside the CSV export helper.
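
A minimal sketch of the function-local import pattern described above, assuming hypothetical function names (`save_csv`, `plot_timings`) and a hypothetical row shape with `size` and `seconds` keys. It is not the runner's code; it only shows why the CSV path never pays for matplotlib.

```python
import csv


def save_csv(rows, csv_path):
    """Write timing rows to CSV using only the standard library."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["size", "seconds"])
        writer.writeheader()
        writer.writerows(rows)


def plot_timings(rows, png_path):
    """Plot seconds against size; matplotlib is imported only when called."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, safe on headless CI
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([r["size"] for r in rows], [r["seconds"] for r in rows], marker="o")
    ax.set_xlabel("array size")
    ax.set_ylabel("seconds")
    fig.savefig(png_path)
    plt.close(fig)  # release the figure so repeated calls do not accumulate
```

Calling only `save_csv` works on a machine without matplotlib installed; the import cost and the dependency are incurred the first time `plot_timings` runs.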

ml_performance_benchmark.py

Eager: `torch`, `numpy`, `psutil`, and `hpfracc.ml` components (heavy).
Matplotlib / seaborn are imported only inside `_generate_visualizations`.
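
Since this module pulls in `torch` eagerly, the package `__init__` defers it with a module-level `__getattr__` (PEP 562). The sketch below illustrates that mechanism on a throwaway module object so it runs without hpfracc installed. Everything named here is an assumption for illustration: `make_lazy_package`, `LAZY_EXPORTS`, and `HeavyBenchmark` are hypothetical, and the standard-library `fractions.Fraction` merely stands in for a heavy ML symbol. In a real package the `__getattr__` function would be defined directly at the top level of `__init__.py`.

```python
import importlib
import types

# Hypothetical table: exported name -> (module path, attribute name).
# "fractions" / "Fraction" stands in for a heavy import such as the ML module.
LAZY_EXPORTS = {"HeavyBenchmark": ("fractions", "Fraction")}


def make_lazy_package(name, lazy_exports):
    """Return a module object whose listed exports are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Python calls this only when normal attribute lookup on the module fails.
        if attr not in lazy_exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, symbol = lazy_exports[attr]
        value = getattr(importlib.import_module(module_path), symbol)
        setattr(pkg, attr, value)  # cache so later lookups bypass __getattr__
        return value

    pkg.__getattr__ = __getattr__
    return pkg


demo_pkg = make_lazy_package("demo_benchmarks", LAZY_EXPORTS)
```

Accessing `demo_pkg.HeavyBenchmark` triggers the import once and caches the result in the module namespace; unknown names still raise `AttributeError`, so `hasattr` and `from ... import` behave normally.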

validation/benchmarks.py

| Risk | Mitigation |
| --- | --- |
| Default `output_dir="benchmark_results"` (runner) is CWD-relative | Set `BenchmarkConfig(output_dir=...)` to a temp or project artifacts directory in CI. |
| Name collision across numerical vs ML | Use the `MLBenchmark*` names for ML; numerical `BenchmarkConfig` stays on `benchmark_runner`. Deprecated ML aliases will be removed in a future major release after a deprecation window. |
| ML benchmark cost | Full `run_comprehensive_benchmark` is expensive; gate behind explicit scripts or reduced configs in CI. |
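
Two of these mitigations can be sketched as small helpers. The environment variable names `HPFRACC_BENCH_ARTIFACTS` and `HPFRACC_RUN_FULL_BENCH`, and both function names, are hypothetical conventions invented for this example; hpfracc does not necessarily read them.

```python
import os
import tempfile
from pathlib import Path


def benchmark_output_dir(env=None):
    """Pick an output directory that does not depend on the current working dir.

    Uses the (hypothetical) HPFRACC_BENCH_ARTIFACTS variable when set,
    otherwise creates a fresh temporary directory.
    """
    env = os.environ if env is None else env
    root = env.get("HPFRACC_BENCH_ARTIFACTS")
    if root:
        path = Path(root)
        path.mkdir(parents=True, exist_ok=True)
        return path
    return Path(tempfile.mkdtemp(prefix="hpfracc_bench_"))


def full_benchmarks_enabled(env=None):
    """Return True only when the (hypothetical) opt-in flag is exactly '1'."""
    env = os.environ if env is None else env
    return env.get("HPFRACC_RUN_FULL_BENCH", "") == "1"
```

A CI script could then pass `output_dir=str(benchmark_output_dir())` to `BenchmarkConfig` and call the full ML benchmark only when `full_benchmarks_enabled()` is true.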

`tests/test_benchmarks/` holds smoke tests: a subprocess import guard (matplotlib not loaded for `benchmark_runner`), `BenchmarkRunner` with a temp `output_dir`, JSON/CSV `save_results`, lazy ML exports on `hpfracc.benchmarks`, and ML canonical vs deprecated `DeprecationWarning` behaviour (skipped if PyTorch is unavailable).
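
A subprocess import guard of the kind mentioned above might look like the following sketch. The helper name `imports_without` and the sentinel exit code are assumptions made for this example; the actual test in the repository may be written differently.

```python
import subprocess
import sys

_FORBIDDEN_LOADED = 42  # sentinel exit code, distinct from Python's error code 1


def imports_without(module, forbidden):
    """Return True if importing `module` in a fresh interpreter leaves
    `forbidden` absent from sys.modules."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"sys.exit({_FORBIDDEN_LOADED} if {forbidden!r} in sys.modules else 0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if result.returncode not in (0, _FORBIDDEN_LOADED):
        # The import itself failed; surface stderr instead of a misleading result.
        raise RuntimeError(f"import of {module!r} failed:\n{result.stderr}")
    return result.returncode == 0
```

A fresh interpreter is needed because the test process has usually imported matplotlib already through other tests. A guard for the runner would then read `assert imports_without("hpfracc.benchmarks.benchmark_runner", "matplotlib.pyplot")`.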

When extending tests:

Prefer a temporary `output_dir` and `MPLBACKEND=Agg`, and do not patch `builtins.open` globally alongside matplotlib (see ANALYTICS_ARCHITECTURE.md §7, which describes the same font-manager footgun).

python -m pytest tests/test_benchmarks/ -q