Benchmarking in hpfracc: layout, coupling, and maintenance

This note complements CONTRIBUTING.md, ALGORITHMS_ARCHITECTURE.md, ANALYTICS_ARCHITECTURE.md, SPECIAL_ARCHITECTURE.md, UTILS_ARCHITECTURE.md, VALIDATION_ARCHITECTURE.md, and SOLVERS_ARCHITECTURE.md. It maps where benchmarks live, how they differ, naming pitfalls, and lightweight import rules.


1. Three different "benchmark" surfaces

| Location | Role | Typical entry |
| --- | --- | --- |
| hpfracc/benchmarks/benchmark_runner.py | General numerical benchmarks (array sizes, synthetic test functions, timing/memory, optional plots + CSV export). | BenchmarkRunner, BenchmarkConfig, BenchmarkResult; used by scripts/run_benchmarks.py and examples/benchmarks/benchmark_demo.py. |
| hpfracc/benchmarks/ml_performance_benchmark.py | Torch-heavy ML layer benchmarks (FractionalNeuralNetwork, conv/LSTM/transformer, etc.). | Canonical: MLPerformanceBenchmark, MLBenchmarkConfig, MLBenchmarkResult. Deprecated aliases: BenchmarkConfig / BenchmarkResult (subclasses that emit DeprecationWarning on construction). |
| hpfracc/validation/benchmarks.py | Validation-oriented PerformanceBenchmark helpers (warmup, repeat timing) for numerical method comparison. | Used by validation workflows; not the same API as BenchmarkRunner. |
| Repo root benchmarks/ | Standalone scripts (e.g. intelligent backend timing). | Run as scripts; not necessarily imported as a package. |
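
The deprecated ML aliases above are described as subclasses that warn on construction. A minimal sketch of that pattern, assuming plain dataclasses; the field names (batch_size, num_runs) are hypothetical placeholders and not the real hpfracc fields:

```python
import warnings
from dataclasses import dataclass


@dataclass
class MLBenchmarkConfig:
    """Canonical config. The fields are illustrative assumptions."""

    batch_size: int = 32
    num_runs: int = 5


@dataclass
class BenchmarkConfig(MLBenchmarkConfig):
    """Deprecated alias: same constructor shape, warns when constructed."""

    def __post_init__(self) -> None:
        # stacklevel=3 skips __post_init__ and the generated __init__,
        # so the warning points at the caller's line.
        warnings.warn(
            "BenchmarkConfig is deprecated; use MLBenchmarkConfig instead.",
            DeprecationWarning,
            stacklevel=3,
        )
```

Because the alias is a subclass, isinstance checks against the canonical class keep working for old call sites while the warning nudges them towards the new name.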


2. Naming and package surface

  • Numerical (benchmark_runner.py): BenchmarkConfig, BenchmarkResult, BenchmarkRunner. Treat these unqualified names as belonging to the numerical runner only.

  • ML (ml_performance_benchmark.py): use MLBenchmarkConfig and MLBenchmarkResult for new code. The old names BenchmarkConfig / BenchmarkResult in this module remain as deprecated subclasses (same constructor shape, DeprecationWarning in __post_init__).

  • hpfracc/benchmarks/__init__.py: imports the numerical runner eagerly; exposes MLPerformanceBenchmark, MLBenchmarkConfig, MLBenchmarkResult via __getattr__ so import hpfracc.benchmarks does not load PyTorch until you touch an ML symbol.

  • validation.method_benchmarks.BenchmarkResult is a third, unrelated dataclass (includes BenchmarkType, success, etc.).
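
The lazy ML exports described above rely on a module-level __getattr__ (PEP 562). The following is only a sketch of that mechanism, not the actual hpfracc/benchmarks/__init__.py: make_lazy_package is a hypothetical helper, and the stdlib module colorsys stands in for the heavy ML module so the example runs anywhere.

```python
import importlib
import sys
import types


def make_lazy_package(name: str, lazy_exports: dict) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access.

    lazy_exports maps an exported attribute name to the module defining it.
    """
    package = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup on the module fails.
        if attr not in lazy_exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy_exports[attr]), attr)
        setattr(package, attr, value)  # cache: later lookups skip __getattr__
        return value

    package.__getattr__ = __getattr__
    return package


# Stand-in demo: "colorsys" plays the role of the Torch-heavy module.
demo = make_lazy_package("demo_benchmarks", {"rgb_to_hsv": "colorsys"})
print("colorsys" in sys.modules)  # whether the stand-in is already loaded
demo.rgb_to_hsv                   # first access triggers the real import
print("colorsys" in sys.modules)
```

In a real package the same function body would sit at the top level of __init__.py as def __getattr__(name), with the eager numerical imports above it.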


3. Dependencies and import coupling

benchmark_runner.py

  • Core: NumPy, psutil, json, logging (no root logging.basicConfig; configure logging in applications or scripts).

  • Matplotlib is imported only inside _plot_performance_results, _plot_accuracy_results, and _plot_memory_results so import hpfracc.benchmarks.benchmark_runner does not load pyplot for CSV-only or non-plotting paths.

  • pandas is likewise imported lazily, inside the CSV export helper.

ml_performance_benchmark.py

  • Eager: torch, numpy, psutil, and hpfracc.ml components (heavy).

  • Matplotlib / seaborn are imported only inside _generate_visualizations.

validation/benchmarks.py

  • NumPy, psutil, warnings; no matplotlib at module level.
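
The validation helpers are characterised earlier as warmup plus repeat timing. A generic sketch of that measurement pattern follows; time_with_warmup and its parameter names are hypothetical and do not reflect the PerformanceBenchmark API.

```python
import statistics
import time
from typing import Callable


def time_with_warmup(
    func: Callable[[], object], warmup_runs: int = 3, repeats: int = 10
) -> dict:
    """Run func untimed for warmup_runs, then time it repeats times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup_runs):
        func()  # warm caches / lazy initialisation; result discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.fmean(samples),
        "min_s": min(samples),
        "std_s": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "repeats": repeats,
    }


stats = time_with_warmup(lambda: sum(range(10_000)), warmup_runs=2, repeats=5)
print(stats["repeats"], stats["min_s"] <= stats["mean_s"] + 1e-12)
```

Reporting the minimum alongside the mean is a common choice because the minimum is less sensitive to scheduler noise on shared CI machines.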


4. Outputs and risks

| Risk | Mitigation |
| --- | --- |
| Default output_dir="benchmark_results" (runner) is CWD-relative | Set BenchmarkConfig(output_dir=...) to a temp or project artifacts directory in CI. |
| Name collision across numerical vs ML | Use MLBenchmark* for ML; numerical BenchmarkConfig stays on benchmark_runner. Deprecated ML aliases will be removed in a future major release after a deprecation window. |
| ML benchmark cost | Full run_comprehensive_benchmark is expensive; gate behind explicit scripts or reduced configs in CI. |
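
One way to avoid the CWD-relative default in CI is to resolve an absolute directory up front and pass it in. This is a sketch under assumptions: resolve_output_dir and the BENCHMARK_ARTIFACTS_DIR environment variable are hypothetical names, not something hpfracc reads itself.

```python
import os
import tempfile
from pathlib import Path


def resolve_output_dir(env_var: str = "BENCHMARK_ARTIFACTS_DIR") -> Path:
    """Return an absolute, existing directory for benchmark artifacts.

    Uses the directory named by env_var (hypothetical) when it is set,
    otherwise creates a fresh temporary directory.
    """
    configured = os.environ.get(env_var)
    if configured:
        out = Path(configured).expanduser().resolve()
        out.mkdir(parents=True, exist_ok=True)
        return out
    return Path(tempfile.mkdtemp(prefix="benchmark_results_")).resolve()


output_dir = resolve_output_dir()
print(output_dir.is_absolute())
# The result would then be passed as BenchmarkConfig(output_dir=str(output_dir)).
```

Resolving to an absolute path means the results land in the same place regardless of which directory the CI job happens to start in.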


5. Tests

  • tests/test_benchmarks/: smoke tests covering a subprocess import guard (matplotlib not loaded for benchmark_runner), BenchmarkRunner with temp output_dir, JSON/CSV save_results, lazy ML exports on hpfracc.benchmarks, and ML canonical vs deprecated DeprecationWarning behaviour (skipped if PyTorch is unavailable).
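
A subprocess import guard of the kind mentioned above can be sketched as follows. It is an illustration, not the repository's actual test: watched_modules_loaded is a hypothetical helper, and stdlib modules stand in for hpfracc and matplotlib so the snippet runs without either installed.

```python
import subprocess
import sys


def watched_modules_loaded(module: str, watched: list) -> set:
    """Import module in a fresh interpreter; return the watched names it loaded."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print('\\n'.join(n for n in {watched!r} if n in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # a failing import in the child raises CalledProcessError
    )
    return set(result.stdout.split())


# Stand-in check: importing json should not pull in colorsys.
loaded = watched_modules_loaded("json", ["colorsys", "json"])
print(sorted(loaded))
```

A fresh interpreter is used because sys.modules in the test process may already contain matplotlib from an earlier test. In the real suite the call would look like watched_modules_loaded("hpfracc.benchmarks.benchmark_runner", ["matplotlib"]) followed by an assertion that the returned set is empty.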

When extending tests:

  • Prefer a temp output_dir and MPLBACKEND=Agg, and do not patch builtins.open globally alongside matplotlib (see ANALYTICS_ARCHITECTURE.md §7 for the same font-manager footgun).

python -m pytest tests/test_benchmarks/ -q