hpfracc.analytics — architecture, dependencies, and maintenance

This note complements CONTRIBUTING.md, ALGORITHMS_ARCHITECTURE.md, SPECIAL_ARCHITECTURE.md, SOLVERS_ARCHITECTURE.md, UTILS_ARCHITECTURE.md, and VALIDATION_ARCHITECTURE.md. It describes how the analytics package is structured, what it depends on, how data flows, known risks, and how to exercise tests locally.


1. Design goals

  1. Opt-in telemetry-style tracking of estimator/method names, parameters, array sizes, and success flags—persisted locally (SQLite by default), not sent to a remote service.

  2. Four concerns, four submodules: usage popularity, performance timing/memory, error/reliability patterns, and workflow/session sequences.

  3. Single façade (AnalyticsManager + AnalyticsConfig) for coordinated tracking, export (json / csv / html), and retention cleanup.

  4. Isolation from numerical core: hpfracc.core and hpfracc.algorithms do not import hpfracc.analytics; integration is call-site only (examples, demos, or future explicit hooks).
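The isolation goal in item 4 can be guarded by a small static check. The following is a minimal sketch, not part of the package: the helper name find_forbidden_imports and the sample source strings are hypothetical, and it only inspects absolute imports in source text without importing anything.

```python
import ast


def find_forbidden_imports(source: str, forbidden: str) -> list[str]:
    """Return dotted names imported by `source` that fall under `forbidden`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from pkg import name" may bind a submodule, so check both forms.
            candidates = [node.module]
            candidates += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports are out of scope for this sketch
        for name in candidates:
            if name == forbidden or name.startswith(forbidden + "."):
                found.add(name)
    return sorted(found)


# Hypothetical module sources; real use would read files under hpfracc/core.
clean_source = "import numpy as np\nfrom hpfracc.core import definitions\n"
leaky_source = "from hpfracc import analytics\n"

print(find_forbidden_imports(clean_source, "hpfracc.analytics"))
print(find_forbidden_imports(leaky_source, "hpfracc.analytics"))
```

A test could run this over every file in the numerical core and assert that the result is empty.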


2. Module layout (mental model)

| Component | File | Responsibility |
| --- | --- | --- |
| Facade | analytics_manager.py | AnalyticsManager, AnalyticsConfig; orchestrates sub-trackers; JSON/CSV/HTML reports; optional plots for HTML. |
| Usage | usage_tracker.py | UsageTracker, UsageEvent, UsageStats; SQLite usage_events. |
| Performance | performance_monitor.py | PerformanceMonitor, PerformanceEvent, PerformanceStats; SQLite performance_events; uses psutil + NumPy. |
| Errors | error_analyzer.py | ErrorAnalyzer, ErrorEvent, ErrorStats; SQLite error_events; traceback hashing. |
| Workflow | workflow_insights.py | WorkflowInsights, WorkflowEvent, patterns/transitions; SQLite workflow_events. |
| Package surface | __init__.py | Re-exports the six public symbols listed in __all__. |
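To illustrate what "traceback hashing" in the Errors row can mean, here is a minimal sketch under stated assumptions. The function names traceback_fingerprint, capture, and failing are hypothetical, and the choice of SHA-256 over the exception type plus frame names and line numbers is an illustration, not a description of what ErrorAnalyzer actually does.

```python
import hashlib
import traceback


def traceback_fingerprint(exc: BaseException) -> str:
    """Hash the exception type and call path so repeated failures share a key."""
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    # The message is left out because it often embeds run-specific values.
    parts += [f"{frame.name}:{frame.lineno}" for frame in frames]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def capture(fn, *args):
    """Call fn(*args) and return the exception it raises, or None."""
    try:
        fn(*args)
    except Exception as exc:
        return exc
    return None


def failing(alpha):
    raise ValueError(f"alpha out of range: {alpha}")


first = traceback_fingerprint(capture(failing, 1.7))
second = traceback_fingerprint(capture(failing, 2.5))
print(first == second)  # same failure site, different messages
```

Grouping by such a key lets an error table count recurring failure sites without storing every full traceback.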


3. Dependency diagram

AnalyticsManager is the only importer of all four submodules at module level. Submodules do not import each other.

flowchart TB
  subgraph analytics_pkg["hpfracc.analytics"]
    CFG["AnalyticsConfig"]
    MGR["AnalyticsManager"]
    UT["UsageTracker"]
    PM["PerformanceMonitor"]
    EA["ErrorAnalyzer"]
    WI["WorkflowInsights"]
  end

  subgraph storage["Local persistence"]
    SQL["SQLite\n(manager: under report dir)"]
    FS["Report dir\n(analytics_reports/)"]
  end

  subgraph heavy["Other deps"]
    PD["pandas\n(lazy: CSV export)"]
    PLT["matplotlib + seaborn\n(lazy: HTML plots)"]
    PSU["psutil"]
    NP["numpy"]
  end

  CFG --> MGR
  MGR --> UT
  MGR --> PM
  MGR --> EA
  MGR --> WI
  UT --> SQL
  PM --> SQL
  EA --> SQL
  WI --> SQL
  MGR --> FS
  PM --> PSU
  PM --> NP

Import cost: analytics_manager does not import pandas or matplotlib at module load. pandas is imported inside _generate_csv_report only; matplotlib and seaborn inside _create_analytics_plots (HTML report path). JSON-only workflows avoid those imports. The diagram keeps pandas/matplotlib in a separate box as a reminder, not as eager imports from MGR.
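The deferred-import pattern described above can be sketched as follows. This is an illustration only: lazy_import and generate_csv_report are hypothetical names, and the standard-library csv module stands in for pandas so that the example runs without extra installs.

```python
import importlib


def lazy_import(module_name: str, needed_for: str):
    """Import a module on first use and raise an actionable error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for {needed_for}; install it to use this export"
        ) from exc


def generate_csv_report(rows: list[dict]) -> str:
    """Render rows as CSV text; heavy imports happen only inside this call."""
    if not rows:
        return ""
    # Stand-ins for pandas: nothing is imported until a CSV report is requested.
    csv = lazy_import("csv", "CSV export")
    io = lazy_import("io", "CSV export")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = generate_csv_report([{"method": "caputo", "calls": 3}])
print(report)
```

Callers that only ever request JSON never reach the import lines, which is the property the paragraph above relies on.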


4. Data flow (typical use)

  1. Caller constructs AnalyticsConfig and AnalyticsManager. SQLite files default to <report_output_dir>/_analytics_data/*.db unless database_dir is set explicitly.

  2. On each logical “method run”, caller invokes track_method_call(...) (and optionally wraps execution in monitor_method_performance(...)).

  3. AnalyticsManager forwards to:

    • UsageTracker.track_usage

    • WorkflowInsights.track_workflow_event

    • ErrorAnalyzer.track_error (only if an exception object is passed).

  4. Performance events are recorded separately via PerformanceMonitor’s context manager (used from monitor_method_performance).

  5. Aggregation/reporting: get_comprehensive_analytics, generate_analytics_report, export_all_data, cleanup_old_data.
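The forwarding in step 3 can be pictured with stand-in trackers. This is a sketch under assumptions: RecordingTracker and ManagerSketch are hypothetical classes, and the argument names of track_method_call here are illustrative rather than the package's real signature.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RecordingTracker:
    """Stand-in for a sub-tracker; it only remembers what it was given."""
    events: list = field(default_factory=list)

    def record(self, **event: Any) -> None:
        self.events.append(event)


class ManagerSketch:
    """Hypothetical facade that fans one call out to three trackers."""

    def __init__(self) -> None:
        self.usage = RecordingTracker()
        self.workflow = RecordingTracker()
        self.errors = RecordingTracker()

    def track_method_call(self, method_name: str, parameters: dict,
                          array_size: int, success: bool = True,
                          error: Optional[Exception] = None) -> None:
        self.usage.record(method=method_name, parameters=parameters,
                          array_size=array_size, success=success)
        self.workflow.record(method=method_name, success=success)
        # The error tracker is only involved when an exception object is passed.
        if error is not None:
            self.errors.record(method=method_name,
                               error_type=type(error).__name__,
                               message=str(error))


manager = ManagerSketch()
manager.track_method_call("caputo", {"alpha": 0.5}, array_size=128)
manager.track_method_call("caputo", {"alpha": 1.7}, array_size=128,
                          success=False, error=ValueError("alpha out of range"))
print(len(manager.usage.events), len(manager.workflow.events), len(manager.errors.events))
```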

There is no automatic instrumentation of Caputo / RiemannLiouville / etc.; any integration must be added explicitly in application or example code.
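Explicit call-site integration might look like the sketch below. Everything here is hypothetical: monitor is a simplified stand-in for a timing context manager, the timings list replaces SQLite, and fractional_derivative_stub is a placeholder that does not compute a real fractional derivative.

```python
import time
from contextlib import contextmanager

timings = []  # in-memory stand-in for a performance_events table


@contextmanager
def monitor(method_name: str, array_size: int):
    """Time the wrapped block and record whether it finished without raising."""
    start = time.perf_counter()
    success = True
    try:
        yield
    except Exception:
        success = False
        raise  # telemetry must not hide the caller's error
    finally:
        timings.append({
            "method": method_name,
            "array_size": array_size,
            "elapsed_s": time.perf_counter() - start,
            "success": success,
        })


def fractional_derivative_stub(values, order):
    """Placeholder numerical routine; it knows nothing about analytics."""
    return [order * v for v in values]


def run_with_tracking(values, order):
    # The wrapper lives in application code, not in the numerical routine.
    with monitor("fractional_derivative_stub", len(values)):
        return fractional_derivative_stub(values, order)


result = run_with_tracking([1.0, 2.0], 0.5)
print(result, timings[-1]["success"])
```

Keeping the wrapper at the call site preserves the isolation goal: the numerical function can be used with or without tracking.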


5. Naming and boundaries

  • AnalyticsManager vs AnalyticsConfig: manager holds runtime state (session_id, subcomponents, output_dir); config is a frozen-style dataclass of feature flags and export settings.

  • Database filenames (usage_analytics.db, performance_analytics.db, …) are defaults; tests should override with temp paths (see tests/test_analytics/).

  • No naming collision with hpfracc.ml or benchmarks modules; the word “analytics” here means library usage telemetry, not autograd “forward pass analytics”.
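The default database layout and the temp-path override for tests can be sketched as follows. The helper resolve_db_path is hypothetical; the directory and file names are the ones quoted in this note, and the table schema is invented for the example.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional

DATA_SUBDIR = "_analytics_data"


def resolve_db_path(filename: str,
                    report_output_dir: str = "analytics_reports",
                    database_dir: Optional[str] = None) -> Path:
    """Place the file under database_dir if given, else under the report dir."""
    if database_dir is not None:
        base = Path(database_dir)
    else:
        base = Path(report_output_dir) / DATA_SUBDIR
    return base / filename


# Default layout: relative to the current working directory.
default_path = resolve_db_path("usage_analytics.db")

# Test-style override: everything lives in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    db_path = resolve_db_path("usage_analytics.db", database_dir=tmp)
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE usage_events (method_name TEXT)")  # illustrative schema
    conn.commit()
    conn.close()
    created_in_tmp = db_path.exists()

print(default_path, created_in_tmp)
```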


6. Risk register and mitigations

| Risk | Mitigation / note |
| --- | --- |
| SQLite relative to CWD (standalone trackers) | UsageTracker / PerformanceMonitor / etc. still default to *_analytics.db in the CWD if constructed without db_path. Prefer AnalyticsManager (central layout) or pass explicit db_path. |
| AnalyticsManager + default report_output_dir | Still relative to CWD (analytics_reports/_analytics_data), but one tree; set report_output_dir or database_dir for CI and notebooks. |
| Headless / optional plotting | HTML report path imports matplotlib/seaborn; matplotlib may try to load a GUI backend, so set MPLBACKEND=Agg in headless or otherwise constrained environments. |
| psutil dependency | Declared in [project] dependencies (pyproject.toml) for pip installs; aligns with performance_monitor and other modules. |
| Swallowed failures | track_method_call and get_comprehensive_analytics catch broad Exception and log; callers may assume tracking succeeded. Acceptable for telemetry; document the change if this is ever tightened. |
| Privacy / portability | Parameters are JSON-serialized into SQLite; callers should avoid putting secrets into parameters. |
| HTML reports embed emoji in static strings | Cosmetic; harmless for file output, irrelevant for numerical correctness. |
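Two of the risks above (swallowed failures and secrets in parameters) can be made concrete with a small sketch. The names SENSITIVE_KEYS, redact, and safe_track are hypothetical, and returning a boolean is one possible way to surface failures, not the package's current behaviour.

```python
import json
import logging

logger = logging.getLogger("analytics_sketch")

# Illustrative key list; a real deployment would choose its own.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret"}


def redact(parameters: dict) -> dict:
    """Replace values of sensitive-looking keys before they are persisted."""
    return {
        key: ("<redacted>" if str(key).lower() in SENSITIVE_KEYS else value)
        for key, value in parameters.items()
    }


def safe_track(store: list, method_name: str, parameters: dict) -> bool:
    """Record an event; log and return False instead of raising on failure."""
    try:
        payload = json.dumps(redact(parameters))
        store.append((method_name, payload))
        return True
    except Exception:
        # Broad catch is deliberate: telemetry must never break the caller.
        logger.exception("tracking failed for %s", method_name)
        return False


events = []
ok = safe_track(events, "caputo", {"alpha": 0.5, "token": "abc123"})
failed = safe_track(events, "caputo", {"callback": object()})  # not JSON-serializable
print(ok, failed, events)
```

Returning a flag keeps the "never raise" contract while giving callers a way to notice that nothing was recorded.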


7. Tests and coverage

Pytest tree (representative):

  • tests/test_analytics/ — expanded and comprehensive tests per submodule.

  • tests_unittest/test_analytics.py — lighter unittest-style smoke paths.

Example focused run from repo root:

python -m pytest tests/test_analytics/ tests_unittest/test_analytics.py -q

Optional coverage (measuring the whole package with --cov=hpfracc avoids some Windows/JAX + pytest-cov edge cases; same guidance as ALGORITHMS_ARCHITECTURE.md §6):

python -m pytest tests/test_analytics/ --cov=hpfracc --cov-report=term-missing:skip-covered -q

HTML / matplotlib tests: Avoid patch("builtins.open", mock_open()) while exercising _generate_html_report / generate_analytics_report with plotting: matplotlib’s font manager opens real font paths via open, and a global open mock can raise PytestUnraisableExceptionWarning (FT2Font / expected bytes, str found). Prefer tmp_path for report_output_dir, set MPLBACKEND=Agg, and let HTML files use the real open (see tests/test_analytics/test_analytics_manager_comprehensive_coverage.py).
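The recommended test shape, with real files in a temporary directory instead of a mocked open, might look like this sketch. write_html_report is a hypothetical stand-in for the report path and does no plotting; the point is only the structure of the test.

```python
import os
import tempfile
from pathlib import Path

# Choose a non-interactive backend before matplotlib could be imported.
os.environ.setdefault("MPLBACKEND", "Agg")


def write_html_report(output_dir: Path, title: str) -> Path:
    """Hypothetical stand-in for the HTML report path; writes a real file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    report = output_dir / "report.html"
    report.write_text(f"<html><body><h1>{title}</h1></body></html>", encoding="utf-8")
    return report


def test_html_report_uses_real_files(tmp_path: Path) -> None:
    # No patch("builtins.open", ...): libraries that read fonts or other
    # resources during the call keep access to the real open.
    report = write_html_report(tmp_path / "reports", "Analytics")
    assert report.exists()
    assert "<h1>Analytics</h1>" in report.read_text(encoding="utf-8")


# Outside pytest, a TemporaryDirectory plays the role of tmp_path.
with tempfile.TemporaryDirectory() as tmp:
    test_html_report_uses_real_files(Path(tmp))
```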



9. Consolidation / deprecation candidates (no action required unless you choose)

These are observations, not committed roadmap items:

  • Single SQLite module: The four trackers repeat similar _setup_database / export / retention patterns; a small internal sqlite_store.py could deduplicate boilerplate without changing public API.

  • Further lazy imports: seaborn is only needed inside _create_analytics_plots. Section 3 describes its import as already deferred to that helper, so this item applies only if a module-level seaborn import remains (minor).

  • Optional extra: Declare an analytics optional extra in pyproject.toml if the project ever splits “minimal numerical install” from “telemetry + reporting”; today analytics ships with the main package surface in hpfracc/analytics/.
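As a rough picture of the first candidate above, a shared store could own table setup, inserts, and retention. This is a sketch only: the class SqliteStore, its method names, and the column layout are hypothetical and do not describe existing code.

```python
import sqlite3
import time


class SqliteStore:
    """Hypothetical shared helper: one timestamped table per tracker."""

    def __init__(self, db_path: str, table: str, columns: dict):
        # Table and column names are interpolated into SQL, so they must come
        # from trusted code, never from user input.
        self.table = table
        self.columns = list(columns)
        self.conn = sqlite3.connect(db_path)
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (timestamp REAL NOT NULL, {cols})"
        )
        self.conn.commit()

    def insert(self, timestamp=None, **values) -> None:
        """Insert one row; missing columns are stored as NULL."""
        row = [time.time() if timestamp is None else timestamp]
        row += [values.get(name) for name in self.columns]
        placeholders = ", ".join("?" for _ in row)
        self.conn.execute(f"INSERT INTO {self.table} VALUES ({placeholders})", row)
        self.conn.commit()

    def cleanup_older_than(self, max_age_days: float, now=None) -> int:
        """Delete rows older than the retention window; return rows removed."""
        cutoff = (time.time() if now is None else now) - max_age_days * 86400
        cursor = self.conn.execute(
            f"DELETE FROM {self.table} WHERE timestamp < ?", (cutoff,)
        )
        self.conn.commit()
        return cursor.rowcount

    def count(self) -> int:
        return self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]


store = SqliteStore(":memory:", "usage_events",
                    {"method_name": "TEXT", "success": "INTEGER"})
day = 86400
store.insert(timestamp=0.0, method_name="old_call", success=1)
store.insert(timestamp=1000 * day, method_name="recent_call", success=1)
removed = store.cleanup_older_than(30, now=1000 * day + 1)
print(removed, store.count())
```

Each tracker could then keep its public API and delegate only the storage boilerplate to such a helper.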