`hpfracc.analytics` — architecture, dependencies, and maintenance

This note complements CONTRIBUTING.md, ALGORITHMS_ARCHITECTURE.md, SPECIAL_ARCHITECTURE.md, SOLVERS_ARCHITECTURE.md, UTILS_ARCHITECTURE.md, and VALIDATION_ARCHITECTURE.md. It describes how the analytics package is structured, what it depends on, how data flows, known risks, and how to exercise tests locally.

1. Design goals

Opt-in telemetry-style tracking of estimator/method names, parameters, array sizes, and success flags—persisted locally (SQLite by default), not sent to a remote service.
Four concerns, four submodules: usage popularity, performance timing/memory, error/reliability patterns, and workflow/session sequences.
Single façade (AnalyticsManager + AnalyticsConfig) for coordinated tracking, export (json / csv / html), and retention cleanup.
Isolation from numerical core: hpfracc.core and hpfracc.algorithms do not import hpfracc.analytics; integration is call-site only (examples, demos, or future explicit hooks).

2. Module layout (mental model)

| Component | File | Responsibility |
| --- | --- | --- |
| Facade | `analytics_manager.py` | `AnalyticsManager`, `AnalyticsConfig`; orchestrates sub-trackers; JSON/CSV/HTML reports; optional plots for HTML. |
| Usage | `usage_tracker.py` | `UsageTracker`, `UsageEvent`, `UsageStats`; SQLite `usage_events`. |
| Performance | `performance_monitor.py` | `PerformanceMonitor`, `PerformanceEvent`, `PerformanceStats`; SQLite `performance_events`; uses psutil + NumPy. |
| Errors | `error_analyzer.py` | `ErrorAnalyzer`, `ErrorEvent`, `ErrorStats`; SQLite `error_events`; traceback hashing. |
| Workflow | `workflow_insights.py` | `WorkflowInsights`, `WorkflowEvent`, patterns/transitions; SQLite `workflow_events`. |
| Package surface | `__init__.py` | Re-exports the six public symbols listed in `__all__`. |

3. Dependency diagram

AnalyticsManager is the only importer of all four submodules at module level. Submodules do not import each other.

flowchart TB
  subgraph analytics_pkg["hpfracc.analytics"]
    CFG["AnalyticsConfig"]
    MGR["AnalyticsManager"]
    UT["UsageTracker"]
    PM["PerformanceMonitor"]
    EA["ErrorAnalyzer"]
    WI["WorkflowInsights"]
  end

  subgraph storage["Local persistence"]
    SQL["SQLite\n(manager: under report dir)"]
    FS["Report dir\n(analytics_reports/)"]
  end

  subgraph heavy["Other deps"]
    PD["pandas\n(lazy: CSV export)"]
    PLT["matplotlib + seaborn\n(lazy: HTML plots)"]
    PSU["psutil"]
    NP["numpy"]
  end

  CFG --> MGR
  MGR --> UT
  MGR --> PM
  MGR --> EA
  MGR --> WI
  UT --> SQL
  PM --> SQL
  EA --> SQL
  WI --> SQL
  MGR --> FS
  PM --> PSU
  PM --> NP

Import cost: analytics_manager does not import pandas or matplotlib at module load. pandas is imported inside _generate_csv_report only; matplotlib and seaborn inside _create_analytics_plots (HTML report path). JSON-only workflows avoid those imports. The diagram keeps pandas/matplotlib in a separate box as a reminder, not as eager imports from MGR.
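
The deferred-import pattern described above can be sketched as follows. This is a minimal illustration, not the code in `analytics_manager.py`: the function names are hypothetical, and the stdlib `csv` module stands in for pandas so the example runs without extra dependencies.

```python
import json


def generate_json_report(rows):
    """JSON path: relies only on the module-level json import."""
    return json.dumps(rows, sort_keys=True)


def generate_csv_report(rows):
    """CSV path: the imports below run only when this function is called."""
    # Stand-in for a heavy dependency. The note describes pandas being
    # imported at this point; csv is used here purely for illustration.
    import csv
    import io

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"method": "caputo", "calls": 3},
    {"method": "riemann_liouville", "calls": 1},
]
json_text = generate_json_report(rows)  # never touches the CSV imports
csv_text = generate_csv_report(rows)    # triggers the deferred imports
```

A caller that only ever requests JSON pays nothing for the CSV dependency, which is the property the paragraph above describes.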

4. Data flow (typical use)

Caller constructs AnalyticsConfig and AnalyticsManager. SQLite files default to <report_output_dir>/_analytics_data/*.db unless database_dir is set explicitly.
On each logical “method run”, caller invokes track_method_call(...) (and optionally wraps execution in monitor_method_performance(...)).
AnalyticsManager forwards to:
- UsageTracker.track_usage
- WorkflowInsights.track_workflow_event
- ErrorAnalyzer.track_error (only if an exception object is passed).
Performance events are recorded separately via PerformanceMonitor’s context manager (used from monitor_method_performance).
Aggregation/reporting: get_comprehensive_analytics, generate_analytics_report, export_all_data, cleanup_old_data.

There is no automatic instrumentation of Caputo / RiemannLiouville / etc.; any integration must be added explicitly in application or example code.
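
The forwarding behaviour and the call-site-only integration described in this section can be sketched with stand-in classes. Everything below is hypothetical: `RecordingTracker`, `SketchManager`, `run_tracked`, and the argument names are assumptions made for illustration, and the real `track_method_call` signature may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class RecordingTracker:
    """Hypothetical stand-in for a sub-tracker; it only stores what it receives."""

    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, **event: Any) -> None:
        self.events.append(event)


class SketchManager:
    """Illustrative facade only; not the real AnalyticsManager."""

    def __init__(self) -> None:
        self.usage = RecordingTracker()
        self.workflow = RecordingTracker()
        self.errors = RecordingTracker()

    def track_method_call(
        self,
        method_name: str,
        parameters: Dict[str, Any],
        array_size: int,
        success: bool = True,
        error: Optional[BaseException] = None,
    ) -> None:
        # Usage and workflow trackers see every call.
        self.usage.record(
            method_name=method_name,
            parameters=parameters,
            array_size=array_size,
            success=success,
        )
        self.workflow.record(method_name=method_name)
        # The error tracker is reached only when an exception object is passed.
        if error is not None:
            self.errors.record(method_name=method_name, error_type=type(error).__name__)


manager = SketchManager()


def run_tracked(name: str, func: Callable[[List[float]], Any], values: List[float]) -> Any:
    """Explicit call-site hook: the numerical function itself knows nothing about tracking."""
    try:
        result = func(values)
    except Exception as exc:
        manager.track_method_call(name, {"n": len(values)}, len(values), success=False, error=exc)
        raise
    manager.track_method_call(name, {"n": len(values)}, len(values))
    return result


total = run_tracked("sum", sum, [1.0, 2.0, 3.0])
try:
    run_tracked("always_fails", lambda v: 1 / 0, [1.0])
except ZeroDivisionError:
    pass  # the failure was recorded before being re-raised
```

The wrapper lives in application code, so the tracked function needs no import of the analytics package, which mirrors the isolation goal in section 1.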

5. Naming and boundaries

AnalyticsManager vs AnalyticsConfig: manager holds runtime state (session_id, subcomponents, output_dir); config is a frozen-style dataclass of feature flags and export settings.
Database filenames (usage_analytics.db, performance_analytics.db, …) are defaults; tests should override with temp paths (see tests/test_analytics/).
No naming collision with hpfracc.ml or benchmarks modules; the word “analytics” here means library usage telemetry, not autograd “forward pass analytics”.
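
The database location rule from section 4 (default under `<report_output_dir>/_analytics_data`, unless `database_dir` is given) can be sketched as below. `resolve_database_dir` is a hypothetical helper written for this note, not a function in the package; only the directory and file names come from the text above.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_database_dir(
    report_output_dir: PathLike, database_dir: Optional[PathLike] = None
) -> Path:
    """Hypothetical helper: an explicit database_dir wins, else use the report tree."""
    if database_dir is not None:
        return Path(database_dir)
    return Path(report_output_dir) / "_analytics_data"


# Tests would typically point both settings at a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    default_dir = resolve_database_dir(Path(tmp) / "reports")
    explicit_dir = resolve_database_dir(Path(tmp) / "reports", Path(tmp) / "db")
    usage_db = default_dir / "usage_analytics.db"
```

Resolving both paths from a temporary root keeps test runs from leaving `.db` files in the current working directory.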

6. Risk register and mitigations

| Risk | Mitigation / note |
| --- | --- |
| SQLite relative to CWD (standalone trackers) | `UsageTracker` / `PerformanceMonitor` / etc. still default to `*_analytics.db` in the CWD if constructed without `db_path`. Prefer `AnalyticsManager` (central layout) or pass an explicit `db_path`. |
| `AnalyticsManager` + default `report_output_dir` | Still relative to CWD (`analytics_reports/_analytics_data`), but confined to one tree; set `report_output_dir` or `database_dir` for CI and notebooks. |
| Headless / optional plotting | The HTML report path imports matplotlib/seaborn; in headless or otherwise constrained environments, set `MPLBACKEND=Agg` so that no GUI backend is required. |
| `psutil` dependency | Declared in `[project] dependencies` (`pyproject.toml`) for pip installs; aligns with `performance_monitor` and other modules. |
| Swallowed failures | `track_method_call` and `get_comprehensive_analytics` catch broad `Exception` and log it, so callers may wrongly assume tracking succeeded. Acceptable for telemetry; if the error handling is ever tightened, document the change. |
| Privacy / portability | Parameters are JSON-serialized into SQLite; callers should avoid putting secrets into `parameters`. |
| HTML reports embed emoji in static strings | Cosmetic; harmless for file output, irrelevant for numerical correctness. |
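
For the privacy row above, one possible caller-side precaution is to scrub sensitive keys before handing `parameters` to the tracker. This is a sketch under assumptions: `redact_parameters`, `SENSITIVE_KEYS`, and the key names are hypothetical and not part of the package, and the helper assumes string keys.

```python
import json

REDACTED = "<redacted>"
# Hypothetical deny-list; adjust to whatever your application treats as secret.
SENSITIVE_KEYS = frozenset({"password", "token", "api_key", "secret"})


def redact_parameters(parameters, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of parameters with sensitive values replaced, recursing into dicts."""
    cleaned = {}
    for key, value in parameters.items():
        if key.lower() in sensitive_keys:
            cleaned[key] = REDACTED
        elif isinstance(value, dict):
            cleaned[key] = redact_parameters(value, sensitive_keys)
        else:
            cleaned[key] = value
    return cleaned


params = {"alpha": 0.5, "api_key": "abc123", "solver": {"tol": 1e-8, "Token": "xyz"}}
# This string is what would end up in a JSON column after redaction.
serialized = json.dumps(redact_parameters(params), sort_keys=True)
```

A deny-list only catches keys it knows about, so it complements rather than replaces the advice to keep secrets out of `parameters` in the first place.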

7. Tests and coverage

Pytest tree (representative):

tests/test_analytics/ — expanded and comprehensive tests per submodule.
tests_unittest/test_analytics.py — lighter unittest-style smoke paths.

Example focused run from repo root:

python -m pytest tests/test_analytics/ tests_unittest/test_analytics.py -q

Optional coverage (measuring the whole package with --cov=hpfracc avoids some Windows/JAX + pytest-cov edge cases; same guidance as ALGORITHMS_ARCHITECTURE.md §6):

python -m pytest tests/test_analytics/ --cov=hpfracc --cov-report=term-missing:skip-covered -q

HTML / matplotlib tests: Avoid patch("builtins.open", mock_open()) while exercising _generate_html_report / generate_analytics_report with plotting: matplotlib’s font manager opens real font paths via open, and a global open mock can raise PytestUnraisableExceptionWarning (FT2Font / expected bytes, str found). Prefer tmp_path for report_output_dir, set MPLBACKEND=Agg, and let HTML files use the real open (see tests/test_analytics/test_analytics_manager_comprehensive_coverage.py).
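
The guidance above might look like this in a test. It is a sketch only: `write_html_report` is a hypothetical stand-in for the HTML report path and does not call the real `AnalyticsManager`, and the final block exists so the example also runs outside pytest.

```python
import os
import tempfile
from pathlib import Path

# Pick a non-interactive backend before anything imports matplotlib.
# setdefault leaves an already configured backend untouched.
os.environ.setdefault("MPLBACKEND", "Agg")


def write_html_report(output_dir: Path) -> Path:
    """Hypothetical stand-in for the HTML report path; uses the real open()."""
    output_dir.mkdir(parents=True, exist_ok=True)
    report_path = output_dir / "report.html"
    with open(report_path, "w", encoding="utf-8") as handle:
        handle.write("<html><body><h1>Analytics</h1></body></html>")
    return report_path


def test_html_report_written(tmp_path: Path) -> None:
    # No patch("builtins.open", ...): the file is written to a throwaway directory.
    report_path = write_html_report(tmp_path / "reports")
    assert report_path.exists()
    assert "<h1>Analytics</h1>" in report_path.read_text(encoding="utf-8")


# Standalone run; under pytest the tmp_path fixture supplies the directory.
with tempfile.TemporaryDirectory() as tmp:
    test_html_report_written(Path(tmp))
    report_name = write_html_report(Path(tmp) / "reports").name
```

Because nothing mocks `open`, any library that reads real files during the test (such as a font loader) keeps working.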

9. Consolidation / deprecation candidates (no action required unless you choose)

These are observations, not committed roadmap items:

Single SQLite module: The four trackers repeat similar _setup_database / export / retention patterns; a small internal sqlite_store.py could deduplicate boilerplate without changing public API.
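
As a rough idea of what such a shared helper could look like, here is a minimal sketch. `SqliteStore`, its methods, and the column names are all hypothetical; the real tables in the four trackers may have different schemas, and this is not a proposal for the exact API.

```python
import sqlite3
import time
from typing import Any, Dict


class SqliteStore:
    """Hypothetical shared helper for table setup, inserts, and retention cleanup."""

    def __init__(self, db_path: str, table: str, columns: Dict[str, str]) -> None:
        # Names are interpolated into SQL, so accept plain identifiers only.
        # Column names must not be "id" or "timestamp", which are added below.
        if not table.isidentifier() or not all(name.isidentifier() for name in columns):
            raise ValueError("table and column names must be plain identifiers")
        self.table = table
        self.columns = list(columns)
        self.conn = sqlite3.connect(db_path)
        column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            f"(id INTEGER PRIMARY KEY, timestamp REAL NOT NULL, {column_sql})"
        )
        self.conn.commit()

    def insert(self, timestamp: float, **values: Any) -> None:
        names = ["timestamp"] + self.columns
        column_list = ", ".join(names)
        placeholders = ", ".join("?" for _ in names)
        row = [timestamp] + [values.get(name) for name in self.columns]
        self.conn.execute(
            f"INSERT INTO {self.table} ({column_list}) VALUES ({placeholders})", row
        )
        self.conn.commit()

    def cleanup_older_than(self, cutoff: float) -> int:
        """Delete rows older than cutoff (epoch seconds); return how many were removed."""
        cursor = self.conn.execute(
            f"DELETE FROM {self.table} WHERE timestamp < ?", (cutoff,)
        )
        self.conn.commit()
        return cursor.rowcount

    def count(self) -> int:
        return self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]


# In-memory database with illustrative columns.
store = SqliteStore(":memory:", "usage_events", {"method_name": "TEXT", "success": "INTEGER"})
now = time.time()
store.insert(now - 10 * 86400, method_name="caputo", success=1)  # ten days old
store.insert(now, method_name="caputo", success=0)
removed = store.cleanup_older_than(now - 7 * 86400)  # seven-day retention
remaining = store.count()
```

Each tracker would then hold one store instance with its own table definition, keeping the public tracker classes unchanged.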
Further lazy imports: seaborn is only needed inside _create_analytics_plots; could defer its import to the first line of that helper (minor).
Optional extra: Declare an analytics optional extra in pyproject.toml if the project ever splits “minimal numerical install” from “telemetry + reporting”; today analytics ships with the main package surface in hpfracc/analytics/.
