# `hpfracc.analytics`: architecture, dependencies, and maintenance

This note complements [CONTRIBUTING.md](https://github.com/dave2k77/hpfracc/blob/main/CONTRIBUTING.md), [ALGORITHMS_ARCHITECTURE.md](ALGORITHMS_ARCHITECTURE.md), [SPECIAL_ARCHITECTURE.md](SPECIAL_ARCHITECTURE.md), [SOLVERS_ARCHITECTURE.md](SOLVERS_ARCHITECTURE.md), [UTILS_ARCHITECTURE.md](UTILS_ARCHITECTURE.md), and [VALIDATION_ARCHITECTURE.md](VALIDATION_ARCHITECTURE.md). It describes how the **analytics** package is structured, what it depends on, how data flows, known risks, and how to exercise tests locally.

---

## 1. Design goals

1. **Opt-in telemetry-style tracking** of estimator/method names, parameters, array sizes, and success flags; data is persisted locally (SQLite by default), not sent to a remote service.
2. **Four concerns, four submodules**: usage popularity, performance timing/memory, error/reliability patterns, and workflow/session sequences.
3. **Single façade** (`AnalyticsManager` + `AnalyticsConfig`) for coordinated tracking, export (`json` / `csv` / `html`), and retention cleanup.
4. **Isolation from numerical core**: `hpfracc.core` and `hpfracc.algorithms` do **not** import `hpfracc.analytics`; integration is **call-site only** (examples, demos, or future explicit hooks).

---

## 2. Module layout (mental model)

| Component | File | Responsibility |
|-----------|------|----------------|
| **Facade** | `analytics_manager.py` | `AnalyticsManager`, `AnalyticsConfig`; orchestrates sub-trackers; JSON/CSV/HTML reports; optional plots for HTML. |
| **Usage** | `usage_tracker.py` | `UsageTracker`, `UsageEvent`, `UsageStats`; SQLite `usage_events`. |
| **Performance** | `performance_monitor.py` | `PerformanceMonitor`, `PerformanceEvent`, `PerformanceStats`; SQLite `performance_events`; uses **psutil** + NumPy. |
| **Errors** | `error_analyzer.py` | `ErrorAnalyzer`, `ErrorEvent`, `ErrorStats`; SQLite `error_events`; traceback hashing. |
| **Workflow** | `workflow_insights.py` | `WorkflowInsights`, `WorkflowEvent`, patterns/transitions; SQLite `workflow_events`. |
| **Package surface** | `__init__.py` | Re-exports the six public symbols listed in `__all__`. |

---

## 3. Dependency diagram

`AnalyticsManager` is the only importer of all four submodules at module level. Submodules do not import each other.

```text
flowchart TB
  subgraph analytics_pkg["hpfracc.analytics"]
    CFG["AnalyticsConfig"]
    MGR["AnalyticsManager"]
    UT["UsageTracker"]
    PM["PerformanceMonitor"]
    EA["ErrorAnalyzer"]
    WI["WorkflowInsights"]
  end
  subgraph storage["Local persistence"]
    SQL["SQLite\n(manager: under report dir)"]
    FS["Report dir\n(analytics_reports /)"]
  end
  subgraph heavy["Other deps"]
    PD["pandas\n(lazy: CSV export)"]
    PLT["matplotlib + seaborn\n(lazy: HTML plots)"]
    PSU["psutil"]
    NP["numpy"]
  end
  CFG --> MGR
  MGR --> UT
  MGR --> PM
  MGR --> EA
  MGR --> WI
  UT --> SQL
  PM --> SQL
  EA --> SQL
  WI --> SQL
  MGR --> FS
  PM --> PSU
  PM --> NP
```

**Import cost:** `analytics_manager` does **not** import **pandas** or **matplotlib** at module load. **pandas** is imported inside `_generate_csv_report` only; **matplotlib** and **seaborn** inside `_create_analytics_plots` (HTML report path). JSON-only workflows avoid those imports. The diagram keeps pandas/matplotlib in a separate box as a reminder, not as eager imports from `MGR`.

---

## 4. Data flow (typical use)

1. Caller constructs `AnalyticsConfig` and `AnalyticsManager`. SQLite files default to ``/_analytics_data/*.db`` unless `database_dir` is set explicitly.
2. On each logical “method run”, caller invokes `track_method_call(...)` (and optionally wraps execution in `monitor_method_performance(...)`).
3. `AnalyticsManager` forwards to:
   - `UsageTracker.track_usage`
   - `WorkflowInsights.track_workflow_event`
   - `ErrorAnalyzer.track_error` (only if an exception object is passed).
4. **Performance** events are recorded separately via `PerformanceMonitor`’s context manager (used from `monitor_method_performance`).
5. Aggregation/reporting: `get_comprehensive_analytics`, `generate_analytics_report`, `export_all_data`, `cleanup_old_data`.

There is **no automatic instrumentation** of `Caputo` / `RiemannLiouville` / etc.; any integration must be added explicitly in application or example code.

---

## 5. Naming and boundaries

- **`AnalyticsManager`** vs **`AnalyticsConfig`**: manager holds runtime state (`session_id`, subcomponents, `output_dir`); config is a frozen-style dataclass of feature flags and export settings.
- **Database filenames** (`usage_analytics.db`, `performance_analytics.db`, …) are defaults; tests should override with temp paths (see `tests/test_analytics/`).
- **No naming collision** with `hpfracc.ml` or `benchmarks` modules; the word “analytics” here means **library usage telemetry**, not autograd “forward pass analytics”.

---

## 6. Risk register and mitigations

| Risk | Mitigation / note |
|------|-------------------|
| **SQLite relative to CWD (standalone trackers)** | `UsageTracker` / `PerformanceMonitor` / etc. still default to `*_analytics.db` in the CWD if constructed without `db_path`. Prefer `AnalyticsManager` (central layout) or pass explicit `db_path`. |
| **`AnalyticsManager` + default `report_output_dir`** | Still relative to CWD (`analytics_reports/_analytics_data`), but **one tree**; set `report_output_dir` or `database_dir` for CI and notebooks. |
| **Headless / optional plotting** | HTML report path imports matplotlib/seaborn; may need a GUI backend or `MPLBACKEND=Agg` in constrained environments. |
| **`psutil` dependency** | Declared in `[project] dependencies` (`pyproject.toml`) for pip installs; aligns with `performance_monitor` and other modules. |
| **Swallowed failures** | `track_method_call` and `get_comprehensive_analytics` catch broad `Exception` and log; callers may assume tracking succeeded. Acceptable for telemetry; document if tightening. |
| **Privacy / portability** | Parameters are JSON-serialized into SQLite; callers should avoid putting secrets into `parameters`. |
| **HTML reports embed emoji in static strings** | Cosmetic; harmless for file output, irrelevant for numerical correctness. |

---

## 7. Tests and coverage

**Pytest tree (representative):**

- `tests/test_analytics/`: expanded and comprehensive tests per submodule.
- `tests_unittest/test_analytics.py`: lighter unittest-style smoke paths.

Example focused run from repo root:

```bash
python -m pytest tests/test_analytics/ tests_unittest/test_analytics.py -q
```

Optional coverage (whole package avoids some Windows/JAX + pytest-cov edge cases; same guidance as [ALGORITHMS_ARCHITECTURE.md](ALGORITHMS_ARCHITECTURE.md) §6):

```bash
python -m pytest tests/test_analytics/ --cov=hpfracc --cov-report=term-missing:skip-covered -q
```

**HTML / matplotlib tests:** Avoid `patch("builtins.open", mock_open())` while exercising `_generate_html_report` / `generate_analytics_report` with plotting: matplotlib’s font manager opens real font paths via `open`, and a global `open` mock can raise `PytestUnraisableExceptionWarning` (`FT2Font` / `expected bytes, str found`). Prefer `tmp_path` for `report_output_dir`, set `MPLBACKEND=Agg`, and let HTML files use the real `open` (see `tests/test_analytics/test_analytics_manager_comprehensive_coverage.py`).

---

## 8. Related documentation

- [CONTRIBUTING.md](https://github.com/dave2k77/hpfracc/blob/main/CONTRIBUTING.md)
- [ALGORITHMS_ARCHITECTURE.md](ALGORITHMS_ARCHITECTURE.md)
- [BENCHMARKS_ARCHITECTURE.md](BENCHMARKS_ARCHITECTURE.md)
- [SPECIAL_ARCHITECTURE.md](SPECIAL_ARCHITECTURE.md)
- [SOLVERS_ARCHITECTURE.md](SOLVERS_ARCHITECTURE.md)
- [UTILS_ARCHITECTURE.md](UTILS_ARCHITECTURE.md)
- [VALIDATION_ARCHITECTURE.md](VALIDATION_ARCHITECTURE.md)
- Examples: `examples/benchmarks/analytics_demo.py` (simulated calls into `AnalyticsManager`)

---

## 9. Consolidation / deprecation candidates (no action required unless you choose)

These are **observations**, not committed roadmap items:

- **Single SQLite module**: The four trackers repeat similar `_setup_database` / export / retention patterns; a small internal `sqlite_store.py` could deduplicate boilerplate without changing public API.
- **Further lazy imports**: `seaborn` is only needed inside `_create_analytics_plots`; could defer its import to the first line of that helper (minor).
- **Optional extra**: Declare an `analytics` optional extra in `pyproject.toml` if the project ever splits “minimal numerical install” from “telemetry + reporting”; today analytics ships with the main package surface in `hpfracc/analytics/`.
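
To make the **Single SQLite module** observation above more concrete, here is a minimal sketch of what a shared helper might look like. It is not the current `hpfracc` implementation: the class name `SqliteEventStore`, its methods, and the `timestamp` / `payload` column layout are all hypothetical, chosen only to show setup, export, and retention in one place using the standard library.

```python
import json
import sqlite3
import time
from contextlib import closing
from typing import Any, Dict, List, Optional


class SqliteEventStore:
    """Hypothetical shared store; illustrates the idea, not hpfracc's real schema."""

    def __init__(self, db_path: str, table: str) -> None:
        # Table names cannot be bound as SQL parameters, so validate before
        # interpolating them into statements.
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        self.db_path = str(db_path)
        self.table = table
        self._setup_database()

    def _connect(self) -> sqlite3.Connection:
        return sqlite3.connect(self.db_path)

    def _setup_database(self) -> None:
        # `closing` closes the connection; the inner `conn` context commits.
        with closing(self._connect()) as conn, conn:
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {self.table} ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "timestamp REAL NOT NULL, "
                "payload TEXT NOT NULL)"
            )

    def insert(self, payload: Dict[str, Any], timestamp: Optional[float] = None) -> None:
        ts = time.time() if timestamp is None else timestamp
        with closing(self._connect()) as conn, conn:
            conn.execute(
                f"INSERT INTO {self.table} (timestamp, payload) VALUES (?, ?)",
                (ts, json.dumps(payload)),
            )

    def export(self) -> List[Dict[str, Any]]:
        # Returns events in insertion order with the JSON payload decoded.
        with closing(self._connect()) as conn:
            rows = conn.execute(
                f"SELECT timestamp, payload FROM {self.table} ORDER BY id"
            ).fetchall()
        return [{"timestamp": ts, "payload": json.loads(p)} for ts, p in rows]

    def cleanup_older_than(self, max_age_days: float, now: Optional[float] = None) -> int:
        # Deletes rows older than the cutoff and returns how many were removed.
        current = time.time() if now is None else now
        cutoff = current - max_age_days * 86400.0
        with closing(self._connect()) as conn, conn:
            cur = conn.execute(
                f"DELETE FROM {self.table} WHERE timestamp < ?", (cutoff,)
            )
            return cur.rowcount
```

Under this assumption each tracker would hold one store for its own table (for example `usage_events` or `error_events`) and keep its public methods unchanged, delegating only the SQLite boilerplate.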