Neural Fractional SDE Guide

Introduction to Neural Fractional SDEs

What are Neural Fractional SDEs?

Neural Fractional Stochastic Differential Equations (Neural fSDEs) combine the power of neural networks with fractional calculus and stochastic dynamics. They extend neural ODEs by incorporating:

  1. Stochasticity: Random noise terms for modeling uncertainty

  2. Memory effects: Fractional derivatives capture long-range temporal dependencies

  3. Learnable dynamics: Neural networks parameterize drift and diffusion functions

Mathematical Foundation

A neural fractional SDE takes the form:

\[ D_t^\alpha X(t) = f_\theta(t, X(t)) dt + g_\theta(t, X(t)) dW(t) \]

where:

  • \(\alpha \in (0, 2)\) is the fractional order

  • \(D_t^\alpha\) is the Caputo or Riemann-Liouville fractional derivative

  • \(f_\theta: \mathbb{R}^{d} \to \mathbb{R}^{d}\) is the learnable drift function (neural network)

  • \(g_\theta: \mathbb{R}^{d} \to \mathbb{R}^{d \times m}\) is the learnable diffusion function

  • \(W(t)\) is a Wiener process (or more general noise)

When to Use Neural fSDEs

Ideal for:

  • Systems with memory effects (viscoelastic materials, anomalous diffusion)

  • Data with stochastic uncertainty

  • Continuous-time dynamics with long-range dependence

  • Spatio-temporal systems (graph-SDE coupling)

  • When uncertainty quantification is crucial

Compared to Neural ODEs:

  • Neural ODEs: Deterministic, integer-order derivatives, limited memory

  • Neural fSDEs: Stochastic, fractional-order, long memory, uncertainty

Quick Start

Installation

pip install hpfracc
# Optional for Bayesian inference
pip install numpyro

Basic Example

import numpy as np
import torch
from hpfracc.ml.neural_fsde import create_neural_fsde

# Create a simple neural fSDE
model = create_neural_fsde(
    input_dim=2,
    output_dim=2,
    hidden_dim=64,
    fractional_order=0.5,
    noise_type="additive"
)

# Forward pass
x0 = torch.randn(32, 2)
t = torch.linspace(0, 1, 50)
trajectory = model(x0, t, method="euler_maruyama", num_steps=50)

print(f"Trajectory shape: {trajectory.shape}")  # (32, 2)

Training Example

import torch.nn as nn

model = create_neural_fsde(input_dim=2, output_dim=2, fractional_order=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training loop
for epoch in range(100):
    optimizer.zero_grad()
    pred = model(x0, t)
    loss = loss_fn(pred, target)
    loss.backward()
    optimizer.step()
    
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Core Concepts

Fractional Orders in SDEs

The fractional order \(\alpha\) controls memory effects:

  • \(\alpha \to 0\): Nearly instantaneous response (no memory)

  • \(\alpha = 0.5\): Subdiffusion (slower than normal diffusion)

  • \(\alpha = 1\): Standard first-order dynamics

  • \(\alpha = 1.5\): Superdiffusion (faster than normal)

  • \(\alpha \to 2\): Wave-like behavior

# Compare different fractional orders
for alpha in [0.3, 0.5, 0.7, 1.0]:
    model = create_neural_fsde(
        input_dim=2,
        output_dim=2,
        fractional_order=alpha
    )
    # Different models exhibit different memory behavior

Drift and Diffusion Functions

Drift \(f_\theta\): Determines deterministic dynamics

# Custom drift network
drift_net = nn.Sequential(
    nn.Linear(3, 64),  # 2 features + time
    nn.Tanh(),
    nn.Linear(64, 2)
)

model = create_neural_fsde(
    input_dim=2,
    output_dim=2,
    drift_net=drift_net  # Use custom network
)

Diffusion \(g_\theta\): Controls stochastic noise magnitude

# Custom diffusion network
diffusion_net = nn.Sequential(
    nn.Linear(3, 64),
    nn.Tanh(),
    nn.Linear(64, 2)
)

model = create_neural_fsde(
    input_dim=2,
    output_dim=2,
    diffusion_net=diffusion_net
)

Stochastic Noise Modeling

Choose noise type based on problem:

from hpfracc.solvers import BrownianMotion, FractionalBrownianMotion

# Standard Brownian motion (independent increments)
brownian = BrownianMotion(scale=1.0)

# Fractional Brownian motion (correlated increments, Hurst H)
fbm = FractionalBrownianMotion(hurst=0.7, scale=1.0)

# Coloured noise (exponential autocorrelation)
from hpfracc.solvers import ColouredNoise
coloured = ColouredNoise(correlation_time=1.0, amplitude=1.0)

Memory Effects

Fractional derivatives introduce memory through convolution:

\[ D_t^\alpha x(t) = \int_0^t \frac{(t-\tau)^{-\alpha-1}}{\Gamma(-\alpha)} x(\tau) d\tau \]

This requires storing history, but FFT-based computation gives \(O(N \log N)\) complexity.

Building Neural fSDE Models

Model Architecture Design

from hpfracc.ml.neural_fsde import NeuralFSDEConfig

# Configure model
config = NeuralFSDEConfig(
    input_dim=10,
    output_dim=10,
    hidden_dim=128,
    num_layers=4,
    fractional_order=0.5,
    diffusion_dim=1,
    noise_type="multiplicative",  # or "additive"
    learn_alpha=True  # Make fractional order learnable
)

model = create_neural_fsde(
    input_dim=config.input_dim,
    output_dim=config.output_dim,
    hidden_dim=config.hidden_dim,
    num_layers=config.num_layers,
    fractional_order=config.fractional_order,
    diffusion_dim=config.diffusion_dim,
    noise_type=config.noise_type,
    learn_alpha=config.learn_alpha
)

Network Configuration

Drift Network Tips:

  • Include time as input: concatenate [t, x]

  • Use smooth activations (Tanh, Swish) for drift

  • Avoid ReLU for drift (creates discontinuities)

Diffusion Network Tips:

  • Use positive activations (Softplus, Sigmoid)

  • Scale output appropriately for your data

  • Consider log-scale for diffusion

# Example: Positive diffusion network
def create_diffusion_net(input_dim, output_dim):
    return nn.Sequential(
        nn.Linear(input_dim + 1, 64),
        nn.Tanh(),
        nn.Linear(64, 64),
        nn.Tanh(),
        nn.Linear(64, output_dim),
        nn.Softplus()  # Ensure positivity
    )

Learnable Parameters

# Fixed fractional order
model_fixed = create_neural_fsde(..., learn_alpha=False)

# Learnable fractional order
model_learnable = create_neural_fsde(..., learn_alpha=True)

# Access learned order
alpha = model_learnable.get_fractional_order()
print(f"Current fractional order: {alpha}")

Backend Selection

The intelligent backend selector automatically chooses optimal computation:

from hpfracc.ml.intelligent_backend_selector import select_optimal_backend

# Automatic selection
backend = select_optimal_backend(
    operation_type="neural_sde",
    data_shape=(1000, 10),
    requires_gradient=True
)

Training with Adjoint Methods

Why Adjoint Methods?

Standard backpropagation through SDEs stores full trajectory (memory-intensive). Adjoint methods solve the adjoint equation backwards.

Basic Adjoint Training

from hpfracc.ml.adjoint_optimization import AdjointConfig

config = AdjointConfig(
    use_adjoint=True,
    memory_efficient=True,
    checkpoint_frequency=10
)

model = create_neural_fsde(
    ...,
    use_adjoint=config.use_adjoint
)

Memory-Efficient Checkpointing

from hpfracc.ml.sde_adjoint_utils import (
    SDEAdjointOptimizer, CheckpointConfig
)

checkpoint_config = CheckpointConfig(
    checkpoint_frequency=10,
    checkpoint_strategy="adaptive",
    max_checkpoints=100
)

optimizer = torch.optim.Adam(model.parameters())
sde_optimizer = SDEAdjointOptimizer(
    model, optimizer,
    checkpoint_config=checkpoint_config
)

Mixed Precision Training

from hpfracc.ml.sde_adjoint_utils import MixedPrecisionConfig

mixed_precision_config = MixedPrecisionConfig(
    enable_amp=True,  # Automatic Mixed Precision
    half_precision=False,
    loss_scaling=1.0
)

sde_optimizer = SDEAdjointOptimizer(
    model, optimizer,
    mixed_precision_config=mixed_precision_config
)

Gradient Accumulation

For large models or limited memory:

accumulation_steps = 4
optimizer.zero_grad()

for i, batch in enumerate(dataloader):
    loss = compute_loss(model, batch)
    loss = loss / accumulation_steps
    loss.backward()
    
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Graph-SDE Coupling

Spatio-Temporal Dynamics

Combine graph neural networks with temporal SDE dynamics:

from hpfracc.ml.graph_sde_coupling import GraphFractionalSDELayer

layer = GraphFractionalSDELayer(
    input_dim=10,
    output_dim=10,
    fractional_order=0.5,
    coupling_type="bidirectional",
    use_gated_coupling=True
)

# Apply to graph
features = torch.randn(32, 10)  # Node features
adjacency = torch.ones(32, 32)  # Graph edges
output = layer(features, adjacency)

Coupling Mechanisms

  1. Bidirectional: Information flows both ways

  2. Gated: Learnable gating controls coupling strength

  3. Attention-based: Self-attention for coupling

Multi-Scale Modeling

from hpfracc.ml.graph_sde_coupling import MultiScaleGraphSDE

model = MultiScaleGraphSDE(
    node_dim=10,
    num_scales=3,
    fractional_orders=[0.3, 0.5, 0.7]
)

Operator Splitting

For weakly coupled systems:

from hpfracc.solvers import OperatorSplittingSolver

solver = OperatorSplittingSolver(
    fractional_order=0.5,
    splitting_method="strang"  # Strang splitting
)

Uncertainty Quantification

Bayesian Neural fSDEs

from hpfracc.ml.probabilistic_sde import create_bayesian_fsde

model = create_bayesian_fsde(
    input_dim=2,
    output_dim=2,
    hidden_dim=64,
    fractional_order=0.5,
    use_guide=True
)

NumPyro Integration

import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO

# Define guide (variational posterior)
guide = model.create_guide()

# Stochastic Variational Inference
svi = SVI(
    model.model,
    guide,
    numpyro.optim.Adam(step_size=1e-3),
    loss=Trace_ELBO()
)

# Training loop
for epoch in range(1000):
    elbo = svi.step(x0, t, observations)

Posterior Predictive Distributions

# Generate samples from posterior
predictive = numpyro.infer.Predictive(model.model, guide=guide, num_samples=1000)
samples = predictive(rng_key, x0, t)

# Compute statistics
mean = samples.mean(axis=0)
std = samples.std(axis=0)

Confidence Intervals

# 95% confidence interval
lower = np.percentile(samples, 2.5, axis=0)
upper = np.percentile(samples, 97.5, axis=0)

Performance Optimization

Intelligent Backend Selection

from hpfracc.ml.intelligent_backend_selector import IntelligentBackendSelector

selector = IntelligentBackendSelector(enable_learning=True)

# It learns optimal backend over time
backend = selector.select_backend(workload)

Memory Management

  1. Gradient Checkpointing: Trade compute for memory

  2. Mixed Precision: Use float16 for activations

  3. Sparse Gradients: Store only non-zero gradients

from hpfracc.ml.sde_adjoint_utils import SparseGradientAccumulator

accumulator = SparseGradientAccumulator(sparsity_threshold=1e-6)

for param in model.parameters():
    if param.grad is not None:
        accumulator.accumulate(param.grad)

GPU Utilization

# Move model to GPU
model = model.cuda()

# Use DataParallel for multiple GPUs
model = nn.DataParallel(model)

Batch Processing

Process multiple trajectories efficiently:

# Batch of 100 initial conditions
x0_batch = torch.randn(100, 32, 2)
trajectories = model(x0_batch, t)

Best Practices

Model Selection

  1. Start simple: Begin with additive noise

  2. Add complexity gradually: Introduce multiplicative noise if needed

  3. Validate architecture: Use validation set to tune architecture

Hyperparameter Tuning

Critical parameters:

  • Fractional order \(\alpha\): Try 0.3, 0.5, 0.7

  • Hidden dimension: 32, 64, 128

  • Number of layers: 2-4

  • Learning rate: 1e-4 to 1e-2

# Grid search example
for alpha in [0.3, 0.5, 0.7]:
    for hidden_dim in [32, 64, 128]:
        model = create_neural_fsde(
            fractional_order=alpha,
            hidden_dim=hidden_dim
        )
        # Train and evaluate

Numerical Stability

  1. Clip gradients: Prevent exploding gradients

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
  1. Use stable activations: Tanh, Swish over ReLU for drift

  2. Check for NaNs: Monitor during training

if torch.isnan(loss):
    print("NaN detected!")

Validation Strategies

# Train-validation split
train_size = int(0.8 * len(data))
train_data, val_data = torch.utils.data.random_split(data, [train_size, len(data) - train_size])

# Early stopping
best_val_loss = float('inf')
patience = 10
no_improve = 0

for epoch in range(epochs):
    train_loss = train_one_epoch(model, train_data)
    val_loss = validate(model, val_data)
    
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        no_improve = 0
        torch.save(model.state_dict(), 'best_model.pt')
    else:
        no_improve += 1
        if no_improve >= patience:
            break

Debugging Tips

  1. Visualize trajectories: Plot \(\partial_t X(t)\)

  2. Check noise magnitude: Scale diffusion appropriately

  3. Monitor memory: Use torch.cuda.memory_allocated()

  4. Profile code: Use torch.profiler for bottlenecks

# Profile example
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3)
) as prof:
    output = model(x0, t)
    
prof.export_chrome_trace('trace.json')

Advanced Topics

Learnable Fractional Orders

Make \(\alpha\) a learnable parameter:

model = create_neural_fsde(..., learn_alpha=True)

# During training, alpha updates automatically
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Access learned order
alpha = model.get_fractional_order()
print(f"Learned order: {alpha}")  # e.g., 0.523

Multi-Equation Systems

Different fractional orders per equation:

solution = solve_fractional_sde_system(
    drift, diffusion, x0,
    t_span=(0, 1),
    fractional_order=[0.3, 0.5, 0.7],  # Different orders
    method="euler_maruyama"
)

Custom Noise Processes

Implement your own noise:

from hpfracc.solvers import NoiseModel

class CustomNoise(NoiseModel):
    def generate_increment(self, t, dt, size, seed=None):
        # Your custom noise generation
        return custom_noise

References

Getting Help