Fractional Autograd Guide
This guide provides a comprehensive overview of the fractional autograd framework in HPFRACC, including spectral domain computation, stochastic memory sampling, probabilistic fractional orders, and GPU optimization.
Table of Contents
Overview
The fractional autograd framework extends PyTorch’s automatic differentiation to fractional calculus operations. It provides three main approaches:
Spectral Domain Computation: Efficient fractional derivatives using FFT, Mellin, and Laplacian transforms
Stochastic Memory Sampling: Approximate fractional operators by sampling from memory history
Probabilistic Fractional Orders: Treat fractional orders as random variables for uncertainty quantification
Spectral Autograd Engines (Unified by Default)
Basic Usage (Unified)
import torch
from hpfracc.ml.spectral_autograd import FractionalAutogradLayer, SpectralFractionalNetwork
# Create a spectral fractional layer (engine default: FFT)
layer = FractionalAutogradLayer(
engine_type="fft", # or "mellin", "laplacian"
alpha=0.5,
method="RL" # Riemann-Liouville
)
# Forward pass
x = torch.randn(32, 128, requires_grad=True)
output = layer(x)
# Backward pass (automatic)
loss = output.sum()
loss.backward()
print(f"Input gradient shape: {x.grad.shape}")
### Unified Network Helper
```python
# Unified by default: specify dims
net = SpectralFractionalNetwork(
input_dim=128, hidden_dims=[256, 256], output_dim=10,
alpha=0.5, mode="unified"
)
y = net(x)
Model-Specific (Legacy) Mode
# Opt-in to legacy/coverage-style args
net_legacy = SpectralFractionalNetwork(
input_size=128, hidden_sizes=[64, 64], output_size=10,
alpha=0.5, mode="model"
)
Backends
Supported backends:
pytorch(default),jax,numba.Fallbacks: if a backend is unavailable, CPU-safe paths are used.
FFT fallbacks use NumPy FFTs isolated from PyTorch FFT to avoid mock leakage.
### Engine Types
#### FFT Engine
- **Best for**: Large sequences, periodic functions
- **Method**: Frequency domain multiplication
- **Performance**: O(N log N) complexity
```python
from hpfracc.ml.spectral_autograd import FFTEngine
engine = FFTEngine()
result = engine.forward(x, alpha=0.5)
Mellin Engine
Best for: Power-law functions, scale-invariant problems
Method: Mellin transform domain
Performance: O(N log N) complexity
from hpfracc.ml.spectral_autograd import MellinEngine
engine = MellinEngine()
result = engine.forward(x, alpha=0.5)
Laplacian Engine
Best for: Spatial problems, diffusion equations
Method: Fractional Laplacian in frequency domain
Performance: O(N log N) complexity
from hpfracc.ml.spectral_autograd import LaplacianEngine
engine = LaplacianEngine()
result = engine.forward(x, alpha=0.5)
Stochastic Memory Sampling
Basic Usage
from hpfracc.ml.stochastic_memory_sampling import StochasticFractionalLayer
# Create stochastic fractional layer
layer = StochasticFractionalLayer(
alpha=0.5,
k=32, # Number of samples
method="importance" # or "stratified", "control_variate"
)
# Forward pass
x = torch.randn(32, 128)
output = layer(x)
Sampling Methods
Importance Sampling
Best for: General purpose, power-law distributions
Variance: Moderate
Computational cost: O(K)
from hpfracc.ml.stochastic_memory_sampling import ImportanceSampler
sampler = ImportanceSampler(alpha=0.5, k=32)
indices, weights = sampler.sample_indices(n=128, k=32)
estimate = sampler.estimate_derivative(x, indices, weights)
Stratified Sampling
Best for: When recent history is important
Variance: Low for recent-heavy distributions
Computational cost: O(K)
from hpfracc.ml.stochastic_memory_sampling import StratifiedSampler
sampler = StratifiedSampler(alpha=0.5, k=32, recent_window=16)
indices, weights = sampler.sample_indices(n=128, k=32)
estimate = sampler.estimate_derivative(x, indices, weights)
Control Variate Sampling
Best for: When baseline estimates are available
Variance: Lowest (with good baseline)
Computational cost: O(K + baseline)
from hpfracc.ml.stochastic_memory_sampling import ControlVariateSampler
sampler = ControlVariateSampler(alpha=0.5, k=32, baseline_window=8)
indices, weights = sampler.sample_indices(n=128, k=32)
estimate = sampler.estimate_derivative(x, indices, weights)
Choosing K (Number of Samples)
The choice of K affects both accuracy and computational cost:
K = 8-16: Fast, suitable for real-time applications
K = 32-64: Balanced accuracy and speed
K = 128-256: High accuracy, slower computation
# Benchmark different K values
from hpfracc.ml.stochastic_memory_sampling import benchmark_k_values
results = benchmark_k_values(
sequence_length=128,
alpha=0.5,
k_values=[8, 16, 32, 64, 128]
)
Probabilistic Fractional Orders
Basic Usage
from hpfracc.ml.probabilistic_fractional_orders import create_normal_alpha_layer
# Create probabilistic fractional layer
layer = create_normal_alpha_layer(
mean=0.5,
std=0.1,
learnable=True
)
# Forward pass
x = torch.randn(32, 128)
output = layer(x)
Distribution Types
Normal Distribution
Best for: Continuous fractional orders
Parameters: mean, std
Uncertainty: Gaussian
from hpfracc.ml.probabilistic_fractional_orders import create_normal_alpha_layer
layer = create_normal_alpha_layer(mean=0.5, std=0.1, learnable=True)
Uniform Distribution
Best for: Bounded fractional orders
Parameters: low, high
Uncertainty: Uniform
from hpfracc.ml.probabilistic_fractional_orders import create_uniform_alpha_layer
layer = create_uniform_alpha_layer(low=0.1, high=0.9, learnable=True)
Beta Distribution
Best for: Fractional orders in [0, 1]
Parameters: concentration1, concentration0
Uncertainty: Beta-shaped
from hpfracc.ml.probabilistic_fractional_orders import create_beta_alpha_layer
layer = create_beta_alpha_layer(concentration1=2.0, concentration0=2.0, learnable=True)
Gradient Estimation Methods
Reparameterization Trick
Best for: Continuous distributions
Variance: Low
Requirements: Reparameterizable distribution
from hpfracc.ml.probabilistic_fractional_orders import ReparameterizedFractionalDerivative
# Use in custom autograd function
result = ReparameterizedFractionalDerivative.apply(x, alpha_dist, epsilon, method, k)
Score Function Estimator
Best for: Any distribution
Variance: Higher
Requirements: None
from hpfracc.ml.probabilistic_fractional_orders import ScoreFunctionFractionalDerivative
# Use in custom autograd function
result = ScoreFunctionFractionalDerivative.apply(x, alpha_dist, method, k)
Variance-Aware Training
Basic Usage
from hpfracc.ml.variance_aware_training import create_variance_aware_trainer
# Create variance-aware trainer
trainer = create_variance_aware_trainer(
model=model,
optimizer=optimizer,
loss_fn=loss_fn,
base_seed=42,
variance_threshold=0.1,
log_interval=10
)
# Train with variance monitoring
results = trainer.train(dataloader, num_epochs=100)
Variance Monitoring
from hpfracc.ml.variance_aware_training import VarianceMonitor
monitor = VarianceMonitor(window_size=100)
# Monitor variance during training
for batch in dataloader:
output = model(batch)
monitor.update("output", output)
# Check variance summary
summary = monitor.get_summary()
for name, metrics in summary.items():
if metrics['cv'] > 0.5: # High variance
print(f"High variance in {name}: CV={metrics['cv']:.3f}")
Adaptive Sampling
from hpfracc.ml.variance_aware_training import AdaptiveSamplingManager
sampling_manager = AdaptiveSamplingManager(
initial_k=32,
min_k=8,
max_k=128,
variance_threshold=0.1
)
# Adjust K based on variance
new_k = sampling_manager.update_k(variance=0.2, current_k=32)
print(f"Adjusted K: {new_k}")
GPU Optimization
Automatic Mixed Precision (AMP)
from hpfracc.ml.gpu_optimization import GPUOptimizedSpectralEngine
# Create GPU-optimized engine
engine = GPUOptimizedSpectralEngine(
engine_type="fft",
use_amp=True,
dtype=torch.float16
)
# Forward pass with AMP
x = torch.randn(32, 1024, device='cuda')
result = engine.forward(x, alpha=0.5)
Chunked FFT for Large Sequences
from hpfracc.ml.gpu_optimization import ChunkedFFT
# Create chunked FFT processor
chunked_fft = ChunkedFFT(chunk_size=1024, overlap=128)
# Process large sequences
x = torch.randn(32, 8192, device='cuda')
x_fft = chunked_fft.fft_chunked(x)
x_reconstructed = chunked_fft.ifft_chunked(x_fft)
Performance Profiling
from hpfracc.ml.gpu_optimization import GPUProfiler
profiler = GPUProfiler(device="cuda")
# Profile operations
profiler.start_timer("fractional_derivative")
result = fractional_derivative(x, alpha=0.5)
profiler.end_timer(x, result)
# Get performance summary
summary = profiler.get_summary()
print(f"Execution time: {summary['fractional_derivative']['execution_time']:.4f}s")
Complete Examples
End-to-End Training
import torch
import torch.nn as nn
from hpfracc.ml.spectral_autograd import FractionalAutogradLayer
from hpfracc.ml.stochastic_memory_sampling import StochasticFractionalLayer
from hpfracc.ml.probabilistic_fractional_orders import create_normal_alpha_layer
from hpfracc.ml.variance_aware_training import create_variance_aware_trainer
class FractionalNeuralNetwork(nn.Module):
def __init__(self, input_size=128, hidden_size=64, output_size=1):
super().__init__()
# Standard layers
self.linear1 = nn.Linear(input_size, hidden_size)
self.linear2 = nn.Linear(hidden_size + 3, output_size) # +3 for fractional outputs
# Fractional layers
self.spectral_layer = FractionalAutogradLayer(engine_type="fft", alpha=0.5)
self.stochastic_layer = StochasticFractionalLayer(alpha=0.5, k=32, method="importance")
self.probabilistic_layer = create_normal_alpha_layer(0.5, 0.1, learnable=True)
def forward(self, x):
# Standard forward pass
x = torch.relu(self.linear1(x))
# Apply fractional layers
spectral_out = self.spectral_layer(x)
stochastic_out = self.stochastic_layer(x)
probabilistic_out = self.probabilistic_layer(x)
# Handle different output shapes
if spectral_out.dim() == 2:
spectral_out = spectral_out.mean(dim=-1, keepdim=True)
if stochastic_out.dim() == 0:
stochastic_out = stochastic_out.unsqueeze(0).unsqueeze(-1)
if probabilistic_out.dim() == 0:
probabilistic_out = probabilistic_out.unsqueeze(0).unsqueeze(-1)
# Combine features
x_combined = torch.cat([x, spectral_out, stochastic_out, probabilistic_out], dim=1)
# Final output
output = self.linear2(x_combined)
return output
# Create model and trainer
model = FractionalNeuralNetwork()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()
trainer = create_variance_aware_trainer(
model=model,
optimizer=optimizer,
loss_fn=loss_fn,
base_seed=42,
variance_threshold=0.1
)
# Train the model
results = trainer.train(dataloader, num_epochs=100)
Benchmarking Performance
from hpfracc.ml.gpu_optimization import benchmark_gpu_optimization
# Benchmark different configurations
results = benchmark_gpu_optimization()
# Print results
for length, configs in results.items():
print(f"Sequence length {length}:")
for config, alphas in configs.items():
avg_time = np.mean(list(alphas.values()))
print(f" {config}: {avg_time:.4f}s average")
Best Practices
1. Choosing the Right Approach
Spectral engines: Use for large sequences (>1000 points) or when high accuracy is needed
Stochastic sampling: Use for real-time applications or when memory is limited
Probabilistic orders: Use when uncertainty quantification is important
2. Performance Optimization
GPU acceleration: Always use GPU for large computations
AMP: Enable for 2x speedup with minimal accuracy loss
Chunked FFT: Use for sequences >4096 points
Batch processing: Process multiple sequences together
3. Variance Control
Monitor variance: Use variance-aware training for stochastic methods
Adaptive sampling: Adjust K based on variance levels
Seed management: Use reproducible seeds for debugging
4. Memory Management
Chunked processing: Use for large sequences
Gradient checkpointing: Enable for memory-constrained training
Mixed precision: Use AMP to reduce memory usage
5. Debugging Tips
Start simple: Begin with spectral engines before adding stochastic/probabilistic components
Check gradients: Verify gradient flow with
torch.autograd.gradcheckMonitor variance: Watch for high variance in stochastic methods
Profile performance: Use GPU profiler to identify bottlenecks
Troubleshooting
Common Issues
High variance in stochastic sampling
Increase K (number of samples)
Use stratified or control variate sampling
Check if alpha is too extreme
Memory issues with large sequences
Use chunked FFT
Enable gradient checkpointing
Reduce batch size
Slow performance
Enable GPU acceleration
Use AMP (Automatic Mixed Precision)
Profile to identify bottlenecks
Gradient issues
Check if tensors require gradients
Verify autograd function implementation
Use
torch.autograd.gradcheck
Getting Help
Check the examples in
examples/directoryRun the test scripts to verify functionality
Use the benchmark scripts to measure performance
Consult the API documentation for detailed parameter descriptions
Conclusion
The fractional autograd framework provides powerful tools for incorporating fractional calculus into neural networks. By combining spectral domain computation, stochastic memory sampling, and probabilistic fractional orders, you can build sophisticated models that capture the non-local and memory-dependent nature of fractional systems.
Start with the basic examples and gradually incorporate more advanced features as needed. The framework is designed to be modular, so you can mix and match different approaches based on your specific requirements.