Benchmark
=========

Performance comparison of all available backends on the Memorize EEG dataset
(71 channels × 319 500 samples, 100 EM iterations, 1 model,
``use_min_dll=False``, ``use_grad_norm=False``).

Results
-------

Tested on:

* **CPU** - AMD Ryzen 9 7950X (16 physical cores, 62 GB RAM)
* **GPU** - NVIDIA GeForce RTX 5070 Ti (15.5 GB VRAM)

**Timing** (milliseconds per iteration phase)

.. csv-table::
   :file: _benchmark_timing.csv
   :header-rows: 1
   :widths: 20, 12, 16, 14, 14, 14

**Fit quality and speedup** (relative to Fortran baseline)

.. csv-table::
   :file: _benchmark_speedup.csv
   :header-rows: 1
   :widths: 20, 12, 20, 16, 16

Notes
-----

* **iter 1**: wall time of the very first EM iteration. For compiled runs this
  includes ``torch.compile`` tracing overhead (graph capture + kernel
  compilation), which is a one-time cost amortised over all subsequent
  iterations.
* **grad ms**: mean of iterations 2–50 (gradient phase, iter 1 excluded).
* **newt ms**: mean of iterations 51–100 (Newton phase). The relative cost of
  Newton vs gradient steps varies by backend: Fortran and uncompiled CPU are
  slower in the Newton phase, while compiled backends appear to be faster,
  plausibly because kernel fusion is more effective on the Newton correction.
* **total (ms)**: sum of all iteration times including iter 1.
* **spd/iter**: speedup vs Fortran based on Newton-phase mean ms/iter. Because
  compiled backends are faster in the Newton phase than in the gradient phase,
  their overall speedup improves with longer runs: typical AMICA fits use
  200–2000 iterations, where nearly all compute is Newton-phase, so
  ``spd/iter`` is the more representative metric for real-world use.
* **spd/total**: speedup vs Fortran based on total time (iter 1 included);
  reflects the real-world cost of ``torch.compile`` tracing overhead.
  > 1× means faster than Fortran, < 1× means slower.
* The Fortran binary writes per-iteration timings to ``amicaout/out.txt``;
  per-phase means are derived from those values.
* LL values are comparable across backends: same dataset, same number of
  iterations, same algorithm settings.

Running the Benchmark
---------------------

.. code-block:: bash

   python benchmarks/benchmark.py
   python benchmarks/benchmark.py --output /tmp/my_results.csv

The script auto-detects all available backends (Fortran, CPU, CUDA, MPS) and
saves results to ``benchmarks/results.csv`` by default. The Fortran baseline
requires ``data/amica15ub`` and ``data/Memorize.fdt`` to be present.
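
The column definitions given in the Notes can be sketched in a few lines of
Python. This is a minimal illustration, not the code used by
``benchmarks/benchmark.py``: the helper names ``summarise`` and ``speedups``
are hypothetical, the phase boundaries (iterations 2–50 and 51–100) are taken
from the Notes, and the timings are synthetic values chosen for readability
rather than measured results.

.. code-block:: python

   from statistics import mean


   def summarise(iter_ms):
       """Reduce per-iteration wall times (ms) to the benchmark columns.

       Hypothetical helper. Assumes 100 iterations, with iterations 2-50 in
       the gradient phase and 51-100 in the Newton phase.
       """
       if len(iter_ms) != 100:
           raise ValueError("expected 100 per-iteration timings")
       return {
           "iter1_ms": iter_ms[0],
           "grad_ms": mean(iter_ms[1:50]),  # iterations 2-50
           "newt_ms": mean(iter_ms[50:]),   # iterations 51-100
           "total_ms": sum(iter_ms),        # iter 1 included
       }


   def speedups(backend, fortran):
       """Speedup relative to the Fortran baseline (> 1 means faster)."""
       return {
           "spd_iter": fortran["newt_ms"] / backend["newt_ms"],
           "spd_total": fortran["total_ms"] / backend["total_ms"],
       }


   # Synthetic timings for illustration only, not measured results.
   fortran = summarise([100.0] + [100.0] * 49 + [120.0] * 50)
   compiled = summarise([5000.0] + [40.0] * 49 + [30.0] * 50)
   result = speedups(compiled, fortran)

With these synthetic numbers ``spd_iter`` is 4.0 while ``spd_total`` is about
1.3, which shows how a large first-iteration cost pulls down the total-time
speedup on a 100-iteration run.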