This report benchmarks GPU options for deploying Scope’s real-time video diffusion inference pipelines, focusing on performance (FPS), VRAM fit / OOM risk, and cost-efficiency across multiple resolutions (from 320×576 up to 768×1344) and four pipelines: reward-forcing, longlive, streamdiffusionv2, and krea-realtime-video.

RTX 5090 vs H100 SXM vs H200 SXM
Key takeaways
- H200 SXM is the fastest in every case where it successfully runs, typically delivering ~5–15% higher FPS than H100 SXM.
- H100 SXM is the best default choice for production: it is close to H200 performance, has much more VRAM headroom than RTX 5090, and avoids many OOM failures at higher resolutions.
- RTX 5090 is the best value when it fits in 32GB VRAM, offering strong throughput per dollar at low-to-mid resolutions, but it hits OOM earlier and fails on more memory-heavy cases (notably krea-realtime-video).
- Provider choice matters at low resolutions (when GPUs are underutilized). In several low-res tests, RTX 5090 on TensorDock was ~10–30% faster than RunPod, while results converge at higher resolutions where GPUs saturate.
- Across this benchmark set, reward-forcing is the highest-FPS pipeline overall.
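The value comparison above (RTX 5090 strong per dollar, H200 fastest at a premium) comes down to FPS per unit of hourly cost. A minimal sketch of that calculation, where the FPS figures and hourly prices are placeholders rather than the report's measured values:

```python
# Hypothetical example: ranking GPUs by throughput per dollar.
# The FPS numbers and hourly prices below are illustrative placeholders,
# not the report's measured values -- substitute real benchmark rows.

def fps_per_dollar(fps: float, hourly_price_usd: float) -> float:
    """Frames per second delivered per dollar of hourly rental cost."""
    return fps / hourly_price_usd

# Placeholder rows: (gpu, measured FPS, $/hr on a given provider)
rows = [
    ("RTX 5090", 24.0, 0.80),
    ("H100 SXM", 30.0, 2.50),
    ("H200 SXM", 33.0, 3.50),
]

ranked = sorted(rows, key=lambda r: fps_per_dollar(r[1], r[2]), reverse=True)
for gpu, fps, price in ranked:
    print(f"{gpu}: {fps_per_dollar(fps, price):.1f} FPS per $/hr")
```

With numbers in this ballpark the RTX 5090 leads on FPS per dollar even while trailing on absolute FPS, which is the shape of the trade-off described above; since the same GPU's price varies by provider, the comparison should be rerun per provider.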
Practical guidance (what to pick)
- Choose RTX 5090 if you are primarily cost-sensitive and your target pipeline + resolution reliably fits in 32GB.
- Choose H100 SXM if you need a single option that works across most resolutions and pipelines with strong performance and fewer OOM surprises.
- Choose H200 SXM if you need maximum throughput / lowest latency and are willing to pay a premium for a modest speed uplift over H100.
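The selection logic above can be sketched as a small helper that picks the cheapest GPU whose VRAM fits the workload. The VRAM capacities are published spec values (32 GB RTX 5090, 80 GB H100 SXM, 141 GB H200 SXM); the cost tiers, headroom fraction, and peak-memory input are assumptions, and the peak number should come from profiling your actual pipeline at the target resolution:

```python
# Hypothetical decision helper encoding the guidance above: choose the
# cheapest GPU with enough VRAM for the pipeline's measured peak usage.
# Cost tiers are relative placeholders (lower = cheaper), not prices.

GPUS = [
    # (name, vram_gb, relative cost tier)
    ("RTX 5090", 32, 1),
    ("H100 SXM", 80, 2),
    ("H200 SXM", 141, 3),
]

def pick_gpu(peak_vram_gb: float, headroom: float = 0.10) -> str:
    """Cheapest listed GPU whose VRAM covers peak usage plus spare headroom.

    `headroom` reserves a fraction of extra VRAM to reduce OOM risk from
    fragmentation or allocator overhead (10% is an assumed default).
    """
    needed = peak_vram_gb * (1 + headroom)
    candidates = [g for g in GPUS if g[1] >= needed]
    if not candidates:
        raise ValueError("no GPU in the list fits this workload")
    return min(candidates, key=lambda g: g[2])[0]

print(pick_gpu(20.0))   # fits in 32 GB with headroom -> RTX 5090
print(pick_gpu(40.0))   # exceeds 32 GB -> H100 SXM
print(pick_gpu(100.0))  # exceeds 80 GB -> H200 SXM
```

This mirrors the OOM pattern in the results: memory-heavy cases like krea-realtime-video at high resolutions push past 32 GB and force the jump to H100/H200-class VRAM.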
Full report (benchmark tables, raw logs, and charts): Scope GPU Benchmark Report