TheStage AI Slashes Inference Time with Nebius and Blackwell


In a world where milliseconds matter, TheStage AI is pushing the boundaries of AI inference speed. The Delaware-based startup, founded by former Huawei engineers, recently secured $4.5 million in funding and has already hit a major milestone in diffusion-model acceleration. Thanks to an early collaboration with Nebius and access to NVIDIA's new Blackwell B200 GPUs, TheStage AI has set a new benchmark for inference performance.

While many AI companies are still scrambling to add Blackwell support by hand, TheStage AI has taken a different route. Its automated compiler technology optimizes models for next-generation hardware without manual kernel work, giving it a serious head start in an industry that thrives on speed. In closed tests, the company's FLUX.1 model clocked 22.5 iterations per second on the B200, more than three times NVIDIA's previous H100 benchmark on PyTorch.

This leap is more than just a speed boost—it marks a fundamental shift in how scalable, cost-effective AI infrastructure can perform in real-world settings. With support for Hugging Face integration and an expanding product roadmap that includes text-to-video generation, TheStage AI is positioning itself as a must-watch player in the generative AI ecosystem.

Faster, Smarter, More Scalable: TheStage AI's Advantage

Unlike competitors that still hand-code kernels, TheStage AI uses a compiler-first approach that lets it support new GPU platforms like Blackwell almost immediately. This gives developers using the platform a major edge: models can be integrated, scaled, and deployed with minimal friction. Engineers can access pre-compiled models directly via Hugging Face, making adoption fast and hassle-free.
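For developers, the Hugging Face route follows the familiar diffusers workflow. The sketch below loads the public FLUX.1-schnell checkpoint; the article does not name the repository IDs for TheStage AI's pre-compiled variants, so treat the model ID and loading flow here as illustrative assumptions rather than the company's exact API:

```python
# Minimal sketch: generating an image with FLUX.1-schnell via diffusers.
# "black-forest-labs/FLUX.1-schnell" is the public reference checkpoint;
# TheStage AI's pre-compiled variants would live under their own
# (hypothetical) repo IDs and may use a different loader.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# FLUX.1-schnell is distilled for few-step sampling, so a handful of
# steps with guidance_scale=0.0 is the standard recipe.
image = pipe(
    "a photo of a lighthouse at dawn",
    height=1024,
    width=1024,
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("lighthouse.png")
```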

Real-world performance speaks for itself. The company's FLUX.1-schnell model now renders a high-resolution 1024×1024 image in just 0.3 seconds, halving the previous best of 0.6 seconds. The more advanced FLUX-dev model completes the same task in 1.85 seconds, outpacing rivals that average around 3.1 seconds. These gains translate directly into lower latency and lower compute costs, two of the biggest challenges for any business deploying large-scale AI models.
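To put those latencies in context, here is a quick back-of-the-envelope calculation using only the figures above; the GPU hourly rate is a placeholder assumption, not a quoted Nebius price:

```python
# Back-of-the-envelope throughput and cost math from the figures above.
# GPU_HOURLY_COST is a hypothetical placeholder, not a quoted price.
LATENCIES_S = {
    "FLUX.1-schnell": 0.3,   # 1024x1024 image, per the article
    "FLUX-dev": 1.85,        # same task, per the article
    "rival average": 3.1,    # reported competitor average
}
GPU_HOURLY_COST = 5.00       # hypothetical $/GPU-hour

for name, latency in LATENCIES_S.items():
    images_per_hour = 3600 / latency
    cost_per_image = GPU_HOURLY_COST / images_per_hour
    print(f"{name}: {images_per_hour:>6,.0f} images/hour, "
          f"${cost_per_image:.4f}/image")
```

At those rates, the schnell model turns out roughly ten times as many images per GPU-hour as the reported rival average, which is where the claimed cost savings come from.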

CEO Kirill Solodskih explained that early access to Nebius Cloud's Blackwell-powered clusters played a crucial role in achieving this performance jump. With a platform built around NVIDIA's GB200 NVL72 racks and HGX B200 systems, Nebius offered not just raw power but also the agility TheStage AI needed to test and deploy its models quickly.

Cloud Collaboration That Delivers

Nebius has become one of the first cloud platforms to bring Blackwell infrastructure to the U.S. and Europe. Its AI-native clusters are engineered for next-generation applications, delivering up to 1.6× lower latency out of the box, and up to 3.5× when paired with TheStage AI's software stack.

The partnership isn’t just about compute power; it’s about unlocking the full potential of generative AI at scale. According to Aleksandr Patrushev, Head of ML/AI at Nebius, the collaboration with TheStage AI is a step toward enabling cloud customers to deploy faster, smarter diffusion models that can handle modern demand.

TheStage AI's models are already live in Nebius's AI Studio and are also available for self-hosting, giving developers the tools they need to build image-generation pipelines that balance speed, cost, and creative quality.

Looking ahead, TheStage AI is planning to expand beyond image generation. Text-to-video generation and optimizations for large language models (LLMs) are already in the works, powered by the same flexible inference engine architecture. The startup is also in active talks with other generative AI platforms seeking to improve performance while cutting operational costs.

By prioritizing adaptability, accessibility, and acceleration, TheStage AI isn’t just keeping pace—it’s setting the pace for what AI inference can become.
