Rising AI demand drives shift to fractional GPU model

Bengaluru: Fractional GPUs are reshaping AI economics as demand for compute rises. The growing need for AI workloads, combined with the high cost of GPU infrastructure, is driving a shift in how companies access computing power.

At the centre of this transition is the emergence of fractional GPUs—a model that allows users to rent portions of high-performance GPUs instead of entire units.

Naresh Singh, senior director analyst at Gartner, said fractional GPUs are particularly relevant in emerging markets such as India. “They reduce cost barriers for GPU-as-a-service offerings and enable SMBs and startups to access advanced systems that may otherwise be too expensive or unnecessary in early stages.

“They also improve utilisation through granular orchestration, helping maximise the number of tokens processed per GPU,” he said.

Karan Kirpalani, chief product officer at Neysa, explained that at its core, a fractional GPU partitions a single GPU into smaller, isolated units, allowing multiple users or workloads to run concurrently. “This improves utilisation, reduces costs for smaller workloads, and enables more efficient use of high-value GPU resources,” he said.
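Kirpalani’s cost argument can be made concrete with a back-of-the-envelope calculation. The hourly rate and slice count below are hypothetical assumptions chosen for illustration; they are not figures quoted in the article or by any provider.

```python
# Illustrative cost comparison: renting one slice of a partitioned GPU
# versus renting the whole unit. All numbers are assumed, not sourced.

HOURLY_RATE_FULL_GPU = 2.10   # assumed $/hour for a whole GPU
SLICES_PER_GPU = 7            # e.g. one GPU split into 7 isolated instances

def fractional_cost(hours: float, slices_needed: int) -> float:
    """Cost of renting only the slices a small workload needs."""
    per_slice_rate = HOURLY_RATE_FULL_GPU / SLICES_PER_GPU
    return hours * slices_needed * per_slice_rate

def full_gpu_cost(hours: float) -> float:
    """Cost of renting the whole GPU even if most of it sits idle."""
    return hours * HOURLY_RATE_FULL_GPU

# A small inference job that fits in one slice, running for 100 hours:
print(round(fractional_cost(100, 1), 2))  # 30.0
print(round(full_gpu_cost(100), 2))       # 210.0
```

Under these assumed numbers, a workload that only needs one-seventh of a GPU pays one-seventh of the bill—the cost barrier Singh describes.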

He added that the model is especially relevant for use cases such as running smaller AI models or enabling shared access in research and education environments, where dedicating an entire GPU would be inefficient.

The rise of fractional GPUs is part of a broader evolution in AI infrastructure. “We are seeing two big trends—fractional GPUs and neoclouds,” said Ray Wang, CEO of Constellation Research. “Fractional GPUs lower entry costs for inference and fine-tuning, while neoclouds provide dedicated bare-metal GPUs for compute-intensive workloads.”

These neocloud platforms also offer flexible contracts, faster provisioning, and specialised infrastructure configurations.

In practice, companies are increasingly combining both models. Fractional GPUs are used to scale workloads incrementally—adding compute in smaller units—while dedicated infrastructure is reserved for large-scale training and high-performance tasks.

Service providers, too, benefit from this shift. By slicing GPUs using techniques such as time-slicing or hardware-based partitioning like Nvidia’s Multi-Instance GPU (MIG), they can maximise utilisation and process more workloads per chip.

Investors say fractional GPUs are not just a technical innovation but also a key enabler of more sustainable startup economics. “For frontier model development, you need powerful GPU clusters,” said Ganapathy Subramaniam, founding managing partner at Yali Capital. “But many startups are fine-tuning models rather than building from scratch, and fractional GPUs help them operate within budget.”

This shift is also encouraging more efficient AI development—using compute more selectively rather than relying purely on scale.
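The time-slicing approach mentioned above can be sketched as a toy round-robin scheduler: the GPU’s time is cut into fixed slices handed out in turn to queued workloads. The job names and durations below are hypothetical, and real schedulers (the Nvidia driver’s time-slicing, or MIG’s hardware-isolated partitions) handle memory isolation and quality-of-service guarantees that this sketch ignores.

```python
# Toy sketch of GPU time-slicing: one GPU, many workloads, each getting
# a fixed slice of time in round-robin order until it finishes.
from collections import deque

def time_slice(workloads: dict[str, int], slice_ms: int = 10) -> list[str]:
    """Return the order in which workloads receive GPU time slices."""
    queue = deque(workloads.items())  # (name, remaining_ms) pairs
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)          # this workload gets the next slice
        remaining -= slice_ms
        if remaining > 0:
            queue.append((name, remaining))  # not finished; rejoin the queue
    return schedule

# Three small inference jobs sharing one GPU:
print(time_slice({"job-a": 20, "job-b": 10, "job-c": 30}))
# → ['job-a', 'job-b', 'job-c', 'job-a', 'job-c', 'job-c']
```

The interleaving is what lifts utilisation: no single small job monopolises the chip while others wait.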

Beneath it lies a deeper transformation in the AI infrastructure stack. While hardware providers continue to focus on building and maintaining data centres, differentiation is increasingly shifting to software platforms that abstract complexity. Aggregators such as io.net, for instance, offer developers simple APIs to access compute, removing the need to manage servers directly.

“Developers don’t want a server; they want an endpoint,” said Gaurav Sharma, CEO and CTO of GPU marketplace io.net.

“They want to consume compute per workload without worrying about the underlying infrastructure.”

That said, fractional GPUs are not a one-size-fits-all solution. They are best suited for smaller models—typically under 7 billion parameters—and for use cases such as inference, fine-tuning, and research workloads. Large-scale training still requires full GPUs or dedicated clusters, Kirpalani said.
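A rough memory calculation suggests why the ~7-billion-parameter threshold cited above is a natural fit for fractional GPUs: model weights at 16-bit precision need about two bytes per parameter, before counting activations or optimiser state. The 7B figure comes from the article; the precision and the framing of the results are illustrative assumptions.

```python
# Back-of-the-envelope: memory needed just to hold model weights,
# assuming 16-bit (2-byte) precision. Activations, KV caches, and
# optimiser state add substantially more in practice.

def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 B/GB

print(weight_memory_gb(7))   # 14.0 GB -> can fit a large fractional slice
print(weight_memory_gb(70))  # 140.0 GB -> needs one or more full GPUs
```

At 14 GB of weights, a 7B model can live inside a generous GPU partition, whereas a 70B model already exceeds the memory of most single GPUs—matching Kirpalani’s point that large-scale work still needs full units or clusters.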
