Breakdown the BMC: Felafax

Unleashing the X-Factor in AI Infrastructure Optimization

Aishwarya Srinivasan and Harini Anand ∙ Oct 14, 2024

In today’s rapidly evolving AI landscape, enterprises are increasingly looking for AI-driven solutions to enhance model performance, reduce costs, and improve operational efficiency. Felafax, a standout startup from the YC S24 cohort, is emerging as a leader in AI infrastructure optimization. Co-founded by Nikhil Sonti (CEO) and Nithin Sonti (CTO), Felafax streamlines the deployment and scaling of large language models (LLMs) across a variety of non-NVIDIA GPUs, offering cost-effective hardware alternatives that are often overlooked.

Nikhil and Nithin bring a wealth of industry expertise from top tech companies, including Meta, Microsoft, Google, and Nvidia. Nikhil, with over six years at Meta, honed his skills in ML inference infrastructure, optimizing performance for Facebook's Feed. His work focused on boosting efficiency and throughput at scale. Nithin, having spent over five years at Google and Nvidia, specialized in large-scale ML training infrastructure. His contributions were pivotal in building the training platform for YouTube's recommender models and fine-tuning Gemini for YouTube’s AI systems.

Together, the Sonti brothers have built Felafax to address a critical pain point: the challenge of managing large-scale infrastructure for AI workloads, particularly when training and deploying ever-growing models like LLMs. As models such as Llama 3.1, with its 405 billion parameters, continue to push the boundaries of AI, traditional single-node GPU setups struggle to keep up. This led Felafax to innovate around partitioning models across multiple GPU clusters and efficiently managing distributed checkpoints.
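To make that pain point concrete, here is a minimal, hypothetical JAX sketch (not Felafax’s actual code) of the core idea: a weight matrix too large to sit comfortably on one accelerator is sharded column-wise across every available device, and XLA inserts the cross-device communication inside the compiled computation. The shapes and the `forward` function are illustrative only.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay out all available devices (GPUs, TPUs, or a lone CPU when testing)
# along a one-dimensional "model" axis.
devices = np.asarray(jax.devices())
mesh = Mesh(devices, axis_names=("model",))

# A stand-in for one large transformer weight matrix.
weights = jnp.zeros((8192, 8192))

# Shard the matrix column-wise: each device holds only its own slice,
# so the full matrix never has to fit on a single accelerator.
weights = jax.device_put(weights, NamedSharding(mesh, P(None, "model")))

# jit-compiled functions consume sharded arrays directly; XLA adds the
# necessary collective communication between devices.
@jax.jit
def forward(x, w):
    return x @ w

x = jnp.ones((4, 8192))
print(forward(x, weights).shape)  # (4, 8192), computed across all devices
```

On a single-CPU machine this runs with a one-device mesh; on an eight-GPU node each device holds an eighth of the columns, which is the same principle Felafax applies at cluster scale.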

Felafax’s mission is to empower enterprises by making AI accessible across a broader range of hardware ecosystems. Their solutions enable companies to leverage the power of AI without being tied to a single hardware provider, making non-NVIDIA options like AMD and Google TPUs more viable and effective for AI workloads.

Felafax’s Vision: AI Beyond NVIDIA

Currently, over 90% of large language model (LLM) workloads rely on NVIDIA GPUs, creating both scalability and cost-effectiveness challenges. As Nithin Sonti, Felafax’s CTO, noted, "AI models are getting bigger and better, and we need to innovate on how we partition these models across clusters of GPUs." Felafax is positioning itself to reduce this dependency by supporting other hardware providers, such as AMD and Google’s TPUs.

A recent milestone demonstrated Felafax’s commitment to this vision: they successfully trained Llama 3.1, a 405-billion-parameter model, on AMD’s MI300X GPUs. The achievement was featured on Hacker News, where it garnered over 500 upvotes from the AI community, underscoring Felafax’s capability to challenge the dominance of NVIDIA hardware. The milestone shows that cost-effective AI training on alternative hardware platforms is not just feasible but practical for companies willing to explore non-traditional GPU options.

Felafax’s platform, built on a JAX-based infrastructure, is ideal for multi-cloud and multi-GPU setups, which Nithin emphasizes has made their system twice as cost-efficient as competitors. “JAX works exceptionally well for us,” Nithin explained. “It allows us to optimize models effectively, especially when working with non-NVIDIA GPUs, which is central to what we’re trying to accomplish.” While JAX may seem unconventional compared to more popular frameworks like PyTorch, it gives Felafax the flexibility and performance edge they need to compete in a market dominated by NVIDIA.

JAX excels at handling non-NVIDIA hardware, particularly AMD, due to its hardware-agnostic architecture, which includes the following (illustrated in the sketch after this list):

  • XLA Compiler: JAX uses XLA (Accelerated Linear Algebra) to compile computations into a hardware-independent format, allowing efficient execution across various backends, including AMD GPUs.

  • Platform-independent Optimizations: XLA provides performance enhancements across all supported hardware platforms, ensuring consistent results.

  • Easy Portability: JAX enables seamless transitions between NVIDIA and AMD hardware with minimal code changes, unlike PyTorch, which is more tightly coupled with NVIDIA’s CUDA ecosystem. Porting PyTorch code to AMD hardware often requires more effort due to its CUDA-specific implementations.
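The portability claim is easy to illustrate. The snippet below (the `attention_scores` function is our own illustrative example, not Felafax’s code) runs unmodified on CPU, NVIDIA GPU, AMD GPU via ROCm, or TPU: `jax.jit` traces the Python function once and hands the result to XLA, which compiles it for whichever backend is present.

```python
import jax
import jax.numpy as jnp

# The same source runs on any XLA backend; no CUDA-specific code needed.
@jax.jit
def attention_scores(q, k):
    # Scaled dot-product scores, a typical transformer building block.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((16, 64))
k = jnp.ones((16, 64))

print(jax.devices())                 # whatever backend JAX found at startup
print(attention_scores(q, k).shape)  # (16, 16)

# The lowered computation is hardware-independent; XLA specializes it
# for the active backend at compile time.
print(attention_scores.lower(q, k).as_text()[:300])
```

Switching from an NVIDIA to an AMD machine means installing the ROCm build of JAX; the model code itself is untouched.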

Felafax’s platform offers a comprehensive set of features designed to optimize AI infrastructure while minimizing operational complexity:

  • Effortless Scaling: Felafax supports one-click cluster spin-ups, from 8 to 1024 TPU chips, allowing enterprises to manage machine learning workloads at scale with smooth orchestration.

  • Cost-efficient Performance: Their custom-built training platform, based on JAX and the XLA compiler, delivers performance equivalent to NVIDIA’s H100 GPUs while reducing costs by 30%.

  • On-Premise Deployment: Felafax prioritizes data privacy by deploying within a customer’s Virtual Private Cloud (VPC), ensuring data remains securely within the user’s network.

  • Highly Customizable: With a no-code interface for model fine-tuning and the option to drop into Jupyter notebooks for more advanced configurations (a generic training-step sketch follows this list), Felafax offers users full control without compromising ease of use.

  • Complete ML Ops Handling: Felafax handles model partitioning for large models like Llama 3.1 (405B), multi-controller training, and inference, allowing users to focus on innovation rather than infrastructure.

  • Pre-configured Templates: Felafax provides out-of-the-box environments with all necessary dependencies, supporting both PyTorch XLA and JAX frameworks, enabling rapid deployment of models.
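For readers who take the notebook route, a fine-tuning run in the JAX ecosystem typically boils down to a step like the one below. This is a generic, hypothetical sketch using a toy linear model and the optax optimizer library, not Felafax’s interface.

```python
import jax
import jax.numpy as jnp
import optax  # the standard gradient-transformation library for JAX

# A toy linear model standing in for an LLM's trainable weights.
params = {"w": jnp.zeros((64, 1)), "b": jnp.zeros((1,))}

optimizer = optax.adamw(learning_rate=1e-4)
opt_state = optimizer.init(params)

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit
def train_step(params, opt_state, x, y):
    # Differentiate the loss, turn gradients into updates, apply them;
    # jit compiles the whole step through XLA for the active backend.
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

x, y = jnp.ones((8, 64)), jnp.ones((8, 1))
params, opt_state, loss = train_step(params, opt_state, x, y)
print(float(loss))
```

The same step pattern scales from this toy up to sharded multi-device training when combined with the partitioning shown earlier.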

With these features, Felafax is emerging as a powerful and versatile platform for enterprises seeking to optimize their AI infrastructure while reducing costs and simplifying operations.

A guest post by Harini Anand

CSE Undergrad | Google KaggleX Mentee | Harvard WE Tech Fellow | AWS Scholar | Developer Intern at Niramai Health Analytix | Technical Content Writer at Illuminate AI | Oxford & MIT Summer School Alum | High Impact WiDS APAC Ambassador