Kubeflow VMware Distribution

VMware Private AI helps you realize value from your Generative AI initiatives faster while maintaining the privacy and control you expect for your sensitive data, whether it resides in a data center, public cloud, or at the edge.

VMware makes it easy to adopt Private AI with the following two solutions:

  • The first is a partnership with NVIDIA to launch VMware Private AI Foundation, which extends the two companies' strategic collaboration to prepare enterprises running VMware Cloud infrastructure to adopt next-generation Generative AI capabilities.
  • The second is the launch of the VMware Private AI open source reference architecture, which helps customers achieve their desired AI outcomes by supporting top open source technologies like Kubeflow, a popular MLOps Toolkit for Kubernetes, and PyTorch, the leading framework for Generative AI. The architecture provides flexibility to use these tools alongside commercial solutions from partners such as Anyscale, cnvrg.io, Domino Data Lab, NVIDIA, One Convergence, Run:ai, and Weights & Biases.


Many VMware customers have invested in VMware vSphere® for critical applications and now want to consolidate Artificial Intelligence (AI)/Machine Learning (ML) workloads on the same platform. However, operationalizing AI/ML at scale poses challenges around resources, security, and workflow orchestration.

The Kubeflow VMware Distribution addresses these challenges. It provides an optimized Kubeflow configuration leveraging VMware's proven infrastructure stack. This simplifies deploying and managing Kubeflow on VMware infrastructure at scale, securely. In turn, customers can operationalize AI/ML efficiently while building on their trusted virtualization foundation.

Seamless Integration with VMware Infrastructure

Kubeflow VMware Distribution tightly integrates with VMware's enterprise infrastructure stack including networking, storage, security, and management. This integration allows customers to build on their existing VMware investments to deploy Kubeflow faster and manage it more easily.

Leveraging mature vSphere capabilities reduces costs for customers by eliminating the need to adopt new infrastructure just for AI/ML. Tight integration also speeds up ML adoption by making it easier to leverage vSphere's enterprise-grade qualities like high availability, access controls, and multi-tenancy.

Specifically, the advanced networking, scalable storage, and security features of VMware NSX®, VMware vSAN™, and vSphere harden and optimize the Kubeflow environment. VMware vCenter® and VMware Tanzu® Mission Control simplify consistent operations and lifecycle management across Kubeflow and traditional workloads.

By deploying Kubeflow on the proven vSphere infrastructure, customers benefit from performance at scale, robust security, and simplified management right out of the box. This accelerates time-to-value for AI initiatives.

Simplified Deployment and Management

Kubeflow VMware Distribution simplifies deployment, configuration, and lifecycle management. The distribution bundles the core Kubeflow components and integrates them natively with VMware Tanzu Kubernetes Grid™ using Carvel packaging. This pre-integrated approach allows customers to get started with Kubeflow faster without needing to assemble and integrate the infrastructure components themselves.

With Kubeflow VMware Distribution, the complete stack is delivered as a tested, easy-to-install package. Customers can rapidly deploy Kubeflow on top of Tanzu Kubernetes clusters on vSphere without worrying about the underlying infrastructure. The integrated distribution handles configuring Kubernetes, Istio, pipelines, and the full ML stack tailored for vSphere environments. This turnkey experience enables enterprises to get up and running quickly with a production-grade Kubeflow deployment on familiar vSphere foundations.
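
To make the Carvel packaging model more concrete, here is a minimal sketch of what a package installation request could look like. It builds a Carvel `PackageInstall` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The package reference name, version, namespace, service account, and values secret below are all hypothetical placeholders, not the distribution's actual identifiers; check the distribution's installation guide for the real ones.

```python
import json


def build_package_install(name, namespace, package_ref, version,
                          service_account, values_secret):
    """Return a Carvel PackageInstall manifest as a plain dict."""
    return {
        "apiVersion": "packaging.carvel.dev/v1alpha1",
        "kind": "PackageInstall",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Service account kapp-controller uses to create resources.
            "serviceAccountName": service_account,
            "packageRef": {
                "refName": package_ref,
                "versionSelection": {"constraints": version},
            },
            # Configuration overrides are read from a Secret.
            "values": [{"secretRef": {"name": values_secret}}],
        },
    }


# All names below are hypothetical and only illustrate the shape.
manifest = build_package_install(
    name="kubeflow",
    namespace="kubeflow-install",
    package_ref="kubeflow.example.com",
    version="1.0.0",
    service_account="kubeflow-install-sa",
    values_secret="kubeflow-values",
)

print(json.dumps(manifest, indent=2))
```

Keeping the manifest as data makes it easy to template per cluster; in practice the same request is usually issued through the Tanzu CLI or a YAML file rather than generated in code.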

Production-ready Platform

The Kubeflow on vSphere distribution is optimized as a secure and scalable platform for production AI/ML deployments. Key capabilities include:

  • Unified identity and access management powered by Pinniped authentication integrated with vSphere for simplified user management across Kubeflow and infrastructure.
  • Custom VMware-hosted GPU notebook images tuned for accelerated frameworks like RAPIDS, NVIDIA CUDA, and Intel oneAPI.
  • Dynamic GPU autoscaling to efficiently manage ML training resources via integration with Tanzu Kubernetes Grid.
  • Integrated monitoring stack consisting of Prometheus for metrics collection and Grafana for visualization, alerting, and analytics.
  • Role-based access control, logging, and security hardening to control access, maintain audit trails, and help meet compliance requirements.
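
As an illustration of how a workload can feed the Prometheus and Grafana stack listed above, the sketch below renders a gauge in the Prometheus text exposition format using only the Python standard library. The metric name and labels are hypothetical examples; a real training job would more likely use a Prometheus client library and expose the text over HTTP for scraping.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return (str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n"))


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, numeric_value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric a training job might report.
text = render_gauge(
    "training_gpu_utilization_ratio",
    "Fraction of GPU capacity in use.",
    [({"job": "demo", "gpu": "0"}, 0.85)],
)
print(text, end="")
```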

These enterprise-grade features and production-ready capabilities make Kubeflow on vSphere a strong platform for accelerating AI adoption and for running mission-critical ML workloads at scale.

The Kubeflow VMware Distribution includes most of the key components for achieving the highest level of MLOps maturity.

Large Language Models (LLMs) Support

Kubeflow VMware Distribution supports different types of ML workloads, including natural language processing (NLP), image classification, video recognition, and more. LLMs have become a focus for enterprises developing AI and ML applications. However, many enterprises face challenges in effectively applying large models at scale due to their resource intensity, governance needs, and integration complexity.

Kubeflow VMware Distribution now enables inference and fine-tuning of popular LLMs like Meta Llama 2 and BigScience BLOOM models directly on the platform. Additional models are currently being validated for support.
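
To give a rough sense of the resource intensity involved, the following sketch estimates the GPU memory needed just to hold model weights at a given numeric precision. It is a back-of-the-envelope calculation under stated assumptions: parameter counts are the nominal published sizes, and the estimate ignores activations, the inference key-value cache, and the optimizer state and gradients that fine-tuning adds, so real requirements are higher.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="fp16"):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal parameter counts; actual counts differ slightly.
models = [
    ("Llama 2 7B", 7e9),
    ("Llama 2 13B", 13e9),
    ("Llama 2 70B", 70e9),
    ("BLOOM 176B", 176e9),
]

for label, params in models:
    gib = weight_memory_gib(params, "fp16")
    print(f"{label}: about {gib:.0f} GiB of weights in fp16")
```

Even this lower bound shows why the larger models need multiple GPUs or reduced precision, which is where GPU scheduling on the platform becomes relevant.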

By combining vSphere's ML capabilities with leading LLMs, organizations can rapidly build intelligent applications across many use cases. Leveraging this integrated platform allows enterprises to more easily develop solutions such as knowledge bases, conversational chatbots, and enhanced customer service experiences without the effort of assembling discrete components.

The solution streamlines deploying enterprise-grade intelligence powered by state-of-the-art language models. By relying on this turnkey platform, companies can focus on driving business value from AI rather than just managing underlying infrastructure and workflows.

Open Source Community

Kubeflow VMware Distribution is built on open source technologies, including Kubeflow and Ray. This allows customers to take advantage of the large and active open source community that supports these technologies.


Use software you already know to run AI/ML workloads.

Kubeflow VMware Distribution provides a distribution of Kubeflow optimized for enterprise-grade deployment on VMware infrastructure, leveraging the advanced networking, security, scalability, and lifecycle management capabilities of VMware Tanzu Kubernetes Grid.

With Kubeflow VMware Distribution, enterprises can accelerate their AI/ML initiatives by building on top of existing VMware infrastructure investments. The distribution incorporates MLOps capabilities for managing ML workflows at scale—from development to deployment, monitoring, and governance.
