Andreea Munteanu
on 2 June 2024
It’s been well over a year since generative AI (GenAI) took off with the launch of ChatGPT. Since then, a wave of applications, models and libraries has been launched to address market needs and simplify enterprise activity. As Deloitte observes in its State of Generative AI Q2 2024 report, organisations are now at a stage where they are ready to move beyond pilots and proofs of concept and start creating value – but bringing AI models to production can prove highly complex.
Canonical has collaborated with NVIDIA in the past to help enable open source AI at scale. In 2023, both Canonical Kubernetes and Charmed Kubeflow were certified as part of the NVIDIA DGX-Ready Software program. Shortly after, NVIDIA NGC containers and NVIDIA Triton Inference Server were integrated with our MLOps platform. This year brought news about Ubuntu on NVIDIA Jetson for AI at the edge and Kubernetes enablement for the NVIDIA AI Enterprise software platform.
Today, our sights are set on GenAI. This blog explores how, together, we help organisations build GenAI applications and simplify their path to production. You can develop your GenAI apps on Canonical’s MLOps platform, Charmed Kubeflow, and deploy them using NVIDIA NIM inference microservices – part of the NVIDIA AI Enterprise software platform for developing and deploying generative AI – integrated with KServe, Kubeflow’s model serving component.
Scale enterprise AI with Canonical and NVIDIA NIM
To simplify operations and deliver GenAI at scale, your teams need to be able to focus on building models rather than tooling. The best way to achieve this is with an integrated, end-to-end solution that covers the entire machine learning lifecycle: training models, automating ML workloads and deploying them anywhere, from the cloud to edge devices. This is an iterative process that requires constant updates, enhanced monitoring and the ability to serve models anywhere. These needs are directly met by Canonical MLOps integrated with NVIDIA NIM.
Canonical MLOps is a solution that covers the complete machine learning lifecycle, integrating leading open-source tooling such as Spark, Kafka and MLflow in a secure, portable and reliable manner. Charmed Kubeflow is the foundation of the solution: an MLOps platform that runs on any CNCF-conformant Kubernetes and on any cloud, including hybrid and multi-cloud scenarios. KServe, one of Kubeflow’s core components, serves models in a serverless manner and supports different inference engines, including NVIDIA Triton Inference Server and NVIDIA NIM.
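To make this concrete, here is a minimal sketch of declaring a KServe InferenceService from Python with the kserve SDK. The namespace, service name and storage URI are illustrative placeholders, and “kserve-tritonserver” refers to the Triton-backed ServingRuntime that ships with KServe; swapping the runtime (for example to a NIM ServingRuntime) changes the inference engine without changing the workflow.

```python
# Minimal sketch: declare a KServe InferenceService from Python using the kserve SDK.
# Namespace, service name and storage URI are placeholders for your own environment.
from kubernetes import client as k8s
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1ModelFormat,
    V1beta1ModelSpec,
    V1beta1PredictorSpec,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=k8s.V1ObjectMeta(name="onnx-demo", namespace="kubeflow-user"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="onnx"),
                runtime="kserve-tritonserver",  # swap for another ServingRuntime to change engines
                storage_uri="s3://my-bucket/models/onnx-demo",  # placeholder model location
            )
        )
    ),
)

KServeClient().create(isvc)  # KServe serves the model serverlessly, scaling down when idle
```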
NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of microservices designed to reduce the time to market of machine learning models and enable organisations to run their projects in production while maintaining security and control of their GenAI applications. NVIDIA NIM delivers seamless, scalable AI inferencing, on premises or in the cloud, using industry-standard APIs. It simplifies model deployment across any cloud and streamlines the path to enterprise AI at scale, reducing upfront engineering costs. The microservices bridge the gap between the complexity of deploying models and the operational requirements of keeping them in production. As a cloud-native solution that integrates with KServe, NIM lets you develop and deploy models using a single set of tools.
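As an illustration of those industry-standard APIs, the sketch below calls a deployed NIM LLM endpoint over its OpenAI-compatible HTTP interface. The URL, API key and model identifier are placeholders for your own deployment.

```python
# Minimal sketch of calling a deployed NIM endpoint. NIM LLM microservices expose
# an OpenAI-compatible HTTP API; the URL, API key and model identifier below are
# placeholders for your own deployment.
import requests

NIM_URL = "http://nim.example.internal/v1/chat/completions"  # placeholder endpoint

response = requests.post(
    NIM_URL,
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # only if your deployment requires one
    json={
        "model": "meta/llama3-8b-instruct",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Summarise MLOps in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```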
“Beyond the work that we do with NVIDIA in Ubuntu and in Canonical Kubernetes for GPU-specific integrations and optimisations, we facilitate the development and deployment of ML models as one integrated solution,” said Aniket Ponkshe, Director of Silicon Alliances at Canonical. “After the work done to certify Charmed Kubeflow and Charmed Kubernetes on NVIDIA DGX, extending it to NVIDIA NIM on the MLOps platform was a natural step for our teams to further simplify the developer journey from development to production.”
“Enterprises often struggle with the complexity of deploying generative AI models into production, facing challenges in scalability, security, and integration,” said Pat Lee, Vice President of Strategic Enterprise Partnerships at NVIDIA. “Charmed Kubeflow with NVIDIA NIM simplifies the process by providing pre-built, cloud-native microservices that streamline deployment, reduce costs, and deliver enterprise-grade performance and security.”
Accelerate AI project delivery
In its 2024 report, the AI Infrastructure Alliance asked AI/ML technology leads about their greatest concerns around deploying GenAI. The top two concerns were making mistakes by moving too quickly, and moving too slowly due to a lack of execution ability. The Canonical and NVIDIA NIM offering addresses both concerns by enabling enterprises to move at speed along a repeatable, streamlined GenAI delivery path.
Canonical MLOps is built with secure open source software so that organisations can develop their models in a reliable environment. By taking advantage of Ubuntu Pro and Canonical Kubernetes in addition to the MLOps solutions, enterprises have a one-stop shop for their AI projects, with a secure, trusted operating system and upstream Kubernetes with NVIDIA integrations to accelerate their AI journey from concept to deployment. No matter what requirements and internal skill sets they have, organisations can benefit from enterprise support, managed services and even training from Canonical experts.
Get started with Charmed Kubeflow and NVIDIA NIM
Getting started with the solution is easy. You can deploy Charmed Kubeflow in any environment. Then, you can access NVIDIA NIM microservices from the NVIDIA API catalogue after applying for NIM access. After that, it just takes a few actions at the Kubernetes layer: create a NIM runtime, create a PVC, instantiate KServe’s InferenceService and validate that NIM is running on KServe. You can read more about it here and follow the NVIDIA NIM on Charmed Kubeflow tutorial.
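As a rough sketch of those Kubernetes-layer steps, assuming NIM access and a NIM ServingRuntime already registered in the cluster, the example below uses the official kubernetes Python client. The namespace, PVC name, runtime name and model format are placeholders; the tutorial has the exact manifests.

```python
# Rough sketch of the Kubernetes-layer steps with the official kubernetes client.
# Namespace, PVC name, runtime name and model format are placeholders; see the
# NVIDIA NIM on Charmed Kubeflow tutorial for the exact manifests.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
namespace = "kubeflow-user"

# 1. PVC for the NIM container to cache model weights (wired into the runtime per the tutorial)
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "nim-cache"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "50Gi"}},
    },
}
client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace, pvc)

# 2. InferenceService that points KServe at the NIM ServingRuntime
isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama3-nim", "namespace": namespace},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "nvidia-nim-llama3-8b-instruct"},  # placeholder format
                "runtime": "nvidia-nim-llama3-8b-instruct",                # placeholder ServingRuntime
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}
custom = client.CustomObjectsApi()
custom.create_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace=namespace, plural="inferenceservices", body=isvc,
)

# 3. Validate: read the InferenceService back and inspect its readiness conditions
created = custom.get_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace=namespace, plural="inferenceservices", name="llama3-nim",
)
print(created.get("status", {}).get("conditions", []))
```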