Galem KAYO
on 9 October 2019
Designing an open source machine learning platform for autonomous vehicles
Self-driving cars are one of the most notable technology breakthroughs of recent years. The progress that has been made from the DARPA challenges in the early 2000s to Waymo’s commercial tests is astounding. Despite this rapid progress, much still needs to be done to reach full autonomy without humans in the loop – an objective also referred to as SAE Level 5. Infrastructure is one of the gaps that need to be bridged to achieve full autonomy.
Embedding the full compute power needed to fully automate vehicles may prove challenging. On the other hand, relying on the cloud at scale would pose latency and bandwidth issues. Vehicle autonomy is therefore a case for edge computing. But how should AI workloads, data storage, and networking be distributed and orchestrated at the edge for such a safety-critical application? We propose an open-source architecture that addresses these questions.
Challenges in the field
Embedding compute into vehicles
Cars are increasingly becoming computers on wheels, and as such, they will need to be powered by an embedded operating system. The need for advanced security is evident for automotive applications. Due to the complexity of full autonomy, the optimum OS will also need to be open-source, so as to leverage technical contributions from a broad range of organisations. Security and openness are therefore the key characteristics such an OS must have. Considering these requirements, Ubuntu Core is a strong candidate operating system for the vehicles of the future.
Deep learning services at the edge
Embedding computers on board will certainly make vehicles smarter. But how smart does an embedded computer need to be to make decisions in real time in an environment as complex as real-world traffic? The answer: extremely smart, far beyond what any embedded mobile computer is currently capable of. Vehicles will need to map their dynamic environment in real time and at high speed. Obstacle avoidance and path planning decisions will need to be taken every millisecond. Tackling these challenges would take more hardware capability than is currently practical to embed in every single vehicle. Therefore, distributing AI compute workloads between embedded computers, edge gateways, and local data centres is a more promising approach.
While environment mapping workloads, for instance, can run on the vehicle's embedded computer, motion planning workloads are better executed at the edge. Cars would continuously send the localisation data they collect to edge nodes installed in the successive areas they pass through. Edge nodes would be context-aware, since they would store information specific to the area in which they are located. Armed with this context-specific information and with data aggregated from passing cars, edge nodes would be far more efficient at optimising motion planning than the vehicles' embedded computers.
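As a minimal sketch of this split, the snippet below shows a vehicle's embedded computer streaming pose estimates to a nearby edge node and receiving a motion plan in return. The endpoint, the payload schema, and the update rate are illustrative assumptions only, not part of any standard.

import json
import time
import urllib.request

# Hypothetical edge node endpoint; a real vehicle would discover the
# gateway serving its current area (e.g. via a registry service).
EDGE_NODE_URL = "http://edge-gateway.local:8080/v1/localisation"

def send_localisation(sample):
    """POST one localisation sample to the edge node and return its
    motion-planning response."""
    request = urllib.request.Request(
        EDGE_NODE_URL,
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=0.05) as response:
        return json.load(response)

while True:
    # Mapping runs on board: sensor data is fused into a compact pose
    # estimate, and only this result is shipped to the edge.
    sample = {
        "vehicle_id": "vehicle-042",  # illustrative identifier
        "timestamp": time.time(),
        "pose": {"x": 12.4, "y": -3.1, "heading": 0.87},
        "speed_mps": 13.9,
    }
    plan = send_localisation(sample)  # edge node returns a motion plan
    time.sleep(0.05)  # ~20 Hz update loop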
Global optimisation and model training in the cloud
Some tasks that are crucial for autonomous driving are best performed in a central core. Path planning, for instance, would lack information about the overall state of traffic if performed solely at the vehicle level. A central core supporting vehicular path planning workloads, however, could leverage traffic data aggregated over several areas. Global traffic information would then be extracted from this data and fed back to individual vehicles for better coordinated planning.
Since mapping, avoidance, and planning decisions are made based on machine learning models, continuous training of these models is required to achieve near-perfect accuracy. Central clouds are best equipped to support model training tasks. The deployment of improved models would then be orchestrated from the core cloud to edge nodes, and finally to vehicle embedded computers. Furthermore, the central core could be used for transfer learning: the efficiency and accuracy of the machine learning models can be drastically improved by storing knowledge gained from a traffic situation at one location and applying it to a similar situation at another location.
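As an illustration, the sketch below shows transfer learning in TensorFlow: a feature extractor pre-trained on a generic dataset (ImageNet) is frozen, and only a small classification head is retrained for a new task. The traffic-sign task, the class count, and the training data are placeholder assumptions.

import tensorflow as tf

# Reuse a network pre-trained on ImageNet as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # keep the transferred knowledge intact

# Attach a small head for a hypothetical traffic-sign classification task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(43, activation="softmax"),  # e.g. 43 sign classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `train_ds` stands in for labelled data aggregated at another location;
# only the new head is trained here.
# model.fit(train_ds, epochs=5)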
A machine learning toolkit for automotive
Introducing Kubeflow
To implement an open-source machine learning platform for autonomous vehicles, data scientists can use Kubeflow: the machine learning toolkit for Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning workflows simple, portable, and scalable. It consists of various open-source projects which can be integrated to work together, including Jupyter notebooks and the TensorFlow ecosystem. As the project is growing very fast, its support is expanding to other open-source frameworks, such as PyTorch, MXNet, Chainer, and more.
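For instance, a distributed TensorFlow training run can be submitted to a Kubeflow cluster as a TFJob custom resource. The sketch below uses the official Kubernetes Python client; the container image, training script, and replica count are placeholder assumptions, and the exact TFJob API version may vary between Kubeflow releases.

from kubernetes import client, config

# Minimal TFJob definition; image and command are placeholders.
tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "motion-model-training", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 2,
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "tensorflow",
                            "image": "registry.example.com/train:latest",
                            "command": ["python", "/opt/train.py"],
                        }]
                    }
                },
            }
        }
    },
}

config.load_kube_config()  # e.g. the kubeconfig exported by MicroK8s
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", namespace="kubeflow",
    plural="tfjobs", body=tfjob,
)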
Kubeflow allows data scientists to use all the fundamental classes of machine learning algorithms: regression, pattern recognition, clustering, and decision making. With Kubeflow, data scientists can easily implement the tasks that are essential for autonomous vehicles, including object detection, identification, recognition, classification, and localisation.
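To give a flavour of such a task, the sketch below classifies a single camera frame with a pre-trained model. A production perception stack would use a detector trained on driving data; the model choice and the image path here are purely illustrative.

import numpy as np
import tensorflow as tf

# Pre-trained ImageNet classifier as a stand-in for a perception model.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

# "camera_frame.jpg" is a placeholder for a frame from the vehicle's camera.
image = tf.keras.preprocessing.image.load_img(
    "camera_frame.jpg", target_size=(224, 224)
)
batch = np.expand_dims(tf.keras.preprocessing.image.img_to_array(image), 0)
batch = tf.keras.applications.mobilenet_v2.preprocess_input(batch)

# Print the three most likely object classes for the frame.
predictions = model.predict(batch)
for _, label, score in tf.keras.applications.mobilenet_v2.decode_predictions(
        predictions, top=3)[0]:
    print(f"{label}: {score:.2f}")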
Getting Kubeflow up and running
As Kubeflow runs on top of Kubernetes, a Kubernetes cluster has to be deployed first. This may be challenging, however, as gateways are tiny devices with limited resources. MicroK8s is a seamless solution to this problem. Designed for appliances and IoT, MicroK8s enables the implementation of Kubernetes at the network edge. For experimenting and testing purposes, you can get MicroK8s up and running on your laptop in 60 seconds by executing the following command:
sudo snap install microk8s --classic
This assumes you have snapd installed. Otherwise, refer to the snapd documentation first. You can then follow this tutorial to install Kubeflow on top of MicroK8s.
However, as some operations are performed in the network core, data scientists have to be able to use Kubeflow there too. Although MicroK8s could be used again, Charmed Kubernetes is a better option for such data centre environments. Designed for large-scale deployments, Charmed Kubernetes is a flexible and scalable solution. By using charms for Kubernetes deployment in the core, data scientists benefit from full automation, a model-driven approach, and simplified operations.
Conclusions and next steps
An open-source machine learning platform for autonomous vehicles can be designed based on Kubeflow, running on a Kubernetes cluster. While MicroK8s is perfectly suitable for the edge, Charmed Kubernetes is a better fit for the network core.
To learn more about Kubeflow, visit its official website.
Found an issue with MicroK8s? Report it here.