Double Orchestration

Kubernetes has revolutionized container orchestration, but deploying it at the edge or for real-time, low-latency workloads presents unique challenges. One often overlooked hurdle is the "double orchestration" problem introduced when running Kubernetes on top of traditional Type-1 hypervisors. Let's unpack this issue and why it matters.

Understanding Double Orchestration

Two Maestros, One Orchestra: Imagine Kubernetes as a conductor managing your application workloads (the orchestra). A Type-1 hypervisor (like VMware ESXi or KVM) is yet another conductor managing the virtual machines on which Kubernetes runs. Resource allocation and task scheduling now have two layers of decision-making. The hypervisor might not grasp the real-time needs of Kubernetes workloads, and Kubernetes may not be fully aware of the underlying hardware limitations. This can lead to resource conflicts, delays, and unpredictable performance – disastrous for edge and real-time applications.


The Problem of Double Orchestration

Resource Contention: In a setup with a Type-1 hypervisor and Kubernetes, both have their own schedulers and resource management systems. This can lead to scenarios where the hypervisor allocates resources to a VM that Kubernetes deems low-priority, degrading performance for critical workloads within other VMs. Conversely, Kubernetes might not be aware of the underlying hardware limitations and over-provision resources within VMs, leaving the hypervisor struggling to meet these demands.

Scheduling Conflicts: The timing and prioritization of tasks can clash. The hypervisor's scheduler might not understand the real-time requirements of a task within a VM. Conversely, Kubernetes might not have visibility into hardware-level constraints that require the hypervisor to preempt certain VM activities.

Increased Complexity: Handling the interactions between two complex orchestration systems adds an entire layer of management overhead. Troubleshooting performance issues becomes more difficult as you have to trace potential problems through both the hypervisor and Kubernetes layers.
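The contention described above can be sketched as a toy model: two layers making placement decisions with only partial knowledge of each other. All VM names, pod names, and CPU counts below are hypothetical, and the model is deliberately simplistic.

```python
# Toy model of "double orchestration": the hypervisor and Kubernetes each
# make placement decisions with only partial knowledge of the other layer.
# All names and numbers here are illustrative, not from any real scheduler.

# Hypervisor view: physical CPUs granted to each VM (it knows the
# hardware, but not pod priorities).
vm_cpu_grant = {"vm-a": 2, "vm-b": 6}

# Kubernetes view: CPU requests of pods scheduled into each VM (it knows
# pod priorities, but not the physical grant behind the VM's vCPUs).
pods = [
    {"name": "critical-pod", "vm": "vm-a", "cpu_request": 4, "priority": "high"},
    {"name": "batch-pod", "vm": "vm-b", "cpu_request": 1, "priority": "low"},
]

def find_conflicts(grants, pods):
    """Report VMs whose pods request more CPU than the hypervisor granted."""
    requested = {}
    for pod in pods:
        requested[pod["vm"]] = requested.get(pod["vm"], 0) + pod["cpu_request"]
    return [
        (vm, grants[vm], total)
        for vm, total in requested.items()
        if total > grants[vm]
    ]

for vm, granted, wanted in find_conflicts(vm_cpu_grant, pods):
    print(f"{vm}: pods request {wanted} CPUs but hypervisor granted {granted}")
# prints: vm-a: pods request 4 CPUs but hypervisor granted 2
```

Neither layer is misbehaving here; the conflict arises purely because each scheduler optimizes against a different, incomplete view of the system.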

Why It Matters for Edge and Real-Time Workloads

Edge Computing

  • Edge devices often have limited resources, making any resource contention due to double orchestration more severe.

  • Edge workloads might involve interacting with physical devices or sensors, demanding strict real-time guarantees.

Real-Time, Low-Latency Applications

  • Applications like self-driving cars, industrial automation, or high-frequency trading rely on millisecond-level precision. The overhead and potential conflicts within double orchestration can introduce unacceptable delays.

  • These workloads also demand determinism: consistent, repeatable response times on every operation, not merely low average latency.
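As a rough illustration of why layered scheduling matters for these workloads, the sketch below (standard-library Python only; the 1 ms period and sample count are arbitrary choices) measures how far a periodic task drifts from its requested period. On a stack with more scheduling layers between the task and the silicon, the worst-case deviation tends to widen.

```python
# Minimal sketch: measuring scheduling jitter of a periodic ~1 ms task.
# The absolute numbers depend on OS, load, and clock resolution; the point
# is that each extra scheduling layer can widen the spread of samples.
import time

def measure_jitter(period_s=0.001, samples=200):
    """Return the worst-case deviation (in ms) from the requested period."""
    deviations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(period_s)
        elapsed = time.perf_counter() - start
        deviations.append(abs(elapsed - period_s) * 1000.0)
    return max(deviations)

if __name__ == "__main__":
    print(f"worst-case jitter: {measure_jitter():.3f} ms")
```

For a hard real-time system the metric that matters is exactly this worst case, not the average; a single multi-millisecond outlier can be a failure.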



What is Metalvisor?

Metalvisor, a revolutionary hypervisor, operates seamlessly within a system's firmware or hardware. Its unique design eliminates the need for a traditional operating system and orchestrators, enabling virtual machines (VMs) to execute directly on the silicon. Metalvisor prioritizes determinism, striving for predictable performance and response times through its minimalist design and tight hardware integration. Additionally, it offers Quality of Service (QoS) capabilities, ensuring that critical workloads receive the necessary resources and performance levels, even under demanding load conditions.

TypeZero Hypervisor: Metalvisor is a hypervisor that embeds itself directly into the firmware or hardware of a system. Unlike traditional hypervisors, it doesn't run on top of an existing operating system. Instead, Metalvisor is launched from firmware to run VMs directly on the silicon, removing the OS and orchestrators between the VM and the bare metal.

High Determinism: Determinism in a hypervisor refers to its ability to provide predictable performance and response times. Metalvisor, due to its minimalist nature and close integration with hardware, aims for a high level of determinism and predictable CPU clock cycles, beneficial for time-sensitive workloads.

Quality of Service (QoS): Metalvisor can enforce Quality of Service guarantees, ensuring that critical workloads receive the necessary resources and performance levels, even under load.
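One common way to express such a guarantee is a floor-then-share policy: each critical workload is reserved a minimum allocation before best-effort workloads divide what remains. The Python sketch below illustrates that idea only; the workload names and numbers are hypothetical, and Metalvisor's actual enforcement happens at the hypervisor level, not in application code.

```python
# Hedged sketch of the QoS idea: give each critical workload a guaranteed
# minimum CPU share, then let best-effort workloads split the remainder.
# Names and numbers are illustrative assumptions, not Metalvisor's API.

def allocate(total_cpus, guaranteed, best_effort):
    """guaranteed: {name: min_cpus}; best_effort: names splitting leftovers."""
    alloc = dict(guaranteed)  # critical workloads get their floor first
    leftover = total_cpus - sum(guaranteed.values())
    if leftover < 0:
        raise ValueError("guarantees exceed capacity")
    share = leftover / len(best_effort) if best_effort else 0
    alloc.update({name: share for name in best_effort})
    return alloc

print(allocate(8, {"control-plane": 2, "rt-app": 4}, ["batch-1", "batch-2"]))
# The two best-effort workloads split the remaining 2 CPUs, 1.0 each;
# the guaranteed workloads keep their floors even under load.
```

The key property is that load on the best-effort tier can never eat into the guaranteed floors, which is what "even under demanding load conditions" requires.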


Metalvisor vs. Type-1 Hypervisors for Kubernetes

Hypervisors such as ESXi and KVM provide hardware abstraction for multiple virtual machines (VMs), but this additional layer between the VMs and the physical hardware carries some performance overhead. Metalvisor, a firmware-level hypervisor, reduces that overhead and so offers better performance for Kubernetes workloads. Traditional Type-1 setups also involve "double orchestration": the hypervisor manages VMs while Kubernetes manages the workloads inside them. Metalvisor does not require a full-fledged VM orchestrator, which streamlines resource allocation for Kubernetes and reduces the potential for resource contention and scheduling conflicts.

Type-1 Hypervisors: These are the most common hypervisors (like VMware ESXi and KVM). They run on an operating system layer purpose-built for virtualization, creating a layer of abstraction to support multiple virtual machines (VMs).

Performance: Type-1 hypervisors have some performance overhead due to the extra layer between the VMs and the physical hardware. Metalvisor, being embedded at the firmware level, reduces this overhead, offering better performance for Kubernetes workloads.

Double Orchestration: In a Type-1 setup, you have the hypervisor's orchestrator managing VMs and Kubernetes managing the workloads inside those VMs. This "double orchestration" can lead to resource contention and scheduling conflicts. Metalvisor, while still providing hardware management, avoids a full-fledged VM orchestrator, streamlining resource allocation for Kubernetes.


How Metalvisor (TypeZero) Helps

Metalvisor, a TypeZero hypervisor, offers several advantages over traditional hypervisors. It simplifies resource management by coordinating allocation decisions with Kubernetes, reduces virtualization overhead, and improves determinism for real-time applications.

Simplified Resource Management: By removing the VM-level orchestration of a traditional hypervisor, you streamline resource allocation decisions. These decisions can be coordinated directly with Kubernetes for optimal workload execution.

Reduced Overhead: The minimal footprint of a Type-Zero hypervisor like Metalvisor translates to less virtualization overhead, maximizing resources available for the workloads themselves.

Improved Determinism: With tighter coupling to the hardware, a Type-Zero hypervisor can potentially provide more predictable performance, critical for real-time applications.


Feature              | Metalvisor                          | Type-1 Hypervisor
Location in stack    | Embedded in firmware/hardware       | Runs in an operating system
Performance overhead | Lower                               | Potentially higher
Determinism          | High (predictable CPU clock cycles) | Lower (extra scheduling layers)
QoS                  | Can enforce strict QoS              | More challenging to guarantee
Orchestration        | Minimal VM-level orchestration      | Full VM-level orchestration required
Double orchestration | Potentially avoided                 | Can be an issue

Important Considerations

  • Complexity: TypeZero hypervisors can be more complex to set up and manage than established Type-1 solutions, especially for those without specialized expertise. Metalvisor mitigates this with automated installation and configuration.

  • Features: Metalvisor's focus on performance and determinism means it might lack some of the advanced features found in mature Type-1 hypervisors, like live migration or comprehensive management tools. However, many enterprise features, as well as direct Kubernetes integrations, are on the roadmap for Metalvisor. 

  • Maturity: Type-Zero hypervisors are still a relatively newer technology compared to the well-established Type-1 hypervisor market. 

Is Metalvisor Right For You?

Metalvisor might be a compelling option if:

  • You have Kubernetes workloads with strict performance and real-time requirements.

  • You are willing to forgo certain enterprise features in return for performance gains.

  • You want to minimize overhead and avoid double orchestration issues.

 
Brad Sollar

CTO, Mainsail
