The 4.5 release is not a minor "point" update: it is one of the most feature-rich releases in the project's history. It contains several important additions. Most notably, the new Xen PVH virtualization mode now supports running as dom0; further highlights are enhanced support for Remus, significant ARM architecture updates, security improvements, real-time scheduling, support for Intel Cache Monitoring Technology (CMT), and improvements for automotive and embedded use-cases. Other enhancements include additional FreeBSD support, systemd support, additional libvirt support, the release of Mirage OS 2.0, and more.
Besides giving an overview of Xen 4.5, we will explain the project's roadmap process and share what's ahead for 2015, such as improved OpenStack integration and hotpatching (applying security fixes without the need to reboot).
LFCOLLAB15: Xen 4.5 and Beyond
1. Lars Kurth
Community Manager, Xen Project
Chairman, Xen Project Advisory Board
Lead CentOS Virtualization SIG
Director, Open Source Business Office, Citrix lars_kurth
2. Released on January 15, 2015 (10 months of development)
Resources:
Blog: bit.do/xen-4-5-blog
Docs: bit.do/xen-4-5-docs
Download: bit.do/xen-4-5-download
Stats:
Changesets: 1812
KLOC Added: 81
KLOC Removed: 141 (mostly removal of XM)
Contributors: 102 individuals
Employers: 39 (93 individuals working for them)
4. [Chart: number of developers and employers contributing per year, 2010–2014; y-axis 0–250]
Using GitDM over Git logs using our database of developers and organizations to remove duplicates
across all sub-projects
Reasons for faster Innovation:
More developers and orgs
Fewer forked up-streams
(e.g. Linux, BSDs, QEMU, …)
Architecture clean-up
(e.g. XM – XL)
Better Development Process
7. Xen 4.5: XEND / XM has been removed
XL now the default interface into Xen
Resources:
Docs: bit.do/xen-xl
Comparison: bit.do/xen-4-5-xm-2-xl-compare
Migration Guide: bit.do/xen-4-5-xm-2-xl
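As a hedged illustration of what XL consumes, a minimal guest config might look roughly like this (field names from the xl.cfg format; the guest name, paths and values here are made up):

```
name   = "guest1"
memory = 1024
vcpus  = 2
disk   = [ 'phy:/dev/vg0/guest1,xvda,w' ]
vif    = [ 'bridge=xenbr0' ]
```

XL was designed to accept xm-style guest configs for the common cases; the migration guide above covers the differences.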
Libvirt integration has been vastly improved
Resources:
Docs: bit.do/xen-libvirt
Complete List: bit.do/xen-4-5-blog
[Diagram: Dom0 with the Dom0 kernel, drivers and toolstack(s) — XL and LIBVIRT sit on top of LIBXENLIGHT; XEND/XM removed]
8. Xen via Libvirt in OpenStack:
Great Platform for Production Deployments
Get into Quality Group A in 2015
Great Platform for Development
Great DevStack support
Libvirt:
Better Quality, Stability & Usability
Drivers: OpenStack, CentOS Virt SIG – learning what a distro needs
Resources:
Docs: bit.do/xen-openstack
Plans: bit.do/xen-openstack-fosdem15
Install Video: https://vimeo.com/119572029
[Diagram: Nova hypervisor drivers by quality group — Group A: Libvirt/KVM; Group B: XenServer (XAPI), ESX, Hyper-V; Group C: Libvirt/Xen, aiming for Group A]
9. Number 1 priority for the project
Vendor funded Test Infrastructure
More capacity & coverage
Automated performance testing
Vendor funded OpenStack CI loop
Xen Project Rack
10. Overview
Xen 4.5: Real-Time Deferrable Server Scheduler
What is next?
Resources:
Docs: bit.do/xen-schedulers
11. [Diagram: host hardware (CPUs, memory, I/O) running Dom0 with the Dom0 kernel and drivers]
The Xen Project Hypervisor supports several
different schedulers with different properties.
Different schedulers can be assigned to…
… an entire host
e.g. Credit2 Scheduler
12. [Diagram: host hardware (CPUs, memory, I/O) running Dom0 with the Dom0 kernel and drivers]
The Xen Project Hypervisor supports several
different schedulers with different properties.
Different schedulers can be assigned to…
… an entire host
… a pool of physical CPUs (= CPU pool) on a host
(VMs need to be assigned to a pool or pinned to a CPU)
e.g. RTDS Scheduler in one pool, Credit Scheduler in another
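A sketch of how the per-pool scheduler assignment above is typically driven from xl (subcommand names from xl's cpupool interface; the pool and VM names are invented, and exact flags may vary by Xen version — this is illustrative, not a verified transcript):

```shell
# Create a pool using the RTDS scheduler, then move CPUs 4-7 into it
xl cpupool-create name=\"rt-pool\" sched=\"rtds\"
xl cpupool-cpu-remove Pool-0 4-7
xl cpupool-cpu-add rt-pool 4-7

# Put a VM into the pool; its vCPUs are now scheduled by RTDS
xl cpupool-migrate rt-vm rt-pool
```

These commands require a Xen host; the fragment is meant only to show the shape of the workflow.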
14. Soft Real-time CPU scheduler (experimental)
Guarantees CPU capacity to guest VMs on SMP hosts
Budget: Amount of time assigned to a VM
Period: Time period in which depleted budgets are replenished
Global:
Allow vCPU migration across CPUs
More flexibility & best utilization
But: migration overhead & cache penalty
Partitioned:
Pin each vCPU to a physical CPU; schedule VMs per CPU
Lower overheads & lower latency
But: may underutilize CPUs
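The budget/period guarantee can be sketched in a few lines of Python (a toy accounting model, not the actual RTDS code; the function and variable names are invented):

```python
def grant_cpu(period_ms, budget_ms, demands):
    """Toy deferrable-server accounting: a vCPU may consume at most
    budget_ms of CPU time per period_ms window; the budget is
    replenished at every period boundary.

    demands: list of (start_ms, wanted_ms) CPU requests, each assumed
    to fit within the period it starts in. Returns the time granted
    per request.
    """
    left = {}      # period index -> budget still available in that period
    granted = []
    for start, wanted in demands:
        idx = int(start // period_ms)        # which period the request falls in
        avail = left.get(idx, budget_ms)     # fresh budget at each period start
        got = min(wanted, avail)             # never exceed the remaining budget
        left[idx] = avail - got
        granted.append(got)
    return granted

# A vCPU with budget 4ms per 10ms period is guaranteed (and capped at) 40% CPU:
print(grant_cpu(10, 4, [(0, 3), (5, 3), (12, 2)]))  # → [3, 1, 2]
```

The second request gets only 1ms because the first already used 3ms of the 4ms budget in that period; the third lands in a new period with a replenished budget.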
16. Scheduler   Use-cases                                      Xen 4.5        Plans for 4.6+
    Credit      General purpose                                Supported,     Supported,
                                                               default        optional
    Credit 2    General purpose; optimized for lower           Experimental   Supported,
                latency and higher VM density                                 default
    RTDS        Soft & firm real-time, multicore;              Experimental   Hardening, optimization,
                embedded, automotive, graphics & gaming                       better XL support,
                in the cloud, low-latency workloads                           <1μs granularity; Supported
    ARINC 653   Hard real-time, single core;                   Supported,     No change
                avionics, drones, medical                      compile time
Legend (for the "Plans for 4.6+" column): likely in 4.6 / possible in 4.6
17. Overview
Jan 2015: Intel GVT-g (XenGT) Updates
What is next?
Resources:
News: bit.do/xengt-jan15
Docs: bit.do/xengt-jan15-docs
18. Watch the demo at https://www.youtube.com/watch?v=V2i8HCcAnY8
Virtual GPU per VM
Performance critical resources
directly assigned to VM
19. XenGT support is currently out-of-tree
Q4-2014 refresh by Intel: In use by XenClient 5.5
First patches have been posted for review on xen-devel
Requires some Linux and QEMU patches also
Motivation: create a common code base for Xen & KVM
Likely complete for Xen 4.6 (or shortly afterwards)
Will initially be experimental
22. Shortcut   Mode                      With
    HVM        Fully Virtualized (HVM)
    HVM+PV     HVM                       PV drivers
    PVHVM      HVM                       PVHVM drivers
    PVH        PV                        pvh=1
    PV         PV

Spectrum: Poor Performance → Scope for Improvement → Optimal Performance

HVM:     VS VS VS VH
HVM+PV:  P  VS VS VH
PVHVM:   P  P  VS VH
PVH:     P  P  P  VH
PV:      P  P  P  P

P = Paravirtualized, VS = Software Virtualized (QEMU), VH = Hardware Virtualized
Guests: Windows; Linux, BSDs, …
23. PVH:  P P P VH
    PV:   P P P P
    ARM:  P P P VH
Simplicity: Less code & fewer Interfaces in Linux/FreeBSD
– Security : smaller TCB and attack surface, fewer possible exploits
– Clean-up : possibility to simplify Linux kernel and reduce maintenance burden
Better Performance & Lower Latency
– Dom0 must be a PV guest
– 64 bit: VMs run in ring 0 instead of ring 3
(fewer expensive TLB flushes)
This is the most complex part
of Xen today!
24. Feature Complete
Hardware support for AMD x86 chips
Add support for PCI passthrough
Migration of PVH DomUs (including systems with PVH Dom0)
Hardening & Tuning
Add PVH to test suite and make test failures blocking
Benchmarking and performance tests
Code clean-up
25. x86
HPET: Better and faster resolution values
Parallel memory scrubbing on boot (large machines)
Lower interrupt latency for PCI passthrough (machines > 2 sockets)
Soft affinity for non-NUMA machines
Multiple IO-REQ services for guests
(remove bottlenecks for HVM guests by allowing multiple QEMU back-ends)
Intel
SandyBridge: VT-d posted interrupts for PVHVM
(I/O intensive workloads)
26. Vulnerabilities published in 2014
Evolution of Xen Security Features
Xen 4.5 : Virtual Machine Introspection
A new Model for Cloud Security
What is next?
27. Vulnerability class        Linux Container   KVM + QEMU   Xen (PV / HVM+Stub)
    Privilege Escalation
    (guest to host)                  7–9             3–5               0
    Denial of Service
    (by guest of host)                12             5–7               3
    Information Leak
    (from host to guest)               1               0               1
Assumptions
x86 vulnerabilities from guest to host that hosting/cloud providers worry about
Xen (HVM) without stub domains has slightly more than Xen (PV) due to use of QEMU, less than KVM + QEMU
Have the underlying analysis (but won’t cover it in the talk)
29. 2007 2008 2009 2010 2011 2012 2013 2014 2015
Stub Domains : QEMU in separate domains
Flask / Xen Security Modules (Xen’s version of SE Linux)
vTPM (Virtual Trusted Platform Module)
Driver Domains (Network, Disk, … drivers in a separate VM)
TODAY: Mainly used by security apps (XenClient,
Qubes OS, …), Forensic, Military & Embedded
TODAY: In general use
(but has trade-offs at cloud scale)
XenAccess / XenProbes VM Introspection (via LibVMI)
Major
Upgrades
30. 2007 2008 2009 2010 2011 2012 2013 2014 2015
XenAccess / XenProbes VM Introspection (via LibVMI)
Exposed lots of existing Xen functionality in LibVMI
Hypervisor can bring paged-out guest memory back in
Mem_access-emulate(-with-no-write)
Many more patches currently under review for Xen 4.6
31. Watch the demo at https://www.youtube.com/watch?v=ZJPHfpDiN4o
Credit: Tamas K Lengyel
32. [Diagram: Dom0 (kernel, drivers) alongside VM2…VMn; each guest OS runs agent(s) next to its apps]
Installed in-guest agents, e.g. anti-virus software, VM disk & memory scanner, network monitor, etc.
Drawbacks: anti-virus storms, deployment/maintenance, …
33. [Diagram: a Security Appliance VM (VM1, possibly several for multi-tenancy) runs an Introspection Engine in a protected area, confined via XSM/Flask, monitoring Dom0 and VM2…VMn; small agents may remain in the guests]
Hybrid approach: no need to move everything outside the guests (choose the best trade-off)
34. Major re-work of Virtual Machine Introspection
Optimization, Code cleanup/future-proofing
Support for ARM CPUs
Intel #VE support
Turn Xen Security Modules on by default and include in the test suite
(disabled today and not automatically tested)
From specialist use to general use!
35. Reduce TCB
QEMU secure mode for HVM without stub domains
Move the instruction emulator into non-privileged mode
Move the Xen compatibility layer into a lower privilege ring
Binary Live Patching for the Xen Hypervisor
Depends on which solution the kernel will standardize on
(kpatch / kGraft / ftrace-based)
We want to share tooling
37. Remus: Non-stop Service Replication
Continually live migrates a copy of a running VM to a backup server
Automatically activates if the primary server fails
Expensive in terms of overheads and hardware requirements
COLO: A different approach (building on top of Remus)
Relaxes requirement of backup server/VM being an exact replica
If the backup server generates the same response to input, we can fail over
without service interruption
Eliminates overheads, reduces hardware requirements
38. Remus
Some "loose ends", e.g. one fix for PV guests not in the upstream kernel
Better tools integration and control ("xl remus" instead of "remus")
Optimizations for COLO
COLO
Out-of-tree; integrates with Remus via "xl remus" – works with Xen 4.5
Some known issues
Fix "loose ends"
Include into the Xen Hypervisor code base
Switch block replication from blktap2 to qdisk (motivation: performance & alignment)
Hardening
40. Larger VMs
Up to 1TB of guest RAM
Lower virtualization overhead
Super page mappings and faster interrupt EOIs (no maintenance interrupts)
Improved Interrupt handling
Support for priorities and irq migration (virtual and physical)
Near feature parity with x86
Boot via UEFI firmware
QEMU PV backends (disk, console, keyboard, mouse, framebuffer)
Many new IP blocks, firmware interfaces and platforms are supported
E.g. AMD Seattle 64-bit server SoC – see bit.do/xen-4-5-docs
41. Hardening
Inclusion of 64 Bit Hardware into test infrastructure
VM Save/Restore and Live Migration
Note: Remus and COLO are architecture independent
PCI Passthrough
Note: passthrough of MMIO regions works in 4.6
ACPI and UEFI support for guests
More IP blocks, …
Support for more Hardware
42. Determine the cache usage of running VMs
Monitors the L3 cache (the LLC in most server platforms)
$ xl psr-cmt-attach vm-id
$ xl psr-cmt-show cache_occupancy
Identify noisy neighbor VMs and take corrective action
E.g. Migrate VM to a different host
E.g. CPU pinning, CPU pools, schedulers
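The corrective-action decision can be sketched as post-processing of per-VM occupancy numbers like those `xl psr-cmt-show` reports (a hypothetical helper; the VM names, values and threshold are made up):

```python
def noisy_neighbors(occupancy_kib, share_threshold=0.5):
    """Flag VMs holding more than share_threshold of the measured L3 occupancy.

    occupancy_kib: mapping of VM name -> L3 cache occupancy in KiB,
    e.g. collected by parsing `xl psr-cmt-show cache_occupancy` output.
    """
    total = sum(occupancy_kib.values())
    if total == 0:
        return []
    return sorted(vm for vm, kib in occupancy_kib.items()
                  if kib / total > share_threshold)

# vm1 holds 70% of the measured occupancy -> candidate for migration or pinning
print(noisy_neighbors({"vm1": 7168, "vm2": 1024, "vm3": 2048}))  # → ['vm1']
```

Once a noisy VM is identified, the mitigations above (live migration, pinning, CPU pools) apply.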
What’s Next?
Intel Cache Allocation Technology
Longer term: schedulers can use HW utilization information
44. Release Manager: Wei Liu
Proposal: Tweaked Release Process for Xen 4.6
lists.xenproject.org/archives/html/xen-devel/2015-02/msg01214.html
Development start: 6 Jan 2015
Feature freeze: 10 Jul 2015
Release date: 9 Oct 2015 (could release earlier)
45. [Timeline: master branch on xen.git — Feature Development → Feature Freeze point → wait period to clear test pushgate → RCs → Release Announcement; the release is cut on the RELEASE-4.5.0 branch of xen.git]
46. [Timeline: master branch on xen.git — Feature Development → RCs]
Feature Development: this is when patches for the ongoing release need to be submitted for review
Wait period to clear test pushgate: no new features will be accepted, unless there is a Freeze Exception; bug fixes are allowed, with approval by Maintainers/Release Manager
RCs: the Release Manager declares that only bug fixes deemed blockers can be accepted
47. Release Manager: sends the first "Xen x.y Development Update" email on xen-devel@ (deferred features from the previous release, timetable, etc.)
Release Manager: sends monthly "Xen x.y Development Update" emails on xen-devel@
Release Manager: RC announcements, Test Days
Release Manager: RC announcement
Contributors: expected to reply if they are working on a feature that is not on the list of tracked features; expected to provide status updates on features & bugs on the list; not engaging with the process may lead to removal or downgrading
Contributors: expected to reply if they are working on a feature that is not on the list of tracked features and tracked bugs; same as above, and can also ask for Freeze Exceptions
Contributors: expected to provide status updates on tracked bugs on the list
49. Embedded & Automotive
Sound, graphics, and other drivers for Linux and other OSes
Lots of other enablers: e.g. security features
Certification
VMware Tools support
Run VMware images unmodified in Xen
More: First 4.6 Development Update
lists.xenproject.org/archives/html/xen-devel/2015-02/msg01816.html
50. Mirage OS
Safer and cleaner TLS stack: openmirage.org/blog/announcing-bitcoin-pinata
Irmin: Git-like distributed, branchable storage
Jitsu: a DNS server that spawns unikernels in response to DNS requests
IPv6, Tooling, etc.
[Diagram: unikernel VM (application + language run-time); Cubieboard2 serving the 2048 game @ FOSDEM'15]
50 Minutes! ACTUAL TALK TIME
TIMING: 35 MINUTES
Unit tests, Tempest
= 18 MINS =
Ties back to the previous use-case
Notes: Seen up to 6 VMs with graphics at good performance
TODO: a few notes to zoom stuff forward (playing time)
= 22 MINS =
Virt spectrum
PVH Dom0 : Why relevant?
E.g. on EC2, when you choose HVM for Linux, you actually get PVHVM – while you get HVM + PV drivers for Windows
PVH Dom0 : Why relevant?
2nd part:
On x86-32 ISA, Xen ran PV guest kernels in ring 1 to protect the hypervisor from the guests.
x86-64 ISA removed rings 1 and 2 (leaving ring 0 for kernels and ring 3 for userspace) and eliminated the segmentation mechanism.
This means that on x86-64, the guest kernel and userspace both run in ring 3, requiring a complete TLB flush for transitions between them.
This is very expensive and is part of the reason why HVM can outperform PV on x86-64 hardware for some workloads
HPET=High Precision Event Timer
Soft affinity = lets the sysadmin define an arbitrary set of physical CPUs on which vCPUs prefer to run
= 27 MINS =
VMI – HW assisted
Security Process Changes
Next (Table)
32 vulnerabilities in 2014: some require several conditions to hold at the same time; some affect code and configurations that are not used in a standard hosting-provider config
Of course the same applies to LXC and KVM
Assumptions
Intel x86 CPU
general purpose operating system as the guest
attacker has already gained control of the guest
vulnerabilities which, for example, a cloud hosting provider would worry about (containers and KVM both make the case that they are secure in such environments)
Dedicated: other features, such as PVH, etc. also have a security dimension
TPM=standard for a secure cryptoprocessor
There has been a lot of security functionality in Xen for a long time, BUT it is primarily used in a very narrow market segment.
Reasons:
Security wasn't such a hot topic until recently.
Some of these features in Xen were not well documented or enabled by default …
---
Early signs are that this is changing: and you can see some of this if you look at the work which is currently performed by various stake-holders in the community
LibVMI: targets Xen, but support for other hypervisors exists (Xen is by far the best supported).
LibVMI: KVM support is rather “rough”. KVM doesn't have the same type of APIs as Xen does to map VM memory into another process, or to forward events. Although some work is going on to improve the situation.
Two features that the Bitdefender developers added:
1) The capability to inject page faults so that the guest OS brings paged-out memory back in.
2) mem_access-emulate(-with-no-write), which is very handy when you are tracing VM execution with EPT permissions (mem_access): you don't need to reset the page permissions every time a trap is hit to let the VM progress. If the -with-no-write flag is enabled, the emulation will not touch the guest VM memory – a good way to get past the execution of shellcode safely.
Players: BitDefender, TU Munich, Zentific, Intel (McAfee) as well as the HW group, Cisco
Shell in Dom0
Running DRAKVUF Dynamic Malware Analysis System (sits on top of LibVMI)
Other issues:
Complex deployment / maintenance, Visibility, etc.
Duplication of resources
Etc.
Several = for multi-tenancy, one per customer
Advantages:
Easier deployment / maintenance, etc. – e.g. centrally managed
Better visibility / performance
Avoids Duplication of resources – e.g. anti-virus storm
Hardware support coming: Intel #VE
Etc.
Notes:
Introspection engine is NOT running in Dom0
Use XSM/Flask to tightly control what the security appliance can do
Motivator for block replication: performance and community alignment
TODO: Split in two and add a picture?
= 39 =
16GB => 1TB
PCI: In the ARM world, it is quite common to have no PCIe devices and to only access devices using MMIO regions.
LLC=last level cache
NOISY neighbor:
The noisy-neighbor situation involves two processes, A & B: process A is "noisy" in that it runs an algorithm that dirties many entries in the cache, evicting cache entries for process B and thereby slowing process B down. CMT, today, allows you to track which processes are using how much cache and identify the noisy ones (the process A's that consume too much cache).
Consider a VM as equivalent to a process: VM A can be running processes that consume (evict) many cache entries from VM B and, therefore, slow down the performance of VM B. In this case the noisy neighbor is a VM, and you can now consider mitigation actions like live-migrating VM A to a different host (or at least be able to explain why VM B is running slower than expected).
= 44 =
Show example of the first 4.6 mail …
Other examples: OSv, HalVM, ErlangOnXen/Ling, Rump Kernels
Jitsu: DNS server that spawns unikernels in response to DNS requests and boots them in real-time with no perceptible lag to the end user.
Goal = enable a community cloud of ARM-based Cubieboard2 boards that serve user content without requiring centralised data centers, but with the ease-of-use of existing systems.