Appendix A: Command-Line And Inspection Tools
For every tool here the concern is the same: which flags matter for microVM
work, what the tool actually measures, and where the numbers come from in the
kernel or the process. If you have ever run a KVM_RUN loop and wondered why
kvm_stat was showing more exits than you expected, or typed lscpu on a guest
and wondered why it reported a different hypervisor than you knew was there, read
those entries first.
firecracker
When you type firecracker on the command line you are starting the microVM
monitor. The process runs three thread types: an API thread that runs an
HTTP/1.1 server over the Unix-domain socket; a VMM thread that handles device
emulation and the microVM metadata service (MMDS); and one vCPU thread per
vcpu_count, each executing KVM_RUN in a tight loop. Three threads is therefore
the minimum, reached only by a 1-vCPU VM; a VM with N vCPUs has N+2 threads.
Seccomp filters are applied per-thread before the first guest instruction
executes — a filter on the vCPU thread allows KVM_RUN but not the broader set
of syscalls the API thread requires.
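The thread arithmetic can be checked against a live process. The following is a minimal sketch, assuming a Linux host with /proc mounted; expected_threads and list_threads are hypothetical helper names, and the thread names printed depend on the Firecracker build.

```sh
# Expected thread count for a microVM with N vCPUs:
# one API thread + one VMM thread + N vCPU threads.
expected_threads() {
    echo $(( $1 + 2 ))
}

# Print "<tid> <name>" for every thread of the process whose PID is $1.
list_threads() {
    for task in /proc/"$1"/task/*; do
        [ -r "$task/comm" ] || continue
        printf '%s %s\n' "${task##*/}" "$(cat "$task/comm")"
    done
}

# Usage (PID lookup is up to you):
#   list_threads "$(pidof firecracker)"
#   expected_threads 2
```

Comparing the number of lines from list_threads with expected_threads for the configured vcpu_count is a quick sanity check before attaching a tracer to a specific thread.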
On a production host the jailer wraps firecracker and constructs the chroot,
cgroups, and privilege drop before execing into the binary. Running
firecracker directly, without the jailer, is appropriate for lab sessions and
is the pattern used throughout this book's examples.
The flag that matters most in day-to-day use is --api-sock. The getting-started
guide uses /tmp/firecracker.socket as a conventional override; the binary
defaults to /run/firecracker.socket. Every subsequent curl --unix-socket
call targets the path you give here, so agree on it before the first API call.
--config-file accepts a JSON file containing the complete microVM
configuration — boot source, drives, network interfaces, machine config — and
causes Firecracker to configure itself before exposing the socket. Combined with
--no-api, which disables the HTTP server entirely, these two flags form the
"batch mode" path used in production when a higher-level orchestrator owns the
VM lifecycle. The two flags are interdependent: --no-api without
--config-file will not start.
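As an illustration of the batch-mode path, the sketch below writes a small configuration file and shows the matching invocation in a comment. The file name, the rootfs drive_id, and the kernel and rootfs paths are placeholders chosen for this example; write_fc_config is a hypothetical helper.

```sh
# Write a minimal Firecracker JSON configuration to the path given as $1.
# Paths inside the JSON are placeholders for this sketch.
write_fc_config() {
    cat > "$1" <<'EOF'
{
  "boot-source": {
    "kernel_image_path": "./vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off nomodules"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "./rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 512
  }
}
EOF
}

# Usage (the firecracker process needs /dev/kvm access):
#   write_fc_config vm_config.json
#   firecracker --no-api --config-file vm_config.json
```

The top-level keys mirror the REST endpoint names, so a configuration that works through the API can usually be transcribed into the file form section by section.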
--id sets the microVM identifier. It appears in every log line and defaults to
anonymous-instance, which is acceptable in a lab but collides if you run
multiple VMs in parallel on the same host.
Logging is split across two flags: --log-path names a FIFO or file, and
--level selects the verbosity. Valid values for --level are Error,
Warning, Info, Debug, and Trace. --show-level and --show-log-origin
prepend the level and the Rust source location (module + line number) to each
log line. For debugging device emulation failures, --show-log-origin is often
the fastest way to find the responsible code path without reading the full source.
The --enable-pci flag switches the virtio transport from the default
virtio-mmio to virtio-pci. This was added in v1.13.0 and is not needed for the
standard microVM device set (virtio-net, virtio-block, vsock), but matters when
a guest driver only speaks PCI.
Two flags appear in process trees managed by the jailer — --start-time-us and
--start-time-cpu-us — and should not be set by hand. The jailer injects them
to carry wall-clock and CPU-clock timestamps from before the exec, which
Firecracker uses to measure the true end-to-end boot latency.
The REST API
Communication with a running Firecracker process is plain HTTP/1.1 over the
Unix-domain socket, addressed with curl --unix-socket <path>. Pre-boot
configuration endpoints use PUT and return HTTP 204 on success. The two
required calls before InstanceStart are /boot-source and /machine-config:
# The firecracker process must have been started with /dev/kvm access.
# The curl calls themselves require no privilege.
curl --unix-socket /tmp/firecracker.socket -i \
-X PUT "http://localhost/boot-source" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"kernel_image_path": "./vmlinux",
"boot_args": "console=ttyS0 reboot=k panic=1 pci=off nomodules"
}'
curl --unix-socket /tmp/firecracker.socket -i \
-X PUT "http://localhost/machine-config" \
-H "Content-Type: application/json" \
-d '{"vcpu_count": 2, "mem_size_mib": 512}'
On aarch64, prepend keep_bootcon to boot_args. The boot arguments shown
(console=ttyS0 reboot=k panic=1 pci=off nomodules) come from the
getting-started guide and reflect the minimal kernel command line for the default
virtio-mmio microVM configuration: a serial console on ttyS0, reboot through the
keyboard controller (reboot=k), reboot one second after a panic rather than
hanging (panic=1), no PCI bus probing, and no module loading.
The full set of pre-boot configuration endpoints:
| Endpoint | Required fields |
|---|---|
| PUT /boot-source | kernel_image_path; optional boot_args, initrd_path |
| PUT /drives/{drive_id} | drive_id, is_root_device |
| PUT /machine-config | vcpu_count, mem_size_mib |
| PUT /network-interfaces/{iface_id} | iface_id, host_dev_name |
| PUT /logger | log_path; optional level, show_level, show_log_origin |
| PUT /metrics | metrics_path |
| PUT /mmds/config | MMDS configuration; imds_compat field added in v1.13.0 |
| PUT /actions | action_type: InstanceStart, FlushMetrics, or SendCtrlAltDel |
Post-boot, the most operationally useful endpoint is PATCH /vm with body
{"state": "Paused"} or {"state": "Resumed"}, which suspends and resumes the
VM without teardown. Snapshots go through PUT /snapshot/create (requiring
snapshot_type, snapshot_path, and mem_file_path) and PUT /snapshot/load
(which accepts backend_type: "File" or "Uffd" in the mem_backend object
for userfaultfd-backed restoration). GET /vm/config exports the complete
current configuration as JSON.
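The pause, snapshot, resume cycle described above can be sketched as three requests. The helper names, the FC_SOCK variable, and the snapshot file paths are assumptions made for this example, not part of Firecracker; only the endpoints and field names come from the text above.

```sh
# Build the JSON body for PATCH /vm; $1 is Paused or Resumed.
vm_state_body() {
    printf '{"state": "%s"}\n' "$1"
}

# Build the JSON body for PUT /snapshot/create.
# $1 = snapshot_type, $2 = snapshot_path, $3 = mem_file_path.
snapshot_create_body() {
    printf '{"snapshot_type": "%s", "snapshot_path": "%s", "mem_file_path": "%s"}\n' \
        "$1" "$2" "$3"
}

# Send a request over the API socket; $1 = method, $2 = path, $3 = JSON body.
# FC_SOCK is an assumed shell variable, not something Firecracker reads.
fc_request() {
    curl --unix-socket "${FC_SOCK:-/tmp/firecracker.socket}" -i \
        -X "$1" "http://localhost$2" \
        -H "Content-Type: application/json" \
        -d "$3"
}

# Usage against a running microVM:
#   fc_request PATCH /vm "$(vm_state_body Paused)"
#   fc_request PUT /snapshot/create "$(snapshot_create_body Full ./snap.bin ./mem.bin)"
#   fc_request PATCH /vm "$(vm_state_body Resumed)"
```

Keeping the body builders separate from the transport makes it easy to print and inspect a request before sending it.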
Performance Guarantees
Firecracker's SPECIFICATION.md commits to two numbers that appear throughout
this book. Boot time from InstanceStart to guest /sbin/init is at most
125 ms, measured on M5D.metal and M6G.metal instances with serial console
disabled, a minimal kernel, and a minimal root filesystem. Memory overhead per
microVM (1 vCPU, 128 MiB RAM) is at most 5 MiB. Firecracker was open-sourced
in November 2018.
Supported Kernel Versions
Firecracker's docs/kernel-policy.md guarantees at least two supported host
kernels and two supported guest kernels simultaneously. The current table:
| Kernel | Role | Min Firecracker version | Support end |
|---|---|---|---|
| 5.10 | Host + Guest | v1.0.0 | 2024-01-31 |
| 6.1 | Host | v1.5.0 | 2025-10-12 |
| 6.1 | Guest | v1.9.0 | 2026-09-02 |
| 6.18 | Host | v1.16.0 | 2028-05-28 |
Versions not in this table may work but are not validated in CI.
jailer
The jailer ships alongside firecracker and is the required wrapper for
production deployments. Its job is to construct an isolation envelope around the
VMM process before any guest code executes: a chroot that hides the host
filesystem, cgroups that bound CPU and memory, a uid/gid drop that removes root
privilege, and optional network and PID namespace isolation. It then execs into
the firecracker binary. By the time the first KVM_RUN executes, the VMM
process has no path back to the host filesystem and no privilege it did not
enter the chroot with.
Safety note. The jailer modifies the cgroup hierarchy, creates device nodes inside the chroot, and drops privileges with setuid/setgid. Run it as root. The --uid and --gid targets must be a non-root account; using uid 0 defeats the purpose.
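Four flags are required on every invocation: --id, --exec-file, --uid, and --gid. The example below also sets the optional --cgroup, --netns, and --daemonize flags: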
/usr/bin/jailer \
--id 551e7604-e35c-42b3-b825-416853441234 \
--exec-file /usr/bin/firecracker \
--uid 123 --gid 100 \
--cgroup cpuset.cpus=0-3 \
--cgroup cpuset.mems=0 \
--netns /var/run/netns/my_netns \
--daemonize
--id must be alphanumeric plus hyphens, maximum 64 characters. It becomes the
cgroup name and the subdirectory under --chroot-base-dir. The --exec-file
path must point to a statically linked Firecracker binary; the musl toolchain
produces the expected artifact. --uid and --gid are numeric and become the
drop targets.
The chroot lands at <chroot-base-dir>/<exec_file_name>/<id>/root. The default
--chroot-base-dir is /srv/jailer. The jailer copies the binary into the
chroot, creates /dev/net/tun and /dev/kvm device nodes inside it, and
changes ownership of all resources to uid:gid before executing.
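The naming rules above are easy to encode. A minimal sketch, assuming only the constraints stated in this section (alphanumeric plus hyphens, at most 64 characters, chroot at base/exec_file_name/id/root); valid_jail_id and jail_root are hypothetical helper names.

```sh
# Succeed if $1 is an acceptable jailer --id:
# non-empty, only letters, digits, and hyphens, at most 64 characters.
valid_jail_id() {
    case "$1" in
        ''|*[!A-Za-z0-9-]*) return 1 ;;
    esac
    [ "${#1}" -le 64 ]
}

# Print the chroot path for base dir $1, exec file $2, and id $3.
jail_root() {
    printf '%s/%s/%s/root\n' "${1%/}" "$(basename "$2")" "$3"
}

# Usage:
#   jail_root /srv/jailer /usr/bin/firecracker my-vm-01
```

Computing the path up front is useful when a script has to stage the kernel and rootfs inside the chroot before the jailer runs.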
--cgroup is repeatable and writes a key=value pair into the appropriate cgroup
file. --cgroup-version selects between the v1 and v2 hierarchy (default v1).
--parent-cgroup overrides the parent cgroup path (default: the exec file name).
--netns takes the path to an existing network namespace handle (for example
/var/run/netns/my_netns) and causes the jailer to join that namespace before
exec. --new-pid-ns wraps
the Firecracker process in a new PID namespace via CLONE_NEWPID and writes
the child PID into a file named <exec_file_name>.pid in the chroot. These two
flags are independent and can be combined.
--daemonize calls setsid() and redirects stdin/stdout/stderr to /dev/null.
--resource-limit sets setrlimit(2) values. The default is no-file=2048;
fsize (maximum file size in bytes) is also supported. Pass additional
invocations of the flag to set multiple limits.
Any flags placed after -- on the jailer command line are appended verbatim to
the firecracker argv. This is how you pass --level Debug or a custom
--seccomp-filter without the jailer interpreting them.
The Fifteen-Step Execution Sequence
The jailer's execution sequence matters because it determines what is visible at each isolation boundary. In order:
1. Validate paths and VM ID.
2. Close all inherited file descriptors except stdin, stdout, and stderr.
3. Clear environment variables.
4. Create the chroot directory at <base>/<exec_file_name>/<id>/root.
5. Copy the Firecracker binary into the chroot.
6. Apply setrlimit(2) resource limits.
7. Create cgroup subdirectories and write parameter values.
8. Enter a new mount namespace; pivot_root(2) to the chroot.
9. Create /dev/net/tun and /dev/kvm device nodes inside the chroot.
10. Change ownership of all resources to uid:gid.
11. Join the network namespace if --netns was given.
12. Daemonize if --daemonize was given.
13. Clone into a new PID namespace if --new-pid-ns was given.
14. Drop privileges to uid:gid.
15. exec(2) into the Firecracker binary.
The sequence is not arbitrary. Device nodes are created at step 9, after
pivot_root has locked the process into the chroot but before privileges are
dropped at step 14, because creating device nodes requires elevated privilege and
must target the new root. The network namespace join happens at step 11, after
device nodes, so that tun is visible in the namespace the process actually runs
in.
firectl
firectl is a Go tool built on firecracker-go-sdk that wraps both
firecracker and, optionally, jailer. Its flag parsing uses
github.com/jessevdk/go-flags. It translates a single command line into the
sequence of REST API calls that configure and start a VM, making it useful for
quick lab sessions where you want to avoid writing the curl sequence by hand.
Production deployments typically drive the REST API directly or through a
dedicated orchestrator; firectl is an inspection and experimentation tool.
A minimal invocation:
firectl \
--kernel=./vmlinux \
--root-drive=./rootfs.ext4 \
--ncpus=2 --memory=512 \
--tap-device=tap0/AA:FC:00:00:00:01
The flags that determine the VM configuration map directly onto the REST API
fields above. --kernel corresponds to kernel_image_path in /boot-source,
defaulting to ./vmlinux. --kernel-opts sets boot_args; the default is
ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules. --root-drive
is required; the optional :ro or :rw suffix controls the drive's
is_read_only field. Additional drives use --add-drive, which is repeatable.
Networking uses --tap-device in the form <device>/<mac>; the flag is
repeatable for multiple interfaces. Vsock uses --vsock-device <path>:<cid>.
-c / --ncpus sets vcpu_count (default 1). -m / --memory sets
mem_size_mib (default 512). --cpu-template accepts C3 or T2, which are
x86-only (Intel) static CPU templates that Firecracker applies through the
/machine-config endpoint to produce a consistent CPUID view for snapshot
portability. Firecracker also accepts custom CPU templates through the REST
API's PUT /cpu-config endpoint.
Log handling: --vmm-log-fifo names a FIFO for VMM log output, --log-level
sets the level (default Debug), and -l / --firecracker-log redirects the
FIFO's contents to a file. --metrics-fifo names a FIFO for the Firecracker
metrics JSON.
--metadata accepts a JSON blob passed to MMDS, equivalent to configuring and
populating the metadata service via the REST API.
Jailer integration is opt-in: --jailer specifies the jailer binary path, and
--uid, --gid, --id, --node (NUMA node), --chroot-base-dir, and
--daemonize pass through to the jailer. Without --jailer, firectl invokes
firecracker directly.
qemu-system-x86_64 with -M microvm
The microvm machine type was introduced in QEMU 4.2.0, released 2019-12-13.
Its design statement in the QEMU documentation is precise: "microvm is a machine
type inspired by Firecracker and constructed after its machine model." Like
Firecracker, it has no PCI bus by default, no ACPI, and no device hotplug. It
does not support live migration across QEMU versions.
The book reaches for QEMU when comparing Firecracker design choices or running
inspection experiments that need a standard Linux virtio stack — for example,
reading info virtio-status through the QEMU monitor to observe what
Firecracker hides behind its device emulation layer.
Machine Type And Transport
The microvm machine type uses virtio-mmio exclusively. The source file
hw/i386/microvm.c configures a maximum of eight virtio-mmio transports by
default. An ISA bus is present; the legacy devices on it — the i8259 PIC, i8254
PIT, MC146818 RTC, and ISA serial port — are each conditionally instantiated
through machine properties. kvmclock and fw_cfg are supported. The default BIOS
is qboot, chosen for reduced boot time; SeaBIOS is also compatible. No
firmware currently supports booting from a virtio-mmio block device, so a guest
kernel image must always be supplied via -kernel.
Safety note. -enable-kvm requires access to /dev/kvm. Run as a user in the kvm group or as root. On hosts where the KVM module is not loaded, omit -enable-kvm, replace -cpu host (which needs KVM) with an emulated CPU model, and expect a ten- to hundred-fold increase in boot time due to software emulation.
Machine-specific options are passed as comma-separated key=value pairs after -M microvm:
| Option | Effect |
|---|---|
| x-option-roms=off | Disable option ROM loading |
| pit=off | Disable i8254 PIT |
| pic=off | Disable i8259 PIC |
| rtc=off | Disable MC146818 RTC |
| isa-serial=off | Disable ISA serial port |
| auto-kernel-cmdline=on | Auto-append virtio-mmio entries to kernel cmdline |
Canonical Invocations
The standard configuration uses the ISA serial port for the console:
# Requires /dev/kvm access.
qemu-system-x86_64 -M microvm \
-enable-kvm -cpu host -m 512m -smp 2 \
-kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
-nodefaults -no-user-config -nographic \
-serial stdio \
-drive id=test,file=test.img,format=raw,if=none \
-device virtio-blk-device,drive=test \
-netdev tap,id=tap0,script=no,downscript=no \
-device virtio-net-device,netdev=tap0
The minimal-footprint configuration disables all legacy ISA devices and uses a virtio-serial console instead:
# Requires /dev/kvm access.
qemu-system-x86_64 \
-M microvm,x-option-roms=off,pit=off,pic=off,isa-serial=off,rtc=off \
-enable-kvm -cpu host -m 512m -smp 2 \
-kernel vmlinux -append "console=hvc0 root=/dev/vda" \
-nodefaults -no-user-config -nographic \
-chardev stdio,id=virtiocon0 \
-device virtio-serial-device \
-device virtconsole,chardev=virtiocon0 \
-drive id=test,file=test.img,format=raw,if=none \
-device virtio-blk-device,drive=test \
-netdev tap,id=tap0,script=no,downscript=no \
-device virtio-net-device,netdev=tap0
-nodefaults suppresses the default set of devices QEMU would otherwise create
(VGA, sound, USB) — essential for a microVM profile. -no-user-config prevents
QEMU from reading per-user configuration files. -cpu host passes through the
host's CPU feature flags, enabling the guest to see the same virtualization
feature bits that lscpu and cpuid show on the host.
HMP: The Human Monitor Protocol
QEMU can multiplex the serial console and an interactive monitor over the same
terminal using the mux chardev form. The canonical invocation above uses
-serial stdio, which does not enable mux mode; to get Ctrl-a switching
between the monitor and console, use -serial mon:stdio instead. The escape
key sequences:
| Sequence | Effect |
|---|---|
| Ctrl-a c | Switch between monitor and console (requires -serial mon:stdio) |
| Ctrl-a h | Show help |
| Ctrl-a x | Kill the emulator |
| Ctrl-a s | Sync disk (snapshot mode) |
| Ctrl-a b | Send break / magic SysRq |
| Ctrl-a Ctrl-a | Send literal Ctrl-a to guest |
You can also attach the monitor to a separate channel: -monitor unix:/path,server,nowait
or -monitor telnet::4444,server,nowait.
The HMP info subcommands most useful for KVM and virtio inspection:
- info kvm: whether KVM acceleration is active
- info registers: the current vCPU register state
- info cpus: all vCPU states and thread IDs
- info virtio: all virtio devices
- info virtio-status <path>: status register for a specific device
- info virtio-queue-status <path> <queue>: queue pointers and flags
- info mtree: full memory map tree
- info qtree: full device tree
System control: quit, stop, cont, system_reset, system_powerdown.
QMP: The QEMU Machine Protocol
QMP uses a JSON wire format: UTF-8 in, ASCII out, objects terminated by CRLF.
Enable it with -qmp unix:/path/to/sock,server,nowait or
-qmp tcp:127.0.0.1:4444,server,nowait. On connect, QEMU sends a greeting:
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 0, "major": 3}, "package": "v3.0.0"}, "capabilities": ["oob"]}}
The client must issue {"execute": "qmp_capabilities"} before any other
command; omitting it causes every subsequent command to return CommandNotFound.
The oob capability in the greeting advertises out-of-band execution, which
bypasses the normal command queue once the client enables it in the
qmp_capabilities arguments. QMP is the interface used by libvirt and other
automation layers; for one-off inspection in a lab session, HMP is faster.
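The negotiation order is the part that most often trips up a first QMP session, so here is a minimal sketch of it. It assumes QEMU was started with -qmp unix:/tmp/qmp.sock,server,nowait and that socat is installed; both the socket path and the use of socat are choices made for this example, and qmp_script is a hypothetical helper that only handles commands without arguments.

```sh
# Emit a QMP session on stdout: capabilities negotiation first,
# then one {"execute": ...} object per command name in "$@".
qmp_script() {
    printf '%s\n' '{"execute": "qmp_capabilities"}'
    for cmd in "$@"; do
        printf '{"execute": "%s"}\n' "$cmd"
    done
}

# Usage: the short sleep keeps the connection open long enough to read replies.
#   { qmp_script query-status query-kvm; sleep 1; } \
#       | socat - UNIX-CONNECT:/tmp/qmp.sock
```

The replies arrive as one JSON object per line after the greeting, in the same order as the commands.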
kvm_stat
When a VM exit occurs, the hardware stops the guest and hands control to KVM.
KVM classifies the exit by reason — CPUID, HLT, IO_INSTRUCTION, EPT_VIOLATION,
and so on — and increments a per-reason counter. kvm_stat reads those counters
and displays a rolling table of exit counts, one row per exit reason, refreshed
at an adjustable interval. It is, as its name suggests, vmstat for VM exits.
The tool is a Python 3 script, shipped at tools/kvm/kvm_stat/kvm_stat in the
Linux kernel source tree. Distributions package it separately; on Ubuntu, the
package is linux-tools-host (present since at least kernel version
4.15.0-19.20). Do not look for it in the QEMU tree — it moved into the kernel
tree.
Safety note. Both of kvm_stat's data source modes require elevated privilege. Debugfs mode requires mounting debugfs and reading files under /sys/kernel/debug/kvm/, which requires root or CAP_SYS_ADMIN. Tracepoint mode calls perf_event_open(2) with PERF_TYPE_TRACEPOINT, which also requires elevated privilege on most kernel configurations.
Data Source Modes
kvm_stat has two modes for reading exit counters, selectable by flag:
Debugfs mode (-d / --debugfs) reads pseudo-files under
/sys/kernel/debug/kvm/. Each running VM gets a subdirectory named {pid}-{fd},
where pid is the VMM process ID and fd is the file descriptor number of the
KVM VM fd returned by KVM_CREATE_VM. Each counter (exits, io_exits, mmio_exits,
halt_exits, and so on) is a file in that directory; reading it returns a
cumulative count. Debugfs must be mounted:
# Requires root.
mount -t debugfs none /sys/kernel/debug
Tracepoint mode (-t / --tracepoints) reads from kernel tracepoints under
/sys/kernel/debug/tracing/events/kvm/ using perf_event_open(2) with
PERF_TYPE_TRACEPOINT and PERF_FORMAT_GROUP. This mode sees exits from all
VMs on the host simultaneously and does not require per-VM fd knowledge, but it
cannot filter to a single VM by fd — only by guest name (-g) or PID (-p).
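The debugfs layout can also be read without kvm_stat. The sketch below walks the {pid}-{fd} directories and prints the cumulative exits counter for each VM. It assumes an x86 host where a per-VM exits file is present; kvm_vm_exits is a hypothetical helper, and its optional argument exists only so the function can be pointed at a different directory.

```sh
# List KVM VMs from debugfs as "<pid> <fd> <exits>".
# $1 overrides the debugfs KVM directory (default /sys/kernel/debug/kvm).
kvm_vm_exits() {
    root="${1:-/sys/kernel/debug/kvm}"
    for dir in "$root"/*-*/; do
        [ -d "$dir" ] || continue
        name="${dir%/}"
        name="${name##*/}"
        pid="${name%%-*}"       # VMM process ID
        fd="${name#*-}"         # VM file descriptor number in that process
        exits="?"
        if [ -r "$dir/exits" ]; then
            exits="$(cat "$dir/exits")"
        fi
        printf '%s %s %s\n' "$pid" "$fd" "$exits"
    done
}

# Usage (requires root and a mounted debugfs):
#   kvm_vm_exits
```

Running it twice a few seconds apart and subtracting the counts gives a rough exit rate per VM.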
Key Flags
-1 / --once / --batch produces a single-pass non-interactive output,
suitable for piping into scripts. -f <regex> / --fields filters the displayed
counters by regex — useful when you want to watch only EXIT_REASON_EPT_VIOLATION
rows without the noise of CPUID and HLT. -p <pid> restricts to one VM by
PID. -s / --set-delay adjusts the refresh interval (0.1–25.5 seconds).
-z / --skip-zero-records suppresses all-zero rows in log mode.
Exit Reason Dictionaries
kvm_stat ships exit reason tables for VMX (Intel), SVM (AMD), and AArch64
inside the Python source. Selected entries from each:
VMX (Intel) — from VMX_EXIT_REASONS in the script:
| Code | Name |
|---|---|
| 0 | EXCEPTION_NMI |
| 1 | EXTERNAL_INTERRUPT |
| 2 | TRIPLE_FAULT |
| 10 | CPUID |
| 12 | HLT |
| 75 | NOTIFY |
The NOTIFY exit (code 75) is the newest entry and signals a notify VM exit —
a mechanism added to handle the case where the guest has been non-cooperative
for too long without causing a conventional exit.
SVM (AMD) — from SVM_EXIT_REASONS in the script:
| Code (hex) | Name |
|---|---|
| 0x000 | READ_CR0 |
| 0x010 | WRITE_CR0 |
| 0x400 | NPF (Nested Page Fault) |
| 0x401 | AVIC_INCOMPLETE_IPI |
| 0x402 | AVIC_UNACCELERATED_ACCESS |
| 0x403 | VMGEXIT |
The NPF exit (0x400) is the AMD SVM equivalent of Intel's EPT_VIOLATION.
VMGEXIT (0x403) appears when SEV-ES guests execute the VMGEXIT instruction
to communicate with the hypervisor — an exit reason you will not see on a
non-confidential microVM workload.
perf kvm
perf kvm is a perf subcommand for KVM-specific profiling. Where kvm_stat
shows aggregate exit counts, perf kvm stat shows exit latency: how long each
exit reason keeps the vCPU off the guest, broken down by min, max, and mean
across all samples.
The subcommand has six sub-subcommands: top, record, report, diff,
buildid-list, and stat. The stat variant further divides into
stat record, stat report, and stat live. In practice there are two workflows:
record to a file and then report from it, or use stat live for an
immediate rolling view.
Safety note. perf kvm calls perf_event_open(2) and reads from kernel tracepoints. It requires root or CAP_PERFMON (Linux 5.8+) / CAP_SYS_ADMIN on older kernels. The --guest flag additionally requires debugfs access for guest symbol resolution.
perf kvm stat
On x86, the --event= flag selects the exit type to analyze. The supported
values are vmexit (the default, supported on all architectures), mmio (x86
only), and ioport (x86 only). For a microVM workload on an x86 host, vmexit
is the starting point; ioport narrows to the I/O port exits that Firecracker
sees during early boot.
# Requires root or CAP_PERFMON.
perf kvm stat record -a -- sleep 30
perf kvm stat report --event=vmexit
The report output columns are sample, percent_sample, time,
percent_time, max_t, min_t, and mean_t. The --key flag sorts by any
of those column names (sample is the default). --vcpu=<n> restricts analysis
to a specific vCPU index. --duration=<value> is a stat live option: it shows
only exits that took longer than the given threshold in microseconds, which is
useful for isolating pathological exits from the background noise of CPUID and HLT.
Live mode skips the file:
# Requires root or CAP_PERFMON.
perf kvm stat live --event=vmexit
Live mode supports only two sort keys: sample (default) and time.
Recording With Guest Symbols
When a VM exit lands in a guest kernel function, perf kvm can resolve the
exit's originating address to a symbol name if you supply the guest's kallsyms
and module table. Copy /proc/kallsyms and /proc/modules out of the guest, then
record on the host with:
# Requires root on the host; guest files must be available on the host.
perf kvm --host --guest \
--guestkallsyms=/tmp/guest.kallsyms \
--guestmodules=/tmp/guest.modules \
record -a
Terminate the recording with SIGINT only — other signals corrupt the data file.
Underlying Tracepoints
perf kvm stat consumes kernel tracepoints in the kvm: subsystem, visible
under /sys/kernel/debug/tracing/events/kvm/. The set includes
kvm:kvm_entry, kvm:kvm_exit, kvm:kvm_hypercall, kvm:kvm_pio,
kvm:kvm_cpuid, kvm:kvm_apic, kvm:kvm_inj_virq, kvm:kvm_page_fault,
kvm:kvm_msr, kvm:kvm_cr, and kvm:kvm_mmio. These are the same tracepoints
kvm_stat -t reads. To count all of them system-wide for a fixed interval
without filtering:
# Requires root or CAP_PERFMON.
sudo perf stat -e 'kvm:*' -a sleep 1h
cpuid
The cpuid instruction is how software asks the CPU what it is and what it can
do. A guest running under KVM gets a virtualized view: KVM intercepts cpuid
exits (exit reason 10 in the VMX table above) and synthesizes responses according
to what Firecracker or QEMU has configured. The cpuid userspace utility
(available as the cpuid package on most distributions) issues the instruction
directly from ring 3 and prints the decoded result (or the raw register values with -r). It is useful for two
things in microVM work: confirming that hardware virtualization extensions are
present on the host, and reading the KVM hypervisor vendor leaf to confirm what
the guest sees.
The most useful invocation:
cpuid -l 0x40000000
This reads leaf 0x40000000, the first leaf of the hypervisor vendor range. Physical CPUs do not define this range, so check CPUID.01H:ECX[31] before trusting the result; all major hypervisors respond here with their vendor string.
Feature Bits On The Host
| Leaf | Register | Bit | Meaning |
|---|---|---|---|
| 0x1 | ECX | 5 | CPUID.01H:ECX[5]: Intel VMX (VT-x) supported |
| 0x1 | ECX | 31 | Hypervisor present; physical CPUs always return 0; all major hypervisors set it to 1 |
| 0x80000001 | ECX | 2 | CPUID.80000001H:ECX[2]: AMD SVM; edk2 constant name SVM |
The KVM Hypervisor CPUID Leaves
The range 0x40000000–0x4FFFFFFF is reserved for hypervisors by convention. KVM occupies two leaves:
Leaf 0x40000000 — KVM_CPUID_SIGNATURE: EAX returns the maximum hypervisor
leaf (typically 0x40000001; old hosts may return 0x0, which should be treated
as 0x40000001). EBX, ECX, and EDX together encode the 12-character ASCII
vendor string. For KVM:
| Register | Value | Meaning |
|---|---|---|
| EBX | 0x4b4d564b | Bytes 0–3: KVMK |
| ECX | 0x564b4d56 | Bytes 4–7: VMKV |
| EDX | 0x4d | Bytes 8–11: M\0\0\0 |
Concatenated: "KVMKVMKVM\0\0\0". For comparison, Hyper-V returns
"Microsoft Hv", VMware returns "VMwareVMware", and Xen returns
"XenVMMXenVMM". If you run cpuid -l 0x40000000 inside a Firecracker guest
and see this string, KVM is confirmed as the underlying hypervisor regardless of
what Firecracker presents as the paravirt interface.
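The register-to-string mapping is plain little-endian byte order, which the following sketch makes explicit. cpuid_vendor is a hypothetical helper that takes the EBX, ECX, and EDX values as hexadecimal arguments (for example copied from cpuid -r output) and skips NUL padding bytes.

```sh
# Decode a hypervisor vendor string from EBX, ECX, EDX (leaf 0x40000000).
# Each register holds four ASCII bytes, least significant byte first.
cpuid_vendor() {
    out=""
    for reg in "$@"; do
        val=$(($reg))
        i=0
        while [ "$i" -lt 4 ]; do
            byte=$(( (val >> (8 * i)) & 0xff ))
            if [ "$byte" -ne 0 ]; then
                # Convert the byte to a character through an octal escape.
                out="$out$(printf "\\$(printf '%03o' "$byte")")"
            fi
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}

# Usage:
#   cpuid_vendor 0x4b4d564b 0x564b4d56 0x4d
```

With the KVM register values from the table, the function prints KVMKVMKVM.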
Leaf 0x40000001 — KVM_CPUID_FEATURES: EAX bits advertise KVM paravirt
features. The ones that appear in microVM work:
| Bit | Name | Meaning |
|---|---|---|
| 0 | KVM_FEATURE_CLOCKSOURCE | kvmclock via MSRs 0x11, 0x12 |
| 3 | KVM_FEATURE_CLOCKSOURCE2 | kvmclock via MSRs 0x4b564d00, 0x4b564d01 |
| 5 | KVM_FEATURE_STEAL_TIME | steal time accounting via MSR 0x4b564d03 |
| 24 | KVM_FEATURE_CLOCKSOURCE_STABLE_BIT | clocksource is stable across CPUs |
EDX bit 0 is KVM_HINTS_REALTIME, which signals that vCPUs will not be
preempted indefinitely — the hint a paravirt scheduler would act on to avoid
unnecessary yield calls.
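Reading the feature leaf by eye gets tedious, so a small decoder helps. This sketch covers only the four EAX bits listed above; kvm_features is a hypothetical helper and the input is whatever EAX value you read for leaf 0x40000001.

```sh
# Print the names of the KVM paravirt feature bits set in $1
# (EAX of CPUID leaf 0x40000001). Only four bits are decoded here.
kvm_features() {
    eax=$(($1))
    for entry in \
        0:KVM_FEATURE_CLOCKSOURCE \
        3:KVM_FEATURE_CLOCKSOURCE2 \
        5:KVM_FEATURE_STEAL_TIME \
        24:KVM_FEATURE_CLOCKSOURCE_STABLE_BIT
    do
        bit="${entry%%:*}"
        if [ $(( (eax >> bit) & 1 )) -eq 1 ]; then
            printf '%s\n' "${entry#*:}"
        fi
    done
}

# Usage:
#   kvm_features 0x01000029
```

Bits outside the table are ignored rather than reported, so an empty result means only that none of these four features is advertised.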
lscpu
lscpu is part of util-linux. It reads /proc/cpuinfo — not the CPUID
instruction directly — to populate most of its output. Two fields are relevant
for microVM work.
The Virtualization field shows VT-x or AMD-V. The implementation in
sys-utils/lscpu-virt.c, in the function lscpu_read_virtualization(), scans
the flags: line in /proc/cpuinfo for the substrings " vmx " and " svm ".
Finding vmx prints VT-x; finding svm prints AMD-V. This means lscpu
is reporting what the kernel advertises in /proc/cpuinfo, not what the CPUID
instruction returned at boot — a distinction that matters inside a guest, where
the kernel may have been given a modified CPUID view.
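The same substring test is easy to reproduce in a script. The sketch below mirrors the behavior described above rather than the util-linux source; virt_from_cpuinfo is a hypothetical helper, and its optional argument lets it read a saved copy of a guest's /proc/cpuinfo.

```sh
# Report VT-x, AMD-V, or none from the first flags line of a cpuinfo file.
# $1 overrides the file path (default /proc/cpuinfo).
virt_from_cpuinfo() {
    flags="$(awk -F: '/^flags/ { print $2; exit }' "${1:-/proc/cpuinfo}")"
    # Pad with spaces so that only whole flag names match.
    case " $flags " in
        *" vmx "*) echo "VT-x" ;;
        *" svm "*) echo "AMD-V" ;;
        *) echo "none" ;;
    esac
}

# Usage:
#   virt_from_cpuinfo
#   virt_from_cpuinfo ./guest-cpuinfo.txt
```

Running it on the host and on a copy taken from inside a guest shows directly whether the VMM has masked the virtualization flag.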
The Hypervisor vendor field is populated by read_hypervisor_cpuid() in the
same file. That function queries CPUID leaf 0x40000000 and matches the 12-byte
vendor string against the table described above: "KVMKVMKVM" maps to
VIRT_VENDOR_KVM, "XenVMMXenVMM" to VIRT_VENDOR_XEN, "Microsoft Hv" to
VIRT_VENDOR_MSHV, and "VMwareVMware" to VIRT_VENDOR_VMWARE. DMI tables
are checked via read_hypervisor_dmi() as a fallback. Running lscpu inside a
Firecracker guest will therefore show KVM as the hypervisor vendor, which is
accurate — Firecracker is a VMM on top of KVM, not a standalone hypervisor.
lsmod And KVM Module Parameters
Three kernel modules implement KVM on x86: kvm (the common base, always
required), kvm_intel (Intel VMX), and kvm_amd (AMD SVM). On a bare-metal
host, exactly one of kvm_intel or kvm_amd is loaded alongside kvm. Inside a
VM, both kvm and the vendor module may be absent if the guest kernel was not
compiled with KVM support, or both may be present if nested virtualization is enabled.
lsmod | grep kvm
On an Intel host running Firecracker, the expected output is two lines:
kvm_intel, and kvm with kvm_intel listed in its Used by column. The Used by
count on the kvm_intel line is the module reference count, which rises while VMM
processes hold KVM file descriptors open, so it tracks running VMs only loosely.
Module Parameters
Module parameters are exposed at /sys/module/<module>/parameters/ and can be
enumerated with modinfo -p kvm_intel or modinfo -p kvm_amd.
Safety note. Changing module parameters at runtime or via /etc/modprobe.d/ affects all VMs on the host. On a production host, treat these as host-wide configuration, not per-VM settings.
The parameters that affect microVM behavior:
/sys/module/kvm_intel/parameters/:
| Parameter | Description |
|---|---|
| nested | Enable nested VMX. Set to Y. |
| ept | Extended Page Tables (Intel SLAT); enabled by default. |
| enable_apicv | APIC virtualization acceleration. |
| enable_shadow_vmcs | Shadow VMCS support. |
/sys/module/kvm_amd/parameters/:
| Parameter | Description |
|---|---|
| nested | Enable nested SVM. Set to 1. |
| npt | Nested Page Tables (AMD SLAT); default 1 (enabled) on 64-bit and 32-bit PAE. |
To persist a change across reboots, write to /etc/modprobe.d/:
# Intel nested VMX (use a separate file per module to avoid overwriting):
echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-intel.conf
# AMD nested SVM:
echo "options kvm_amd nested=1" > /etc/modprobe.d/kvm-amd.conf
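The nested parameter is read-only once the module is loaded (the file permissions
under /sys/module/<module>/parameters/ show which parameters accept runtime
writes), so apply a change by reloading the module while no VMs are running:
# Requires root. modprobe -r fails while any VM is still using the module.
modprobe -r kvm_intel
modprobe kvm_intel nested=1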
Confirm nested VMX is active after loading: cat /sys/module/kvm_intel/parameters/nested
should print Y. The AMD equivalent path is /sys/module/kvm_amd/parameters/nested
and prints 1 when enabled.
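Because the two vendor modules report the setting differently (Y versus 1), a script that checks for nested support has to accept both. A minimal sketch follows; nested_state is a hypothetical helper, and its second argument exists only so the function can be pointed at a different directory tree.

```sh
# Print "enabled", "disabled", or "unknown" for a module's nested parameter.
# $1 = module name (kvm_intel or kvm_amd); $2 overrides the /sys/module root.
nested_state() {
    file="${2:-/sys/module}/$1/parameters/nested"
    if [ ! -r "$file" ]; then
        # Module not loaded, or the parameter does not exist.
        echo "unknown"
        return 0
    fi
    case "$(cat "$file")" in
        Y|y|1) echo "enabled" ;;
        *) echo "disabled" ;;
    esac
}

# Usage:
#   nested_state kvm_intel
#   nested_state kvm_amd
```

An "unknown" result for both modules usually means KVM is not loaded at all, which is worth checking before looking at nested settings.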
/proc And /sys Files That Expose KVM State
These pseudo-files are the lowest-level inspection surface. No additional tooling is required; all of them are readable (or writable, with appropriate privilege) from a shell.
| Path | What it shows |
|---|---|
| /proc/cpuinfo flag vmx | Intel VT-x capable CPU |
| /proc/cpuinfo flag svm | AMD-V capable CPU |
| /proc/cpuinfo flag hypervisor | Running inside a hypervisor; corresponds to CPUID.01H:ECX[31] |
| /dev/kvm | Character device opened by the VMM to create VMs; presence confirms the KVM module is loaded |
| /sys/kernel/debug/kvm/ | Per-VM debugfs directory; each VM appears as {pid}-{fd} |
| /sys/kernel/debug/tracing/events/kvm/ | KVM tracepoint event definitions; backing store for kvm_stat -t and perf kvm |
| /sys/module/kvm/parameters/ | Core KVM module parameters |
| /sys/module/kvm_intel/parameters/ | Intel VMX module parameters |
| /sys/module/kvm_amd/parameters/ | AMD SVM module parameters |
The hypervisor flag in /proc/cpuinfo corresponds directly to
CPUID.01H:ECX[31]. Physical CPUs always return 0 at that bit. Every major
hypervisor — KVM, Hyper-V, VMware, Xen — sets it to 1 so that guest software
can detect the environment without knowing the specific vendor. A guest kernel
uses this bit to decide whether to activate paravirt clock sources, balloon
drivers, and other host-cooperative features.
/dev/kvm is a character device, typically mode 0660 and owned root:kvm (the
exact mode and group come from the distribution's udev rules). It is a misc
device: major 10, minor 232, listed as kvm in /proc/misc. Any process in
the kvm group can open it. The file's presence confirms the KVM module is
loaded; its absence means either the module was not loaded or the CPU lacks
usable hardware virtualization support (dmesg | grep kvm will distinguish the two).
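Before starting a VMM from a script, it helps to separate the "missing" case from the "no permission" case. The sketch below does that with ordinary file tests; kvm_dev_status is a hypothetical helper and the status strings are arbitrary labels chosen for this example.

```sh
# Classify a KVM device node; $1 overrides the path (default /dev/kvm).
kvm_dev_status() {
    dev="${1:-/dev/kvm}"
    if [ ! -e "$dev" ]; then
        echo "missing"              # module not loaded or no hardware support
    elif [ ! -c "$dev" ]; then
        echo "not-a-char-device"
    elif [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "usable"
    else
        echo "no-permission"        # usually: user is not in the kvm group
    fi
}

# Usage:
#   kvm_dev_status
```

A "no-permission" result is fixed by group membership, while "missing" sends you to lsmod and dmesg.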
Safety note. /sys/kernel/debug/ requires either root or CAP_SYS_ADMIN, and requires debugfs to be mounted (mount -t debugfs none /sys/kernel/debug). On hosts where debugfs is mounted with restrictive uid, gid, or mode options, per-VM directories under /sys/kernel/debug/kvm/ may not be visible to non-root users even if /dev/kvm is accessible. All commands reading these paths in the book's lab examples should be run inside the lab VM or as root on the host, not as an unprivileged user on a development workstation.
Sources And Further Reading
- Firecracker getting-started guide: https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md
- Firecracker SPECIFICATION.md: https://github.com/firecracker-microvm/firecracker/blob/main/SPECIFICATION.md
- Firecracker design doc: https://github.com/firecracker-microvm/firecracker/blob/main/docs/design.md
- Firecracker v1.13.0 release notes: https://github.com/firecracker-microvm/firecracker/releases/tag/v1.13.0
- Firecracker REST API Swagger spec: https://raw.githubusercontent.com/firecracker-microvm/firecracker/main/src/firecracker/swagger/firecracker.yaml
- Firecracker snapshot support: https://raw.githubusercontent.com/firecracker-microvm/firecracker/main/docs/snapshotting/snapshot-support.md
- Firecracker jailer docs: https://github.com/firecracker-microvm/firecracker/blob/main/docs/jailer.md
- Firecracker kernel policy: https://github.com/firecracker-microvm/firecracker/blob/main/docs/kernel-policy.md
- AWS open-source blog, Firecracker announcement (November 2018): https://aws.amazon.com/blogs/opensource/firecracker-open-source-secure-fast-microvm-serverless/
- firectl options source (main branch): https://github.com/firecracker-microvm/firectl/blob/main/options.go
- QEMU microvm machine type documentation: https://www.qemu.org/docs/master/system/i386/microvm.html
- QEMU 4.2.0 release announcement (2019-12-13): https://www.qemu.org/2019/12/13/qemu-4-2-0/
- QEMU microvm C source (hw/i386/microvm.c): https://github.com/qemu/qemu/blob/master/hw/i386/microvm.c
- QEMU direct kernel boot: https://www.qemu.org/docs/master/system/linuxboot.html
- QEMU HMP monitor: https://www.qemu.org/docs/master/system/monitor.html
- QEMU mux-chardev: https://www.qemu.org/docs/master/system/mux-chardev.html
- QEMU QMP specification: https://www.qemu.org/docs/master/interop/qmp-spec.html
- Linux kernel kvm_stat source: https://github.com/torvalds/linux/blob/master/tools/kvm/kvm_stat/kvm_stat
- kvm_stat man page (Ubuntu Jammy): https://manpages.ubuntu.com/manpages/jammy/man1/kvm_stat.1.html
- perf-kvm(1) man page: https://www.man7.org/linux/man-pages/man1/perf-kvm.1.html
- KVM tracepoints reference: https://www.linux-kvm.org/page/Perf_events
- KVM CPUID documentation: https://docs.kernel.org/virt/kvm/x86/cpuid.html
- CPUID instruction reference (mirrors Intel SDM Vol 2A): https://www.felixcloutier.com/x86/cpuid
- edk2 AMD CPUID header (SVM bit definition): https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Register/Amd/Cpuid.h
- util-linux lscpu-virt.c: https://github.com/util-linux/util-linux/blob/master/sys-utils/lscpu-virt.c
- KVM nested guests documentation: https://docs.kernel.org/virt/kvm/x86/running-nested-guests.html
- Arch Linux KVM wiki: https://wiki.archlinux.org/title/KVM