Chapter 15: Boot And Configuration
By the time you call PUT /actions with action_type: "InstanceStart", the
firecracker process has been running for perhaps a dozen milliseconds and has
done almost nothing. It opened /dev/kvm, issued KVM_CREATE_VM, and started
listening on a Unix domain socket. The guest CPU has never executed an
instruction. What happens between process start and that first instruction is the
subject of this chapter: a sequence of HTTP/1.1 API calls over a Unix socket
that assembles every parameter the VMM needs to build a virtual machine from
scratch — and hands control to the guest kernel in under 125 ms.
That number, 125 ms from InstanceStart to the start of the guest user-space
/sbin/init process, is not an aspiration. It is a contractual requirement stated in
SPECIFICATION.md, measured under specific conditions: serial console disabled,
minimal kernel, minimal root filesystem. Meeting it constrains every API
design decision. Configuration has to be complete before boot, not discovered
during it. Each API call has to be applied as it arrives, not staged as part of
a multi-call transaction. The
device model has to start without probing for hardware that does not exist.
The Transport Layer
Firecracker does not expose its API on a TCP port. Every request travels over
a Unix domain socket (UDS) using HTTP/1.1, with Content-Type:
application/json on both sides. The socket path is operator-chosen and passed
to the firecracker binary via --api-sock; if the flag is omitted, Firecracker
falls back to /run/firecracker.socket. The OpenAPI
2.0 swagger document at src/firecracker/swagger/firecracker.yaml declares
host: localhost and schemes: [http] as artifacts of the spec format, not as
TCP configuration.
The choice of a Unix socket over a TCP port is deliberate. Under the jailer,
firecracker creates the socket inside its chroot, so the only way to reach it
is through a host path governed by ordinary file permissions, and the host
process that manages the VM is the caller that holds that path.
There is no exposed port for a network scanner to find, no bind
address to misconfigure, and no TCP stack overhead on what is already a
localhost path. The swagger spec version in the repository as of June 2026 is
1.17.0-dev.
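A minimal Python sketch of this transport, using only the standard library. The names UnixHTTPConnection and api_call are hypothetical helpers, not part of any Firecracker tooling, and the socket path in the usage note is a placeholder for whatever was passed to --api-sock.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 client that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def api_call(socket_path, method, path, body=None):
    """Send one request and return (status, decoded JSON body or None)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        headers = {"Accept": "application/json"}
        payload = None
        if body is not None:
            payload = json.dumps(body)
            headers["Content-Type"] = "application/json"
        conn.request(method, path, body=payload, headers=headers)
        response = conn.getresponse()
        text = response.read().decode()
        return response.status, (json.loads(text) if text else None)
    finally:
        conn.close()
```

Against a running process, api_call("/run/fc.sock", "GET", "/") would return the status code and the decoded instance description; a response with an empty body comes back as None.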
The Pre-Boot State Machine
The API endpoints divide cleanly into two classes based on when they are valid.
Most configuration calls are pre-boot only: PUT /boot-source, PUT
/drives/{id}, PUT /machine-config, and PUT /network-interfaces/{id} all
refuse to execute after the VM has started. PUT /actions with
action_type: "InstanceStart" is the transition that moves the VMM from the
"Not started" state to "Running". After that crossing, a different set of
endpoints becomes valid: PATCH /drives/{id}, PATCH
/network-interfaces/{id}, and PUT /actions with action_type:
"FlushMetrics" or "SendCtrlAltDel".
GET / is valid in all states. It returns an InstanceInfo object with four
required fields: app_name (always the string "Firecracker"), id (the
operator-assigned or auto-generated instance identifier), state (one of "Not
started", "Running", or "Paused"), and vmm_version (the build version of
the running firecracker binary). The state strings matter when you are
scripting: "Not started" is two words with a space, and the field for the
build version is vmm_version, not firecracker_version.
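The pre-boot and post-boot split can be sketched as a small lookup. This is an illustration of the rules listed above, not Firecracker's own implementation; is_allowed is a hypothetical helper, it covers only the endpoints named in this section, and it treats "Paused" like "Running" for gating purposes, which is an assumption.

```python
import re

# Rules transcribed from the text above; not an exhaustive copy of the API.
PRE_BOOT_ONLY = [
    ("PUT", r"/boot-source"),
    ("PUT", r"/drives/[^/]+"),
    ("PUT", r"/machine-config"),
    ("PUT", r"/network-interfaces/[^/]+"),
]
POST_BOOT_ONLY = [
    ("PATCH", r"/drives/[^/]+"),
    ("PATCH", r"/network-interfaces/[^/]+"),
]
POST_BOOT_ACTIONS = {"FlushMetrics", "SendCtrlAltDel"}


def _matches(rules, method, path):
    return any(m == method and re.fullmatch(p, path) for m, p in rules)


def is_allowed(state, method, path, action_type=None):
    """Return True if the rules above permit this call in the given state."""
    started = state != "Not started"  # note the space in the state string
    if method == "GET" and path == "/":
        return True
    if method == "PUT" and path == "/actions":
        if action_type == "InstanceStart":
            return not started
        return started and action_type in POST_BOOT_ACTIONS
    if _matches(PRE_BOOT_ONLY, method, path):
        return not started
    if _matches(POST_BOOT_ONLY, method, path):
        return started
    raise ValueError(f"endpoint not covered by this sketch: {method} {path}")
```

A client can call this before sending a request to fail early with a clear message instead of parsing an error response.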
Configuring The Boot Source
PUT /boot-source sets three fields. Only one is required.
kernel_image_path is an absolute path on the host filesystem to the
uncompressed kernel image. On x86_64 that means vmlinux, an ELF binary
typically in the 4–8 MB range. On aarch64 it means a PE-formatted Image file.
The kernel must be uncompressed because Firecracker does not implement a
bootloader — it loads the kernel image directly into guest memory and launches
it via one of two boot protocols. When the kernel exports a PVH entry point
(pvh_boot_cap), Firecracker uses PVH boot (BootProtocol::PvhBoot), which
is the protocol its tuned kernels use. If no PVH entry point is present,
Firecracker falls back to the Linux 64-bit boot protocol
(BootProtocol::LinuxBoot). The guest memory load region starts at
0x100000 (HIMEM_START), but the entry address differs between the two
protocols. There is no GRUB, no BIOS, no real-mode stage.
boot_args is the kernel command line. The API does not enforce a default; if
you omit the field, no command-line arguments are passed. In practice, a working
command line for a block-device boot looks like:
    ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules
The pci=off flag suppresses PCI bus enumeration entirely: there is no PCI bus to
find, but without that flag some kernel configurations will wait for one. The
reboot=k flag tells the kernel to reset through the keyboard controller on reboot, which
Firecracker intercepts and converts to a clean VMM exit. nomodules is
appropriate here because a Firecracker guest kernel carries no loadable modules;
the compiled-in device drivers are all that exist. firectl, the thin Go
wrapper around the Firecracker API, uses these as its default when --kernel-opts
is not specified.
initrd_path is optional. Set it to a host path to load an initramfs image
into guest memory before jumping to the kernel entry point. Set it to null or
omit it entirely for a direct boot into the root block device. The distinction
matters for the required kernel configuration: initrd-only boot does not need
CONFIG_ACPI=y, CONFIG_PCI=y, or CONFIG_VIRTIO_BLK=y, while a block-device
boot requires all three.
PUT /boot-source accepts only PUT — no PATCH, no GET. A second PUT
before InstanceStart replaces the previous configuration entirely. The
semantics are idempotent in the replace-the-whole-thing sense, not in the
merge-fields sense.
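The difference between the two senses can be shown with plain dictionaries. This is a sketch of the semantics only; put_replace and patch_merge are hypothetical names, and the paths are placeholders.

```python
def put_replace(current, body):
    """Replace semantics: the new body becomes the whole configuration."""
    return dict(body)


def patch_merge(current, body):
    """Merge semantics, shown for contrast: unnamed fields survive."""
    merged = dict(current)
    merged.update(body)
    return merged


first = {
    "kernel_image_path": "/path/to/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    "initrd_path": "/path/to/initrd",
}
second = {"kernel_image_path": "/path/to/vmlinux"}

after_put = put_replace(first, second)      # boot_args and initrd_path are gone
after_merge = patch_merge(first, second)    # boot_args and initrd_path remain
```

The practical consequence is that a second PUT /boot-source must repeat every field it wants to keep.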
Machine Configuration
PUT /machine-config — or its field-merging sibling PATCH /machine-config,
which is also pre-boot only — controls the virtual hardware the guest sees
before it executes its first instruction: how many CPUs, how much memory, whether
hyperthreading is advertised, and which CPU template shapes the CPUID leaves.
vcpu_count is required. The minimum is 1; the maximum is 32, a hard limit
enforced in the Firecracker source and declared with maximum: 32 in the
swagger schema. The KVM_CREATE_VCPU ioctl is called once per vCPU, and each
vCPU runs on its own host thread, so 32 is also the ceiling on vCPU threads; the
VMM process runs its API and device-emulation threads in addition to those.
mem_size_mib is required. The swagger schema carries no minimum constraint.
The practical floor is 128 MiB, the value the FAQ documents as its default;
below that, the Linux boot sequence may fail before reaching init. The
effective ceiling is set by KVM_MEM_MAX_NR_PAGES, a KVM constant that depends
on host kernel version and architecture.
smt is a boolean, optional, defaulting to false, and valid only on x86_64.
Setting it to true on aarch64 returns a 400. When true, Firecracker presents
a CPU/cache topology to the guest that reflects simultaneous multithreading —
concretely, the topology structures visible via cpuid leaf 0xb will indicate
that threads share a core. This field was named ht_enabled before Firecracker
v1.0.0; the rename to smt happened in that release, and the field became
optional with a default of false.
Before setting smt: true in production: the Firecracker production host setup guide recommends disabling SMT at the host level in multi-tenant deployments because "SMT is frequently a precondition for speculation issues utilized in side channel attacks such as Spectre variants and MDS." The host kernel exposes SMT control at /sys/devices/system/cpu/smt/control; writing off to that file disables SMT host-wide. When the file reads forceoff or notsupported, the kernel rejects further writes, so the operator cannot change the setting. Guest-visible SMT and host SMT are independent settings; you can present SMT to the guest while running the host with SMT disabled, but the presented topology will not reflect real hardware.
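The constraints on vcpu_count, mem_size_mib, and smt can be checked client-side before the request is sent. The following is a sketch under the limits quoted above; check_machine_config is a hypothetical helper, and the 128 MiB figure is treated as a warning because the schema itself carries no minimum.

```python
MAX_VCPUS = 32            # hard limit quoted in the text
PRACTICAL_MIN_MIB = 128   # practical floor from the text, not a schema rule


def check_machine_config(cfg, arch):
    """Return (errors, warnings) for a /machine-config request body."""
    errors, warnings = [], []

    vcpus = cfg.get("vcpu_count")
    if not isinstance(vcpus, int) or not 1 <= vcpus <= MAX_VCPUS:
        errors.append("vcpu_count must be an integer from 1 to 32")

    mem = cfg.get("mem_size_mib")
    if not isinstance(mem, int) or mem <= 0:
        errors.append("mem_size_mib is required and must be a positive integer")
    elif mem < PRACTICAL_MIN_MIB:
        warnings.append("mem_size_mib below 128 may fail before reaching init")

    # smt defaults to false and is only meaningful on x86_64.
    if cfg.get("smt", False) and arch != "x86_64":
        errors.append("smt: true is only valid on x86_64")

    return errors, warnings
```

For example, check_machine_config({"vcpu_count": 2, "mem_size_mib": 256}, "x86_64") returns two empty lists.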
track_dirty_pages enables KVM's dirty bitmap for the guest's memory slot. When
true, KVM_GET_DIRTY_LOG can be used to identify which 4 KiB pages the guest
has written since the last flush — the mechanism that underpins live migration
and snapshots. Enabling dirty tracking at 4 KiB granularity has a measurable
cost when combined with huge pages, discussed below.
huge_pages takes the string enum "None" (the default) or "2M". When set
to "2M", Firecracker backs the guest memory slot with 2 MB hugetlbfs pages
instead of 4 KiB base pages. 1 GB pages are not supported. This feature was
added as a developer preview in Firecracker v1.7.0. The boot time improvement in
Firecracker's own benchmarks reached up to 50%, because 2 MB pages mean fewer
TLB misses during the kernel's initial page walks across guest memory.
Three caveats apply to huge_pages: "2M". First, the host must pre-allocate
a large enough 2 MB huge page pool before the VM starts; Firecracker maps guest
memory with MAP_NORESERVE, so an undersized pool causes SIGBUS at access
time, not at map time. Second, snapshots of a hugepage-backed VM can only be
restored via userfaultfd (UFFD); the standard mmap-based restore path does
not support them. Third — and this interaction is easy to miss — setting
track_dirty_pages: true alongside huge_pages: "2M" destroys the performance
benefit: KVM reverts to 4 KiB PTE granularity for dirty tracking, which is
exactly what 2 MB pages were bought to avoid.
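Because an undersized pool fails at access time rather than at map time, it is worth checking the pool before starting the VM. The sketch below assumes the host's default huge page size is 2 MiB, so that HugePages_Free in /proc/meminfo counts 2 MiB pages; the function names are hypothetical.

```python
import math
import re

HUGE_PAGE_MIB = 2


def huge_pages_needed(mem_size_mib):
    """Number of 2 MiB pages required to back the guest memory."""
    return math.ceil(mem_size_mib / HUGE_PAGE_MIB)


def free_huge_pages(meminfo_text):
    """Extract HugePages_Free from /proc/meminfo-style text."""
    match = re.search(r"^HugePages_Free:\s+(\d+)$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("HugePages_Free not found")
    return int(match.group(1))


def pool_is_sufficient(mem_size_mib, meminfo_text):
    return free_huge_pages(meminfo_text) >= huge_pages_needed(mem_size_mib)
```

On a host, the text would come from open("/proc/meminfo").read(). A 1024 MiB guest needs 512 free pages.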
CPU Templates
cpu_template accepts a string enum selecting a static CPU template, which
shapes the CPUID leaves and MSRs the guest sees. Static templates have been
deprecated since Firecracker v1.5.0 in favor of PUT /cpu-config, which
accepts a JSON structure with architecture-specific overrides. If both a static
template and a custom template are configured, whichever was set most recently
wins.
The official documentation is explicit that "CPU templates shall not be used as a security protection against malicious guests." They exist for fleet homogeneity, not for isolation.
The available static templates and their target hardware are:
| Template | Architecture | Target host CPUs |
|---|---|---|
| C3 | x86_64 (Intel) | Skylake, Cascade Lake, Ice Lake |
| T2 | x86_64 (Intel) | Skylake, Cascade Lake, Ice Lake |
| T2S | x86_64 (Intel) | Skylake, Cascade Lake |
| T2CL | x86_64 (Intel) | Cascade Lake, Ice Lake |
| T2A | x86_64 (AMD) | Milan |
| V1N1 | aarch64 (ARM) | Neoverse V1, presented as Neoverse N1 |
| None | both | No template applied |
T2CL and T2A together present a homogeneous instruction-set surface across
Intel Cascade Lake and AMD Milan hosts in a mixed fleet — the same guest binary
runs identically on either without branching on CPU vendor.
On x86_64, what these templates actually do is set specific CPUID leaf values
and MSR contents. The C3 template sets CPUID leaf 0x1 EAX to 0x000306e4,
which decodes to family 6, model 0x3e — the Ivy Bridge / Xeon E5 v2
signature. T2 sets the same leaf to 0x000306F2 (family 6, model 0x3f, Haswell).
T2S sets CPUID leaf 0x1 EAX to the same value — 0x000306F2, family 6,
model 0x3f — so T2 and T2S present an identical CPU version identifier to
the guest. The distinction between them is in ECX/EDX feature bits on other
leaves. T2S additionally writes MSR IA32_ARCH_CAPABILITIES at address
0x10A to 0x000000000C080C4C, exposing the specific capability bits needed
for safe snapshot migration between Skylake and Cascade Lake hosts.
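The family and model figures quoted above follow from the standard x86 layout of CPUID leaf 0x1 EAX. A short Python sketch of the decoding, with decode_cpuid_1_eax as a hypothetical helper name:

```python
def decode_cpuid_1_eax(eax):
    """Decode the processor version fields of CPUID leaf 0x1 EAX."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is prepended only for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


c3 = decode_cpuid_1_eax(0x000306E4)   # family 6, model 0x3e, stepping 4
t2 = decode_cpuid_1_eax(0x000306F2)   # family 6, model 0x3f, stepping 2
```

The two values differ only in the low byte, which holds the base model and stepping.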
The V1N1 aarch64 template modifies four system registers to make a Neoverse V1
host look like a Neoverse N1 to the guest: ID_AA64PFR0_EL1 (clears SVE at
bits 35:32 and DIT at bits 51:48), ID_AA64ISAR0_EL1 (clears SHA3, SM3, SM4,
ASIMDFHM, FLAGM, and RND; sets SHA2 to 0b0001), ID_AA64ISAR1_EL1 (clears
JSCVT, FCMA, BF16, DGH, and I8MM; sets DPB to 0b0001 and LRCPC to
0b0001), and ID_AA64MMFR2_EL1 (clears USCAT at bits 35:28). The effect is
that code checking these ID registers on a V1 host sees the N1 feature set —
the guest binaries compiled for N1 run without modification.
Regardless of which template is selected, Firecracker applies CPUID
normalization on x86_64 after template application, so a template-set bit
can be overwritten by normalization. Leaves touched on all CPUs include 0x0,
0x1, 0xb, 0x80000005, and 0x80000006. On Intel hosts, normalization
additionally touches leaves 0x4, 0x6, 0x7, 0xa, 0x1f, and
0x80000002–0x80000004. On AMD, it touches 0x7, 0x80000001,
0x80000002–0x80000004, 0x80000008, 0x8000001d, and 0x8000001e. The
practical effects include disabling Intel Turbo Boost, disabling performance
monitoring counters, setting the HYPERVISOR bit in leaf 0x1 ECX, and
enabling TSC_DEADLINE for APIC timer use.
For operators who need finer control than the static templates offer, PUT
/cpu-config accepts a JSON body with cpuid_modifiers, msr_modifiers, and
kvm_capabilities on x86_64, or reg_modifiers, vcpu_features, and
kvm_capabilities on aarch64. Individual bits are expressed with a bitmap
notation where "x" means pass-through (inherit from hardware), "0" forces
the bit clear, and "1" forces it set. Underscore characters in bit strings are
visual separators only.
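One way to read the bitmap notation is as a pair of integers: a filter marking which bits are forced, and a value holding what they are forced to. The sketch below is an interpretation of the notation described above, not Firecracker's parser; it accepts an optional 0b prefix, which is an assumption about the string format, and both function names are hypothetical.

```python
def parse_bitmap(text):
    """Return (filter, value): filter marks forced bits, value their state."""
    bits = text.replace("_", "")          # underscores are separators only
    if bits.startswith("0b"):
        bits = bits[2:]
    filter_mask = 0
    value = 0
    for ch in bits:                       # most significant bit first
        filter_mask <<= 1
        value <<= 1
        if ch == "1":
            filter_mask |= 1
            value |= 1
        elif ch == "0":
            filter_mask |= 1
        elif ch != "x":
            raise ValueError(f"unexpected character {ch!r} in bitmap")
    return filter_mask, value


def apply_modifier(host_value, filter_mask, value):
    """Keep pass-through bits from the host and overwrite forced bits."""
    return (host_value & ~filter_mask) | (value & filter_mask)


mask, forced = parse_bitmap("0bxxxx_01x1")
result = apply_modifier(0b1111_1010, mask, forced)
```

Here bits 3, 2, and 0 are forced to 0, 1, and 1, while the remaining bits keep the host's values.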
Drives And The Root Filesystem
Every storage device the guest sees is a virtio-block device. Firecracker
implements the OASIS virtio 1.2 specification's block device type (Device ID 2,
section 5.2). The guest addresses all block devices at sector granularity of 512
bytes; the sector field in every virtio-blk request is a 512-byte offset into
the device.
PUT /drives/{drive_id} attaches a drive. The drive_id path parameter and
the matching body field together identify the device. is_root_device: true
marks the drive that the guest kernel will find as /dev/vda — the
paravirtualized block device that holds the root filesystem. Exactly one drive
may carry is_root_device: true.
path_on_host is the absolute host filesystem path to the backing block file —
a raw image, an ext4 file, a squashfs image. The file must exist and be readable
by the firecracker process at the time of the PUT call.
is_read_only sets the virtio feature bit VIRTIO_BLK_F_RO (bit 5). When this
bit is advertised, write requests from the guest driver must fail with
VIRTIO_BLK_S_IOERR; the host-side device emulator refuses to pass any write
through to the backing file. The guest kernel detects the read-only state during
feature negotiation, before any I/O is attempted.
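The read-only rule and the sector addressing can be sketched with the constants defined by the virtio specification. This is a simplified model of the device-side decision, not Firecracker's emulator; the function names are hypothetical.

```python
SECTOR_SIZE = 512
VIRTIO_BLK_F_RO = 5        # feature bit number
VIRTIO_BLK_T_IN = 0        # read request
VIRTIO_BLK_T_OUT = 1       # write request
VIRTIO_BLK_S_OK = 0
VIRTIO_BLK_S_IOERR = 1


def offered_features(is_read_only):
    """Feature bits this sketch advertises for a drive."""
    return (1 << VIRTIO_BLK_F_RO) if is_read_only else 0


def byte_offset(sector):
    """Convert a request's sector field to a byte offset in the backing file."""
    return sector * SECTOR_SIZE


def handle_request(req_type, sector, features):
    """Return (status, byte offset or None) for one request."""
    read_only = bool(features & (1 << VIRTIO_BLK_F_RO))
    if req_type == VIRTIO_BLK_T_OUT and read_only:
        return VIRTIO_BLK_S_IOERR, None   # writes never reach the backing file
    return VIRTIO_BLK_S_OK, byte_offset(sector)
```

With is_read_only set, a write to any sector is rejected while a read of sector 8 resolves to byte offset 4096.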
io_engine selects between "Sync" (the default, using standard blocking file
I/O) and "Async" (added in Firecracker v1.0.0, backed by io_uring). The
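async engine reduces per-operation latency under load by avoiding the host
thread context switch per I/O, at the cost of requiring io_uring support in
the host kernel. io_uring first appeared in Linux 5.1, but Firecracker's
documentation requires host kernel 5.10.51 or later for the Async engine.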
cache_type selects "Unsafe" (the default, in which the device does not
advertise flush support, so nothing guarantees that written data has been
flushed to the backing file) or "Writeback", which advertises the
VIRTIO_BLK_F_FLUSH feature bit (bit 9) and services guest flush requests by
syncing the backing file. The naming is
deliberate: "Unsafe" is the correct choice when the backing file is on a
fast local SSD whose durability you do not need to guarantee; "Writeback" is
correct when the guest's writes must survive a host crash.
PATCH /drives/{drive_id} is post-boot only. It allows updating
path_on_host and the rate limiter fields, and nothing else. is_read_only and
is_root_device cannot be patched. This is not hot-plug of a new device; the
virtio device the guest sees is the same device, with a different backing file
on the host side. The use case is swapping the data volume of a running VM
without the guest noticing a device disappear and reappear.
The Root Filesystem As A Block Device
The canonical setup for a Firecracker VM assigns an ext4 image to the root drive and passes appropriate boot arguments to the kernel:
PUT /boot-source
{
    "kernel_image_path": "/path/to/vmlinux",
    "boot_args": "root=/dev/vda console=ttyS0 reboot=k panic=1 pci=off"
}
The root=/dev/vda argument tells the kernel which device holds the root
filesystem. Virtio-blk devices appear as /dev/vda, /dev/vdb, and so
on in the order Firecracker registers them, and Firecracker places the drive
marked is_root_device: true first, so the root drive is always /dev/vda
regardless of the order of the PUT calls.
Read-Only Base Plus Overlay
When thousands of VMs share the same base operating system image, copying a multi-gigabyte root filesystem into each VM's backing file is expensive in both time and disk space. The standard pattern is to use a single read-only base image, attach a per-VM overlay drive for writes, and wire them together inside the guest with the Linux overlay filesystem.
The setup uses two drives:
flowchart TB
A["base-rootfs.squashfs\n(is_read_only: true)\nguest: /dev/vda"] --> C["overlay filesystem in guest\n(mount -t overlay)"]
B["overlay-vm-N.ext4\n(is_read_only: false)\nguest: /dev/vdb"] --> C
C --> D["pivot_root → new /"]
- Attach the base image, typically a squashfs image, as the root drive with is_read_only: true. The guest will see it as /dev/vda.
- Attach a per-VM sparse ext4 file as a second drive with is_read_only: false. The guest sees it as /dev/vdb.
- Set boot_args to include init=/sbin/overlay-init overlay_root=vdb.
- The /sbin/overlay-init script in the base image mounts the overlay ext4 at /overlay, then calls mount -t overlay overlay -o lowerdir=/,upperdir=/overlay/root,workdir=/overlay/work /mnt, and calls pivot_root to switch to the merged view. After pivot_root, the running system's / is the overlay, writes go to the ext4 file on /dev/vdb, and the squashfs base is never modified.
The directories /overlay/root, /overlay/work, /mnt, and /rom must be
pre-created inside the squashfs image, because the base filesystem is read-only
at runtime and cannot be written during init.
For a temporary overlay — state lost on VM termination — replace
overlay_root=vdb with overlay_root=ram or omit it; a tmpfs layer handles
writes and vanishes with the process. For a persistent overlay, overlay_root=vdb
directs writes to the ext4 sparse file, which survives VM termination. The
sparse file starts near empty and grows only as the guest writes; a typical
minimal workload accumulates tens of megabytes per VM rather than gigabytes.
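The API side of this pattern comes down to one boot-source request and one or two drive requests. The sketch below builds those request bodies in Python; overlay_requests is a hypothetical helper, the drive IDs rootfs and overlay are placeholders, and the base boot arguments are copied from the earlier block-device example.

```python
BASE_ARGS = "root=/dev/vda console=ttyS0 reboot=k panic=1 pci=off"


def overlay_requests(kernel, base_image, overlay_image=None):
    """Build (method, path, body) tuples for a read-only base plus overlay.

    With overlay_image=None the overlay lives in RAM and is lost on exit.
    """
    overlay_root = "vdb" if overlay_image else "ram"
    boot_args = f"{BASE_ARGS} init=/sbin/overlay-init overlay_root={overlay_root}"
    requests = [
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel, "boot_args": boot_args}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": base_image,
          "is_root_device": True, "is_read_only": True}),
    ]
    if overlay_image:
        requests.append(
            ("PUT", "/drives/overlay",
             {"drive_id": "overlay", "path_on_host": overlay_image,
              "is_root_device": False, "is_read_only": False}))
    return requests
```

Only the overlay path changes from one VM to the next; the base image path is shared by every VM on the host.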
Network Interfaces
PUT /network-interfaces/{iface_id} connects the guest to the host network.
Firecracker requires the TAP device to already exist on the host before the API
call; the VMM does not create it. The host_dev_name field names the TAP
interface, and Firecracker opens it by name and holds the file descriptor for
the lifetime of the VM.
Before running this step: creating a TAP device and attaching it to a bridge requires root (or CAP_NET_ADMIN) on the host. The sequence is: ip tuntap add dev tap0 mode tap and ip link set tap0 up. When running under the jailer with a network namespace, create the TAP inside that namespace before launching the jailer, because firecracker joins the namespace and then opens the device by name.
guest_mac is optional. If omitted, Firecracker does not advertise a MAC address
to the guest, and the guest's virtio-net driver assigns a random one.
Specifying it explicitly is
necessary when the address must be stable across VM recreation or must match a
DHCP reservation.
mtu was added in Firecracker v1.16.0. It sets the MTU advertised to the guest
via VIRTIO_NET_F_MTU. The valid range is 68–65535. If omitted, the guest uses
the default MTU negotiated by the virtio-net driver. In practice, setting this
to match the host's physical interface MTU (typically 1500, or 9000 for jumbo
frames) avoids silent fragmentation at the host TAP boundary.
rx_rate_limiter and tx_rate_limiter attach token-bucket rate limiters to
the ingress and egress paths. Both fields accept a RateLimiter object
containing two independent TokenBucket sub-objects: bandwidth (measured in
bytes per second) and ops (measured in operations per second). Each bucket has
three fields: size sets the bucket capacity, one_time_burst sets an initial
fill that allows bursting above the steady-state rate, and refill_time sets the
time in milliseconds to refill the bucket from zero to size. The steady-state
rate is size / refill_time. Setting both size and refill_time to 0
disables the limiter.
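Since refill_time is in milliseconds, the steady-state rate per second is size × 1000 / refill_time. A small Python sketch of that arithmetic and of the request shape; the helper names are hypothetical, and the field names are the ones listed above.

```python
def token_bucket(size, refill_time_ms, one_time_burst=0):
    """Build a TokenBucket body; refill_time is in milliseconds."""
    return {"size": size, "one_time_burst": one_time_burst,
            "refill_time": refill_time_ms}


def steady_rate_per_second(bucket):
    """Steady-state rate in units per second, or None if the limiter is off."""
    size, refill = bucket["size"], bucket["refill_time"]
    if size == 0 and refill == 0:
        return None
    if refill <= 0:
        raise ValueError("refill_time must be positive when size is non-zero")
    return size * 1000 / refill


def rate_limiter(bandwidth=None, ops=None):
    """Build a RateLimiter body from optional bandwidth and ops buckets."""
    limiter = {}
    if bandwidth is not None:
        limiter["bandwidth"] = bandwidth
    if ops is not None:
        limiter["ops"] = ops
    return limiter


# 1,250,000 bytes every 100 ms is 12.5 MB/s, roughly 100 Mbit/s.
bw = token_bucket(size=1_250_000, refill_time_ms=100)
tx = rate_limiter(bandwidth=bw)
```

The resulting dictionary would be passed as the tx_rate_limiter or rx_rate_limiter field of the network interface body.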
PATCH /network-interfaces/{iface_id} is post-boot only and accepts only
the rate limiter fields. The host_dev_name and guest_mac cannot be changed
on a running VM; those are part of the virtio device identity that the guest
negotiated at boot.
Logging
Firecracker's logger does not emit a word until it is explicitly initialized.
The process starts, opens its API socket, and waits — silently. PUT /logger
performs that initialization, and it can only be called once. A second call
returns 400; once the logger is initialized, there is no API to
reconfigure it.
Before calling PUT /logger: the destination file or named pipe must already exist on the filesystem. For a named pipe: mkfifo /tmp/fc-log.fifo. For a regular file: touch /tmp/fc-log.txt. The firecracker process must be able to open the path for writing; inside the jailer's chroot, the path must be inside the chroot directory.
The logger schema carries five fields. log_path is the only one without a
default; it must name a pre-existing file or named pipe. level defaults to
"Info" and accepts Error, Warning, Info, Debug, Trace, and Off
(case-insensitive). show_level (default false) prepends the level string to
each log line. show_log_origin (default false) prepends the Rust source file
path and line number — useful when debugging, expensive when processing logs
at scale. module filters log output to a specific Rust module path, for
example "api_server::request" to trace only incoming API traffic.
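Creating the pipe and assembling the request body can be sketched in Python as follows. prepare_log_fifo and logger_body are hypothetical helpers; the defaults mirror the ones listed above, and the 0o600 mode is an assumption about who should be able to read the log.

```python
import os
import stat


def prepare_log_fifo(path):
    """Create the named pipe if needed and confirm it really is a FIFO."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    if not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError(f"{path} exists but is not a named pipe")
    return path


def logger_body(log_path, level="Info", show_level=False,
                show_log_origin=False, module=None):
    """Build the PUT /logger body; module is sent only when given."""
    body = {
        "log_path": log_path,
        "level": level,
        "show_level": show_level,
        "show_log_origin": show_log_origin,
    }
    if module is not None:
        body["module"] = module
    return body
```

A reader should open the pipe and keep draining it for as long as the VM runs.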
The logger can also be configured via CLI flags — --log-path, --level,
--show-level, --show-log-origin — passed to the firecracker binary at
startup. The CLI path and the PUT /logger API path are mutually exclusive; use
one or the other per process.
Firecracker produces no log output between jailer startup and logging-system
initialization — a window that covers seccomp filter installation, the chroot,
and the initial API socket setup. If the named pipe fills because the consumer
is slow, Firecracker drops log entries silently and increments the lost-logs
metric counter. A blocked log consumer starves you of operational visibility
without any visible error.
Metrics
Where the logger emits human-readable lines, the metrics subsystem emits
machine-readable JSON. PUT /metrics initializes it, and like the logger, it
can only be called once — a second call returns 400.
Before calling PUT /metrics: the destination must already exist: mkfifo /tmp/fc-metrics.fifo or touch /tmp/fc-metrics.txt.
The schema has one field: metrics_path, the path to the named pipe or file
that receives JSON metric objects.
Metrics are flushed in two ways. Automatically, every 60 seconds, Firecracker
writes one JSON object containing all current counters. On demand,
PUT /actions with action_type: "FlushMetrics" triggers an immediate flush
without waiting for the 60-second timer.
Each flush emits a single JSON object with 21 top-level category keys:
api_server, balloon, block, deprecated_api, entropy,
get_api_requests, i8042, latencies_us, logger, mmds, net,
patch_api_requests, put_api_requests, rtc, seccomp, signals, uart,
vcpu, vhost_user_block, vmm, and vsock. All 21 categories are emitted on
every flush regardless of whether the associated device is present; if you have
no balloon device, the balloon category still appears, with zero values. The
naming convention within each category is consistent: counters with _bytes or
_bytes_count are byte quantities, _ms is milliseconds, _us is
microseconds, and bare names without a suffix are event counts.
The metrics configuration is not captured in a VM snapshot and is not restored from one. A VM restored from a snapshot must reconfigure its metrics endpoint from scratch.
The back-pressure behavior mirrors the logger: if the named pipe is full, metric
flush events are dropped and the lost-metrics counter is incremented.
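A consumer can use the suffix convention to attach units while flattening a flush. The sketch below assumes each flush arrives as one JSON object per line; the counter names in the sample are illustrative rather than taken from a real flush, and unit_for and flatten_flush are hypothetical helpers.

```python
import json


def unit_for(counter_name):
    """Map a counter name to its unit using the suffix convention."""
    if counter_name.endswith(("_bytes", "_bytes_count")):
        return "bytes"
    if counter_name.endswith("_ms"):
        return "milliseconds"
    if counter_name.endswith("_us"):
        return "microseconds"
    return "count"


def flatten_flush(line):
    """Turn one flush into (category, counter, value, unit) rows."""
    flush = json.loads(line)
    rows = []
    for category, counters in flush.items():
        if not isinstance(counters, dict):
            continue                      # skip any scalar top-level field
        for name, value in counters.items():
            rows.append((category, name, value, unit_for(name)))
    return rows


sample = '{"net": {"rx_bytes_count": 2048, "rx_packets_count": 4}, "vmm": {"example_wait_ms": 7}}'
rows = flatten_flush(sample)
```

The _bytes_count suffix is tested before the bare-name fallback, so rx_bytes_count is reported in bytes while rx_packets_count is a plain event count.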
Starting The VM
With boot source, drives, network interfaces, and optionally logger and metrics configured, the VM is ready to start. The call is:
PUT /actions
{
    "action_type": "InstanceStart"
}
This is the point of no return. InstanceStart can only be called once per
firecracker process lifecycle; there is no API to stop and restart a running
VM. The VMM issues KVM_SET_USER_MEMORY_REGION to map the guest memory into the
VM, issues KVM_CREATE_VCPU for each configured vCPU, loads the kernel into
guest memory starting at 0x100000 (HIMEM_START), selects the boot protocol
(PVH when the kernel exports a PVH entry point, Linux 64-bit otherwise), sets
up the initial register state for that protocol, and calls KVM_RUN. The
guest CPU begins executing.
From that moment, the clock is running against the 125 ms SLA. Under the
specified conditions — serial console absent from boot_args, Firecracker-tuned
kernel, minimal root filesystem — the kernel traverses its init path, probes the
virtio-MMIO devices (no PCI scan, no timeout), mounts the root filesystem, and
forks /sbin/init inside that budget.
PUT /actions also exposes two post-boot operations. "FlushMetrics" triggers
an immediate metrics flush as described above. "SendCtrlAltDel" emits a
CTRL+ALT+DEL sequence through the emulated i8042 keyboard controller; this is
x86_64 only and requires the guest kernel to have been compiled with
CONFIG_SERIO_I8042 and CONFIG_KEYBOARD_ATKBD. The guest receives the signal,
initiates its orderly shutdown, and when the guest CPU issues a reset
instruction, Firecracker exits the process. There is no API-level pause or
resume of a running VM without snapshot machinery (covered in Chapter 16).
The Config-File Shortcut
Issuing API calls individually requires a caller that can speak HTTP/1.1 over a
Unix socket and sequence its requests correctly. For scripting and for
integration tests, firecracker accepts a --config-file flag pointing to a
JSON file that expresses the same configuration in one shot.
The file's JSON structure mirrors the API endpoints, with one naming convention
worth internalizing: top-level keys use hyphens (boot-source,
machine-config, network-interfaces, cpu-config) while nested field names
use snake_case (kernel_image_path, vcpu_count, host_dev_name). The
hyphens come from API URL path segments; the underscores come from JSON body
field names. Mixing these up produces a parse error with no helpful message.
At minimum, the file must contain boot-source.kernel_image_path and at least
one entry in drives with is_root_device: true. All other sections are
optional. When --config-file is supplied without --no-api, the Unix domain
socket is still created and remains available for subsequent API calls — so
config-file and runtime API are not mutually exclusive.
The full set of top-level keys the config file accepts as of Firecracker v1.14
and later: boot-source, drives, machine-config, cpu-config, balloon,
network-interfaces, vsock, logger, metrics, mmds-config, entropy,
pmem, and memory-hotplug.
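Generating the file from code makes the hyphen and underscore split harder to get wrong. The Python sketch below builds a minimal configuration and flags top-level keys outside the set listed above; build_config and misnamed_keys are hypothetical helpers, and the drive ID and default sizes are placeholders.

```python
import json

# Top-level keys as listed in the text above.
TOP_LEVEL_KEYS = {
    "boot-source", "drives", "machine-config", "cpu-config", "balloon",
    "network-interfaces", "vsock", "logger", "metrics", "mmds-config",
    "entropy", "pmem", "memory-hotplug",
}


def build_config(kernel, rootfs, vcpu_count=1, mem_size_mib=128):
    """Minimal config: hyphenated top-level keys, snake_case fields inside."""
    return {
        "boot-source": {"kernel_image_path": kernel},
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }],
        "machine-config": {"vcpu_count": vcpu_count,
                           "mem_size_mib": mem_size_mib},
    }


def misnamed_keys(config):
    """Top-level keys not in the accepted set, for example boot_source."""
    return sorted(key for key in config if key not in TOP_LEVEL_KEYS)


config = build_config("/path/to/vmlinux", "/path/to/rootfs.ext4")
text = json.dumps(config, indent=2)
```

Writing text to a file and passing its path to --config-file is the intended use; running misnamed_keys first catches a boot_source typo before firecracker sees it.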
sequenceDiagram
participant C as Caller
participant F as "firecracker"
participant K as KVM
C->>F: --api-sock /run/fc.sock [--config-file]
Note over F: KVM_CREATE_VM, API socket ready (~12 ms)
C->>F: PUT /logger
C->>F: PUT /metrics
C->>F: PUT /machine-config
C->>F: PUT /boot-source
C->>F: PUT /drives/rootfs
C->>F: PUT /network-interfaces/eth0
C->>F: PUT /actions {"action_type":"InstanceStart"}
F->>K: KVM_SET_USER_MEMORY_REGION
F->>K: KVM_CREATE_VCPU (×vcpu_count)
F->>K: KVM_RUN
Note over F: Guest executing, state = Running
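The ordering in the diagram can be enforced by a small driver that stops at the first failed call and refuses any sequence where InstanceStart is not last. This is a sketch: run_sequence is a hypothetical helper, and send stands for any callable that takes (method, path, body) and returns an HTTP status code.

```python
def run_sequence(requests, send):
    """Send requests in order; stop at the first non-2xx status.

    Returns the number of requests sent successfully.
    """
    starts = [i for i, (_, path, body) in enumerate(requests)
              if path == "/actions"
              and body.get("action_type") == "InstanceStart"]
    if starts and starts[0] != len(requests) - 1:
        raise ValueError("InstanceStart must be the final request")

    for count, (method, path, body) in enumerate(requests):
        status = send(method, path, body)
        if not 200 <= status < 300:
            raise RuntimeError(
                f"{method} {path} failed with {status} after {count} calls")
    return len(requests)
```

Failing fast matters here because a configuration error discovered after InstanceStart cannot be corrected with another pre-boot call.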
Beyond The Core Four
Several additional pre-boot endpoints extend the VM beyond a bare compute node: host-guest communication without a network stack, memory pressure signaling, persistent memory, dynamic memory pools, and the metadata service that replaces IMDS in the guest.
PUT /vsock attaches a virtio-vsock device for host-guest communication without
a network stack. guest_cid (minimum 3; CIDs 0–2 are reserved) and uds_path
are required. The vsock device appears to the guest as a standard AF_VSOCK
socket endpoint.
PUT /balloon attaches the virtio-balloon device. The required fields are
amount_mib and deflate_on_oom. The optional stats_polling_interval_s (0
to disable) controls how often the balloon driver reports guest memory statistics
to the VMM. The balloon device cannot reclaim hugepage-backed guest memory: the
4 KiB granularity of the balloon protocol does not align with 2 MB huge pages.
PUT /pmem/{id}, added in Firecracker v1.14.0, attaches a virtio-pmem device
exposing a host file as persistent memory in the guest. The guest kernel requires
CONFIG_VIRTIO_PMEM, CONFIG_LIBNVDIMM, CONFIG_BLK_DEV_PMEM, and
CONFIG_DAX. Devices appear as /dev/pmem0, /dev/pmem1, and so on. The
backing file must be 2 MB aligned.
PUT /hotplug/memory, also added in v1.14.0, allocates a pool of memory
available for dynamic adjustment via PATCH /hotplug/memory while the VM is
running. The required field is total_size_mib, which must be a multiple of
slot_size_mib (default 128 MiB). The guest kernel must support CONFIG_VIRTIO_MEM
(Linux 5.16 or later on x86_64, 5.18 or later on aarch64). Memory within this
pool can be offered to the guest with requested_size_mib at runtime; the guest
kernel decides how much to accept based on memhp_default_state. Boot memory
and hotplug memory are separate; only memory allocated in the hotplug pool can be
dynamically adjusted.
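The multiple-of-slot rule for total_size_mib is easy to check before sending the request. A minimal Python sketch, assuming only the constraint and the 128 MiB default stated above; hotplug_slots is a hypothetical helper.

```python
DEFAULT_SLOT_SIZE_MIB = 128


def hotplug_slots(total_size_mib, slot_size_mib=DEFAULT_SLOT_SIZE_MIB):
    """Return the number of slots, or raise if the sizes do not divide."""
    if slot_size_mib <= 0 or total_size_mib <= 0:
        raise ValueError("sizes must be positive")
    if total_size_mib % slot_size_mib != 0:
        raise ValueError(
            f"total_size_mib ({total_size_mib}) must be a multiple of "
            f"slot_size_mib ({slot_size_mib})")
    return total_size_mib // slot_size_mib
```

A 1024 MiB pool with the default slot size yields 8 slots, while a 1000 MiB pool is rejected.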
PUT /mmds-config configures the MicroVM Metadata Service — the in-process
datastore the guest queries at 169.254.169.254 via HTTP. It requires
network_interfaces, an array of iface_id strings naming which network
interfaces MMDS listens on. The optional version field selects "V1" (default,
no authentication) or "V2" (token-based sessions). MMDS is the subject of
Chapter 17.
Sources And Further Reading
- Firecracker OpenAPI spec (v1.17.0-dev): https://raw.githubusercontent.com/firecracker-microvm/firecracker/main/src/firecracker/swagger/firecracker.yaml
- Firecracker specification (boot SLA, pre-boot state machine): https://github.com/firecracker-microvm/firecracker/blob/main/SPECIFICATION.md
- Getting started guide: https://github.com/firecracker-microvm/firecracker/blob/main/docs/getting-started.md
- Reference VM config (tests/framework/vm_config.json): https://github.com/firecracker-microvm/firecracker/blob/main/tests/framework/vm_config.json
- CPU templates documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/cpu_templates/cpu-templates.md
- CPUID normalization documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/cpu_templates/cpuid-normalization.md
- Static template sources (x86_64: c3.rs, t2.rs, t2s.rs, t2cl.rs, t2a.rs): https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpu_config/x86_64/static_cpu_templates
- Static template sources (aarch64: v1n1.rs): https://github.com/firecracker-microvm/firecracker/tree/main/src/vmm/src/cpu_config/aarch64/static_cpu_templates
- Huge pages documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/hugepages.md
- Production host setup guide: https://github.com/firecracker-microvm/firecracker/blob/main/docs/prod-host-setup.md
- Logger documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/logger.md
- Metrics documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/metrics.md
- Actions documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/api_requests/actions.md
- PATCH block drive documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/api_requests/patch-block.md
- PATCH network interface documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/api_requests/patch-network-interface.md
- Firecracker CHANGELOG (v1.0.0 SMT rename, v1.5.0 static template deprecation, v1.7.0 huge pages, v1.14.0 pmem/hotplug, v1.16.0 MTU): https://github.com/firecracker-microvm/firecracker/blob/main/CHANGELOG.md
- Memory hotplug documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/memory-hotplug.md
- virtio-pmem documentation: https://github.com/firecracker-microvm/firecracker/blob/main/docs/pmem.md
- Firecracker Discussion #3092 (vCPU count limit, memory limits): https://github.com/firecracker-microvm/firecracker/discussions/3092
- Firecracker Discussion #3061 (rootfs overlay pattern): https://github.com/firecracker-microvm/firecracker/discussions/3061
- firecracker-containerd root-filesystem documentation: https://github.com/firecracker-microvm/firecracker-containerd/blob/main/docs/root-filesystem.md
- firectl options.go (flag-to-API mapping): https://raw.githubusercontent.com/firecracker-microvm/firectl/main/options.go
- Linux SMT sysfs ABI (/sys/devices/system/cpu/smt/control): https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu
- OASIS virtio 1.2 specification, section 5.2 (virtio-blk): https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html