Chapter 15: Boot And Configuration

By the time you call PUT /actions with action_type: "InstanceStart", the firecracker process has been running for perhaps a dozen milliseconds and has done almost nothing. It opened /dev/kvm, issued KVM_CREATE_VM, and started listening on a Unix domain socket. The guest CPU has never executed an instruction. What happens between process start and that first instruction is the subject of this chapter: a sequence of HTTP/1.1 API calls over a Unix socket that assembles every parameter the VMM needs to build a virtual machine from scratch — and hands control to the guest kernel in under 125 ms.

That number, 125 ms from InstanceStart to the start of the guest user-space /sbin/init process, is not an aspiration. It is a contractual requirement stated in SPECIFICATION.md, measured under specific conditions: serial console disabled, minimal kernel, minimal root filesystem. Clearing it constrains every API design decision. Configuration has to be complete before boot, not discovered during it. The API has to be stateless across calls, not transactional. The device model has to start without probing for hardware that does not exist.

The Transport Layer

Firecracker does not expose its API on a TCP port. Every request travels over a Unix domain socket (UDS) using HTTP/1.1, with Content-Type: application/json on both sides. The socket path is operator-chosen and passed to the firecracker binary via --api-sock; there is no default. The OpenAPI 2.0 swagger document at src/firecracker/swagger/firecracker.yaml declares host: localhost and schemes: [http] as artifacts of the spec format, not as TCP configuration.

The choice of a Unix socket over a TCP port is not accidental. The jailer creates the socket inside the chroot before execing firecracker; the host process that manages the VM holds the socket path and is the only caller that can reach it. There is no exposed port for a network scanner to find, no bind address to misconfigure, and no TCP stack overhead on what is already a localhost path. The swagger spec version in the repository as of June 2026 is 1.17.0-dev.
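The transport can be exercised from a few lines of standard-library Python. The following is a minimal sketch, not Firecracker code: the names UnixHTTPConnection and api_request are hypothetical helpers, and the socket path is whatever you passed to --api-sock.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def api_request(socket_path, method, path, body=None):
    """Send one JSON request; return (status, decoded body or None)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        headers = {"Accept": "application/json"}
        payload = None
        if body is not None:
            payload = json.dumps(body)
            headers["Content-Type"] = "application/json"
        conn.request(method, path, body=payload, headers=headers)
        response = conn.getresponse()
        raw = response.read()
        # Responses without a body decode to None.
        return response.status, (json.loads(raw) if raw else None)
    finally:
        conn.close()
```

A call such as api_request("/run/fc.sock", "GET", "/") would return the status code together with the decoded InstanceInfo object, assuming a firecracker process is listening on that path.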

The Pre-Boot State Machine

The API endpoints divide cleanly into two classes based on when they are valid. Most configuration calls are pre-boot only: PUT /boot-source, PUT /drives/{id}, PUT /machine-config, and PUT /network-interfaces/{id} all refuse to execute after the VM has started. PUT /actions with action_type: "InstanceStart" is the transition that moves the VMM from the "Not started" state to "Running". After that crossing, a different set of endpoints becomes valid: PATCH /drives/{id}, PATCH /network-interfaces/{id}, and PUT /actions with action_type: "FlushMetrics" or "SendCtrlAltDel".

flowchart LR
  A["Not started"] -->|"PUT /actions InstanceStart"| B["Running"]
  B -->|"PUT /actions SendCtrlAltDel\n(guest shuts down)"| C["Exited"]

GET / is valid in all states. It returns an InstanceInfo object with four required fields: app_name (always the string "Firecracker"), id (the operator-assigned or auto-generated instance identifier), state (one of "Not started", "Running", or "Paused"), and vmm_version (the build version of the running firecracker binary). The state strings matter when you are scripting: "Not started" is two words with a space, and the field for the build version is vmm_version, not firecracker_version.
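When scripting against GET /, it helps to check the response shape before acting on it. The sketch below works on an already-decoded response dictionary; the function names check_instance_info and can_configure are hypothetical, and the field names and state strings are the ones listed above.

```python
REQUIRED_FIELDS = ("app_name", "id", "state", "vmm_version")
KNOWN_STATES = ("Not started", "Running", "Paused")


def check_instance_info(info):
    """Validate a decoded GET / response and return its state string."""
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"InstanceInfo is missing fields: {missing}")
    if info["app_name"] != "Firecracker":
        raise ValueError(f"unexpected app_name: {info['app_name']!r}")
    if info["state"] not in KNOWN_STATES:
        raise ValueError(f"unexpected state: {info['state']!r}")
    return info["state"]


def can_configure(info):
    """Pre-boot PUT calls are only valid while the state is "Not started"."""
    return check_instance_info(info) == "Not started"
```

Comparing against the exact string "Not started" rather than a normalized form keeps the check honest about the space in the state name.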

Configuring The Boot Source

PUT /boot-source sets three fields. Only one is required.

kernel_image_path is an absolute path on the host filesystem to the uncompressed kernel image. On x86_64 that means vmlinux, an ELF binary typically in the 4–8 MB range. On aarch64 it means a PE-formatted Image file. The kernel must be uncompressed because Firecracker does not implement a bootloader — it loads the kernel image directly into guest memory and launches it via one of two boot protocols. When the kernel exports a PVH entry point (pvh_boot_cap), Firecracker uses PVH boot (BootProtocol::PvhBoot), which is the protocol its tuned kernels use. If no PVH entry point is present, Firecracker falls back to the Linux 64-bit boot protocol (BootProtocol::LinuxBoot). The guest memory load region starts at 0x100000 (HIMEM_START), but the entry address differs between the two protocols. There is no GRUB, no BIOS, no real-mode stage.

boot_args is the kernel command line. The API does not enforce a default; if you omit the field, no command-line arguments are passed. In practice, a working command line for a block-device boot looks like:

ro console=ttyS0 noapic reboot=k panic=1 pci=off nomodules

The pci=off flag suppresses PCI bus enumeration entirely: there is no PCI bus to find, but without the flag some kernel configurations will wait for one. reboot=k tells the kernel to send a keyboard controller reset on reboot, which Firecracker intercepts and converts to a clean VMM exit. nomodules is appropriate here because a Firecracker guest kernel carries no loadable modules; the compiled-in device drivers are all that exist. firectl, the thin Go wrapper around the Firecracker API, uses this command line as its default when --kernel-opts is not specified.

initrd_path is optional. Set it to a host path to load an initramfs image into guest memory before jumping to the kernel entry point. Set it to null or omit it entirely for a direct boot into the root block device. The distinction matters for the required kernel configuration: initrd-only boot does not need CONFIG_ACPI=y, CONFIG_PCI=y, or CONFIG_VIRTIO_BLK=y, while a block-device boot requires all three.

PUT /boot-source accepts only PUT — no PATCH, no GET. A second PUT before InstanceStart replaces the previous configuration entirely. The semantics are idempotent in the replace-the-whole-thing sense, not in the merge-fields sense.
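Because a second PUT replaces the whole configuration, a caller has to resend every field it wants to keep. A minimal sketch of a body builder follows; boot_source_body is a hypothetical helper, and the absolute-path check is a client-side convenience of this sketch rather than documented server behavior.

```python
import os


def boot_source_body(kernel_image_path, boot_args=None, initrd_path=None):
    """Build the JSON body for PUT /boot-source.

    Only kernel_image_path is required. Optional fields are left out of the
    body entirely when not given, rather than sent as empty strings.
    """
    if not os.path.isabs(kernel_image_path):
        raise ValueError("kernel_image_path must be an absolute host path")
    body = {"kernel_image_path": kernel_image_path}
    if boot_args is not None:
        body["boot_args"] = boot_args
    if initrd_path is not None:
        body["initrd_path"] = initrd_path
    return body


# Replace semantics: to add an initrd later, rebuild the full body,
# repeating the kernel path and boot_args from the first request.
first = boot_source_body("/path/to/vmlinux", boot_args="console=ttyS0 pci=off")
second = boot_source_body(
    first["kernel_image_path"],
    boot_args=first["boot_args"],
    initrd_path="/path/to/initrd.img",
)
```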

Machine Configuration

PUT /machine-config — or its field-merging sibling PATCH /machine-config, which is also pre-boot only — controls the virtual hardware the guest sees before it executes its first instruction: how many CPUs, how much memory, whether hyperthreading is advertised, and which CPU template shapes the CPUID leaves.

vcpu_count is required. The minimum is 1; the maximum is 32, a hard limit enforced in the Firecracker source and declared with maximum: 32 in the swagger schema. The KVM_CREATE_VCPU ioctl is called once per vCPU, and each vCPU becomes one host thread, so 32 is also the ceiling on vCPU threads in the VMM process (the API and VMM threads come on top of that).

mem_size_mib is required. The swagger schema carries no minimum constraint. The practical floor is 128 MiB, the value the FAQ documents as its default; below that, the Linux boot sequence may fail before reaching init. The effective ceiling is set by KVM_MEM_MAX_NR_PAGES, a KVM constant that depends on host kernel version and architecture.

smt is a boolean, optional, defaulting to false, and valid only on x86_64. Setting it to true on aarch64 returns a 400. When true, Firecracker presents a CPU/cache topology to the guest that reflects simultaneous multithreading — concretely, the topology structures visible via cpuid leaf 0xb will indicate that threads share a core. This field was named ht_enabled before Firecracker v1.0.0; the rename to smt happened in that release, and the field became optional with a default of false.

Before setting smt: true in production: the Firecracker production host setup guide recommends disabling SMT at the host level in multi-tenant deployments because "SMT is frequently a precondition for speculation issues utilized in side channel attacks such as Spectre variants and MDS." The host kernel exposes SMT control at /sys/devices/system/cpu/smt/control; writing off to that file disables SMT host-wide. The forceoff and notsupported values are read-only and set by the kernel, not the operator. Guest-visible SMT and host SMT are independent settings; you can present SMT to the guest while running the host with SMT disabled, but the presented topology will not reflect real hardware.

track_dirty_pages enables KVM's dirty bitmap for the guest's memory slot. When true, KVM_GET_DIRTY_LOG can be used to identify which 4 KiB pages the guest has written since the last flush — the mechanism that underpins live migration and snapshots. Enabling dirty tracking at 4 KiB granularity has a measurable cost when combined with huge pages, discussed below.

huge_pages takes the string enum "None" (the default) or "2M". When set to "2M", Firecracker backs the guest memory slot with 2 MB hugetlbfs pages instead of 4 KiB base pages. 1 GB pages are not supported. This feature was added as a developer preview in Firecracker v1.7.0. The boot time improvement in Firecracker's own benchmarks reached up to 50%, because 2 MB pages mean fewer TLB misses during the kernel's initial page walks across guest memory.

Three caveats apply to huge_pages: "2M". First, the host must pre-allocate a large enough 2 MB huge page pool before the VM starts; Firecracker maps guest memory with MAP_NORESERVE, so an undersized pool causes SIGBUS at access time, not at map time. Second, snapshots of a hugepage-backed VM can only be restored via userfaultfd (UFFD); the standard mmap-based restore path does not support them. Third — and this interaction is easy to miss — setting track_dirty_pages: true alongside huge_pages: "2M" destroys the performance benefit: KVM reverts to 4 KiB PTE granularity for dirty tracking, which is exactly what 2 MB pages were bought to avoid.
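The constraints in this section are easy to check on the client before the request is sent. The sketch below is one way to do that under stated assumptions: machine_config_body is a hypothetical helper, its arch parameter is a local input (not an API field), and the warnings simply restate the caveats above rather than reproduce any server-side validation.

```python
def machine_config_body(vcpu_count, mem_size_mib, arch="x86_64", smt=False,
                        track_dirty_pages=False, huge_pages="None"):
    """Build a PUT /machine-config body and collect advisory warnings."""
    if not 1 <= vcpu_count <= 32:
        raise ValueError("vcpu_count must be between 1 and 32")
    if smt and arch != "x86_64":
        raise ValueError("smt is only valid on x86_64")
    if huge_pages not in ("None", "2M"):
        raise ValueError('huge_pages must be "None" or "2M"')

    warnings = []
    if mem_size_mib < 128:
        warnings.append("mem_size_mib below 128 MiB may fail before init")
    if track_dirty_pages and huge_pages == "2M":
        warnings.append("dirty tracking negates the 2M huge page benefit")

    body = {
        "vcpu_count": vcpu_count,
        "mem_size_mib": mem_size_mib,
        "smt": smt,
        "track_dirty_pages": track_dirty_pages,
        "huge_pages": huge_pages,
    }
    return body, warnings
```

Returning warnings instead of raising keeps the helper usable for snapshot workflows, where combining dirty tracking with huge pages may be a deliberate trade.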

CPU Templates

cpu_template accepts a string enum selecting a static CPU template, which shapes the CPUID leaves and MSRs the guest sees. Static templates have been deprecated since Firecracker v1.5.0 in favor of PUT /cpu-config, which accepts a JSON structure with architecture-specific overrides. If both a static template and a custom template are configured, whichever was set most recently wins.

The official documentation is explicit that "CPU templates shall not be used as a security protection against malicious guests." They exist for fleet homogeneity, not for isolation.

The available static templates and their target hardware are:

| Template | Architecture | Target host CPUs |
|---|---|---|
| C3 | x86_64 (Intel) | Skylake, Cascade Lake, Ice Lake |
| T2 | x86_64 (Intel) | Skylake, Cascade Lake, Ice Lake |
| T2S | x86_64 (Intel) | Skylake, Cascade Lake |
| T2CL | x86_64 (Intel) | Cascade Lake, Ice Lake |
| T2A | x86_64 (AMD) | Milan |
| V1N1 | aarch64 (ARM) | Neoverse V1, presented as Neoverse N1 |
| None | both | No template applied |

T2CL and T2A together present a homogeneous instruction-set surface across Intel Cascade Lake and AMD Milan hosts in a mixed fleet — the same guest binary runs identically on either without branching on CPU vendor.

On x86_64, what these templates actually do is set specific CPUID leaf values and MSR contents. The C3 template sets CPUID leaf 0x1 EAX to 0x000306e4, which decodes to family 6, model 0x3e — the Ivy Bridge / Xeon E5 v2 signature. T2 sets the same leaf to 0x000306F2 (family 6, model 0x3f, Haswell). T2S sets CPUID leaf 0x1 EAX to the same value — 0x000306F2, family 6, model 0x3f — so T2 and T2S present an identical CPU version identifier to the guest. The distinction between them is in ECX/EDX feature bits on other leaves. T2S additionally writes MSR IA32_ARCH_CAPABILITIES at address 0x10A to 0x000000000C080C4C, exposing the specific capability bits needed for safe snapshot migration between Skylake and Cascade Lake hosts.
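The family and model numbers quoted above follow from the standard layout of CPUID leaf 0x1 EAX. A short decoder makes the arithmetic checkable; decode_cpuid_leaf1_eax is a hypothetical name, and the function implements only the family, model, and stepping fields.

```python
def decode_cpuid_leaf1_eax(eax):
    """Split CPUID leaf 0x1 EAX into display family, model, and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble for families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}
```

Feeding it 0x000306E4 yields family 6, model 0x3e, stepping 4; 0x000306F2 yields family 6, model 0x3f, stepping 2.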

The V1N1 aarch64 template modifies four system registers to make a Neoverse V1 host look like a Neoverse N1 to the guest: ID_AA64PFR0_EL1 (clears SVE at bits 35:32 and DIT at bits 51:48), ID_AA64ISAR0_EL1 (clears SHA3, SM3, SM4, ASIMDFHM, FLAGM, and RND; sets SHA2 to 0b0001), ID_AA64ISAR1_EL1 (clears JSCVT, FCMA, BF16, DGH, and I8MM; sets DPB to 0b0001 and LRCPC to 0b0001), and ID_AA64MMFR2_EL1 (clears USCAT at bits 35:28). The effect is that code checking these ID registers on a V1 host sees the N1 feature set — the guest binaries compiled for N1 run without modification.

Regardless of which template is selected, Firecracker applies CPUID normalization on x86_64 after template application, so a template-set bit can be overwritten by normalization. Leaves touched on all CPUs include 0x0, 0x1, 0xb, 0x80000005, and 0x80000006. On Intel hosts, normalization additionally touches leaves 0x4, 0x6, 0x7, 0xa, 0x1f, and 0x80000002–0x80000004. On AMD, it touches 0x7, 0x80000001, 0x80000002–0x80000004, 0x80000008, 0x8000001d, and 0x8000001e. The practical effects include disabling Intel Turbo Boost, disabling performance monitoring counters, setting the HYPERVISOR bit in leaf 0x1 ECX, and enabling TSC_DEADLINE for APIC timer use.

For operators who need finer control than the static templates offer, PUT /cpu-config accepts a JSON body with cpuid_modifiers, msr_modifiers, and kvm_capabilities on x86_64, or reg_modifiers, vcpu_features, and kvm_capabilities on aarch64. Individual bits are expressed with a bitmap notation where "x" means pass-through (inherit from hardware), "0" forces the bit clear, and "1" forces it set. Underscore characters in bit strings are visual separators only.
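The bitmap notation is easiest to understand by applying it to a value. The sketch below shows the x/0/1 semantics described above; apply_bitmap is a hypothetical helper, and two details are assumptions of the sketch rather than statements about Firecracker's parser: that the leftmost character is the most significant bit, and that an optional "0b" prefix may precede the bits.

```python
def apply_bitmap(bitmap, host_value):
    """Apply an x/0/1 bitmap string to a host register value.

    Assumes the leftmost character is the most significant bit and that an
    optional "0b" prefix may be present (both are assumptions of this sketch).
    """
    bits = bitmap[2:] if bitmap.startswith("0b") else bitmap
    bits = bits.replace("_", "")  # underscores are visual separators only
    result = host_value
    for position, char in enumerate(reversed(bits)):
        mask = 1 << position
        if char == "1":
            result |= mask       # force the bit set
        elif char == "0":
            result &= ~mask      # force the bit clear
        elif char != "x":        # "x" passes the host bit through unchanged
            raise ValueError(f"invalid bitmap character: {char!r}")
    return result
```

For example, apply_bitmap("0bxxxx_01x1", 0b1010_1010) returns 0b1010_0111: the upper nibble passes through, bit 3 is cleared, bits 2 and 0 are set, and bit 1 keeps its host value.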

Drives And The Root Filesystem

Every storage device the guest sees is a virtio-block device. Firecracker implements the OASIS virtio 1.2 specification's block device type (Device ID 2, section 5.2). The guest addresses all block devices at sector granularity of 512 bytes; the sector field in every virtio-blk request is a 512-byte offset into the device.

PUT /drives/{drive_id} attaches a drive. The drive_id path parameter and the matching body field together identify the device. is_root_device: true marks the drive that the guest kernel will find as /dev/vda — the paravirtualized block device that holds the root filesystem. Exactly one drive may carry is_root_device: true.

path_on_host is the absolute host filesystem path to the backing block file — a raw image, an ext4 file, a squashfs image. The file must exist and be readable by the firecracker process at the time of the PUT call.

is_read_only sets the virtio feature bit VIRTIO_BLK_F_RO (bit 5). When this bit is advertised, write requests from the guest driver must fail with VIRTIO_BLK_S_IOERR; the host-side device emulator refuses to pass any write through to the backing file. The guest kernel detects the read-only state during feature negotiation, before any I/O is attempted.

io_engine selects between "Sync" (the default, using standard blocking file I/O) and "Async" (added in Firecracker v1.0.0, backed by io_uring). The async engine reduces per-operation latency under load by avoiding the host thread context switch per I/O, at the cost of requiring io_uring support in the host kernel (Linux 5.1 or later).

cache_type selects "Unsafe" (default, writeback not guaranteed to be flushed) or "Writeback", which advertises the VIRTIO_BLK_F_FLUSH feature bit (bit 9) so that the guest driver can issue explicit flush requests and get writeback cache semantics. The naming is deliberate: "Unsafe" is the correct choice when the backing file is on a fast local SSD whose durability you do not need to guarantee; "Writeback" is correct when the guest's writes must survive a host crash.

PATCH /drives/{drive_id} is post-boot only. It allows updating path_on_host and the rate limiter fields, and nothing else. is_read_only and is_root_device cannot be patched. This is not hot-plug of a new device; the virtio device the guest sees is the same device, with a different backing file on the host side. The use case is swapping the data volume of a running VM without the guest noticing a device disappear and reappear.
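A client can refuse non-patchable fields before the request ever reaches the socket. In the sketch below, drive_patch_body is a hypothetical helper, and the body key rate_limiter is an assumption standing in for "the rate limiter fields" mentioned above.

```python
# Fields a post-boot PATCH may change; "rate_limiter" is an assumed key name.
PATCHABLE_DRIVE_FIELDS = frozenset({"path_on_host", "rate_limiter"})


def drive_patch_body(drive_id, **changes):
    """Build a PATCH /drives/{drive_id} body, rejecting fixed fields."""
    if not changes:
        raise ValueError("a PATCH needs at least one field to change")
    rejected = sorted(set(changes) - PATCHABLE_DRIVE_FIELDS)
    if rejected:
        raise ValueError(f"fields cannot be patched after boot: {rejected}")
    return {"drive_id": drive_id, **changes}
```

Swapping the backing file of a data volume would then be drive_patch_body("data", path_on_host="/path/to/new-data.ext4"), while an attempt to pass is_read_only raises before any I/O happens.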

The Root Filesystem As A Block Device

The canonical setup for a Firecracker VM assigns an ext4 image to the root drive and passes appropriate boot arguments to the kernel:

PUT /drives/rootfs
{
  "drive_id": "rootfs",
  "is_root_device": true,
  "path_on_host": "/path/to/rootfs.ext4",
  "is_read_only": false,
  "io_engine": "Sync",
  "cache_type": "Unsafe"
}

PUT /boot-source
{
  "kernel_image_path": "/path/to/vmlinux",
  "boot_args": "root=/dev/vda console=ttyS0 reboot=k panic=1 pci=off"
}

The root=/dev/vda argument tells the kernel which device holds the root filesystem. Because virtio-blk devices appear as /dev/vda, /dev/vdb, and so on in the order they were attached, the root drive is always /dev/vda when is_root_device: true was set.

Read-Only Base Plus Overlay

When thousands of VMs share the same base operating system image, copying a multi-gigabyte root filesystem into each VM's backing file is expensive in both time and disk space. The standard pattern is to use a single read-only base image, attach a per-VM overlay drive for writes, and wire them together inside the guest with the Linux overlay filesystem.

The setup uses two drives:

flowchart TB
  A["base-rootfs.squashfs\n(is_read_only: true)\nguest: /dev/vda"] --> C["overlay filesystem in guest\n(mount -t overlay)"]
  B["overlay-vm-N.ext4\n(is_read_only: false)\nguest: /dev/vdb"] --> C
  C --> D["pivot_root → new /"]

  1. Attach the base image — typically a squashfs image — as the root drive with is_read_only: true. The guest will see it as /dev/vda.

  2. Attach a per-VM sparse ext4 file as a second drive with is_read_only: false. The guest sees it as /dev/vdb.

  3. Set boot_args to include init=/sbin/overlay-init overlay_root=vdb.

  4. The /sbin/overlay-init script in the base image mounts the overlay ext4 at /overlay, then calls mount -t overlay overlay -o lowerdir=/,upperdir=/overlay/root,workdir=/overlay/work /mnt, and calls pivot_root to switch to the merged view. After pivot_root, the running system's / is the overlay, writes go to the ext4 file on /dev/vdb, and the squashfs base is never modified.

The directories /overlay/root, /overlay/work, /mnt, and /rom must be pre-created inside the squashfs image, because the base filesystem is read-only at runtime and cannot be written during init.

For a temporary overlay — state lost on VM termination — replace overlay_root=vdb with overlay_root=ram or omit it; a tmpfs layer handles writes and vanishes with the process. For a persistent overlay, overlay_root=vdb directs writes to the ext4 sparse file, which survives VM termination. The sparse file starts near empty and grows only as the guest writes; a typical minimal workload accumulates tens of megabytes per VM rather than gigabytes.
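The host-side preparation for a persistent overlay amounts to creating the sparse file and composing the boot arguments. The sketch below covers only those two steps, with hypothetical helper names; it does not put an ext4 filesystem on the file, which still has to happen (for example with mkfs.ext4) before the guest can mount it.

```python
import os


def create_sparse_file(path, size_bytes):
    """Create a file whose apparent size is size_bytes without writing data."""
    with open(path, "wb") as handle:
        # truncate() extends the file; on filesystems with sparse-file
        # support, blocks are allocated only when data is written later.
        handle.truncate(size_bytes)


def overlay_boot_args(base_args, overlay_root=None):
    """Append the overlay-init arguments to an existing command line.

    overlay_root=None leaves the argument out, which selects the
    temporary tmpfs-backed overlay described above.
    """
    args = f"{base_args} init=/sbin/overlay-init"
    if overlay_root is not None:
        args += f" overlay_root={overlay_root}"
    return args
```

With these, a per-VM setup might call create_sparse_file("/path/to/overlay-vm-1.ext4", 1024 * 1024 * 1024) and pass overlay_boot_args("console=ttyS0 reboot=k panic=1 pci=off", "vdb") as boot_args; the file name is illustrative.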

Network Interfaces

PUT /network-interfaces/{iface_id} connects the guest to the host network. Firecracker requires the TAP device to already exist on the host before the API call; the VMM does not create it. The host_dev_name field names the TAP interface, and Firecracker opens it by name and holds the file descriptor for the lifetime of the VM.

Before running this step: creating a TAP device and attaching it to a bridge requires root (or CAP_NET_ADMIN) on the host. The sequence is: ip tuntap add dev tap0 mode tap and ip link set tap0 up. When running under the jailer, the jailer creates the TAP before exec'ing firecracker, because the chroot eliminates access to /dev/net/tun after exec.

guest_mac is optional. If omitted, Firecracker generates a MAC address deterministically from the interface index. Specifying it explicitly is necessary when the address must be stable across VM recreation or must match a DHCP reservation.

mtu was added in Firecracker v1.16.0. It sets the MTU advertised to the guest via VIRTIO_NET_F_MTU. The valid range is 68–65535. If omitted, the guest uses the default MTU negotiated by the virtio-net driver. In practice, setting this to match the host's physical interface MTU (typically 1500, or 9000 for jumbo frames) avoids silent fragmentation at the host TAP boundary.

rx_rate_limiter and tx_rate_limiter attach token-bucket rate limiters to the ingress and egress paths. Both fields accept a RateLimiter object containing two independent TokenBucket sub-objects: bandwidth (measured in bytes per second) and ops (measured in operations per second). Each bucket has three fields: size sets the bucket capacity, one_time_burst sets an initial fill that allows bursting above the steady-state rate, and refill_time sets the time in milliseconds to refill the bucket from zero to size. The steady-state rate is size / refill_time. Setting both size and refill_time to 0 disables the limiter.
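Because refill_time is in milliseconds, size / refill_time gives units per millisecond, and the per-second rate is that figure times 1000. The sketch below turns a target rate into a bucket and back; token_bucket and steady_state_rate are hypothetical helpers, and the 100 ms refill interval in the example is an arbitrary choice.

```python
def token_bucket(rate_per_second, refill_time_ms=1000, one_time_burst=0):
    """Size a bucket so that its steady-state rate matches rate_per_second."""
    size = rate_per_second * refill_time_ms // 1000
    return {
        "size": size,
        "one_time_burst": one_time_burst,
        "refill_time": refill_time_ms,
    }


def steady_state_rate(bucket):
    """Return units per second, or None when the limiter is disabled."""
    if bucket["size"] == 0 and bucket["refill_time"] == 0:
        return None  # both zero disables the limiter
    if bucket["refill_time"] == 0:
        raise ValueError("refill_time must be non-zero when size is set")
    return bucket["size"] * 1000 / bucket["refill_time"]


# 10 MB/s of bandwidth and 1000 ops/s, each refilled over 100 ms.
rx_rate_limiter = {
    "bandwidth": token_bucket(10_000_000, refill_time_ms=100),
    "ops": token_bucket(1_000, refill_time_ms=100),
}
```

A shorter refill_time with a proportionally smaller size gives the same steady-state rate but a smaller bucket, which limits how far traffic can burst above it.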

PATCH /network-interfaces/{iface_id} is post-boot only and accepts only the rate limiter fields. The host_dev_name and guest_mac cannot be changed on a running VM; those are part of the virtio device identity that the guest negotiated at boot.

Logging

Firecracker's logger does not emit a word until it is explicitly initialized. The process starts, opens its API socket, and waits, silently. PUT /logger performs that initialization, and it can only be called once: a second call returns 400, and once the logger is initialized there is no API to reconfigure it.

Before calling PUT /logger: the destination file or named pipe must already exist on the filesystem. For a named pipe: mkfifo /tmp/fc-log.fifo. For a regular file: touch /tmp/fc-log.txt. The firecracker process must be able to open the path for writing; inside the jailer's chroot, the path must be inside the chroot directory.

The logger schema carries five fields. log_path is the only one without a default; it must name a pre-existing file or named pipe. level defaults to "Info" and accepts Error, Warning, Info, Debug, Trace, and Off (case-insensitive). show_level (default false) prepends the level string to each log line. show_log_origin (default false) prepends the Rust source file path and line number — useful when debugging, expensive when processing logs at scale. module filters log output to a specific Rust module path, for example "api_server::request" to trace only incoming API traffic.
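The two preconditions, an existing destination and a well-formed body, can be handled together on the caller side. The following sketch assumes a POSIX host; prepare_fifo and logger_body are hypothetical helpers, and the level check mirrors the case-insensitive list above rather than any server code.

```python
import os
import stat

LOG_LEVELS = ("error", "warning", "info", "debug", "trace", "off")


def prepare_fifo(path):
    """Create the named pipe if absent and confirm the path is a FIFO."""
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    if not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError(f"{path} exists but is not a named pipe")
    return path


def logger_body(log_path, level="Info", show_level=False,
                show_log_origin=False, module=None):
    """Build the PUT /logger body from the five schema fields."""
    if level.lower() not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    body = {
        "log_path": log_path,
        "level": level,
        "show_level": show_level,
        "show_log_origin": show_log_origin,
    }
    if module is not None:
        body["module"] = module
    return body
```

Something must also read from the pipe; a FIFO with no consumer is exactly the situation in which log entries end up dropped.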

The logger can also be configured via CLI flags — --log-path, --level, --show-level, --show-log-origin — passed to the firecracker binary at startup. The CLI path and the PUT /logger API path are mutually exclusive; use one or the other per process.

Firecracker produces no log output between jailer startup and logging-system initialization — a window that covers seccomp filter installation, the chroot, and the initial API socket setup. If the named pipe fills because the consumer is slow, Firecracker drops log entries silently and increments the lost-logs metric counter. A blocked log consumer starves you of operational visibility without any visible error.

Metrics

Where the logger emits human-readable lines, the metrics subsystem emits machine-readable JSON. PUT /metrics initializes it, and like the logger, it can only be called once — a second call returns 400.

Before calling PUT /metrics: the destination must already exist. Create it with mkfifo /tmp/fc-metrics.fifo or touch /tmp/fc-metrics.txt.

The schema has one field: metrics_path, the path to the named pipe or file that receives JSON metric objects.

Metrics are flushed in two ways. Automatically, every 60 seconds, Firecracker writes one JSON object containing all current counters. On demand, PUT /actions with action_type: "FlushMetrics" triggers an immediate flush without waiting for the 60-second timer.

Each flush emits a single JSON object with 21 top-level category keys: api_server, balloon, block, deprecated_api, entropy, get_api_requests, i8042, latencies_us, logger, mmds, net, patch_api_requests, put_api_requests, rtc, seccomp, signals, uart, vcpu, vhost_user_block, vmm, and vsock. All 21 categories are emitted on every flush regardless of whether the associated device is present; if you have no balloon device, the balloon category still appears, with zero values. The naming convention within each category is consistent: counters with _bytes or _bytes_count are byte quantities, _ms is milliseconds, _us is microseconds, and bare names without a suffix are event counts.

The metrics configuration is not captured in a VM snapshot and is not restored from one. A VM restored from a snapshot must reconfigure its metrics endpoint from scratch.

The back-pressure behavior mirrors the logger: if the named pipe is full, metric flush events are dropped and the lost-metrics counter is incremented.
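A consumer on the other end of the pipe can sanity-check each flush against the category list above. This sketch assumes each flush arrives as one JSON object per line, which is an assumption about framing; check_flush is a hypothetical helper, and any top-level key outside the 21 categories is merely reported, not treated as an error.

```python
import json

EXPECTED_CATEGORIES = frozenset({
    "api_server", "balloon", "block", "deprecated_api", "entropy",
    "get_api_requests", "i8042", "latencies_us", "logger", "mmds", "net",
    "patch_api_requests", "put_api_requests", "rtc", "seccomp", "signals",
    "uart", "vcpu", "vhost_user_block", "vmm", "vsock",
})


def check_flush(line):
    """Decode one flushed JSON object and report category mismatches."""
    flush = json.loads(line)
    keys = set(flush)
    return {
        "missing": sorted(EXPECTED_CATEGORIES - keys),
        "other": sorted(keys - EXPECTED_CATEGORIES),
    }
```

An empty "missing" list on every line is a cheap signal that the consumer is keeping up and parsing whole objects rather than fragments.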

Starting The VM

With boot source, drives, network interfaces, and optionally logger and metrics configured, the VM is ready to start. The call is:

PUT /actions
{
  "action_type": "InstanceStart"
}

This is the point of no return. InstanceStart can only be called once per firecracker process lifecycle; there is no API to stop and restart a running VM. The VMM issues KVM_SET_USER_MEMORY_REGION to map the guest memory into the VM, issues KVM_CREATE_VCPU for each configured vCPU, loads the kernel into guest memory starting at 0x100000 (HIMEM_START), selects the boot protocol (PVH when the kernel exports a PVH entry point, Linux 64-bit otherwise), sets up the initial register state for that protocol, and calls KVM_RUN. The guest CPU begins executing.

From that moment, the clock is running against the 125 ms SLA. Under the specified conditions — serial console absent from boot_args, Firecracker-tuned kernel, minimal root filesystem — the kernel traverses its init path, probes the virtio-MMIO devices (no PCI scan, no timeout), mounts the root filesystem, and forks /sbin/init inside that budget.

PUT /actions also exposes two post-boot operations. "FlushMetrics" triggers an immediate metrics flush as described above. "SendCtrlAltDel" emits a CTRL+ALT+DEL sequence through the emulated i8042 keyboard controller; this is x86_64 only and requires the guest kernel to have been compiled with CONFIG_SERIO_I8042 and CONFIG_KEYBOARD_ATKBD. The guest receives the signal, initiates its orderly shutdown, and when the guest CPU issues a reset instruction, Firecracker exits the process. There is no API-level pause or resume of a running VM without snapshot machinery (covered in Chapter 16).

The Config-File Shortcut

Issuing API calls individually requires a caller that can speak HTTP/1.1 over a Unix socket and sequence its requests correctly. For scripting and for integration tests, firecracker accepts a --config-file flag pointing to a JSON file that expresses the same configuration in one shot.

The file's JSON structure mirrors the API endpoints, with one naming convention worth internalizing: top-level keys use hyphens (boot-source, machine-config, network-interfaces, cpu-config) while nested field names use snake_case (kernel_image_path, vcpu_count, host_dev_name). The hyphens come from API URL path segments; the underscores come from JSON body field names. Mixing these up produces a parse error with no helpful message.

At minimum, the file must contain boot-source.kernel_image_path and at least one entry in drives with is_root_device: true. All other sections are optional. When --config-file is supplied without --no-api, the Unix domain socket is still created and remains available for subsequent API calls — so config-file and runtime API are not mutually exclusive.

The full set of top-level keys the config file accepts as of Firecracker v1.14 and later: boot-source, drives, machine-config, cpu-config, balloon, network-interfaces, vsock, logger, metrics, mmds-config, entropy, pmem, and memory-hotplug.
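The hyphen-versus-underscore convention is easier to keep straight when the file is generated rather than hand-written. The sketch below builds a minimal configuration as a Python dictionary; build_config and check_minimum are hypothetical helpers, and the drive_id value "rootfs" is an illustrative choice.

```python
import json


def build_config(kernel_image_path, rootfs_path, boot_args=None,
                 vcpu_count=None, mem_size_mib=None):
    """Build a minimal --config-file document as a dictionary."""
    config = {
        # Top-level keys use hyphens; nested field names use snake_case.
        "boot-source": {"kernel_image_path": kernel_image_path},
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }],
    }
    if boot_args is not None:
        config["boot-source"]["boot_args"] = boot_args
    # machine-config needs both required fields, so add it only as a pair.
    if vcpu_count is not None and mem_size_mib is not None:
        config["machine-config"] = {
            "vcpu_count": vcpu_count,
            "mem_size_mib": mem_size_mib,
        }
    return config


def check_minimum(config):
    """Check for a kernel path and a drive marked as the root device."""
    has_kernel = bool(config.get("boot-source", {}).get("kernel_image_path"))
    has_root = any(d.get("is_root_device") for d in config.get("drives", []))
    return has_kernel and has_root
```

Writing the result with json.dump(config, handle, indent=2) produces a file suitable for --config-file, and check_minimum catches the common slip of spelling a top-level key as boot_source.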

sequenceDiagram
    participant C as Caller
    participant F as "firecracker"
    participant K as KVM

    C->>F: --api-sock /run/fc.sock [--config-file]
    Note over F: KVM_CREATE_VM, API socket ready (~12 ms)
    C->>F: PUT /logger
    C->>F: PUT /metrics
    C->>F: PUT /machine-config
    C->>F: PUT /boot-source
    C->>F: PUT /drives/rootfs
    C->>F: PUT /network-interfaces/eth0
    C->>F: PUT /actions {"action_type":"InstanceStart"}
    F->>K: KVM_SET_USER_MEMORY_REGION
    F->>K: KVM_CREATE_VCPU (×vcpu_count)
    F->>K: KVM_RUN
    Note over F: Guest executing, state = Running

Beyond The Core Four

Several additional pre-boot endpoints extend the VM beyond a bare compute node: host-guest communication without a network stack, memory pressure signaling, persistent memory, dynamic memory pools, and the metadata service that replaces IMDS in the guest.

PUT /vsock attaches a virtio-vsock device for host-guest communication without a network stack. guest_cid (minimum 3; CIDs 0–2 are reserved) and uds_path are required. The vsock device appears to the guest as a standard AF_VSOCK socket endpoint.

PUT /balloon attaches the virtio-balloon device. The required fields are amount_mib and deflate_on_oom. The optional stats_polling_interval_s (0 to disable) controls how often the balloon driver reports guest memory statistics to the VMM. The balloon device cannot reclaim hugepage-backed guest memory: the 4 KiB granularity of the balloon protocol does not align with 2 MB huge pages.

PUT /pmem/{id}, added in Firecracker v1.14.0, attaches a virtio-pmem device exposing a host file as persistent memory in the guest. The guest kernel requires CONFIG_VIRTIO_PMEM, CONFIG_LIBNVDIMM, CONFIG_BLK_DEV_PMEM, and CONFIG_DAX. Devices appear as /dev/pmem0, /dev/pmem1, and so on. The backing file must be 2 MB aligned.

PUT /hotplug/memory, also added in v1.14.0, allocates a pool of memory available for dynamic adjustment via PATCH /hotplug/memory while the VM is running. The required field is total_size_mib, which must be a multiple of slot_size_mib (default 128 MiB). The guest kernel must support CONFIG_VIRTIO_MEM (Linux 5.16 or later on x86_64, 5.18 or later on aarch64). Memory within this pool can be offered to the guest with requested_size_mib at runtime; the guest kernel decides how much to accept based on memhp_default_state. Boot memory and hotplug memory are separate; only memory allocated in the hotplug pool can be dynamically adjusted.
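The multiple-of-slot-size rule is simple arithmetic, and rounding up on the client avoids a rejected request. In the sketch below, round_up_to_slot and hotplug_memory_body are hypothetical helpers; the field names and the 128 MiB default come from the paragraph above.

```python
def round_up_to_slot(size_mib, slot_size_mib=128):
    """Round size_mib up to the next multiple of slot_size_mib."""
    if size_mib <= 0 or slot_size_mib <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division using integers only.
    return -(-size_mib // slot_size_mib) * slot_size_mib


def hotplug_memory_body(total_size_mib, slot_size_mib=128):
    """Build a PUT /hotplug/memory body, enforcing the slot multiple."""
    if total_size_mib <= 0 or slot_size_mib <= 0:
        raise ValueError("sizes must be positive")
    if total_size_mib % slot_size_mib != 0:
        raise ValueError("total_size_mib must be a multiple of slot_size_mib")
    return {"total_size_mib": total_size_mib, "slot_size_mib": slot_size_mib}
```

A request for 1000 MiB would be rejected by the rule; round_up_to_slot(1000) gives 1024, which is eight 128 MiB slots.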

PUT /mmds-config configures the MicroVM Metadata Service — the in-process datastore the guest queries at 169.254.169.254 via HTTP. It requires network_interfaces, an array of iface_id strings naming which network interfaces MMDS listens on. The optional version field selects "V1" (default, no authentication) or "V2" (token-based sessions). MMDS is the subject of Chapter 17.

Sources And Further Reading