Using ftrace to trace function calls from qemu-guest-agent
Posted on Wed 21 August 2019 in hints-and-kinks
When you are using functionality that is buried deep in the Linux
kernel, ftrace can be
extremely useful. Here are some suggestions on how to use it, using
the example of tracing function calls from qemu-guest-agent.
What’s this about?
Recently I used, for the first time, libvirt’s functionality to indicate to a virtual guest that it is about to have a point-in-time copy of its disks — a snapshot — taken. In doing so, it can tell the virtual machine (VM) to freeze I/O on all its mounted filesystems.
The rationale behind this is, I hope, obvious: you want the VM to momentarily stop I/O to its virtual disks, so that you can take a snapshot when no I/O is in-flight, and the snapshot image can thus be expected to be internally consistent. The snapshot itself will only take a second or so, and the minor interruption is a small price to pay for the added consistency guarantee you get.
You might be wondering how this works and it is, indeed, a bit involved.
-
First, you’ll need a virtual serial console that allows the hypervisor (in the host) to communicate with the guest. This will be defined in your libvirt domain XML, and in OpenStack Nova, this automatically pops up if you are booting your instance off an image which has the
hw_qemu_guest_agent=yesproperty set. -
Then, you’ll need a daemon within the guest that listens for commands received over the serial port. This daemon is called
qemu-guest-agent, orqemu-gafor short. All you’ll need for it to run is to install the package of that name, which you can do in various ways (apt-get install qemu-guest-agentbeing the simplest, on Ubuntu guests). -
One of the many commands that said daemon supports is
guest-fsfreeze-freeze. When it receives that command over the virtual serial link, the daemon will loop over your mounted filesystems, and issue theFIFREEZEioctl on all of them. This happens in reverse order, meaning your root (/) filesystem is frozen last. -
That ioctl then calls the
freeze_super()kernel function, which flushes each filesystem’s superblock, blocks (“freezes”) all new I/O to the filesystem, and syncs (flushes) all I/O that is currently in flight on that filesystem.
The combined net effect of all of the above is that you get a virtual machine that is temporarily read-only, with pending I/O piling up, until you are done taking your snapshot. When that happens, there are a few more actions that happen:
-
The hypervisor sends the
guest-fsfreeze-thawcommand over the virtual serial link. Now, the daemon will loop over all your mounted filesystems again, and issue theFITHAWioctl on them. This time, it is taking the mounts in forward order, thawing the root filesystem first. -
That ioctl then calls the
thaw_super()kernel function, which unblocks (“thaws”) all new I/O to the filesystem, and allows the VM to continue normal operations.
Now there’s a bit of an issue with that. All of the aforementioned
kernel functions only write printk’s on
error,
but they don’t tell you when they succeed. So you can try a snapshot,
then type dmesg in the guest, and you’ll have no way of telling
whether the whole freeze/thaw dance succeeded, or was never even
attempted.
But fear not, there’s a way that you can trace exactly what the kernel is doing!
tracefs, and configuring ftrace
If your guest runs any modern kernel, then chances are that it will,
by default, mount a virtual tracefs filesystem to the
/sys/kernel/debug/tracing mount point (although as of kernel 4.1,
this is nominally an alias, with /sys/kernel/tracing being the
canonical mount point). Regardless of its path, tracefs exposes the
kernel’s ftrace
functionality.
So the first thing you’ll tell ftrace, in your guest VM, is the
process for which you’ll want to do function tracing. In our case,
that’s your guest’s qemu-ga. So, you can do:
pidof qemu-ga > /sys/kernel/debug/tracing/set_ftrace_pid
Then, you’ll want to instruct ftrace to trace kernel function calls:
echo "function" > /sys/kernel/debug/tracing/current_tracer
And, you’ll want to make sure that we don’t trace only function calls
from qemu-ga itself, but also from its child processes:
echo "function-fork" > /sys/kernel/debug/tracing/trace_options
Let’s see what’s happening!
Now you have a guest that’s properly instrumented for tracing kernel
function calls that originate with qemu-ga. So now, go ahead and
take a snapshot. On OpenStack Nova, you’d do that with:
openstack server image create --name <image-name> <instance-name>
Then, shell back into your guest, and interrogate your trace for
ioctl calls:
grep -E '(freeze|thaw)_super.*ioctl' /sys/kernel/debug/tracing/trace
And voilà:
qemu-ga-14574 [001] .... 264.059109: freeze_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.837955: thaw_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.855048: thaw_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.855084: thaw_super <-do_vfs_ioctl
So that’s the FIFREEZE ioctl that maps to freeze_super(), and the
FITHAW ioctl that maps to thaw_super(). And that’s how you know that
your guest is freezing and thawing I/O as you expect it to!
Where to go from here
Feel free to dig further into your trace file (cat or less will
help), and play with other ftrace options. There’s a massive amount
of things you can do with it, as the
documentation
explains. You’ll probably also find this blog
post
from Julia Evans useful for exploring
ftrace.
Also, thank Steven Rostedt when you see him! He is the primary author of the ftrace framework.