Using ftrace to trace function calls from qemu-guest-agent
Posted on Wed 21 August 2019 in hints-and-kinks • 4 min read
When you work with functionality that is buried deep in the Linux kernel, ftrace can be extremely useful. Here are some suggestions on how to use it, with the example of tracing function calls from qemu-guest-agent.
What’s this about?
Recently I used, for the first time, libvirt’s functionality to indicate to a virtual guest that it is about to have a point-in-time copy of its disks (a snapshot) taken. In doing so, libvirt can tell the virtual machine (VM) to freeze I/O on all its mounted filesystems.
The rationale behind this is, I hope, obvious: you want the VM to momentarily stop I/O to its virtual disks, so that you can take a snapshot when no I/O is in-flight, and the snapshot image can thus be expected to be internally consistent. The snapshot itself will only take a second or so, and the minor interruption is a small price to pay for the added consistency guarantee you get.
You might be wondering how this works, and it is, indeed, a bit involved.
- First, you’ll need a virtual serial channel that allows the hypervisor (in the host) to communicate with the guest. This will be defined in your libvirt domain XML, and in OpenStack Nova, it is added automatically if you boot your instance off an image that has the hw_qemu_guest_agent=yes property set.
- Then, you’ll need a daemon within the guest that listens for commands received over that serial channel. This daemon is called qemu-guest-agent, or qemu-ga for short. All you need to do to run it is install the package of that name, which you can do in various ways (apt-get install qemu-guest-agent being the simplest, on Ubuntu guests).
- One of the many commands that this daemon supports is guest-fsfreeze-freeze. When it receives that command over the virtual serial link, the daemon loops over your mounted filesystems and issues the FIFREEZE ioctl on each of them. This happens in reverse order, meaning your root (/) filesystem is frozen last.
- That ioctl then calls the freeze_super() kernel function, which blocks (“freezes”) all new writes to the filesystem, syncs (flushes) all dirty data and all I/O that is currently in flight on that filesystem, and then asks the filesystem to bring its on-disk metadata (such as the superblock and journal) into a consistent state.
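To make the ordering concrete, here is a minimal shell sketch of how such a loop might pick its targets. It is not qemu-ga’s actual code: freeze_order is a hypothetical helper, and its filter is an assumption, namely that only mounts whose source starts with / (block devices, as opposed to proc, sysfs or tmpfs) are worth freezing. It reads /proc/self/mounts by default and prints the mount points in reverse order, so that / (normally the first real entry) comes out last.

```shell
# Hypothetical helper: print the mount points a FIFREEZE loop would visit,
# in reverse order of their appearance in the mounts table.
# Assumption: only entries whose source begins with "/" are considered,
# which skips pseudo filesystems such as proc, sysfs and tmpfs.
# Octal escapes in paths (e.g. "\040" for a space) are not decoded.
freeze_order() {
    awk '$1 ~ /^\// { mnt[n++] = $2 }
         END { while (n > 0) print mnt[--n] }' "${1:-/proc/self/mounts}"
}
```

On a throwaway test VM, running as root, you could pipe that list into fsfreeze --freeze, the util-linux tool that issues the same FIFREEZE ioctl. Writes to those filesystems then block until a matching fsfreeze --unfreeze, so don’t try this on a machine you care about.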
The combined net effect of all of the above is that you get a virtual machine that is temporarily read-only, with pending writes piling up, until you are done taking your snapshot. Once you are, a few more things happen:
- The hypervisor sends the guest-fsfreeze-thaw command over the virtual serial link. Now, the daemon loops over all your mounted filesystems again and issues the FITHAW ioctl on each of them. This time, it takes the mounts in forward order, thawing the root filesystem first.
- That ioctl then calls the thaw_super() kernel function, which unblocks (“thaws”) writes to the filesystem and allows the VM to continue normal operations.
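You can also drive this exchange by hand from the host, which helps separate agent problems from kernel ones. The sketch below assumes a libvirt host: virsh qemu-agent-command passes a raw JSON command through to the guest agent, and guest-fsfreeze-status is another qemu-ga command that reports whether the guest’s filesystems are currently frozen or thawed. The function names qga and fsfreeze_cycle are hypothetical; libvirt also offers virsh domfsfreeze and virsh domfsthaw as higher-level equivalents.

```shell
# Hypothetical wrapper: send one argument-less guest agent command to the
# qemu-ga instance inside libvirt domain $1, and print the JSON reply.
qga() {
    virsh qemu-agent-command "$1" "{\"execute\": \"$2\"}"
}

# Hypothetical helper: freeze, check, and thaw the filesystems of domain $1.
# Writes inside the guest block between the first and the last command,
# so keep the gap short.
fsfreeze_cycle() {
    qga "$1" guest-fsfreeze-freeze  # reply: number of filesystems frozen
    qga "$1" guest-fsfreeze-status  # reply: "frozen" if the freeze succeeded
    qga "$1" guest-fsfreeze-thaw    # reply: number of filesystems thawed
}
```

For example, fsfreeze_cycle myguest, where myguest is a placeholder for your domain name.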
Now there’s a bit of an issue with that. All of the aforementioned kernel functions only log messages (with printk) on error; they don’t tell you when they succeed. So you can take a snapshot, then run dmesg in the guest, and you’ll have no way of telling whether the whole freeze/thaw dance succeeded, or was never even attempted.
But fear not, there’s a way that you can trace exactly what the kernel is doing!
tracefs, and configuring ftrace
If your guest runs any modern kernel, then chances are that it will, by default, mount a virtual tracefs filesystem at the /sys/kernel/debug/tracing mount point (although as of kernel 4.1, this path is kept for compatibility, with /sys/kernel/tracing being the canonical mount point). Regardless of its path, tracefs exposes the kernel’s ftrace functionality.
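Because the path varies between setups, a small helper can pick whichever location is actually populated. This is a sketch: find_tracefs is a hypothetical name, and the check it uses (looking for the trace file at the top level of tracefs) is an assumption rather than an official test. If neither path works, mount -t tracefs nodev /sys/kernel/tracing mounts tracefs by hand.

```shell
# Hypothetical helper: print the first directory that looks like a mounted
# tracefs, i.e. one containing a "trace" file. With no arguments, try the
# canonical mount point first, then the older debugfs-relative path.
find_tracefs() {
    local dir
    [ "$#" -gt 0 ] || set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    for dir in "$@"; do
        if [ -f "$dir/trace" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no tracefs found; try: mount -t tracefs nodev /sys/kernel/tracing" >&2
    return 1
}
```

Run it as root, since /sys/kernel/debug is normally not accessible to other users.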
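So, working in a root shell in your guest VM (writing to tracefs requires root, and a plain sudo won’t cover the shell redirection), the first thing you’ll tell ftrace is the process for which you want to do function tracing. In our case, that’s the guest’s qemu-ga process, so you can do: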
pidof qemu-ga > /sys/kernel/debug/tracing/set_ftrace_pid
Then, you’ll want to instruct ftrace to trace kernel function calls:
echo "function" > /sys/kernel/debug/tracing/current_tracer
And you’ll want to make sure that ftrace traces function calls not only from qemu-ga itself, but also from its child processes:
echo "function-fork" > /sys/kernel/debug/tracing/trace_options
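If you do this often, the three steps can be folded into one function. The sketch below is only an illustration: setup_qga_trace is a hypothetical name, it takes the tracefs directory as its first argument, and it adds two steps the walkthrough does without. It narrows tracing to freeze_super and thaw_super through set_ftrace_filter, which keeps the trace small (if either function is not available for tracing on your kernel, that write fails and you can drop the line), and it discards previously recorded events.

```shell
# Hypothetical helper: configure ftrace to trace qemu-ga and its children.
# Usage: setup_qga_trace TRACEFS_DIR [PID...]
# If no PIDs are given, it falls back to `pidof qemu-ga`. Needs root.
setup_qga_trace() {
    local dir=$1 pids
    shift
    pids=${*:-$(pidof qemu-ga)}
    if [ -z "$pids" ]; then
        echo "qemu-ga does not appear to be running" >&2
        return 1
    fi

    echo "$pids" > "$dir/set_ftrace_pid"          # trace only these PIDs...
    echo function-fork > "$dir/trace_options"     # ...plus any children they fork
    echo 'freeze_super thaw_super' > "$dir/set_ftrace_filter"  # optional narrowing
    echo function > "$dir/current_tracer"         # start the function tracer
    echo > "$dir/trace"                           # discard previously recorded events
}
```

For example, as root: setup_qga_trace /sys/kernel/debug/tracing.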
Let’s see what’s happening!
Now you have a guest that’s properly instrumented for tracing kernel function calls that originate from qemu-ga. So go ahead and take a snapshot. On OpenStack Nova, you’d do that with:
openstack server image create --name <image-name> <instance-name>
Then, shell back into your guest and search your trace for freeze_super() and thaw_super() calls made from the ioctl path:
grep -E '(freeze|thaw)_super.*ioctl' /sys/kernel/debug/tracing/trace
And voilà:
qemu-ga-14574 [001] .... 264.059109: freeze_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.837955: thaw_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.855048: thaw_super <-do_vfs_ioctl
qemu-ga-14574 [001] .... 265.855084: thaw_super <-do_vfs_ioctl
So that’s the FIFREEZE ioctl that maps to freeze_super(), and the FITHAW ioctl that maps to thaw_super(). And that’s how you know that your guest is freezing and thawing I/O as you expect it to!
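Rather than eyeballing the output, you can have awk count the calls and measure the time between the first freeze and the last thaw. This is a rough sketch: summarize_freeze_thaw is a hypothetical name, and it assumes the default trace format shown above, where the timestamp is the only field ending in a colon. Run against the four lines above, it reports one freeze_super call, three thaw_super calls, and a window of about 1.8 seconds.

```shell
# Hypothetical helper: summarize freeze_super/thaw_super calls reached via
# an ioctl in an ftrace "trace" file (default: the debugfs path used above).
# Assumes the default output format, where the timestamp is the only field
# that looks like "264.059109:".
summarize_freeze_thaw() {
    LC_ALL=C awk '
        /(freeze|thaw)_super <-.*ioctl/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+:$/)
                    ts = substr($i, 1, length($i) - 1)   # drop trailing ":"
                else if ($i == "freeze_super" || $i == "thaw_super")
                    name = $i
            }
            count[name]++
            if (first == "") first = ts
            last = ts
        }
        END {
            printf "freeze_super %d\n", count["freeze_super"]
            printf "thaw_super %d\n", count["thaw_super"]
            if (first != "") printf "window %.6f s\n", last - first
        }' "${1:-/sys/kernel/debug/tracing/trace}"
}
```

The window is only a rough upper bound on how long writes were blocked, since it spans all filesystems rather than any single one.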
Where to go from here
Feel free to dig further into your trace file (cat or less will help), and play with other ftrace options. There’s a huge range of things you can do with it, as the documentation explains. You’ll probably also find this blog post from Julia Evans useful for exploring ftrace.
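Once you’re done, it’s worth putting ftrace back the way you found it, since function tracing adds overhead. A minimal sketch, assuming you pass it the tracefs directory (reset_ftrace is a hypothetical name): writing nop to current_tracer stops the function tracer, a no prefix switches a trace option off, and an empty write clears the PID and function filters as well as the trace buffer.

```shell
# Hypothetical helper: undo a function tracing setup. Needs root.
# Usage: reset_ftrace TRACEFS_DIR
reset_ftrace() {
    local dir=$1
    echo nop > "$dir/current_tracer"              # stop the function tracer
    echo nofunction-fork > "$dir/trace_options"   # "no" prefix disables an option
    echo > "$dir/set_ftrace_pid"                  # trace all PIDs again
    echo > "$dir/set_ftrace_filter"               # trace all functions again
    echo > "$dir/trace"                           # clear the ring buffer
}
```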
Also, thank Steven Rostedt when you see him! He is the primary author of the ftrace framework.