When you are using functionality that is buried deep in the Linux
kernel, ftrace can be
extremely useful. Here are some suggestions on how to use it, using
the example of tracing function calls from qemu-ga.
What’s this about?
Recently I used, for the first time, libvirt’s functionality to indicate to a virtual guest that it is about to have a point-in-time copy of its disks — a snapshot — taken. In doing so, it can tell the virtual machine (VM) to freeze I/O on all its mounted filesystems.
The rationale behind this is, I hope, obvious: you want the VM to momentarily stop I/O to its virtual disks, so that you can take a snapshot when no I/O is in-flight, and the snapshot image can thus be expected to be internally consistent. The snapshot itself will only take a second or so, and the minor interruption is a small price to pay for the added consistency guarantee you get.
You might be wondering how this works and it is, indeed, a bit involved.
First, you’ll need a virtual serial console that allows the hypervisor (in the host) to communicate with the guest. This will be defined in your libvirt domain XML, and in OpenStack Nova, this automatically pops up if you are booting your instance off an image that has the corresponding guest agent property set.
Then, you’ll need a daemon within the guest that listens for commands received over the serial port. This daemon is called qemu-guest-agent, or qemu-ga for short. All you’ll need for it to run is to install the package of that name, which you can do in various ways (apt-get install qemu-guest-agent being the simplest, on Ubuntu guests).
One of the many commands that said daemon supports is guest-fsfreeze-freeze. When it receives that command over the virtual serial link, the daemon will loop over your mounted filesystems, and issue the FIFREEZE ioctl on all of them. This happens in reverse order, meaning your root (/) filesystem is frozen last.
That ioctl then calls the freeze_super() kernel function, which flushes each filesystem’s superblock, blocks (“freezes”) all new I/O to the filesystem, and syncs (flushes) all I/O that is currently in flight on that filesystem.
The combined net effect of all of the above is that you get a virtual machine that is temporarily read-only, with pending I/O piling up, until you are done taking your snapshot. Once the snapshot is complete, a few more things happen:
The hypervisor sends the guest-fsfreeze-thaw command over the virtual serial link. Now, the daemon will loop over all your mounted filesystems again, and issue the FITHAW ioctl on them. This time, it is taking the mounts in forward order, thawing the root filesystem first.
That ioctl then calls the thaw_super() kernel function, which unblocks (“thaws”) all new I/O to the filesystem, and allows the VM to continue normal operations.
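If you want to poke at this exchange by hand from the hypervisor side, libvirt lets you pass guest agent commands through with virsh qemu-agent-command. The following is a minimal sketch, not a recommended workflow: the helper names qga_payload and qga_run and the domain name my-guest are made up for illustration, and it assumes you have libvirt access on the host and a running agent in the guest.

```shell
# Build the JSON payload for a guest agent command that takes no arguments.
# (qga_payload is a hypothetical helper name, not part of libvirt or QEMU.)
qga_payload() {
    printf '{"execute":"%s"}' "$1"
}

# Send guest agent command $2 to libvirt domain $1.
# Needs to run on the host, with permission to talk to libvirt.
qga_run() {
    virsh qemu-agent-command "$1" "$(qga_payload "$2")"
}

# Example calls, with "my-guest" standing in for your domain name:
#   qga_run my-guest guest-fsfreeze-status   # reports whether the guest is frozen or thawed
#   qga_run my-guest guest-fsfreeze-freeze   # freeze all mounted filesystems
#   qga_run my-guest guest-fsfreeze-thaw     # thaw them again
```

Be careful with a manual freeze: the guest stays frozen until something sends the thaw command.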
Now there’s a bit of an issue with all of this. The aforementioned
kernel functions only log a message when they fail,
but they don’t tell you when they succeed. So you can try a snapshot,
check dmesg in the guest, and you’ll have no way of telling
whether the whole freeze/thaw dance succeeded, or was never even
attempted.
But fear not, there’s a way that you can trace exactly what the kernel is doing!
tracefs, and configuring ftrace
If your guest runs any modern kernel, then chances are that it will,
by default, mount a virtual tracefs filesystem to the
/sys/kernel/debug/tracing mount point (although as of kernel 4.1,
this is nominally an alias, with
/sys/kernel/tracing being the
canonical mount point). Regardless of its path, tracefs exposes the
ftrace interface.
So the first thing you’ll tell ftrace, in your guest VM, is the
process for which you’ll want to do function tracing. In our case,
that’s your guest’s
qemu-ga. So, you can do:
pidof qemu-ga > /sys/kernel/debug/tracing/set_ftrace_pid
Then, you’ll want to instruct
ftrace to trace kernel function calls:
echo "function" > /sys/kernel/debug/tracing/current_tracer
And, you’ll want to make sure that you trace function calls not only from
qemu-ga itself, but also from its child processes:
echo "function-fork" > /sys/kernel/debug/tracing/trace_options
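The three steps above can be wrapped into a small script. This is only a sketch under a few assumptions: the function names find_tracefs and setup_function_trace are invented here, the TRACEFS variable is a convenience override (handy for a dry run against a scratch directory), and on a real guest you need to run it as root.

```shell
# Pick the tracefs directory: prefer the canonical mount point,
# fall back to the debugfs alias. A preset TRACEFS wins (sketch-only override).
find_tracefs() {
    if [ -n "${TRACEFS:-}" ]; then
        printf '%s\n' "$TRACEFS"
        return 0
    fi
    for dir in /sys/kernel/tracing /sys/kernel/debug/tracing; do
        if [ -e "$dir/current_tracer" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Configure function tracing for the PIDs given as arguments,
# in the same order as the individual commands above.
setup_function_trace() {
    # Refuse to run without a PID: an empty set_ftrace_pid would trace everything.
    [ "$#" -gt 0 ] || return 1
    tracefs=$(find_tracefs) || return 1
    echo "$*" > "$tracefs/set_ftrace_pid"
    echo function > "$tracefs/current_tracer"
    echo function-fork > "$tracefs/trace_options"
}
```

On the guest, you would call it as setup_function_trace $(pidof qemu-ga).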
Let’s see what’s happening!
Now you have a guest that’s properly instrumented for tracing kernel
function calls that originate with
qemu-ga. So now, go ahead and
take a snapshot. On OpenStack Nova, you’d do that with:
openstack server image create --name <image-name> <instance-name>
Then, shell back into your guest, and interrogate your trace for calls to freeze_super() and thaw_super():
grep -E '(freeze|thaw)_super.*ioctl' /sys/kernel/debug/tracing/trace
qemu-ga-14574  .... 264.059109: freeze_super <-do_vfs_ioctl
qemu-ga-14574  .... 265.837955: thaw_super <-do_vfs_ioctl
qemu-ga-14574  .... 265.855048: thaw_super <-do_vfs_ioctl
qemu-ga-14574  .... 265.855084: thaw_super <-do_vfs_ioctl
So that’s the
FIFREEZE ioctl that maps to
freeze_super(), and the
FITHAW ioctl that maps to
thaw_super(). And that’s how you know that
your guest is freezing and thawing I/O as you expect it to!
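If you would rather get a one-line summary than read the raw trace, you can count the matching calls. The sketch below uses the same pattern as the grep above; the function name count_freeze_thaw is made up, and the optional argument (a path to a trace file) is an assumption that makes it easy to run against a saved copy of the trace.

```shell
# Count freeze_super/thaw_super calls that were entered via an ioctl.
# $1: trace file to read (defaults to the debugfs path used in this post).
count_freeze_thaw() {
    awk '
        /freeze_super.*ioctl/ { freeze++ }
        /thaw_super.*ioctl/   { thaw++ }
        END { printf "freeze_super=%d thaw_super=%d\n", freeze, thaw }
    ' "${1:-/sys/kernel/debug/tracing/trace}"
}
```

For the trace excerpt shown above, this prints freeze_super=1 thaw_super=3.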
Where to go from here
Feel free to dig further into your
trace file, and play with other
ftrace options. There’s a massive amount
of things you can do with it, as the
ftrace documentation explains. You’ll probably also find the blog
post from Julia Evans useful for exploring it further.
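As a starting point for that, here are two options that go well with the setup from this post: narrowing the function tracer to just the two functions of interest, and switching tracing off again when you are done. Treat this as a sketch: the function names are invented, the TRACEFS variable is an assumption of the example, and whether freeze_super and thaw_super can be filtered on depends on your kernel listing them in available_filter_functions.

```shell
# Directory holding the ftrace control files (override if yours differs).
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Limit the function tracer to the two functions of interest.
trace_only_freeze_thaw() {
    echo 'freeze_super thaw_super' > "$TRACEFS/set_ftrace_filter"
}

# Undo the tracing setup: no tracer, no fork following,
# no PID restriction, and no function filter.
reset_tracing() {
    echo nop > "$TRACEFS/current_tracer"
    echo nofunction-fork > "$TRACEFS/trace_options"
    echo > "$TRACEFS/set_ftrace_pid"
    echo > "$TRACEFS/set_ftrace_filter"
}
```

With the filter in place, the trace file contains little besides the freeze and thaw calls, so the grep from earlier becomes optional.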
Also, thank Steven Rostedt when you see him! He is the primary author of the ftrace framework.