Exploding memory usage in Django/uWSGI containers
Posted on Sat 07 December 2024 in hints-and-kinks
When running Open edX on Kubernetes clusters, one of its Pods is the lms Pod, which runs the core of the Open edX Learning Management System (LMS).
This is a relatively complex Django application that runs in the Pod's sole container and is launched with uWSGI.
At work, we had previously run this platform on Kubernetes clusters managed with OpenStack Magnum, and were in the process of migrating to Gardener. Apart from the fact that we were upgrading to a newer Kubernetes release, this also meant that the base operating system of our Kubernetes worker nodes changed from Fedora CoreOS to Garden Linux (which is effectively a Kubernetes-optimised Debian). The virtualisation platform underpinning the Kubernetes cluster remained the same (OpenStack).
Mid-migration, we suddenly noticed that our cluster was oom-killing our lms Pods.
Now this shouldn’t happen, for the following reasons:
- Normally, Kubernetes only kills a Pod for excessive memory usage when a memory limit is set on that Pod, which wasn’t the case.
- Otherwise (that is, with no memory limit set), Pods get killed only by the “regular” kernel oom-killer, and that should only happen when the Pod is grossly misconfigured — that is, its actual memory usage far exceeds its configured memory request.
We quickly found out (via kubectl top pod) that we were dealing with the latter case: our lms Pod was consuming a whopping 8 GiB of memory when running on the Gardener-managed cluster, about four times its memory request of 2 GiB.
This had us scratching our heads, because on the Magnum-managed cluster the same Pod had typically consumed only 80–120 MiB of memory (with occasional spikes). In other words, its baseline memory usage had suddenly increased by roughly two orders of magnitude.
To explain this jump in memory usage, you'll need some background information:
- The corerouter plugin in uWSGI maintains an array of file descriptor references.
- The size of this array, and with it its memory usage, is a multiple of the value set for uWSGI's max-fd configuration option.[1]
- If max-fd has not been set in the uWSGI configuration, its default is the maximum number of open file handles allowed for the process per the system-wide configuration.
- That default can be defined by the nofile ulimit, or a cgroups restriction. A cgroups restriction is also what systemd uses to implement the LimitNOFILE option, which can be set on any systemd unit.[2]
- If neither the ulimit nor a cgroups restriction is in place, the fs.nr_open sysctl, if set, acts as a backstop.
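To check which of these values a given process actually ends up with, the following C sketch queries the soft RLIMIT_NOFILE (the value ulimit -n reports) via getrlimit() and reads fs.nr_open from /proc/sys/fs/nr_open. The helper names are hypothetical, and effective_max_fd() is only a simplified model of the defaulting rule described above, assuming uWSGI falls back to the soft limit when max-fd is unset; it is not uWSGI's actual code.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: soft RLIMIT_NOFILE of the calling process.
 * On Linux this is always finite, because the kernel refuses to set
 * RLIMIT_NOFILE above fs.nr_open. Returns -1 if getrlimit() fails. */
long long nofile_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return (long long)rl.rlim_cur;
}

/* Hypothetical helper: read the fs.nr_open sysctl through procfs.
 * Returns -1 if the file is missing or cannot be parsed. */
long long fs_nr_open(void)
{
    long long value = -1;
    FILE *f = fopen("/proc/sys/fs/nr_open", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Simplified model (assumption): an explicitly configured max-fd wins;
 * otherwise the process's soft RLIMIT_NOFILE is used as the default. */
long long effective_max_fd(long long configured_max_fd)
{
    if (configured_max_fd > 0)
        return configured_max_fd;
    return nofile_soft_limit();
}
```

Running this inside the container on both clusters would show whether the inherited limit is 2²⁰ or 2³⁰.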
Prior to release 256, systemd effectively bumped the default for LimitNOFILE from 1048576 (2²⁰) to infinity, which meant that, rather than setting its own cgroups limit, it would rely on fs.nr_open.
And that value was recently upped in some distributions to 1073741824 (2³⁰) — an increase by a factor of 2¹⁰ or 1024 over the previously applicable value.
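A quick back-of-the-envelope calculation shows why this factor of 1024 matters. The sketch below mirrors the uWSGI allocation sizeof(struct corerouter_session *) * uwsgi.max_fd for a single such table; struct corerouter_session is left opaque here, and cr_table_bytes() is a hypothetical helper. On a 64-bit build with 8-byte pointers, 2²⁰ slots come to 8 MiB, whereas 2³⁰ slots come to 8 GiB.

```c
#include <stddef.h>

/* Opaque stand-in for uWSGI's session type: only pointers to it are
 * needed to compute the table size. */
struct corerouter_session;

#define OLD_NR_OPEN (1ULL << 20) /* 1048576 */
#define NEW_NR_OPEN (1ULL << 30) /* 1073741824 */

/* Hypothetical helper: bytes requested for one table of max_fd pointer
 * slots, i.e. sizeof(struct corerouter_session *) * max_fd. */
unsigned long long cr_table_bytes(unsigned long long max_fd)
{
    return (unsigned long long)sizeof(struct corerouter_session *) * max_fd;
}
```

Assuming this table accounts for the bulk of the growth, the 8 GiB result is consistent with what kubectl top pod reported.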
This change was also applied on Debian (which Garden Linux is based on), and it was even discussed on the Debian mailing list — where ironically, concerns about raising this limit were pre-emptively quashed with the assertion that file descriptors are such an “extremely cheap resource” that it does not hurt to allow absurdly high numbers of them.
In the uWSGI case, however, this had the somewhat devastating effect of increasing memory usage to insane levels.
To their credit, the Garden Linux developers identified this flaw (which, to my knowledge, was baked into their version 1592.2) and fixed it in version 1592.3.
Still, to insulate ourselves from further such issues, we have opted to reconfigure our systems to run uWSGI with an explicitly defined max-fd option, set to the prior system-wide default of 1048576 (although a value as low as 1024 would probably work too).
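Setting max-fd is the fix we went with. As an alternative that we did not deploy, and assuming uWSGI derives its default from the soft RLIMIT_NOFILE, one could lower that soft limit before uWSGI starts, for example in a small wrapper that then execs uWSGI, since resource limits are preserved across execve(2). The helper cap_nofile_soft_limit() below is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower the soft RLIMIT_NOFILE of the calling
 * process to at most cap, leaving the hard limit untouched.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
int cap_nofile_soft_limit(rlim_t cap)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur > cap) {
        /* Lowering the soft limit never requires privileges. */
        rl.rlim_cur = cap;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }
    return 0;
}
```

Touching only the soft limit keeps the change reversible within the process, whereas lowering the hard limit would be permanent for an unprivileged process.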
Acknowledgements
Lothar Bach, Brennan Kinney, Piotr Kucułyma, Namrata Sitlani, and Maari Tamm all contributed to the findings discussed in this article.[3]
[1] See the source, which at the time of writing reads:
ucr->cr_table = uwsgi_malloc(sizeof(struct corerouter_session *) * uwsgi.max_fd);
[2] As far as I can tell, at the time of writing the table captioned "Resource limit directives" in the systemd.exec man page is outdated and incorrect as far as LimitNOFILE's default is concerned, and the "Don't use" admonition also seems misguided at this point.
[3] I've listed these individuals in alphabetical order by surname.