By Jonathan Corbet
May 4, 2022
LSFMM
A memory-folio update
The folio project is not yet two years old, but it has already resulted in significant
changes to the kernel's memory-management and filesystem layers. While much
work has been done, quite a bit remains. In the opening plenary session at the 2022
Linux Storage, Filesystem, Memory-management and BPF Summit, Matthew
Wilcox provided an update on the folio transition and led a discussion on the work
that remains to be done.
Wilcox began with an overview of the folio work, a more complete description of which can be found in the
above-linked article. In short, a folio is a way of representing a set of physically contiguous base pages. It is a
response to a longstanding confusion in the memory-management subsystem, wherein a "page" can refer either
to a base page or a larger compound page. Adding a new term disambiguates the term "page" and simplifies
many memory-management interfaces.
Beyond terminology, there is another motivation for the folio work. The kernel really needs to manage memory
in larger chunks than 4KB base pages. There are millions of those pages even on a typical laptop; managing
each of them individually is painful and wastes a lot of time and energy. Managing memory in larger units
requires better interfaces, though; folios are meant to be that better interface.
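To put rough numbers on the "millions of pages" remark, the arithmetic can be sketched in a few lines of C. The 16GiB memory size and the 64KiB and 2MiB unit sizes used below are illustrative assumptions, not figures from the session, and units_to_track() is a hypothetical helper rather than a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many units of unit_bytes are needed to
 * cover mem_bytes of memory, rounding up. This is plain arithmetic
 * for illustration, not kernel code.
 */
static uint64_t units_to_track(uint64_t mem_bytes, uint64_t unit_bytes)
{
    assert(unit_bytes > 0);
    return (mem_bytes + unit_bytes - 1) / unit_bytes;
}
```

Under those assumptions, units_to_track(16ULL << 30, 4096) yields 4,194,304 base pages, while the same memory managed in 64KiB units would need 262,144 entries and in 2MiB units only 8,192. A real system would hold a mix of sizes, so the actual savings depend on how much memory ends up in large folios.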
Current status
A folio is represented by struct folio; it is essentially an alias for the head page of a compound page. Wilcox
has been adding uses of folios into the kernel over the course of the last year; this project has come a long way
but is not yet complete.
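The "alias for the head page" idea can be illustrated with a small user-space model. This is only a sketch: the toy_page and toy_folio types and all of the toy_ functions are hypothetical stand-ins, and the real struct page and struct folio layouts are considerably more involved. The kernel does provide helpers with similar roles (page_folio() and folio_nr_pages()), but the code below mirrors the concept, not their implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a page descriptor. */
struct toy_page {
    struct toy_page *head;  /* head of the compound page; a lone base page points to itself */
    unsigned int order;     /* meaningful on the head only: log2(number of base pages) */
};

/* A toy folio is the head page viewed through a distinct type. */
struct toy_folio {
    struct toy_page page;   /* must remain the first member */
};

/* Mark pages[0 .. 2^order - 1] as one compound page headed by pages[0]. */
static void toy_make_compound(struct toy_page *pages, unsigned int order)
{
    size_t i, nr = (size_t)1 << order;

    for (i = 0; i < nr; i++) {
        pages[i].head = &pages[0];
        pages[i].order = 0;
    }
    pages[0].order = order;
}

/* Map any page, head or tail, to the folio that contains it. */
static struct toy_folio *toy_page_folio(struct toy_page *page)
{
    assert(page != NULL && page->head != NULL);
    return (struct toy_folio *)page->head;
}

/* Number of base pages covered by the folio. */
static size_t toy_folio_nr_pages(const struct toy_folio *folio)
{
    return (size_t)1 << folio->page.order;
}
```

In this model, a function that takes a struct toy_folio pointer can never be handed a tail page by mistake, which is the kind of ambiguity the distinct type is meant to remove. A single base page is simply a folio of order zero.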
One open question concerns when the kernel should allocate large folios — those containing more than one base
page. Only the readahead code allocates them now; the filesystem write path still does everything in terms of
base pages. Writes that land on large folios brought in via readahead will see
and use those large folios. Appending to a file will always use base
pages, though. There are almost certainly advantages to using large folios in the
write path, but it will be necessary to figure out what the criteria for creating
them will be.
Meanwhile, the process of converting filesystem code to folios continues.
Wilcox encouraged filesystem developers to look for infrastructure that already
exists when possible rather than reimplementing it themselves. He pointed out
the support layer for network filesystems that was recently rewritten by David
Howells. It would also be good for filesystems to move away from the old
buffer-head APIs and use the relatively new iomap infrastructure whenever
possible.
Ted Ts'o said that more guidance on conversion to iomap would be useful.
Moving a filesystem over can be a daunting task, he said, but developers should understand that it can be done
incrementally. For example, a filesystem's read path can be converted while leaving the write path unchanged
for now. This can be useful, Wilcox agreed, especially since iomap is still missing some capabilities, such as
support for features like fs-verity or compression. That lack is often more problematic on the write side than on
the read side.