TL;DR: vmtouch, the brainchild of developer Doug Hoyte, is a time-tested, portable tool for inspecting and controlling the file system cache of Unix and Unix-like systems. The open-source tool helps users discover precisely which files the operating system is caching, speed up batch and cron jobs, preserve virtual memory profiles across failovers, and plot file system cache usage over time, among other use cases. Doug told us he plans to continue his open-source work through advancements to vmprobe and an upcoming guide on virtual memory.
Some of the best innovations are rooted in an entrepreneur’s desire to fulfill a personal need.
For example, Matt Mullenweg created WordPress after the developer of the blogging tool he was using unexpectedly stopped releasing updates. Left at an impasse, Matt took the codebase into his own hands, transforming it into the popular CMS it is today.
Doug Hoyte’s vmtouch began the same way, with a problem he needed to solve for himself. “I realized there was no easy way to inspect or control what is in the file system cache,” Doug told us. “I was also very interested in observing what happens behind the scenes in the operating system. So I quickly wrote up this little program in C, used it to solve my problem, published it, and didn’t think much about it.”
That changed when people started to email Doug to send in patches or make requests. At that point, he put the open-source tool on GitHub, where it attracted users in droves.
“It turned out to be more of a success than I had originally thought,” Doug said. “It was such a simple tool, but it really started to occur to me that it might be interesting and valuable when I started to get messages from companies like Instagram and Spotify.”
Such companies were using vmtouch to pre-cache important files before adding a server to an existing pool, which allowed them to avoid spikes in latency. They also used the tool to preserve the cache after a reboot.
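To make the pre-caching idea concrete, here is a minimal Python sketch that warms the file system cache by reading every file under a directory before a server goes into rotation. It is a stand-in for illustration, not vmtouch's own implementation, and the `warm_cache` name and `chunk_size` parameter are hypothetical.

```python
import os


def warm_cache(root, chunk_size=1 << 20):
    """Read every regular file under `root` so the OS pulls it into the cache.

    Returns the total number of bytes read. This is an illustrative
    approximation of pre-caching, not how vmtouch itself is written.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken symlinks, etc.
            try:
                with open(path, "rb") as handle:
                    # Read and discard; the side effect is a warm cache.
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                continue  # unreadable files are skipped, not fatal
    return total
```

A deploy script might call `warm_cache("/srv/app/data")` (a made-up path) just before registering the machine with its load balancer, so the first real requests do not have to wait on disk.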
“In response, I started to develop other features for vmtouch as well as another tool called vmprobe,” Doug said. “Moving forward, I’m working on a book about virtual memory in which I am going to refine vmtouch into a reasonable library and offer more flexible tools.”
Providing Portable File System Cache Visibility for Over 10 Years
Today, vmtouch has evolved into a time-tested, portable tool for inspecting and controlling the file system cache of Unix and Unix-like systems. The open-source tool helps users discover which files the OS is caching, speed up batch and cron jobs, preserve virtual memory profiles across failovers, and plot file system cache usage over time, among other use cases.
Doug said that, over the years, one of the biggest and most relevant shifts he observed in the industry was the widespread adoption of solid-state drives (SSDs).
“When I first wrote vmtouch, SSDs weren’t widely recognized, and they were still extremely expensive,” he said. “But nowadays everything has SSDs. The categorical difference is that people think it’s not quite as valuable to maintain a file system cache. However, there’s still a huge jump between DRAM and SSD latency, so I think vmtouch is quite relevant in cases like that.”
In the future, Doug predicts memory technologies will continue to evolve, citing innovations such as 3D XPoint, a non-volatile memory technology from Intel and Micron that aims to bring latency much closer to that of DRAM. “We will look to rewrite our applications to take advantage of this shift in storage,” he said.
Ultimately, Doug is a developer at heart, and he’s always searching for R&D opportunities.
“I like to experiment with things to see how far I can take the interfaces that are provided,” Doug said. “If there’s any way I can use them in an unusual manner to get an interesting result, I like to see what comes of it. I find this stuff really fun.”
OS Caching Discovery, Memory Preservation, and Faster Batch Jobs
vmtouch is suitable for a wide variety of use cases. The tool is helpful for discovering which files the operating system is caching, directing the operating system to cache or evict files, locking files into memory, and preserving virtual memory profiles when failing over servers.
It can also be used to create redundancy with a hot-standby file server, plot file system cache usage over time, maintain soft quotas of cache usage, and speed up batch and cron jobs, among other purposes.
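Eviction, one of the controls mentioned above, can be sketched in a few lines of Python using the standard `os.posix_fadvise` call. This is a rough illustration under stated assumptions rather than vmtouch's code: the `evict_from_cache` helper is hypothetical, the call is only a hint that the kernel may ignore, and it is unavailable on some platforms.

```python
import os


def evict_from_cache(path):
    """Hint that the kernel may drop cached pages belonging to `path`.

    Returns True if the hint was issued, False where posix_fadvise is not
    available. Pages with unwritten changes are generally not dropped.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # platform does not expose posix_fadvise
    fd = os.open(path, os.O_RDONLY)
    try:
        # Offset 0 with length 0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

A batch job that streams through a large file once could call this afterward so the file does not crowd out data that other processes still need.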
“I think the coolest aspect of vmtouch is that it gives you visibility into what’s actually happening with the page cache and the operating system’s managed memory,” Doug said. “A lot of those interfaces are very opaque, so it’s not really clear what the system is doing in terms of fine-grained diagnostics.”
With vmtouch, Doug said users can get answers to detailed questions. “You might ask, ‘What is the page cache doing right now? What is all this memory?’” he said. “Or, if 40% of your memory is storing file system cache, you would want to know what files are included in that. These are the questions that vmtouch allows you to answer.”
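For readers curious how such a question can be answered programmatically, the sketch below asks the kernel how many pages of a file are currently in memory by calling `mincore(2)` through `ctypes`. Treat it as an approximation under assumptions: `cache_residency` is a hypothetical helper, it targets Linux and similar Unix systems, and some kernels report every page as resident for files the caller neither owns nor can write.

```python
import ctypes
import mmap
import os


def cache_residency(path):
    """Return (resident_pages, total_pages) for `path` using mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0  # an empty file cannot be mapped
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE

    # CDLL(None) exposes the symbols already loaded in this process,
    # which on Unix-like systems includes the C library.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mincore.argtypes = [
        ctypes.c_void_p,
        ctypes.c_size_t,
        ctypes.POINTER(ctypes.c_ubyte),
    ]
    libc.mincore.restype = ctypes.c_int

    vec = (ctypes.c_ubyte * total)()  # one status byte per page
    with open(path, "rb") as handle:
        # ACCESS_COPY yields a private, writable mapping; ctypes needs a
        # writable buffer to take its address. Nothing is written to it.
        mapping = mmap.mmap(handle.fileno(), size, access=mmap.ACCESS_COPY)
    try:
        view = (ctypes.c_char * size).from_buffer(mapping)
        try:
            if libc.mincore(ctypes.addressof(view), size, vec) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
        finally:
            del view  # release the buffer so the mapping can be closed
    finally:
        mapping.close()

    # The lowest bit of each byte is set when that page is in memory.
    return sum(byte & 1 for byte in vec), total
```

Dividing the first number by the second gives the share of the file that is cached, which is the kind of per-file figure the questions above call for.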
Another aspect of vmtouch that made the open-source solution increasingly popular is that it allows for high levels of control.
“The Linux kernel, in particular, gives you a lot of little features and flags and ways of controlling things, but they’re not exposed as command-line tools,” Doug said. “So I think that was what made vmtouch take off — the fact that people could play with it on the command line as opposed to having to write a C program.”
The Next Generation of vmtouch: vmprobe
Doug is also helping the open-source community speed up database queries and improve the failover process using vmprobe, a utility for inspecting and optimizing virtual memory. The program, which Doug wrote in 2016 as the next generation of vmtouch, is part research platform and part production tool.
On one hand, vmprobe can be used to maintain a hot-standby server, a backup machine that automatically takes over if the primary server fails. Using the vmprobe daemon to duplicate the virtual memory state from the primary to the standby makes failover events far less stressful.
And, because it’s impossible to predict when a failure may happen, vmprobe delivers a standby feature that lets users continuously copy the filesystem cache from the primary server to multiple backups.
The technology can also be used to speed up database operations through intelligent management of the filesystem cache. While many low-level algorithms commonly used today only make use of information that can be determined locally, vmprobe uses all information available.
“vmprobe will give you much more information about the Linux kernel, like the most recently accessed pages and memory,” Doug said.
“I have some plans in the future to also support anonymous memory mappings because essentially they’re the same thing on Linux. You have file-backed mappings in the page cache, and then you have swap-backed mappings, which are anonymous, but essentially it’s all the same machinery,” he said.
Zero Copy: Hoyte’s Upcoming Guide to Virtual Memory
The book, a comprehensive exploration of virtual memory, is intended to help developers write better-performing, more efficient applications. It covers everything from a thorough introduction to the topic to what the future holds for computing with virtual memory-assisted technologies.
“I’m also writing a bunch of tools to go along with it,” Doug said. “I’m excited about all of the interesting stuff that is going to come out of this project.”