How configuring VirtualBox to use Large Pages is greatly compromised under Linux.

One thing Linux users often do is set up a Virtual Machine such as VirtualBox, so that a legitimate, paid-for instance of Windows can run as a Guest System on our Linux Host System. Because of the way VMs work, getting them to use “Large Pages”, which under Linux are simply named “Huge Pages”, could improve overall performance. Without Huge Page support, the VM needs to allocate Memory Maps that are subdivided into 512 standard pages, each with a standard size of 4KiB. In practice, this means that 512 individual memory allocations usually take place wherever the caching and remapping requires 2MiB of memory (512 × 4KiB = 2MiB). Such a line of memory can also end up getting saved to the .VDI File, in the case of VirtualBox, from 512 discontiguous pieces of RAM.

The available sizes of Huge Pages depend on the CPU. On x86 / x86_64 CPUs, they tend to be either 2MiB or 1GiB in size, where 2MiB is already quite ambitious. One way to set this up is summarized in the following little snippet of commands, which need to be given as the regular user:

 


VBoxManage modifyvm "PocketComp_20H2" --nestedpaging on
VBoxManage modifyvm "PocketComp_20H2" --largepages on

 

In my example, I’ve given these commands for the Virtual Machine instance named ‘PocketComp_20H2’. If the CPU is actually an Intel with ‘VT-x’ (hardware support for virtualization), Large Page (or Huge Page) support should then be turned on. Yet, like several other people, what I obtained next in the log file for the subsequent session was the following line of output:

 


00:00:31.962754 PGMR3PhysAllocateLargePage: allocating large pages takes too long (last attempt 2813 ms; nr of timeouts 1); DISABLE
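One way to check whether a given session hit this fallback is to search its log for the message quoted above. The following is only a minimal sketch: the function name `large_pages_disabled` is hypothetical, and the log path in the usage comment assumes VirtualBox's default machine folder, which may differ on other setups.

```shell
# large_pages_disabled LOGFILE
# Succeeds (exit status 0) if the log contains the large-page timeout message.
large_pages_disabled() {
    grep -q 'PGMR3PhysAllocateLargePage: allocating large pages takes too long' "$1"
}

# Usage (the path is an assumption, based on the default machine folder):
# large_pages_disabled "$HOME/VirtualBox VMs/PocketComp_20H2/Logs/VBox.log" \
#     && echo "large pages were disabled for this session"
```

If the function reports nothing, the VM either never requested Large Pages or managed to obtain them in time.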

 

There exist users who searched the Internet in vain for an explanation of why this feature would not work. I want to explain here what goes wrong with most simple attempts. This is not really an inability of the platform to support the feature. Rather, it’s an artifact of how the practice of Huge Pages under Linux differs from the theoretical, hypothetical way in which some people might want to use them. If Huge Pages are to be allocated after the computer has started fully, at the request of the VM, Linux will be excruciatingly slow in doing so, because some RAM would need to be defragmented first.

This is partially due to the fact that VirtualBox will want to map all the virtual RAM of the Guest System using them, and not the .VDI File. (:1)  I.e., if a very modest Guest System has 4GiB of (virtual) RAM, then 4GiB / 2MiB = 2048 Huge Pages will be needed, and those will take several minutes to allocate. If that Guest System is supposed to have larger amounts of RAM, the problem just gets worse. If the VM fails to allocate them within about 2 seconds of requesting them, it aborts, and continues with standard pages.
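The page count follows directly from dividing the guest's RAM by the Huge Page size. As a small sketch of that arithmetic (the function name `pages_needed` and the variable `HUGE_PAGE_MIB` are made up for illustration, and the 2MiB page size is assumed):

```shell
# Assumed Huge Page size in MiB (the common x86_64 value).
HUGE_PAGE_MIB=2

# pages_needed GUEST_RAM_MIB
# Prints how many Huge Pages are needed to back that much guest RAM.
pages_needed() {
    guest_mib=$1
    # Round up, so that a size which is not a multiple of the page size still fits.
    echo $(( (guest_mib + HUGE_PAGE_MIB - 1) / HUGE_PAGE_MIB ))
}

# Usage, for a guest with 4GiB (4096MiB) of virtual RAM:
# pages_needed 4096
```

For the 4GiB guest this gives 2048, and doubling the guest's RAM doubles the number of pages that must be found.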

What Linux will offer as an alternative behaviour is to allocate a fixed number of Huge Pages on boot-up, when the memory is not yet very fragmented, and then to allow any applications which ‘know how’ to help themselves to some of those Huge Pages. Thus, if 128 Huge Pages (128 × 2MiB = 256MiB) are to be preallocated, the following snippet shows roughly how to do so, assuming a Debian distro. (:2)  Lines that begin with hash-marks (‘#’) are commands that would need to be given as root. I estimate this number of Huge Pages to be appropriate for a system with 12GiB of RAM:

 


# groupadd hugetlbfs
# adduser dirk hugetlbfs
# getent group hugetlbfs

hugetlbfs:x:1002:dirk


# cd /etc
# edit text/*:sysctl.conf

vm.nr_hugepages = 128
vm.hugetlb_shm_group = 1002

# edit text/*:fstab

hugetlbfs       /hugepages      hugetlbfs mode=1770,gid=1002        0       0


# ulimit -H -l

(...)


# cd /etc/security
# edit text/*:limits.conf

@hugetlbfs      -       memlock         unlimited



 

The problem here is that for a Guest System with 4GiB of virtual RAM to launch, 2048 Huge Pages would need to be preallocated, not 128. To make things worse, Huge Pages cannot be swapped out! They remain locked in RAM. This means that they also get subtracted from the maximum number of KiB that a user is allowed to lock in RAM. In effect, 4GiB of RAM would end up being tied up, not doing anything useful, until the user actually decides to start the VM (at which point, little additional RAM should be requested by VirtualBox).
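How much RAM the preallocated pool is tying up can be read from ‘/proc/meminfo’, which reports ‘HugePages_Total’, ‘HugePages_Free’ and ‘Hugepagesize’. The following sketch only sums those up. The function name `hugepage_summary` and its output format are my own invention, and the optional file argument exists merely so that the function can be tried against a saved copy:

```shell
# hugepage_summary [FILE]
# Prints the number of preallocated Huge Pages, how many are still free,
# and how many MiB of RAM the whole pool occupies.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            printf "total=%d free=%d pool_mib=%d\n", total, free, total * size_kb / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Usage:
# hugepage_summary
```

As long as ‘free’ equals ‘total’, the entire pool is sitting idle in RAM.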

Now, there could even exist Linux computers which are set up, on that set of assumptions. Those Linux boxes do not count as standard personal, desktop computers.

If the user wishes to know how slow Linux tends to be at actually allocating some number of Huge Pages after it has been running for a while, he or she can enter the following commands after configuring the above, but before rebooting. Normally, a reboot is required after what is shown has been configured, but the following commands can be given instead, in a hurry. My username ‘dirk’ will still not belong to the group ‘hugetlbfs’…

 


# sync ; echo 3 > /proc/sys/vm/drop_caches
# sysctl -p

 

I found that, on a computer which had run for days, with RAM that had gotten very fragmented, the second command took roughly 30 seconds to execute. Imagine how long it might take, if 2048 Huge Pages are indeed to be allocated, instead of 128.
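After such a run-time request, it can be worth checking how many pages the kernel actually managed to allocate, because on a fragmented system the count in ‘/proc/sys/vm/nr_hugepages’ may fall short of what was asked for. This is a sketch under that assumption. The function name `hugepage_shortfall` is hypothetical, and the optional file argument is only there for trying it out:

```shell
# hugepage_shortfall REQUESTED [FILE]
# Prints how many of the requested Huge Pages the kernel did not provide.
hugepage_shortfall() {
    requested=$1
    actual=$(cat "${2:-/proc/sys/vm/nr_hugepages}")
    if [ "$actual" -ge "$requested" ]; then
        echo 0
    else
        echo $(( requested - actual ))
    fi
}

# Usage, after requesting 128 pages:
# hugepage_shortfall 128
```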


 

What some people have researched on the Web, again to find that nobody seems to have the patience to provide a full answer, is the following question: If, as indicated above, the mount-point for the HugeTLBFS is ‘/hugepages’, which few applications today would still try to use, could that mount-point just be used as a generic Ramdisk? Modern Linux applications simply use “Transparent Huge Pages”, rather than accessing this mount-point as a Ramdisk. And the real answer to this hypothetical question is No…
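Whether Transparent Huge Pages are active on a given system can be read from ‘/sys/kernel/mm/transparent_hugepage/enabled’, where the kernel marks the current mode with square brackets. A minimal sketch follows. The function name `thp_mode` is hypothetical, and the optional file argument is only there for trying it out:

```shell
# thp_mode [FILE]
# Prints the active Transparent Huge Pages mode, i.e. the bracketed word.
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Usage:
# thp_mode
```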

(Updated 5/20/2021, 8h20… )

 


An Aspect to Hugetlbfs, which Many Sites Omit

I was recently troubleshooting the question of Huge Pages under Linux, which are pages of virtual memory with a size of 2MB each, instead of 4KB. And I was interested in the question of statically-allocated ones, even though modern, 64-bit Linux systems also offer Transparent Huge Pages. There was a gap in what was posted online, concerning the possibility or need to mount an actual ‘hugetlbfs’ file system, according to whether specific programs use one.

Under systems such as my own, as soon as the system boots with hugetlbfs enabled, i.e. with a non-zero number of huge pages, the kernel automatically creates a mount-point at ‘/dev/hugepages’. This mount-point is created without any administrator ‘fstab’ entry, and belongs to user and group ‘root’. According to some needs, such a mount-point would better be created as belonging to a specific group, and as having the option ‘mode=1770’ set. And so, before checking the default behavior of my kernel, I also created a suitable mount-point at ‘/mnt/hugepages’. The question remained in my mind of whether any way exists to give the automatic mount-point at ‘/dev/hugepages’ my custom options. And the answer seems to be No.
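To see which hugetlbfs instances are currently mounted, and with which options, the entries in ‘/proc/mounts’ can be filtered by file system type. This is just a sketch. The function name `hugetlbfs_mounts` is hypothetical, and the optional file argument is only there for trying it out:

```shell
# hugetlbfs_mounts [FILE]
# Prints the mount-point and mount options of every mounted hugetlbfs.
hugetlbfs_mounts() {
    awk '$3 == "hugetlbfs" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Usage:
# hugetlbfs_mounts
```

On a system like the one described, this should list ‘/dev/hugepages’, plus any custom mount-point that was added via the ‘fstab’.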

Here’s what the online documentation forgets to mention:

If you have a program which requires such a mount-point, there will be a line in its config file, that states where it’s located. Any files created in such a mount-point will then consume statically-allocated huge pages of RAM.

Because MySQL is not an example of a program that requires this, its config file also requires no line, to tell it which mount-point to use. MySQL uses memory-allocation functions directly, to ask the kernel for huge pages.

I suppose that it does no real harm, if there is more than one hugetlbfs mounted at any time, as long as the unneeded one is not wasting any pages, let’s say as long as absolutely no files are being created in the unused mount-point. And then if we need to, we can in fact give our custom hugetlbfs mount-point whatever properties we think it should have, via the mount options or the fstab.

Because I didn’t need the one I had created, I simply got rid of it again. Besides which, the fact that one is automatically created at ‘/dev/hugepages’ these days, suggests that future programs that need it, will already be configured to look for it there. And then it would also make sense, if those programs were able to deal with the fact that that one belongs to user and group ‘root’.

Dirk