How configuring VirtualBox to use Large Pages is greatly compromised under Linux.

One of the things Linux users will often do is set up a Virtual Machine manager such as VirtualBox, so that a legitimate, paid-for instance of Windows can run as a Guest System on our Linux Host System. And, because of the way VMs work, there is some possibility that getting them to use “Large Pages”, which under Linux have simply been named “Huge Pages”, could improve overall performance. The main reason is that, without Huge Page support, the VM needs to allocate Memory Maps which are subdivided into standard pages, each with a standard size of 4KiB, so that 512 of them make up 2MiB. What this means in practice is that 512 individual memory allocations usually take place wherever 2MiB of memory needs to be cached and remapped. Such a line of memory can also end up getting saved to the .VDI File – in the case of VirtualBox – from 512 discontiguous pieces of RAM.

The available sizes of Huge Pages depend on the CPU and, in the case of x86 / x86_64 CPUs, they tend to be either 2MiB or 1GiB in size, where 2MiB is already quite ambitious. One way to set this up is summarized in the following little snippet of commands, which need to be given as user:

 


VBoxManage modifyvm "PocketComp_20H2" --nestedpaging on
VBoxManage modifyvm "PocketComp_20H2" --largepages on

 

In my example, I’ve given these commands for the Virtual Machine instance named ‘PocketComp_20H2‘, and, if the CPU is actually an Intel with ‘VT-x’ (hardware support for virtualization), large-page or huge-page support should be turned on. Yet, like several other people, what I obtained next in the log file for the subsequent session was the following line of output:

 


00:00:31.962754 PGMR3PhysAllocateLargePage: allocating large pages takes too long (last attempt 2813 ms; nr of timeouts 1); DISABLE

 

There exist users who have searched the Internet in vain for an explanation of why this feature will not work. I want to explain here what goes wrong with most simple attempts. This is not really an inability of the platform to support the feature, as much as it’s an artifact of how the practice of Huge Pages under Linux differs from the theoretical, hypothetical way in which some people might want to use them. If Huge Pages are to be allocated after the computer has fully started, Linux will be excruciatingly slow in doing so at the request of the VM, because some RAM would need to be defragmented first.

This is partially due to the fact that VirtualBox will want to map all the virtual RAM of the Guest System using them, and not the .VDI File. (:1)  I.e., if a very modest Guest System has 4GiB of (virtual) RAM, it implies that 2048 Huge (2MiB) Pages will be needed, and those will take several minutes to allocate. If that Guest System is supposed to have larger amounts of RAM, the problem just gets worse. And if the VM fails to allocate them within about 2 seconds of requesting them, it aborts the attempt and continues with standard pages.

What Linux will offer as an alternative behaviour is to allocate a fixed number of Huge Pages on boot-up – when the memory is not yet very fragmented – and then to allow any applications which ‘know how’ to help themselves to some of those Huge Pages. Thus, if 128 Huge Pages are to be preallocated, then the following snippet shows roughly how to do so, assuming a Debian distro. (:2)  Lines that begin with hash-marks (‘#‘) are commands that would need to be given as root. I estimate this number of Huge Pages to be appropriate for a system with 12GiB of RAM:

 


# groupadd hugetlbfs
# adduser dirk hugetlbfs
# getent group hugetlbfs

hugetlbfs:x:1002:dirk


# cd /etc
# edit text/*:sysctl.conf

vm.nr_hugepages = 128
vm.hugetlb_shm_group = 1002

# edit text/*:fstab

hugetlbfs       /hugepages      hugetlbfs mode=1770,gid=1002        0       0


# ulimit -H -l

(...)


# cd /etc/security
# edit text/*:limits.conf

@hugetlbfs      -       memlock         unlimited



 

The problem here is that, for a Guest System with 4GiB of virtual RAM to launch, 2048 Huge Pages would need to be preallocated, not 128. To make things worse, Huge Pages cannot be swapped out! They remain locked in RAM. This means that they also get subtracted from the maximum number of KiB that a user is allowed to lock in RAM. In effect, 4GiB of RAM would end up being tied up, not doing anything useful, until the user actually decides to start his VM (at which point, little additional RAM should be requested by VirtualBox).

Now, there could even exist Linux computers which are set up on that set of assumptions. Those Linux boxes just do not count as standard personal, desktop computers.

If the user wishes to know how slow Linuxes tend to be at actually allocating some number of Huge Pages after they have started to run fully, then he or she can just enter the following commands, after configuring the above but before rebooting. Normally, a reboot is required after what is shown has been configured, but instead, the following commands could be given in a hurry. My username ‘dirk‘ will still not belong to the group ‘hugetlbfs‘…

 


# sync ; echo 3 > /proc/sys/vm/drop_caches
# sysctl -p

 

I found that, on a computer which had run for days, with RAM that had gotten very fragmented, the second command took roughly 30 seconds to execute. Imagine how long it might take, if 2048 Huge Pages were indeed to be allocated, instead of 128.


 

What some people have researched on the Web – again, only to find that nobody seems to have the patience to provide a full answer – is whether, given that the mount-point for the HugeTLBFS is ‘/hugepages‘ as indicated above (a mount-point which few applications today would still try to use), that mount-point could just be used as a generic Ramdisk. Modern Linux applications simply use “Transparent Huge Pages”, not access to this mount-point as a Ramdisk. And the real answer to this hypothetical question is No…

(Updated 5/20/2021, 8h20… )

 

 

(As of 5/19/2021: )

The reason why not is the fact that, if a System Software Engineer decides to implement some sort of specialized File System, he or she has the option of only implementing a subset of the I/O operations that regular File Systems must support. Thus, it’s easier to implement an FS which allows files to be created and deleted, but which only implements Memory Maps as the way to access those files. Programming time can be saved by not implementing Stream I/O operations. But trying to use this as a general-purpose FS would require that Stream I/O be implemented, because that method of accessing an FS is still commonly used in regular programming, as well as by ‘bash’.
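
As a minimal sketch of what such a file system does allow, the following C program creates a file under the ‘/hugepages‘ mount-point configured above and accesses it purely through ‘mmap()‘. The file name is hypothetical, and the program assumes that the huge-page pool has free pages and that the user belongs to the ‘hugetlbfs‘ group from the fstab line:


/* hugetlbfs_map.c -- sketch: access a hugetlbfs file through mmap() only.
   Assumes the '/hugepages' mount-point configured above (mode=1770,gid=1002),
   a free page in the preallocated pool, and a hypothetical file name.
   Compile with:  gcc -o hugetlbfs_map hugetlbfs_map.c                       */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* one 2MiB huge page */

int main(void)
{
    int fd = open("/hugepages/demo", O_CREAT | O_RDWR, 0660);
    if (fd < 0) { perror("open"); return 1; }

    /* hugetlbfs does not implement ordinary stream-style write(); the
       intended way to put data into the file is a shared memory map.   */
    void *p = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    strcpy((char *)p, "Hello from a huge page.");   /* touch the page */

    munmap(p, HUGE_PAGE_SIZE);
    close(fd);
    /* unlink("/hugepages/demo");  -- removing the file releases the page */
    return 0;
}
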


 

If the first method shown above does not work, then there is an alternative set of commands which can be given:

 


# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
# echo always > /sys/kernel/mm/transparent_hugepage/enabled
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

 

 

What this changed parameter will do is tell the kernel to promote regular memory maps to Transparent Huge Pages, even if the ‘madvise()‘ function has not been called on them, provided that the existing state of fragmentation does not prevent doing so (see below). There does exist a corresponding kernel parameter which could additionally cause defragmentation of memory, so that this goal is always attained:

 


# cat /sys/kernel/mm/transparent_hugepage/defrag
always defer [madvise] never
# echo defer+madvise > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/defrag
always [defer] madvise never

 

 

However, I don’t think it would be wise, on my box, to set this one to ‘always‘ as well. Too pervasive a reorganization of my memory would take place, and each allocation of a memory map would wait on that memory map being defragmented into a huge-page-based memory map. That would actually threaten the stability of my session. ‘defer+madvise‘ seems like a good compromise.
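
For reference, this is roughly what a program which ‘knows how’ does when either parameter is left at its ‘madvise‘ setting: it maps a large anonymous region and then marks it with ‘madvise()‘. A minimal sketch, with an arbitrary size:


/* thp_madvise.c -- sketch: request Transparent Huge Pages for one region.
   With .../transparent_hugepage/enabled set to 'madvise', only regions
   marked like this are promoted; with 'always', ordinary maps are, too.  */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define MAP_LEN (64UL * 1024 * 1024)    /* 64 MiB, a multiple of 2 MiB */

int main(void)
{
    /* A plain anonymous mapping, backed by standard 4KiB pages at first. */
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Hint that the kernel may back this region with 2MiB huge pages.
       The hint is advisory: fragmentation can still prevent promotion. */
    if (madvise(p, MAP_LEN, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* ... use the memory; promoted pages are counted under
       'AnonHugePages' in /proc/meminfo ...                  */

    munmap(p, MAP_LEN);
    return 0;
}
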


 

1:)

An interesting fact which I have just read on the subject is that conventional Linux file systems cannot be memory-mapped to huge pages. As of 2017, this included ‘ext4‘ file systems. This restriction kind of makes sense, because in order for that to work, the huge pages would also need to be contiguous on the drive. OTOH, if a patched Linux kernel does support it, then the file system must be mounted with the option ‘huge=always‘. This would make sure that data on the drive is aligned with huge pages at all times.

It’s standard behaviour that, if the length of the file, or the length of its first non-contiguous segment, falls short of a multiple of the page size, the last page of the memory map will initially contain zeros corresponding to byte locations not allocated in the file. Data written to those bytes will then just be ignored and not written to the file. For this reason, the function ‘ftruncate()‘ will frequently be used, before the file descriptor is memory-mapped (to standard-size pages), to extend the length of the file explicitly to a multiple of the page size. At that point, if the length of the file has been extended in a way known to the programmer in advance, those added bytes in the memory map can also just have zeros written to them, and be left ‘dirty’, in order to speed the execution of the program. Once the file is closed, those bytes will either contain the zeros, or whatever data was written to the memory map afterwards.
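
A small illustration of that pattern, using standard-size pages and a hypothetical file name:


/* ftruncate_mmap.c -- sketch: extend a file to a whole number of standard
   pages before memory-mapping it, so that no trailing bytes get lost.     */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long  page = sysconf(_SC_PAGESIZE);              /* normally 4096      */
    off_t want = 10000;                              /* desired payload    */
    off_t len  = ((want + page - 1) / page) * page;  /* round up to a page */

    int fd = open("/tmp/demo.bin", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Extend the file explicitly; the added bytes read back as zeros. */
    if (ftruncate(fd, len) != 0) { perror("ftruncate"); close(fd); return 1; }

    char *p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    p[0]        = 'A';   /* stores within the file's length are persisted */
    p[want - 1] = 'Z';

    munmap(p, (size_t)len);
    close(fd);
    return 0;
}
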


 

2:)

An interesting article which I read about this, here, suggests that whether the first suggested method of getting VirtualBox to use Huge Pages will work depends on whether VirtualBox uses the ‘mmap()‘ function-call, with the ‘MAP_HUGETLB‘ flag, to allocate them, as opposed to what the true meaning of Transparent Huge Pages implies, which is that the ‘madvise()‘ function is called, with the ‘MADV_HUGEPAGE‘ flag, on an existing memory map’s base pointer.
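
To show the contrast with the ‘madvise()‘ sketch above, the following is roughly what an explicit ‘MAP_HUGETLB‘ allocation looks like. The flag combination and size are my own illustration, not something I have verified in the VirtualBox source code, and the call only succeeds if enough pages remain in the preallocated pool (‘vm.nr_hugepages‘):


/* map_hugetlb.c -- sketch: an anonymous mapping explicitly backed by huge
   pages from the preallocated pool, for contrast with MADV_HUGEPAGE.      */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define MAP_LEN (64UL * 1024 * 1024)    /* 32 huge pages of 2MiB each */

int main(void)
{
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");    /* typically ENOMEM, pool exhausted */
        return 1;
    }

    /* Pages reserved this way appear under 'HugePages_Rsvd' in the
       /proc/meminfo listing below, until they are actually touched. */
    munmap(p, MAP_LEN);
    return 0;
}


In either case, the /proc/meminfo listing below shows the two mechanisms side by side: ‘AnonHugePages‘ counts Transparent Huge Pages, while the ‘HugePages_…‘ lines describe the preallocated pool.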


 


$ cat /proc/meminfo

(...)
AnonHugePages:    524288 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:     128
HugePages_Free:      123
HugePages_Rsvd:       27
HugePages_Surp:        0
Hugepagesize:       2048 kB
(...)

 


 

(Update 5/20/2021, 8h20: )

What this posting was suggesting, without saying it explicitly, was that it’s the second method shown above, not the first, which will improve the speed of a VirtualBox VM, as long as the Host Machine has an Intel CPU with VT-x (and, as long as ‘--nestedpaging‘ is ‘on‘). According to my brief experimentation, I did seem to notice a performance increase.

But two questions this had left unanswered were, whether the computer should just be left with the two parameters modified for long periods of time, and whether the effects of having done this can be reversed.

What I found was that, even though the speed of programs which allocate and manage large, contiguous regions of RAM (even sparsely populated ones) can be improved this way, it actually makes loading numerous programs slower, when each of those only manages smaller regions of RAM… Although I noticed no outright malfunctions, I did not feel comfortable leaving these settings as suggested, after a trial period of one day, at which point I had ‘~1GB of Anonymous Huge Pages’.

While the character string ‘madvise‘ can be fed to the virtual locations shown above as easily as ‘always‘ and ‘defer+madvise‘ were – using a shell script run as root – this will only affect the way in which subsequently allocated memory maps behave, by no longer trying to promote those to memory maps based on Huge Pages. However, the kernel has no functions which would actually tessellate already-allocated memory maps that happen to have Huge Pages. If the kernel did that, it could actually break the way some programs work. Therefore, already-allocated memory maps that use Huge Pages will simply remain that way, until they are naturally deallocated during the normal operation of the computer and its programs.

After having reverted the settings last night, this morning I found that the computer in question still had “430080 kB” worth of Anonymous Huge Pages allocated. I don’t think that the number will get much smaller until I perform a reboot.

And yet, as I wrote, this does not seem to cause any malfunctions. And, the loading of numerous smaller applications has become quick again.

 

Dirk

 

Print Friendly, PDF & Email

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>