The Advantages of using a Slab Allocator

When people take their first C programming courses, they are taught about the standard allocator, named ‘malloc()’, while when first learning C++, they are taught about its counterpart, the ‘new’ operator.

These allocators work on the assumption that a program is running in user space, and they may not always be efficient at allocating smaller chunks of memory. They assume that a standard method of managing the heap is in place, in which the heap of any one process is a part of that process’s memory-image, and is partially managed by the kernel.

Not only that, but when we tell either of these standard allocators to allocate a chunk of memory, the allocator takes note of the size of that chunk, prepends to the chunk a binary representation of its size, and returns a pointer that points just past this header, directly at the memory which the programmer can use – even though the allocated chunk is larger, and is preceded by a binary representation of its own size. That way, when the command is given to deallocate, all the deallocation-function needs to receive, in principle, is a pointer to the allocated chunk; it can then step back to the header that was inserted, to derive how much memory to free.
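As an illustration of this scheme only – real allocators are considerably more involved, and I’m not claiming this is how ‘malloc()’ is actually written – a toy allocator that prepends such a header might look like this:

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified illustration of the header scheme described above. */
typedef struct {
    size_t size;   /* size originally requested by the caller */
} chunk_header;

static void *toy_alloc(size_t size)
{
    /* Reserve room for the header plus the caller's bytes. */
    chunk_header *h = malloc(sizeof(chunk_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    /* Return a pointer just past the header. */
    return (void *)(h + 1);
}

static void toy_free(void *p)
{
    if (p == NULL)
        return;
    /* Step back over the header to recover the original block... */
    chunk_header *h = (chunk_header *)p - 1;
    /* ...and h->size tells the allocator how much user memory this was. */
    free(h);
}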

I suppose that one conclusion to draw from this is that, even though it looks like a good exercise to give programming students, the habit of individually allocating a 32-bit or a 64-bit object – i.e., a 4-byte or an 8-byte object, such as an integer – just to obtain an 8-byte pointer to it, is actually not a good one. In addition to the requested 4 or 8 bytes, a header is always being allocated as well, which may add 4 bytes if the maximum size of one chunk is a 32-bit number, or 8 bytes if the maximum size of one chunk is a 64-bit number.
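On glibc-based Linux systems, the reader can get some sense of this by asking the non-standard, glibc-specific function ‘malloc_usable_size()’ how much usable memory a tiny request actually received; the header itself is additional, and the exact figures printed depend on the allocator in use:

#include <malloc.h>   /* malloc_usable_size() is a glibc extension */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Ask for a tiny, integer-sized object and see how much the
     * allocator actually set aside for it. */
    int *p = malloc(sizeof(int));
    if (p == NULL)
        return 1;
    printf("requested %zu bytes, usable size %zu bytes\n",
           sizeof(int), malloc_usable_size(p));
    free(p);
    return 0;
}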

Additionally, these allocators rely on support from the kernel for the user-space process, the latter of which owns a heap. When that heap needs to grow, the allocator asks the kernel for more address space, via system calls such as ‘brk()’ or ‘mmap()’. The first time the process actually touches a newly-mapped virtual page, a page-fault results, and the process is stopped. The kernel then needs to examine why the page-fault occurred and, since in this case the reason is legitimate, set up a physical page-frame for that virtual address, before restarting the user-space process.
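As a sketch of the kind of kernel support being described – not of ‘malloc()’s actual code – a user-space program can ask the kernel for a whole page with ‘mmap()’, and it is the first access to that page which triggers the page-fault that causes a physical page-frame to be wired up:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Ask the kernel for one anonymous page of virtual memory. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touching the page for the first time causes a page-fault;
     * the kernel maps a physical page-frame and restarts the process. */
    memset(p, 0, (size_t)page);

    munmap(p, (size_t)page);
    return 0;
}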

And so means also needed to exist by which a kernel can manage memory more efficiently, even though the kernel does not have the sort of heap that a user-space process does. One main mechanism for doing so is to use a slab allocator. It will allocate large numbers of small chunks, without requiring as much overhead per chunk as the standard user-space allocators do. In kernel-space, these slabs are the main replacement for a heap.
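Within the Linux kernel, this facility is exposed to driver and subsystem code through the ‘kmem_cache’ functions. Below is a minimal sketch of that in-kernel API – it is kernel code, so it will not compile as a user-space program, and ‘struct my_object’ and the surrounding function names are hypothetical:

#include <linux/errno.h>
#include <linux/slab.h>

struct my_object {
    int   id;
    void *payload;
};

static struct kmem_cache *my_cache;

static int my_setup(void)
{
    struct my_object *obj;

    /* Create a cache (in effect, a dedicated slab-class) for objects of this one size. */
    my_cache = kmem_cache_create("my_object_cache", sizeof(struct my_object),
                                 0, 0, NULL);
    if (!my_cache)
        return -ENOMEM;

    /* Allocate one object from the cache... */
    obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (!obj) {
        kmem_cache_destroy(my_cache);
        return -ENOMEM;
    }

    /* ...and hand it straight back to the cache, not to a general-purpose heap. */
    kmem_cache_free(my_cache, obj);
    return 0;
}

static void my_teardown(void)
{
    kmem_cache_destroy(my_cache);
}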

(Updated 06/20/2017 … )

(Edit 06/20/2017 :

I suppose that I should add that, when the standard user-space allocator – ‘malloc()’ – gets used by an application, there are essentially two types of allocations taking place.

The first is the allocation by the kernel of an entire 4KB page of virtual memory to the process. The second is the allocation of a smaller chunk of memory – such as even an 8-byte chunk – to the code which executed the ‘malloc()’ call or the ‘new’ expression.

Because of this, a page of virtual memory cannot be deallocated again until every chunk of (programmer-visible) memory that resided within it has been deallocated first. In practice, with smaller chunks, this happens infrequently. Therefore, the heap tends to grow over time: newer pages keep being allocated, while older pages, which now contain deallocated chunks, still cannot be released from the process’s address-space. )

But then applications also exist in user-space which might want to use a slab-allocator, for some but not all of the reasons the kernel has one. (Within the kernel itself, the C-function ‘kmalloc()’ draws on the kernel’s slab caches.) In user-space, if the number of small objects to be allocated truly becomes large, a slab-allocator starts to become memory-efficient, and using one is generally more time-efficient as well.

In user-space, memory is first allocated to the heap of the process using the standard allocator, one slab-page at a time, but it then usually remains allocated for as long as the application keeps running, so that the deallocation and reallocation of chunks belonging to the slabs no longer involves the standard allocator.

So there exist specialized applications, such as ‘memcached’, where the main constraint is time, i.e. where allocation should be extremely fast.

Now, there exist experts who will tell the reader that their implementation of the slab allocator is actually a ‘slub allocator’, but I consider this to be a refinement that improves performance, and whose details I don’t need to know in full.

While on my computer, slab-classes 2 and 3 offer 120-byte and 152-byte chunk-sizes, it would have made sense for slab-class 1 to offer a much smaller chunk-size, to catch requests from certain programmers for the smallest-possible chunk.

In practical computing, a possible C++ class that one might like to implement could be a ‘cons object’, which contains two addresses: the head and the tail of a linked-list. This also represents the smallest chunk that I could think to allocate via ‘new’, and on 64-bit systems it would represent a 16-byte allocation.
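As a quick sanity-check of that figure, a structure holding two pointers does indeed come out to 16 bytes on an LP64 system:

#include <stdio.h>

/* A minimal 'cons cell': the head and the tail of a linked list. */
struct cons {
    void *head;
    void *tail;
};

int main(void)
{
    /* On an LP64 system each pointer is 8 bytes, so this prints 16. */
    printf("sizeof(struct cons) = %zu bytes\n", sizeof(struct cons));
    return 0;
}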

Possibly, slab-class 1 is still larger than that. (Note: The chunk-sizes of slab-classes 2 and 3 seem to differ by 32 for some reason. )

(Edit : Below are the real slab-sizes, in plain bytes. Elsewhere, the same information is often quoted in KB or MB, i.e. in units of 2^10 or 2^20 bytes respectively. )

 


dirk@Phoenix:~$ memcached -vv
slab class   1: chunk size        96 perslab   10922
slab class   2: chunk size       120 perslab    8738
slab class   3: chunk size       152 perslab    6898
slab class   4: chunk size       192 perslab    5461
slab class   5: chunk size       240 perslab    4369
slab class   6: chunk size       304 perslab    3449
slab class   7: chunk size       384 perslab    2730
slab class   8: chunk size       480 perslab    2184
slab class   9: chunk size       600 perslab    1747
slab class  10: chunk size       752 perslab    1394
slab class  11: chunk size       944 perslab    1110
slab class  12: chunk size      1184 perslab     885
slab class  13: chunk size      1480 perslab     708
slab class  14: chunk size      1856 perslab     564
slab class  15: chunk size      2320 perslab     451
slab class  16: chunk size      2904 perslab     361
slab class  17: chunk size      3632 perslab     288
slab class  18: chunk size      4544 perslab     230
slab class  19: chunk size      5680 perslab     184
slab class  20: chunk size      7104 perslab     147
slab class  21: chunk size      8880 perslab     118
slab class  22: chunk size     11104 perslab      94
slab class  23: chunk size     13880 perslab      75
slab class  24: chunk size     17352 perslab      60
slab class  25: chunk size     21696 perslab      48
slab class  26: chunk size     27120 perslab      38
slab class  27: chunk size     33904 perslab      30
slab class  28: chunk size     42384 perslab      24
slab class  29: chunk size     52984 perslab      19
slab class  30: chunk size     66232 perslab      15
slab class  31: chunk size     82792 perslab      12
slab class  32: chunk size    103496 perslab      10
slab class  33: chunk size    129376 perslab       8
slab class  34: chunk size    161720 perslab       6
slab class  35: chunk size    202152 perslab       5
slab class  36: chunk size    252696 perslab       4
slab class  37: chunk size    315872 perslab       3
slab class  38: chunk size    394840 perslab       2
slab class  39: chunk size    493552 perslab       2
slab class  40: chunk size    616944 perslab       1                            
slab class  41: chunk size    771184 perslab       1                            
slab class  42: chunk size   1048576 perslab       1


 

Observation: Each chunk-size is a multiple of 8 bytes, which fits the alignment-constraints of 64-bit PCs well. So the method applied seems to be:

  • Start with 96.
  • For each successive slab-class, multiply by 1.25, and round up to the nearest 8.
  • Except for the last slab-class, in which the chunk-size just equals the slab-page size of 1MB. In that case, the next chunk-size in the sequence would still have been smaller. This suggests that, for this slab-class, an added layer of parent-pages needs to keep track of the data-pages.
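As a sketch, the following program reproduces the sequence in the listing under exactly those assumptions – a 96-byte starting size, a growth-factor of 1.25 (which, if I recall correctly, is memcached’s default and can be tuned with its ‘-f’ option), rounding up to a multiple of 8, and a 1MB slab-page. It is not memcached’s own code:

#include <stdio.h>

int main(void)
{
    const double   growth    = 1.25;     /* assumed growth-factor */
    const unsigned align     = 8;        /* round each chunk-size up to a multiple of 8 */
    const unsigned slab_page = 1048576;  /* assumed 1 MB slab-page */
    unsigned size = 96;                  /* chunk-size of slab-class 1 */
    int cls = 1;

    /* Keep multiplying while the grown size still fits within the slab-page. */
    while ((double)size <= slab_page / growth) {
        printf("slab class %3d: chunk size %8u perslab %8u\n",
               cls, size, slab_page / size);
        /* Grow by the factor, then round up to the next multiple of 8. */
        size = (unsigned)(size * growth);
        if (size % align)
            size += align - (size % align);
        cls++;
    }

    /* The final slab-class simply occupies an entire slab-page. */
    printf("slab class %3d: chunk size %8u perslab %8u\n",
           cls, slab_page, 1u);
    return 0;
}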

But according to my own software, this slab-allocator has never been requested to allocate chunks from slab-class 1, because everything that software was asked to cache was larger.

(Edit : )

Just as any other memory-management system does, a slab-allocator needs a fast method of finding and identifying free chunks, so that when the command is given to allocate one, this can be done in the fewest steps possible.

A plausible method could be that, each time a new page is allocated to a slab-class, every chunk within it is filled with a pointer stating the next available free chunk. The last chunk would start out containing the NULL pointer, and a header belonging to the page would point to the head of this linked list. (A minimal sketch of this scheme follows the list below. )

  • If a chunk is to be allocated, the first free chunk – the one the header points to – is chosen, and the header is then set to the address which that chunk contained, so that the list no longer leads to the now-allocated chunk.
  • If a chunk is to be deallocated, it is inserted at the beginning of the list: the chunk is made to contain the address that the header previously held, and the header is set to point to the newly-deallocated chunk.
  • If the corresponding header-field ever contains NULL, this means that the entire page has been allocated, and zero free chunks remain.
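Below is a minimal sketch of that free-list scheme, under the assumption that each free chunk simply stores, in its first bytes, a pointer to the next free chunk; the names are mine, and a real slab- or slub-allocator lays its metadata out differently:

#include <stddef.h>
#include <stdlib.h>

/* One slab-page dedicated to a single chunk-size. Each *free* chunk's
 * first bytes are reused to hold a pointer to the next free chunk,
 * so chunk_size must be at least sizeof(void *). */
struct slab_page {
    void *free_head;   /* head of the free-list; NULL means the page is full */
    char *memory;      /* the chunks themselves */
};

static int slab_page_init(struct slab_page *pg, size_t chunk_size, size_t nchunks)
{
    pg->memory = malloc(chunk_size * nchunks);
    if (pg->memory == NULL)
        return -1;

    /* Thread a free-list through the chunks: each one points to the next,
     * and the last one starts out containing NULL. */
    for (size_t i = 0; i + 1 < nchunks; i++)
        *(void **)(pg->memory + i * chunk_size) = pg->memory + (i + 1) * chunk_size;
    *(void **)(pg->memory + (nchunks - 1) * chunk_size) = NULL;

    pg->free_head = pg->memory;
    return 0;
}

static void *slab_alloc(struct slab_page *pg)
{
    void *chunk = pg->free_head;
    if (chunk != NULL)                    /* NULL: zero free chunks remain */
        pg->free_head = *(void **)chunk;  /* pop the first free chunk off the list */
    return chunk;
}

static void slab_free(struct slab_page *pg, void *chunk)
{
    *(void **)chunk = pg->free_head;      /* push onto the head of the list */
    pg->free_head = chunk;
}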

It is my expectation that the header information actually holds more than one address, and what else it states probably also differs between a slub-allocator and a 1st-generation slab-allocator. Since a slub-allocator has greater complexity, it might even hold its metadata in special pages.

(Edit 06/30/2017 : )

According to information which I’ve read, it’s possible for an individual slab-allocator to have a smallest slab-class 1 with chunks smaller than 96 bytes. It’s only that the ‘memcached’ version on my computer reserves 96 bytes as the chunk-size of this slab-class, because ‘memcached’ specifically requires that 48 bytes be used to store the key – for use in its hash-table. And then it starts to become efficient to store an additional 48 bytes as the value of the key.

Dirk

 
