What the name of the memory allocator is, under C++.

Usually, when one is programming in C, the name of the memory allocator is ‘malloc()‘. When programming in C++, this function has largely been replaced with the ‘new‘ operator, which the reader may not realize he or she has been using the whole time. The way this works is that, if an object is to be placed on the heap, ‘new‘ allocates a region of memory, and then calls one of the class’s constructors, to fill that region of memory with the object. If the C++ code has been properly written, but contains no ‘new‘ operator for an object, then the compiler allocates space for it, usually on the stack, and the constructor call does the same thing with that space. This is also why, technically, the constructor does not return anything. If ‘new‘ was used, then ‘new‘ will return a pointer to this memory location, after the constructor call has succeeded.
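Just as a minimal sketch of that distinction (the class ‘Point‘ and its members are only invented for illustration), the constructor runs in both of the following cases, while only ‘new‘ hands back a pointer, which must later be given to ‘delete‘:

    #include <iostream>

    class Point {
    public:
        Point(int x, int y) : x_(x), y_(y) {
            std::cout << "Constructor called." << std::endl;
        }
    private:
        int x_, y_;
    };

    int main() {
        Point a(1, 2);              // Storage allocated by the compiler, usually on the stack.
        Point* b = new Point(3, 4); // 'new' allocates heap storage, calls the constructor,
                                    // and then returns a pointer to that storage.
        delete b;                   // Heap storage must be released explicitly.
        return 0;
    }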

I suppose a question which could be asked, and which would be slightly more useful, would be, whether it’s legal to use both allocators in the same program. And this question comes back to, whether it’s legal to put pieces of C into a C++ program. And the answer is, ‘Sometimes, Yes.’ I think the biggest problem that some programmers might encounter when doing so is failing to keep track of which data structures are ‘C’ data structures, and which are ‘C++’ objects, in the way the code was written. C++ does not contain any inherent mechanism to make this distinction obvious. Another problem which some programmers might encounter is that certain header files will not be compatible with each other, if included in the same program.

Thus, a type of problem to look out for could be, creating C++ ‘string‘ objects, and C arrays declared as ‘char *‘ or ‘char[]‘, the latter of which are also referred to as ‘c_strings’, in the same program. A C function that expects a ‘c_string’ will cause a compile error, if it is suddenly fed a C++ ‘string’ object. And this can be even harder to recognize, because C++ ‘string’ objects may be initialized with ‘c_string’ instances, in their constructor calls (just to keep the appearance of the code manageable).
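As a small illustration of the issue, the C library function ‘strlen()‘ expects a ‘const char *‘, so a C++ ‘string’ object cannot be handed to it directly; its ‘.c_str()‘ member provides the c_string view instead:

    #include <cstring>   // Declares strlen(), a C function that expects a 'const char *'.
    #include <iostream>
    #include <string>

    int main() {
        std::string greeting("Hello");   // A C++ 'string', initialized from a c_string literal.

        // std::strlen(greeting);                    // Would not compile: no implicit conversion.
        std::cout << std::strlen(greeting.c_str())   // .c_str() hands the C function a c_string.
                  << std::endl;
        return 0;
    }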

Also, the language provides special prefixes for string literals, such as ‘L"Some Text"‘. It’s important not to confuse this one with a standard, legacy c_string. A standard, legacy c_string consists of an array of 8-bit characters (‘char‘), while this prefix creates an array of ‘wchar_t‘ characters, each of which is 16 bits wide on Windows and 32 bits wide on Linux, not 8 bits. There is also the difference to know, between ‘16-bit Unicode’, which actually refers to UTF-16 and would be invoked with ‘u"Some Text"‘, and 32-bit Unicode (UTF-32), the latter of which would be invoked with ‘U"Some Text"‘…
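A short sketch, assuming C++11 or later, of what the different prefixes actually produce (the width printed for ‘wchar_t‘ will differ between platforms):

    #include <iostream>

    int main() {
        const char*     a =  "Some Text";  // Legacy c_string: array of 8-bit 'char'.
        const wchar_t*  b = L"Some Text";  // Wide string: 'wchar_t' is 16 bits on Windows, 32 on Linux.
        const char16_t* c = u"Some Text";  // UTF-16: 16-bit code units (C++11).
        const char32_t* d = U"Some Text";  // UTF-32: 32-bit code units (C++11).

        // Print the width of each character type, in bytes.
        std::cout << sizeof(*a) << ' ' << sizeof(*b) << ' '
                  << sizeof(*c) << ' ' << sizeof(*d) << std::endl;
        return 0;
    }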

If a programmer is explicitly using the “Standard Template Library” (‘STL’) ‘std::map<>‘ construct, then one of its valid forms is ‘std::map<string, int> myMap;‘. An object called ‘myMap’ is created on the stack, that maps from specific strings, to integers. Because the STL is so strongly based on C++, it’s important to understand that the template instantiation used here refers to a ‘C++ string’ object at all times, for which reason ‘#include <string>‘ must also be given. Yet, other functions that are strictly C exist, such as ‘dlsym()‘, which expects that its second parameter be a(n 8-bit) ‘c_string’, which can just be typed into the source code. Well, the C++ string class implicit in the template instantiation ‘std::map<string, int>‘ can also be invoked, by typing ‘myMap["Object1"] = 12;‘. What happens in this last case is, that "Object1" starts out as a c_string, due to the legacy of how compilers treat string literals, but it gets used within ‘myMap’, to initialize a C++ string object, which then serves as the key. The appearance of the code doesn’t make this obvious. (:1)
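The following sketch ties those two worlds together, assuming a POSIX system (the library name ‘libm.so.6‘ and the symbol ‘cos‘ are just example values, and linking against ‘libdl‘ may be required):

    #include <dlfcn.h>   // POSIX: dlopen() / dlsym() are C functions that take c_strings.
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, int> myMap;   // Requires #include <string> as well as <map>.
        myMap["Object1"] = 12;              // The c_string literal "Object1" initializes a
                                            // temporary C++ string, which becomes the key.

        void* handle = dlopen("libm.so.6", RTLD_LAZY);  // Both arguments here are plain c_strings.
        if (handle) {
            void* sym = dlsym(handle, "cos");   // dlsym() expects an 8-bit c_string, not a C++ string.
            (void)sym;
            dlclose(handle);
        }
        return 0;
    }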

But, just to prove that sometimes, it’s okay to put C into a C++ program, one thing which a programmer is eventually allowed to do is, to overload the default ‘new‘ operator with his or her own version, which will be applied to specific classes of objects. When a programmer does this, most likely, they will have something similar to a ‘malloc()‘ function call in the definition of the custom ‘new‘ operator, which is, in any case, probably C and not C++. And, doing so also requires that such a programmer overload the ‘delete‘ operator… This type of programming opens a can of worms, in the question of how this custom ‘new‘ operator, or its counterpart, the custom ‘delete‘, are supposed to behave if the constructor or destructor throws an exception, and of how to define this behaviour when defining a custom ‘new‘ and ‘delete‘. Defining these should not be undertaken, unless the programmer who does so has studied that specific question. (:2)
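Purely as a sketch of the idea, and not as something that should be copied into production code, a class-specific ‘new‘ / ‘delete‘ pair (the class name ‘Widget‘ is invented) could look like this, with ‘malloc()‘ and ‘free()‘ doing the actual work:

    #include <cstdlib>   // std::malloc() / std::free() -- essentially C, inside a C++ program.
    #include <new>       // std::bad_alloc

    class Widget {
    public:
        // Class-specific 'new': the body is little more than a malloc() call.
        static void* operator new(std::size_t size) {
            void* p = std::malloc(size);
            if (p == nullptr)
                throw std::bad_alloc();   // By convention, allocation failure throws.
            return p;
        }

        // The matching 'delete'. If the constructor throws, the runtime calls this
        // automatically to release the raw memory, so the pair must stay consistent.
        static void operator delete(void* p) noexcept {
            std::free(p);
        }

    private:
        int value_ = 0;
    };

    int main() {
        Widget* w = new Widget();   // Uses the overloaded operators above.
        delete w;
        return 0;
    }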

(Updated 3/14/2021, 0h55… )


The Advantages of using a Slab Allocator

When people take their first C programming courses, they are taught about the standard allocator named ‘malloc()‘, while when learning C++, they are first taught about its standard allocator, named ‘new‘.

These allocators work on the assumption that a program is running in user space, and they may not always be efficient at allocating smaller chunks of memory. They assume that a standard method of managing the heap is in place, where the heap of any one process is a part of that process’s memory image, and partially managed by the kernel.

Not only that, but when we tell either of these standard operators to allocate a chunk of memory, the allocator takes the size that was requested, prepends to the chunk of memory a small header containing a binary representation of that size, and then returns a pointer that points just past the header, directly to the memory which the programmer can use, even though the allocated chunk is larger, and preceded by a binary representation of its own size. That way, when the command is given to deallocate, all the deallocation function needs to receive, in principle, is a pointer to the allocated chunk; it can then step back over the header that was inserted, to derive how much memory to free.
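The following is only a toy illustration of that bookkeeping, not how any production ‘malloc()‘ is actually written (the names ‘my_alloc()‘ and ‘my_free()‘ are invented, the sketch simply borrows ‘malloc()‘ underneath, and alignment details are ignored):

    #include <cstdlib>
    #include <cstring>

    // A header holding the chunk's size is placed just before the memory
    // that is handed back to the caller.
    struct Header { std::size_t size; };

    void* my_alloc(std::size_t requested) {
        // Allocate the requested amount, plus room for the header.
        char* base = static_cast<char*>(std::malloc(sizeof(Header) + requested));
        if (base == nullptr) return nullptr;
        reinterpret_cast<Header*>(base)->size = requested;  // Record the size in the header.
        return base + sizeof(Header);                       // Return a pointer just past the header.
    }

    void my_free(void* p) {
        if (p == nullptr) return;
        // Step back over the header to recover the original chunk (and its size, if needed).
        char* base = static_cast<char*>(p) - sizeof(Header);
        std::free(base);
    }

    int main() {
        void* p = my_alloc(16);
        if (p) std::memset(p, 0, 16);
        my_free(p);
        return 0;
    }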

I suppose that one conclusion to draw from this is, that even though it looks like a good exercise to teach programming students, the exercise of always allocating a 32-bit or a 64-bit object – i.e., a 4-byte or an 8-byte object – such as an integer, to obtain an 8-byte pointer to that integer, is actually not a good one, because in addition to the requested 4 or 8 bytes, a header is always being allocated as well, which may add 4 bytes if the maximum allocated size is a 32-bit number, or 8 bytes if the maximum allocated size (of one chunk) is a 64-bit number.

Additionally, these allocators assume the support of the kernel for a user-space process, the latter of which has a heap. On 64-bit systems with demand paging, when the user-space application first tries to access a virtual address that the allocator has handed out, but which is not yet backed by physical memory, this intentionally results in a page fault, which stops the process. The kernel then needs to examine why the page fault occurred, and since this was a legitimate reason, needs to set up a physical page frame for that virtual address, before restarting the user-space process at the instruction that faulted.

And so, means also needed to exist by which a kernel can manage memory more efficiently, even under the assumption that the kernel does not have the sort of heap that a user-space process does. And one main mechanism for doing so is to use a slab allocator. It will allocate large numbers of small chunks, without requiring as much overhead to do so as the standard user-space allocators do. In kernel space, these slabs are the main replacement for a heap.
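As a rough, user-space sketch of the idea only (real kernel slab allocators are considerably more involved, and the class name ‘Slab‘ is invented here), one large block can be carved into fixed-size slots, threaded onto a free list, so that allocating or freeing one object costs only a few pointer operations and no per-object header:

    #include <cstddef>
    #include <cstdlib>

    class Slab {
    public:
        Slab(std::size_t objectSize, std::size_t count)
            : slotSize_(objectSize < sizeof(void*) ? sizeof(void*) : objectSize) {
            memory_ = static_cast<char*>(std::malloc(slotSize_ * count));
            freeList_ = nullptr;
            if (memory_ == nullptr) return;
            for (std::size_t i = 0; i < count; ++i) {   // Thread every slot onto the free list.
                void* slot = memory_ + i * slotSize_;
                *static_cast<void**>(slot) = freeList_;
                freeList_ = slot;
            }
        }
        ~Slab() { std::free(memory_); }

        void* allocate() {                              // Pop one slot off the free list.
            if (freeList_ == nullptr) return nullptr;   // Slab exhausted.
            void* slot = freeList_;
            freeList_ = *static_cast<void**>(slot);
            return slot;
        }

        void release(void* slot) {                      // Push the slot back onto the free list.
            *static_cast<void**>(slot) = freeList_;
            freeList_ = slot;
        }

    private:
        std::size_t slotSize_;
        char* memory_;
        void* freeList_;
    };

    int main() {
        Slab slab(sizeof(int), 1024);   // Room for 1024 integers, allocated up front.
        int* a = static_cast<int*>(slab.allocate());
        if (a) { *a = 42; slab.release(a); }
        return 0;
    }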

(Updated 06/20/2017 … )
