What the name of the memory allocator is, under C++.

Usually, when one is programming in C, the name of the memory allocator is ‘malloc()‘. When programming in C++, its role is taken over by the ‘new‘ operator, which the reader may not realize he or she has been using the whole time (‘malloc()‘ itself remains available, through ‘<cstdlib>‘). The way this works is that, if an object is to be placed on the heap, ‘new‘ first allocates a region of memory, and then calls one of the class’s constructors, to build the object in that region of memory. If the C++ code creates an object without the ‘new‘ operator, then the compiler provides the storage for it, usually on the stack, but the constructor call does the same thing. This is also why, technically, the constructor does not return anything. If ‘new‘ was used, then the ‘new‘ expression yields a pointer to this memory location, after the constructor call has succeeded.
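To make the two cases concrete, here is a minimal sketch, which assumes a made-up class named ‘Tracked‘ that merely counts its own constructor and destructor calls. Neither the class nor the function name comes from any library; they only illustrate that the constructor runs in both cases, and that only the source of the storage differs.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that counts how often its constructor and destructor run.
struct Tracked {
    static int constructed;
    static int destroyed;
    std::string label;

    explicit Tracked(const std::string &l) : label(l) { ++constructed; }
    ~Tracked() { ++destroyed; }
};

int Tracked::constructed = 0;
int Tracked::destroyed = 0;

// Creates one object on the stack and one on the heap, and returns the
// number of constructor calls that were counted.
int construct_both_ways() {
    Tracked::constructed = 0;
    Tracked::destroyed = 0;

    Tracked on_stack("stack");                // storage provided by the compiler
    Tracked *on_heap = new Tracked("heap");   // memory is allocated, then the constructor runs

    assert(on_heap != nullptr);               // a plain 'new' throws instead of returning null
    delete on_heap;                           // the destructor runs, then the memory is released

    return Tracked::constructed;              // 'on_stack' is destroyed when the function exits
}
```

Calling ‘construct_both_ways()‘ from ‘main()‘ should report two constructor calls, and, once the function has returned, two destructor calls as well.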

I suppose a question which could be asked, and which would be slightly more useful, is whether it’s legal to use both allocators in the same program. This question comes back to whether it’s legal to put pieces of C into a C++ program, and the answer is, ‘Sometimes, Yes.’ What must not be mixed is the pairing: memory obtained with ‘malloc()‘ must be released with ‘free()‘, and an object created with ‘new‘ must be released with ‘delete‘. I think the biggest problem that some programmers might encounter when doing so is failing to keep track of which data-structures are ‘C’ data structures, and which are ‘C++’ objects, in the way the code was written. C++ does not contain any inherent mechanism to make this distinction obvious. Another problem which some programmers might encounter is that certain header files will not be compatible with each other, if included in the same program.

Thus, a type of problem to look out for could be creating C++ ‘string‘ objects, and C arrays declared as ‘char *‘ or ‘char[]‘, the latter of which are also referred to as ‘c_strings’, in the same program. A C function that expects a ‘c_string’ will certainly cause a compile error if it is suddenly fed a C++ ‘string’ object, because the conversion in that direction has to be requested explicitly, with the ‘c_str()‘ member function. And this can be even harder to recognize, because C++ ‘string’ objects may be initialized with ‘c_string’ instances in their constructor calls (just to keep the appearance of the code manageable).
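A short sketch of that asymmetry follows. The function ‘c_style_length()‘ is only a stand-in that I made up for ‘some C API’; the conversion from a c_string to a C++ string is implicit, while the reverse direction needs an explicit ‘c_str()‘.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in (hypothetical) for a C API that only understands
// NUL-terminated char arrays.
std::size_t c_style_length(const char *text) {
    return std::strlen(text);
}

std::size_t length_via_c_api() {
    const char *legacy = "Hello";   // a c_string
    std::string modern(legacy);     // a C++ string, initialized from the c_string

    assert(modern.size() == std::strlen(legacy));

    // c_style_length(modern);      // would not compile: no implicit conversion
    return c_style_length(modern.c_str());   // explicit, read-only view of the characters
}
```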

Also, the language provides special prefixes for string literals, such as ‘L"Some Text"‘. It’s important not to confuse this one with a standard, legacy c_string. A standard, legacy c_string consists of an array of 8-bit characters (‘char‘), while this prefix creates an array of ‘wchar_t‘ elements, each of which is typically 16 bits wide under Windows and 32 bits wide under Linux, not 8 bits wide. There is also the difference to know between UTF-16, which would be invoked with ‘u"Some Text"‘ and yields ‘char16_t‘ elements, and UTF-32, which would be invoked with ‘U"Some Text"‘ and yields ‘char32_t‘ elements…
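The element types behind each prefix can be checked by the compiler itself. The following is only a sketch; the function name ‘utf32_literal_bytes()‘ is my own, and the width of ‘wchar_t‘ is deliberately not asserted, because it depends on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Element type of each kind of string literal, verified at compile time.
static_assert(std::is_same<decltype("a"[0]),  const char &>::value,     "narrow literal");
static_assert(std::is_same<decltype(L"a"[0]), const wchar_t &>::value,  "wide literal");
static_assert(std::is_same<decltype(u"a"[0]), const char16_t &>::value, "UTF-16 literal");
static_assert(std::is_same<decltype(U"a"[0]), const char32_t &>::value, "UTF-32 literal");

// Returns the storage size of a two-character UTF-32 literal, in bytes.
std::size_t utf32_literal_bytes() {
    // Two characters plus the terminating zero element.
    assert(sizeof(u"ab") == 3 * sizeof(char16_t));
    return sizeof(U"ab");
}
```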

If a programmer is explicitly using the “Standard Template Library” (‘STL’) ‘std::map<>‘ container, then one of its valid forms is ‘std::map<std::string, int> myMap;‘. An object called ‘myMap’ is created on the stack, that maps from specific strings to integers. Because the STL is so strongly based on C++, it’s important to understand that the template instantiation used here refers to a ‘C++ string’ object at all times, for which reason ‘#include <string>‘ must also be given. Yet, other functions that are strictly C exist, such as ‘dlsym()‘, which expects that its second parameter be a(n 8-bit) ‘c_string’, which can just be typed into the source code. Well, the C++ string class implicit in the template instantiation ‘std::map<std::string, int>‘ can also be invoked by typing ‘myMap["Object1"] = 12;‘. What happens in this last case is that "Object1" starts out as a c_string, due to the legacy of what the compilers do, but it will get used within ‘myMap’ to initialize a C++ string object. The appearance of the code doesn’t make this obvious. (:1)
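As a small illustration of that hidden conversion, the sketch below stores a value under a literal, and then looks it up again with a ‘std::string‘ that was assembled separately. The function name ‘lookup_demo()‘ is just something I made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

int lookup_demo() {
    std::map<std::string, int> myMap;

    myMap["Object1"] = 12;    // the literal is converted into a std::string key

    std::string key = "Object";
    key += "1";               // a different buffer, holding the same characters

    assert(myMap.size() == 1);
    return myMap.at(key);     // found, because std::string keys compare by content
}
```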

But, just to prove that sometimes it’s okay to put C into a C++ program, one thing which a programmer is allowed to do is to overload the default ‘new‘ operator with his or her own version, which will be applied to specific classes of objects. When a programmer does this, most likely they will have something similar to a ‘malloc()‘ function-call in the definition of the custom ‘new‘ operator, in any case probably C and not C++. Doing so also requires that such a programmer overload the ‘delete‘ operator… This type of programming opens a can of worms, in the question of how this custom ‘new‘ operator, or its counterpart, the custom ‘delete‘, are supposed to behave if the constructor call or destructor call throws an exception, and how to define this behaviour when defining a custom ‘new‘ and ‘delete‘. Defining these should not be undertaken, unless the programmer who does so has studied that specific question. (:2)

(Updated 3/14/2021, 0h55… )

(As of 3/13/2021, 15h15… )

1:)

I suppose that this poses the question, of whether the following template instantiation is valid or not:

 


std::map<char *, int> myMap;

 

The short answer is that this compiles, but does not do what one would want. The reason is that to use ‘char *‘ in this way requires that ‘std::map<>‘ be fed a third template parameter, which states how any two instances of ‘char[]‘ should be compared. Left to the primitive default, what the template instantiation would do is to compare the values of the two pointers (i.e., the addresses of two buffers), instead of the actual strings. The standard C function ‘strcmp()‘ will compare c_strings to put them in a consistent order, but it cannot be passed in directly: the third template parameter must be a type, and the comparison must return ‘true‘ only when the first argument sorts before the second, whereas ‘strcmp()‘ returns a negative, zero or positive ‘int‘. Wrapped in a small comparator type (‘CStrLess‘ is a hypothetical name), the following instantiation could count as valid:

struct CStrLess { bool operator()(const char *a, const char *b) const { return strcmp(a, b) < 0; } };
std::map<const char *, int, CStrLess> myMap;

The third template parameter names a type, whose call operator is only invoked later, whenever the map needs to compare two keys. However, since I have not tried to code this way, I’ll just wait for some reader to tell me whether they got into any trouble, coding in C to such an extent, while using the STL :)
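The difference between comparing addresses and comparing text can be sketched as follows. The comparator type ‘CStrLess‘ and both function names are assumptions of mine, not anything the STL provides. Note also that such a map only stores the pointers, so that the buffers must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>

// Hypothetical comparator type: orders c_strings by their content.
struct CStrLess {
    bool operator()(const char *a, const char *b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Two distinct buffers holding the same text count as ONE key.
std::size_t count_keys_by_content() {
    char first[]  = "apple";
    char second[] = "apple";

    std::map<const char *, int, CStrLess> byContent;
    byContent[first]  = 1;
    byContent[second] = 2;    // same text, so the existing entry is overwritten
    return byContent.size();
}

// With the default comparison, the same two buffers count as TWO keys.
std::size_t count_keys_by_address() {
    char first[]  = "apple";
    char second[] = "apple";

    std::map<const char *, int> byAddress;
    byAddress[first]  = 1;
    byAddress[second] = 2;    // different address, so a second entry is created
    return byAddress.size();
}
```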


 

(Comment, 3/13/2021, 22h20: )

The way I visualize this works is that ‘std::map<>‘ will actually compare two objects of the specified key-type using the relational operator ‘<‘ alone, by default (through ‘std::less<>‘). Because C++ string objects have been given many ‘practical’ features (for text), they have also been given the relational operators, in a way that sorts them lexicographically, by character code. If ‘std::map<>‘ has been given a third template parameter, then an object of that type gets called with two instances of the key-type as arguments, and its result is interpreted as a ‘bool‘, which must be ‘true‘ only if the first argument sorts before the second.

What the reader should understand, however, is that nothing forbids putting both of the following in the same source file, although in C++ the second header is usually spelled ‘<cstring>‘:

#include <string>
#include <string.h>

One interesting factoid that I just read on a cited reference page was that the include file ‘<string>‘ also overloads the relational operators for comparing a C++ ‘string‘ object with a ‘(const char *)‘, just so that programmers who use it will not need ‘strcmp()‘ for those mixed comparisons. These overloads do not apply when both operands are of type ‘(char *)‘; in that case, it’s still the two addresses that get compared. And it’s in ‘<string.h>‘ that such legacy functions as ‘strcmp()‘ are declared…
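The three kinds of comparison can be put side by side in a short sketch. All three function names are my own, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::string against a c_string: an overload from <string> compares the content.
bool mixed_comparison() {
    std::string cppText = "pear";
    char buffer[] = "pear";
    return cppText == buffer;
}

// Two plain pointers: only the addresses are compared.
bool pointer_comparison() {
    char a[] = "pear";
    char b[] = "pear";
    const char *pa = a;
    const char *pb = b;
    return pa == pb;
}

// Two c_strings compared by content, the legacy way.
bool content_comparison() {
    char a[] = "pear";
    char b[] = "pear";
    assert(std::strlen(a) == std::strlen(b));
    return std::strcmp(a, b) == 0;
}
```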

(End of comment, 3/13/2021, 22h20.)


 

(Update 3/13/2021, 22h45: )

2:)

What happens with a custom ‘new‘ operator, when the programmer defines more than one custom ‘delete‘ operator, and the constructor throws an exception, is explained on This BB, as if there were no controversy. The ‘delete‘ operator whose additional parameters match those of the ‘new‘ operator that was used is called, with the pointer as its first parameter. Some programmers define a custom ‘new‘ operator with the specific intent of passing-in their own, additional arguments. If the constructor throws, then the matching version of ‘delete‘ receives the special arguments that were supplied by the programmer when he or she called ‘new‘, as additional parameters after the first. If the constructor succeeds, then a later ‘delete‘ expression does not pass them along; as far as I can tell, it only ever calls one of the ordinary forms, which receive the pointer and, optionally, a size.

So, where is the controversy? According to This reference site, the programmer could have defined the three, following ‘new‘ operators:

 


void * operator new (std::size_t sz1);
void * operator new (std::size_t sz1, std::size_t sz2);
void * operator new (std::size_t sz1, MyClass *obj);

 

This is not common, but could happen. The compiler always supplies ‘sz1‘ as the size of the object that needs to be allocated, but some programmers might want to specify their own size-parameter in addition. Then, according to This reference site, there are also three valid prototypes, which a custom ‘delete‘ operator might implement separately:

 


void operator delete (void *ptr);
void operator delete (void *ptr, std::size_t sz);
void operator delete (void *ptr, MyClass *obj);

 

What will generally happen if all three are defined is that, if the constructor threw an exception, C++ will call the ‘delete‘ whose additional parameters match those of the ‘new‘ that was used: the first after the first form of ‘new‘, the second after the second, and the third after the third !

The problem is that the second of these ‘delete‘ operators has two roles. With a ‘std::size_t‘ as its second parameter, it is also one of the ordinary forms, which an ordinary ‘delete‘ expression may call with the compiler-supplied size of the object, after a constructor that did not throw an exception. So the programmer does not know whether it has been called because the constructor threw during his second form of the ‘new‘ operator, in which case ‘sz‘ originated as ‘sz2‘, or because an object was simply deleted, in which case ‘sz‘ is the size of the object… As far as I can tell, the C++ standard regards exactly this combination as ill-formed, and compilers may refuse a ‘new‘ expression that uses the second form of ‘new‘, while the second form of ‘delete‘ is its match. The second form of the ‘delete‘ operator could also just be seen as a special case of the third form.

Hence, a good note to the programmer could be not to define the second form of ‘new‘ together with the second form of ‘delete‘, for example by wrapping the custom size in a type of its own, or, at the very least, not to take the fact that the second form of ‘delete‘ has been called as implying anything about whether the constructor-call threw an exception or not… If it did, then the arguments that were given to ‘new‘, such as ‘sz2‘ or ‘obj’, are passed on to the matching ‘delete‘.
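Which ‘delete‘ actually gets called can be observed with a small experiment. The sketch below is built on assumptions of my own: a class ‘Widget‘ whose constructor throws on request, a tag type ‘Arena‘ that stands in for ‘some additional argument’, and two global counters. It avoids a bare ‘std::size_t‘ as the additional parameter, precisely because of the ambiguity described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical tag type, standing in for the caller's additional argument.
struct Arena { int id; };

int g_plain_deletes = 0;   // calls of operator delete(void *)
int g_arena_deletes = 0;   // calls of operator delete(void *, Arena)

struct Widget {
    explicit Widget(bool fail) {
        if (fail) throw std::runtime_error("constructor failed");
    }

    static void *operator new(std::size_t sz) {
        void *p = std::malloc(sz);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void *operator new(std::size_t sz, Arena) {
        void *p = std::malloc(sz);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Ordinary form: used by 'delete w;' and after a failed 'new Widget'.
    static void operator delete(void *p) noexcept {
        ++g_plain_deletes;
        std::free(p);
    }
    // Matching form: only used after a failed 'new (arena) Widget'.
    static void operator delete(void *p, Arena) noexcept {
        ++g_arena_deletes;
        std::free(p);
    }
};

void trace_deletes() {
    g_plain_deletes = 0;
    g_arena_deletes = 0;
    Arena arena{1};

    try { new Widget(true); }         catch (const std::runtime_error &) {}  // plain delete
    try { new (arena) Widget(true); } catch (const std::runtime_error &) {}  // arena delete

    Widget *w = new (arena) Widget(false);   // the constructor succeeds
    delete w;                                // plain delete; 'arena' is not passed along
}
```

After ‘trace_deletes()‘ has run, the plain form should have been counted twice, and the form with the tag only once, for the one constructor that threw after the tagged ‘new‘.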

Luckily, there is a silver lining to this cloud. If the only intent with which the programmer defined the second form of the ‘new‘ operator I showed above was to allocate the amount of memory which he later supplies as the custom argument to it, still by using ‘malloc()‘, then it does not matter whether the second parameter his ‘delete‘ operator receives originated as the compiler-supplied size of the object, or whether it does in fact equal the custom-defined ‘sz2‘ parameter which that programmer once supplied. The way ‘malloc()‘ works is such, that to call:

 


free(ptr);

 

… will have the effect of deallocating however much was allocated. In that case, maybe the most proper thing to do in the ‘delete‘ operator would just be to ignore the ‘sz‘ parameter received. The way to do that in the method implementation is to state the data-type in the correct position of the parameter-list (still, comma-separated), but to omit an actual parameter-name.


 

(Update 3/14/2021, 0h55: )

I would say that this problem could be aggravated by the fact that ‘in the field’, a class that has a special ‘new‘ operator, which specifically asks that a custom-defined amount of memory be allocated, has a slightly higher probability of seeing its constructor throw an exception, simply because the constructor may not have been given enough memory, to construct the object in. Perhaps, when implementing such a special ‘new‘ operator, a good precaution to take might be to do this:

 


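#include <cstdlib>    //  std::malloc, std::free
#include <new>        //  std::bad_alloc

//  Caveat: because the sized 'delete' below also counts as an ordinary
//  deallocation function, compilers may reject 'new (n) MyClass' as
//  ill-formed; wrapping sz2 in a type of its own avoids this.
void * MyClass::operator new (std::size_t sz1, std::size_t sz2) {
    //  sz1 is sizeof(MyClass), supplied by the compiler.
    //  sz2 was passed in by copy, by the caller.
    if (sz2 < sz1)
        sz2 = sz1;

    //  std::malloc() never throws; it signals failure with a null pointer,
    //  which a throwing 'operator new' must convert into std::bad_alloc.
    void *ptr = std::malloc(sz2);
    if (!ptr)
        throw std::bad_alloc();

    return ptr;
}

void MyClass::operator delete (void *ptr, std::size_t) noexcept {
    std::free(ptr);    //  Freeing a null pointer is a no-op.
}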

 

As is generally known, such a custom ‘new‘ and ‘delete‘ operator also need to be declared as public methods, in the class-declaration of ‘MyClass‘, in this case…
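For completeness, here is a hedged variation which shows what such a class-declaration could look like. It rests on one assumption that differs from the code above: the requested size is wrapped in a small type of its own (‘RequestedSize‘, a name I made up), so that the custom ‘delete‘ cannot be mistaken for the ordinary, sized form. The class ‘Buffer‘ and the member ‘last_allocation‘ are likewise only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical wrapper, so that the additional argument is not a bare std::size_t.
struct RequestedSize { std::size_t bytes; };

class Buffer {
public:
    Buffer() : data() {}

    // Allocates at least sizeof(Buffer), or more if the caller asks for more.
    static void *operator new(std::size_t objectSize, RequestedSize wanted) {
        std::size_t n = (wanted.bytes < objectSize) ? objectSize : wanted.bytes;
        void *p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        last_allocation = n;
        return p;
    }

    // Only called if the constructor throws after the allocation above.
    static void operator delete(void *p, RequestedSize) noexcept { std::free(p); }

    // Called by an ordinary 'delete' expression; free() does not need the size.
    static void operator delete(void *p) noexcept { std::free(p); }

    static std::size_t last_allocation;   // size of the most recent allocation
    char data[16];
};

std::size_t Buffer::last_allocation = 0;

// Creates and destroys one Buffer, and returns how many bytes were allocated.
std::size_t allocate_with_request(std::size_t bytes) {
    Buffer *b = new (RequestedSize{bytes}) Buffer;
    assert(b != nullptr);
    std::size_t got = Buffer::last_allocation;
    delete b;
    return got;
}
```

A request that is too small gets rounded up to the size of the object, while a larger request is honoured as given.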


 

(Update 3/14/2021, 0h30: )

At the beginning of this posting I wrote that the subject of ‘destructors which throw exceptions’ is as important to study as the subject of ‘throwing constructors‘. This turns out not to be 100% true, when one is only interested in the design of a custom ‘delete‘ operator.

A vector’s destructor invokes the destructors of its elements. If an element’s destructor throws, the vector’s destructor throws. (Since C++11, destructors are ‘noexcept‘ by default, so that an exception escaping from one already terminates the application, unless the destructor was declared otherwise.) However, it’s worse if a destructor throws while an exception is already being processed. From what I read, the application terminates. Yet, when a mere constructor throws, what happens next is that C++ will execute the destructors of all the members and base-classes which had already been constructed successfully, in the reverse order of their construction. If a destructor throws in this phase, because it’s already taking place within the exception-handling of a constructor that threw, the application terminates.

Fortunately, the ‘delete‘ operator is only called, after this roll-back has been completed. So, the only thing really needed, to prevent the application from terminating, is that the ‘delete‘ operator itself never throw.
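The order of events during such a roll-back can be recorded with a short sketch. The classes ‘Part‘ and ‘Assembly‘, and the global string ‘g_events‘, are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>
#include <string>

std::string g_events;   // records the order in which things happen

// Hypothetical member type, which reports its own destruction.
struct Part {
    const char *name;
    explicit Part(const char *n) : name(n) {}
    ~Part() { g_events += name; g_events += " destroyed;"; }
};

struct Assembly {
    Part first;
    Part second;

    // Both members are fully constructed before the body throws.
    Assembly() : first("first"), second("second") {
        throw std::runtime_error("constructor body failed");
    }

    static void *operator new(std::size_t sz) {
        void *p = std::malloc(sz);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void *p) noexcept {
        g_events += "operator delete;";
        std::free(p);
    }
};

std::string unwind_order() {
    g_events.clear();
    g_events.reserve(128);   // so that appending inside 'delete' does not allocate

    try {
        new Assembly;
    } catch (const std::runtime_error &) {
        g_events += "caught;";
    }
    return g_events;
}
```

The members are destroyed in reverse order first, the custom ‘delete‘ runs next, and only then does the ‘catch‘ block execute.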

A newer feature which C++ has, since C++11, is the ‘noexcept‘ keyword, which can be added to (any) function-prototypes, the same way ‘const‘ can be added: after the function’s parenthesized parameter-list. What this keyword does is to promise that the function will not let an exception escape. The compiler does not verify this promise at compile-time; instead, if an exception does try to leave such a function at run-time, ‘std::terminate()‘ is called.
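What the compiler knows about such promises can be queried with the ‘noexcept( )‘ operator. In the sketch below, all of the function names are made up; ‘broken_promise()‘ is never called, and only exists to show that the declaration compiles, even though its body calls something that throws.

```cpp
#include <cassert>
#include <stdexcept>

void may_throw() { throw std::runtime_error("oops"); }

void promises_not_to_throw() noexcept {}

// Accepted by the compiler; if it were called, the escaping exception
// would lead to std::terminate() at run-time.
void broken_promise() noexcept { may_throw(); }

// The noexcept operator only reports what was declared.
static_assert(noexcept(promises_not_to_throw()), "declared noexcept");
static_assert(!noexcept(may_throw()), "not declared noexcept");
static_assert(noexcept(broken_promise()), "declared noexcept, although the body can throw");

// A function without the promise can still be called safely, inside try/catch.
bool handled_safely() {
    try {
        may_throw();
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```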

Custom ‘delete‘ operators should be declared with the ‘noexcept‘ compiler keyword (since C++11, they are treated that way by default), and written so that they really cannot throw. That way, what was already an exception-handling won’t terminate the application.

The ‘nothrow‘ tag is something different. As far as I understand it, it’s an argument of type ‘std::nothrow_t‘, declared in ‘<new>‘, which gets passed in as ‘new (std::nothrow) MyClass‘. Technically, this uses the same “placement” syntax, which is a fancy way to say, ‘passing additional arguments to new’. If the tag is given in a ‘new‘ expression, a failure to allocate simply yields a Null pointer instead of throwing ‘bad_alloc‘. The corresponding ‘delete‘ operator, which also receives the tag, is only called if the constructor throws; an ordinary ‘delete‘ expression is never written with any arguments.
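A deterministic way to see the Null pointer is to give a class its own ‘nothrow‘ form of ‘new‘, and to let it fail on purpose. The class ‘Sample‘ and its ‘simulate_failure‘ switch are assumptions of mine, made only so that the failure does not depend on actually exhausting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Sample {
    static bool simulate_failure;   // hypothetical switch, for the demonstration only
    int value;

    Sample() : value(42) {}

    // Non-throwing form: reports failure by returning a null pointer.
    static void *operator new(std::size_t sz, const std::nothrow_t &) noexcept {
        if (simulate_failure) return nullptr;
        return std::malloc(sz);
    }
    // Only called if the constructor throws after the allocation above.
    static void operator delete(void *p, const std::nothrow_t &) noexcept { std::free(p); }
    // Called by an ordinary 'delete' expression.
    static void operator delete(void *p) noexcept { std::free(p); }
};

bool Sample::simulate_failure = false;

// Returns the stored value, or -1 if the allocation failed.
int try_allocate(bool fail) {
    Sample::simulate_failure = fail;

    Sample *s = new (std::nothrow) Sample;   // the constructor is skipped on failure
    if (!s) return -1;

    assert(s->value == 42);
    int v = s->value;
    delete s;
    return v;
}
```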

 

Enjoy,

Dirk

 
