What the name of the memory allocator is, under C++.

In C, the usual memory allocator is ‘malloc()‘. In C++, its role is taken over by the ‘new‘ operator, which the reader may have been using all along without realizing that it is the allocator. If an object is to be placed on the heap, ‘new‘ first allocates a region of memory, and then calls one of the class’s constructors to initialize the object in that region. If correctly written C++ creates an object without ‘new‘, then the compiler reserves space for it, usually on the stack, and the constructor call does exactly the same thing. This is also why a constructor does not return anything: it only initializes storage that was obtained elsewhere. When ‘new‘ is used, the ‘new‘ expression yields a pointer to this memory location, once the constructor call has succeeded.

A slightly more useful question would be whether it’s legal to use both allocators in the same program, and this question comes back to whether it’s legal to put pieces of C into a C++ program. The answer is, ‘Sometimes, Yes.’ I think the biggest problem that some programmers might encounter when doing so is failing to keep track of which data structures are ‘C’ data structures and which are ‘C++’ objects, because C++ does not contain any inherent mechanism that makes this distinction obvious in the way the code is written. In particular, memory obtained from ‘malloc()‘ must be released with ‘free()‘, and objects created with ‘new‘ must be released with ‘delete‘; the two must not be mixed. Another problem which some programmers might encounter is that certain header files will not be compatible with each other, if included in the same program.

Thus, one type of problem to look out for arises when C++ ‘string‘ objects and C arrays declared as ‘char *‘ or ‘char[]‘, the latter of which are also referred to as ‘c_strings’, are created in the same program. A C function that expects a ‘c_string’ will cause a compile error if it is fed a C++ ‘string’ object directly; the object’s ‘c_str()‘ member function has to be called, to obtain the underlying character array. The mix can be hard to recognize, because C++ ‘string’ objects may be initialized with ‘c_string’ instances in their constructor calls (just to keep the appearance of the code manageable).

Also, the language provides special prefixes for string literals, such as ‘L"Some Text"‘. It’s important not to confuse this one with a standard, legacy c_string. A standard, legacy c_string consists of an array of 8-bit characters (‘char‘), while this prefix creates an array of ‘wchar_t‘ elements, each of which is wider than 8 bits: typically 16 bits under Windows and 32 bits under Linux. There is also the difference to know between UTF-16, which would be invoked with ‘u"Some Text"‘, and 32-bit Unicode (UTF-32), which would be invoked with ‘U"Some Text"‘…

If a programmer is explicitly using the “Standard Template Library” (‘STL’) and its ‘std::map<>‘ construct, then one of its valid forms is ‘std::map<string, int> myMap;‘ (with ‘using namespace std;‘ in effect, or with the key type spelled ‘std::string‘). An object called ‘myMap’ is created on the stack, that maps from specific strings to integers. Because the STL is so strongly based on C++, it’s important to understand that the template instantiation used here refers to a ‘C++ string’ object at all times, for which reason ‘#include <string>‘ must also be given. Yet, other functions exist that are strictly C, such as ‘dlsym()‘, which expects its second parameter to be an (8-bit) ‘c_string’, which can just be typed into the source code. Well, the C++ string class implicit in the template instantiation ‘std::map<string, int>‘ can also be invoked, by typing ‘myMap["Object1"] = 12;‘. What happens in this last case is that “Object1” starts out as a c_string, due to the legacy of what the compilers do, but it will get used within ‘myMap’ to initialize a C++ string object. The appearance of the code doesn’t make this obvious. (:1)

But, just to prove that it’s sometimes okay to put C into a C++ program, one thing which a programmer is allowed to do is to overload the default ‘new‘ operator with his or her own version, which will be applied to specific classes of objects. A programmer who does this will most likely have something similar to a ‘malloc()‘ function call in the definition of the custom ‘new‘ operator, which in any case is probably C and not C++. Doing so also requires that such a programmer overload the ‘delete‘ operator… This type of programming opens a can of worms: how are the custom ‘new‘ operator and its counterpart, the custom ‘delete‘, supposed to behave if the constructor call or destructor call throws an exception, and how is this behaviour to be defined when writing them? Defining these operators should not be undertaken, unless the programmer who does so has studied that specific question. (:2)

(Updated 3/14/2021, 0h55… )


About Encoding And Decoding Base-64 In FORTH

In This Previous Posting, I wrote that I had written some source-code in the language FORTH, that decodes standard Base-64 into a binary array of data, in output sizes that are multiples of 36 Bytes. For my own purposes, there might be no need to output Base-64, because I can use command-line utilities to prepare Base-64 strings, and then only use those as a means to enter the data, and embed it into future, hypothetical source code.

But the purposes of other, hypothetical software-developers have not been met with this exercise, because those people may need to be able to output Base-64, which means they’d need a matching encoder.

Unfortunately, the language does not lend itself to that easily, if a standard Base-64 radix is being implied, because 6-bit output-numerals would need to be bit-aligned, and trying to align fields of bits in FORTH is difficult.

(Edit 07/25/2017 : )

One subject which I have investigated more completely now, is the fact that the numeral-to-text conversion utilities built-in to FORTH, seem to continue to produce output, even if a Base of 64 has been set. In theory, the FORTH developers could have adopted a custom radix, in order to be able to state, that their binary-to-FB64 conversion is computed faster, than standard Base-64 could be. But OTOH, the characters output, could just become garbage, by the time 24-bit numerals are to be streamed:

 


dirk@Klystron:~$ gforth
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
: list-forth-b64 [ base @ decimal ] 64 base ! &255 &0 do i . space loop [ base ! ] ;  ok
list-forth-b64 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F  1G  1H  1I  1J  1K  1L  1M  1N  1O  1P  1Q  1R  1S  1T  1U  1V  1W  1X  1Y  1Z  1[  1\  1]  1^  1_  1`  1a  1b  1c  1d  1e  1f  1g  1h  1i  1j  1k  1l  1m  1n  1o  1p  1q  1r  1s  1t  1u  1v  20  21  22  23  24  25  26  27  28  29  2A  2B  2C  2D  2E  2F  2G  2H  2I  2J  2K  2L  2M  2N  2O  2P  2Q  2R  2S  2T  2U  2V  2W  2X  2Y  2Z  2[  2\  2]  2^  2_  2`  2a  2b  2c  2d  2e  2f  2g  2h  2i  2j  2k  2l  2m  2n  2o  2p  2q  2r  2s  2t  2u  2v  30  31  32  33  34  35  36  37  38  39  3A  3B  3C  3D  3E  3F  3G  3H  3I  3J  3K  3L  3M  3N  3O  3P  3Q  3R  3S  3T  3U  3V  3W  3X  3Y  3Z  3[  3\  3]  3^  3_  3`  3a  3b  3c  3d  3e  3f  3g  3h  3i  3j  3k  3l  3m  3n  3o  3p  3q  3r  3s  3t  3u  3v   ok
bye 
dirk@Klystron:~$ 


 

My conclusion is that this pseudo-Base-64 streaming remains usable, even when 24-bit numerals are given.

This conclusion reverses a negative, tentative conclusion, which I had only given yesterday.

I have by now coded both the encoder and decoder for standard Base-64, which I’ve named ‘b64-stream’ and ‘b64-parse’ respectively, as well as the encoder and decoder for the pseudo-Base-64, which I call ‘fb64-stream’ and ‘fb64-parse’. At this point, Base-64 has been implemented in a way software experts would consider complete, along with a full non-standard version of Base-64. This is what the code ultimately does:

 


dirk@Klystron:~$ cd ~/Programs
dirk@Klystron:~/Programs$ gforth fb64-parse-6.fs
Gforth 0.7.2, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
S" generate1234TERRIBLE.,?-" 4 / b64-stream b64-parse 0 4 * type generate1234TERRIBLE.,?- ok
S" generate1234TERRIBLE.,?-" 4 / fb64-stream fb64-parse 0 4 * type generate1234TERRIBLE.,?- ok
bye 
dirk@Klystron:~/Programs$ 


 

My custom semantics assume that a binary array exists on the stack, with a numeric value placed on top of it, which tells each encoder how many 32-bit words the array holds. OTOH, each decoder expects as input a standard, full string, of the kind which the corresponding encoder outputs, and which also exists as two items on the stack each time, where the top numeral states how many characters long the string is, as per standard FORTH.

And below is the source-code (Updated 08/02/2017 : )
