## Creating a C++ Hash Table Quickly, with a QString as its Key Type

This posting will assume that the reader has a rough idea of what a hash table is. If not, this Wikipedia article explains the concept in general terms. Just from reading that article, a reader – especially one of a slightly older generation like me – could get the impression that putting a hash table into a practical C++ program is arduous and complex. In reality, the most efficient implementation possible for a hash table requires attention to such details as whether the hashes will generally tend to be coprime with the modulus of the hash table, or at least have the lowest GCDs possible. Often, bucket-arrays whose sizes are powers of (2), together with hashes that contain higher prime factors, help. But when writing practical code in C++, one will find that the “Standard Template Library” (‘STL’) already provides a hash table implemented by experts. It can be put into almost any C++ program, as an ‘unordered_map’. (:1)

But there is a caveat. Any data-type meant to act as a key needs to have a hashing function defined for it, as a template specialization, and by far not all data-types have one. Of course, this hashing function should map unique key-values to hash-codes that are as close to unique as possible, while never changing, if there could be more than one possible representation of the same key-value. Conveniently, C-strings, which are denoted in C or C++ by the old-fashioned ‘char *’ data-type, happen to have this template specialization. But because C-strings are a very old type of data-structure, and because the reader may be in the process of writing code that uses the Qt GUI library, he or she may want to use ‘QString’ as the key-data-type, only to find that a template specialization has not been provided…

For such cases, C++ allows the programmer to define his own hashing function, in this case for QStrings. In fact, Qt even allows a quick-and-dirty way to do it, via the ‘qHash()’ function. But there is something I do not like about ‘qHash()’ and its relatives: they tend to produce 32-bit hash-codes! While this does not directly cause an error – only the least-significant 32 bits within a modern 64-bit data-field will contain non-zero values – I think it’s weak to be using 32-bit hash-codes anyway.

Further, I have read somewhere that the latest versions of Qt – as of 5.14? – do in fact define such a specialization, so that the programmer no longer needs to worry about it. But IF the programmer is still using an older version of Qt, like me, THEN he or she might want to define their own 64-bit hash-function. And so, the following is the result I arrived at this afternoon. I’ve tested that it generates no warnings when compiled under Qt 5.7.1, and that a trivial ‘main()’ function that uses it only generates warnings about the fact that it also received the standard ‘argc’ and ‘argv’ parameters, but made no use of them. Otherwise, the resulting executable produced no messages at run-time, while having put one key-value pair into a sample hash table… (:2)

```cpp
/*  File 'Hash_QStr.h'
 *
 *  Regular use of a QString in an unordered_map doesn't
 * work, because in earlier versions of Qt, there was no
 * std::hash<QString>() specialization.
 * Thus, this header could be included everywhere the unordered_map
 * is to be used...
 */

#ifndef HASH_QSTR_H
#define HASH_QSTR_H

#include <unordered_map>
#include <QString>
//#include <QHash>
#include <QChar>

#define PRIME 5351

/*
namespace cust {
    template<typename T>
    struct hash32 {};

    template<> struct hash32<QString> {
        std::size_t operator()(const QString& s) const noexcept {
            return (size_t) qHash(s);
        }
    };
}
*/

namespace cust
{
    inline size_t QStr_Orther(const QChar & mychar) noexcept {
        return ((mychar.row() << 8) | mychar.cell());
    }

    template<typename T>
    struct hash64 {};

    template<> struct hash64<QString>
    {
        size_t operator()(const QString& s) const noexcept
        {
            size_t hash = 0;

            for (int i = 0; i < s.size(); i++)
                hash = (hash << 4) + hash + QStr_Orther(s.at(i));

            return hash * PRIME;
        }
    };
}

#endif  //  HASH_QSTR_H
```



(Updated 4/12/2021, 8h00… )

## Why C++ compilers use name-mangling.

A concept which exists in C++ is that the application programmer can define more than one function which seems to have the same name in his or her source code, but which differs, either because the functions receive different parameter-types, or because they are member functions of a class, i.e., ‘Methods’ of that class – and this can be done again for each declared class. In the first case, it’s a common technique called ‘function overloading’. And, if the methods of a derived class replace those of a base-class, it’s called ‘function overriding’.

What people might forget when programming with object-oriented semantics is that all the function definitions still result in subroutines when compiled, which in turn reside in address-ranges of RAM dedicated to various types of code: either in ‘the code segment of a process’, or in ‘the addresses which shared libraries will be loaded to’. This differs from the actual member variables of each class-object, also known as its properties, as well as from any entries the object might have for virtual methods. Those will reside in ‘the data-segment of the process’, if the object was allocated with ‘new’. Each method would be incapable of performing its task if, in addition to the declared parameters, it did not receive an invisible parameter: its ‘this’ pointer, which allows it to access the properties of one object. And such a hidden ‘this’ pointer is also needed by any constructors.

Alternatively, the properties of an object can reside on the stack, and therefore in ‘the stack segment of the process’, if the object was just declared as a local variable of a function-call. And if an array of objects was declared – let’s say mistakenly, instead of an array of pointers to those objects – then each entry in the array will again need to have a size determined at compile-time, for which reason such objects will not be polymorphic. I.e., in these two cases, any ‘virtuality’ of the functions is discarded, and only the declared class of the object will be considered when resolving function-calls. Such an object ends up ‘statically bound’, in an environment which really supports ‘dynamically bound’ method-invocation.

First of all, when programming in C, it is not allowed to overload functions by the same name like that. According to C, a function with one name can only be defined once, as receiving one parameter-list of types. And the only real exception to this is the existence of ‘variadic functions’, which are mostly beyond the scope of this one posting. (:1)

Further, C++ functions that have the same name are not (typically) examples of variadic functions.

This limitation ‘makes sense’, because the compiler of either language still needs to generate one subroutine, which is the machine-language version of what the function in the source-code defined. It will have a fixed expectation of what parameter list it is fed, even in the case of ‘variadic functions’. I think that what happens with variadic functions is that the machine-language code will search its parameter list on the stack, for whatever it finds at run-time. They tend to be declared with an ellipsis – in other words, with ‘…’ – for the additional parameters, after the entries for any fixed parameters.

So, the way in which C++ resolves this problem is that it “mangles” the names of the functions in the source code deterministically, with a system that takes into account which parameter types they receive, and which class they may belong to, if any. The following is an example of C++ source code that demonstrates this. I have created 3 versions of the function ‘MyFunc()’, each of whose only defined behaviour is to return the exact data which it received as input. Obviously, this would be useless in a real program.

But what I did next was to compile this code into a shared library, and then to use the (Linux) utility ‘nm’, to list the symbols which ended up being defined in the shared library…

Source Code:

```cpp
/*  Sample_Source.cpp
 *
 * This snippet is designed to illustrate a capability which C++ has,
 * but which requires name-mangling...
 *
 */

#include <cmath>
#include <complex>

/*  If this were a regular C program, then we'd include...
 *
#include <math.h>
#include <complex.h>
 *
 */

using std::complex;

typedef complex<double> CC;

class HasMethods {
public:
    HasMethods() { }
    ~HasMethods() { }

    CC MyFunc(CC input);
};

//  According to the given headers, there are at least 3 functions
// that I could define below. First, two free functions, aka
// global functions...

double MyFunc(double input) {
    return input;
}

CC MyFunc(CC input) {
    return input;
}

//  Next, the member function of HasMethods can be defined, aka
// the supposed main 'Method' of a HasMethods object...

CC HasMethods::MyFunc(CC input) {
    return input;
}
```



(Updated 4/12/2021, 21h30… )

## Using Factory Functions to Export C++ Objects from Shared Libraries / DLLs, but also applying Factory Class Design Pattern.

(Edited 3/15/2021, 22h45: )

The main reason for which factory functions are sometimes used is the fact that, while object-oriented code can in fact be compiled into shared libraries, situations exist in which the application programmer is not aware of specific libraries that will need to be loaded at run-time. As a result, this developer cannot specify them during the linking stage of his or her application.

(End of Edit, 3/15/2021, 22h45.)

What one does in practical cases is to define “factory functions” in the shared library, such as the function “maker()” in the example below, and to give them the linkage specification ‘extern “C”’ when doing so:

https://www.linuxjournal.com/article/3687

I think that the example I’ve just linked to suffers from some seriously bad typesetting. But to my mind, it gets the point across. The methods that belong to C++ classes have mangled names, which are difficult to load from shared libraries directly (using the ‘dlsym()’ function). In principle, one could try to predict the name mangling in the client program, but in practice, this is avoided. Instead, a factory function – the name of which is explicitly not mangled, due to this linkage specification – is exported from the shared library to the client program, and what that function does when called is to create an object of the specified type. The client program’s C++ ‘knows’ what methods of the object it can call because of header files…

When I was studying System Software at Concordia University, an exercise the whole class was required to carry out was, to define a function that was declared with ‘extern “C”‘, to store that in a shared library, and, to write a client program which loaded that function. We were never required to load C++ objects from that shared library. But, the article which I just linked to above, explains how to do that, at least well enough so that I can follow it.

(…)

If you’re one of those people like me, who have seen C++ code with many ‘Create…()’ function-names, even though we know that in C++, the constructors of class-objects generally have the same name as the class, what is written above is likely the reason why so many Creator functions have been defined.

I think, though, that there is one way in which I must second-guess the author of the article I just linked to. He ended up using the ‘extern “C”’ directive in more places than he needed to. If it was something his project set out to do from the beginning, for the shared libraries to register their factory functions in an array of function-pointers automatically, then there is really no longer any reason why their names should not be mangled. In fact, in certain cases conflicts could result, as soon as two functions have the same name, such as “proxy()”. And so, in such cases, there is really only one object in the whole process whose name must not be mangled, and in the case cited above, that would be the associative array.

Another fact which the cited article does not mention is simply that it might result in tedious code, if a factory function had to be defined separately for every class of objects that a shared library can export. It might simplify things if a template Creator class could be defined, which has a factory method among its methods, and which can be applied to a series of classes that have default constructors… The following article describes a slightly different use:

https://refactoring.guru/design-patterns/factory-method/cpp/example

Of course, the cited function ‘SomeOperation()’ could not be used…

```cpp
//  The following lines should probably go into some header file.

#include <map>
#include <string>
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>

using std::string;

//  If the library existed, either IMPORT or WINDLL would be defined...

typedef std::map<string, void *> function_array;

function_array *reg_factories_p = NULL;

#ifdef WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#define DLL_IMPORT extern "C" __declspec(dllimport)
#else
#define DLL_EXPORT extern "C"
#define DLL_IMPORT extern "C"
#endif

#ifdef WINDLL
DLL_EXPORT function_array reg_factories;
function_array reg_factories;
#else
#ifndef IMPORT
function_array reg_factories;
#endif
#endif

//  What comes below this line, goes into the shared library / DLL.

class Obj_1 {
public:
    Obj_1() {}
};

class Obj_2 {
public:
    Obj_2() {}
};

class Obj_3 {
public:
    Obj_3() {}
};

//  Template definition (can't be compiled).

template<class T>
class Creator {
public:

    typedef T * (Creator<T>::*method)() const;
    method m_ptr;

    Creator(string object_name) {
        m_ptr = &Creator<T>::FactoryMethod;
#ifndef IMPORT
        if (! reg_factories_p) {
            reg_factories_p = &reg_factories;
        }
        reg_factories[object_name] = (void *) this->m_ptr;
#endif
    }

    T *FactoryMethod() const {
        return new T;
    }

};

//  Template instantiations (can be compiled).

Creator<Obj_1>  mk1("Object1");
Creator<Obj_2>  mk2("Object2");
Creator<Obj_3>  mk3("Object3");

//  Static objects will be constructed, if the shared library
//  has been opened with 'RTLD_NOW'.
//  This will cause the constructor within 'Creator' to be
//  called, and therefore, the associative array to be populated.

//  Code below this line is meant to go into the client program...

//  The following template class will serve to cast void pointers back
//  to Maker objects, each with a method that makes the object...

#ifndef WINDLL

#define LIB_NAME "./libfunnylib.so"

template<class T>
class Maker {

public:
    typedef T * (*obj_maker) ();

    obj_maker m_fptr;

    Maker(void *fptr) {
        m_fptr = reinterpret_cast<obj_maker> (fptr);
    }

    T * Make() {
        return (*m_fptr) ();
    }

};

int main(int argc, char* argv[]) {

    void *hndl = dlopen(LIB_NAME, RTLD_NOW);
    if (hndl == NULL) {
        printf("%s\n\n", dlerror());
        return -1;   //  Without the library, nothing below can work.
    } else {
        void *symbol = dlsym(hndl, "reg_factories");
        if (symbol) {
            reg_factories_p = (function_array *) symbol;
        } else {
            printf("%s Did not export symbol reg_factories.\n\n", LIB_NAME);
            return -1;
        }
    }

    char *buffer;
    size_t bufsize = 32;

    printf("The purpose here is, to find out whether\n");
    printf("I was able to store non-trivial addresses in the\n");
    printf("associative array.\n");
    printf("\n");
    printf("Object1 factory address: %p\n", (*reg_factories_p)["Object1"]);
    printf("Object2 factory address: %p\n", (*reg_factories_p)["Object2"]);
    printf("Object3 factory address: %p\n", (*reg_factories_p)["Object3"]);
    printf("\n");
    printf("Now attempting to execute those factories...\n");

    //  Exploiting the compiler's willingness to create temporary
    //  objects, invoked directly by their class-name...
    try {
        Maker<Obj_1>((*reg_factories_p)["Object1"]).Make();
        Maker<Obj_2>((*reg_factories_p)["Object2"]).Make();
        Maker<Obj_3>((*reg_factories_p)["Object3"]).Make();
    } catch (...) {
        printf("Some type of error took place!\n");
        return -1;
    }

    printf("All pointer casts executed.\n\n");

    printf("Press Enter to quit program.\n");

    buffer = (char *) malloc(bufsize * sizeof(char));
    if (buffer == NULL) {
        perror("Unable to allocate buffer");
        exit(-1);
    }

    getline(&buffer, &bufsize, stdin);

    //  We're done. Cleaning up.

    free(buffer);

    return 0;
}
#endif
```



One fact which must be kept in mind, if this code is ever to be used ‘in the real world’, is that the associative array should be declared as ‘extern “C”’ in a header file, but additionally allocated wherever it’s going to be used. This results in two similar-looking C++ statements when building the shared library.

Also, compiling this code will generate one warning per factory method, unless, under ‘g++’, the flag ‘-Wno-pmf-conversions‘ is set. Additionally, on Linux computers, linking requires that the flag ‘-ldl‘ be set.

Now, an astute reader will ask, ‘Mainstream code can be compiled without requiring special flags. Why does this guy’s code require that special flags be set?‘ And the answer to that question is, ‘Because this code cheats. The warning which up-to-date compilers will generate – at best – exists, because on some platforms, a pointer-to-method cannot be cast to a pointer-to free function, in the form of a single address, and always work. Therefore, this practice is actually shunned.’

At the end of this posting, I will show compatible code, that does not cheat…

(Updated 3/22/2021, 3h55… )

## Trying to turn an ARM-64 -based, Android-hosted, prooted Linux Guest System, into a software development platform.

In a preceding posting I described how I had used an Android app, that does not require or benefit from having ‘root’, to install a Linux Guest System on a tablet that has an ARM-64 CPU, referred to more precisely as an ‘aarch64-linux-gnu’ architecture. The Android app sets up a basic Linux system, but the user can use apt-get to extend it – if he chose a Debian 10 / Buster -based system, as I did. And then, for the most part, the user’s ability to run software depends on how well the Debian package maintainers cross-compiled their packages to ‘AARCH64’. Yet, on some occasions, even in this situation, a user might want to write and then run his own code.

To make things worse, the main alternative to a pure text interface is a VNC session based on ‘TightVNC’, by the choice of the developers of this app. On a Chromebook, I chose differently, by setting up a ‘TigerVNC’ desktop instead, but on this tablet, the choice was up to the Android developers alone. What this means is that the Linux applications are forced to render purely in software mode.

Many factors work against writing one’s own code, including the fact that the resulting executables will have been compiled for the ‘ARM’ CPU, and linked against Linux libraries!

But one of the immediate handicaps could be that the user might want to program in Python, but can’t get any good IDEs to run. Every free IDE I could try would segfault, and I don’t even believe that these segfaults are due to problems with my Python libraries. The IDEs were themselves written in Python, using Qt5, Gtk3 or wxWidgets modules. These types of libraries are as notorious as the Qt5 Library for relying on GPU acceleration, which is nowhere to be found. And one reason I think this is most often the culprit is the fact that one of the IDEs – “Eric” – actually manages to report, with a gasp, that it could not create an OpenGL rendering surface – and then segfaults. (:3)

(Edit 9/15/2020, 13h50: )

I want to avoid any misinterpretations of what I just wrote. This does not happen out of nowhere, because an application developer decided to build his applications using ‘python3-pyqt5’ etc… When I give the command:


```shell
# apt install eric
```



Doing so pulls in many dependencies, including an offending package. (:1) Therefore, the application developer who wrote ‘Eric’ not only chose to use one of the Python GUI libraries, but chose to use OpenGL as well.

Of course, after I next give the command to remove ‘eric’, I also follow up with the command:


```shell
# apt autoremove
```



Just so that the offending dependencies are no longer installed.

(End of Edit, 9/15/2020, 13h50.)

Writing convoluted code is more agreeable if, at the very least, we have an IDE in front of us that can highlight certain syntax errors, scan includes for code completion, etc. (:2)

Well, there is a text editor cut out for that exact situation, named “CudaText”. I must warn the reader, though, that there is a learning curve with this text editor. But, just to prove that the AARCH64-ported Python 3.7 engine is not itself buggy: the text editor’s plug-in framework is written in Python 3, and as soon as the user has learned his first lesson in how to configure CudaText, the plug-in system comes to full life, without any segfaults, running the Guest System’s Python engine. I think CudaText is based on Gtk2.

This might just turn out to be the correct IDE for that tablet.

(Updated 9/19/2020, 20h10… )