Creating a C++ Hash-Table quickly, that has a QString as its Key-Type.

This posting will assume that the reader has a rough idea, what a hash table is. In case this is not so, this WiKiPedia Article explains that as a generality. Just by reading that article, especially if the reader is of a slightly older generation like me, he or she could get the impression that, putting a hash-table into a practical C++ program is arduous and complex. In reality, the most efficient implementation possible for a hash table, requires some attention to such ideas as, whether the hashes will generally tend to be coprime with the modulus of the hash table, or at least, have the lowest GCDs possible. Often, bucket-arrays with sizes that are powers of (2), and hashes that contain higher prime factors help. But, when writing practical code in C++, one will find that the “Standard Template Library” (‘STL’) already provides one, that has been implemented by experts. It can be put into almost any C++ program, as an ‘unordered_map’. (:1)

But there is a caveat. Any valid data-type meant to act as a key needs to have a hashing function defined, as a template specialization, and not all data-types have one by far. And of course, this hashing function should be as close to unique as possible, for unique key-values, while never changing, if there could be more than one possible representation of the same key-value. Conveniently, C-strings, which are denoted in C or C++ as the old-fashioned ‘char *’ data-type, happen to have this template specialization. But, because C-strings are a very old type of data-structure, and, because the reader may be in the process of writing code that uses the Qt GUI library, the reader may want to use ‘QString’ as his key-data-type, only to find, that a template specialization has not been provided…

For such cases, C++ allows the programmer to define his own hashing function, for QStrings if so chosen. In fact, Qt even allows a quick-and-dirty way to do it, via the ‘qHash()’ function. But there is something I do not like about ‘qHash()’ and its relatives. They tend to produce 32-bit hash-codes! While this does not directly cause an error – only the least-significant 32 bits within a modern 64-bit data-field will contain non-zero values – I think it’s very weak to be using 32-bit hash-codes anyway.

Further, I read somewhere that the latest versions of Qt – as of 5.14? – do, in fact, define such a specialization, so that the programmer does not need to worry about it anymore. But, IF the programmer is still using an older version of Qt, like me, THEN he or she might want to define their own 64-bit hash-function. And so, the following is the result which I arrived at this afternoon. I’ve tested that it will not generate any warnings when compiled under Qt 5.7.1, and that a trivial ‘main()’ function that uses it, will only generate warnings, about the fact that the trivial ‘main()’ function also received the standard ‘argc’ and ‘argv’ parameters, but made no use of them. Otherwise, the resulting executable also produced no messages at run-time, while having put one key-value pair into a sample hash table… (:2)

 

/*  File 'Hash_QStr.h'
 * 
 *  Regular use of a QString in an unordered_map doesn't
 * work, because in earlier versons of Qt, there was no
 * std::hash<QString>() specialization.
 * Thus, one could be included everywhere the unordered_map
 * is to be used...
 */

#ifndef HASH_QSTR_H
#define HASH_QSTR_H


#include <unordered_map>
#include <QString>
//#include <QHash>
#include <QChar>

#define PRIME 5351


/*
namespace cust {
	template<typename T>
	struct hash32 {};

  template<> struct hash32<QString> {
    std::size_t operator()(const QString& s) const noexcept {
      return (size_t) qHash(s);
    }
  };
}

*/


namespace cust
{
	inline size_t QStr_Orther(const QChar & mychar) noexcept {
		return ((mychar.row() << 8) | mychar.cell());
	}
	
	template<typename T>
	struct hash64 {};
	
    template<> struct hash64<QString>
    {
        size_t operator()(const QString& s) const noexcept
        {
            size_t hash = 0;

            for (int i = 0; i < s.size(); i++)
				hash = (hash << 4) + hash + QStr_Orther(s.at(i));

            return hash * PRIME;
        }
    };
}


#endif  //  HASH_QSTR_H

 

(Updated 4/12/2021, 8h00… )

Continue reading Creating a C++ Hash-Table quickly, that has a QString as its Key-Type.

(Solved) How the Latest Wine-Staging Release, for Debian, Introduces a Packaging Problem: Unresolved Dependencies.

One piece of software which many Linux users benefit from is called “Wine”, which is an acronym that stands for ‘Wine Is Not an Emulator’ . This software allows some Windows programs to run under Linux, but has recently been made more powerful, to allow graphics-intensive games to run as well. The version of wine which I’ve been subscribing to is called ‘wine-staging’ , and contains the latest, bleeding-edge features that Wine is to contain, at any later point down the road.

But as I’ve been receiving updates to wine-staging, the latest update would have taken all my computers from version 4.0-rc2, to version 4.0-rc4, which means, ‘…Release Candidate 4′ . At that point, some of my computers could not install the update, because of unresolved package dependencies.

Apparently what’s been happening with Wine-Staging is twofold:

  1. The original maintainers are no longer running the project, which also means that Wine is in the midst of a Code Freeze, and that present updates may offer fewer or no new features, from what Linux users are used to, and
  2. In the interest of allowing the Windows games with the highest graphics requirements to run (potentially), the devs have nevertheless created a dependency of the latest Wine packages, on the OpenCL packages that install OpenCL natively under Linux.

‘OpenCL’ is a user-side API, which also requires system-side drivers for the specific graphics hardware, and which allows greatly parallel computations to be carried out on the GPU, instead of on the CPU. Apparently, some games use either ‘OpenCL’ or ‘CUDA’ these days, in order to incorporate complex Physics simulations into the game.

(I will assume that the version of CUDA which Wine emulates, requires OpenCL to be installed on the side of Linux.)

The problem under Debian which I’ve detected is, that the newly-introduced dependency of ‘wine-staging-amd64′ (as an example) is simply mainly:

‘ocl-icd-libopencl1′

Which means, that Wine will now insist on using the generic OpenCL drivers, that were developed for and by the Linux community, while not allowing the use of proprietary OpenCL drivers, that are closed-source, and that were developed by the manufacturers of each graphics card.

The problem with this is, that as users, we may only have one or the other OpenCL driver-set installed. We cannot install both the generic and the proprietary drivers. When we try to update Wine-Staging now, doing so will try to install the package above, which will also try to ‘kick out’ , any proprietary packages we may have installed, that provide OpenCL.

Continue reading (Solved) How the Latest Wine-Staging Release, for Debian, Introduces a Packaging Problem: Unresolved Dependencies.

Understanding the long long data-type with C++

One of the conventions which I was taught when first learning C++, was that, when declaring an integer variable, the designations were to be used, such as:

 


short int a = 0;  // A 16-bit integer
int b = 0;  // A 32-bit integer
long int c = 0;  // Another, 32-bit integer

 

But, since those ancient days, many developments have come to computers, that include 64-bit CPUs. And so the way the plain-looking declarations now work, is:

 


short int a = 0;  // A 16-bit integer
int b = 0;  // A 32-bit integer
long int c = 0;  // Usually, either a 32-bit or a 64-bit, depending on the CPU.
long long d = 0;  // Almost always, a 64-bit integer

 

The fact is not so surprising, that even on 32-bit CPUs, modern C++ runtimes will support 64-bit values partially, because support for longer fields, than the CPU registers, can be achieved, using subroutines that treat longer numbers similarly to how in Elementary School, we were taught to perform multiplication, long division, etc., except that where people were once taught how to perform these operations in Base-10 digits, the subroutine can break longer fields into 32-bit words.

In order to avoid any such confusion, the code is sometimes written in C or C++, like so:

 


uint16_t a = 0;
uint32_t b = 0;
uint64_t c = 0;

 

The ability to put these variables into the source-code does not pose as much of a problem, as the need which sometimes exists, to state literals, which can be assigned to these variables. Hence, a developer on a 64-bit machine might be tempted to put something like this:

 


uint64_t c = 1L << 32;

 

Which literally means, ‘Compute a constant expression, in which a variable of type (long) is left-shifted 32 bits.’ The problem here would be, that if compiled on a 32-bit platform, the literal ‘1L’ just stands for a 32-bit long integer, and usually, some type of compiler warning will ensue, about the fact that the constant expression is being left-shifted, beyond the size of the word in question, which means that the value ‘0’ would result.

If we are to write source-code that can be compiled equally-well on 32-bit and 64-bit machines, we really need to put:

 


uint64_t c = 1ULL << 32;

 

So that the literal ‘1ULL’ will start out, as having a compatible word-size. Hence, the following source-code will become plausible, in that at least it will always compile:

 

#include <cstdlib>
#include <iostream>

using std::cout;
using std::endl;

int main(int argc, char* argv[]) {

	long int c = 1234567890UL;

	if (sizeof(long int) > 7) {
		c = (long int) 1234567890123456ULL;
	}

	cout << c << endl;

	return 0;
}

 

The problem that needed to be mitigated, was not so much whether the CPU has 32-bit or 64-bit registers at run-time, but rather, that the source-code needs to compile either way.

Dirk

 

I’ve just tweaked my kernel memory a little.

According to This previous posting of mine, I had set aside 270MiB of RAM, just in case the kernel needed it, on the computer I name ‘Phoenix’, to keep track of user-space programs, watching files and directories. I had also taken A Quick Glance at the way 64-bit kernel, virtual memory is organized under Linux, differently from how 32-bit virtual addresses were. Given the additional fact, that I only want to set aside 128MiB of user-space RAM, To cache my blog’s Web-pages, I decided that the earlier amount of kernel-memory was too large.

And so now I’ve reduced the amount, for ‘INotify’, to a mere 135MiB.

This computer has 4GiB total RAM.

Elaboration:

On 32-bit systems, the entire address space only had 4GB, but under no circumstances was the kernel expected to take up more than 1 GB. The computer ‘Phoenix’ is a 64-bit system, but one which only has 4GB of physical RAM. Therefore, even though it’s a 64-bit system, its kernel should still not take up more than 1GB. Yet, if I preallocate 270MB, just to perform one task, I’m pushing this system’s kernel closer to using up 1GB for real. So just for this one task, I’ve reduced this one commitment to 135MB.

Dirk