A Weakness in Exponential-Golomb Encoding

I have written before, about variable-length encoding, and one observation I had made then, was that Huffman Encoding of value-pairs was only really strong, as long as we do not need to encode consecutive non-zero values, or consecutive values significantly greater than zero. And so I had suggested Exponential-Golomb encoding as an alternative.

I also see a problem with that, in the way it was prescribed ages ago.

The way Exponential-Golomb works, is that the value of zero is actually encoded as a single bit equal to (1). Non-zero values are as if written out by hand in binary, so that their first bit is always equal to (1), and then the length in bits is signaled to the decoder with a prepended series of zeroes, numbering the length of the actual integer. This prefix of zeroes is one possible form of unary encoding.

The problem I see, is with how we are supposed to encode integers that may or may not be negative. In the classical approach, we should interleave the negative integers with the positive, and then encode as though we were encoding non-negative values. Doing so increases the length of non-zero integers by one bit in average.

What nobody seemed to point out, was that doing so will also lengthen the unary prefix by one zero, on average, to signal this longer unsigned integer. So on average, this approach will actually lengthen the variable-length bit-sequence by two bits.

The main exception will be with (+1), which continues to be represented by 2 bits. Yet, if the values are signed, then I would guess that a (-1) will have an equal probability of occurring as (+1), and (-1) will be encoded as 4 bits, thus forming and average length of 3 bits, and merely breaking even with what I am about to suggest.

And so as far as I am concerned, it makes greater sense just to Exponential-Golomb-encode the absolute, and if this absolute is not zero, to append a sign bit. Doing so will only lengthen the variable-length sequence by one bit. (+1) and (-1) would both be encoded as 3 bits, and (+2), (+3), (-2) and (-3) would all be encoded as 4 bits.