People have generally discovered on today’s Web that, if they simply right-click on an item and choose “Copy Link Location” as offered by their browser, they often get a URL which stops working some time after they embed it in a posting or in a site of their own, because the real URL has since changed.
This is why URLs / Hyperlinks have long been reinvented in the form of so-called “Permalinks”.
All my postings and pages offer Permalinks to the reader. On a blog posting, right-clicking or tap-holding on the Title will offer to Copy the Permalink. For pages, right-clicking or tap-holding on the entry in the Sidebar or Header of my Main Page will do the same.
(Edit 01/14/2017 : This blog is generated by a collection of PHP server-scripts – aka CGI scripts – and uses HTML-5. This is a form of HTML which advises content-providers against using the old formatting tags for Italic, Underline and Bold, and suggests Strong or Emphasized instead.
Because this blogging engine observes HTML-5 100%, the pieces of text which seem underlined, are generally URLs which the reader can click on.
When we use a Desktop or Laptop, our browser allows us to hover over these hyperlinks with our mouse, and displays a bubble which describes what sort of link it is.
But, when we use a tablet or a smart-phone to read a Web-page, there is no hover-support, because we usually do not use a Bluetooth Mouse. In such a case, some readers might have overlooked the fact, that each underlined segment of text is in fact a link, unless they were to tap on that link.
Generally, readers do not tap on random places within pages of text, unless they already know that the pages contain hyperlinks.)
(Edit 12/22/2018 : )
In fact, this development does not contradict older knowledge, about how hyperlinks, embedded into HTML-4, worked. The general piece of markup language to insert any hyperlink has traditionally been an ‘Anchor Tag’, and the following would be an example of how it’s formatted:
<a href="https://www.google.ca/">Arbitrary Text</a>
What would always display in the reader’s browser according to the code above is “Arbitrary Text”. And yet, if the reader clicked on that text, he’d be taken to ‘https://www.google.ca/’.
In fact, hyperlinks that displayed as an actual URL, were generally examples in which the Web-designer had entered the same piece of text twice, once as the ‘HREF’ parameter, and again, as the Arbitrary Text.
What has changed, however, is the introduction of Cascading Style Sheets, also referred to in short as ‘Styles’. Before Styles, the way in which ‘Arbitrary Text’ was highlighted, to indicate to the reader that it was in fact click-able, was determined by the browser: a familiar blue, underlined format, which could change to some other color if it pointed to a URL which the reader had visited before.
Through the use of Styles, the appearance of ‘Arbitrary Text’ can be changed, to whatever the Web-designer chooses. And in the case of the present WordPress Theme, the choice was made by other coders than me, to keep the Style of these links as unobtrusive as possible. But unfortunately, they are so unobtrusive, that some readers may simply think that they form underlined text.
(Edit 06/05/2017 : )
I suppose that there is another piece of information which I can offer, which actually describes permalinks.
According to more-old-fashioned thinking in HTML and Web-design, a site is organized into folders, which contain either HTML Files or CGI-Scripts. URLs would have put the folder-names into their path, leading up to the file-name of either the HTML File or the CGI-Script.
According to that rule, the following URL should have a nonsensical meaning:
According to ages-old wisdom, my site has as its root folder, a folder named ‘blog’, and it supposedly has a sub-folder to that one, named ‘archives’, and another sub-folder to that one named ‘3051’. Further, it would seem that our URL does not specify what file, or what type of file to open, either belonging to the folder ‘archives’ or belonging to the folder ‘3051’.
This way of representing URLs is used often today, by sites that actually manage a large collection of pages. The reason these URLs work, is the fact that before executing server-side CGI-Scripts, sophisticated Web-servers apply “Rewrite Rules”. These Rewrite Rules are specific to one site – such as to my blog – and consist of ‘Regular Expressions’, by which the server recognizes patterns in the URL, and by which it replaces the pattern systematically with another pattern. So the above URL gets rewritten by my Web-server, to do exactly as the following URL does, without your browser getting to see that this happens:
What the browser would be requesting with the above URL, is that the default CGI-Script for the root folder be executed on the server, using the GET-Method, and setting the parameter ‘p’ to the value ‘3051’.
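To make the idea of a Rewrite Rule more concrete, here is a minimal sketch in Python of what such a rule does. The path layout ('blog', 'archives', a numeric ID) and the parameter name 'p' follow the description above. The exact regular expression and the function name `rewrite` are assumptions made for illustration, not the rule my Web-server actually uses.

```python
import re

# Hypothetical rewrite rule. The folder names and the parameter 'p' come from
# the description in the text; the pattern itself is an assumption.
REWRITE_RULE = re.compile(r"^/blog/archives/(\d+)/?$")


def rewrite(path: str) -> str:
    """Map a 'pretty' path onto the query-string form the script expects."""
    match = REWRITE_RULE.match(path)
    if match is None:
        return path  # no rule applies, so the path is left untouched
    return f"/blog/?p={match.group(1)}"


print(rewrite("/blog/archives/3051"))  # prints /blog/?p=3051
```

The browser never sees the second form. The substitution happens on the server, before the script runs.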
The latter types of (more old-fashioned) URLs generally also work, with two main drawbacks:
- They are humanly undecipherable,
- Web-Masters are likely to change them. If they were to change, these would no longer qualify as permalinks.
‘WordPress’ offers its bloggers a selection of types of permalinks, including:
Again, there is no file in any folder ‘…/2017/06/’ by the name ‘Linear-Predictive-Coding’. But, subscribers to newspapers and some blogs would benefit from this rewrite rule, because if they were Copying and Pasting numerous URLs, they would be able to tell at a glance, which one was which. This could be more useful to some readers, than just to see that one of them was ‘…/3051′. And so this specific feature, of ‘A Descriptive URL’, has also become synonymous in some people’s minds, with the concept of ‘Permalinks’.
If the reader needs to know in greater detail how this works, an external explanation exists which is specific to WordPress. It highlights the fact that PHP-Scripts can access what the original URL was, through the 'REQUEST_URI' entry of the server’s environment array.
(Update 08/15/2015 : )
This last detail is important, because it means that the CGI-script, regardless of whether it’s written in the language ‘PHP’ or not, has access to the original URL’s text, and is therefore able to analyze that to whatever level of complexity required, in order to determine what HTML document to send to the browser, when that URL is opened from the browser.
Therefore, even with Rewriting, the URL remains a mechanism to pass parameters from the browser, to the CGI-script.
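As a rough illustration of a script analyzing the original URL text, the following Python sketch reads 'REQUEST_URI' from an environment dictionary and tries two permalink shapes in turn. Both patterns, the '/blog' prefix and the function name `parse_request_uri` are assumptions made for this example. Real WordPress rules are more elaborate.

```python
import re
from typing import Optional

# Hypothetical permalink patterns, loosely modelled on the two shapes
# mentioned in the text (a numeric archive ID, and year/month/title).
NUMERIC = re.compile(r"^/blog/archives/(?P<post_id>\d+)/?$")
DATED = re.compile(r"^/blog/(?P<year>\d{4})/(?P<month>\d{2})/(?P<slug>[\w-]+)/?$")


def parse_request_uri(environ: dict) -> Optional[dict]:
    """Work out which document is wanted from the original URL text."""
    uri = environ.get("REQUEST_URI", "")
    path = uri.split("?", 1)[0]  # ignore any query string
    for pattern in (NUMERIC, DATED):
        match = pattern.match(path)
        if match:
            return match.groupdict()
    return None  # the script would answer with a 'not found' page


print(parse_request_uri({"REQUEST_URI": "/blog/2017/06/Linear-Predictive-Coding"}))
```

Whatever the script finds in the URL is, in effect, a set of parameters, which is the point made above.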
(As Of 06/05/2017 : )
I suppose I should add another detail. When a ‘WordPress’ blogger changes his Permalink-Type, because the site’s PHP-Scripts are generating URLs as references to its own components, the blogger is mainly changing what type of permalinks the site is generating. But in general, the rewrite rules and CGI-Scripts are flexible enough to parse any of them, if they arise as a URL. Only, what the scripts are also coded to do is recognize whether the requested URL is using a different type of permalink from the currently-selected type. If so, the script makes sure that a permalink of the current type displays in the browser’s URL-field.
The reason this is done, is the fact that some readers will actually use the current URL which the browser sees, when they Copy and Paste URLs, and in this case the blogger would rather have it as well, that his currently-chosen type of permalinks are received by the reader.
This can be accomplished when the server sends the browser an HTML-page which is mainly empty, except for a Header which instructs the browser to request a redirect with a time-delay of zero (= Client-Pull).
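A minimal sketch of such a Client-Pull page, written in Python, might look as follows. The function name `client_pull_page` is made up for this example. The 'refresh' meta tag with a delay of zero is the standard HTML way to express the idea.

```python
from html import escape


def client_pull_page(target_url: str) -> str:
    """Return a nearly empty HTML page asking the browser to load target_url at once."""
    safe_url = escape(target_url, quote=True)  # keep '&' and quotes from breaking the tag
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={safe_url}">\n'
        "</head><body></body></html>\n"
    )


print(client_pull_page("/blog/archives/3051"))
```

Many servers achieve the same effect with an HTTP 3xx status code and a 'Location' header instead. This sketch only shows the Client-Pull variant described above.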
(Update 08/15/2018 : )
Given the reality that rewrite-compatible URLs are both commonplace today, and versatile, I suppose that the question could be asked, of why then, the older ‘GET’ Method is still used at all. And my answer would be as follows:
Browsers are not able to natively provide URLs suitable for rewriting when the user Submits a simple HTML Form. And in some cases, simplicity in Web-design, such as using the Form-tags in order to collect some user-originated information, still wins the day. And it often still does so through either the ‘GET’ or the ‘POST’ Method.
One idea which I can visualize happening today though, is just slightly more-complex, than either using the ‘POST’ or the ‘GET’ Method:
A subscriber’s Web-page could first submit some text to the server via the ‘POST’ Method, and the HTML which the corresponding server-script prints, would both have the naked URL of this script, and contain a redirect request to the browser, the URL of which already points to a document kept on the server. This second URL could either be a ‘GET’ URL, or a permalink, meant to be served by a different, second server-script, but meant always to serve up the same, on-line content, identified by a shorter ID-code in the URL.
This last idea would be in-keeping, with the present fashion-trend, by which personal devices and messages mainly communicate either URLs or URIs, and by which much of our content would be kept on cloud-servers. But this last idea automatically introduces complexity, in the form of Access Control issues, because in certain cases, one subscriber would presumably not want any other subscriber, to be able to fetch the first subscriber’s document.
Believe it or not, even those sorts of problems often have solutions in Computer Science, although the solutions that exist also tend to be more complex than what most regular users might care to imagine. Specifically, the ‘GET’ Method appends information to the actual URL. Under ‘SSL’ or ‘TLS’, the path and query string are encrypted in transit, but the full URL can still be exposed at either endpoint, and yet the requests which such URLs make need to be secured. Similarly, ‘AJAX’ requests to a server-script are only encrypted when they are sent to an https:// URL, yet need to be secured.
And, even if the assumption was made that a client always establishes an encrypted Web-socket to the server, before even specifying the URL to retrieve, the following fact should be considered:
In the server log files, the URL of every HTTP request, regardless of whether it came in as an http:// or as an https:// URL, is logged in full. This would mean that if credit-card numbers or passwords had ever been submitted using the ‘GET’ Method, they would end up written in server-logs, in clear-text, for potentially anybody to read!
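To see why this matters, consider this small Python sketch of the request line that a form using the ‘GET’ Method produces. The script path '/login' and the field names are invented for the example. An access log that records the request line would record the same text.

```python
from urllib.parse import urlencode


def get_request_line(script: str, fields: dict) -> str:
    """Build the request line a browser sends when a form uses the GET method."""
    return f"GET {script}?{urlencode(fields)} HTTP/1.1"


# Hypothetical form fields, for illustration only.
line = get_request_line("/login", {"user": "alice", "password": "hunter2"})
print(line)  # prints GET /login?user=alice&password=hunter2 HTTP/1.1
```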
I believe that the way this (validation, not encryption) problem is solved most often, is similar in its nature to how Challenge-Response Authentication works. In principle, two components are needed for this to work:
- A piece of data which is a shared secret between the client and the server – for example, the User’s password,
- A smaller piece of data, which will simply never be reused. This smaller piece of data may be communicated openly.
In short, the smaller piece of data gets appended to the shared secret, and the result hashed. In my linked posting, my main focus was to use a date-time stamp as the ‘challenge’. But in certain cases, the actual date-time stamp may be considered too lengthy to communicate, and instead, a single 32-bit integer may be used, which always advances (by 1). In such a case, this integer is also referred to as a ‘Nonce’.
The ‘GET’ URL needs to contain:
- Whatever field identifies the piece of information to be retrieved,
- The smaller, non-repeating piece of data (The Nonce),
- The hash-code.
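On the client side, the three items above could be assembled roughly as in the following Python sketch. SHA-256 from the standard library stands in for whatever hashing algorithm a real system would choose. The field names 'id', 'nonce' and 'hash', the base URL and the function name `sign_get_url` are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlencode


def sign_get_url(base_url: str, object_id: str, nonce: int, shared_secret: bytes) -> str:
    """Build a GET URL carrying the object ID, the nonce and the hash-code."""
    # The nonce is appended to the shared secret, and the result is hashed.
    # SHA-256 is a stand-in here, not a recommendation.
    digest = hashlib.sha256(shared_secret + str(nonce).encode("ascii")).hexdigest()
    query = urlencode({"id": object_id, "nonce": nonce, "hash": digest})
    return f"{base_url}?{query}"


# Hypothetical values: the secret itself never appears in the URL.
print(sign_get_url("https://example.com/fetch", "doc42", 7, b"s3cret"))
```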
What the server needs to do is:
- Look up the object to be retrieved in a database and find out which User owns it,
- Look up what the last successfully-used Nonce for that User was, which the submitted Nonce must be greater than,
- Recompute the hash-code based on the additional, shared secret of the User, and compare the result with the submitted hash-code,
- If successful, enter the submitted Nonce as the last, successfully-used Nonce for the User.
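The server-side steps above can be sketched in Python as follows. The three dictionaries are in-memory stand-ins for database tables, and their contents are invented. SHA-256 again stands in for the real hashing algorithm, with the Nonce appended to the secret as decimal text, which is an assumption about the encoding.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-ins for the server's database tables.
OWNER_OF = {"doc42": "alice"}
SECRET_OF = {"alice": b"s3cret"}
LAST_NONCE = {"alice": 6}


def verify_request(object_id: str, nonce: int, submitted_hash: str) -> bool:
    """Return True only for a fresh nonce accompanied by a matching hash-code."""
    user = OWNER_OF.get(object_id)  # which User owns the object?
    if user is None:
        return False
    if nonce <= LAST_NONCE[user]:  # the nonce must strictly advance
        return False
    expected = hashlib.sha256(SECRET_OF[user] + str(nonce).encode("ascii")).hexdigest()
    # Constant-time comparison, so timing does not leak how many characters matched.
    if not hmac.compare_digest(expected.encode(), submitted_hash.encode()):
        return False
    LAST_NONCE[user] = nonce  # remember the last successfully-used nonce
    return True
```

Submitting the same Nonce and hash-code a second time fails, because the stored Nonce has already advanced to that value.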
I suppose that this scheme could get messed up in some way if the same User had more than one Session with the server at the same time. Also, at some point the last, successfully-used Nonce must be communicated to the client to begin with, let’s say when the client logged in, or when the client pulled up a page from which the ‘GET’ Request or the ‘AJAX’ Request is to be made. More correctly, if the assumption can be made that such a scheme was to run entirely from a Web-browser, then Web-applications can be written which assume that the last-used Nonce can be communicated to the client securely via SSL, so that when the client makes a request to the server-script, (that Nonce +1) can be used to compute the Hash-Code, which will be visible. But according to the same assumption, the SSL-secured HTML document can also just communicate a new shared secret to the client at any time, so that the Nonce would not need to become a large number.
This scheme could be made more viable, if the shared secret used was not actually the password. One reason would be the degree of distrust that many users would have, of having transmitted a hash-code of their password, regardless of how secure the hashing-algorithm is supposed to be. Another is the fact that as I just described it, the solution would not scale well enough. The solution presented on this page might need to be applied to a large number of client-devices, that are not always Web-browsers.
Instead, the shared secret could be a Session-Key, that is held uniquely by any one client-device, for any one User. The modified set of data which the ‘GET’ URL needs to submit would become:
- Whatever fields identify the piece of information to be retrieved,
- The Session ID,
- The Nonce,
- (If ‘Bcrypt’ is being used,) The Level Of Difficulty with which the Hash was computed,
- The Hash-Code.
Instead of looking up the last-used Nonce as an entry that belongs to one User, the server would need to look up the last-used Nonce as an entry belonging to one Session…
Further, if the hashing algorithm to use happened to be ‘Bcrypt’, this is an algorithm that can hash [1..72] bytes of text, with an explicitly-supplied 16-byte “Salt”, to arrive at a 24-byte Hash-Code. In this case, the Nonce could be fed to the algorithm as the Salt to use. But just to avoid any possible overlap with other uses, I would add an established constant such as 2^31 to the Nonce named in the ‘GET’ URL, or in the ‘AJAX’ Call, to arrive at the Salt which is actually fed to ‘Bcrypt’.
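Deriving a 16-byte Salt from a small Nonce could look roughly like the following Python sketch. The offset constant is a parameter, and its default value here is an assumed reading of the constant mentioned above. The big-endian byte order and the function name `nonce_to_salt` are likewise assumptions. No actual ‘Bcrypt’ call is made, because how a raw Salt is passed in depends on the library used.

```python
SALT_OFFSET = 2**31  # assumed value of the established constant


def nonce_to_salt(nonce: int, offset: int = SALT_OFFSET) -> bytes:
    """Widen (nonce + offset) to 16 bytes, leaving the most-significant bytes zero."""
    if not 0 <= nonce < 2**32:
        raise ValueError("nonce must fit in an unsigned 32-bit integer")
    return (nonce + offset).to_bytes(16, byteorder="big")


print(nonce_to_salt(1).hex())
```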
(Update 08/16/2018 : )
I should add, that if the purpose of the challenge-response approach is to create mere URLs or URIs, then an important design-objective is, to keep the actual cyphers as short as possible. Therefore, even though I know that ‘Bcrypt’ allows for a (16-byte) Salt, the maximum unsigned (4-byte) value is close to 4 billion, which takes 10 decimal-digits to express. If the URLs only contain short decimal-notation values, then the system is not broken. But, if the URL actually needed to state a 20 decimal-digit value, then I’d consider this system to be broken. So I wouldn’t conclude that I actually need to use 8 bytes, out of 16 available bytes. The most-significant bits would simply remain zeroes.
Along the same lines, even though ‘Bcrypt’ generates a (24-byte == 192-bit) Hash Code, nothing would prevent a Software Engineer from only using a (16-byte == 128-bit) sub-field of the original Hash-Code, as if that was the Hash-Code. Doing so would keep these validated URLs shorter, and might even thwart some yet-unknown attacker’s attempts, to break the Hash Code, because such a hypothetical attacker would actually be missing 8 bytes from the final Hash-Code used.
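Using only a sub-field of the Hash-Code is a one-line operation, as this Python sketch suggests. SHA-256 (32 bytes) stands in for ‘Bcrypt’ here, so the full length differs from the 24 bytes mentioned above. The function name `short_hash` and the way the Nonce is encoded are assumptions.

```python
import hashlib


def short_hash(secret: bytes, nonce: int, keep_bytes: int = 16) -> str:
    """Hash secret+nonce, then keep only the first keep_bytes bytes, hex-encoded."""
    full = hashlib.sha256(secret + str(nonce).encode("ascii")).digest()
    return full[:keep_bytes].hex()


# 16 bytes become 32 hexadecimal characters in the URL.
print(short_hash(b"s3cret", 7))
```

Both the client and the server would need to truncate in the same way, so that the comparison on the server still succeeds.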
(Edit 08/18/2018 : )
I suppose that there’s another observation I should add, about how this suggested scheme would need to work. The possibility exists, that the ‘URL’, and that therefore, the credentials which the client submits, are cryptographically correct, but that the ‘GET’ Request is faulty for some other reason, so that the client will receive some sort of erroneous reply. The problem with this situation is, that even though the client did not achieve what it wanted, a hypothetical attacker could nevertheless observe working credentials, which such an attacker could possibly replay, in a ‘URL’ that actually does something.
The ideal solution to this problem might be, that the server always respond with a code, that allows the client to recognize that his credentials were invalid, as opposed to his credentials having been valid, and therefore used up, but with some sort of other error taking place. Thus, an HTTP ‘401 – Unauthorized’ code might be meaningful in this context.
But I would expect that what gets done in practice can be kept simpler, than what might be done ideally. On the server, any credentials which were submitted, and which were cryptographically correct, could cause the Nonce to be updated. But on the client, any request which was actually made, could cause the Nonce to be advanced.
This means that the Nonce which is stored on the client might be ahead of the Nonce which is stored on the server, by more than (1). But because, as soon as a cryptographically successful request is submitted to the server again, at a later point in time, the Nonce that is stored on the server will also eventually be updated to the one submitted, the ensuing consequences can be mitigated. And, the approach which I suggested really only requires that the submitted Nonce be greater than the one which was stored on the server. I never made any stipulations on, by how much so.
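The behaviour described above, in which the client's Nonce runs ahead of the server's, can be simulated with a toy model in Python. The classes `Client` and `Server` and the flag `credentials_ok` (standing in for the hash comparison) are invented for this sketch.

```python
class Server:
    """Accepts any nonce greater than the last one it stored."""

    def __init__(self) -> None:
        self.last_nonce = 0

    def handle(self, nonce: int, credentials_ok: bool) -> bool:
        # credentials_ok stands in for the hash-code check.
        if not credentials_ok or nonce <= self.last_nonce:
            return False
        self.last_nonce = nonce  # updated only on a cryptographically valid request
        return True


class Client:
    """Advances its nonce on every request it makes, successful or not."""

    def __init__(self) -> None:
        self.nonce = 0

    def next_nonce(self) -> int:
        self.nonce += 1
        return self.nonce


server, client = Server(), Client()
client.next_nonce()  # request 1 never reaches the server
client.next_nonce()  # request 2 never reaches the server either
accepted = server.handle(client.next_nonce(), credentials_ok=True)
print(accepted, server.last_nonce)  # the server catches up in one step
```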
That type of problem would have fewer consequences, if the challenge of the challenge-response authentication was actually based on the time-stamp on the client, and if the additional stipulation of the server was merely that the submitted time-stamp not be any earlier than the server-time, by more than 5 minutes.
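The time-stamp stipulation could be checked with something as small as this Python sketch. The function name `timestamp_acceptable` and the use of seconds since the epoch are assumptions. Only the 5-minute window comes from the text above.

```python
import time
from typing import Optional

MAX_AGE_SECONDS = 5 * 60  # the 5-minute window


def timestamp_acceptable(submitted: float, server_now: Optional[float] = None) -> bool:
    """Accept a time-stamp unless it is more than 5 minutes earlier than server time."""
    if server_now is None:
        server_now = time.time()
    return submitted >= server_now - MAX_AGE_SECONDS


print(timestamp_acceptable(time.time()))
```

As written, the check only rejects time-stamps that are too old. Whether time-stamps from the future should also be rejected is a separate design decision, which this sketch leaves open.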