

What's with all the strange notations in URLs like %20, %2b, and %3f?



type:              trick
reliability:        - very high
understandability:  - high
time saving:        - not applicable**
usefulness:         - low
difficulty:         - easy
required skill:     - intermediate
overall:           26 of 40 points, 


1. The basics



Look at enough URLs and you'll see something like:

http://google.yahoo.com/bin/query?p=ulillillia&hc=0&hs=0

If you were to do a search for "ulillillia" in Google, you'd get this URL. What does it all mean at the end? The question mark [?] indicates that the variables used to build the page come after this point. p= is a specific variable [in this case, the search query, which is "ulillillia" without the quotes]. The equal sign indicates what the variable is equal to, much like in mathematics. The ampersand [&] separates one variable from the next, so hc=0 and hs=0 are two more variables. However, some URLs have something like %2b in them. This indicates a special character, usually one that can't appear in a URL as-is. The quote, when used in a search engine, is written as %22. Doing a search for "searching ulillillia" with quotes yields this URL:
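As a minimal sketch of the ?, & and = rules described above, Python's standard urllib.parse module can pull the first example URL apart. The variable names here are only for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://google.yahoo.com/bin/query?p=ulillillia&hc=0&hs=0"

# Everything after the "?" is the query string.
query = urlsplit(url).query

# parse_qsl splits on "&" into pairs, then on "=" into name and value.
variables = parse_qsl(query)

print(query)
for name, value in variables:
    print(name, "=", value)
```

Running it prints the query string followed by one line per variable: p, hc, and hs.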

http://google.yahoo.com/bin/query?p=%22searching+ulillillia%22&hc=0&hs=0

As before, it's got the ?, &, and = involved, but this time %22 has been added. When a special character is introduced, it's encoded as a percent sign followed by a two-digit hexadecimal number. %22 is character number 34 [hexadecimal 22 is 2 × 16 + 2 = 34], the double quote. Spaces are noted as %20, although in the search part of a URL a space can also show up as a plus sign, as it does above. Ever notice that URLs *NEVER* have a space? [The one possible exception is an HTML file stored on your own system. The address field may show a space, but the underlying URL still doesn't contain one.] Where there would be one, %20 appears instead.
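The hexadecimal arithmetic and the decoding can be checked with a short sketch, again using Python's standard urllib.parse. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote_plus

# The two digits after "%" are the character number written in base 16.
code = int("22", 16)   # 2 * 16 + 2
char = chr(code)       # the double quote

encoded_query = "%22searching+ulillillia%22"
# unquote_plus decodes every %xx and also turns "+" back into a space,
# which is how search forms usually write spaces.
decoded_query = unquote_plus(encoded_query)

# quote goes the other way; by default it writes a space as %20.
encoded_space = quote("searching ulillillia")

print(code, char)
print(decoded_query)
print(encoded_space)
```

Note that the plain unquote function would leave the plus sign alone. Treating + as a space is a convention for form data, not for URLs in general.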

2. The entire %xx list to %7e



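Here's the complete list of special characters and their corresponding textual character:

%20 [space]  %21 !  %22 "  %23 #  %24 $  %25 %  %26 &  %27 '
%28 (  %29 )  %2a *  %2b +  %2c ,  %2d -  %2e .  %2f /
%30 0  %31 1  %32 2  %33 3  %34 4  %35 5  %36 6  %37 7
%38 8  %39 9  %3a :  %3b ;  %3c <  %3d =  %3e >  %3f ?
%40 @  %41 A  %42 B  %43 C  %44 D  %45 E  %46 F  %47 G
%48 H  %49 I  %4a J  %4b K  %4c L  %4d M  %4e N  %4f O
%50 P  %51 Q  %52 R  %53 S  %54 T  %55 U  %56 V  %57 W
%58 X  %59 Y  %5a Z  %5b [  %5c \  %5d ]  %5e ^  %5f _
%60 `  %61 a  %62 b  %63 c  %64 d  %65 e  %66 f  %67 g
%68 h  %69 i  %6a j  %6b k  %6c l  %6d m  %6e n  %6f o
%70 p  %71 q  %72 r  %73 s  %74 t  %75 u  %76 v  %77 w
%78 x  %79 y  %7a z  %7b {  %7c |  %7d }  %7e ~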

There are no printable characters before %20. [The one just before it, %1f, is a control character rather than something you can see on screen, and %7f is another one.] Beyond that are other special characters that are viewable. For example, your keyboard doesn't have the † and ‡ symbols you often find on my site as footnotes. If you were to type %86 and %87 into the URL where the "%22" is, you'll see these characters, at least when the site reads the numbers using the same Windows character set that the character map shows. You'll notice that the hexadecimal number directly corresponds to the character map*. Try typing %f7 and see what you get. It's a symbol you should almost immediately recognize.
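The %20 to %7e range can also be generated rather than typed out. The sketch below builds it, then decodes the three higher codes mentioned above. It assumes those codes are read as Windows-1252 [Python's "cp1252"], which is a guess based on the character map reference and not something the page states. A site that expects UTF-8 would treat them differently.

```python
from urllib.parse import unquote

# (code, character) pairs for %20 through %7e, the printable ASCII range.
table = [(f"%{n:02x}", chr(n)) for n in range(0x20, 0x7F)]

for code, char in table:
    print(code, repr(char))

# Assumption: codes above %7e are interpreted as Windows-1252.
high = unquote("%86%87%f7", encoding="cp1252")
print(high)
```

The last line prints the two footnote symbols followed by the %f7 character.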

3. Encoding your own URLs



If you have an E-mail account with Hotmail and click a link inside a message, you may see this encoding in the URL. If I were to send you a message with a link to this page, and you opened it, this is what you'd see [without all the other stuff that precedes it at the beginning]:

http%3a%2f%2fwww%2eangelfire%2ecom%2fjournal2%2fulillillia%2ftipsntricks%2fURLtrick%2ehtml

This is the encoding of the page you're viewing. Written out normally, it reads:
http://www.angelfire.com/journal2/ulillillia/tipsntricks/URLtrick.html
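Decoding a string like the Hotmail one by hand gets tedious, so here is a small sketch that does it with the unquote function from Python's standard library. The variable names are illustrative.

```python
from urllib.parse import unquote

encoded = (
    "http%3a%2f%2fwww%2eangelfire%2ecom%2fjournal2%2fulillillia"
    "%2ftipsntricks%2fURLtrick%2ehtml"
)

# unquote replaces every %xx with the character it stands for
# and leaves everything else untouched.
decoded = unquote(encoded)
print(decoded)
```

Only the colon, slashes, and periods were encoded in this case, which is why the letters are still readable before decoding.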

If you were to fully encode that URL, turning every single character into its %xx form, it would end up very long. The homepage, "http://www.angelfire.com/journal2/ulillillia/index.html", is written as:

%68%74%74%70%3a%2f%2f%77%77%77%2e%61%6e%67%65%6c%66%69%72%65%2e%63%6f%6d%2f%6a%6f%75%72%6e%61%6c%32%2f%75%6c%69%6c%6c%69%6c%6c%69%61%2f%69%6e%64%65%78%2e%68%74%6d%6c

Long, isn't it? And that's just the homepage! See what happens when you copy and paste that URL into your browser. It doesn't work, does it? Well, something weird happens if you paste:

%77%77%77%2e%61%6e%67%65%6c%66%69%72%65%2e%63%6f%6d%2f%6a%6f%75%72%6e%61%6c%32%2f%75%6c%69%6c%6c%69%6c%6c%69%61%2f%69%6e%64%65%78%2e%68%74%6d%6c

into your browser's address field. It gives a 404 error. If you put the mouse arrow over a link made from the encoding, it shows "http://www.angelfire.com/journal2/ulillillia/index.html", but clicking it still gives a 404. If you type the address in manually, "http://www.angelfire.com/journal2/ulillillia/index.html" works. Weird, isn't it? That's something strange I stirred up. A likely reason is that, without a literal http:// at the start, the browser has no way to tell where the site name begins and ends, so it treats the whole string as a file name on the current site. Try encoding your own URLs without the http:// beginning and they'll decode to the exact URL. However, you'll only get 404s every time.
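One possible explanation for the 404 can be sketched with Python's URL functions. This is only an illustration of how a scheme-less link is resolved. The base address below is a made-up page the link is imagined to sit on, not anything from this site, and real browsers may differ in details.

```python
from urllib.parse import urljoin, urlsplit

plain = "www.angelfire.com/journal2/ulillillia/index.html"
# Encode every character, as in the example without http://.
encoded = "".join(f"%{ord(c):02x}" for c in plain)

# With no literal scheme or "//", no site name is recognised at all;
# the whole string counts as a path.
parts = urlsplit(encoded)
print(repr(parts.scheme), repr(parts.netloc))

# Hypothetical page that contains the link (an assumption for illustration).
base = "http://www.example.com/some/page.html"
resolved = urljoin(base, encoded)
print(resolved)
```

The resolved address stays on the hypothetical site and asks it for a file with a very odd name, which would normally not exist.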

Now try pasting this into your browser:

http://%77%77%77%2e%61%6e%67%65%6c%66%69%72%65%2e%63%6f%6d%2f%6a%6f%75%72%6e%61%6c%32%2f%75%6c%69%6c%6c%69%6c%6c%69%61%2f%69%6e%64%65%78%2e%68%74%6d%6c

That should take you directly to the index page without a 404.
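To encode your own URLs this way, a short helper is enough. The sketch below uses a hypothetical function named encode_all. It is written by hand because the standard quote function leaves letters and digits alone.

```python
from urllib.parse import unquote

def encode_all(text: str) -> str:
    """Percent-encode every character, even ones that don't need it."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))

home = "http://www.angelfire.com/journal2/ulillillia/index.html"

# Fully encoded form, scheme included.
full = encode_all(home)

# Keep "http://" literal and encode only what follows it.
prefix = "http://"
mixed = prefix + encode_all(home[len(prefix):])

print(full)
print(mixed)
# Both forms decode back to the same address.
print(unquote(full) == home, unquote(mixed) == home)
```

Each character becomes three, so the fully encoded form is exactly three times as long as the original.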

Footnotes:
* The character map is a program that comes with Windows that lists all 224 characters that you can use in text. To see it, click on Start, then Run. Type "charmap.exe" into the field and click OK. That's it!
** This is only useful for decoding complicated URLs to get hidden info out of them, and it's also something just for fun. That is, the time-saving field doesn't apply to this specific document.