Dan's Web Tips: URLs

URLs (Uniform Resource Locators) are the standardized means of addressing pages on the Web. There are two basic types of URLs: absolute and relative. Each has its place in the links on your Web sites.

(As an aside, "URL" can be pronounced like "earl" or like "You Are 'Ell". This makes it hard to decide whether to write "a URL" or "an URL"; which is correct depends on how you expect it to be pronounced. I decided on "a URL" for this document.)

NOTE: These days, it's fashionable among Web purists to say "URI" (Uniform Resource Identifier) instead of "URL". Technically, a URI (presumably pronounced like the name of the psychic known for bending spoons) is any short string leading to a resource that is acceptable for use on the Web, while a URL is a specific kind of URI that identifies a specific protocol for retrieving the resource. URNs (Uniform Resource Names), presumably pronounced like a "Grecian Urn" (What's a Grecian Urn? About 50 drachmae.), are yet another kind of URI that isn't a URL, intended to provide a more stable method of addressing a resource that wouldn't be dependent on specific protocols or network addresses -- several URN schemes are defined now, but browsers are slow to implement them. An Internet draft (no longer online where I linked to it before) proposed a few more additions to this family -- URPs, URTs, and URVs.

YET ANOTHER NOTE: In the above acronyms, the "U" is sometimes construed as standing for "Universal" rather than "Uniform".

Absolute URLs

Definition: Absolute URLs specify the location of a Web page in full, and work identically no matter where in the world you are. Absolute URLs have the following form:
The first part, separated by a colon (:) from the rest of the URL, is the protocol, usually http for HyperText Transfer Protocol, though other protocols such as ftp and gopher are sometimes used. For secure-server sites using an encrypted protocol, https is used as the protocol identifier.

Next comes the hostname (domain name or IP address), preceded by a double slash (//). It seems to be a common misconception that the colon and double slash are an inseparable delimiter terminating the protocol -- for instance, the Mozilla team posted an online document regarding their implementation of irc:// URLs. Actually, the colon is the terminator of the protocol section, and the double slash is used to introduce a hostname or other site identifier (varying somewhat by protocol, with some less-common protocols taking things other than domain names in this section). The double slash is absent in URIs lacking a hostname, like mailto: and news: URLs.

After that is the directory path to the Web page you're accessing, with forward slashes (/) separating directory levels (not backslashes (\) like in DOS/Windows systems).

Pedantic Note: Actually, as many purists will tell you, it's not true that the "path" portion of a URL is necessarily a directory path. Servers can be configured to interpret a URL path any way they like, which might not necessarily correspond to any actual subdirectory tree. Sites generated dynamically from databases may use URL paths that have nothing to do with directory structures. However, most Web servers do use URLs corresponding to the file structure, so that's what I'll assume for this document.

Finally, optionally, there is a "fragment identifier" separated by a pound (#) sign from the rest of the URL, indicating that the link is to an anchor within a document (if this is omitted, the link is to the top of the page).
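The parts described above can be picked out with Python's standard urllib.parse module. This is only a minimal sketch: the http URL below (host, path, and anchor name) is a made-up example, not an address from this site.

```python
from urllib.parse import urlsplit

# Hypothetical example URL; the host, path, and anchor name are made up.
url = "http://www.example.com/tips/urls.html#absolute"
parts = urlsplit(url)

print(parts.scheme)    # http                (protocol, before the colon)
print(parts.netloc)    # www.example.com     (hostname, after the double slash)
print(parts.path)      # /tips/urls.html     (path, forward slashes between levels)
print(parts.fragment)  # absolute            (fragment identifier, after the #)

# A mailto: URL has a protocol and a colon but no double slash,
# so its hostname section is empty.
mail = urlsplit("mailto:webmaster@webtips.dan.info")
print(mail.scheme)        # mailto
print(repr(mail.netloc))  # ''
print(mail.path)          # webmaster@webtips.dan.info
```

Note that the colon alone ends the protocol; the parser only looks for a hostname when a double slash follows it.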
(Technically, the fragment identifier isn't actually part of the URL, but an addendum to it, because it isn't sent to the server; it's used by the browser to go to the appropriate part of the retrieved page once it is loaded.)

There are a few special protocols with URLs of differing syntax. mailto: is followed by an e-mail address to create a link allowing users to send mail to that address. news: is followed by the name of a newsgroup (e.g., comp.infosystems.www.authoring.html) to let the user follow the link to see the newsgroup's messages (if the user's browser is configured to access a news server). Neither of these URL types has slashes (single or double) in it; the syntax looks like mailto:webmaster@webtips.dan.info, not mailto://webmaster@webtips.dan.info/. Developers used to the more common http: syntax often put extra slashes in these URLs and cause them to fail. (More information on mailto: URLs is in my page on e-mail.)

Note that you can't leave out the protocol and use www.somewhere.com as a link URL without the http://. This syntax works when you're typing a URL into most browsers, but in a link within your Web site it will be interpreted as a relative URL to a file named "www.somewhere.com" in the current directory.

Are URLs case sensitive?

Technically, yes. You should always be consistent in your use of upper or lower case in your URLs. Even in cases where the upper and lower case versions go to the same resource, you're imposing an unnecessary burden on browsers, which need to retrieve and cache two copies of the same thing if they go to two variants of the same URL.

Whether you can vary the case and still get the same resource depends on which part of the URL you change. The protocol and hostname are not case sensitive, so you can write http://www.dan.info/ or HTTP://www.dan.info/ and they'll work identically.
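One way to see that only the protocol and hostname are safe to fold to lower case is the sketch below. The helper name normalize_case and the path Dir/Page.html are hypothetical, and the sketch assumes the URL carries no user name or password before the hostname.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url: str) -> str:
    """Lowercase the protocol and hostname; leave everything else untouched.

    Hypothetical helper for illustration. It assumes there is no
    user:password@ prefix in front of the hostname.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),   # protocol: not case sensitive
        parts.netloc.lower(),   # hostname: not case sensitive
        parts.path,             # path: kept exactly as written
        parts.query,
        parts.fragment,         # fragment: kept exactly as written
    ))

print(normalize_case("HTTP://www.dan.info/"))
# http://www.dan.info/
print(normalize_case("HTTP://WWW.DAN.INFO/Dir/Page.html#Top"))
# http://www.dan.info/Dir/Page.html#Top
```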
However, the directory and filenames may be case sensitive depending on what operating system the server is running under (UNIX is case-sensitive, while Windows isn't). Fragment names are case-sensitive. So be careful to match the directory, file, and anchor names in your links to the case of the actual files and anchors.

Can I include spaces in my URLs?
No, the space is not a legal character in URLs. Spaces, and a number of other special
characters, must be encoded by using a percent sign (%) followed by a two-digit
hexadecimal number giving the character's position in the ASCII or ISO LATIN-1 encoding.
A space is represented as %20.
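The encoding rule can be tried out with Python's urllib.parse.quote. This is a small sketch with a made-up file name. Note that quote uses UTF-8 by default, which gives the same result as ASCII for a space and other plain ASCII characters.

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing spaces.
filename = "my web page.html"

encoded = quote(filename)
print(encoded)           # my%20web%20page.html
print(unquote(encoded))  # my web page.html

# The two hex digits are the character's code: a space is 32, or 20 in hex.
space_code = f"%{ord(' '):02X}"
print(space_code)        # %20
```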
Some Web servers might have file systems that allow documents with names containing
spaces, but if you use files with such names, their URLs will contain %20 in place of each space.

Relative URLs

Definition: Relative URLs are context-sensitive, giving a path with respect to your current location. There are several types of relative URL.
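A few common relative forms can be resolved against a page address with urllib.parse.urljoin, which follows the same rules a browser does. This is an illustrative sketch: the base address and the link targets are made up, apart from www.somewhere.com, which shows the missing-protocol pitfall mentioned earlier.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links.
base = "http://www.example.com/tips/urls.html"

# Hypothetical link targets, one per common relative form.
links = [
    "links.html",         # file in the same directory
    "images/logo.gif",    # file in a subdirectory
    "../index.html",      # file one directory level up
    "/index.html",        # path relative to the server root
    "#absolute",          # anchor within the current page
    "www.somewhere.com",  # missing http://, so treated as a file name
]

resolved = {link: urljoin(base, link) for link in links}
for link, target in resolved.items():
    print(f"{link:20} -> {target}")
```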
Which Type of URL Should You Use?

TIP: Use absolute URLs when linking to a different site, and relative URLs when linking within your site.

Within your site, it's best to use relative URLs, because this will allow you to move the entire site to a different location without having to change all the internal links. Avoid the forms of relative URL starting with slashes, as they are relative only to the root of the server and will become incorrect if you move to a different place in the full directory tree. However, the forms without leading slashes will work identically no matter where the site is relocated.
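The effect of moving a site can be sketched with urljoin. Both site locations below are hypothetical: the site starts at the root of one server and is then moved into a subdirectory of another.

```python
from urllib.parse import urljoin

# Hypothetical old and new locations of the same home page.
old_home = "http://www.example.com/index.html"
new_home = "http://www.example.org/~dan/site/index.html"

plain_link = "tips/urls.html"    # no leading slash
rooted_link = "/tips/urls.html"  # leading slash: relative to the server root

# The link without a leading slash follows the site to its new home.
plain_old = urljoin(old_home, plain_link)
plain_new = urljoin(new_home, plain_link)
print(plain_old)  # http://www.example.com/tips/urls.html
print(plain_new)  # http://www.example.org/~dan/site/tips/urls.html

# The link with a leading slash points at the server root,
# which is no longer where the site's files live.
rooted_new = urljoin(new_home, rooted_link)
print(rooted_new)  # http://www.example.org/tips/urls.html
```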
Use absolute URLs when linking to other sites. You may wish to consider
even some other pages you created yourself to be "other sites" for this
purpose, if they're part of a completely different logical grouping from
the current site and there's a chance one set of your pages will be
relocated while the other stays put. So, if you have two sites, at
Links
This page was first created 10 Aug 1997, and was last modified 04 May 2008.