Linking Web pages within a site can become tedious without an understanding of the Uniform Resource Locator (URL) specifications.
After completing this lesson, you should be able to:
Define several types of commonly used URLs, such as HTTP, FTP, and mailto
Define advanced URL types such as Usenet and File
HTTP
HTTP URLs are by far the most common type because they point to other documents on the Web. HTTP is the protocol that World Wide Web servers use to transfer information to browsers. HTTP URLs follow the basic URL form: the protocol name (http), then ://, then the host name of the server, then the path to a directory or file.
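As a rough illustration of that form, the following Python sketch splits an HTTP URL into its parts with the standard library's urllib.parse module. The URL is hypothetical and chosen only for illustration; it does not come from this lesson.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only for illustration.
url = "http://www.example.com/mycompany/index.html"

parts = urlsplit(url)
print(parts.scheme)  # the protocol name
print(parts.netloc)  # the host name of the server
print(parts.path)    # the path to the directory or file
```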
If the URL ends in a slash, the last part of the URL is considered a directory name. The file that you get using a URL of this type is the default file for that directory as defined by the HTTP server, usually a file called index.html. (If the Web page you're designing is the top-level file for all of a directory's files, calling it index.html is a good idea.)
You also can specify the filename directly in the URL. In this case, the file at the end of the URL is the one that is loaded.
Using an HTTP URL that ends in a directory name without a trailing slash, such as one ending in /mycompany where mycompany is a directory, is also usually acceptable. Strictly speaking, because mycompany is a directory, this URL should have a slash at the end.
Most Web servers can figure out that you meant this to be a directory and redirect to the appropriate file. Some older servers, however, might have difficulties resolving this URL, so you should always identify directories and files explicitly and make sure that a default file is available if you're indicating a directory.
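The trailing-slash rule described above can be sketched in Python. This is only an approximation of what a server does: the helper name requested_file is made up for this example, the URLs are hypothetical, and index.html is assumed as the default file because the lesson says that is the usual choice. A real server may be configured differently.

```python
from urllib.parse import urlsplit

def requested_file(url, default_file="index.html"):
    """Guess which file a URL names, using only the text of the URL."""
    path = urlsplit(url).path
    # A path that is empty or ends in a slash names a directory,
    # so the server would hand back its default file.
    if path == "" or path.endswith("/"):
        return default_file
    # Otherwise the last path segment is taken as the file name.
    return path.rsplit("/", 1)[-1]

print(requested_file("http://www.example.com/mycompany/"))
print(requested_file("http://www.example.com/mycompany/homepage.html"))
# Without the slash, the URL alone cannot say that mycompany is a directory;
# the server has to work that out and redirect.
print(requested_file("http://www.example.com/mycompany"))
```

The last call shows why the explicit slash matters: from the URL text alone, mycompany looks like a file name.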
Anonymous FTP
FTP URLs are used to point to files located on FTP servers. Anonymous FTP servers let you log in using anonymous as the login ID and your email address as the password. FTP URLs also follow the standard URL form, with ftp as the protocol name in place of http.
Because you can retrieve either a file or a directory list with FTP, the restrictions on whether you need a trailing slash at the end of the URL aren't the same as with HTTP. An FTP URL that ends at the mycompany directory retrieves a listing of all the files in that directory. An FTP URL that ends at the file homepage.html in the mycompany directory retrieves and parses that file.
NOTE
Although your browser uses FTP to fetch the file, if it's an HTML file, your browser will display it just as it would were it fetched using the HTTP protocol. Web browsers don't care how they get files. As long as they can recognize the file as HTML, either because the server explicitly says that the file is HTML or by the file's extension, browsers will parse and display that file as an HTML file.
Browsers can either display the file if they know what kind of file it is or just save the file to disk.
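Recognizing a file by its extension, as the note describes, can be approximated with Python's standard mimetypes module. This is only a sketch of the idea, not how any particular browser decides; the FTP URL is hypothetical.

```python
import mimetypes

# Hypothetical FTP URL, used only for illustration.
url = "ftp://ftp.example.com/mycompany/homepage.html"

# guess_type looks only at the file extension, not at how the file is fetched.
content_type, encoding = mimetypes.guess_type(url)
print(content_type)
```

The protocol in the URL plays no part in the guess, which matches the point that browsers don't care how they get files.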
Non-Anonymous FTP
All the FTP URLs in the preceding section are used for anonymous FTP servers. You also can specify an FTP URL for named accounts on an FTP server. In this form of the URL, username:password@ comes before the host name, where username is your login ID on the server and password is that account's password.
Note that no attempt is made to hide the password in the URL. Be very careful that no one is watching you when you're using URLs of this form — and don't put them into links that someone else can find!
The URLs that you request might be cached or logged somewhere, either on your local machine or on a proxy server between you and the site you're connecting to. For that reason, it's probably wise to avoid using this type of non-anonymous FTP URL altogether.
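To make the risk concrete, here is a small Python sketch showing that the login details in such a URL are plain text that any URL parser can read. The account name, password, host, and the helper strip_credentials are all hypothetical, invented for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical account and host, used only for illustration.
url = "ftp://jdoe:secret@ftp.example.com/reports/summary.txt"

parts = urlsplit(url)
print(parts.username)  # the login ID, readable by anyone who sees the URL
print(parts.password)  # the password, also in plain text

def strip_credentials(url):
    """Return the URL with any username and password removed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(strip_credentials(url))
```

Stripping the credentials before writing a URL to a log or a page is one way to avoid leaking them, although avoiding this URL form entirely remains the safer choice.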
Mailto
The mailto URL is used to send electronic mail. If the browser supports mailto URLs, when a link that contains one is selected, the browser will prompt you for a subject and the body of the mail message, and send that message to the appropriate address when you're done. The mailto URL is different from the standard URL form: it is simply mailto: followed by the email address, with no double slash and no path.
Depending on how the user's browser and email client are configured, mailto links might not work for them.
If your email address includes a percent sign (%), you'll have to use the escape sequence %25 instead, because percent signs are special characters in URLs.
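The escaping can be done by hand, but as a sketch, Python's urllib.parse.quote produces the same result. The address below is hypothetical, and the @ sign is listed as safe so that it is left unescaped.

```python
from urllib.parse import quote

# Hypothetical address containing a percent sign.
address = "sales%west@example.com"

# quote() replaces the percent sign with %25; "@" is kept as-is.
mailto_url = "mailto:" + quote(address, safe="@")
print(mailto_url)
```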
Unlike the other URLs, the mailto URL works strictly on the client side. The mailto link just tells the browser to compose an email message to the specified address. It's up to the browser to figure out how that should happen.
Most browsers will also let you add a default subject to the email by including it in the URL, after the address, as ?subject= followed by the subject text. When the user clicks on a link whose subject is Sales Report, most browsers will automatically put Sales Report in the subject of the message.
You can also include more than one email address in a mailto URL, separated by commas, along with cc and bcc email addresses given as additional parameters.
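A mailto URL with a subject, several recipients, and cc and bcc addresses might be assembled as in the following Python sketch. The helper build_mailto and all the addresses are hypothetical; only the Sales Report subject comes from the text above. Spaces are encoded as %20, on the assumption that mail clients handle that more reliably than a plus sign.

```python
from urllib.parse import quote, urlencode

def build_mailto(to, subject=None, cc=(), bcc=()):
    """Build a mailto URL from lists of addresses and an optional subject."""
    recipients = ",".join(quote(addr, safe="@") for addr in to)
    fields = []
    if subject:
        fields.append(("subject", subject))
    if cc:
        fields.append(("cc", ",".join(cc)))
    if bcc:
        fields.append(("bcc", ",".join(bcc)))
    # quote_via=quote encodes spaces as %20; "@" and "," are left readable.
    query = urlencode(fields, quote_via=quote, safe="@,")
    return "mailto:" + recipients + ("?" + query if query else "")

link = build_mailto(
    ["sales@example.com", "support@example.com"],
    subject="Sales Report",
    cc=["manager@example.com"],
    bcc=["archive@example.com"],
)
print(link)
```

Whether the subject, cc, and bcc values are honored still depends on the browser and email client that handle the link.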
Usenet Newsgroups
Usenet news URLs have one of two forms:
news:name_of_newsgroup — This is used to read an entire newsgroup. If your browser supports Usenet news URLs (either directly or through a newsreader), it'll provide you with a list of available articles in that newsgroup.
news:message-id — This enables you to retrieve a specific news article. Each news article has a unique ID called a message ID, which usually is something like
To use a message ID in a URL, remove the angle brackets and begin the URL with news:
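The conversion from message ID to news URL is simple enough to sketch in Python. The helper name and the message ID below are hypothetical, invented for this example; real message IDs vary by news server.

```python
def news_url_from_message_id(message_id):
    """Turn a message ID such as <id@host> into a news: URL."""
    mid = message_id.strip()
    # Remove the surrounding angle brackets if both are present.
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return "news:" + mid

# Hypothetical message ID, used only for illustration.
print(news_url_from_message_id("<abc123@news.example.com>"))
```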
Both forms of URL assume that you're reading news from an NNTP server, and they can be used only if you have defined an NNTP server somewhere in an environment variable or preferences file for your browser. Therefore, news URLs are most useful simply for reading specific news articles locally, not necessarily for using in links in pages.
NOTE
News URLs, like mail URLs, might not be supported by all browsers.
File
File URLs are intended to reference files contained on the local disk. In other words, they refer to files located on the same system as the browser. For local files, file URLs take one of two forms: one with an empty hostname (three slashes rather than two) or one with the hostname given as localhost. Depending on your browser, one or the other will usually work.
File URLs are similar to FTP URLs. In fact, if the host part of a file URL is neither empty nor localhost, your browser will try to find the given file by using FTP. In that case, a file URL and an FTP URL with the same host name and path result in the same file being loaded the same way.
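The two local forms, and the case where a file URL names some other host, can be told apart with a short Python sketch. The helper is_local_file_url, the paths, and the host name are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit

def is_local_file_url(url):
    """Return True for file URLs whose host is empty or localhost."""
    parts = urlsplit(url)
    # hostname is None when the URL has an empty host (three slashes).
    return parts.scheme == "file" and parts.hostname in (None, "localhost")

print(is_local_file_url("file:///dir1/dir2/file.html"))           # empty hostname
print(is_local_file_url("file://localhost/dir1/dir2/file.html"))  # localhost
print(is_local_file_url("file://files.example.com/dir1/file.html"))  # another host
```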
Probably the best use of file URLs is in startup pages for your browser (which are also called home pages). In this instance, because you'll almost always be referring to a local file, using a file URL makes sense.
The problem with file URLs is that they reference local files, where local means on the same system as the browser pointing to the file — not the same system from which the page was retrieved!
If you use file URLs as links in your page, and someone from elsewhere on the Internet encounters your page and tries to follow those links, that person's browser will attempt to find the file on their local disk (and generally will fail).
Also, because file URLs use the absolute pathname to the file, if you use file URLs in your page you can't move that page elsewhere on the system or to any other system.
If your intention is to refer to files that are on the same file system or directory as the current page, use relative pathnames rather than file URLs.
With relative pathnames for local files and other URLs for remote files, you shouldn't need to use a file URL at all.
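The advantage of relative pathnames can be seen by resolving the same relative link against two different page locations, as in this Python sketch. Both base URLs and the file names are hypothetical.

```python
from urllib.parse import urljoin

# The same relative link, as it would appear in a page.
link = "products/list.html"

# Hypothetical location of the page before and after it is moved.
original_page = "http://www.example.com/mycompany/index.html"
moved_page = "http://mirror.example.org/site/mycompany/index.html"

print(urljoin(original_page, link))
print(urljoin(moved_page, link))

# A link that climbs one directory with ".." also keeps working.
print(urljoin(original_page, "../about.html"))
```

Because the link is resolved against wherever the page currently lives, it keeps pointing at the right file after the move, which an absolute file URL would not.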