DIGG this crazy URL
Filed: Sun, Dec 03 2006 under Programming|| Tags: .htaccess url urls digg
Although DIGG didn't start the practice of using "clean URLs" to access dynamic documents, they certainly pushed it to the forefront of public attention, and most people think of DIGG style headlines when anyone mentions using an article's headline as part of the document URL. Simply put, URLs have evolved from http://www.hunlock.com/showdoc.php?8372893aed to http://www.hunlock.com/blogs/DIGG_this_crazy_URL
These URLs are friendly to search engines and to users passing your URL along to others, and you can get them with a simple modification of your server's .htaccess file and a small server-side program to fetch the requested file.
To understand how "DIGG style URLs" work, you have to understand what's going on when the web server processes a URL. Let's look at the URL for this article.
We all know that http://www.hunlock.com/ refers to the server. /blogs/ appears to be the directory that "DIGG_this_crazy_URL" is stored in. I say appears because in reality blogs is not a directory. blogs is a document, just like any .htm file and just like any PHP file. That's right: in the root web directory of hunlock.com is a file called 'blogs', and it's not a directory.
The web server, on getting the request for this file, checked the root directory and found a file called blogs. Since it was not a directory, the server tried to figure out what to do with the file. Since there's no extension, the out-of-the-box server would just throw up an error message. It's like trying to play an mp3 file without an mp3 extension.
So the first step in setting up digg style urls is to let the web server know what to do with these files. This is very easy to do. Go into your web server's root directory and edit (or create) a file called .htaccess and add the following lines.
<FilesMatch "^blogs$">
    ForceType application/x-httpd-php
</FilesMatch>
You can change "blogs" to whatever you want, but in this example we tell the web server that if it stumbles across a file named "blogs", it should treat that file like a PHP file. That is, even though blogs isn't named blogs.php, the web server will behave exactly as if it were.
Now when the web server sees:
http://www.hunlock.com/blogs/DIGG_this_crazy_URL
It's going to see blogs as a file and, ignoring everything after blogs, will go ahead and process blogs as a PHP file.
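To make the split concrete, here is a minimal sketch of how the request typically looks from inside the script. It assumes a common Apache with mod_php setup, where the part of the URL after the script shows up in PATH_INFO; other server configurations may not fill that value in. The describe_request() helper and the simulated array are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: summarises how the server split the request.
// Assumes SCRIPT_NAME and PATH_INFO are populated, which depends on
// the server configuration.
function describe_request(array $server): array
{
    return [
        'script'    => $server['SCRIPT_NAME'] ?? '',
        'path_info' => $server['PATH_INFO'] ?? '',
    ];
}

// Simulated values for http://www.hunlock.com/blogs/DIGG_this_crazy_URL
$example = [
    'SCRIPT_NAME' => '/blogs',
    'PATH_INFO'   => '/DIGG_this_crazy_URL',
];

print_r(describe_request($example));
```

In a live script you would pass $_SERVER instead of the simulated array.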
Which leads us to blogs itself. Because blogs doesn't have a file extension, you may have some trouble with your HTML editors. HTML-Kit will let you deal with the file if you right click on it and tell it to import as text. PS-PAD, which I normally prefer, has the same option but always throws an application error. A solution is to name the file blogs.php until you're ready to test it and then rename it to just 'blogs'.
blogs should look something like this...
<?php
header("Content-Type: text/html; charset=UTF-8");
$uri = $_SERVER["REQUEST_URI"];   // $_SERVER replaces $HTTP_SERVER_VARS, which current PHP no longer provides
$fields = explode("/", $uri);
$article_name = $fields[count($fields) - 1];
echo "User requested: " . htmlspecialchars($article_name);   // escape user input before output
?>
The first statement (header) just sends back a standard type/charset header to the browser so whatever we output is treated as a web page.
The second statement gets the URI that was used to call blogs. REQUEST_URI holds only the path portion of the URL (plus any query string), not the scheme or host name, so for this article it is "/blogs/DIGG_this_crazy_URL". The third statement "explodes" everything between the slashes into an array called "$fields", so $fields[0] would be an empty string, $fields[1] would be "blogs" and $fields[2] would be "DIGG_this_crazy_URL".
Finally we get the article name from the URL by counting the elements in $fields, subtracting one to get the index of the last element, and pulling that value from the exploded URL.
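The explode approach trips over a few common cases: a query string tacked onto the URL, a trailing slash, or percent-encoded characters. Here is a hedged sketch of a slightly more defensive version. The function name extract_article_name() is made up for this example, and it is one possible approach rather than the way this site does it.

```php
<?php
// Hypothetical helper, not part of the original script.
function extract_article_name(string $requestUri): string
{
    // Keep only the path; a query string such as ?page=2 is dropped.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Remove a trailing slash so /blogs/Some_Title/ still works.
    $path = rtrim($path, '/');
    $fields = explode('/', $path);
    // The last element is the article name; decode %20 and friends.
    return rawurldecode($fields[count($fields) - 1]);
}

echo extract_article_name('/blogs/DIGG_this_crazy_URL?page=2');
// prints: DIGG_this_crazy_URL
```

Note that a request for /blogs/ with nothing after it returns "blogs" here, so you would still want to treat that case as "no article requested".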
Now that you have the article name, you can do whatever you want with it, from querying a database for the information to transforming the article name into a file name and just doing a simple file dump. The possibilities are endless and completely up to the limits of your imagination.
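As one example of the "simple file dump" route, here is a minimal sketch that maps the article name to a file and sends it back. The article_path() helper, the articles directory and the .txt extension are all assumptions made up for illustration. The important part is the validation: the name comes straight from the URL, so it should be checked before it gets anywhere near the file system.

```php
<?php
// Hypothetical helper: maps an article name to a file inside $dir.
// The .txt extension and directory layout are assumptions.
function article_path(string $name, string $dir): ?string
{
    // Allow only letters, digits, underscores and hyphens, so names
    // such as "../../etc/passwd" are rejected outright.
    if (!preg_match('/\A[A-Za-z0-9_-]+\z/', $name)) {
        return null;
    }
    $path = $dir . '/' . $name . '.txt';
    return is_file($path) ? $path : null;
}

$path = article_path('DIGG_this_crazy_URL', __DIR__ . '/articles');
if ($path === null) {
    // Unknown or invalid name: answer with a 404 instead of a blank page.
    header('HTTP/1.0 404 Not Found');
    echo 'Article not found';
} else {
    readfile($path);
}
```

The same whitelist check is worth keeping if you go the database route instead, alongside a parameterized query.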