Supercharged Javascript

Filed: Sun, Feb 04 2007 under Programming|| Tags: .htaccess php compress concatenate javascript

As frameworks like Prototype, jQuery, and YUI become more and more popular, external javascript loads grow dramatically -- dragging down the performance of your page. Compressing the javascript statically (covered in Compressed HTML Makes Your Pages Zippy) is one option to regain page-loading efficiency; however, if you change one file you'll need to make a new compressed copy, and that's quite a bit of maintenance work.

Pre-compression also doesn't solve the multiple-connection issue: for each external javascript library your page uses, the browser has to open a new HTTP connection, and current best practice says you should limit, as much as possible, the number of connections your page initiates on load.

Introduction

If you have access to an Apache web server that can run PHP scripts, there is a simple, elegant solution that will let you turn this...

<!-- Namespace source file -->  
<script src = "yahoo.js" ></script> 
 
<!-- Dependency source files -->  
<script src = "dom.js" ></script> 
<script src = "event.js" ></script> 
<script src = "dragdrop.js" ></script> 
 
<!-- Slider source file -->  
<script src = "slider.js" ></script>    

Into this...

<script src = "scripts/yahoo.js, dom.js, event.js, dragdrop.js, slider.js"></script>

All of the requested files will be dynamically concatenated on the server, compressed, and sent back to the browser as one single file (and thus over one single HTTP connection). The script uses the last-modified date of the most recently modified file in the list for cache control, so the browser's cache still works after the collection has been loaded. This means that if you ask for the same list of files later and you haven't modified any of them on the server, the browser will use its cached version.

This also means that if you modify any of your javascript source files, then the next time those files are requested as part of a collection, the script will automatically build a new concatenated, compressed copy of that collection, and your visitors will download it instead of using the stale copy in their browser's cache.
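
To make the caching behavior concrete, here is roughly what the exchange looks like (the hash and dates below are made-up examples; the real values come from your file names and their modification times):

GET /scripts/file1.js,file2.js HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/javascript; charset=UTF-8
Cache-Control: must-revalidate
ETag: "4410ec34d9e6c1a68100ca0ce033fb17-1170547200"
Last-Modified: Sun, 04 Feb 2007 00:00:00 GMT
Content-Encoding: gzip

...and on a later visit, when nothing has changed on the server...

GET /scripts/file1.js,file2.js HTTP/1.1
If-None-Match: "4410ec34d9e6c1a68100ca0ce033fb17-1170547200"
If-Modified-Since: Sun, 04 Feb 2007 00:00:00 GMT

HTTP/1.1 304 Not Modified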

Upload it and forget it -- the script will handle everything for you automatically.

Modify .htaccess

To begin, you'll need to modify your .htaccess file (if you don't have one, make sure you're running the Apache web server; if you are, go ahead and create one). Simply add the following block to the file.

<FilesMatch "^scripts$">
   ForceType application/x-httpd-php 
</FilesMatch>

What this does is tell the web server that if it comes across a file named "scripts" it should treat it like a PHP program. That is, "scripts" will run as a PHP program without needing to be named "scripts.php". Once you've installed this, a URL like http://myserver.com/scripts/somescript.js will LOOK like scripts is a directory, but in reality the web server sees scripts as a program: it ignores everything in the URL after scripts and simply runs it. Inside the scripts program we can then take the full URL and break it apart ourselves.
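
As a quick illustration (the URL below is just an example), here is a minimal sketch of what scripts sees and how the file list falls out of it; the full program later in this article does the same thing with a little more care (trimming, url-decoding, and filtering):

<?php
   // Browser asked for: http://myserver.com/scripts/yahoo.js,dom.js,event.js
   // Apache ran the file "scripts" and left the rest of the URL intact.
   echo $_SERVER['REQUEST_URI'];          // prints "/scripts/yahoo.js,dom.js,event.js"

   // Grab everything after the last "/" and split it on "," to get the list.
   $fileList  = basename($_SERVER['REQUEST_URI']);
   $fileNames = explode(',', $fileList);  // array("yahoo.js", "dom.js", "event.js")
?>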

This is an extremely useful tool. I use variations of this trick to do Clean URLs and to do basic Hotlink Protection with cookies in addition to the referring URL.
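
For instance (this is just an illustration of the same ForceType trick, not the full clean-URL setup), a plain file named article could answer URLs like http://myserver.com/article/my-post-title with a rule such as:

<FilesMatch "^article$">
   ForceType application/x-httpd-php
</FilesMatch>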

Create scripts

The next step is to create your scripts file. Some editors with built-in FTP clients, like HTML-Kit and PSPad, have trouble working with server files that have no extension; if that's the case, simply create a file named scripts.php and, when you're ready, rename it to plain old scripts. Here is the program, nicely commented so you can follow along and modify it as you need.

<?php
   // This script has been placed in the public domain.
   // Use it and modify it however you wish.

   // Disable zlib compression, if present, for duration of this script.  
   // So we don't double gzip 
   ini_set("zlib.output_compression", "Off");

   //Set the content type header
   header("Content-Type: text/javascript; charset=UTF-8"); 

   // Set the cache control header
   // http 1.1 browsers MUST revalidate -- always
   header("Cache-Control: must-revalidate");     

   // Here we're going to extract the filename list.
   // We just split the original URL with "/" as the pivot
   $expl = explode("/",$_SERVER["REQUEST_URI"]);
   // Do a little trimming and url decoding to change %20 into spaces.
   $fileList = trim(urldecode($expl[count($expl)-1]));
   // And explode the remainder out with "," as the pivot to get our list.
   $orgFileNames = explode(",",$fileList);
   
   // $orgFileNames is now an array of the requested file names.

   // Go through each of the files and get its last modified time so we
   // can send a last-modified header so caching works properly
   $newestFile = 0;
   $ii=0;
   $longFilename = ''; // This is generated for the Hash
   $fileNames = Array();
   for ($i=0; ($i < count($orgFileNames)); $i++) {
      $orgFileNames[$i] = trim($orgFileNames[$i]);  // Get rid of whitespace
      if (preg_match('/\.js$/i',$orgFileNames[$i])) { // Allow only files ending in .js in the list.
         $fileNames[$ii++]=$orgFileNames[$i];         // Valid file name, so go ahead and use it.
         $longFilename .= $orgFileNames[$i];          // Build our LONG file name for the hash.
         $lastMod = @filemtime($orgFileNames[$i]);    // Get file last modified time
         if ($lastMod > $newestFile) {                // Is this the newest file?
            $newestFile = $lastMod;                   // Yup, so mark it.
         }
      } 
   }

/////////////////////////////////////////////////////////////////////////////
// Begin *BROWSER* Cache Control

   // Here we check to see if the browser is doing a cache check
   // First we'll do an etag check which is to see if we've already stored
   // the hash of the filename . '-' . $newestFile.  If we find it
   // nothing has changed so let the browser know and then die.  If we
   // don't find it (or it's a mismatch) something has changed so force
   // the browser to ignore the cache.

   $fileHash = md5($longFilename);       // This generates a key from the collective file names
   $hash = $fileHash . '-'.$newestFile;  // This appends the newest file date to the key.
   $headers = getallheaders();           // Get all the headers the browser sent us.
   if (isset($headers['If-None-Match']) && strpos($headers['If-None-Match'], $hash) !== false) {   // Look for a hash match
      // Our hash+filetime was matched with the browser etag value so nothing
      // has changed.  Just send the last modified date and a 304 (nothing changed) 
      // header and exit.
      header('Last-Modified: '.gmdate('D, d M Y H:i:s', $newestFile).' GMT', true, 304);
      die();
   }

   // We're still alive so save the hash+latest modified time in the e-tag.
   header("ETag: \"{$hash}\"");

   // For an additional layer of protection we'll see if the browser
   // sent us a last-modified date and compare that with $newestFile
   // If there's no change we'll send a cache control header and die.

   if (isset($headers['If-Modified-Since'])) {
      if ($newestFile <= strtotime($headers['If-Modified-Since'])) {
         // No change so send a 304 header and terminate
          header('Last-Modified: '.gmdate('D, d M Y H:i:s', $newestFile).' GMT', true, 304);
          die();
       }
   }

   // Set the last modified date as the date of the NEWEST file in the list.
   header('Last-Modified: '.gmdate('D, d M Y H:i:s', $newestFile).' GMT');

// End *BROWSER* Cache Control
/////////////////////////////////////////////////////////////////////////////


/////////////////////////////////////////////////////////////////////////////
// Begin File System Cache Control

   // Attempt to open a cache file for this set.  (This is the server file-system
   // cache, not the browser cache.  From here on out we're done with the browser
   // cache.)
   $fp = @fopen("cache/$fileHash.txt","r");
   if ($fp) {
      // A cache file exists, but if any source file is newer than the cache file,
      // close it and clear the pointer so we rebuild the cache below.
      if ($newestFile>@filemtime("cache/$fileHash.txt")) { fclose($fp); $fp=false;}
   }
   if (!$fp) {
      // No file pointer exists so we create the cache files for this set.
      // for each filename in $fileNames, put the contents into $buffer
      // with two blank lines between each file.
      $buffer='';
      for ($i=0; ($i < count($fileNames)); $i++) {
         $buffer .= @file_get_contents($fileNames[$i]) . "\n\n";
      }
      // We've created our concatenated file so first we'll save it as
      // plain text for non gzip enabled browsers.
      $fp = @fopen("cache/$fileHash.txt","w");
      @fwrite($fp,$buffer);
      @fclose($fp);
      // Now we'll compress the file (maximum compression) and save
      // the compressed version.
      $fp = @fopen("cache/$fileHash.gz","w");
      $buffer = gzencode($buffer, 9, FORCE_GZIP);
      @fwrite($fp,$buffer);
      @fclose($fp);
   }

// End File System Cache Control
/////////////////////////////////////////////////////////////////////////////


/////////////////////////////////////////////////////////////////////////////
// Begin Output 

   if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strstr($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
      // Browser can handle gzip data so send it the gzip version.
      header ("Content-Encoding: gzip");
      header ("Content-Length: " . filesize("cache/$fileHash.gz"));
      readfile("cache/$fileHash.gz");
   } else {
      // Browser can't handle gzip so send it plain text version.
      header ("Content-Length: " . filesize("cache/$fileHash.txt"));
      readfile("cache/$fileHash.txt");
   }

// End Output -- End Program
/////////////////////////////////////////////////////////////////////////////

?>

Create a "cache" directory

Create a sub-directory named "cache" in the directory where scripts resides. scripts will store cached versions of your concatenated and concatenated+compressed collections in this directory, so make sure your web server has permission to write files there (a quick way to check is sketched at the end of this section). scripts writes two files for each file-name set it encounters. The file name is the md5 hash of all the file names, followed by either .txt (the plain-text version for browsers which can't handle compressed files) or .gz (the compressed output for browsers which can). They'll look something like this in your cache directory:

cache/4410ec34d9e6c1a68100ca0ce033fb17.gz
cache/4410ec34d9e6c1a68100ca0ce033fb17.txt

If you change or modify any of the javascript source files, new cache files will be generated automatically the next time that collection is requested.
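
If you want a quick way to confirm the web server can actually write there, a tiny throw-away check like this will do (a hypothetical helper, not part of scripts itself; run it once from the directory where scripts lives and then delete it):

<?php
   // Reports whether the web server can create files in the cache directory.
   echo is_writable('cache')
      ? 'The cache directory is writable.'
      : 'The cache directory is NOT writable -- check its permissions.';
?>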

Using Your New Powers

That's pretty much all there is to it. Now anytime you specify:

<script src = "scripts/file1.js, file2.js, file3.js"></script>

…the files will automatically be concatenated and compressed on the server for you. Just remember that scripts is not a directory, so in this example you'll need to keep all your javascript files in the same directory as scripts (the root directory), although it would be trivial to modify scripts so you could keep them in another sub-directory.
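
For example, if you would rather keep your source files in a sub-directory (the name "js/" below is just an example), the only change needed is to prefix that directory onto each validated file name in the loop inside scripts. A minimal sketch, shown here as a standalone snippet so you can see the shape of the change:

<?php
   // Assume the file list has already been split out of the URL, e.g.:
   $orgFileNames = explode(",", "file1.js, file2.js, file3.js");

   $scriptDir = 'js/';   // hypothetical sub-directory holding your .js files
   $newestFile = 0;
   $ii = 0;
   $longFilename = '';
   $fileNames = Array();
   for ($i=0; ($i < count($orgFileNames)); $i++) {
      $orgFileNames[$i] = trim($orgFileNames[$i]);
      if (preg_match('/\.js$/i',$orgFileNames[$i])) {
         $fileNames[$ii++] = $scriptDir . $orgFileNames[$i]; // e.g. "js/file1.js"
         $longFilename    .= $orgFileNames[$i];
         $lastMod = @filemtime($scriptDir . $orgFileNames[$i]);
         if ($lastMod > $newestFile) { $newestFile = $lastMod; }
      }
   }
?>

The rest of the script would then read the contents with file_get_contents($fileNames[$i]) exactly as before.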

The only thing you need to be careful about is keeping your file lists in the same order from page to page. If you have one page with …

<script src = "scripts/file1.js, file2.js, file3.js"></script>

… and a second page with …

<script src = "scripts/file3.js, file2.js, file1.js"></script>

then the server and browser are going to treat them as two separate files even though they contain exactly the same content. Just do a little alphabetical sorting in your lists and keep the spacing consistent (always use one space, or always use no spaces, but never mix and match). If you keep the file lists consistent, the browser's cache will work properly, and that's one of the most important considerations in keeping your web pages zippy.
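
The reason is that the cache key in scripts is an md5 of the concatenated file names (and the browser keys its own cache on the literal URL string), so reordering the list produces a completely different key. A quick way to see it:

<?php
   // Same files, different order: two different hashes, so two cache entries.
   echo md5('file1.jsfile2.jsfile3.js') . "\n";
   echo md5('file3.jsfile2.jsfile1.js') . "\n";
?>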


Addendum

Based on user feedback, the following changes were made to scripts after this article was first published. Thanks to everyone who sent in these suggestions for making the script more powerful and robust.

  1. The zlib compression library is disabled during the execution of the script to avoid double-compression conflicts.
  2. The "If-None-Match" handling was changed to send a generic 304 response instead of an HTTP 1.1-specific reply.
  3. The output-buffering (ob_*) calls were removed in favor of cached gzip and plain-text versions of the files.