Recursion -- See Recursion

Filed: Mon, Jan 29 2007 under Programming|| Tags: methodology quicksort sort recurssion xml javascript

Recursion is one of those programming concepts that separate "REAL" programmers from script kiddies. "Real" programmers can do their taxes with regular expressions, write a full blown operating system in native assembly (the REAL REAL programmers do it in binary), and real programmers grok recursion. Recursion isn't exclusive to the high gurus of programming however -- it's actually a very simple concept to use and understand.

Introduction

Recursion is the process where a function will call itself. It's that simple. Unfortunately most textbooks and even the infamous Wikipedia tend to make this a lot more complicated that it really is.

Recursion, in mathematics and computer science, is a method of defining functions in which the function being defined is applied within its own definition. The term is also used more generally to describe a process of repeating objects in a self-similar way. For instance, when the surfaces of two mirrors are almost parallel with each other the nested images that occur are a form of recursion. Wikipedia

I've been using recursion ever since I wrote my first Mandelbrot fractal program in my teens and I have a tough time pulling useful information from that Wikipedia definition. It really, really is not that difficult.

Recursion to travel a Linked List

Lets say we have a data structure with some data in it and a pointer to another element with the same exact structure, just different data in it.

+----------+   +----------+   +----------+
| data: 15 |-->| data: 10 |-->| data: 20 | 
+----------+   +----------+   +----------+

Specifically, in javascript, this data structure (known as a linked list)

var nodeList= {};              // Declare an empty object
nodeList.value=15; 
nodeList.next=null;

newNode= {};                   // Create a new node
newNode.value=10;
newNode.next=null;

nodeList.next=newNode;         // Insert newNode into the nodeList

finalNode= {};                // Create a new node
finalNode.value=20;
finalNode.next=null;

nodeList.next.next=finalNode; // Insert finalNode into the nodeList

Now lets say we create a function...

function processNode(node) {
   if (node.next != null) {
      processNode(node.next)
   }
   outputStr+= node.value+'<br>';
}

A node is a single cell of a data set, a single record if you will. Given the above data structure we pass processNode the top record of our data set in this case the one with a value of 15.

processNode(nodeList);

Now this node (value 15) has a pointer to another node (value 10) and since processNode already knows how to process nodes of this data-set, we just go ahead and call processNode again only this time passing it the reference to the node with the data "10" in it, and then subsequently we pass the final node with a value of 20 to processNode.

Even though processNode was FIRST called with the node with a value of 15, our outputStr which we are building will have as its first value the value 20. This is because we called processNode again and then again before we added 20 to the build string. You can see this as the process is broken down step by step.

processNode(node with data=15)
processNode(node with data=10)
processNode(node with data=20)
outputStr+=20
return to calling process (processNode with data=10)
outputStr+=10
return to calling process (processNode with data=15)
outputStr+=15
End processNode function
outputStr's value = 20, 10, 15

If we'd like to reverse the output of the display all we need to do is to change the placement of where we build our outputStr. This simple change -- building the outputStr before we check for other "nodes" will reverse the order of the output.

function processNode(node) {
   outputStr+= node.value+'<br>';
   if (node.next != null) {
      processNode(node.next)
   }
}

The new step-by-step process looks like this...

processNode(node with data=15)
outputStr+=15
processNode(node with data=10)
outputStr+=10
processNode(node with data=20)
outputStr+=20
return to calling process (processNode with data=10)
return to calling process (processNode with data=15)
End processNode function
OutputStr's value = 15, 10, 20

The key to understanding what is going on in recurssion is to understand that processNode knows how to deal with the individual records in our data structure. Since it knows how to deal with one record it knows how to deal with all of them and so instead of writing many different instances of processNode we just have it call itself, passing along another record.

The key to using recursion is to understand you'd use it just like you use any other function. If it helps, think of the function as having an exact external copy of itself. You're coding away, doing your own thing and then you get to a point where you'd call a function in this case the function just happens to be itself. Procedurally, there's no difference in calling the same function and calling an external function which just happens to do the same thing.

Using Recursion to Solve Factorials

Factorials are a classical means to illustrate recursion because the factorial of a number is the value of multiplying everything before the number. So the factorial of 5 is 1*2*3*4*5 and the factorial of 10 is 1*2*3*4*5*6*7*8*9*10. Now because of the overhead of calling functions, recursion isn't the most efficient way to generate a factorial, it is, however, the most efficient way to generate a tutorial ;)

This code is in javascript since it will be used to generate an interactive experience after the break.

function factorial(n) {
   if(n <= 1) {
      return 1;
   } else {
      return n * factorial(n - 1);
   }
}

As you can see this is very simple stuff! Just pass factorial a number, say 5 and it will compute the value of 5*4*3*2*1. if (n <= 1) is our break point (officially this is known as a base case -- the point at which the recursive calls stop and the function starts to "unwind" so to speak), if we've reached 1 then the function simply returns 1 without calling itself again thus starting the "unwinding" process.

To see what's going on with this function, click here to start the interactive demo and enter a number between 1 and 20.

Using Recursion to Crawl XML

Any data type which is formatted into tree-like structures are a candidate for recursion, either from collapsible message boards (email and Usenet threads are a great example of this) or processing XML data. XML especially is a great candidate for recursion. Here's a spot of code which will traverse any xml document you give it from xml data to an rss feed -- it will even traverse most HTML if you pass document -- crawlXML(document).

function crawlXML(doc) {                             // Crawls an XML document
   if(doc.hasChildNodes()) {                         // If present element has children
      _xmlStr+='<ul><li>'+doc.tagName+'> ';       // Display current tag name
      for(var i=0; i<doc.childNodes.length; i++) {   // for each child node on current level
         crawlXML(doc.childNodes[i]);                // Call this function recursively
      }                                              // end for loop
      _xmlStr+='<\/li><\/ul>';                         // Close the list item.
   } else {                                          // current element has no children
      _xmlStr+=doc.nodeValue;                        // So display the value of the data
   }                                                 // End childNode check
}                                                    // End crawlXML

As you can see, if the current "node" has child elements then crawlXML calls itself to deal with the child elements, otherwise it outputs the value of the current node.

Using Recursion to Sort (quicksort)

Recursion is also the cornerstone of the "quicksort" algorithm.

sortArray = Array(5,8,9,10,38,81,24,83,22,21,1,4,63,73,42,88);

function qSort(sortArray, left, right)
{
  var scratch_left = left;         // remember current left
  var scratch_right = right;       // remember current right 
  var pivot = sortArray[left];   // remember current value
  while (left < right) {  
    while ((sortArray[right] >= pivot) && (left < right))   {
       right--;
    }
    if (left != right)  {
      sortArray[left] = sortArray[right];
      left++;
    }
    while ((sortArray[left] <= pivot) && (left < right)) {
      left++;
    }
    if (left != right) {
      sortArray[right] = sortArray[left];
      right--;
    }
  }
  sortArray[left] = pivot;
  pivot = left;
  left = scratch_left;
  right = scratch_right;

  if (left < pivot)   { qSort(sortArray, left, pivot-1);  }
  if (right > pivot)  { qSort(sortArray, pivot+1, right); }
}

qSort(sortArray, 0, sortArray.length-1);

To get a visual idea of how this sort is working, here is a wikipedia image of the sorting process.

If you'd like a very in-depth view of quicksort and how it works the Wikipedia entry is a great place to start.

Javascript Recursion in the Browser

In the browser, javascript will limit recursion to 1,000 iterations -- that is a function may only call itself 1,000 times before it needs to start "unwinding". This is to prevent infinite loops (IE a process where a node points to itself or two nodes point to each other). 1,000 is actually a fairly low number and as a result will limit the size of data files you can work with inside the browser. Fortunately most other languages will allow you to create as much recurssion as you need to get the job done.

Recursive Imagry

recursive image Remember that wikipedia quote up at the top of the page about the two parallel mirrors? Well there are better ways to demonstrate recursion in imagry. First, unfortunately, there is this, rather scary actually, image of David Hasselhoff.

The second example is the more benign, and certainly safer image to the right. This illustrates what's known as a "Droste" effect.

The Droste effect is a Dutch term for a specific kind of recursive picture. A picture exhibiting the Droste effect depicts a smaller version of itself in a place where a similar picture would realistically be expected to appear. This smaller version then depicts an even smaller version of itself in the same place, and so on. Technically this can go on forever, but practically it continues as long as the resolution of the picture allows, which is relatively short since each iteration exponentially reduces the picture's size.

The term Droste effect was coined by the poet and columnist Nico Scheepmaker, at end of the 1970s. It is named after Droste, a Dutch brand of hot chocolate, whose box has a picture of a nurse carrying a serving tray with a cup and a box of the same brand of hot chocolate. --Wikipedia

And the Droste effect is of course another example of recursion.

Tail Recursion

Tail recursion is a condition where you should strongly question the need to use recursion. Specifically it's when the recursive call happens at the very END of the function and nothing occurs afterwards. If this is the case, in languages which do not do this for you automatically -- which are most all of them -- you should consider changing the recursive function into a loop. The reason for this is that function calls are more inefficient and use more resources than loops so the ability to recognize tail recursion and code around it is an important consideration in designing your programs.

The example factorial function above is an example of tail recursion because the recursive call is the last command inside the function. So the most efficient way to express this function (when you are not demonstrating recursion of course) is as follows...

// recursive
function factorial(n) {
   if(n <= 1) {
      return 1;
   } else {
      return n * factorial(n - 1);
   }
}

// iterative
function factorial(n) {
  var result=1;
  for (var i=1; i <= n; i++) {
   result = result * i;
  }
  return result;
}

On the otherhand, even though quicksort does its recursive calls at the end, there are two seperate calls as such:

  if (left < pivot)   { qSort(sortArray, left, pivot-1);  }
  if (right > pivot)  { qSort(sortArray, pivot+1, right); }

Since "(right > pivot)" comes after (left < pivot) and there is a possibility both might be executed this is not tail recursion because of the possibility that "(right > pivot)" might need to execute after "(left < pivot)".

Likewise the XML crawler function is not an example of tail recursion because its recursive call happens in the middle of the function.

Conclusion

Recursion actually isn't very common in day to day programming. I often think that XML became so buzz-worthy because all these programmers with textbook college educations were so shocked and astonished to discover something they sweated bullets to master could actually be used in the real world. However in situations where recursion can be used it is definitely one of the more valuable tools in the programmer's arsenal, and now, hopefully one of your tools as well.