Saturday, January 3, 2009

Summing a Sequence

In yesterday's post about Karl Gauss, I recounted an anecdote about how Gauss, at the age of 10, found a quick way to sum the numbers from 1 to 100. (It's the second paragraph. I'll wait.) Today I want to compare the solution that Gauss came up with to a seemingly more direct solution to the same problem.

Young Gauss' classmates probably added each number in the sequence to a running total. A programmer might solve the problem in much the same way using a loop.

int total = 0;
for(int n = 1; n <= 100; ++n) {
total += n;
}

This solution works well for small values of n if you're using a computer, but if you're a young child with a slate and some chalk, young Gauss' solution is clearly superior. Even using a computer this solution won't be very efficient if the value of n gets large enough (what is the sum of the first googol integers, for example), since the time complexity of this solution is linear to the size of the input, or O(n), in more formal terms.

In the sequence sum problem, the "n" in O(n) represents the number of integers the students had to sum. In more abstract terms, we say that n is the size of the input to the function. Another way of looking at it is that the naive algorithm will always take more time to solve when given a larger input, because the number of integers you have to add together increases.

Gauss hit upon a much better algorithm for solving this problem. Many of you probably recognize the triangle numbers from a high-school or college algebra class you took:

1 = 1
1 + 2 = 3
1 + 2 + 3 = 6
1 + 2 + 3 + 4 =10
1 + 2 + 3 + 4 + 5 = 15
1 + 2 + 3 + 4 + 5 + 6 = 21
1 + 2 + 3 + 4 + 5 + 6 + 7 = 28
...

These are exactly the numbers you're calculating when you add up all the integers from 1 to some number. You might also remember that there's a simple formula for calculating the nth triangle number.

$T_n = n(n + 1) / 2$

Testing the formula out on a case we already know, we get

$T_{100} = 100 \times 101 / 2 = 5050$

Like magic it works!

Actually, it's not magic at all. I haven't really proven anything by showing that the formula works for one value, but you can read the proof by induction to see why it will work for all natural numbers.

This is exactly the same formula that Gauss had hit upon. Remember he had

$50 \times 101 = 5050$

which, for n = 100 is nothing more than our triangle number formula, slightly rearranged.

$(n / 2)(n + 1) = T_n$

The time complexity of calculating T(n) using the formula is constant, or O(1). No matter how big a number you insert for n, you still have to do one addition (an increment, really), one division, and one multiplication. The amount of time it takes to calculate $T_{10}$ is the same as the amount of time it takes to calculate $T_{10,000,000,000}$.

So what can we learn from all of this? The first thing you should learn (if you're a programmer) is that loops are always suspect. I'm not saying anything outrageous here, like "all loops are bad," but I will say that if you have a simple loop in your code that does something very predictable with a numerical sequence, there might be a way to replace it with a much simpler formula. Second, it's surprising how much you can learn from a 10 year old kid. Of course, it helps an awful lot if that 10 year old is Karl Friedrich Gauss.

Note: I'd like to thank Tim Kington for proofreading an initial draft of this post. Any technical errors are all his fault.

2 comments:

Anonymous said...

Excellent post! I always like things like this. One problem I always see people do are searching for certain characters within a string, involving one loop to loop through the string, and another loop to go through the alphabet, and then count (O n*n), instead of building a hash table of values and comparing that to the character you are looking for.

Bill the Lizard said...

I had a really similar interview question once. The interviewer asked, "If I have two lists of names, A and B, and I want to find all the names in list A that also appear in list B, what do I do?" I think you know the right answer. :)