Sunday, March 24, 2013

SICP 2.63 - 2.65: Sets as binary trees

From SICP section 2.3.3 Example: Representing Sets

So far in this section we've looked at two ways of representing sets. First as unordered lists, then as ordered lists. Now we'll look at how we can represent sets as binary trees, and we'll see what advantages this representation has over ordered lists.

Each node of a tree holds one element of the set, called the "entry" at that node, and a link to each of two other (possibly empty) nodes. The "left" link points to elements smaller than the one at the node, and the "right" link points to elements greater than the one at the node. The same set may be represented by a number of different trees. The only requirement for a valid representation is that all elements in the left subtree must be smaller than the node entry and all elements in the right subtree must be larger than the node entry. Figure 2.16 in the text shows several valid tree representations of the same set of values.

[Figure 2.16 from the text: the set {1, 3, 5, 7, 9, 11} represented by three different binary trees.]

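In code, a tree is represented as a list of three items: the entry, the left subtree, and the right subtree. The constructor and selectors below are the ones given in the text for this representation; the procedures in the exercises that follow assume them.

(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))

(define (make-tree entry left right)
  (list entry left right))
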
Recall that for an ordered list of n elements, we had to search through (on average) n/2 elements to locate a particular value, since we scan the elements in order. The advantage of the tree representation is that if the tree is balanced, we can cut this effort down to $O(\log n)$: each comparison with a node's entry lets us discard half of the remaining elements.
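
The element-of-set? procedure for this representation, given in the text, shows where the savings come from: at each node we compare the item with the entry and then descend into only one of the two subtrees.

(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (entry set)) true)
        ((< x (entry set))
         (element-of-set? x (left-branch set)))
        ((> x (entry set))
         (element-of-set? x (right-branch set)))))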


Exercise 2.63 asks us if the following two procedures produce the same results for every tree, and if not how they differ.

(define (tree->list-1 tree)
  (if (null? tree)
      '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))
     
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree)
                                          result-list)))))
  (copy-to-list tree '()))

The tree->list-1 procedure checks to see if the tree passed in is null, and if so returns an empty list. If the tree is not null, it builds a list by appending the flattened left branch, the entry at the current node, and the flattened right branch, where the left and right branches are flattened by recursive calls to tree->list-1.

The tree->list-2 procedure defines a helper function copy-to-list that takes the tree and a result-list as arguments. When the tree is null, it returns the result-list that was passed in. Otherwise it recursively copies the right branch onto the result list, conses the current entry onto that, and then recursively copies the left branch onto the front. These two procedures produce the same result for every tree.
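
To see how the pieces combine, here's a quick hand expansion of each procedure on a small three-node tree (my own illustration, not from the text):

(tree->list-1 '(2 (1 () ()) (3 () ())))
= (append '(1) (cons 2 '(3)))
= '(1 2 3)

(tree->list-2 '(2 (1 () ()) (3 () ())))
= (copy-to-list '(1 () ()) (cons 2 (copy-to-list '(3 () ()) '())))
= (copy-to-list '(1 () ()) '(2 3))
= '(1 2 3)

Both arrive at the same sorted list; the difference between them is in how much work each step does.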

We're asked to test the two procedures on the trees in figure 2.16.

(define tree1 '(7 (3 (1 () ()) (5 () ())) (9 () (11 () ()))))
(define tree2 '(3 (1 () ()) (7 (5 () ()) (9 () (11 () ())))))
(define tree3 '(5 (3 (1 () ()) ()) (9 (7 () ()) (11 () ()))))

> (tree->list-1 tree1)
'(1 3 5 7 9 11)
> (tree->list-2 tree1)
'(1 3 5 7 9 11)
> (tree->list-1 tree2)
'(1 3 5 7 9 11)
> (tree->list-2 tree2)
'(1 3 5 7 9 11)
> (tree->list-1 tree3)
'(1 3 5 7 9 11)
> (tree->list-2 tree3)
'(1 3 5 7 9 11)

We can see from these results that both procedures return an in-order traversal for every tree.

We're also asked whether the two procedures have the same order of growth for a balanced tree, and if not, which one grows more slowly.

We can see from the results above and by inspecting the two procedures that each node of the tree is visited exactly once by each algorithm. What happens at each of those n visits is subtly different, though. The second procedure simply calls cons at each visit, which is a constant-time operation, so the tree->list-2 procedure has a time complexity of $O(n)$. The first procedure calls append at each visit, which we saw in section 2.2.1 has the following definition:

(define (append list1 list2)
  (if (null? list1)
      list2
      (cons (car list1) (append (cdr list1) list2))))

From this definition we can see that the order of growth of append is proportional to the length of its first list argument. In the case of tree->list-1, the first argument is the flattened left branch, which contains about half of the tree's elements when the tree is balanced. So the call at the root appends a list of about n/2 elements, the calls one level down each append about n/4 elements, and so on. Each of the roughly $\log n$ levels of the tree contributes a total of about n/2 steps of appending, so the tree->list-1 procedure has a complexity of $O(n \log n)$ for a balanced tree.
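
Another way to see the difference is to write the running times as recurrences for a balanced tree of n nodes (my own notation, with $T_1$ for tree->list-1 and $T_2$ for tree->list-2):

$$T_2(n) = 2\,T_2(n/2) + O(1) = O(n)$$

$$T_1(n) = 2\,T_1(n/2) + O(n) = O(n \log n)$$

Each procedure makes two recursive calls on subtrees of about half the size; tree->list-2 adds only a constant-time cons per node, while tree->list-1 adds an append whose cost is linear in the size of the left subtree.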


Exercise 2.64 introduces the list->tree procedure, which converts an ordered list to a balanced binary tree using the helper procedure partial-tree. This helper takes as arguments an integer n and a list of at least n elements, and constructs a balanced tree containing the first n elements of the list.

(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(define (partial-tree elts n)
  (if (= n 0)
      (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts)
                                              right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

First we're asked to explain how partial-tree works, then draw the tree produced by list->tree for the list (1 3 5 7 9 11).

The partial-tree procedure works by dividing the list into three parts: a center element (which becomes the root node of the tree), the elements before the center element, and the elements after it. The elements before the center element are passed to a recursive call to partial-tree to create the left branch of the tree, and the elements after the center element are passed recursively to partial-tree to create the right branch. These recursive calls continue until no elements remain, at which point the balanced binary tree has been assembled. Each call returns a pair consisting of the constructed tree and the list elements it did not consume, which is how the procedure keeps track of where the right half of each sublist begins.
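
We can watch this in action by calling partial-tree directly (an illustration of my own, not part of the exercise). The top-level call in list->tree asks for a left subtree built from left-size = 2 elements, and gets back that subtree paired with the elements it didn't consume:

> (partial-tree '(1 3 5 7 9 11) 2)
'((1 () (3 () ())) 5 7 9 11)

The first element of this pair, (1 () (3 () ())), becomes the left branch; of the remaining elements, 5 becomes this-entry and (7 9 11) is handed to the recursive call that builds the right branch.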

The tree produced by list->tree for the list (1 3 5 7 9 11) is:

          5
        /   \
      1       9
        \    / \
         3  7   11

To verify this, we can simply call the procedure.

> (list->tree '(1 3 5 7 9 11))
'(5 (1 () (3 () ())) (9 (7 () ()) (11 () ())))

Next we're asked about the order of growth in the number of steps required by list->tree to convert a list of n elements. The procedure visits each element of the list only once, and it performs only a constant amount of work (a make-tree and a cons) for each element it visits, so the number of steps is proportional to the size of the list, or $O(n)$.
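
Stated as a recurrence (the same shape as the one given above for tree->list-2), building a tree from n elements means building two subtrees from about n/2 elements each, plus a constant amount of work to pick out the entry and assemble the node:

$$T(n) = 2\,T(n/2) + O(1) = O(n)$$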


Exercise 2.65 asks us to use the results of the previous two exercises to give $O(n)$ implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.

We implemented union-set for the unordered list representation of sets back in exercise 2.59. That implementation had to check all elements of one set for each element of the other, so its complexity was $O(n^2)$, quite poor. We improved on this in exercise 2.62 when we wrote an implementation of union-set for the ordered list representation of sets, which was $O(n)$. The text supplied a similar implementation of intersection-set that was also $O(n)$. We could use these ordered-list implementations as a guide to writing efficient implementations of union-set and intersection-set for balanced binary trees directly, but that wouldn't require the results of the previous two exercises. Instead, we can combine the $O(n)$ procedures we've built so far and perform the following steps:

  • Convert the balanced binary trees to ordered lists.
  • Perform the desired operation (union-set or intersection-set).
  • Convert the resulting ordered set back to a balanced binary tree.

(define (union-set tree1 tree2)
  (define (union-list set1 set2)
    (cond ((null? set1) set2)
          ((null? set2) set1)
          ((= (car set1) (car set2))
           (cons (car set1) (union-list (cdr set1) (cdr set2))))
          ((< (car set1) (car set2))
           (cons (car set1) (union-list (cdr set1) set2)))
          (else (cons (car set2) (union-list set1 (cdr set2))))))
  (list->tree (union-list (tree->list-2 tree1)
                          (tree->list-2 tree2))))

(define (intersection-set tree1 tree2)
  (define (intersection-list set1 set2)
    (if (or (null? set1) (null? set2))
        '()    
        (let ((x1 (car set1)) (x2 (car set2)))
          (cond ((= x1 x2)
                 (cons x1
                       (intersection-list (cdr set1)
                                          (cdr set2))))
                ((< x1 x2)
                 (intersection-list (cdr set1) set2))
                ((< x2 x1)
                 (intersection-list set1 (cdr set2)))))))
  (list->tree (intersection-list (tree->list-2 tree1)
                                 (tree->list-2 tree2))))

In the implementations above, I've just defined the earlier ordered-list implementations of union-set and intersection-set as helper functions named union-list and intersection-list. With these helpers in place, all union-set and intersection-set need to do is convert each tree to an ordered list, perform the list operation, and convert the resulting list back to a balanced tree. We can define a few balanced trees to test that these new implementations work as expected.

> (define evens (list->tree '(0 2 4 6 8 10)))
> (define odds (list->tree '(1 3 5 7 9)))
> (define primes (list->tree '(2 3 5 7 11 13 17 19)))
>
> evens
'(4 (0 () (2 () ())) (8 (6 () ()) (10 () ())))
> odds
'(5 (1 () (3 () ())) (7 () (9 () ())))
> primes
'(7 (3 (2 () ()) (5 () ())) (13 (11 () ()) (17 () (19 () ()))))
>
> (union-set odds evens)
'(5
  (2 (0 () (1 () ())) (3 () (4 () ())))
  (8 (6 () (7 () ())) (9 () (10 () ()))))
> (union-set odds odds)
'(5 (1 () (3 () ())) (7 () (9 () ())))
> (intersection-set evens primes)
'(2 () ())
> (intersection-set odds primes)
'(5 (3 () ()) (7 () ()))
> (intersection-set odds evens)
'()


For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.