Sunday, July 1, 2012

SICP 2.61 - 2.62: Sets as ordered lists

From SICP section 2.3.3 Example: Representing Sets

In exercises 2.59 and 2.60 we looked at how to represent sets as unordered lists. In the next two we'll look at representing sets as ordered lists, and at what benefits we get from the new implementation.

First, if elements in a set are guaranteed to be in order, we can improve the performance of element-of-set by returning false as soon as we encounter an element of the set that's less than the value that we're looking for. All of the remaining elements are guaranteed to be smaller as well. This saves us on average half the number of comparisons from the previous implementation.
(define (element-of-set? x set)
  (cond ((null? set) false)
        ((= x (car set)) true)
        ((< x (car set)) false)
        (else (element-of-set? x (cdr set)))))
We get a more impressive improvement from intersection-set. In the old implementation using unordered lists we had to compare every element of each set to every element in the other set. This had a complexity of $O(n^2)$. With the elements of both sets in order we can reduce that all the way down to $O(n)$. This is possible due to the fact that if the car of one ordered set is smaller than the car of another ordered set, it has to be smaller than all the other elements and we can discard it right away.
(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2))
      '()    
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1
                     (intersection-set (cdr set1)
                                       (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))
Exercise 2.61 asks us to give an implementation of adjoin-set using the ordered representation. Like element-of-set?, our implementation of adjoin-set should require only half as many steps on average as the unordered representation.

The original implementation of adjoin-set made use of element-of-set? to check and see if the new element was already a member of the set. We no longer need to do this since we need to find the exact position in the set to insert the new element. As we scan through the set looking for the right position, we can simply return the set if we encounter the element we're trying to place.
(define (adjoin-set x set)
  (cond ((null? set) (cons x '()))
        ((= x (car set)) set)
        ((< x (car set)) (cons x set))
        ((> x (car set)) (cons (car set)
                               (adjoin-set x (cdr set))))))
There are several different conditions, and we need to cover them all. First, since we're no longer using element-of-set, we need to check to see if the set itself is null or empty. Second, if the we encounter the element we're trying to add, we can just return the set. Next, if the element we're adding is smaller than the first element of the set, we can simply cons the new element to the beginning of the set. Last, if the new element is greater than the first element of the set, we join the first element followed by the adjoin-set of the new element and the rest of the set.
> (adjoin-set 3 null)
(3)
> (adjoin-set 3 '())
(3)
> (adjoin-set 3 '(3 4 5))
(3 4 5)
> (adjoin-set 3 '(4 5 6))
(3 4 5 6)
> (adjoin-set 3 '(1 2 4 5))
(1 2 3 4 5)
> (adjoin-set 3 '(0 1 2))
(0 1 2 3)
Like element-of-set?, this implementation should only scan through half the set on average before finding the correct insertion point for the new element.

Exercise 2.62 asks us to give a $O(n)$ implementation of union-set for sets represented as ordered lists. Since the two sets are in order, we can simply step through each set comparing the first elements of each. At each step we decide which of the first two elements to place in the resulting set.
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        ((= (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) (cdr set2))))
        ((< (car set1) (car set2))
         (cons (car set1) (union-set (cdr set1) set2)))
        (else (cons (car set2) (union-set set1 (cdr set2))))))
Anyone familiar with the mergesort algorithm will quickly recognize that this implementation of union-set is almost exactly the same procedure as the merge subroutine. It's purpose is to quickly merge two sorted lists into one. The only difference is that the union-set implementation above only keeps one copy of duplicate elements.
> (define odds '(1 3 5 7))
> (define evens '(2 4 6 8))
> (define primes '(2 3 5 7))
> (union-set odds evens)
(1 2 3 4 5 6 7 8)
> (union-set odds primes)
(1 2 3 5 7)
> (union-set evens primes)
(2 3 4 5 6 7 8)

For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.