Section 2.3.3 shows us several ways to represent sets in Scheme, starting with unordered sets. To start, we define what a 'set' is by specifying the operations that are to be used on sets. These are:
element-of-set?
- a predicate function that determines whether a given element is a member of a set.adjoin-set
- takes an object and a set as arguments and returns the set that contains the elements of the original set and the adjoined object.intersection-set
- computes the intersection of two sets, which is the set containing only elements that appear in both arguments.union-set
- computes the union of two sets, which is the set containing each element that appears in either argument.
We represent a set as an unordered list by making sure no element appears more than once. The text provides the following implementations for representing sets as unordered lists.
(define (element-of-set? x set) (cond ((null? set) false) ((equal? x (car set)) true) (else (element-of-set? x (cdr set))))) (define (adjoin-set x set) (if (element-of-set? x set) set (cons x set))) (define (intersection-set set1 set2) (cond ((or (null? set1) (null? set2)) '()) ((element-of-set? (car set1) set2) (cons (car set1) (intersection-set (cdr set1) set2))) (else (intersection-set (cdr set1) set2))))
We can define a few lists to test and see how these procedures work together.
> (define odds '(1 3 5 7)) > (define evens '(2 4 6 8)) > (define primes '(2 3 5 7)) > (element-of-set? 5 odds) #t > (element-of-set? 5 evens) #f > odds (1 3 5 7) > (adjoin-set 9 odds) (9 1 3 5 7) > (intersection-set odds evens) () > (intersection-set odds primes) (3 5 7) > (intersection-set evens primes) (2)
Exercise 2.59 asks us to implement the
union-set
operation for the unordered-list representation of sets.(define (union-set set1 set2) (cond ((null? set1) set2) ((null? set2) set1) ((element-of-set? (car set1) set2) (union-set (cdr set1) set2)) (else (cons (car set1) (union-set (cdr set1) set2)))))
This implementation starts by checking to see if either set is null. If so, the other set is returned immediately. The procedure then checks to see if the first element of
set1
is an element of set2
. If so, that element is discarded from the first set, and the union of the remaining elements of set1
and set2
are returned. If the first element of set1
is not an element of set2
, it is included in the list returned by the procedure.> (union-set odds evens) (1 3 5 7 2 4 6 8) > (union-set odds primes) (1 2 3 5 7) > (union-set evens primes) (4 6 8 2 3 5 7)
Exercise 2.60 asks us what would need to change in the implementations above if we redefine 'set' to allow duplicate elements. For example, the set {1, 2, 3} could be represented as the list (2 3 2 1 3 2 2). We need to compare the efficiency of each of the new procedures with their original (non-duplicate) implementations. The exercise also asks if there are applications where the new representation would be preferred over the original.
The implementation of
element-of-set?
doesn't need to change. It returns #t
when it finds an element that matches the input element, otherwise it returns #f
.
The implementation of adjoin-set
is simplified by the new definition. Since we no longer need to check to see if the input element already exists in the set, we can just cons
the element to the existing set.(define (adjoin-set x set) (cons x set))
Like
adjoin-set
, union-set
is also simplified by allowing duplicate elements.(define (union-set set1 set2) (append set1 set2))
We have an interesting choice when it comes to
intersection-set
. The intersection of two sets under the original (non-duplicate) definition doesn't seem like it requires any change in implementation since duplicates are now allowed, not necessarily required. However, look what happens when you execute intersection-set
with duplicate elements in the first set vs. the second set.> (define primes '(2 3 5 7)) > (define twos '(2 2 2)) > (intersection-set primes twos) (2) > (intersection-set twos primes) (2 2 2)
The result is different depending on which set has duplicate elements. This is because the original implementation checks each element of the first set independently to see if they're present in the second set. When the first set contains duplicates, so will the result. Since the instructions in the exercise are ambiguous (and being the typical lazy programmer that I am), I'm going to say that this implementation does not need to change.
Since
element-of-set?
and intersection-set
haven't changed, neither will their performance. Since adjoin-set
and union-set
no longer need to check for duplicate elements, the performance of both of these procedures is improved. The number of steps required by adjoin-set
has gone from $O(n)$ to $O(1)$. The number of steps required by union-set
has gone from $O(n^2)$ to $O(n)$.The penalty that we pay for these performance improvements is that sets now require more memory to accommodate duplicate elements. This representation would be preferred over the original in cases where we don't care about that added memory overhead, and where most of our operations are either
adjoin-set
or union-set
.Related:
For links to all of the SICP lecture notes and exercises that I've done so far, see The SICP Challenge.