Sunday, August 3, 2008

Implementing equals in Java

Update: I made reference to Josh Bloch's Effective Java in this article. Those references are to the first edition of the book and should be disregarded. The second edition addresses every issue that I (and others) considered a shortcoming concerning the implementation of the equals method from the original edition. If you are a Java programmer, you should probably stop reading this and just read the second edition of Josh's book instead.

Second update: There's also an excellent and detailed article online on Jave equality by Martin Odersky, Lex Spoon, and Bill Venners: How to Write an Equality Method in Java.

Many beginning Java programmers get confused about the difference between the equals operator (==) and the equals method. This confusion can probably be traced back to the fact that ==, like so many things in Java, works differently with primitives than it does with object references. The == operator in Java works with primitives by comparing their values, but it works with objects by comparing their references. This means that == will always work with primitives, but it will rarely work as expected with objects. Remember that references in Java point to a location on the memory heap. Two references that point to the same object are equal, but two references that point to two different objects are never equal, even if they hold the same value.

Another source of confusion may be from the default implementation of the equals method itself. Every class in Java inherits a default equals implementation from the Object class. Object's equals method simply compares references, so the default implementation gives the exact same behavior as the == operator. That's why it's a good idea to override the equals method in any classes you create that need more than the default behavior.

Unfortunately, implementing equals is harder than it at first may seem. Many beginning developers (and even a few experienced ones) naively cast to the correct type and compare significant fields without giving due consideration to the general equals contract. Joshua Bloch spends ten worthwhile pages on the subject in his book Effective Java, and some would argue that he still doesn't get it exactly right. In addition to the properties laid out in the general equals contract, there are also a few object design decisions that need to be taken into account to insure correct equals behavior. I'll try to explain all of these issues.

By general contract, the equals() method in Java must be reflexive, symmetric, transitive, consistent, and any non-null reference must return false. In other words, for arbitrary values of a, b, and c, the following tests must always pass:

// reflexive property
assertTrue( a.equals(a) );

// symmetric property
assertTrue( a.equals(b) == b.equals(a) );

// transitive property
if ( a.equals(b) && b.equals(c) ) {
assertTrue( a.equals(c) );
}

// consistency property
assertTrue( a.equals(b) == a.equals(b) );

// non-null property
assertFalse( a.equals(null) );

It would be difficult to get reflexivity and consistency wrong without trying. A very simple test for null is all that's needed to make sure your equals method returns false when a null parameter is passed. The symmetry and transitivity properties, on the other hand, are difficult to get right without giving some serious thought to each.

For example, both Java Practices and Joshua Bloch recommend the following steps for implementing equals:

1. Use this == that to check reference equality
2. Use instanceof to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Given this template, the equals method for a simple 2-dimension Point class might look like the following:

class Point {
private int x;
private int y;

...

public boolean equals(Object obj) {
// check for reference equality
if(this == obj) return true;

// type check
if( !(obj instanceof Point) ) return false;

// cast to correct type
Point p = (Point)obj;

// compare significant fields
return (this.x == p.x && this.y == p.y);
}
}

Step one fulfills the reflexivity requirement of the equals contract. It also serves as an optimizing step. If the parameter refers to this object, obviously they are equal and you don't need to waste any more time comparing them. This saves time particularly when the comparison (step four) is costly.

Step two uses instanceof to check that the argument is of an acceptable type. This is usually the same type as this object, but could be a super type or interface. Consider comparing two Collections to each other. You may wish to consider two Lists equal if they have the same elements, regardless of whether one is a LinkedList and the other an ArrayList. Simply implementing the List interface should be enough to correctly pass this step, then safely cast to the same type in step three. Notice also that this method doesn't explicitly check to see if the argument is null. That's because there's an implicit check in step two. The instanceof operator will return false if its second argument is null.

It's not hard to prove that this implementation of equals satisfies the symmetric and transitive properties of the equals contract. A few simple test cases should show that for any two Point objects, if x and y are equal, the equals method will return equal, and if x or y are not equal it will return false. But there is a very subtle bug lurking in this implementation. You have to ask yourself, "What happens if I extend this class by adding a significant field?" By "significant field", I mean a field that impacts equality. One that should be taken in to account by the equals method.

Let's look at an example. Say I want to extend this class to make a 3-dimensional Point by adding a z dimension. In this case the Point3D equals method needs only to call the Point equals method and add its own comparison of the z-dimension instance variable.

class Point3D extends Point {
private int z;

...

public boolean equals(Object obj) {
if(!(obj instanceof Point3D)) return false;
Point3D p3d = (Point3D)obj;
return super.equals(obj) && this.z == p3d.z;
}
}

This implementation works for two Point3D objects the same way the Point equals method works for two Point objects. The problem is that it breaks the symmetric property of the equals contract when you mix objects of the two types. Consider the following:

Point p = new Point(4, 2);
Point3D p3d = new Point(4, 2, 3);
p.equals(p3d); // returns true
p3d.equals(p); // return false

The problem is that Point3D instanceof Point returns true, while Point instanceof Point3D returns false, violating the symmetric property.

To his credit, Joshua Bloch goes on in his book to explain a technique using composition that offers a solid workaround to the problems inherent in using instanceof. There is another popular approach to implementing equals that also circumvents these problems.

1. Test for a null argument
2. Use if (getClass() != obj.getClass()) to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Using this slightly different template the equals method for our simple Point class now becomes:

public boolean equals(Object obj) {
// check for null reference
if(this == null) return false;

// type check
if(this.getClass() != obj.getClass()) return false;

// cast to correct type
Point p = (Point)obj;

// compare significant fields
return (this.x == p.x && this.y == p.y);
}

As you can see, the two approaches only differ in the first two steps. First, we replace the optimizing step with a test for null. The null test is required because in the second step we're no longer using instanceof, it's been replaced with a call to getClass. This fixes the problem with symmetry that we experienced in the first approach because the getClass method will always return false if the parameter is not the exact same type as the object class. Super class or subclass parameters return false, preserving the symmetric property. Be aware that this can lead to surprising behavior when two objects are equal in all significant fields, but are considered unequal because they aren't the same type (refer back to the LinkedList, ArrayList example I gave earlier).

So which of these two approaches to implementing equals should be preferred? The questions to ask when designing your class is whether you're designing it to be extended or not, and whether it is important for classes of different types (but the same interface) to adhere strictly to the equals contract. In most cases strict adherence to the equals contract should be preferred, and so the getClass approach should be used. The only times you should consider using instanceof in your equals methods are when objects of a subclass must be comparable to objects of their base class type, or if two objects that implement a common interface must be considered equal based on their state rather than just their type. In both these cases, be well aware of what properties of the general equals contract you may be violating. It's also a good idea to thoroughly document these points in your code and in your API documentation.

2 comments:

Anonymous said...

There is a closing bracket missing. It should be

// type check
if(!(obj instanceof Point)) return false;

Bill the Lizard said...

Anon,
Thanks for letting me know about that error. I fixed my mistake, and while I was in there editing the post anyway, I added a link to an excellent article on writing an equality method in Java. See the second paragraph for the link.