Tuesday, August 26, 2008

Trim Tokens

When working with a tokenized string, like those found in comma-separated value (CSV) files, it's common to encounter this problem. What do you do when someone edits the file by hand, and inserts extra space characters to make the file more human-readable? Add a space to the token? That's no good because then all of your files have to match the "new and improved" format.

For example, take the following snippet of CSV file.

username, password,home folder,default editor, favorite color,web address

Notice that the delimiter is inconsistent in this example. Sometimes I have tokens separated with a single comma (",") and sometimes with a comma followed by a space (", "). If the delimiter were consistent it would be a simple matter to get all tokens with the following code:

String[] tokens = line.split(",");

First of all, let's note how much simpler this is than the old method of using a StringTokenizer to loop through the text scanning for more tokens. String's split method added in the Java 1.4 release is a great improvement. The second thing you should note is that the split method accepts one argument, and that argument is a regular expression. This should be a clue to solving our CSV formatting problem. How do we specify that the delimiter in our file is a comma that might sometimes be followed by a space? By harnessing the power of regular expressions. (Ok, that's overstating the case by quite a bit. We're really only harnessing a tiny fraction of the power of regular expressions.)

String[] tokens = line.split(",\\s*");

That says split the line on any comma followed by zero or more spaces (check out Mastering Regular Expressions for an in-depth guide to regular expression syntax). This single line of code should have the desired effect of splitting the line into the following array.

username
password
home folder
default editor
favorite color
web address

Try it out by creating a simple class that reads a single line of comma-delimited text and prints out the tokens. If you really want to see what an improvement String's split method brings, try writing the same function using a StringTokenizer instead.

Monday, August 18, 2008

On "Quantity Always Trumps Quality"

Jeff Atwood of Coding Horror fame recently posted an article titled Quantity Always Trumps Quality, the main point of which is that endless designing and theorizing is a waste of time, and that that time could be better spent building something. The idea is that the time you spend failing early on is time spent learning to do whatever it is you're attempting to do the right way. That's a benefit you don't get by spending time designing the wrong thing. While this idea isn't new or original it is still excellent advice, and Jeff shows how it applies to a wide range of disciplines, from making clay pots to software development and blog writing.

One reason that I like this advice so much is that it teaches us not to be afraid of making mistakes. Not only that, we can learn to embrace our mistakes because that is, after all, how we learn. I've often felt that if something is stagnant (a software project, your company's sales, a personal relationship), doing any random thing to get out of stagnation can be better than doing nothing at all. Once you make a random change you can analyze the results and see what you need to correct. If you go into the change with the mindset that it's only purpose is to teach you what you really should have done, it makes it a lot easier to bear when you make a mistake.

Saturday, August 9, 2008

Stop a thread that's accepting connections

A common approach to implementing a server is to set up its main loop to accept connections from clients, then create separate threads to handle each connection.

But how do you shut down a thread that's blocked in the wait state? It's not enough to simply set the running flag to false, breaking it out of its main loop, as that would leave the server waiting for one last connection before it lets go of its resources and shuts down. The following code shows you how.

public class SimpleServer extends Thread {

private volatile boolean running = false;
ServerSocket serverSock = null;
private int serverPort = 8080;

public SimpleServer (int serverPort) {
this.serverPort = serverPort;
}

/** Begin listening for connections from subscribers. */
public void run() {
System.out.println( "Simple Server started." );

try {
serverSock = new ServerSocket( serverPort );
} catch ( IOException e ) {
System.err.println( e.getMessage() );
}

running = true;
// main loop of the thread
// listens for new subscribers
while( running ) {
Socket sock = null;
try {
// blocks while waiting
sock = serverSock.accept();

// handle the new socket connection
}
catch ( IOException ioe ) {
System.err.println( ioe.getMessage() );
}
finally {
try {
sock.close();
}
catch ( IOException ioe ) {
System.err.println( ioe.getMessage() );
}
}
}
}

/** Check the running flag. */
public boolean isRunning() {
return running;
}

/** Stop the server listen thread. */
public void shutDown() {
running = false;
try {
// open a connection to the server socket,
// because it is blocking on accept
new Socket( serverSock.getInetAddress(),
serverSock.getLocalPort() ).close();
System.out.println("Server shut down normally.");
}
catch(IOException ioe) {
System.err.println( ioe.getMessage() );
}
}
}

The interesting technique here can be found in the shutDown method. After setting the running flag to false we open one more socket connection on the listening port, then immediately close it. This allows the server application to terminate immediately and gracefully.

Thursday, August 7, 2008

Avoid NPE when comparing strings

By now everyone knows that you should use the equals method when comparing two objects in Java (as opposed to using the == operator). Strings are no different. But what happens when you compare a null reference to a string literal, like so:

String myRef = null;
if ( myRef.equals("literal") ) {
// do something
}

If you run this code you'll see you get a Null Pointer Exception for calling a method on a null reference. I've seen some developers get around this problem by checking for null before calling equals.

String myRef = null;
if ( myRef != null && myRef.equals("literal") ) {
// do something
}

This cleverly uses the short-circuit property of the && operator. If the first part of the condition is false the second part won't be evaluated, avoiding the NPE. Like most clever code, this is not the best way to go about it. I learned a better way when reviewing a colleague's code. Just take the original condition and switch it around so you call equals on the literal, and pass the (possibly null) reference as a parameter.

String myRef = null;
if ( "literal".equals(myRef) ) {
// do something
}

This avoids the NPE because a string literal can never be null. If the reference is null, the condition returns false, as expected. This solution is also a little bit shorter and clearer than the first one. The lesson to be learned here is that if you have to resort to "clever" tricks to get something done, with just a little bit of lateral thinking you can probably find a cleaner, simpler way.

Sunday, August 3, 2008

Implementing equals in Java

Update: I made reference to Josh Bloch's Effective Java in this article. Those references are to the first edition of the book and should be disregarded. The second edition addresses every issue that I (and others) considered a shortcoming concerning the implementation of the equals method from the original edition. If you are a Java programmer, you should probably stop reading this and just read the second edition of Josh's book instead.

Second update: There's also an excellent and detailed article online on Jave equality by Martin Odersky, Lex Spoon, and Bill Venners: How to Write an Equality Method in Java.

Many beginning Java programmers get confused about the difference between the equals operator (==) and the equals method. This confusion can probably be traced back to the fact that ==, like so many things in Java, works differently with primitives than it does with object references. The == operator in Java works with primitives by comparing their values, but it works with objects by comparing their references. This means that == will always work with primitives, but it will rarely work as expected with objects. Remember that references in Java point to a location on the memory heap. Two references that point to the same object are equal, but two references that point to two different objects are never equal, even if they hold the same value.

Another source of confusion may be from the default implementation of the equals method itself. Every class in Java inherits a default equals implementation from the Object class. Object's equals method simply compares references, so the default implementation gives the exact same behavior as the == operator. That's why it's a good idea to override the equals method in any classes you create that need more than the default behavior.

Unfortunately, implementing equals is harder than it at first may seem. Many beginning developers (and even a few experienced ones) naively cast to the correct type and compare significant fields without giving due consideration to the general equals contract. Joshua Bloch spends ten worthwhile pages on the subject in his book Effective Java, and some would argue that he still doesn't get it exactly right. In addition to the properties laid out in the general equals contract, there are also a few object design decisions that need to be taken into account to insure correct equals behavior. I'll try to explain all of these issues.

By general contract, the equals() method in Java must be reflexive, symmetric, transitive, consistent, and any non-null reference must return false. In other words, for arbitrary values of a, b, and c, the following tests must always pass:

// reflexive property
assertTrue( a.equals(a) );

// symmetric property
assertTrue( a.equals(b) == b.equals(a) );

// transitive property
if ( a.equals(b) && b.equals(c) ) {
assertTrue( a.equals(c) );
}

// consistency property
assertTrue( a.equals(b) == a.equals(b) );

// non-null property
assertFalse( a.equals(null) );

It would be difficult to get reflexivity and consistency wrong without trying. A very simple test for null is all that's needed to make sure your equals method returns false when a null parameter is passed. The symmetry and transitivity properties, on the other hand, are difficult to get right without giving some serious thought to each.

For example, both Java Practices and Joshua Bloch recommend the following steps for implementing equals:

1. Use this == that to check reference equality
2. Use instanceof to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Given this template, the equals method for a simple 2-dimension Point class might look like the following:

class Point {
private int x;
private int y;

...

public boolean equals(Object obj) {
// check for reference equality
if(this == obj) return true;

// type check
if( !(obj instanceof Point) ) return false;

// cast to correct type
Point p = (Point)obj;

// compare significant fields
return (this.x == p.x && this.y == p.y);
}
}

Step one fulfills the reflexivity requirement of the equals contract. It also serves as an optimizing step. If the parameter refers to this object, obviously they are equal and you don't need to waste any more time comparing them. This saves time particularly when the comparison (step four) is costly.

Step two uses instanceof to check that the argument is of an acceptable type. This is usually the same type as this object, but could be a super type or interface. Consider comparing two Collections to each other. You may wish to consider two Lists equal if they have the same elements, regardless of whether one is a LinkedList and the other an ArrayList. Simply implementing the List interface should be enough to correctly pass this step, then safely cast to the same type in step three. Notice also that this method doesn't explicitly check to see if the argument is null. That's because there's an implicit check in step two. The instanceof operator will return false if its second argument is null.

It's not hard to prove that this implementation of equals satisfies the symmetric and transitive properties of the equals contract. A few simple test cases should show that for any two Point objects, if x and y are equal, the equals method will return equal, and if x or y are not equal it will return false. But there is a very subtle bug lurking in this implementation. You have to ask yourself, "What happens if I extend this class by adding a significant field?" By "significant field", I mean a field that impacts equality. One that should be taken in to account by the equals method.

Let's look at an example. Say I want to extend this class to make a 3-dimensional Point by adding a z dimension. In this case the Point3D equals method needs only to call the Point equals method and add its own comparison of the z-dimension instance variable.

class Point3D extends Point {
private int z;

...

public boolean equals(Object obj) {
if(!(obj instanceof Point3D)) return false;
Point3D p3d = (Point3D)obj;
return super.equals(obj) && this.z == p3d.z;
}
}

This implementation works for two Point3D objects the same way the Point equals method works for two Point objects. The problem is that it breaks the symmetric property of the equals contract when you mix objects of the two types. Consider the following:

Point p = new Point(4, 2);
Point3D p3d = new Point(4, 2, 3);
p.equals(p3d); // returns true
p3d.equals(p); // return false

The problem is that Point3D instanceof Point returns true, while Point instanceof Point3D returns false, violating the symmetric property.

To his credit, Joshua Bloch goes on in his book to explain a technique using composition that offers a solid workaround to the problems inherent in using instanceof. There is another popular approach to implementing equals that also circumvents these problems.

1. Test for a null argument
2. Use if (getClass() != obj.getClass()) to test for correct argument type
3. Cast the argument to the correct type
4. Compare significant fields for equality

Using this slightly different template the equals method for our simple Point class now becomes:

public boolean equals(Object obj) {
// check for null reference
if(this == null) return false;

// type check
if(this.getClass() != obj.getClass()) return false;

// cast to correct type
Point p = (Point)obj;

// compare significant fields
return (this.x == p.x && this.y == p.y);
}

As you can see, the two approaches only differ in the first two steps. First, we replace the optimizing step with a test for null. The null test is required because in the second step we're no longer using instanceof, it's been replaced with a call to getClass. This fixes the problem with symmetry that we experienced in the first approach because the getClass method will always return false if the parameter is not the exact same type as the object class. Super class or subclass parameters return false, preserving the symmetric property. Be aware that this can lead to surprising behavior when two objects are equal in all significant fields, but are considered unequal because they aren't the same type (refer back to the LinkedList, ArrayList example I gave earlier).

So which of these two approaches to implementing equals should be preferred? The questions to ask when designing your class is whether you're designing it to be extended or not, and whether it is important for classes of different types (but the same interface) to adhere strictly to the equals contract. In most cases strict adherence to the equals contract should be preferred, and so the getClass approach should be used. The only times you should consider using instanceof in your equals methods are when objects of a subclass must be comparable to objects of their base class type, or if two objects that implement a common interface must be considered equal based on their state rather than just their type. In both these cases, be well aware of what properties of the general equals contract you may be violating. It's also a good idea to thoroughly document these points in your code and in your API documentation.