For example, take the following snippet of a CSV file:
username, password,home folder,default editor, favorite color,web address
Notice that the delimiter is inconsistent in this example: sometimes tokens are separated by a single comma (",") and sometimes by a comma followed by a space (", "). If the delimiter were consistent, it would be a simple matter to get all the tokens with the following code:
String[] tokens = line.split(",");
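Applied to the sample line above, that plain comma split leaves a stray space at the start of some tokens. Here is a minimal sketch that makes it visible (the class name NaiveSplit and the bracket formatting are just for illustration):

```java
public class NaiveSplit {
    // Splits on a bare comma only, so the ", " delimiters leave
    // a leading space on the token that follows them.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "username, password,home folder,default editor, favorite color,web address";
        for (String token : naiveSplit(line)) {
            // Brackets make the leading spaces easy to spot.
            System.out.println("[" + token + "]");
        }
    }
}
```

This prints [username], [ password], [home folder], [default editor], [ favorite color] and [web address], one per line: the second and fifth tokens keep the space that followed their comma.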
First of all, note how much simpler line.split(",") is than the old method of using a StringTokenizer to loop through the text scanning for more tokens. String's split method, added in Java 1.4, is a great improvement. The second thing to note is that the split method used here takes a single argument (there is also an overload that additionally takes an int limit), and that argument is a regular expression. That is the clue to solving our CSV formatting problem. How do we specify that the delimiter in our file is a comma that is sometimes followed by a space? By harnessing the power of regular expressions. (OK, that's overstating the case by quite a bit: we're really only harnessing a tiny fraction of their power.)
String[] tokens = line.split(",\\s*");
That says: split the line on any comma followed by zero or more whitespace characters. (\s matches tabs and line breaks as well as spaces, and the backslash is doubled because it has to be escaped inside a Java string literal. Check out Mastering Regular Expressions for an in-depth guide to regular expression syntax.) This single line of code should split the sample line into the following array:
username
password
home folder
default editor
favorite color
web address
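A quick way to confirm that result is a small sketch like the one below (RegexSplit is a made-up name). It splits the sample line with the same regex and also shows that \s* consumes a tab after a comma, not just spaces:

```java
import java.util.Arrays;

public class RegexSplit {
    // Split on a comma followed by zero or more whitespace characters.
    // \\s in the Java string literal becomes \s in the regex, which
    // matches spaces, tabs and line breaks.
    static String[] regexSplit(String line) {
        return line.split(",\\s*");
    }

    public static void main(String[] args) {
        String line = "username, password,home folder,default editor, favorite color,web address";
        String[] tokens = regexSplit(line);
        System.out.println(tokens.length); // 6
        for (String token : tokens) {
            System.out.println(token);
        }
        // A tab (or several spaces) after the comma is swallowed too.
        System.out.println(Arrays.toString(regexSplit("a,\tb,  c"))); // [a, b, c]
    }
}
```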
Try it out by creating a simple class that reads a single line of comma-delimited text and prints out the tokens. If you really want to see what an improvement String's split method brings, try writing the same function using a StringTokenizer instead.
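One possible version of that exercise is sketched below. It assumes the line arrives on standard input, and the class and method names (TokenPrinter, withSplit, withTokenizer) are invented for illustration. It prints the tokens produced by split and, for comparison, builds the same list with an old-style StringTokenizer loop:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenPrinter {
    // Java 1.4+ approach: one call with the comma-plus-optional-whitespace regex.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split(",\\s*"));
    }

    // Older approach: loop over a StringTokenizer, trimming the space that
    // follows some commas (trim() also drops any trailing whitespace).
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().trim());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line = in.readLine(); // null if there is no input
        if (line == null) {
            return;
        }
        for (String token : withSplit(line)) {
            System.out.println(token);
        }
        // Report whether the two approaches agree on this particular line.
        System.out.println("StringTokenizer gives the same tokens: "
                + withSplit(line).equals(withTokenizer(line)));
    }
}
```

Piping in username, password,home folder should print the three tokens followed by "StringTokenizer gives the same tokens: true". The two are not interchangeable in general, though: given a,,b, split keeps the empty middle field while StringTokenizer silently skips it.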