Saturday, February 27, 2010

"Goes to" Considered Harmful

A few months ago the following question was asked (half jokingly) on Stack Overflow,
What is the name of this operator: “-->”?
The question includes the following C++ code:
#include <stdio.h>
int main()
{
    int x = 10;
    while( x --> 0 ) // x goes to 0
    {
        printf("%d ", x);
    }
}
This program simply counts down from 9 to 0, printing each value. As many others pointed out, "-->" isn't a single operator at all, but two operators smashed together: the postfix decrement "--" followed by the greater-than ">". The condition of the while loop above can be rewritten with proper spacing as:
while( x-- > 0 ) // x-- is greater than 0
{
    ...
}
Joshua Bloch commented on the goes-to operator on Twitter yesterday.



He links to the results of a Google Code search for instances of "-->" in the wild. The results are specific to the C programming language, but "-->" turns up in similar searches for C++ and Java as well (and probably any other language that allows it).

What's the harm?

So what's so harmful about this bit of cleverness? The main problem is that it's too clever. Any time you use standard operators in a non-standard way (even if the language specification doesn't strictly forbid it) you're negatively impacting the readability of your code. In this specific case, the "-->" operator isn't documented anywhere. It's just not reasonable to expect other programmers to know what it means.

On top of that, this situation isn't best handled by a while loop to begin with. If you know ahead of time how many iterations your loop needs to make, use a for loop. Kernighan and Ritchie covered this a very long time ago in The C Programming Language. It's still true today.
"The for is preferable when there is a simple initialization and increment, since it keeps the loop control statements close together and visible at the top of the loop."
This, of course, is also true for a simple decrementing loop. The meaning of the following code should be obvious to any junior programmer.
int i;
for( i = 9; i >= 0; i-- )
{
    printf("%d ", i);
}

Writing code that's "too clever" for other programmers to understand immediately just obfuscates your code needlessly. You'll seem much cleverer to your colleagues if you write your code in a clear and readable style to begin with. Besides that, using the common idioms of your language is probably the quickest way to reduce your WTFs/minute score at your next code review.

27 comments:

THK said...

Wow, I have never seen that kind of ``clever code'' before, and I was totally fooled until I read the explanation that follows in your article. Yes, a straightforward style actually saves us a lot of time. A style that is too ``clever'' sometimes damages readability, and it's not clever at all.

Tim Kington said...

I disagree. Comes from is awesome. The more operators, the better, in my opinion. Isn't that what C++ is all about?

Besides, clever = job security :)

Bill the Lizard said...

Hiankun,
I didn't get it at first either. I was actually pretty surprised to find that Java even allows this. I could have sworn that the JLS didn't allow the "--" operator with spaces on both sides, but it does. "-->" might save a few keystrokes, but it's not worth it in the long run.

Bill the Lizard said...

Tim,
When you say "comes from" are you talking about "while (0 <-- x)", or is it something else? I remember you told me a long time ago about some esoteric language with strange syntax, but I couldn't find a reference.

I'm sure you don't need clever tricks to have a sense of job security. :)

Galilyou said...

Now that sucks. People should strive to make their code as readable as it can be. You write code for people to read, not for machines to interpret; that's the priority. For instance, I've seen this statement in a C++ app: x = x+++++x;

Tim Kington said...

Oops! I meant "goes to". "Come from" is even better, though. It's part of INTERCAL.
http://en.wikipedia.org/wiki/INTERCAL

Bill the Lizard said...

Galilyou,
That's pretty awful. It won't even compile in Java unless you add some spaces.

x = x++ + ++x;

Even with proper spacing it's not entirely clear what this code does. I had to run it several times to be sure it was the same as:

x = 2 * (x + 1);

Even if this was a matter of optimization, I would prefer bit shifting:

x = ++x << 1;

At least most programmers can be expected to remember that shift-left 1 is the same as multiplying by 2. I'd still comment the last line saying why it was written like this, instead of the obvious way.

Thanks for another great example to pick apart. :)

Bill the Lizard said...

Tim,
That's the one! COMEFROM is just mind bending at first, but there's a good explanation here: http://en.wikipedia.org/wiki/COMEFROM

It turns out to work exactly like GOTO. Only the syntax is the opposite. Temporal law is not violated. :)

Anonymous said...

The vast majority of Google results for "-->" in C, C++, or Java do not appear to be playing clever, contrary to what the original post on Twitter and this post would suggest.

The results do show a lot of simple ASCII art (in comments and in string literals), code dealing with XML comments (which end with -->), and omitted spaces ("if (length-->0)").

Bill the Lizard said...

Anonymous,
The phrases "some people use it in production code" and "but "-->" turns up in similar searches for C++ and Java" (emphasis added) do not suggest anything about the proportion of "-->" used as an operator vs. ASCII art. Neither I nor Bloch implied that it was the majority. All three of the searches turn up instances of "-->" being used as an operator on the first page of results.

Anonymous said...

Bill, the only uses of --> on the first page of C results are from SQLite, and those very same lines similarly eliminate spaces around !=, giving "if ( cnt-->0 && .. )". The first pages of C++ and Java results share "while (n-->0)", and C++ has another "if (length-->0)".

I see absolutely no evidence people are using --> as a special operator, as both you and the other post implied. It's as if you saw code such as "if ((a!=b)&&(c!=d))" and exclaimed people are using a new ")&&(" operator.

Bill the Lizard said...

Anonymous,
If you don't see the evidence then it must not exist.

Anonymous said...

You are supporting the claim that "some people use [the --> operator] in production code", so you are the one that should be providing evidence that it is used. Is there a problem with my pointing out that the support provided so far (the 3 Google code searches) is incredibly weak?

The advice about not being clever is good and something I find myself repeating often, but you don't have to go ghost hunting to find reasons to tell people about it.

Dan said...

Are you sure the behaviour of x++ + ++x is defined?

I believe the standard states something like "The order of operations of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left to right.", which would make this code's behaviour undefined.

It could be evaluated either as (x+1)++ + (x+1) or as (x)++ + (x+1). Possibly even as x + ++(x+1) if the x++ increments x before the ++x is evaluated.

Bill the Lizard said...

Anonymous,
I think you're missing the point. This post isn't just about "-->" and specific instances of it in the wild. It's about all of the silly ways that people obfuscate their code in the name of brevity, or cleverness, or whatever. The "goes-to" operator is just one example of that. See Galilyou's comment above for another.

Besides that point, I have provided plenty of evidence that "-->" does get used, you've just refused to look at it. Look beyond the first page of results in the first link provided.

Beyond that, what else can I tell you? The code that gets indexed by Google isn't all of the code in the world. Showing results in Google Code Search is only an existence proof. It's been demonstrated that "-->" does get used. But like I said, that's just one example of the larger problem.

Dan said...

Actually, x = x++ + ++x; modifies x three times between sequence points and is definitely undefined.

Bill the Lizard said...

Dan,
This is an excellent point. In Java the evaluation order is specified as left-to-right. I'm not at all sure about C and C++, though. If it's not specified then it's compiler implementation specific, so that expression might work as expected or not based on what compiler you use.

Dan said...

I meant in C/C++ - I guess in Java, this is defined. In C and C++ I'm fairly sure it's undefined and compiler specific though.

Anonymous said...

The only proof you've given is that people sloppily eliminate spaces (according to popular coding conventions) when using postfix-decrement and greater-than. Jumping from that to "they are using some weird --> operator on purpose" is a stretch. Beyond that, I don't know what to tell you, though I guess I didn't say it clearly enough before.

Bill the Lizard said...

Dan,
I do remember for certain that the evaluation order of function arguments is unspecified in C and C++. I'm not sure about subexpression evaluation order, though.

Dan said...

Ok, I looked it up :-P

I'm not sure about C - I imagine it's the same - but the ISO C++ spec states in clause 5, paragraph 4 that it's undefined: "Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined."

Bill the Lizard said...

Dan,
I just asked on Stack Overflow and someone pointed out the same section. You're definitely right, that's undefined in C and C++.

jokergirl@wererabbits said...

I disagree - it's completely legal code, used completely legally, and I don't see the reason to call it an "operator" at all. Except for the weird spacing, it's something I would expect every programmer to understand immediately, and it is a good way of distinguishing between newbs and people who can actually program.

Bill the Lizard said...

jokergirl,
People who can actually program write their code in the clearest possible way so they don't have to spend time explaining it to the newbs. Just because the language specification doesn't explicitly forbid it, doesn't mean it's a good idea.

Johannes Rössel said...

Bill:

On the topic of

x=x++ + ++x;
x = 2 * (x + 1);
x = ++x << 1;

You stated you'd prefer the latter for optimization. I think you know full well Knuth's saying on premature optimization :-)

In fact, at least for my compiler (MSVC 9) all three lines above compile down to the exact same assembler code:

lea eax, DWORD PTR [eax+eax+2]

So I'd say for the sake of readability (and not being too clever), the x = 2 * (x + 1) variant should probably be preferred. Or 2 * x + 2 which is what the assembler code says.

Bill the Lizard said...

Johannes,
Great catch on the compiler optimization of x = 2 * (x + 1); and x = ++x << 1; That's a perfect example of exactly why you shouldn't prematurely optimize. A lot of the time it's for absolutely no gain, and if you really don't know what you're doing, you could even make things worse!

Anonymous said...

Closed discussion as not constructive