Bill the Lizard: December 2008

Monday, December 29, 2008

Education vs. Experience

Joel Spolsky set off a minor flame war with a comment on his discussion group yesterday. It started off as a simple reply to Bob, who is thinking of leaving the software industry, but it then spilled over to reddit, where things can quickly get out of hand and off-topic.

For those of you who would like a summary, Joel takes the position that anyone who thinks they should leave the software industry probably should. I tend to agree with this, but it's not what I want to write about in this post, partly because I don't want to be accused of being a Joel fanboy (even though the fact that I went to a Java school should immunize me from such an accusation), and partly because I'm much more interested in one of the side discussions that flared up.

The side discussion that I'm interested in is probably one you've heard before. It's about the merits of going to school and getting a CS degree vs. being a self-taught "hacker". The reason this discussion so fascinates me is that I've been on both sides of it, and I feel like I still understand both sides.

The argument against getting a CS degree always starts out reasonable enough. Here's a quote from the reddit comments:

In my view, a computer science degree doesn't predict whether a person is a good programmer or not.

This may be true, but the point is that it's a better indicator than not having a CS degree. Think of all the people you know without a CS degree. How many of them are great programmers? Maybe a few, but it's going to be a tiny proportion. Now how about those with CS degrees? Still only a few of them are probably great programmers, but the proportion is going to be much higher.

The same commenter goes on to say:

A lot of people with Computer Science degrees have a tremendously hard time realising for themselves that the degree they've got is probably worthless. There's some serious cognitive dissonance there.

Maybe that's because a CS degree isn't worthless? I understand that there may be a bias, but if most people who get a CS degree think that it was worth it, how can someone without a CS degree disagree? On what can they base their argument? Their own lack of a CS degree? If you haven't gotten a degree, then you can't know what it's worth.

The commenter then says:

In short, a degree leads a candidate to think they actually know something in much more depth than they actually do. In pretty much any area of computer science you can understand the subject to a much higher standard with a week of personal study than they achieved with three years at a university.

This is patently false. The little bit of knowledge you gain from personal study is much more likely to lead to second-order incompetence. The one thing that I learned better than any other when I went back to school to get a CS degree is how much more there was for me to learn. It didn't lead me to an inflated sense of my own knowledge, it led me to understand how truly ignorant I was.

A different reddit commenter had this to say:

I don't have a computer science degree, and I believe a degree generally shows the following:

You decided to recognize how the system works
You can stick with something for four years
You know how to look up an answer

That's pretty much it.

I agree with the first two points. I went to school with plenty of people who had years of experience and were only in a CS program in order to "get a piece of paper" corroborating their knowledge. They went back to school only to prove they could stick it out for four years, and they had recognized that they needed a degree to get a much deserved promotion. But I always felt that these people were cheating themselves by just showing up to get the degree. Many of them didn't apply themselves as hard as they could, and if they had given it a chance they might have realized, as I did, that there was a lot more to gain than just the "piece of paper".

I think the whole argument boils down to this: Every person with a CS degree used to be a person without a CS degree. If most of us agree that we're better off after having gotten a degree, then how can those without CS degrees be unconvinced? They haven't experienced the argument from both sides, so they are by definition in an inferior position to argue.

I don't think that the argument should be that one programmer with a CS degree is better than another programmer without a degree. The real argument to make is that a programmer with a degree is better than he was before he got it. If you don't have a degree, you just don't know how much better you could be if you put in the time and the effort to earn one.

Sunday, December 28, 2008

Should I Learn C or C++?

The question comes up quite often, "Should I bother to learn C, or go straight to C++?" Many beginning programmers wonder if it's worth it to spend the time learning C, when they know a more advanced language is readily available. Others wonder if they're going to be missing out on anything important if they skip C, and proceed directly to C++ without passing Go or collecting $200.

While it is true that well-written C code will normally compile under C++, C is not a proper subset of C++. There are a few differences that can make a valid C program invalid in C++. The easiest examples to illustrate involve the addition of keywords such as new and class to C++, which were not reserved words in C. The following (contrived) valid C program will not compile as C++.

#include <stdio.h>
int main(void) {
 int class, new; // both class and new are C++ keywords
 printf("Enter two integers > ");
 scanf("%d %d", &class, &new);
 printf("The two numbers are: %d  %d\n", class, new);
 printf("Their sum is %d\n", class + new);
 return 0;
}

Another difference that's often cited between C and C++ is that C supplies an implicit cast when a void pointer is assigned to a pointer of a specific type, while C++ requires an explicit cast. The following valid C code will not compile using C++:

void* ptr;
int *i = ptr;

In order to make this code valid C++, an explicit cast to an int pointer must be supplied:

void* ptr;
int *i = (int *) ptr;
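To make the point concrete, here's a minimal sketch of a function written in the common subset of the two languages. It assumes nothing beyond the standard library: it avoids C++ keywords such as new and class as identifiers, and it casts the void pointer returned by malloc explicitly, so it should compile unchanged both as C and as C++. The name sum_of_squares is just a hypothetical example, not anything from a real library.

```c
#include <stdlib.h>

/* Returns the sum of the squares of 1..n using a heap-allocated buffer,
   or -1 if the allocation fails. Written to compile as both C and C++. */
long sum_of_squares(int n)
{
    /* malloc returns void *. C converts it implicitly; C++ requires the
       explicit cast, so we supply it and the code is valid in both. */
    int *values = (int *) malloc((size_t) n * sizeof *values);
    long total = 0;
    int i;

    if (values == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        values[i] = i + 1;
    }
    for (i = 0; i < n; i++) {
        total += (long) values[i] * values[i];
    }
    free(values);
    return total;
}
```

Calling sum_of_squares(10) returns 385. Drop the cast and a C compiler will still accept the code, but a C++ compiler will reject the assignment.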

While it's important to understand that C and C++ are really two separate languages, it's just as important to understand that the parts of C that aren't valid C++ are extreme edge cases. C++ was originally intended to be just an extension of C (Stroustrup started out calling it C with Classes), so an effort was made to ensure that valid C syntax was broken in only a very few places. As Scott Meyers points out in Effective C++, C++ is really a federation of related programming languages: C, Object-Oriented C++, Template C++, and the STL. Almost all valid C programs will compile as C++ with few or no changes.

Most programmers who are deliberating between learning either C or C++ should probably skip C and learn C++. Start out with a book like C++ Primer and you will learn good programming style, not only in the C subset, but in all of the parts of the C++ language. Unless you plan on doing some work on the Linux kernel or another project that you know uses C, the only thing you will really be missing by not learning C first is Kernighan & Ritchie's The C Programming Language. You can always go back and read K&R after you've taught yourself good C++ style and habits.

Further Reading

Wikipedia, Compatibility of C and C++.

Monday, December 22, 2008

Even More Free Programming Books

In a previous post I listed a few programming books that I found available for free online. Since then, I've been scouring the net looking for even more free programming books. Here's what I've found. (Note: As before, I haven't read all of these, so don't take this as a list of informed endorsements. Besides, they're free. You can read them yourself.)

It just occurred to me how eclectic this list is. It's not every day that you see books on Lisp, C#, and PHP collected in the same list. The Internet is as wide as it is wonderful. :)

Additional Note: As some alert readers over on reddit noticed, the online version of Programming Pearls is incomplete, so I've decided to replace it on my list. Sorry I didn't catch this sooner, but you can still enjoy the 3 sample chapters that the author has graciously made available for free.

And here, for those of you who don't read blog comments, are a few bonus picks submitted by readers of my previous post.

As always, I'm interested in hearing from you about even more free online programming material, whether it's in the form of books, tutorials, video lectures, etc.

Saturday, December 20, 2008

The Monty Hall Problem

The Monty Hall Problem is a probability puzzle based on a game that contestants played on the popular old TV game show Let's Make a Deal, hosted by Monty Hall. On the show, contestants were given a choice of three doors. Behind one door was a car; behind each of the other two, a goat. The contestant got to choose a door, and won whatever was revealed to be behind that door. The twist was that after the contestant had selected a door, Monty would open one of the two doors not selected to reveal a goat, and give them the option of changing their mind.*

The game was deceptively simple, in that it seemed as though the contestant always had one chance in three of winning the car, whether they switched their choice or not. There was a strategy that would double the odds of winning, though. It turns out that the winning strategy is as simple as the game.

All a contestant had to do to double their odds of winning was always switch their guess after Monty opened one of the doors. This simple strategy worked because the contestant had only one chance in three of guessing the right door initially. Once they had made their guess, Monty would open one of the other doors that did not conceal the car. Since the player's initial guess was wrong 2/3 of the time, Monty was opening the only other losing door 2/3 of the time (the car was behind one door, so he wouldn't open that). In each of those cases the car had to be behind the one remaining closed door, so switching won whenever the initial guess was wrong, which was 2/3 of the time.
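With only three doors, the argument can be checked exhaustively. Here's a minimal sketch in C (the function name count_wins is hypothetical, chosen for illustration) that walks through all nine equally likely combinations of car position and initial guess. It assumes, as the puzzle does, that Monty always opens a goat door the contestant didn't pick.

```c
/* Counts wins for "stay" and "switch" over every combination of car
   position and initial guess in the three-door game. Each of the nine
   combinations is equally likely. */
void count_wins(int *stay_wins, int *switch_wins)
{
    int car, pick, opened;

    *stay_wins = 0;
    *switch_wins = 0;
    for (car = 0; car < 3; car++) {
        for (pick = 0; pick < 3; pick++) {
            /* Monty opens a door that is neither the pick nor the car.
               When pick == car he has two choices; either one leaves the
               outcome unchanged, so take the lowest-numbered one. */
            for (opened = 0; opened == pick || opened == car; opened++)
                ;
            /* Doors are numbered 0, 1, 2, so the remaining one is
               3 - pick - opened. */
            int other = 3 - pick - opened;
            if (pick == car)
                (*stay_wins)++;
            if (other == car)
                (*switch_wins)++;
        }
    }
}
```

Staying wins in 3 of the 9 cases and switching wins in the other 6, which is the 1/3 versus 2/3 split described above.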

This strategy isn't intuitive to a lot of people. In fact, when Marilyn vos Savant published the strategy in her Parade magazine column, thousands of people wrote in to claim the solution was wrong. It becomes clearer if you look at a bigger variation of the same problem.

Imagine you're on a game show where there are 100 doors. Behind one of them is a new car, while the other 99 conceal goats. Your odds of selecting the right door are 1 in 100. Now imagine that after you've made your selection, the host of the show opens 98 of the doors, revealing 98 of the goats. (Keep in mind that he doesn't randomly open doors. He knows where all the goats are.) Would you switch your original guess to the one remaining door? You only had a 1% chance of winning with your original guess, so does it seem advantageous to switch after the choices are narrowed down to only two?

With a bigger set of doors to choose from, it becomes much easier to see that you have a distinct advantage when you switch from your initial guess. The same holds true for the original problem. By always switching after one door was opened in the original game, the odds of winning were improved from 1/3 to 2/3. That's certainly not a sure thing, but it's not bad odds on a brand new car.
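The same reasoning extends to any number of doors. Here's a minimal Monte Carlo sketch in C, assuming the host knowingly opens every door except the contestant's pick and one other, never revealing the car (as in the 100-door version). The function name switch_win_rate is hypothetical, and the C library's rand is used only for convenience; it's fine for a demonstration but isn't a high-quality random source.

```c
#include <stdlib.h>

/* Estimates the probability of winning by switching in an n-door game
   (n >= 2, trials > 0) where the host opens all but two doors and never
   reveals the car. Uses rand(), which the caller should seed. */
double switch_win_rate(int n, int trials)
{
    int wins = 0;
    int t;

    for (t = 0; t < trials; t++) {
        int car = rand() % n;
        int pick = rand() % n;
        int other;

        if (pick != car) {
            /* The host must leave the car's door closed, so it is the
               one remaining alternative. */
            other = car;
        } else {
            /* The contestant picked the car; the host leaves a goat door
               closed, chosen uniformly from the other n - 1 doors. */
            other = rand() % (n - 1);
            if (other >= pick)
                other++;
        }
        if (other == car)
            wins++;
    }
    return (double) wins / trials;
}
```

After srand(42), switch_win_rate(3, 100000) should come out close to 2/3 and switch_win_rate(100, 100000) close to 99/100, since switching wins exactly when the first guess was wrong.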

Additional Note: If you want to try out an online interactive version of the original game, you can play it here. Make sure you give yourself enough trials to confirm that the optimal strategy wins 2/3 of the time.

* It turns out, Monty Hall never offered to let contestants switch which door (or curtain) they picked on Let's Make a Deal. Instead, he would offer them cash to opt out of the game entirely. The problem that would become known as the Monty Hall Problem was first posed by statistician Steve Selvin in a 1975 letter to The American Statistician, and was made famous by columnist Marilyn vos Savant in 1990.

Thursday, December 18, 2008

Hints or Solutions?

There was a mild debate on Stack Overflow today regarding the posting of solutions to the puzzles on other sites like Project Euler. The debate was started by this question. Several people disagreed with the questioner, but I'm on the fence.

I love sites like Project Euler, Top Coder, and the Python Challenge for the puzzles they provide. At least 90% of the enjoyment I get from them is in solving the problems myself, with no outside help. I have to admit though, that at least a small part of the enjoyment is in competing with other people who are trying to solve the same puzzles.

I don't mind at all when people give hints (especially the non-programming hints that seem to be required to advance to the next level of the Python Challenge). I do get a little bit annoyed, though, when I see people posting entire solutions to the puzzles from other sites. Like it or not, competition is a component of the enjoyment that people get from programming puzzle sites like these. Having others get full solutions to the puzzles removes that part of the challenge.

I know that there's little that the administrators over at Stack Overflow can do about people posting solutions to puzzles on other sites. People can easily rephrase a question so that it isn't obviously a Project Euler puzzle (for example). I also know that the solutions to many of these problems can be found with a quick Google search.

That leaves it up to the Stack Overflow community. As I said, I see nothing wrong with posting questions asking for hints and tips to questions from puzzle sites. I'd like to see more of them. What I do see wrong is when people post full solutions to these problems. Having a full solution takes away the learning experience that one might have otherwise enjoyed. I think that taking away a learning experience goes against the spirit of both Project Euler and Stack Overflow.

Tuesday, December 16, 2008

Google is Visually Impaired

What real difference does it make if you use HTML tables instead of CSS to control the layout of your Web page? If you're a professional Web designer you probably already know, and you can stop reading now. Thanks for stopping in. For the rest of you, the title of this article is a hint.

The truth is, HTML tables are a lot easier to control. I can control every aspect of the appearance of a table using very little markup, and as a bonus it looks pretty much the same across all the major browsers. And I'm not even a Web developer. This seems like an easy choice. I don't want to have to learn about liquid layouts. Absolute positioning sounds like a good thing, but then I have to learn about something called the box-model hack? No thanks. Tables are fine. Tables are easy.

But what about Usability? I remember taking a course in college about that, and reading this really cool book by a guy named Don Norman called The Design of Everyday Things. It's about how simpler design is better, and that's why the iPod is so awesome. That goes right along with designing Web pages with tables. They're simple, right?

Not so fast. I also remember Norman saying something about how design should be simple from the perspective of the user. There's a trap here that's easy to fall into. You don't want to make a design decision based on how easy it is to implement; you want to make the choice based on how easy the design is to use. We've already established that tables are a lot easier on the developer, but how could they make a difference to the user? They're just reading the information off the page, right?

Well, most of them are. Some people have to use screen readers, and screen readers don't read tables in the same order as they read the elements in a CSS layout. It's a really small percentage of internet users, though, and you can't even find accurate statistics, because all of the studies lump everyone with a disability into one big group in order to inflate the numbers. It's really small: something like 10%* of internet users rely on some sort of assistive technology like a screen reader. So for some small percentage of internet users, probably less than 10%, using a table-based layout will provide a less than ideal user experience.

Visual impairment is really random, though, so that means that about 10% of virtually any group of internet users is using some sort of assistive technology. Can you really afford to turn away a random 10% of your potential Web audience? If you can, keep reading.

Google is visually impaired. (Let that sink in.)

Even if you don't care about users with accessibility issues (and you should), there's one user that everyone should care about. The web crawler that Google uses to scan the internet and index your site can't really see the pages. It reads them the way a screen reader would: from top to bottom, in the order they appear in the markup. Neither one can tell the difference between a table used for layout and a table used to hold tabular data, so both treat them the same. HTML tables are meant for tabular data, so that's how screen readers and the Google web crawler read them.
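To see why reading order matters, here's a toy model in C of a reader that works through the markup from top to bottom: it drops the tags and keeps the text in source order. This is only an illustration under that simplifying assumption, not how Google's crawler or any real screen reader is implemented; the function name linearize is hypothetical. In a typical table-based layout the navigation column sits in the first cell, so a linear reader reaches it before the actual content.

```c
#include <stddef.h>

/* Copies the text content of html into out (at most cap bytes, including
   the terminating NUL), skipping everything between '<' and '>' and
   collapsing runs of whitespace into single spaces. Returns the length. */
size_t linearize(const char *html, char *out, size_t cap)
{
    size_t len = 0;
    int in_tag = 0;
    int pending_space = 0;

    if (cap == 0)
        return 0;
    for (; *html != '\0'; html++) {
        char c = *html;
        if (c == '<') {
            in_tag = 1;
            pending_space = 1;  /* a tag boundary separates words */
        } else if (c == '>') {
            in_tag = 0;
        } else if (!in_tag) {
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pending_space = 1;
            } else {
                /* emit one space between words, never a leading one */
                if (pending_space && len > 0 && len + 1 < cap)
                    out[len++] = ' ';
                pending_space = 0;
                if (len + 1 < cap)
                    out[len++] = c;
            }
        }
    }
    out[len] = '\0';
    return len;
}
```

Fed a two-cell layout table whose first cell holds "Home | About | Archive" and whose second holds the article, it produces "Home | About | Archive My Article Main content.": the navigation comes first because that's where it sits in the source.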

Table-based layout is quick and easy. CSS layouts are complicated and hard. Does it really matter if you choose tables? Yes, more than most people realize.

* Really? Ten percent? No, I can't back that up. Even if it's only 2%, though, Google is still in that 2%.

Sunday, December 14, 2008

Freely Available Programming Books

In an earlier post, I listed some frequently recommended computer programming books. I got some good feedback from that post, so I thought I'd go a step further and provide some links to programming books that I've found for free online.

Please note that I'm going against my own previous advice here, as I haven't finished reading every one of these books from cover to cover (~~I haven't even started SICP, I admit~~), so don't take these as endorsements from someone who has read the books.

NOTE: An anonymous commenter pointed out that more material is missing from the Google Book Search selections than first appeared. I've had to remove those books, but you can search their selection through the link provided.

Without further ado, I present to you the 7 best free programming books that I could find online. Enjoy.

Please let me know if you find any of these links useful. I'm particularly interested to learn what people think of SICP, since it's been on my "to read" queue for quite a while. If you decide to read it, be sure to check out the SICP companion video lectures. (Update: I've started reading SICP since this was first published, and I'm keeping track of my progress starting with The SICP Challenge.)

I'd also certainly be interested to learn of any other free online book resources that anyone can recommend.

Related posts:

Even More Free Programming Books

Wednesday, December 10, 2008

Books Programmers Don't Really Read

Mark Twain once said that a classic novel is one that many people want to have read, but few want to take the time to actually read. The same could be said of "classic" programming books.

Periodically over on Stack Overflow (and in many other programming forums) the question comes up about what books are good for programmers to read. The question has been asked and answered several times, in several different ways. The same group of books always seems to rise to the top, so it's worth it to take a look at these books to see what everyone is talking about.

Books Most Programmers Have Actually Read

I've read all of these books myself, so I have no difficulty believing that many moderately competent programmers have read them as well. If you're interested enough in programming that you're reading this blog, you've probably read most, if not all of the books in this list, so I won't spend time reviewing each one individually. I'll just say that each of the books on the list is an exceptional book on its respective topic. There's a good reason that many software developers who are interested in improving their skills read these books.

Among the most commonly recommended programming books there is another group that deserves special consideration. I call the next list "Books Programmers Claim to Have Read". This isn't to say that no one who recommends these books has actually read them. I just have reason to suspect that a lot more people claim to have read the following books than have actually read them. Here's the list.

Books Programmers Claim to Have Read

Introduction to Algorithms (CLRS)
This book may have the most misleading title of any programming book ever published. It's widely used at many universities, usually in graduate level algorithms courses. As a result, any programmer who has taken an algorithms course at university probably owns a copy of CLRS. However, unless you have at least a master's degree in Computer Science (and in algorithms specifically), I doubt you've read more than a few selected chapters from Introduction to Algorithms.

The title is misleading because the word "Introduction" leads one to believe that the book is a good choice for beginning programmers. It isn't. The book is as comprehensive a guide to algorithms as you are likely to find anywhere. Please stop recommending it to beginners.

Compilers: Principles, Techniques, and Tools (the Dragon Book)
The Dragon Book covers everything you need to know to write a compiler. It covers lexical analysis, syntax analysis, type checking, code optimization, and many other advanced topics. Please stop recommending it to beginning programmers who need to parse a simple string that contains a mathematical formula, or HTML. Unless you actually need to implement a working compiler (or interpreter), you probably don't need to bring the entire force of the Dragon to bear. Recommending it to someone who has a simple text parsing problem proves you haven't read it.

The Art of Computer Programming (TAOCP)
I often hear TAOCP described as the series of programming books "that every programmer should read." I think this is simply untrue. Before I'm burned at the stake for blasphemy, allow me to explain. TAOCP was not written to be read from cover to cover. It's a reference set. It looks impressive (it is impressive) sitting on your shelf, but it would take several years to read it through with any kind of retention rate at all.

That's not to say that it's not worthwhile to have a copy of TAOCP handy as a reference. I've used my set several times when I was stuck and couldn't find help anywhere else. But TAOCP is always my reference of last resort. It's very dense and academic, and the examples are all in assembly language. On the positive side, if you're looking for the solution to a problem in TAOCP (and the appropriate volume has been published) and you can't find it, the solution probably doesn't exist. It's extremely comprehensive over the topic areas that it covers.

Design Patterns: Elements of Reusable Object-Oriented Software (Gang of Four)
Design Patterns is the only book on this list I've personally read from cover to cover, and as a result I had a hard time deciding which list it belongs on. It's on this list not because I think that few people have read this book. Many have read it, it's just that a lot more people claim to have read it than have actually read it.

The problem with Design Patterns is that much of the information in the book (but not enough of it) is accessible elsewhere. That makes it easy for beginners to read about a few patterns on Wikipedia, then claim in a job interview that they've read the book. This is why Singleton is the new global variable. If more people took the time to read the original Gang of Four, you'd see fewer people trying to cram 17 patterns into a logging framework. The very best part of the GoF book is the Applicability section of each pattern's description, which explains when it is appropriate to use that pattern. This wisdom is sadly missing from many of the other sources of design pattern lore.

The C++ Programming Language
This book is more of a language reference than a programming guide. There's certainly plenty of evidence that someone has read this book, since otherwise we wouldn't have so many C++ compilers to choose from.

Beginning programmers (or even experts in other languages) who want to learn C++, though, should not be directed to The C++ Programming Language. Tell them to read C++ Primer instead.

As I said before, I know there are a few of you who have actually read these books. This post isn't intended for you; it's intended for the multitudes who are trying to appear smarter by pretending to have read them. Please stop recommending books to others that you haven't read yourself. It's counterproductive, as often there is a better book (more focused on a specific problem domain, easier to understand, geared more toward a specific programming language or programming skill level) that someone more knowledgeable could recommend. Besides that, you may end up embarrassing yourself when someone who has actually read TAOCP decides to give you an MMIX pop quiz (if you don't know what I'm talking about, then this means you).

Friday, December 5, 2008

Pair Programming

I had the chance to do some pair programming this week and I thought I'd share my impressions. For those of you who may have just recovered from a recent coma, pair programming is a technique where two programmers collaborate on a piece of code using one keyboard and monitor. The person typing is called the driver, and the person observing is called the navigator.

I entered into my pair programming arrangement not exactly skeptical, but not convinced by everything I had read, either. I'd need to spend more time coding with a partner (or see some citations) before I'm prepared to accept everything the Pair Evangelicals claim. Regardless, I was curious enough to give it a chance and see if it could work.

The thing that I noticed within the first hour was that having a navigator was much less of a distraction than I had feared. I was worried that it would devolve into an unproductive chat session, but that didn't happen. There was a bit of chit-chat now and then, but each time within a few minutes one of us would remind the other of the mountain of work we had to do. Having a partner wasn't a distraction at all, it actually kept me slightly more focused than normal.

Another thing that struck me fairly quickly was that having a second pair of eyes on my code while I was writing it kept me honest. I took fewer shortcuts, commented my code properly, tested more, and refactored more often. All of the things you're supposed to do even when no one is watching. It was like having a code review really early in the project.

When it was my turn to navigate I noticed something else. I think differently when I'm not the one typing in the code. Having my hands on the keyboard forces me into a very logical mode of thought where I'm worried about details. Observing someone else writing code allowed me to think at a higher level of abstraction. You may have experienced this yourself when you're designing on paper, at a whiteboard, or writing pseudocode. Not worrying about the details of what will make your code compile allows you to think more about design-level aspects of your software.

Overall I would rate my experience with pair programming as a success, but I don't think that it's strictly necessary. If you haven't tried it, I would recommend it for a few days just to see what it's like. It has been worth it for me just to note the difference in my own way of thinking when I'm away from the keyboard. This difference will likely cause me to take more frequent breaks from typing so I can spend time at the whiteboard thinking about my software from a different point of view, even when I'm coding solo.
