When we look at a new language, what is most likely to impress us is what it has. It’s tempting to distinguish a language by adding built-ins — control structures, thread interaction facilities, resource management apparatus, error handling. These are easy to describe, easy to remember, easy to imagine using.

It’s a trap.

It’s not that there’s anything wrong with any given built-in, per se. They work fine, and help users code. It takes a lot of them to add much complexity to the implementation. C is always described as having a “rich set of operators”, and this is usually counted in its favor. Scripting languages have many more built-in features than C, shell languages even more.

That’s a clue. The less powerful a language is, the more built-ins it needs to attract users. Without built-in process control, pattern matching, and formatted I/O, a shell would be crippled. Users would switch to something else.

Languages we count as powerful get along without any such features. Those features can all be coded directly in the language, and put in a library. The only features we need to build into the core language are those we can’t code in the core language, or that we couldn’t present pleasingly to users if we did. In C, I/O is in the standard library, but for / break / continue is built in. strlen() is in the library, but argument passing is built in.
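strlen() makes the test concrete: the whole function can be written in portable C using nothing but built-ins (pointers, comparison, a loop), which is exactly why it can live in the library. A minimal sketch, with my_strlen as a made-up name so it doesn’t collide with the real one:

    #include <stddef.h>

    /* Everything here is core language: pointer arithmetic, a test,
       a loop. Nothing about it needs to be built in. */
    size_t my_strlen(const char *s)
    {
        const char *p = s;
        while (*p)
            p++;
        return (size_t)(p - s);
    }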

Let us consider Lisp. Lisp has memory management built in. This is always counted in its favor: Lisp has garbage collection, where (e.g.) C++ does not. This makes Lisp (like Haskell, Python, Java) “higher level” than C++. Could Lisp memory management be coded in Lisp, and put in the library? No way.

For contrast, consider control structures. Lisp needs no built-in for loop. With Lisp macros and lambda you can code your own control structures, and stick them in a library. People do. Common Lisp’s standard library has the most useful ones already. They look built in.
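C’s preprocessor is a pale shadow of Lisp macros, but it can fake the same trick well enough to show the idea: define a control structure, put it in a header, and it looks built in. A rough sketch, with repeat_n as a hypothetical macro, not anything standard:

    #include <stdio.h>

    /* A home-made control structure: loop var from 0 to n-1. */
    #define repeat_n(var, n) for (int var = 0; var < (n); ++var)

    int main(void)
    {
        repeat_n(i, 3)
            printf("iteration %d\n", i);
        return 0;
    }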

Lisp manages memory automatically, usually well enough for the programs people write in Lisp. What if you have other resources to manage? The programming world provides plenty of others: sockets, database connections, locks, windows, closures, sessions, contexts. They’re distinguished from memory in that something the language doesn’t necessarily know about has to happen when they go away. You can make the language know about a few, but it’s always easy to come up with more it doesn’t, or with resources you need closer control over than the language offers. Indeed, memory may be one of the latter. My web browser used to pause frequently, ignoring UI events for 15 seconds at a time while it garbage-collected JavaScript memory. Nowadays it segfaults when I’m not using it.
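To pin down what makes these resources different, a small C illustration: a buffered stdio FILE is more than memory, because written bytes sit in its buffer until fclose() flushes them, and a collector that merely reclaimed the struct would silently drop data. A sketch, with write_greeting as a hypothetical name:

    #include <stdio.h>

    int write_greeting(const char *path)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        fputs("hello\n", f);
        /* The cleanup the language doesn't know about: fclose()
           flushes the buffer as well as releasing the handle. */
        return fclose(f);
    }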


Dereferencing

2010-04-13

Powerful languages need pointers. Some languages try to make every name a pointer, and then pretend not to have pointers. We’re not fooled. Many make everything a pointer except the most useful things, as in Lisp and Java. Languages that make it hard to tell whether something is a pointer or not deserve a whole ’nother posting. Here I’m going to talk about dereference syntax.

Dereference syntax was invented for assembly language. It was an elegant way to express an addressing mode, to set a particular bit in an instruction word. One common notation was a prefix * (asterisk). Others used ‘@‘, or parentheses, or brackets. Anything worked fine in assembly code, because there were no expressions to speak of. Prefix dereference was easy to understand, and caused no trouble.

When actual languages came along, prefix dereference operators were familiar and conventional, so they went in without much thought. It was just the obvious way to do things. It caused trouble in precursors to C, with expressions like (*p).i, leading to an additional operator to allow p->i. Pascal, wonder of wonders, got it right, with a postfix operator, thus p^.i, but a little too late for C to learn anything from it.

The mistake is revealed when we see constructions like (*p)->i — the new operator didn’t really help. In Pascal, of course, this would be p^^.i, without parentheses, and without the superfluous operator ->. Now, as syntax embarrassments go, this is a small matter. Mistakes are usually easy for the compiler to catch, and it doesn’t make most code much harder to read. To copy C declaration syntax, as in Java, is much worse. Still, why copy a mistake, when you can just get it right? C++ could have added a postfix dereference operator any time, but it would have added complexity, not reduced it. Google’s language Go improves on C’s declaration syntax, but copies the much more easily fixed dereference mistake.
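To make the comparison concrete, here are the same accesses written out in C, with the Pascal spellings alongside in comments:

    #include <stdio.h>

    struct point { int i; };

    int main(void)
    {
        struct point s = { 42 };
        struct point *p = &s;
        struct point **pp = &p;

        printf("%d\n", (*p).i);   /* prefix dereference; Pascal: p^.i   */
        printf("%d\n", p->i);     /* the patch operator                 */
        printf("%d\n", (*pp)->i); /* -> no longer helps; Pascal: pp^^.i */
        return 0;
    }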

C gets a free pass. Not so every language that apes C syntax without C compatibility. For any such language, prefix pointer dereference syntax is an embarrassing mistake. Pascal got so few things right. Let us at least acknowledge and carry those forward.

People design new programming languages all the time. Each new language is awful in so many ways, but people don’t learn; the next is even worse. Here I’m going to explore how languages go wrong, and why, and a few cases where a language got something profoundly right — usually by contrast to the rest who got it wrong. I’ll draw examples from lots of languages.

Many languages embody mistakes that are forgivable, because they were doing something for the first time, or when people didn’t understand the consequences so well. FORTRAN, LISP, COBOL, C, Algol — anything that predates 1972 — deserves a free pass. Everything was so hard, back then, that getting anything right was a triumph. Other languages get things wrong, but can’t help it. C++ adopted C’s mistakes, but had no choice about it; upward-compatible means bug-compatible, perforce. Languages that copy those mistakes, with full benefit of hindsight, have no excuse (cough Java cough). Languages that copy what earlier languages actually managed to get right, and get them wrong (cough Java cough), likewise deserve no mercy. Languages that try something interesting and new almost always get it tragically wrong. That would not be so bad if anybody would learn from it, and do it better next time. Sometimes they do. It happens rarely enough that we can afford to devote individual posts to what somebody, somehow, finally got right.

People posting blinkered defenses of their favorite language will be mocked with little more mercy than their language got, albeit with grace, humor, and style. Feel free to join in skewering idiocy, but mind Muphry’s Law. It’s hard to write about idiocy without calling attention to our own. We all have plenty, but we don’t all need to display it all the time.