What’s wrong with syntax colouring?

Syntax colouring – the annotation of source code with different colours for keywords and other syntactic tokens – has become standard in just about all development environments. Yet, it often does not make sense.

I don’t mean to say that it does not make sense at all to use colour to annotate source code. On the contrary. What I am saying is that almost all syntax colouring systems colour the wrong things.


Let’s look at what syntax colouring systems typically do, why this is not useful, and what they should be doing instead.

To get started, here is what syntax colouring looks like in BlueJ today. BlueJ’s colouring is quite typical, so most of what we discuss here applies equally to other environments.

Example 1 – Syntax colouring today

The following is a bit of source code as presented in BlueJ’s editor:

[Figure: a piece of source code with syntax colouring in BlueJ’s editor]

As we can see, the syntax highlighter distinguishes several classes of tokens:

javadoc comments – blue
other comments – in a colour so similar to that of javadoc comments that the two are hard to tell apart
strings – green
keyword class 1 (abstract, private, public, if, else, etc.) – dark red
keyword class 2 (class, int, and a few others) – bright red
keyword class 3 (super, null, this, etc.) – light blue

Reflection – what does syntax colouring do?

The general problem with this is that the distinction is based on lexical considerations, rather than structural reasoning. I will use the terms lexical syntax colouring for what is currently done and structural syntax colouring for what it should be.

Lexical syntax colouring is based on information directly derived from a lexical analyser. The source character stream is divided into tokens, and those tokens are sorted into classes. Colour is then assigned to classes of tokens.
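To make this mechanism concrete, here is a minimal Java sketch of a lexical colourer. It is not BlueJ’s implementation: the regular-expression tokenizer, the keyword groups and the colour names are assumptions chosen to mirror the classes listed in Example 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of lexical syntax colouring: split the source into tokens,
 * sort each token into a class, and give every class a fixed colour.
 * Keyword groups and colour names are illustrative assumptions.
 */
public class LexicalColourer {

    enum TokenClass { JAVADOC, COMMENT, STRING, KEYWORD1, KEYWORD2, KEYWORD3, OTHER }

    record Token(String text, TokenClass tokenClass) {}

    // Hypothetical keyword groups, loosely following the classes listed above.
    static final Set<String> KEYWORD1 = Set.of("abstract", "private", "public", "if", "else", "return");
    static final Set<String> KEYWORD2 = Set.of("class", "int");
    static final Set<String> KEYWORD3 = Set.of("super", "null", "this");

    // One colour per token class; program structure plays no part.
    static final Map<TokenClass, String> COLOUR = Map.of(
            TokenClass.JAVADOC, "blue",
            TokenClass.COMMENT, "blue",
            TokenClass.STRING, "green",
            TokenClass.KEYWORD1, "dark red",
            TokenClass.KEYWORD2, "bright red",
            TokenClass.KEYWORD3, "light blue",
            TokenClass.OTHER, "black");

    // Alternatives, in priority order: javadoc comment, other comment,
    // string literal, word (identifier, keyword or number), any other symbol.
    static final Pattern TOKEN = Pattern.compile(
            "(/\\*\\*.*?\\*/)|(/\\*.*?\\*/|//[^\\n]*)|(\"(?:\\\\.|[^\"\\\\])*\")|(\\w+)|(\\S)",
            Pattern.DOTALL);

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            TokenClass cls;
            if (m.group(1) != null) cls = TokenClass.JAVADOC;
            else if (m.group(2) != null) cls = TokenClass.COMMENT;
            else if (m.group(3) != null) cls = TokenClass.STRING;
            else if (m.group(4) != null) cls = classifyWord(m.group(4));
            else cls = TokenClass.OTHER;
            tokens.add(new Token(m.group(), cls));
        }
        return tokens;
    }

    // A word's class depends only on its spelling, never on where it appears.
    static TokenClass classifyWord(String word) {
        if (KEYWORD1.contains(word)) return TokenClass.KEYWORD1;
        if (KEYWORD2.contains(word)) return TokenClass.KEYWORD2;
        if (KEYWORD3.contains(word)) return TokenClass.KEYWORD3;
        return TokenClass.OTHER;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("private int yearOfBirth; private String name;")) {
            System.out.println(t.text() + " -> " + COLOUR.get(t.tokenClass()));
        }
    }
}
```

Running `main` prints `int` as bright red and `String` as black, although both are field types in the same position: the colour follows from the spelling of each token alone.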

The problem with this is that some of the lexical token classes do little to make the source code more readable for a programmer.

Why, for example, is the word ‘int’ coloured red in the field declarations, but ‘String’ is not? Why does ‘abstract’ have a different colour from ‘class’? Why is ‘public’ in the method header the same colour as ‘return’ in its body – are they somehow related? Why is ‘super’ different?

We should approach this question by asking ourselves what the purpose of syntax colouring is.

In my view, the purpose of syntax colouring is to make source code more readable to programmers. It can achieve this by making the structure of the code more obvious.

Does the colouring above do this? Well, partly yes, partly no.

The one aspect where this succeeds in the above example is the distinct presentation of javadoc comments. This helps a lot in separating comments from the implementation, and also in separating methods from each other. But this is rather coincidental: it works because a lexical token happens to coincide with a logical structure.

The colouring of different keyword classes, for example, is not especially helpful.

What should syntax colouring do?

Structural syntax colouring would use colour much more explicitly to emphasise the logical structure of the code, instead of the lexical structure. Interesting structural tasks are:

• separating interface from implementation
• separating comments from code
• separating elements of a class (fields, constructors, methods)

Highlighting javadoc comments should be retained, since this is the one thing that is very helpful in the current scheme. In addition to this, we should:

• highlight the class header and the complete method signature

Finding the signature of a method is important; a programmer has to do this often.
Signatures should stand out from the rest of the text.

• highlight field definitions

Field definitions are a separate logical entity that should be easily identifiable.

• distinguish between private and public fields

Private and public fields serve different purposes and have important differences
in characteristics. This should be pointed out by a different appearance.

• highlight control structures

The control structure highlighting should make it easy to recognise the parts
belonging to the control structure and the embedded block. Highlighting
should colour the control structure keyword and its associated parentheses.

• separate methods, constructors, field definitions and inner classes

The most obvious structure of a class is the division into its members. This
should be immediately visually obvious.
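The structural alternative can also be sketched in Java. This sketch assumes that a parser has already located each class member and reports its kind, visibility and line range; the `Member` record stands in for that parser output and is hypothetical, as are all colour values. It picks a margin bar by member kind, a background by visibility, and marks signature lines for highlighting. Parsing and control-structure highlighting are left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of structural colouring: appearance is chosen from a member's role
 * in the class and its visibility, not from the spelling of tokens.
 * The Member record is an assumed parser output, not a real API.
 */
public class StructuralColourer {

    enum Kind { FIELD, CONSTRUCTOR, METHOD, INNER_CLASS }

    enum Visibility { PUBLIC, PRIVATE, OTHER }

    // A class member as a hypothetical parser might report it. Line numbers are
    // 1-based and inclusive; the signature runs from firstLine to signatureEndLine.
    record Member(Kind kind, Visibility visibility, int firstLine, int signatureEndLine, int lastLine) {}

    // The styling decision for one source line.
    record LineStyle(int line, String background, String sideBar, boolean highlightSignature) {}

    // Coloured bar in the margin: one colour per kind of member (illustrative values).
    static String sideBar(Kind kind) {
        return switch (kind) {
            case FIELD -> "orange";
            case CONSTRUCTOR -> "green";
            case METHOD -> "blue";
            case INNER_CLASS -> "purple";
        };
    }

    // Background tint: public and private members look different,
    // and fields look different from methods and constructors.
    static String background(Member m) {
        return switch (m.visibility()) {
            case PUBLIC -> m.kind() == Kind.FIELD ? "light yellow" : "white";
            case PRIVATE -> m.kind() == Kind.FIELD ? "pale orange" : "light grey";
            case OTHER -> "very light grey";
        };
    }

    // Every line of a member inherits that member's style; header lines of
    // constructors, methods and inner classes are marked as signature lines.
    static List<LineStyle> style(List<Member> members) {
        List<LineStyle> styles = new ArrayList<>();
        for (Member m : members) {
            for (int line = m.firstLine(); line <= m.lastLine(); line++) {
                boolean inSignature = m.kind() != Kind.FIELD && line <= m.signatureEndLine();
                styles.add(new LineStyle(line, background(m), sideBar(m.kind()), inSignature));
            }
        }
        return styles;
    }

    public static void main(String[] args) {
        List<Member> members = List.of(
                new Member(Kind.FIELD, Visibility.PRIVATE, 3, 3, 3),
                new Member(Kind.METHOD, Visibility.PUBLIC, 5, 5, 8));
        for (LineStyle s : style(members)) {
            System.out.println(s);
        }
    }
}
```

No individual token is consulted here: every decision is made per member, so a line takes its appearance from the structure it belongs to.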

In addition to colouring these main structural aspects, two other things might be helpful: I like the different colouring of strings, as it makes some methods more readable, and it might be good to retain some colour for reserved words in method bodies to distinguish them from identifiers.

The next question then is: what should it look like?

The obvious mechanism that is usually employed is font colour. This will remain an important aspect.

Another possibility would be typeface styles (bold, italics) or different typefaces. I have not used these for now. Styles such as bold are not supported well by all fonts on all systems, and I fear that we might run into more technical problems than it is worth. So I leave this as an open question for now.

But another possible, and underused, technique is background colouring and other graphical annotation. To give you an idea of what I mean, consider the following example.

Structural syntax colouring – an example

[Figure: mock-up of source code with structural syntax colouring]

Here, I have used background colour to make method definitions stand out. Coloured bars on the right distinguish field definitions, constructors, and methods. Further, public and private methods should be distinguished (not shown here).

Method signatures are highlighted, and public and private field definitions are distinguished.

This is just a quick screenshot mock-up to demonstrate an example. The actual colour values are debatable, and can no doubt be improved. I also think that people can come up with other ways to highlight logical code structure. But you get the idea, I assume.

Let’s stop letting technical incidentals dictate the appearance, and make it useful for the reader!

4 thoughts on “What’s wrong with syntax colouring?”

  1. I do agree with your new view of syntax colouring,
    but I think you could use bold style to highlight
    all “invented” identifiers.

    I don’t think this would be a big problem if for
    some face, we couldn’t distinguish bold/unbold
    so well.

    In your example, only Person, name, yearOfBirth,
    size, and getYearOfBirth would be in bold style.
    toString wouldn’t.

    Oh, and I like very much your idea of background colors, especially the 4 white rectangles.

    I hope to see these ideas soon in BlueJ 3.0 …

    Thanks.

  2. I agree that the precise look (colours and font weights) can be improved a lot. And I agree that using bold fonts can help.

    I did not mean to suggest that this example is exactly how it should look. It’s more the principles I tried to point out.

    Will this go into BlueJ 3.0? Maybe. It might make a nice student project…

  3. I like the ideas presented regarding redefining the way code is colored. I think that your example code is great (not perfect, as you suggest, it will need adjusting). Your point that concepts, not syntax, should be colored is exactly correct, especially when trying to teach new students. I also agree with the previous comment regarding trying to distinguish between user defined identifiers and system defined identifiers (whether part of the language or common usage: e.g., toString() and even the String class name should look different than MailItem class name, mailItem field, or readMailItem() method). Perhaps, user defined class names in bold face and member names (data and methods) in italic face.

    In any case, particulars aside, let’s keep the discussion going. Wouldn’t it be great if we could devise a methodology that the rest of the industry would follow in the future?

  4. Are you familiar with Michael Van De Vanter’s work?

    http://research.sun.com/people/mlvdv/

    Some of his early work on Pan has probably been continued in the Jackpot project:

    http://portal.acm.org/citation.cfm?id=122804&coll=portal&dl=ACM&CFID=779810&CFTOKEN=42870153

    Furthermore, I would posit that all the coloring in the world might only serve to confuse, rather than aid, a user. Color is a powerful tool, poorly understood, and perceived differently by many people. Perhaps simply having a powerful editor capable of flagging errors (when and where they come up, à la Eclipse) is worth more than all the colors in the rainbow?

    While I’m on the topic, Ron Baecker’s work might be of interest as well:

    http://citeseer.ist.psu.edu/context/98445/0

    He deals with the typography and typesetting of programs, and the work is 20 years old. However, I think very little of his work has actually impacted development environments in use today.
