Syntax colouring – the annotation of source code with different colours for keywords and other syntactic tokens – has become standard in just about all development environments. Yet, it often does not make sense.
I don’t mean to say that it does not make sense at all to use colour to annotate source code. On the contrary. What I am saying is that almost all syntax colouring systems colour the wrong things.
Let’s have a look at what syntax colouring systems typically do, why it is not useful, and what they should be doing instead.
To get started, let’s have a look at what the syntax colouring looks like in BlueJ today. BlueJ’s colouring is quite typical compared to other systems, so most of what we’re discussing here applies equally to any other environment.
Example 1 – Syntax colouring today
The following is a bit of source code as presented in BlueJ’s editor:
As we can see, the syntax highlighter distinguishes several classes of tokens:
• javadoc comments – blue
• other comments – in a colour so similar to that of the javadoc comments that the two are hard to distinguish
• strings – green
• keyword class 1 (abstract, private, public, if, else, etc.) – dark red
• keyword class 2 (class, int, and a few others) – bright red
• keyword class 3 (super, null, this, etc.) – light blue
Reflection – what does syntax colouring do?
The general problem with this is that the distinction is based on lexical considerations, rather than structural reasoning. I will use the terms lexical syntax colouring for what is currently done and structural syntax colouring for what it should be.
Lexical syntax colouring is based on information directly derived from a lexical analyser. The source character stream is divided into tokens, and those tokens are sorted into classes. Colour is then assigned to classes of tokens.
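To make the three steps concrete, here is a minimal sketch in Java of such a lexical colourer. It is not BlueJ's actual implementation: the class name `LexicalColourer`, the simplified tokeniser, and the "default" colour for everything else are my assumptions for illustration. The keyword classes and colours are taken from the list in Example 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a lexical colourer; not BlueJ's real code.
class LexicalColourer {
    // Keyword classes as listed in Example 1 (only a few members of each).
    static final Set<String> CLASS_1 = new HashSet<>(Arrays.asList(
            "abstract", "private", "public", "if", "else", "return"));
    static final Set<String> CLASS_2 = new HashSet<>(Arrays.asList("class", "int"));
    static final Set<String> CLASS_3 = new HashSet<>(Arrays.asList("super", "null", "this"));

    // Simplified tokeniser (assumption): a string literal without escapes,
    // a word, or any other single non-space character. Comments are ignored.
    static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\w+|\\S");

    // Step 2 and 3: sort a token into a class and return that class's colour.
    static String colourOf(String token) {
        if (token.startsWith("\"")) return "green";
        if (CLASS_1.contains(token)) return "dark red";
        if (CLASS_2.contains(token)) return "bright red";
        if (CLASS_3.contains(token)) return "light blue";
        return "default"; // identifiers and punctuation (assumed uncoloured)
    }

    // Step 1: split the character stream into tokens, then colour each one.
    static List<String> colour(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            result.add(m.group() + ":" + colourOf(m.group()));
        }
        return result;
    }
}
```

With this sketch, `colour("private int count;")` marks `int` as bright red, while `colour("private String name;")` leaves `String` in the default colour, simply because `String` is not a reserved word. Nothing in the code knows that both lines are field declarations.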
The problem with this is that some of the lexical token classes do little to make the source code more readable for a programmer.
Why, for example, is the word ‘int’ coloured red in the field declarations, but ‘String’ is not? Why does ‘abstract’ have a different colour from ‘class’? Why is ‘public’ in the method header the same colour as ‘return’ in its body – are they somehow related? Why is ‘super’ different?
We should approach this question by asking ourselves what the purpose of syntax colouring is.
In my view, the purpose of syntax colouring is to make source code more readable to programmers. It can achieve this by making the code structure more obvious.
Does the colouring above do this? Well, partly yes, partly no.
The one aspect where this succeeds in the above example is the distinct presentation of javadoc comments. This helps a lot in separating the comment from the implementation, and also in separating methods from each other. But this is rather coincidental: it works because a lexical token happens to coincide with a logical structure.
The colouring of different keyword classes, on the other hand, is not especially helpful.
What should syntax colouring do?
Structural syntax colouring would use colour much more explicitly to emphasise the logical structure of the code, instead of the lexical structure. Interesting structural tasks are:
• separating interface from implementation
• separating comments from code
• separating elements of a class (fields, constructors, methods)
Highlighting javadoc comments should be retained, since this is the one thing that is very helpful in the current scheme. In addition to this, we should:
• highlight the class header and the complete method signature
Finding the signature of a method is important; a programmer has to do this often.
Signatures should stand out from the rest of the text.
• highlight field definitions
Field definitions are a separate logical entity that should be easily identifiable.
• distinguish between private and public fields
Private and public fields serve different purposes and have important differences
in characteristics. This should be pointed out by a different appearance.
• highlight control structures
The control structure highlighting should make it easy to recognise the parts
belonging to the control structure, and the embedded block. Highlighting
should colour the control structure keyword and the associated parentheses.
• separate methods, constructors, field definitions and inner classes
The most obvious structure of a class is the division into its members. This
should be immediately visually obvious.
In addition to colouring these main structural aspects, there are two other things that might be helpful: I like the different colouring of strings, as it makes some methods more readable, and it might be good to retain some colour for reserved words in method bodies to distinguish them from identifiers.
The next question then is: what should it look like?
The obvious mechanism that is usually employed is font colour. This will remain an important aspect.
Another possibility would be typeface styles (bold, italics) or different faces. I have not used these for now. Styles such as bold are not supported well for all fonts on all systems, and I fear that we might run into more technical problems than it is worth. So I leave this as an open question for now.
But another possible, and underused, technique is background colouring and other graphical annotation. To give you an idea of what I mean, consider the following example.
Structural syntax colouring – an example
Here, I have used background colour to make method definitions stand out. Coloured bars on the right distinguish field definitions, constructors, and methods. Further, public and private methods should be distinguished (not shown here).
Method signatures are highlighted, and public and private field definitions are distinguished.
This is just a quick screenshot mock-up to demonstrate an example. The actual colour values are debatable, and can no doubt be improved. I also think that people can come up with other ways to highlight logical code structure. But you get the idea, I assume.
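For those who prefer code to pictures, here is a rough Java sketch of the same idea. It assumes a parser has already reported each class member with its kind, visibility and line range. The names (`StructuralColourer`, `Member`, `decorate`) and all colour values are hypothetical placeholders of mine, not taken from the mock-up or from BlueJ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and colour values are illustrative assumptions.
class StructuralColourer {
    enum Kind { FIELD, CONSTRUCTOR, METHOD }

    // One class member as a parser might report it (1-based, inclusive lines).
    static final class Member {
        final Kind kind;
        final boolean isPublic;
        final int signatureLine;
        final int lastLine;

        Member(Kind kind, boolean isPublic, int signatureLine, int lastLine) {
            this.kind = kind;
            this.isPublic = isPublic;
            this.signatureLine = signatureLine;
            this.lastLine = lastLine;
        }
    }

    // The side bar distinguishes fields, constructors and methods.
    static String barColour(Kind kind) {
        switch (kind) {
            case FIELD: return "orange";
            case CONSTRUCTOR: return "green";
            default: return "blue";
        }
    }

    // The background depends on the structural role of the line,
    // not on which tokens happen to appear on it.
    static String background(Member m, int line) {
        if (m.kind == Kind.FIELD) {
            return m.isPublic ? "light red" : "light yellow";
        }
        return line == m.signatureLine ? "strong tint" : "pale tint";
    }

    // Returns one "bar/background" entry per source line; "" means undecorated.
    static List<String> decorate(int lineCount, List<Member> members) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < lineCount; i++) {
            lines.add("");
        }
        for (Member m : members) {
            for (int line = m.signatureLine; line <= m.lastLine; line++) {
                lines.set(line - 1, barColour(m.kind) + "/" + background(m, line));
            }
        }
        return lines;
    }
}
```

The point of the sketch is the lookup key: the decoration is chosen from what a line is (a private field, a method signature, a method body), so an `int` field and a `String` field are treated identically. Distinguishing public from private methods would be one more case in `background`.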
Let’s stop letting technical incidentals dictate the appearance, and make it useful for the reader!