{"id":10,"date":"2006-07-01T11:29:02","date_gmt":"2006-07-01T10:29:02","guid":{"rendered":"http:\/\/www.bluej.org\/mrt\/?p=4"},"modified":"2020-05-15T15:46:31","modified_gmt":"2020-05-15T15:46:31","slug":"whats-wrong-with-syntax-colouring","status":"publish","type":"post","link":"https:\/\/blogs.kcl.ac.uk\/proged\/2006\/07\/01\/whats-wrong-with-syntax-colouring\/","title":{"rendered":"What&#8217;s wrong with syntax colouring?"},"content":{"rendered":"<p>Syntax colouring \u2013 the annotation of source code with different colours for keywords and other syntactic tokens \u2013 has become standard in just about all development environments. Yet, it often does not make sense.<\/p>\n<p>I don&#8217;t mean to say that it makes no sense at all to use colour to annotate source code. On the contrary. What I am saying is that almost all syntax colouring systems colour the wrong things.<\/p>\n<p><!--more--><br \/>\nLet&#8217;s look at what syntax colouring systems typically do, why this is not useful, and what they should be doing instead.<\/p>\n<p>To get started, let&#8217;s examine the syntax colouring in BlueJ today. 
BlueJ&#8217;s colouring is typical of most systems, so much of what we discuss here applies equally to other environments.<\/p>\n<p><span style=\"color:#808080;font-size:14pt\">Example 1 \u2013 Syntax colouring today<\/span><br \/>\n<br \/>\nThe following is a piece of source code as presented in BlueJ&#8217;s editor:<br \/>\n<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.bluej.org\/mrt\/wp-content\/uploads\/2006\/07\/_Users_mik_Sites_iblog_B2002791029_C1916402003_E2024643120_Media_code-current.gif\" border=\"1\" alt=\"syntax coloured code\" hspace=\"4\" vspace=\"4\" width=\"602\" height=\"742\" \/><\/p>\n<p>As we can see, the syntax highlighter distinguishes several classes of tokens:<\/p>\n<p>\u2022 <strong>javadoc comments<\/strong> \u2013 blue<br \/>\n\u2022 <strong>other comments<\/strong> \u2013 a colour so similar to that of javadoc comments that the two are hard to tell apart<br \/>\n\u2022 <strong>strings<\/strong> \u2013 green<br \/>\n\u2022 <strong>keyword class 1<\/strong> (abstract, private, public, if, else, <em>etc<\/em>.) \u2013 dark red<br \/>\n\u2022 <strong>keyword class 2<\/strong> (class, int, and a few others) \u2013 bright red<br \/>\n\u2022 <strong>keyword class 3<\/strong> (super, null, this, <em>etc<\/em>.) \u2013 light blue<\/p>\n<p><span style=\"color:#808080;font-size:14pt\">Reflection \u2013 what does syntax colouring do?<\/span><br \/>\n<br \/>\nThe general problem with this is that the distinction is based on lexical considerations rather than on structural reasoning. 
I will use the terms <strong><em>lexical syntax colouring<\/em><\/strong> for what is currently done, and <strong><em>structural syntax colouring<\/em><\/strong> for what should be done instead.<\/p>\n<p>Lexical syntax colouring is based on information derived directly from a lexical analyser. The source character stream is divided into tokens, those tokens are sorted into classes, and a colour is then assigned to each class of tokens.<\/p>\n<p>The problem with this is that some of the lexical token classes are of little value in making source code more readable for a programmer.<\/p>\n<p>Why, for example, is the word &#8216;int&#8217; coloured red in the field declarations, while &#8216;String&#8217; is not? Why does &#8216;abstract&#8217; have a different colour from &#8216;class&#8217;? Why is &#8216;public&#8217; in the method header the same colour as &#8216;return&#8217; in its body \u2013 are they somehow related? Why is &#8216;super&#8217; different?<\/p>\n<p>We should approach these questions by asking ourselves what the purpose of syntax colouring is.<\/p>\n<p>In my view, the purpose of syntax colouring is to make source code more readable to programmers. It can achieve this by making the code structure more obvious.<\/p>\n<p>Does the colouring above do this? Well, partly yes, partly no.<\/p>\n<p>The one aspect in which the example above succeeds is the distinct presentation of javadoc comments. This helps a lot in separating comments from the implementation, and also in separating methods from each other. 
But this is rather coincidental: it works because a lexical token happens to coincide with a logical structure.<\/p>\n<p>The colouring of different keyword classes, for example, is not especially helpful.<\/p>\n<p><span style=\"color:#808080;font-size:14pt\">What should syntax colouring do?<\/span><br \/>\n<br \/>\nStructural syntax colouring would use colour much more explicitly to emphasise the logical structure of the code rather than its lexical structure. Useful structural tasks include:<\/p>\n<p>\u2022 separating interface from implementation<br \/>\n\u2022 separating comments from code<br \/>\n\u2022 separating the elements of a class (fields, constructors, methods)<\/p>\n<p>Javadoc comment highlighting should be retained, since it is the one really helpful element of the current scheme. In addition, we should:<\/p>\n<p>\u2022 highlight the class header and the complete method signature<\/p>\n<p><span style=\"color:#808080\"><em>Finding the signature of a method is important; a programmer has to do this often. Signatures should stand out from the rest of the text.<\/em><\/span><\/p>\n<p>\u2022 highlight field definitions<\/p>\n<p><span style=\"color:#808080\"><em>Field definitions form a separate logical entity that should be easy to identify.<\/em><\/span><\/p>\n<p>\u2022 distinguish between private and public fields<\/p>\n<p><span style=\"color:#808080\"><em>Private and public fields serve different purposes and differ in important characteristics. 
This should be reflected in a different appearance.<\/em><\/span><\/p>\n<p>\u2022 highlight control structures<\/p>\n<p><span style=\"color:#808080\"><em>Control structure highlighting should make it easy to recognise the parts belonging to the control structure and the embedded block. Highlighting should colour the control structure keyword and its associated parentheses.<\/em><\/span><\/p>\n<p>\u2022 separate methods, constructors, field definitions and inner classes<\/p>\n<p><span style=\"color:#808080\"><em>The most obvious structure of a class is its division into members. This should be immediately visible.<\/em><\/span><\/p>\n<p>In addition to colouring these main structural aspects, two other things might be helpful: I like the distinct colouring of strings, which makes some methods more readable, and it might be good to retain some colour for reserved words in method bodies to distinguish them from identifiers.<\/p>\n<p>The next question, then, is: what should it look like?<\/p>\n<p>The obvious and most commonly used mechanism is font colour. This will remain important.<\/p>\n<p>Another possibility would be typeface styles (bold, italics) or different typefaces. I have not used these for now: styles such as bold are not well supported for all fonts on all systems, and I fear that we might run into technical problems that are more trouble than they are worth. So I leave this as an open question for now.<\/p>\n<p>Another possible, and underused, technique is background colouring and other graphical annotation. 
To give you an idea of what I mean, consider the following example.<\/p>\n<p><span style=\"color:#808080;font-size:14pt\">Structural syntax colouring \u2013 an example<\/span><br \/>\n<br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.bluej.org\/mrt\/wp-content\/uploads\/2006\/07\/_Users_mik_Sites_iblog_B2002791029_C1916402003_E2024643120_Media_code-new.gif\" border=\"1\" alt=\"new syntax coloured code\" hspace=\"4\" vspace=\"4\" width=\"621\" height=\"771\" \/><br \/>\n<br \/>\nHere, I have used background colour to make method definitions stand out. Coloured bars on the right distinguish field definitions, constructors, and methods. Public and private methods should also be distinguished (not shown here).<\/p>\n<p>Method signatures are highlighted, and public and private field definitions are distinguished.<\/p>\n<p>This is just a quick mock-up screenshot to illustrate the idea. The actual colour values are debatable and can no doubt be improved. I also expect that people can come up with other ways to highlight logical code structure. But I assume you get the idea.<\/p>\n<p>Let&#8217;s stop letting technical incidentals dictate the appearance, and make colouring useful for the reader!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Syntax colouring \u2013 the annotation of source code with different colours for keywords and other syntactic tokens \u2013 has become standard in just about all development environments. Yet, it often does not make sense. 
I don&#8217;t mean to say that &hellip; <a href=\"https:\/\/blogs.kcl.ac.uk\/proged\/2006\/07\/01\/whats-wrong-with-syntax-colouring\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":179,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8,11,13],"tags":[25,42],"class_list":["post-10","post","type-post","status-publish","format-standard","hentry","category-programming","category-software-tools","category-teaching","tag-bluej-software-tools","tag-design"],"_links":{"self":[{"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/posts\/10","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/users\/179"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/comments?post=10"}],"version-history":[{"count":1,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/posts\/10\/revisions"}],"predecessor-version":[{"id":1077,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/posts\/10\/revisions\/1077"}],"wp:attachment":[{"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/media?parent=10"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/categories?post=10"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kcl.ac.uk\/proged\/wp-json\/wp\/v2\/tags?post=10"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}