Many people are well aware that teaching programming is still quite difficult. Ours is a young discipline, and pedagogical principles are not generally well established. Many people teach programming, and how they do it is usually based on gut feeling. There are many grey areas where we just don’t know what works and what doesn’t, why something works, or (most importantly) how to improve our teaching.
For example, students in programming classes often fall into a bimodal distribution: some learn programming quite easily, and some really struggle to “get it”. There are many theories about why this may be, and a good number of studies, but nobody really knows the cause.
It’s similar for the design of educational tools and environments: do we really know which aspects have an effect and which don’t? No, we don’t. Our discipline is only at the beginning of understanding how people learn to program.
There has been work in this area for some time. Many people have studied data about early programming interaction. Getting your hands on this data can be hard work. Often, researchers collect data (either interaction data from the computer system, or interview or observation data), and then evaluate it. If the class is small, it is sometimes hard to be sure how much the results can be generalised. Collecting larger data sets, however, is hard, because most teachers have access only to their own students.
In our BlueJ project group, we discussed some time ago that we are in a fairly unique position to be able to gather data. BlueJ has a large user community, and there is potential to make use of this to further our work: not only BlueJ development specifically, but programming education research in general.
So, some time last year we decided to initiate a new project: Project Blackbox.
The Blackbox idea is to collect data about the way beginners interact with BlueJ, and to make this data available to any interested research group to conduct their own studies with it.
For BlueJ users, this would only happen with explicit consent (opt-in) even though the data collected will be entirely anonymous. For researchers, we hope that this may create a treasure trove of data that might spark research that was not previously possible.
BlueJ is currently being downloaded over 2 million times a year, and has over 200,000 active users every month. Even if only 10% of users were to opt in to our project, we are still looking at hundreds of thousands of sessions per month, generating millions of interaction events.
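To make that back-of-envelope estimate concrete, here is a rough sketch of the arithmetic. Only the active-user figure and the 10% opt-in scenario come from the numbers above; the sessions per user and events per session are hypothetical values picked purely for illustration, not measurements.

```python
# Back-of-envelope estimate of monthly Blackbox data volume.

ACTIVE_USERS_PER_MONTH = 200_000  # figure quoted in the post
OPT_IN_RATE = 0.10                # the "only 10%" scenario from the post

# Hypothetical assumptions, for illustration only (not measured values):
SESSIONS_PER_USER_PER_MONTH = 10
EVENTS_PER_SESSION = 50


def estimate_monthly_volume(active_users, opt_in_rate,
                            sessions_per_user, events_per_session):
    """Return (opted-in users, sessions, events) per month."""
    users = round(active_users * opt_in_rate)
    sessions = users * sessions_per_user
    events = sessions * events_per_session
    return users, sessions, events


users, sessions, events = estimate_monthly_volume(
    ACTIVE_USERS_PER_MONTH,
    OPT_IN_RATE,
    SESSIONS_PER_USER_PER_MONTH,
    EVENTS_PER_SESSION,
)
print(f"{users:,} users, {sessions:,} sessions, {events:,} events per month")
```

With 20,000 opted-in users, the "hundreds of thousands of sessions" figure holds as long as each of them starts at least five sessions a month on average; the real numbers will only be known once data collection starts.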
We presented this idea at a special session at the last SIGCSE conference (session abstract, subscription needed), and several people expressed an interest.
So, we have now started on the design and implementation of this system, and I will occasionally give you an update here on my blog. If you would like to keep a closer eye on it or get involved in the design discussions, you are welcome to join our mailing list for the Blackbox project.