The terrible state of MCQs

In academia – as in many other contexts – Multiple Choice Questions (MCQs) are a fact of life. We have many situations in which we need to assess a large cohort of learners, and MCQs make this easier.

Here, I don’t want to debate the value of MCQs as an examination technique in general, but talk about how they are written. If we accept that they are part of our work, let’s at least make them not terrible.

There are three very common ways in which MCQs can be obviously bad. I see them all the time.

They are:

  1. The question only examines surface knowledge
  2. The distractors are random
  3. The off-by-one way of writing distractors

Let’s look at each briefly.

The question only examines surface knowledge

This is quite an easy and obvious shortcoming to recognise: Often, MCQs just ask questions about terminology. They do not touch on concepts or deeper understanding. This is a very common trap to fall into for people who are not used to writing MCQs. Writing MCQs that test understanding is harder, and it’s easy to slip into the trap of just testing surface issues.

We have probably all seen it.

When you write your questions, examine each question critically in this light. Strive for meaningful questions.

The distractors are random

Another very common gripe of mine is that the distractors (the incorrect options in MCQs) are chosen at random. Let’s consider an example. Assume we have this question:

Question 54:

What is the output of the following segment of code?

        int sum = 0;   
        for (int i=0; i<=5; i+=2) {
            sum = sum + i;
        }
        System.out.println(sum);

If you speak Java, then you can work out that the answer is 6. So we use 6 as one of the possible options, and add a few random distractors:

Answer:

(A) 1
(B) 896
(C) 6
(D) -4
(E) 5

In this case, if students have a common misconception, they may arrive at an answer that is not on the list. For example, a student who does not understand that this definition makes the loop increment by 2 might assume that the answer is 15. But 15 is not on the list. So the student is implicitly told that they are wrong, and they can discard unlikely options (“Negative? – No.” “More then 800? – No.”) and make a random guess between a few remaining options.

This question is even worse in a way: Another common mistake is not being able to correctly distinguish between <= and < in the loop condition, and getting the answer wrong because of this. The way the question is written, however, it is not even possible to expose this misconception, because both variants happen to result in the same output!

A badly written question.

The off-by-one way of writing distractors

The other very common pattern I see all the time has an example in this question.

Question 31:

 Which of the following code segment would execute the stored procedure “getPassword()” located in a database server?

  1. CallableStatement cs = con.prepareCall("{call.getPassword()}");
    cs.executeQuery();
  2. CallabledStatement callable = con.prepareCall("{call getPassword()}");
    callable.executeUpdate();
  3. CallableStatement cab = con.prepareCall("{call getPassword()}");
    cab.executeQuery();
  4. Callablestatement cstate = con.prepareCall("{call getpassword()}");
    cstate.executeQuery();

The way questions like this are written is usually this: The question setter writes the correct response, and then creates three distractors by introducing one error in each.

It is trivial to reverse-engineer the correct answer from this. To do this, we spot the differences, and for each differences discard the odd one out. For example:

  • One difference is that “{call getpassword()}” is written with a dot in the middle once, but without the dot three times. So the dot is the odd one out, we discard Answer 1.
  • Another difference: CallableStatement is written with w capital S three times, with lowercase s once. So we discard answer 4.
  • Last one: The second line calls executeQuery() three times, and executeUpdate() once. Answer 2 is out.

And voilà – there is our answer: 3. We can do this without any knowledge or understanding of the subject matter. Keep your eyes open for this pattern in MCQ tests. It is astonishing how often you will see this question pattern.

A better example

Here is an example of something you could do:

Question 8:

HashMap<Integer, String> myMap = new HashMap<>();

myMap.put(2, "Tiger");
myMap.put(4, "Lion");
myMap.put(13, "Lion");
myMap.put(2, "Elephant");
myMap.put(3, "Chicken");

System.out.println(myMap.get(2));

When you try to compile and run this code, what will happen?

    1. The code has an error, because the value “Lion” is used twice in the HashMap.
    2. The code has an error, because the key 2 is used twice in the HashMap.
    3. The code will run and print “Tiger”.
    4. The code will run and print “Lion”.
    5. The code will run and print “Elephant”.

 In this case, the question tests an understanding of how maps work.

  • Option 1 tests the possible misconception that values cannot appear twice in a map.
  • Keys, on the other hand, are unique in a map. Option 2 tests the misconception that they cannot be replaced by a new entry.
  • Option 3 tests the misconception that the first entry for a given key persists.
  • Option 4 tests the misconception that the get(n)-parameter is an index, mirroring list-behaviour (referring to either the second or third element entered, depending whether the list is zero-based or one-based).
  • And Option 5 is correct.

This is by far not all that there is to say about MCQ writing. But it’s a start. It is the absolute minimum you should consider if ever you find yourself in the position to write and MCQ test.

A Light In The Darkness

lightComputing education in schools is in a dire state—we have known that for some time now.

Over the last couple of years, reports came out thick and fast stating that school age students find computing boring, that fewer students are taking computing classes, that offerings of computer science in schools is declining, and that this is  leading to a severe skills shortage. All seemed to indicate that students are really not interested in programming anymore.

Then today I stumbled upon this report of a survey of UK GCSE school students. (For the non-Brits: the GCSE is a set of tests that students take at the age of about 16.) And the result: Many students want more programming!

Quote:

Most students (63 per cent) said they would have liked more subjects to choose from. Topping the wish list for school based learning is computer programming voted for by almost a quarter (22 per cent).

So it’s not the students who are losing interest at all. It is just that computing in schools has gone so off track that it is actively turning the kids away. Lets go out and get programming back into schools!

Quality-oriented teaching of programming

One of the great mysteries of introductory programming education is what’s commonly known among computing education practitioners as the double hump.

The double hump refers to the marks distribution in a typical introductory programming course. Here is an example:

chart1

Figure 1: The double hump distribution of marks in a programming course


This graph shows a typical distribution of final marks in a first programming course. We have a whole bunch of students getting very good marks and a lot of them failing. Mark Guzdial, in a blog post, calls it the 20% Rule:

“In every introduction to programming course, 20% of the students just get it effortlessly — you could lock them in a dimly lit closet with a reference manual, and they’d still figure out how to program. 20% of the class never seems to get it.”

The surprising thing about this distribution is how constant it seems to be. It has been observed in programming courses all over the world, largely independent of geographical or social context, and over a long period of time: The same pattern that we observe now existed 10 years ago, and 20 years ago, and 30 years ago.

(And an ironic side note is that we as teachers tend to react to this by aiming our teaching at the medium level—thus teaching to a group that contains hardly any students at all…)

So what is this telling us?

For programming teachers, this poses some interesting questions and challenges. One conclusion has been that the ability to understand programming is somehow intrinsic. That there are people who are just good at it, and others how simply cannot get it. This has led to a whole lot of research about programming ability predictors: ways in which we can predict, by looking at people’s prior activities, performance, social context, or any other aspect of their lives, or with a hopefully simple test, whether or not they will be successful in learning programming before we make them go through it.

Some people really believe in these predictors, some do not. I am generally in the second category. Let’s say, I am at least highly sceptical. The first paper linked above, for example, draws what I believe to be severely invalid conclusions. It interprets the data in a way that the data just does not support. But that’s a story for another day.

What I really want to talk about is an idea that I picked up from a fascinating seminar that Anthony Robins gave a few months ago in our department: That the cause of the high failure rate that we are observing might not be any kind of intrinsic capability, but caused by the sequential nature of the material we are teaching.

What is going on might be this: The material in programming courses is highly hierarchical. Every topic strongly builds on the topics previously covered. Thus, if you don’t understand one section of the course, you will likely also struggle in the following sections, unless you spend extra time to catch up and make up for what you missed before.

Therefore, programming courses are self-amplifying systems. If you fall behind a bit towards the beginning, for whatever reason, you are likely to fall more and more behind in the following weeks and months. If you get ahead early, you are likely to stay ahead, or even move further ahead. The short of it is that this dependancy of performance on prior performance in the course will automatically divide the class into two distinct groups whose performance drifts further apart as time goes on.

Voilá, the double hump is born.

Anthony, in his seminar, made some very interesting points that provided strong indicators that this might be what’s going on. He is, as far as I know, in the process of publishing these findings, so I won’t go into too much detail here. (I will update this post with a link once his paper is out.) We can leave discussion of whether we belive this or not for another day.

For today, I’ll simply state that I find it entirely plausible that this is at the root of the problem. So for now, I will work with the premise that the double hump is not caused by some intrinsic personal capability, but by the strong hierarchical nature of the material we teach.

The question that I asked myself then is: What should we do about it?

My conclusion is that we need to shift from quantity-oriented teaching to quality-oriented teaching. Here is what I mean.

In a typical programming course at a university today, we may have a number of assessments. (For the purpose of this argument it does not matter what that number is—it might be a smaller number of large assessments or a larger number of small ones.) Students get marked on every assessment, and they pass the course if their average grade is above some defined pass level. Figure 2 shows a student in this scheme.

chart2

Figure 2: A good student attempts six assessments and passes most of them

In this example, there are six assessments and the student is doing okay: The average mark is above the pass level. The pattern for a weak student might look like Figure 3: The student also attempts six assessments, but mostly fails.

chart3
Figure 3: A weak student attempts six assessments and mostly fails

And this is a problem. It is not only a problem that the student is failing, it is a failure to teach sensibly if we believe in the hierarchical nature of our material. Once the student severely fails Assessment 2, there is really no point to let him/her move on to Assessment 3. In fact, it’s ridiculous. We are essentially saying: “We have just established that you did not understand concept X, so now go on to study concept Y, which is harder and builds on concept X.”

Now, that’s just plainly stupid.

We have essentially demonstrated that you have no chance to move to something harder, and then make you move on to something harder. Without giving you a break to catch up. The problem is: This is exactly what we usually do.

So what can we do about it? The solution might be to re-orient the pass line in our diagram (Figure 4).

chart4
Figure 4: Re-thinking the pass level

Typically, in our courses, the x-axis (number of assignment students do) is fixed, while the y-axis (quality of submissions) is variable. And we define success as reaching a given level of quality (see Figure 3).

We should turn that around. We should fix the y-axis and make the x-axis variable, thus re-defining success as successfully completing a certain number of assessments. This means that every student has to achieve a defined minimum level of quality in each assessment before they are allowed to move on to the next assessment. The differentiation between students then is not what quality level they have achieved, but how many stages of assessment they have managed to complete. In a diagram, it looks like this: Figure 5 shows a good student at the end of the course. Nine assessments have been completed, beyond the pass level of assessment 6.

chart5
Figure 5: A good student completing enough assessments to pass

The graph for a struggling student then looks like this (Figure 6).

chart6
Figure 6: A a struggling students failing to reach pass level

This student has not reached pass level and fails the course. Note, however, the difference here to our student in Figure 3: There are no assessments recorded below the minimum accepted quality level. (Every submission attempt with insufficient quality is simply judged as “not completed”.)

I don’t believe that this scheme will solve all our problems, but I do believe that it has a number of advantages:

  • Students may have a higher chance of success. Instead of early failure leading almost automatically to a sequence of further failures, they have a chance to recover.
  • Students who learn at different speeds can all survive.
  • Especially good students can go even further than they did in the past, because we can allow them to move forward according to their own capabilities.
  • More flexibility in dealing with temporary problems: If a student misses the first three weeks of class, be it because of illness or any other reason, they may still pass the class instead of having no chance.
  • Even failing students learn something. Instead of failing to understand every topic (Figure 3), they can at least spend their time understanding a few topics (Figure 6).

There are a number of problems and challenges as well, of course. The obvious difficulty is that we must be able to allow different students to work on different material at the same time. Every student effectively progresses at their own personal pace, and the teaching must support this.

Is this organisationally possible? I think so.

It is not easy, and it requires us to severely change how we teach, but I think it’s worth it. It will require different form of instruction (away from big lectures to the whole class) and different form of assessment (away from giving everyone the same task and assessing it by just submitting source code). Otherwise there would be a problem with plagiarism. We might need to assess by interviewing students to actually test their understanding!

Does this create work? Sure. But I think it’s worth it. And I actually believe that, once the change is made, the workload for teachers will be comparable to what we have now.

In an ideal world, this way of learning would have the result that different students learn the same material in a different amount of time. In reality, we will not get away from fixed length courses for the time being.

This means, that—at the end of a course—different students will have studied different amounts of material. Some students who move on to the next course will have covered the advanced material, some have not.

There will be an objection from other teachers that we let students pass who have not seen all the material, and that they do not know some concepts that they should know.

While this is true, I don’t believe that this is any different from our current situation. Which teacher would really claim that all students who passed their course have understood all concepts they were teaching? If one of my current student just barely clears the pass mark, there is a lot that we have covered that they don’t know. Claiming that they know more material because I talked over their heads about more advanced concepts for longer is nonsense.

I would rather have a student who properly understands 50% of the concepts than a student who half understands (and half misunderstands!) everything.