Concept inventories for evaluating teaching

[Image: a sample Force Concept Inventory question]

This is an evaluation guide.

What is it?

A concept inventory is a standardised multiple choice instrument, developed collaboratively by subject experts, that tests students’ understanding of core concepts. It is used as a pre- and post-test to evaluate a given teaching strategy, activity or intervention. Many expert educators are surprised to discover that their students struggle with knowledge they thought they could take for granted, and this realisation has been a powerful stimulus to improve teaching.

Concept inventories came into existence at around the same time as the ‘Private Universe’ documentary (Harvard-Smithsonian Center for Astrophysics, 1987) highlighted a disconnect between assessment performance and underlying knowledge, revealing that graduating Harvard students had been able to pass their exams while lacking basic scientific knowledge. Although concept inventories look like multiple choice tests, they aren’t the same thing as assessment. They are not credit-bearing and students do not revise for them; instead they longitudinally test understanding of foundational concepts.

What can it tell me about teaching?

A concept inventory is designed to measure learning gain, defined as ‘“distance travelled” or learning acquired by students at two points of their academic career’ (Sands et al, 2018). Typically students take it before and after experiencing a given educational intervention, and the results are used to evaluate the effectiveness of that particular pedagogical practice.

How could I use it?

You can use concept inventories to evaluate any approach to teaching core concepts. Eric Mazur used the Force Concept Inventory to compare his standard teaching with the Peer Instruction method he pioneered (watch his video lecture from roughly 31 min and then 56 min). He identified a doubling of learning gain with Peer Instruction (from 8% to 16%).

The ‘Where can I find examples’ section below gives some indication of the range of inventories in use – 95 for physics on one site alone. Unsurprisingly given the multiple choice question (MCQ) format, concept inventories are prevalent in STEM subjects. However, wherever a test of objective knowledge is appropriate, they can be used in a range of disciplines.

How can I collect the data or evidence?

In their review of concept inventory practices, Madsen and colleagues (2017) provide helpful guidance on selecting a concept inventory. They recommend that students take the whole test (not excerpts), supervised in person during a class session, and observe that giving a small amount of credit for completing the tests improves participation without affecting scores. Giving feedback is also recommended; although some believe this compromises the security of the test, the test is no-stakes and consequently the incentives to revise or cheat are low.

For existing concept inventories you can use, see the section on examples below. If no existing inventory suits your purposes, you may be in a position to put in the time and effort to develop one. Kember and Ginns (2012, p98) set out five steps:

  1. A group of experts determine the small number of core concepts to be included, sometimes using the Delphi technique to generate consensus.
  2. Study and articulate how students come to understand these concepts. Experts may take the concepts so much for granted that they find it impossible to mentally return to a time when they didn’t understand them; for this reason they often draw on student interviews and responses to open-ended questions, keeping a note of misconceptions.
  3. Drawing on the misconceptions identified in the previous step, develop several multiple choice questions and distracters for each concept – totalling around 30 questions, since students should be able to complete the inventory in 30 minutes. Each question tests a single concept, so incorrect responses can be interpreted unambiguously.
  4. Pilot the inventory with large numbers of students and analyse the results to establish validity, reliability and fairness. Kember and Ginns (2012, p98) and Lindell and colleagues (2007) suggest statistical and other tests for each of these aspects; a sketch of some commonly used item statistics follows this list.
  5. Iterate until the inventory’s characteristics are suitable for widespread use.
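
To make step 4 more concrete, here is a minimal sketch, in Python with numpy, of some classical test theory statistics often computed when piloting an inventory: item difficulty (proportion correct), item discrimination (corrected item-total correlation) and Cronbach’s alpha for internal consistency. It is an illustration under the assumption of simple right/wrong scoring, not the specific procedure recommended by Kember and Ginns or Lindell and colleagues.

```python
# A minimal sketch of classical test theory statistics for piloting an inventory.
# Assumes a binary matrix `responses` with one row per student and one column
# per item (1 = correct, 0 = incorrect). Illustration only, not taken from the
# sources cited above.
import numpy as np

def item_statistics(responses: np.ndarray):
    """Return item difficulty, corrected item-total discrimination and Cronbach's alpha."""
    n_students, n_items = responses.shape

    # Difficulty: proportion of students answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Discrimination: correlation of each item with the total score on the
    # remaining items, so an item is not correlated with itself.
    totals = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
        for i in range(n_items)
    ])

    # Cronbach's alpha: internal consistency of the whole inventory.
    item_var = responses.var(axis=0, ddof=1)
    total_var = totals.var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1 - item_var.sum() / total_var)

    return difficulty, discrimination, alpha

# Demonstration with simulated data (200 students, 30 items); real pilot data
# is needed for the statistics to be meaningful.
rng = np.random.default_rng(0)
responses = (rng.random((200, 30)) < 0.6).astype(int)
difficulty, discrimination, alpha = item_statistics(responses)
print(difficulty.round(2), discrimination.round(2), round(alpha, 2))
```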

How can I interpret the data or evidence?

The aim of a concept inventory is to give insights about a given teaching strategy, activity or intervention.

At its simplest, analysis looks at the difference (eg raw gain, normalised gain, or – more commonly in social sciences – effect size) between the average pre-test and post-test scores, in order to associate that teaching strategy with changes in students’ conceptions and misconceptions. The teaching strategy can then be refined for the subsequent cohort and the concept inventory can be run again.
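
As a concrete illustration of these measures, the sketch below computes raw gain, Hake’s normalised gain and Cohen’s d from matched pre- and post-test percentage scores. Using a pooled standard deviation for the effect size is one common convention rather than a requirement of the sources cited here, and the sample scores are invented.

```python
# A minimal sketch of the three gain measures named above, assuming matched
# pre- and post-test scores expressed as percentages (one entry per student).
import numpy as np

def gain_measures(pre: np.ndarray, post: np.ndarray):
    raw_gain = post.mean() - pre.mean()
    # Hake's normalised gain: the fraction of the available improvement achieved.
    normalised_gain = raw_gain / (100 - pre.mean())
    # Effect size (Cohen's d) with a pooled standard deviation (one common choice).
    pooled_sd = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
    effect_size = raw_gain / pooled_sd
    return raw_gain, normalised_gain, effect_size

# Invented scores for five students, purely for demonstration.
pre = np.array([35.0, 40.0, 50.0, 45.0, 55.0])
post = np.array([55.0, 60.0, 70.0, 65.0, 80.0])
print(gain_measures(pre, post))
```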

To find out for whom the strategy worked well, analyse the results for differences in learning gain by characteristics such as gender, socio-economic background, race, disability and nationality. You could compare results between institutions. You could also compare the pre-test results over time – are they stable, or is students’ baseline knowledge changing?
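
One way to carry out such a comparison, sketched below in Python with pandas, is to compute a per-student normalised gain and average it within each subgroup. The column names ('pre', 'post', 'gender') and the values are hypothetical, included only to show the shape of the analysis; the same pattern applies to any characteristic recorded alongside the scores.

```python
# A minimal sketch of subgroup comparison of learning gain, with hypothetical data.
import pandas as pd

df = pd.DataFrame({
    "pre":    [30, 45, 50, 40, 35, 55],
    "post":   [60, 70, 65, 72, 50, 85],
    "gender": ["F", "M", "F", "M", "F", "M"],
})

# Per-student normalised gain, then averaged within each subgroup.
df["gain"] = (df["post"] - df["pre"]) / (100 - df["pre"])
print(df.groupby("gender")["gain"].agg(["mean", "count"]))
```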

What else should I know?

Students take these no-stakes tests seriously, according to Madsen and colleagues (2017). However, you may need to take measures to motivate their best efforts (Waters et al, 2019), explaining to them why the inventory exists and how it relates to both their learning and yours. Offering feedback may help here.

MCQ tests have well-known limitations. One is that they cannot differentiate between mistakes due to lack of knowledge and mistakes due to divergent conceptions which fall outside the canon of anticipated misconceptions. Because they are unanticipated, these alternative conceptions are not included as distracters and cannot be selected by students; yet, despite being objectively incorrect, they could shed light on teaching strategies. To address these limitations, Sands and colleagues (2018) are researching a format for physics in which students construct their own answers but marking is still automated, maintaining standardisation.

This doesn’t look imminent, so in the meantime questions may be tiered as follows to elicit students’ reasons for their responses. Tier 1 is the conventional propositional multiple choice question with its stem, correct option and distracters. Tier 2 is a further MCQ asking for the scientific reason behind the answer given in Tier 1. Tier 3 asks the student whether they are sure or unsure about their responses to Tiers 1 and 2. One example can be found in the supporting information to this article about testing understanding of carbohydrates. Another alternative, adopted by Price and colleagues (2014), is a binary approach where students are asked to agree or disagree with statements about a vignette; analysis is adapted to account for a higher rate of guessing.
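
Purely to illustrate this tiered structure, the sketch below represents a three-tier item as a small Python data class. The field names and the sample content are hypothetical and are not drawn from any published inventory.

```python
# A hypothetical representation of a three-tier item; field names and content
# are illustrative only.
from dataclasses import dataclass

@dataclass
class ThreeTierItem:
    stem: str               # Tier 1: the propositional question
    options: list[str]      # Tier 1: the correct option plus distracters
    reason_options: list[str]  # Tier 2: candidate scientific reasons for the Tier 1 answer
    confidence_options: tuple = ("sure", "unsure")  # Tier 3: certainty about Tiers 1 and 2

item = ThreeTierItem(
    stem="A ball is thrown straight up. At its highest point, the net force on it is...",
    options=["zero", "directed downward", "directed upward"],
    reason_options=[
        "gravity acts on the ball throughout its flight",
        "the force of the throw is used up at the top of the flight",
    ],
)
```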

There are other limitations which may particularly affect earlier concept inventories and which may be addressed by selecting carefully tested inventories (Lindell et al, 2007; Madsen et al, 2017) or developing new ones. One, identified by Laverty and Caballero (2018), is the emphasis on conceptual learning at the expense of attitudinal shifts and metacognition. Another is bias – for example, evidence of gender bias prompted adaptations to the Force Concept Inventory.

Where can I find examples?

Some inventories are published as appendices to the journal articles which report them, while others are available from their authors on request.

One of the earliest examples is the Force Concept Inventory (Hestenes et al, 1992), which tests understanding of Newtonian mechanics. A later one is the Genetic Drift Inventory (Price et al, 2014), which uses true/false statements to evaluate teaching of randomness in evolution. A substantial though incomplete list of existing concept inventories, including contacts and references, is curated at Texas A&M University. For physics education, PhysPort lists 95 concept inventories.

Consider developing your own as outlined above.

References

  • Harvard-Smithsonian Center for Astrophysics. 1987. A Private Universe. https://www.learner.org/resources/series28.html.
  • Hestenes, David, Malcolm Wells, and Gregg Swackhamer. 1992. “Force Concept Inventory.” The Physics Teacher 30 (3): 141–58. https://doi.org/10.1119/1.2343497.
  • Kember, David, and Paul Ginns. 2012. Evaluating Teaching and Learning: A Practical Handbook for Colleges, Universities and the Scholarship of Teaching. London; New York: Routledge.
  • Laverty, James T., and Marcos D. Caballero. 2018. “Analysis of the Most Common Concept Inventories in Physics: What Are We Assessing?” Physical Review Physics Education Research 14 (1): 010123. https://doi.org/10.1103/PhysRevPhysEducRes.14.010123.
  • Lindell, Rebecca S., Elizabeth Peak, and Thomas M. Foster. 2007. “Are They All Created Equal? A Comparison of Different Concept Inventory Development Methodologies.” In AIP Conference Proceedings, 883:14–17. Syracuse, New York (USA): AIP. https://doi.org/10.1063/1.2508680.
  • Madsen, Adrian, Sarah B. McKagan, and Eleanor C. Sayre. 2017. “Best Practices for Administering Concept Inventories.” The Physics Teacher 55 (9): 530–36. https://doi.org/10.1119/1.5011826.
  • Price, Rebecca M., Tessa C. Andrews, Teresa L. McElhinny, Louise S. Mead, Joel K. Abraham, Anna Thanukos, and Kathryn E. Perez. 2014. “The Genetic Drift Inventory: A Tool for Measuring What Advanced Undergraduates Have Mastered about Genetic Drift.” Edited by Michèle Shuster. CBE—Life Sciences Education 13 (1): 65–75. https://doi.org/10.1187/cbe.13-08-0159.
  • Sands, David, Mark Parker, Holly Hedgeland, Sally Jordan, and Ross Galloway. 2018. “Using Concept Inventories to Measure Understanding.” Higher Education Pedagogies 3 (1): 173–82. https://doi.org/10.1080/23752696.2018.1433546.
  • Smith, Michelle K., William B. Wood, and Jennifer K. Knight. 2008. “The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics.” Edited by Diane Ebert-May. CBE—Life Sciences Education 7 (4): 422–30. https://doi.org/10.1187/cbe.08-08-0045.
  • Waters, David P., Dragos Amarie, Rebecca A. Booth, Christopher Conover, and Eleanor C. Sayre. 2019. “Investigating Students’ Seriousness during Selected Conceptual Inventory Surveys.” Physical Review Physics Education Research 15 (2): 020118. https://doi.org/10.1103/PhysRevPhysEducRes.15.020118.
