Light-weight clustering techniques for short text answers in human computer collaborative (HCC) CAA

2006 
We first explore the pedagogic value, in assessment, of questions which elicit short text answers (as opposed to either multiple-choice questions or essays). Related work attempts to develop deeper processing for fully automatic marking. In contrast, we show that light-weight, robust, generic Language Engineering techniques for text clustering in a human-computer collaborative CAA system can contribute significantly to the speed, accuracy, and consistency of human marking. Examples from real summative assessments demonstrate the potential, and the inherent limitations, of this approach. Its value as a framework for formative feedback is also discussed.

Introduction

Assess By Computer (ABC; Sargeant et al. 2004), deployed at the University of Manchester since 2003, follows a human-computer collaborative (HCC) approach to assessment. We focus on constructed answers such as text and diagrams rather than answers requiring mere selection between alternatives. The HCC assessment process is an active collaboration between humans and a software system, in which the software does the routine work and supports the humans in making the important judgements.

One feature which distinguishes our approach from “traditional” CAA is our classification of question and answer types, which has three parameters. First, we distinguish constructed from selected answers (we strongly deprecate the traditional use of the term “objective” to mean “selected”). Second, we distinguish “closed” or truly “objective” from “open” or “subjective” questions. For closed questions, the substance of a correct answer can be specified in advance (although its expression can vary wildly and unpredictably: Wood et al. 2005). Open questions typically ask for an original example or argument; a marking scheme can only describe meta-level properties of a correct answer, and a “model answer” can only be an example. Third, we distinguish loosely between long and short text answers.
Length does not necessarily correlate with openness/closure: “Describe the causes of haemolytic disease in the newborn” calls for a paragraph of routine bookwork, while “Give an original example of an exception to default inheritance” requires only a short phrase. Length also does not necessarily correlate with the levels of Bloom’s taxonomy (Bloom et al. 1956). Its main significance in ABC is that different Natural Language Engineering techniques are optimised for different lengths of text. To date we have focussed on simple, robust, generic techniques which are best suited to short answers.

Related Work

The use of text clustering in CAA is far from unique; but the other work we are aware of, such as the examples below, limits itself to formative assessment and/or aspires to be fully automatic.

Lutticke (2005) uses “logical inference” to compare student-drawn semantic networks with a model answer and to generate formative feedback; the details of the comparison mechanism are unclear.

Weimer-Hastings et al. (2005) use Latent Semantic Analysis to compare student answers with expected answers in an Intelligent Tutoring System for research methods in Psychology. Its use is purely formative, and they have attempted to evaluate student learning gain but not the effectiveness of clustering per se (p.c.). Although the technique is generic, its application is question-specific: they refer to it as “expectation-driven processing”.

Carlson & Tanimoto (2005) induce text classification rules from student answer sets. These rules are used “to construct ‘diagnoses’ of misconceptions that teachers can inspect in order to monitor the progress of their students” and to construct formative feedback automatically.

Pulman & Sukkarieh (2005) aim at automatic marking of “short” (“from a few words up to five lines”) free-text answers to factual (objective, in our terminology) science questions.
They use relatively heavy-weight techniques from traditional computational linguistics, comparing answers with keyword-based “patterns”, for which machine learning techniques have been investigated. They have worked with real student data, and their best results correlate acceptably with human markers’ judgements, but on a very small sample, and it is not obvious that these techniques will scale up sensibly.

The Pedagogic Potential of Short Text Answers

Constructed-answer questions have significant advantages over selected-answer questions for assessing students, even at the “knowledge” and “comprehension” levels of Bloom’s taxonomy. Recalling even a bare phrase like “mean cell volume” is a greater challenge than recognising it, even among cunningly chosen distractors, let alone the possibility of getting it right by luck. And even short text answers (1–30 words, or comparably simple diagrams) are surprisingly versatile. As the following examples (with genuine, representative, mostly good student answers) show, short text answer questions, set cleverly, can test all levels of the taxonomy.

Knowledge: What single measurement would you make to confirm that an individual is anaemic?
Student answer: haemoglobin concentration

Comprehension: A blood sample was taken from a patient and he was found to have a high white cell count. On further investigation the patient was found to have a neutrophil count of 22 × 10⁹/L. Give two examples of what this could be indicative of.
Student answer: A recent or present bacterial infection. Or an allergic reaction.

Application: What is the value at the root of this minimax tree?
Student answer: 42

Analysis: ... What general significant problem with the size of search spaces does this illustrate?
Student answer: There are too many to calculate. This problem illustrates the number of possible choices AI problems have to deal with; it is a combinatorial explosion.
Synthesis: Rewrite the following, replacing the underlined part with the appropriate pronoun: Ho regalato i quaderni a Paolo. [“I gave the notebooks to Paolo.”]
Student answer: Glieli ho regalati. [“I gave them to him.”]

Evaluation: For each of the following pairs of classes, state whether or not it would be appropriate to relate them by inheritance, and why. If not, what other sort of relationship would be appropriate? – Car and Wheel
Student answer: This one may be better as a composition instead. A car as an association with wheel, but a wheel can exist on its own without the car class.
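To make the clustering idea concrete, the following is a minimal, hypothetical sketch (not the actual ABC implementation, whose details are not given here) of the light-weight approach described above: short text answers are normalised to remove superficial variation in case, punctuation, and whitespace, and answers with identical normalised forms are grouped so that a human marker can mark each group once and have the judgement propagate to every member. The answer strings are taken from the examples above; the function names are illustrative only.

```python
import re
from collections import defaultdict

def normalise(answer: str) -> str:
    """Reduce superficial variation: case, punctuation, extra whitespace."""
    answer = answer.lower()
    answer = re.sub(r"[^\w\s]", " ", answer)   # strip punctuation
    return " ".join(answer.split())            # collapse whitespace

def cluster_answers(answers):
    """Group answers whose normalised forms are identical.

    Returns clusters, largest first, so a marker sees the biggest
    groups (the biggest time savings) at the top of the list.
    """
    clusters = defaultdict(list)
    for a in answers:
        clusters[normalise(a)].append(a)
    return sorted(clusters.values(), key=len, reverse=True)

answers = [
    "haemoglobin concentration",
    "Haemoglobin concentration.",
    "HAEMOGLOBIN CONCENTRATION",
    "mean cell volume",
]
for cluster in cluster_answers(answers):
    print(len(cluster), repr(cluster[0]))
```

Even this trivial normalisation collapses the three variants of “haemoglobin concentration” into one cluster; more robust variants of the same idea (stemming, stopword removal, edit-distance grouping) extend the technique without abandoning its light-weight, generic character.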