On October 11, 2003, I sent the following email to Dr. Stephen
LaRocca, who at that time was employed by the United States Military
Academy as director of the Center for Technology Enhanced Language
Learning (CTELL) and Professor of French and Linguistics.
Subject: Nasality/lip-shape detection for CALL
Dr. LaRocca,
I just found your CTELL web site and I found it very interesting. I am
a speech recognition researcher, and I am interested in computer-aided
language learning (CALL) but I have never worked in it.
The CALL technology I have read about seems based on one-dimensional
pronunciation scoring, sometimes enhanced by supplying information about
places in the utterance where the score was particularly bad. It would
be nice if the software went past this to giving specific feedback (in
articulatory or acoustic terms) on how to improve. I wonder if this goal
could be brought closer
by using additional input devices besides the microphone.
I am interested in your opinion of the following ideas as an expert on
CALL and French instruction. I expect you are busy and will not be
offended if there is no reply. What I am wondering is, assuming the
technology described below can be made to work reliably, do you think
there would be enough pedagogical benefit to justify its use?
(Let's say either kind of extra device costs $50/seat and requires an
extra minute spent at the start of each session adjusting the device's
position.)
- To determine whether the student is nasalizing:
a nose clip which holds under the nostril a pressure sensor (to detect
air flow) or a temperature sensor (to detect warm air from nostril)
- To determine whether the student is sticking their lips out:
video camera(s)
The examples (nasality, lip use) relate to difficulties an English
speaker might have with French pronunciation.
The use of a camera to observe lips has the advantage that it observes
the lips separately from the other articulators, as opposed to a
microphone signal which has the effects of various articulators
mixed together. I used the simple example of how far the lips are out
because of my ignorance of other aspects of lip usage in French and of
how much sophistication computer vision technology can provide.
Thank you,
David Gelbart
Dr. LaRocca replied:
Hello David.
Allow me to paraphrase your question in hopes of demonstrating that I
understand it. Do I think that additional input devices, including a
sensor for nasal airflow and a videocamera aimed at the lips, would
benefit those of us who want to automate pronunciation evaluations for
learners of languages such as French?
Actually, I have never considered sensors other than microphones for
this purpose, though I do understand why you have.
My first reaction is to say hey, why not? Bring on the noseclamps
(ouch!) and the videocams. Neither device seems particularly difficult
to build or install. Try them out; some students will presumably
learn more/better with them than without them. A project at Johns
Hopkins University's CLSP Summer 2000 Workshop showed that adding mouth
geometry data derived from video camera input improved speech
recognition accuracy for televised news broadcasts. A similar approach
might be applicable for pronunciation feedback as well. Look under
Audio-Visual Speech Recognition at www.clsp.jhu.edu/ws2000
By way of a second reaction, allow me to point out that synchronizing
the detection of nasal airflow and lip rounding/spreading with sound
segment time boundaries is likely to be difficult. Our CTELL is
wrestling right now with how to teach tone in multiword Chinese
utterances, and a similar synchronization issue faces us.
So my answer is, I am not sure. Inexpensive modifications to existing
computer workstations that will sense directly nasal air flow and lip
rounding/spreading might provide considerable help to some students of
French. Of the two sensors, I like the vidcam better. Sorting out the
double set of French front vowels (rounded and unrounded) using a
pre-synchronized image with sound (from the camera and its microphone)
might be a great place to start.
Good luck with your investigations!
Steve LaRocca