Digital Speech Processing
Multi-Sensor Speech Science
ARCON has worked with multi-sensor arrays for speech analysis for many years. This work has primarily focused on the characterization of speech and sensors in high-acoustic-noise environments. Recently, ARCON has expanded this work to sensor arrays that couple non-acoustic sensors with traditional acoustic sensors. The work was spurred by ARCON's involvement in the DARPA/ATO Advanced Speech Encoding (ASE) program. The first phase of that program focused on improving coder speech intelligibility in high-acoustic-noise military environments by utilizing non-acoustic sensors. This section provides some detail on the use of these sensor arrays for speech science.
The following plots provide synchronous sensor traces of the speech signal generated by a talker in a simulated acoustic-noise environment. Details on the collection of these data can be found in the ARCON report "Pilot Corpus for Multisensor Speech Processing," Oct. 2003. Many of the plots are interactive, and clicking on them provides an audio output. The following multi-channel plot shows the synchronous time traces of five sensors for a female talker in a simulated M2 Bradley fighting vehicle acoustic noise environment. The noise field is at a sound pressure level of 104 dBA.
Click within each individual plot to hear the corresponding sound file.
The sensors from top to bottom are:
1. Calibration Microphone - B&K Type 4155
2. Resident Microphone - M-175 noise-canceling electret boom microphone
3. Electroglottograph (EGG) at larynx - Glottal Enterprises Model EG2 with 35mm electrodes
4. Physiological Microphone (P-mic) at larynx
5. Micro-radar sensor (GEMS) at larynx
The speech material is scripted sentences. The difference between the calibration microphone and the noise-canceling resident microphone signals is dramatic. Sensors 3 and 5 are non-acoustic and are expected to sense glottal vibrations; their signals should be invariant to the acoustic noise. Sensor 3, the EGG, shows a complex low-frequency structure; a 100 Hz high-pass filter removes these features and yields a clear signal of the glottal vibrations. Sensor 4, the P-mic, is robust to the acoustic noise but picks up the acoustic signal along with the glottal vibrations. The following plots present the corresponding narrowband spectrograms for the five sensors.
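The two operations mentioned above, a 100 Hz high-pass filter for the EGG channel and a narrowband spectrogram, can be sketched as follows. This is an illustrative sketch only, assuming Python with NumPy/SciPy; the function names, filter order, and window length are assumptions, not ARCON's actual processing parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, spectrogram

def highpass_100(x, fs):
    """Remove energy below 100 Hz with a zero-phase Butterworth filter.

    A 4th-order design is an illustrative choice; filtfilt applies it
    forward and backward so glottal timing is not shifted.
    """
    b, a = butter(4, 100.0, btype="highpass", fs=fs)
    return filtfilt(b, a, x)

def narrowband_spectrogram(x, fs):
    """Narrowband spectrogram: a long (64 ms) analysis window trades
    time resolution for the frequency resolution needed to separate
    individual harmonics of the voice."""
    nperseg = int(0.064 * fs)
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg,
                            noverlap=nperseg // 2)
    return f, t, Sxx
```

The zero-phase filtering matters here: a causal filter would delay the EGG trace relative to the other synchronized channels.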
The next plots show a zoomed view of the second sentence in the first plot; sensor 3 has now been high-pass filtered. Voiced and unvoiced segments can be identified in the M-175 signal. The acoustic noise is evident in all acoustic sensors but is significantly reduced in sensors 2 and 3. The unvoiced segments of the speech are evident only in the resident microphone signal.
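One simple way to separate voiced from unvoiced segments, as identified visually in the plots above, is a short-time energy and zero-crossing-rate test: voiced speech is high-energy and low in zero crossings, unvoiced speech the reverse. This is a hedged sketch; the frame length and thresholds below are illustrative assumptions, not values from the report.

```python
import numpy as np

def classify_frames(x, fs, frame_ms=20.0, zcr_thresh=0.15,
                    energy_thresh=0.01):
    """Return a boolean array, True for frames judged voiced.

    A frame is labeled voiced when its mean-square energy is high and
    its zero-crossing rate (crossings per sample) is low.
    """
    n = int(frame_ms * 1e-3 * fs)
    labels = []
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n]
        energy = np.mean(frame ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        labels.append(energy > energy_thresh and zcr < zcr_thresh)
    return np.array(labels)
```

In high-noise recordings this test is only reliable on a noise-canceled or non-acoustic channel, which is consistent with the unvoiced segments being visible only in the resident microphone signal.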
The final multi-channel plot is a closer zoom on the area highlighted in the previous plot. Here we can clearly see the pitch period of the speech in sensor 2 and the glottal vibration period as seen by sensors 3, 4, and 5. It is expected that the phases of glottal opening and closing can be extracted from these signals, allowing precise location of the instant of glottal closure.
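The idea of locating glottal closure instants (GCIs) from such signals can be sketched as follows. Each closure produces an abrupt transition in an EGG-like trace, so the derivative spikes there, and peak-picking on the negated derivative recovers the closure instants and hence the pitch period. This is a sketch under stated assumptions: the thresholds, the `max_f0` guard, and the use of a sawtooth as a stand-in waveform are all illustrative, not ARCON's method.

```python
import numpy as np
from scipy.signal import find_peaks

def glottal_closure_instants(x, fs, max_f0=400.0):
    """Return sample indices of sharp negative-going transitions in x.

    The derivative of an EGG-like trace spikes negatively at each
    closure; peaks closer together than one period at max_f0 are
    rejected to suppress spurious detections.
    """
    d = np.diff(x)
    peaks, _ = find_peaks(-d, height=0.5 * np.max(-d),
                          distance=int(fs / max_f0))
    return peaks

def pitch_from_gci(gci, fs):
    """Median fundamental frequency implied by the GCI spacing."""
    return fs / np.median(np.diff(gci))
```

In practice the GEMS and filtered EGG channels are attractive inputs for this kind of detector precisely because, as noted above, they are largely invariant to the acoustic noise.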