Code Algorithms

MELP at 2.4kbps

The Mixed Excitation Linear Prediction (MELP) voice coder model is based on the traditional LPC vocoder. However, the synthesizer has the following additional abilities that allow MELP to more naturally mimic human speech:

  1. Mixed pulse and noise excitation
  2. Periodic or aperiodic impulses
  3. Adaptive spectral enhancement
  4. Pulse dispersion filter
  5. Fourier magnitude modeling

The coder employs Fourier magnitude coding of the prediction residual to improve speech quality and vector quantization techniques to efficiently encode the LPC and Fourier information.[1] Some of the other characteristics of MELP are described below:

  • Frame Size: 22.5ms
  • Sampling Rate: 8kHz
  • Analyzer
    • High Pass Filter: 4th order Chebyshev type II
      • Cut-off Frequency: 60Hz
      • Stopband Rejection: 30dB
    • Bandpass Voicing Analysis: 6th order Butterworth filters, 5 frequency bands
      • 0-500Hz
      • 500-1000Hz
      • 1000-2000Hz
      • 2000-3000Hz
      • 3000-4000Hz
    • Linear Prediction Analysis: 10th Order
    • Error Protection: In unvoiced mode, the bits of unused coder parameters are replaced with forward error correction bits
      • Three Hamming (7,4) - corrects single bit errors
      • One Hamming (8,4) - detects double bit errors
  • Synthesizer
    • Mixed Excitation Generation - The sum of the filtered pulse and noise excitations
      • Pulse - Inverse Discrete Fourier Transform of one pitch period in length
      • Noise - Uniform Random Number Generator, RMS:1000, Range: -1732 to 1732
      • Pulse Filter: Sum of bandpass filter coefficients for voiced frequency bands
      • Noise Filter: Sum of bandpass filter coefficients for unvoiced frequency bands
    • Adaptive Spectral Enhancement Filter: 10th order pole/zero with 1st order tilt compensation
    • Linear Prediction Synthesis: Direct form filter; coefficients correspond to interpolated line spectral frequencies (LSFs)
    • Pulse Dispersion: 65th order FIR filter derived from spectrally flattened triangle pulse
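
The Hamming (7,4) protection listed under Error Protection above can be illustrated with a minimal sketch. The bit ordering used here (parity bits at positions 1, 2 and 4) and the function names are assumptions chosen for illustration; the actual bit layout used by the MELP standard is not stated in this description.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4].

    The ordering is the textbook one (an assumption, not the MELP layout).
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Flipping any single bit of an encoded word and passing it to `hamming74_decode` returns the original four data bits, which is the single-bit correction property noted in the list.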

[1] "Analog to Digital Conversion of Voice by 2,400 Bit/second Mixed Excitation Linear Prediction (MELP)," Federal Information Processing Standards Publication (FIPS PUB) Draft, June 12, 1997
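
The noise excitation figures in the MELP synthesizer list are consistent with one another: a uniform distribution on [-a, a] has an RMS value of a/sqrt(3), so a = 1732 gives roughly 1000. The short check below is only a sketch; the seed, sample count, and variable names are arbitrary choices made for illustration.

```python
import math
import random

HALF_RANGE = 1732  # noise range from the synthesizer description: -1732 to 1732

# A uniform variable on [-a, a] has variance a^2 / 3, hence RMS = a / sqrt(3).
theoretical_rms = HALF_RANGE / math.sqrt(3)

# Empirical check with a seeded generator (seed and count are arbitrary).
rng = random.Random(0)
samples = [rng.uniform(-HALF_RANGE, HALF_RANGE) for _ in range(100_000)]
empirical_rms = math.sqrt(sum(x * x for x in samples) / len(samples))

print(round(theoretical_rms, 2), round(empirical_rms, 2))
```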

LPC at 2.4kbps

In their basic form, Linear Predictive Coding (LPC) algorithms achieve high compression ratios by developing short-term, steady-state models of the vocal tract and transmitting only the quantized and encoded parameters of these models. The speech production process is modeled by a flat-spectrum excitation source that represents glottal movement, which is filtered by an all-pole, short-term stationary digital filter that models the shaping due to the vocal tract's response characteristics. The latest version of the LPC voice coding algorithm officially tested by the DDVPC is LPC-10e version 52. It conforms to the requirements of the DoD Standard for Operation at 2.4kbps (FED-STD-1015 dated 28 November 1984). The characteristics of LPC-10e are described below: [1]

  • Sampling Rate: 8 kHz
  • Frame: 22.5ms, 54 bits per frame
  • Analyzer: Semi-pitch synchronous
    • Linear Prediction Analysis: 10th Order
    • Low Pass Filter: 19 tap transversal
    • Pitch: AMDF with dynamic pitch tracking (50Hz to 400Hz; 60 Values, 20/octave)
    • Voicing: 2 decisions/frame based on low band energy, zero-crossing count, spectral shape, and periodicity measures
    • Preemphasis: Single zero; low frequency cut, +6dB/octave high frequency boost
    • Matrix Load: Covariance
    • Matrix Invert: Truncated Cholesky decomposition
    • Reflection Coefficient (RC) Coding: Log area ratio for RC1 and RC2, linear for others
    • Transmission Error Protection: Hamming codes on selected bits during unvoiced and transition frames
  • Synthesizer: Pitch Synchronous
    • Error Correction/Detection: On selected bits during unvoiced and transition frames
    • Parameter Smoothing: Pitch, RMS, RC1 - RC6 during voiced frames; smoothing threshold varies with error rate
    • Interpolation:
      • Log Area ratio for RC1, RC2
      • Linear for RC3-RC10
      • Log for RMS
      • Linear for pitch period
    • Deemphasis: 200Hz high pass single pole low frequency boost.
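
The AMDF (average magnitude difference function) pitch search named in the analyzer list can be sketched as follows. This is a simplified illustration, not the LPC-10e procedure: it tests every integer lag between 50Hz and 400Hz instead of the standard's table of 60 lags, and it omits the dynamic pitch tracking. The function names and the test frame are hypothetical.

```python
def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_pitch_hz(frame, fs=8000, f_min=50, f_max=400):
    """Return the pitch whose lag gives the smallest AMDF value.

    Ties go to the shortest lag, because min() keeps the first minimum found.
    The frame must be longer than fs // f_min samples.
    """
    lag_min = fs // f_max  # 20 samples at 8 kHz
    lag_max = fs // f_min  # 160 samples at 8 kHz
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return fs / best_lag


# Hypothetical test frame: a sawtooth repeating every 80 samples (100 Hz at 8 kHz).
frame = [i % 80 for i in range(360)]
print(estimate_pitch_hz(frame))
```

A periodic signal produces AMDF nulls at every multiple of its period, so a practical coder needs additional logic, such as the dynamic pitch tracking mentioned in the list, to avoid choosing a multiple of the true period.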

[1] J.P. Campbell Jr., T.E. Tremain, "Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm," IEEE International Conference on Acoustics, Speech, and Signal Processing, Tokyo, 1986, pp. 473-476.

CELP at 4.8kbps

Codebook Excited Linear Prediction (CELP) synthesis starts with an excitation signal drawn from a codebook of signals. The excitation signal is fed into a long-term (pitch) filter, followed by a short-term (LPC) filter and a short-term, pole-zero post filter with adaptive spectral tilt compensation. LPC parameters (34 bits/frame) are received as 10 line spectral parameters once per frame. Pitch delay (28 bits/frame), pitch gain (20 bits/frame), codebook index (36 bits/frame), and codebook gain (20 bits/frame) are sent once per subframe (4 times per frame). Error correction (4 bits/frame), synchronization (1 bit/frame), and an expansion bit are also transmitted once per frame. The transmitter's CELP analyzer contains a replica of the receiver's synthesizer (minus the post filter) that, in the absence of transmission errors, generates speech identical to the receiver's. This approximation is subtracted from the input speech and the difference is perceptually weighted. The perceptually weighted error then drives an analysis-by-synthesis (closed-loop) error minimization gain-shape vector quantization (VQ) search procedure. The search determines which adaptive and stochastic codebook indices and gains minimize the perceptually weighted error. Other characteristics of CELP are described below: [1]

  • Sampling Rate: 8kHz
  • Frame Size: 30ms (144 bits) with 4 equal length subframes
  • Adaptive Codebook Size: 256 words
  • Stochastic Codebook Size: 512 words
  • Synthesizer
    • Error Protection: (15,11) Hamming code for protection of 10 bits of the "pitch" delay and gain (4 bits/frame).
    • Output filter: low-pass: -3dB at 3,600Hz, -18dB at 4,000Hz, -46dB at 4,400Hz
  • Analyzer
    • Input filter: 100-3,600Hz passband (-3dB points), -18dB at 50Hz and 4,000Hz, -46dB at 4,400Hz, no preemphasis
    • Short Term Linear Predictor: 10th order autocorrelation analysis, 30ms Hamming window
    • Bandwidth expansion: a factor of 0.994 applied to the LP analysis predictor coefficients yields 15Hz expansion
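
The bandwidth expansion figure can be checked with a short sketch. It assumes the usual technique of scaling the i-th predictor coefficient by 0.994 raised to the power i, which moves every pole of the synthesis filter toward the origin and widens each resonance by about -(fs/pi) * ln(0.994) Hz. The function names and the sample coefficients are hypothetical.

```python
import math


def bandwidth_expand(lpc, gamma=0.994):
    """Scale predictor coefficient a_i (i = 1..p) by gamma**i."""
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def expansion_hz(gamma=0.994, fs=8000):
    """Approximate bandwidth added to each pole, in Hz."""
    return -(fs / math.pi) * math.log(gamma)


# Hypothetical 10th-order coefficient set, for illustration only.
coeffs = [1.2, -0.9, 0.5, -0.3, 0.2, -0.1, 0.05, -0.02, 0.01, -0.005]
print(bandwidth_expand(coeffs)[:3])
print(round(expansion_hz(), 2))  # about 15.3 Hz at an 8 kHz sampling rate
```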

[1] J.P. Campbell Jr., V.C. Welch, T.E. Tremain, "An Expandable Error-Protected 4800 BPS CELP Coder (U.S. Federal Standard 4800 BPS Voice Coder)," IEEE International Conference on Acoustics, Speech, and Signal Processing, 1989, pp. 735-737.
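
The per-frame bit allocation given in the CELP description can be tallied to confirm the frame size and bit rate. The sketch below uses only figures stated above; the dictionary keys and the helper name are hypothetical labels.

```python
def bit_rate_bps(bits_per_frame, frame_ms):
    """Bit rate in bits per second for a given frame payload and duration."""
    return bits_per_frame * 1000 / frame_ms


# Bits per 30 ms frame, as listed in the CELP description.
CELP_BITS = {
    "line_spectral_parameters": 34,
    "pitch_delay": 28,
    "pitch_gain": 20,
    "codebook_index": 36,
    "codebook_gain": 20,
    "error_correction": 4,
    "synchronization": 1,
    "expansion": 1,
}

celp_total = sum(CELP_BITS.values())
print(celp_total)                    # 144 bits per frame
print(bit_rate_bps(celp_total, 30))  # 4800.0

# 36 index bits over 4 subframes is 9 bits each, enough to address 512 words.
print(2 ** (CELP_BITS["codebook_index"] // 4))  # 512

# The same helper applied to the LPC-10e frame (54 bits every 22.5 ms).
print(bit_rate_bps(54, 22.5))        # 2400.0
```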

CVSD at 16kbps

The Continuously Variable Slope Delta-modulation (CVSD) coder attempts to reconstruct at the receiver the exact waveform that was input to the transmitter, and is thus classified as a waveform coder. The digital output from the transmitter is 1 bit per input sample (with the input upsampled to yield 16kbps). The transmitted bit stream indicates the slope of the input waveform, and the slope-limiting detection looks at the 3 most recent bits transmitted. If these bits are all 1s or all 0s, the step size is doubled. For all other combinations, the step size is cut in half. The ratio between maximum and minimum step size is 16, and the sign of the slope is positive if the current bit is a 1 and negative if the current bit is a 0.
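
The step-size rule described above can be sketched as a matched encoder and decoder. This is a minimal illustration that follows the doubling and halving rule as stated here. The minimum step of 1.0, the starting estimate of zero, the treatment of the first two bits (before three bits of history exist), and all function names are assumptions.

```python
def _update(bit, history, step, estimate, min_step, max_step):
    """Apply one bit to the shared encoder/decoder state."""
    history = (history + [bit])[-3:]  # keep the 3 most recent bits
    if len(history) == 3 and len(set(history)) == 1:
        step = min(step * 2, max_step)  # all 1s or all 0s: double the step
    else:
        step = max(step / 2, min_step)  # any other pattern: halve the step
    estimate += step if bit == 1 else -step  # a 1 means positive slope
    return history, step, estimate


def cvsd_encode(samples, min_step=1.0, ratio=16):
    """Produce 1 bit per input sample."""
    max_step = min_step * ratio
    history, step, estimate = [], min_step, 0.0
    bits = []
    for x in samples:
        bit = 1 if x >= estimate else 0
        bits.append(bit)
        history, step, estimate = _update(
            bit, history, step, estimate, min_step, max_step)
    return bits


def cvsd_decode(bits, min_step=1.0, ratio=16):
    """Rebuild the waveform estimate from the bit stream."""
    max_step = min_step * ratio
    history, step, estimate = [], min_step, 0.0
    out = []
    for bit in bits:
        history, step, estimate = _update(
            bit, history, step, estimate, min_step, max_step)
        out.append(estimate)
    return out


# A run of 1s shows the step doubling until it reaches 16 times the minimum.
print(cvsd_decode([1] * 7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 48.0]
```

Because the decoder applies the same update to the same bits, its output equals the encoder's internal estimate when there are no transmission errors.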