Digital Audio Broadcasting


From douzzer Tue Jul 22 05:53:49 EDT 1997
From: Daniel Pouzzner <douzzer@mit.edu-antispam>
Newsgroups: alt.radio.digital
Subject: Digital Audio Broadcasting - draft "whitepaper"
Sender: <douzzer@mit.edu-antispam>
Organization: (private)


Digital Audio Broadcasting in the 76MHz-108MHz Band

The goal is to provide net bandwidth equal to a T1 (1536000 bits/s)
with a net BER of 10^-9 or better over a 150kHz wide VHF channel with
50kHz guardbands separating neighboring channels.  This corresponds to
the channel allocation scheme for current-day VHF WBFM audio
broadcasting.  Furthermore, coexistence with existing VHF WBFM, with
alternate-channel interference no more objectionable than that of a
WBFM signal of similar service contours, is a requirement.  Proper
receiver operation in the absence of a proper antenna of some sort is
not considered a requirement.

To get a ballpark feel for how this can and cannot be accomplished,
here is the math for a naive QPSK implementation:

200kHz deviation for x cycles is 360 degrees of carrier y-200kHz
1/y = x * (1/(y-200kHz) - 1/y)
1 = x * (y/(y-200kHz) - 1)
x = 1/(y/(y-200kHz) - 1)
x = 380 cycles = 5us = 400kbit/s (at y = 76MHz)
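
As a rough numeric check of the figures above, the following Python
sketch evaluates the expression for x at a 76MHz carrier.  The function
name, the choice of 76MHz, and the figure of 2 bits per QPSK symbol are
assumptions made for illustration.

```python
def naive_qpsk(carrier_hz, deviation_hz=200e3, bits_per_symbol=2):
    """Cycles, time, and bit rate for one 360-degree slip at a fixed deviation."""
    # x = 1/(y/(y - dev) - 1), which simplifies to (y - dev)/dev
    cycles = 1.0 / (carrier_hz / (carrier_hz - deviation_hz) - 1.0)
    symbol_time_s = cycles / carrier_hz          # x carrier periods
    bit_rate = bits_per_symbol / symbol_time_s   # bits per second
    return cycles, symbol_time_s, bit_rate

cycles, t, rate = naive_qpsk(76e6)
print(f"{cycles:.0f} cycles, {t * 1e6:.2f} us, {rate / 1e3:.0f} kbit/s")
```

The result lands within rounding of the 380 cycles, 5us, and 400kbit/s
quoted above.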

Since a 2000kbit/s native rate is required, naive QPSK is clearly
insufficient.  Moreover, the high frequency hash of naive PSK is
unacceptable for broadcast applications.  The required scheme is a
variation on phase shift keying that can be described as
continuous-phase frequency shift keying, or CPFSK.  The frequencies
are -75kHz, -25kHz, +25kHz, and +75kHz deviations from assigned
carrier.  A switch to fractional CPFSK achieves the required five-fold
speedup thusly:

The QPSK phase advance/delays (0, 90, 180, 270) are divided by 5
(0, 18, 36, 54).

The timing and amplitude cues associated with this modulation scheme
are:

+/-0:  +/-0 picoseconds, instantaneous amplitude is 0 * zero-to-peak
+/-18: +/-658 picoseconds at 76MHz, amplitude is .309 * zero-to-peak
+/-36: +/-1.316 nanoseconds, amplitude is .588 * zero-to-peak
+/-54: +/-1.974 nanoseconds, amplitude is .809 * zero-to-peak

(zero included for clarity)
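
The cues in this list follow from the carrier period and the sine of
each phase shift.  A minimal sketch, assuming a pure sine carrier and a
hypothetical helper named phase_cues:

```python
import math

def phase_cues(carrier_hz, shifts_deg=(0, 18, 36, 54)):
    """Return (shift, time offset in seconds, amplitude / zero-to-peak) per shift."""
    period_s = 1.0 / carrier_hz
    cues = []
    for deg in shifts_deg:
        dt = period_s * deg / 360.0          # time equivalent of the phase shift
        amp = math.sin(math.radians(deg))    # sine carrier sampled at that phase
        cues.append((deg, dt, amp))
    return cues

for deg, dt, amp in phase_cues(76e6):
    print(f"+/-{deg:2d} deg: +/-{dt * 1e12:7.1f} ps, amplitude {amp:.3f}")
```

Passing a different carrier frequency shows how the timing offsets
shrink toward the top of the band while the amplitudes stay the same.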

Raw bandwidth is directly proportional to assigned carrier.  A station
at 107.9MHz has 1.42 times the available raw bandwidth of a station at
76.1MHz.  The increased error rate of the station at the higher
frequency serves to reduce, to a definite degree, the advantage
enjoyed by a station with a high frequency.  In practice the increased
error correction overhead of the stations at higher frequencies
essentially offsets any advantage.

Capture ratio and quieting ratio are both at most 10dB, sufficient to provide
substantial multipath rejection and immunity to noise and interfering
broadcasts.  Clever design of the control block (see below) can
substantially reduce this figure.  The rated net BER is achieved when
the quieting ratio is met or exceeded.  This system cannot exhibit the
remarkable capture robustness of the naive WBFM encoding scheme of
current-day radio, where 0.8dB capture ratios are not unheard of
(typically accompanied by near-total unusability of the stereo
subcarrier).  It is inconceivable that any spectrally frugal encoding
scheme will fully equal the graceful degradation of naive WBFM, and
this is a legitimate liability.

The modulator is a special-purpose direct-synthesizing 10 GHz 8 bit
DAC (implemented in GaAs) which accepts a symbol stream as input and
performs the modulation computations internally.  The modulator is
followed by a 5GHz LPF, then the transmission amplifier system
consisting of a linear, non-tuned solid state exciter and power
amplifier.  Phase-preserving low pass filtering may be performed on
the amplifier outputs as necessary.  The antenna must be fairly
broadband, with a bandwidth of at least 2MHz at -3dB.  With minor,
inexpensive modifications (principally, replacement of RF filters with
phase-linear systems), many systems already deployed for VHF WBFM
audio broadcast may prove to be already suitable for use in this
system, requiring only replacement of the encoder/modulator/exciter
(in most cases, representing a negligible proportional investment).

The receiver front end is an RF preamp, phase/amplitude-linear in the
76-108MHz band, and a phase-linear narrow bandwidth filter which is
-2dB or flatter at +/-75kHz, -12dB or more at +/-125kHz, -40dB at
+/-400kHz, and -70dB at +/-800kHz, followed by a
phase/amplitude-linear RF AGC.

The detector consists of four 6-bit signed flash converters with <250 picosecond
windows, one to detect each phase shift, each flashing once every
carrier cycle.  Each is clocked and phase-initialized such that it
flashes when the phase shift to which it has been assigned is expected
to be at zero.  Phase advance/delay is distinguished by the sign bit
output by the converters whose assigned phase shift is not the actual
phase shift of the received signal.  When the symbol is complete, the
clock for each detector is reset so that all four flash at carrier
zero initially.  The four converters output the detected amplitude for
each sample to a single control block.  The control block performs a
statistical analysis on the 4 sample streams to determine the most
likely actual phase shift.  Detection is then complete, and the
detector passes this determination to the shift-to-symbol translator.
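
One way to picture the control block's decision is the idealized sketch
below.  Everything in it is an assumption made for illustration: the
linear phase ramp across the symbol, the Gaussian noise model, the
least-energy rule standing in for the "statistical analysis", and the
names sample_streams and decide.  Sign handling (advance versus delay)
is left out.

```python
import math
import random

SHIFTS_DEG = (0, 18, 36, 54)   # one flash converter per candidate phase shift

def sample_streams(actual_deg, n_cycles, noise=0.05, rng=None):
    """Simulate the four converter outputs over one symbol (idealized model)."""
    rng = rng or random.Random(0)
    streams = []
    for assigned in SHIFTS_DEG:
        stream = []
        for k in range(1, n_cycles + 1):
            # Each converter flashes where its assigned shift would be at
            # zero, so it sees the sine of the phase error accumulated so far.
            err_deg = (actual_deg - assigned) * k / n_cycles
            stream.append(math.sin(math.radians(err_deg)) + rng.gauss(0.0, noise))
        streams.append(stream)
    return streams

def decide(streams):
    """Pick the shift whose converter stayed closest to zero (least energy)."""
    energy = [sum(s * s for s in stream) for stream in streams]
    return SHIFTS_DEG[energy.index(min(energy))]

print(decide(sample_streams(36, n_cycles=76)))
```

The 76 samples per symbol correspond to one flash per carrier cycle at
76MHz with a 1us symbol.  A real control block would presumably weight
later samples more heavily, since the phase error is largest at the end
of the symbol.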

The AGC, detector, and shift-to-symbol translator (among other digital
functionalities discussed below, particularly error correction and
decoder logic) are implemented as a single monolithic device, with an
RF input and a small set of serialized versatile digital outputs each
capable of carrying up to a stereo 24/96 (2 * 24 bit sample size *
96kHz sample rate) audio signal.

As fabrication costs come down, the tuning filter and special-purpose
flash converter can be replaced with a simple 5GHz linear-phase
anti-aliasing analog filter, a 10GSPS 150ps 6 bit ADC, and logic to
perform the bandpass filtration and phase detection entirely in the
digital domain.  This would be necessarily implemented in GaAs, and
would allow for more sophisticated carrier processing, effectively
increasing the capture ratio and reducing the raw error rate.
Currently this technique is prohibitively expensive.

An annotation data stream is interleaved, occupying a series of 160
symbols (320 bits) every 8000 symbols (160us every 8ms at 76MHz), with
error correction framing independent from that of the remainder of the
data stream.  Every 30th annotation is a raw synchronization warble
tone (not ECC-encoded) sequencing through the -75 -25 +25 +75 symbol
frequencies 40 times.  An unsynchronized receiver uses this warble
signal to lock on to the phase clock of the broadcast, and the ECC
framing of the rest of the data stream allows synchronized symbol
production to begin after a delay of at most 1/4 second from the time
the frequency is first tuned (provided reception quality is
sufficient).
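
The timing figures in this paragraph can be checked with a few lines of
Python.  This is a sketch under the reading that one annotation falls
within every 8000-symbol period and that the symbol rate at 76MHz is
1,000,000 symbols per second (1us per symbol); the constant names are
hypothetical.

```python
SYMBOL_RATE = 1_000_000        # symbols/s at 76MHz (1us per symbol)
FRAME_SYMBOLS = 8000           # one annotation per 8000 symbols
ANNOTATION_SYMBOLS = 160       # 320 bits at 2 bits per symbol
SYNC_EVERY = 30                # every 30th annotation is the warble
WARBLE_TONES_KHZ = (-75, -25, +25, +75)
WARBLE_REPEATS = 40

def warble_sequence():
    """Symbol frequencies (kHz offsets) making up one synchronization warble."""
    return list(WARBLE_TONES_KHZ) * WARBLE_REPEATS

annotation_s = ANNOTATION_SYMBOLS / SYMBOL_RATE    # duration of one annotation
frame_s = FRAME_SYMBOLS / SYMBOL_RATE              # spacing between annotations
worst_case_sync_wait_s = SYNC_EVERY * frame_s      # longest gap between warbles

print(len(warble_sequence()), annotation_s, frame_s, worst_case_sync_wait_s)
```

Four tones repeated 40 times fill the 160-symbol annotation slot
exactly, and the longest gap between warbles stays just under the 1/4
second bound.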

The aggregate available annotation bandwidth before ECC is 37440 bits
per second at 76MHz, and about 20000 bps after ECC.  Most critically,
the annotation carries repetitive station identification and the
identity and software for the decoder required for channel extraction
and listening.  If the decoder is already cached by the receiver,
audio (and text, etc.) production can begin almost immediately after
symbol synchronization.  If not, a period of up to 30 seconds will be
required so that the decoder can be stored.  Depending on its size,
anywhere from ~20% to ~80% of the annotation bandwidth is consumed by
repeated enumeration of the decoder (80% for 30 seconds sufficing to
relay more than 80 kilobytes of decoder).  Standard annotation fields
include the current error correction parameters for the remainder of
the data stream, the station's operating authority and status,
technical profile (transmitter power, antenna location, height, and
type, and primary and protected service contours dividing the contours
into one-degree segments whose center of mass is identified to 500m
precision), brief freeform description of the types of programming
offered by the station, an emergency alert field divided into
subfields, and the decoder field broken into subfields identifying the
decoder type, the page of the decoder relayed in this frame, and the
actual contents of this page of the decoder.  Further custom keyed
fields may be included, which the decoder can make use of (for
example, current artist/album/song, current DJ's name, etc.).
Information that consumes substantial bandwidth is not put in the
annotation stream, but instead is separately channelized and handled
by the decoder obtained from the annotation stream.

Actual commercialized tuners will have a facility reminiscent of the
channel search facility of modern TVs and VCRs.  The vital stats and
decoders for every receivable station are stored in a bulk, automated
fashion.  Thereafter, the tuner automatically updates the vital stats
and decoder if they have changed, and can be configured to poll
constantly whenever it is not being used to listen to a particular
station, in order to assure that stored information is as accurate as
possible at all times.

Error correction is implemented with a pipeline consisting of
Reed-Solomon (check/correct polynomial) insertion, interleaving,
convolutional encoding (for Viterbi decoding at the receiver), and a
second round of interleaving.  This is
the de facto standard for the reliable transmission of digital data
over inherently noisy media, and the technique was patented in the
mid-70's by NASA.  ECC frame size is 16 kilobits (8 kilosymbols, 8
milliseconds at 76MHz) corresponding to the length of data between
annotation insertions.  Channelization is by timeslicing with this
granularity - that is, the 8-16 kilobits encoded in a single ECC frame
are divided into contiguous chunks each containing data for a single
channel.  The same pipeline with different operational parameters is
applied to each annotation frame.
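
A toy version of the inner stages of this pipeline is sketched below.
All parameters are assumptions chosen for brevity, not values from this
proposal: a 4x4 block interleaver, a rate-1/2 convolutional code with
constraint length 3 and generators 7 and 5 (octal), and no tail bits.
The Reed-Solomon stage is omitted entirely.

```python
def block_interleave(bits, rows, cols):
    """Write row by row, read column by column (len(bits) must be rows*cols)."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows, cols):
    """Inverse of block_interleave."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

def conv_encode(bits, generators=(0b111, 0b101), constraint_len=3):
    """Convolutional encoder; emits one output bit per generator per input bit."""
    mask = (1 << constraint_len) - 1
    reg, out = 0, []
    for b in bits:
        reg = ((reg << 1) | b) & mask                 # shift the new bit in
        for g in generators:
            out.append(bin(reg & g).count("1") & 1)   # parity of tapped bits
    return out

def encode_frame(payload_bits, rows=4, cols=4):
    # Reed-Solomon parity insertion would precede this step; omitted here.
    stage1 = block_interleave(payload_bits, rows, cols)
    stage2 = conv_encode(stage1)                      # doubles the bit count
    return block_interleave(stage2, rows * 2, cols)   # second interleave

payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
print(encode_frame(payload))
```

The two interleavers spread a burst of channel errors across many code
words, so that each stage of decoding sees errors that look closer to
random.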

At 76MHz, an average ECC bit inflation factor is 1.6.  At 108MHz, it
is 2.0.  Error correction encoding overhead is slidable, so that BER
and net bandwidth can be traded off against each other.  This allows
any frequency from 76 to 108MHz to carry 16/44 PCM without any
compression at all.  As mentioned earlier, since the inherent error
rate at the higher frequency is higher than that at the lower
frequency, the tradeoff is a reflection of the underlying physics.
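
The tradeoff can be put in numbers with a short sketch.  It assumes a
raw rate of 2Mbit/s at 76MHz that scales linearly with the carrier, and
takes 16/44 PCM to mean stereo at 44.1kHz; the function names are
hypothetical.

```python
PCM_16_44_STEREO_BPS = 2 * 16 * 44_100      # 1,411,200 bit/s uncompressed

def raw_bit_rate(carrier_hz, ref_carrier_hz=76e6, ref_rate_bps=2.0e6):
    """Raw rate, assumed proportional to the assigned carrier frequency."""
    return ref_rate_bps * carrier_hz / ref_carrier_hz

def net_bit_rate(carrier_hz, inflation):
    """Payload rate left after ECC at the given bit inflation factor."""
    return raw_bit_rate(carrier_hz) / inflation

def max_inflation_for(payload_bps, carrier_hz):
    """Largest ECC inflation factor that still leaves room for the payload."""
    return raw_bit_rate(carrier_hz) / payload_bps

for mhz, avg in ((76, 1.6), (108, 2.0)):
    f = mhz * 1e6
    print(mhz, round(net_bit_rate(f, avg)),
          round(max_inflation_for(PCM_16_44_STEREO_BPS, f), 2))
```

Under these assumptions the net rates at the average inflation factors
differ far less than the raw rates do, and carrying uncompressed PCM at
the bottom of the band means sliding the inflation factor below its
1.6 average, at some cost in BER.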

A representative channelization scenario is as follows: a stereo
primary program channel derived from 24/96 stereo audio
lossy-compressed at a 4:1 ratio, plus any number of auxiliary channels
(e.g. 3 channels each compressed into a 128kb/s channel, which is an
11:1 compression ratio if the sources are stereo 16/44.1).  Channels
can, instead, be dedicated to wideband information services of
whatever type (weather or disaster instruction maps, legislative
transcripts, stock prices, even enhanced decoder software).  There are
no built-in constraints on the number and width of channels, except
that they must fit into the net bandwidth of the carrier aggregated
once per ECC frame.  An encryption-protected subscriber service can
occupy one or more channels, allowing the renting of data distribution
by third parties.  Presumably the maximum proportion of the carrier
bandwidth which can be consumed by such non-public services will be
strictly regulated by law.
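
The representative scenario adds up as follows.  This is a small
arithmetic sketch; the helper pcm_bps is hypothetical, and the primary
channel is taken at exactly one quarter of its uncompressed rate.

```python
T1_BPS = 1_536_000

def pcm_bps(channels, bits, sample_rate_hz):
    """Uncompressed PCM bit rate."""
    return channels * bits * sample_rate_hz

primary = pcm_bps(2, 24, 96_000) / 4            # 24/96 stereo at 4:1
aux = [128_000] * 3                             # three 128kb/s auxiliaries
total = primary + sum(aux)
aux_ratio = pcm_bps(2, 16, 44_100) / 128_000    # compression ratio per auxiliary

print(primary, total, total == T1_BPS, aux_ratio)
```

The primary channel plus three auxiliaries exactly fills the T1 net
bandwidth set as the goal at the top of this document.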

There is absolutely no requirement that the bitrate for a particular
channel be constant, so an enlightened compression engine would allow
degradation indices to be specified for each channel, and shift around
available bandwidth based on leftovers from channels that did not need
all the bandwidth available to them to meet the specified degradation
index.  Depending on content, this can result in the ability to
broadcast 4 stereo programs of "hifi" audio at the same time, for
example.
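
One simple way to shift leftover bandwidth around is a max-min fair
split of each ECC frame, sketched below.  The allocation rule, the
function name, and the per-frame figures are all assumptions; a real
compression engine would derive each channel's demand from its
degradation index.

```python
def allocate(budget_bits, demands):
    """Max-min fair split of one frame's bits among channels.

    demands maps channel name -> bits wanted this frame to meet its
    degradation index.  Channels wanting less than an equal share get
    what they ask for; the leftovers are re-divided among the rest.
    """
    alloc = {name: 0.0 for name in demands}
    remaining = dict(demands)
    left = float(budget_bits)
    while remaining and left > 1e-9:
        share = left / len(remaining)
        satisfied = {n: d for n, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: split evenly.
            for n in remaining:
                alloc[n] += share
            left = 0.0
            break
        for n, d in satisfied.items():
            alloc[n] += d
            left -= d
            del remaining[n]
    return alloc

print(allocate(10_000, {"news": 2_000, "talk": 3_000, "music": 9_000}))
```

Here the two undemanding channels leave 5000 bits of the frame for the
music channel, more than the equal share of about 3333 it would get
from a fixed split.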

Correlated programs - stereo, binaural, reverberation, etc. - are
passed in a single channel, and the compression engine exploits
correlations between the subchannels.  For example, this can make it
efficient to send the same program in a stereo version and 3 binaural
versions optimized for different HRTF profiles.  Bear in mind that
high-ratio compression schemes are likely to have dire effects on the
illusion of a binaural soundfield, especially if they make no attempt
at phase coherency.


R&D by Tom McEwan at Lawrence Livermore National Laboratory has
resulted in a low cost CAMAC transient sampling module, with a 30ps
sample separation, 60ps rise time, and 2ps jitter.  This device is
designed to be low cost, and has an overwhelming 10 bits of precision
corresponding to a dynamic range of 60dB.  Progress on low-cost
devices such as this makes it clear that the performance requirements
of the detector described above can be met at a low device cost
(perhaps $20 in large quantities) using current-day technology.
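
The relation between bits of precision and dynamic range quoted above
is the usual ideal-quantizer figure of 20*log10(2^N).  A short sketch,
with a hypothetical helper name, applies it to the 10-bit sampler and
to the 6-bit converters of the detector:

```python
import math

def dynamic_range_db(bits):
    """Ideal dynamic range of an N-bit converter: 20*log10(2**N)."""
    return 20.0 * math.log10(2 ** bits)

for bits in (6, 10):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")
```

This ignores quantization noise shaping and any analog limits of the
device, so it is an upper bound rather than a measured figure.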

Read more about McEwan's sampler at
http://www.llnl.gov/IPandC/op96/10/10j-33g.html


The preceding is Copyright 1997 by Daniel Pouzzner.  You are free to
redistribute or publish this document, provided you do not edit or
abbreviate (beginning with the title line), and provided you include
this licensing statement.  Some of the techniques described may
represent intellectual property worthy of patent protection, for which
Daniel Pouzzner retains sole right to apply.  Others of the techniques
may have been already patented by others, whether or not I have
mentioned this fact.