PROF. DR. MAHMOOD 2018-01-06 20
Chapter Two
1- Channel:
In telecommunications and computer networking, a communication channel, or simply
a channel, refers either to a physical transmission medium such as a wire, or to
a logical connection over a multiplexed medium such as a radio channel. A channel is
used to convey an information signal, for example a digital bit stream, from one or
several senders (or transmitters) to one or several receivers. A channel has a certain
capacity for transmitting information, often measured by its bandwidth in Hz or its data
rate in bits per second.
2- Symmetric channel:
A symmetric channel satisfies the following conditions:
a- Equal number of symbols in X and Y, i.e. P(Y∣X) is a square matrix.
b- Any row in the P(Y∣X) matrix is a permutation of the other rows.
For example the following conditional probability of various channel types as shown:
a- 𝑃(π‘Œ ∣ 𝑋) = [0.9  0.1
             0.1  0.9]
is a BSC, because it is a square matrix and the 1st row is a permutation of the 2nd row.
b- 𝑃( π‘Œ ∣ 𝑋 ) = [0 00..05 05 .9 0 00..05 05 .9 0 00..05 05 .9 ] is TSC, because it is square matrix and each
row is a permutation of others.
c- 𝑃(π‘Œ ∣ 𝑋) = [0.8  0.1  0.1
             0.1  0.8  0.1]
is non-symmetric since it is not square, although each row is a permutation of the others.
d- 𝑃(π‘Œ ∣ 𝑋) = [0.8  0.1  0.1
             0.1  0.7  0.2
             0.1  0.1  0.8]
is non-symmetric although it is square, since the 2nd row is not a permutation of the other rows.
2.1- Binary symmetric channel (BSC)
It is a common communications channel model used in coding theory and information
theory. In this model, a transmitter wishes to send a bit (a zero or a one), and the receiver
receives a bit. It is assumed that the bit is usually transmitted correctly, but that it will
be "flipped" with a small probability (the "crossover probability").
A binary symmetric channel with crossover probability p denoted by BSCp, is a
channel with binary input and binary output and probability of error p; that is, if X is the
transmitted random variable and Y the received variable, then the channel is
characterized by the conditional probabilities:
Pr( π‘Œ = 0 ∣ 𝑋 = 0 ) = 1 - 𝑃
Pr( π‘Œ = 0 ∣ 𝑋 = 1 ) = 𝑃
Pr( π‘Œ = 1 ∣ 𝑋 = 0 ) = 𝑃
Pr( π‘Œ = 1 ∣ 𝑋 = 1 ) = 1 - 𝑃
2.2- Ternary symmetric channel (TSC):
The transitional probability of TSC is:
𝑃( π‘Œ ∣ 𝑋 ) =
π‘₯1
π‘₯2
π‘₯3 [

1𝑃𝑦-𝑒1 2𝑃𝑒 1-𝑃𝑦𝑒22𝑃𝑒 𝑦𝑃 𝑃3𝑒 𝑒
𝑃𝑒 𝑃𝑒 1- 2𝑃𝑒

]
The TSC is symmetric but not very practical, since in practice π‘₯1 and π‘₯3 are not affected
as much as π‘₯2. In fact the interference between π‘₯1 and π‘₯3 is much less than the
interference between π‘₯1 and π‘₯2, or between π‘₯2 and π‘₯3.
[Figure: BSC transition diagram: 0 β†’ 0 and 1 β†’ 1 with probability 1-P; 0 β†’ 1 and 1 β†’ 0 with probability P]
Hence the more practical but nonsymmetric channel has the transition probability:
𝑃( π‘Œ ∣ 𝑋 ) =
π‘₯1
π‘₯2
π‘₯3 [1𝑃𝑦0𝑒1- 𝑃𝑒 1𝑃𝑒-𝑃𝑦𝑒221𝑃𝑒- 𝑃𝑦 0𝑃𝑒3𝑒]
Where π‘₯1 interfere with π‘₯2 exactly the same as interference between π‘₯2 and π‘₯3, but π‘₯1
and π‘₯3 are not interfere.
3- Discrete Memoryless Channel:
The Discrete Memoryless Channel (DMC) has an input X and an output Y. At any given
time (t), the channel output Y= y only depends on the input X = x at that time (t) and it
does not depend on the past history of the input. The DMC is represented by the conditional
probability of the output Y = y given the input X = x, i.e. P(Y∣X).
[Figure: TSC transition diagram: each 𝑋𝑖 β†’ π‘Œπ‘– with probability 1-2𝑃𝑒, and every other transition with probability 𝑃𝑒]
[Figure: transition diagram of the practical nonsymmetric channel: 𝑋1 β†’ π‘Œ1 and 𝑋3 β†’ π‘Œ3 with probability 1-𝑃𝑒, 𝑋2 β†’ π‘Œ2 with probability 1-2𝑃𝑒, and crossovers to/from 𝑋2 with probability 𝑃𝑒]
[Figure: block diagram of the DMC: input 𝑋 β†’ Channel 𝑃(π‘Œ ∣ 𝑋) β†’ output π‘Œ]
4- Binary Erasure Channel (BEC):
The Binary Erasure Channel (BEC) model is widely used to represent channels or links
that "lose" data. Prime examples of such channels are Internet links and routes. A
BEC has a binary input X and a ternary output Y.
Note that for the BEC, the probability of β€œbit error” is zero. In other words, the
following conditional probabilities hold for any BEC model:
Pr( π‘Œ = "π‘’π‘Ÿπ‘Žπ‘ π‘’π‘Ÿπ‘’" ∣ 𝑋 = 0 ) = 𝑃
Pr( π‘Œ = "π‘’π‘Ÿπ‘Žπ‘ π‘’π‘Ÿπ‘’" ∣ 𝑋 = 1 ) = 𝑃
Pr( π‘Œ = 0 ∣ 𝑋 = 0 ) = 1 - 𝑃
Pr( π‘Œ = 1 ∣ 𝑋 = 1 ) = 1 - 𝑃
Pr( π‘Œ = 0 ∣ 𝑋 = 1 ) = 0
Pr( π‘Œ = 1 ∣ 𝑋 = 0 ) = 0
5- Special Channels:
a- Lossless channel: It has only one nonzero element in each column of the
transitional matrix P(Y∣X).
𝑃( π‘Œ ∣ 𝑋 ) =
π‘₯1
π‘₯2
π‘₯3 [30/04𝑦1 0 10𝑦/24 1/𝑦3030 𝑦240/03 𝑦5100 ]
This channel has H(X∣Y)=0 and I(X, Y)=H(X) with zero losses entropy.
[Figure: BEC transition diagram: 𝑋1 β†’ π‘Œ1 and 𝑋2 β†’ π‘Œ2 with probability 1-𝑃, and 𝑋1, 𝑋2 β†’ Erasure with probability 𝑃]
b- Deterministic channel: It has only one nonzero element in each row of the
transitional matrix P(Y∣X), as an example:
𝑃( π‘Œ ∣ 𝑋 ) =
π‘₯1
π‘₯2
π‘₯3
[
𝑦1 𝑦2 𝑦3
1 0 0
1 0 0
0 0 1
0 1 0
0 1 0 ]
This channel has H(Y∣X)=0 and I(X, Y)=H(Y) with zero noise entropy.
c- Ideal channel: It has only one nonzero element in each row and each column of the
transitional matrix P(Y∣X), i.e. it is an identity matrix, as an example:
𝑃(π‘Œ ∣ 𝑋) =
      𝑦1  𝑦2  𝑦3
π‘₯1 [  1   0   0
π‘₯2    0   1   0
π‘₯3    0   0   1 ]
This channel has H(Y∣X)= H(X∣Y)=0 and I(Y, X)=H(Y)=H(X).
d- Noisy channel: No relation between input and output:
H( X ∣ Y ) = 𝐻(𝑋), H( Y ∣ X ) = 𝐻(π‘Œ)
𝐼(𝑋, π‘Œ) = 0, 𝐢 = 0
𝐻(𝑋, π‘Œ) = 𝐻(𝑋) + 𝐻(π‘Œ)
6- Shannon’s theorem:
a- A given communication system has a maximum rate of information C known as
the channel capacity.
b- If the information rate R is less than C, then one can approach arbitrarily small
error probabilities by using intelligent coding techniques.
c- To get lower error probabilities, the encoder has to work on longer blocks of
signal data. This entails longer delays and higher computational requirements.
Thus, if R ≀ C then transmission may be accomplished without error in the presence
of noise. The negation of this theorem is also true: if R > C, then errors cannot be
avoided regardless of the coding technique used.
Consider a bandlimited Gaussian channel operating in the presence of additive
Gaussian noise:
The Shannon-Hartley theorem states that the channel capacity is given by:
𝐢 = π΅π‘™π‘œπ‘”2 (1 + 𝑁𝑆 )
Where C is the capacity in bits per second, B is the bandwidth of the channel in Hertz,
and S/N is the signal-to-noise ratio.
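The formula is a one-liner in code. A sketch (the 3 kHz bandwidth and 30 dB SNR are illustrative numbers, not from the text):

```python
from math import log2

def shannon_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity C = B*log2(1 + S/N) in bits/second.
    snr is the linear (not dB) signal-to-noise ratio."""
    return bandwidth_hz * log2(1 + snr)

# A 3 kHz telephone-grade channel at S/N = 1000 (i.e. 30 dB):
c = shannon_capacity(3000, 1000)
```

Note that S/N here is a power ratio; an SNR quoted in dB must first be converted with S/N = 10^(dB/10).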
7- Channel Capacity (Discrete channel)
This is defined as the maximum of I(X,Y):
𝐢 = π‘β„Žπ‘Žπ‘›π‘›π‘’π‘™ π‘π‘Žπ‘π‘Žπ‘π‘–π‘‘π‘¦ = max[𝐼(𝑋, π‘Œ)] 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
Physically it is the maximum amount of information each symbol can carry to the
receiver. Sometimes this capacity is also expressed in bits/sec if related to the rate of
producing symbols r:
𝑅(𝑋, π‘Œ) = π‘Ÿ Γ— 𝐼(𝑋, π‘Œ) 𝑏𝑖𝑑𝑠/ sec π‘œπ‘Ÿ 𝑅(𝑋, π‘Œ) = 𝐼(𝑋, π‘Œ)/ πœΜ…
a- Channel capacity of Symmetric channels:
The channel capacity is defined as max[𝐼(𝑋, π‘Œ)]:
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) - 𝐻( π‘Œ ∣ 𝑋 )
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) + βˆ‘ βˆ‘ 𝑃(π‘₯𝑖, 𝑦𝑗)
𝑛
𝑖=1
π‘š
𝑗=1
log2𝑃(𝑦𝑗 ∣ π‘₯𝑖)
But we have
𝑃(π‘₯𝑖, 𝑦𝑗) = 𝑃(π‘₯𝑖)𝑃(𝑦𝑗 ∣ π‘₯𝑖) 𝑝𝑒𝑑 𝑖𝑛 π‘Žπ‘π‘œπ‘£π‘’ π‘’π‘žπ‘’π‘Žπ‘‘π‘–π‘œπ‘› 𝑦𝑖𝑒𝑙𝑑𝑒𝑠:
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) + βˆ‘ βˆ‘ 𝑃(π‘₯𝑖)𝑃(𝑦𝑗 ∣ π‘₯𝑖)
𝑛
𝑖=1
π‘š
𝑗=1
log2𝑃(𝑦𝑗 ∣ π‘₯𝑖)
If the channel is symmetric the quantity:
βˆ‘_{𝑗=1}^{π‘š} 𝑃(𝑦𝑗 ∣ π‘₯𝑖) log2 𝑃(𝑦𝑗 ∣ π‘₯𝑖) = 𝐾
where K is a constant independent of the row number i, so that the equation
becomes:
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) + 𝐾 βˆ‘ 𝑃(π‘₯𝑖)
𝑛
𝑖=1
Hence 𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) + 𝐾 for symmetric channels
Max of 𝐼(𝑋, π‘Œ) = max[𝐻(π‘Œ) + 𝐾] = max[𝐻(π‘Œ)] + 𝐾
When Y has equiprobable symbols then max[𝐻(π‘Œ)] = π‘™π‘œπ‘”2π‘š
Then
𝐼(𝑋, π‘Œ) = π‘™π‘œπ‘”2π‘š + 𝐾
Or
𝐢 = π‘™π‘œπ‘”2π‘š + 𝐾
Example 9:
For the BSC shown:
Find the channel capacity and efficiency if 𝐼(π‘₯1) = 2𝑏𝑖𝑑𝑠
Solution:
𝑃(π‘Œ ∣ 𝑋) = [0.7  0.3
           0.3  0.7]
[Figure: BSC transition diagram: 𝑋1 β†’ π‘Œ1 and 𝑋2 β†’ π‘Œ2 with probability 0.7, crossovers with probability 0.3]
Since the channel is symmetric then
𝐢 = π‘™π‘œπ‘”2π‘š + 𝐾 and 𝑛 = π‘š
π‘€β„Žπ‘’π‘Ÿπ‘’ 𝑛 π‘Žπ‘›π‘‘ π‘š π‘Žπ‘Ÿπ‘’ π‘›π‘’π‘šπ‘π‘’π‘Ÿ π‘Ÿπ‘œπ‘€ π‘Žπ‘›π‘‘ π‘π‘œπ‘™π‘’π‘šπ‘› π‘Ÿπ‘’π‘π‘’π‘ π‘‘π‘–π‘£π‘’π‘™π‘¦
𝐾 = 0.7π‘™π‘œπ‘”20.7 + 0.3π‘™π‘œπ‘”20.3 = -0.88129
𝐢 = 1 - 0.88129 = 0.1187 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
The channel efficiency πœ‚ = 𝐼(𝑋, π‘Œ)/𝐢
𝐼(π‘₯1) = -π‘™π‘œπ‘”2𝑃(π‘₯1) = 2
𝑃(π‘₯1) = 2-2 = 0.25 π‘‘β„Žπ‘’π‘› 𝑃(𝑋) = [0.25 0.75]𝑇
And we have 𝑃(π‘₯𝑖, 𝑦𝑗) = 𝑃(π‘₯𝑖)𝑃(𝑦𝑗 ∣ π‘₯𝑖) so that
𝑃(𝑋, π‘Œ) = [0.7 Γ— 0.25  0.3 Γ— 0.25] = [0.175  0.075]
          [0.3 Γ— 0.75  0.7 Γ— 0.75]   [0.225  0.525]
𝑃(π‘Œ) = [0.4 0.6] β†’ 𝐻(π‘Œ) = 0.97095 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) + 𝐾 = 0.97095 - 0.88129 = 0.0896 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
Then πœ‚ = 0.0896
0.1187
= 75.6%
To find the channel redundancy:
𝑅 = 1 - πœ‚ = 1 - 0.756 = 0.244 π‘œπ‘Ÿ 24.4%
b- Channel capacity of nonsymmetric channels:
We can find the channel capacity of nonsymmetric channel by the following
steps:
a- Find I(X, Y) as a function of the input probabilities:
𝐼(𝑋, π‘Œ) = 𝑓(𝑃(π‘₯1), 𝑃(π‘₯2), … , 𝑃(π‘₯𝑛))
and use the constraint βˆ‘ 𝑃(π‘₯𝑖) = 1 to reduce the number of variables by one.
b- Partially differentiate I(X, Y) with respect to the (n-1) remaining input
probabilities, then equate these partial derivatives to zero.
c- Solve the (n-1) equations simultaneously then find
𝑃(π‘₯1), 𝑃(π‘₯2) … . . . . , 𝑃(π‘₯𝑛) that gives maximum I(X, Y).
d- Put resulted values of input prob. in the function given in step 1 to find 𝐢 =
max[𝐼(𝑋, π‘Œ)].
Example 10:
Find the channel capacity for the channel having the following transition:
𝑃(π‘Œ ∣ 𝑋) = [0.7  0.3
           0.1  0.9]
Assume that 𝑃(π‘₯1) = 𝑃 β‰… 0.862.
Solution: First note that the channel is not symmetric since the 1st row is not a
permutation of the 2nd row.
a- Let 𝑃(π‘₯1) = 0.862 , π‘‘β„Žπ‘’π‘› 𝑃(π‘₯2) = 1 - 𝑃 = 0.138.
𝑃(𝑋, π‘Œ) = 𝑃(𝑋) Γ— 𝑃( π‘Œ ∣ 𝑋 )
∴ 𝑃(𝑋, π‘Œ) = [0.7 Γ— 0.862  0.3 Γ— 0.862] = [0.6034  0.2586]
            [0.1 Γ— 0.138  0.9 Γ— 0.138]   [0.0138  0.1242]
From above results 𝑃(π‘Œ) = [0.6172 0.3828]
𝐻(π‘Œ) = - βˆ‘ 𝑃(𝑦𝑗)
π‘š
𝑗=1
log2𝑃(𝑦𝑗)
∴ 𝐻(π‘Œ) = 0.960 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
We have
𝐻(π‘Œ ∣ 𝑋) = - βˆ‘ βˆ‘ 𝑃(π‘₯𝑖, 𝑦𝑗)
𝑛
𝑖=1
π‘š
𝑗=1
log2𝑃(𝑦𝑗 ∣ π‘₯𝑖)
𝐻(π‘Œ ∣ 𝑋) = -[0.6034 𝑙𝑛0.7 + 0.2586𝑙𝑛0.3 + 0.0138𝑙𝑛0.1 + 0.1242𝑙𝑛0.9]/𝑙𝑛
H( Y ∣ X ) = 0.8244 bits/symbol
𝐼(𝑋, π‘Œ) = 𝐻(π‘Œ) - H( Y ∣ X ) = 0.96021 - 0.8244 = 0.1358 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
for the assumed input distribution.
Review questions:
A binary source sends π‘₯1 with a probability of 0.4 and π‘₯2 with a probability of 0.6
through a channel with error probabilities of 0.1 for π‘₯1 and 0.2 for π‘₯2. Determine:
1- Source entropy.
2- Marginal entropy.
3- Joint entropy.
4- Conditional entropy 𝐻(π‘Œ ∣ 𝑋).
5- Loss entropy 𝐻(𝑋 ∣ π‘Œ).
6- Transinformation.
Solution:
1- The channel diagram:
Or 𝑃(π‘Œ ∣ 𝑋) = [0.9  0.1
              0.2  0.8]
𝐻(𝑋) = - βˆ‘_{𝑖=1}^{𝑛} 𝑝(π‘₯𝑖) log2 𝑝(π‘₯𝑖)
𝐻(𝑋) = -[0.4 ln(0.4) + 0.6 ln(0.6)]/ln 2 = 0.971 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
2- 𝑃(𝑋, π‘Œ) = 𝑃(π‘Œοƒ΄X) Γ— 𝑃(𝑋)
∴ 𝑃(𝑋, π‘Œ) = [0 0..9 2 Γ— Γ— 0 0..4 6 0 0..1 8 Γ— Γ— 0 0..4 6] = [0 0..36 12 0 0..04 48]
∴ 𝑃(π‘Œ) = [0.48 0.52]
[Figure: channel diagram: 𝑃(π‘₯1) = 0.4, 𝑃(π‘₯2) = 0.6; π‘₯1 β†’ 𝑦1: 0.9, π‘₯1 β†’ 𝑦2: 0.1, π‘₯2 β†’ 𝑦2: 0.8, π‘₯2 β†’ 𝑦1: 0.2]
𝐻(π‘Œ) = - βˆ‘ 𝑝(𝑦𝑗)
π‘š
𝑗=1
π‘™π‘œπ‘”2𝑝(𝑦𝑗)
𝐻(π‘Œ) = -
[0.48 ln(0.48) + 0.52 ln(0.52)]
ln(2) = 0.999 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
3- 𝐻(𝑋, π‘Œ)
𝐻(𝑋, π‘Œ) = - βˆ‘ βˆ‘ 𝑃(π‘₯𝑖, 𝑦𝑗)
𝑛
𝑖=1
π‘š
𝑗=1
log2𝑃(π‘₯𝑖, 𝑦𝑗)
𝐻(𝑋, π‘Œ) = -
[0.36 ln(0.36) + 0.04 ln(0.04) + 0.12 ln(0.12) + 0.48 ln(0.48)]
ln(2)
= 1.592 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
4- 𝐻(π‘Œοƒ΄X)
𝐻(π‘Œοƒ΄X) == - βˆ‘ βˆ‘ 𝑃(π‘₯𝑖, 𝑦𝑗)
𝑛
𝑖=1
π‘š
𝑗=1
log2𝑃(𝑦𝑗π‘₯𝑖)
𝐻(π‘Œοƒ΄X) = - [0.36 ln(0.9) + 0.12 ln(0.2ln ) +(20).04 ln(0.1) + 0.48 ln(0.8)]
= 0.621
𝑏𝑖𝑑𝑠
π‘ π‘¦π‘šπ‘π‘œπ‘™
Or 𝐻(π‘Œοƒ΄X) = 𝐻(𝑋, π‘Œ) - 𝐻(𝑋) = 1.592 - 0.971 = 0.621 𝑏𝑖𝑑𝑠
π‘ π‘¦π‘šπ‘π‘œπ‘™
5- 𝐻(𝑋 ∣ π‘Œ) = 𝐻(𝑋, π‘Œ) - 𝐻(π‘Œ) = 1.592 - 0.999 = 0.593 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
6- 𝐼(𝑋, π‘Œ) = 𝐻(𝑋) - 𝐻(𝑋 ∣ π‘Œ) = 0.971 - 0.593 = 0.378 𝑏𝑖𝑑𝑠/π‘ π‘¦π‘šπ‘π‘œπ‘™
8- Cascading of Channels
If two channels are cascaded, then the overall transition matrix is the product of the two
transition matrices.
𝑃(𝑍 ∣ 𝑋) = 𝑃(π‘Œ ∣ 𝑋) Β· 𝑃(𝑍 ∣ π‘Œ)
where 𝑃(𝑍 ∣ 𝑋) is an (𝑛 Γ— π‘˜) matrix, 𝑃(π‘Œ ∣ 𝑋) is (𝑛 Γ— π‘š) and 𝑃(𝑍 ∣ π‘Œ) is (π‘š Γ— π‘˜).
[Figure: two cascaded channels: inputs 1…n β†’ Channel 1 β†’ intermediate symbols 1…m β†’ Channel 2 β†’ outputs 1…k]
For the series information channel, the overall channel capacity cannot exceed that of
any individual channel:
𝐼(𝑋, 𝑍) ≀ 𝐼(𝑋, π‘Œ) and 𝐼(𝑋, 𝑍) ≀ 𝐼(π‘Œ, 𝑍)
Example:
Find the transition matrix 𝑃(𝑍 ∣ 𝑋) for the cascaded channel shown.
𝑃(π‘Œ ∣ 𝑋) = [0.3  0    0.7
            0.8  0.2  0  ]
𝑃(𝑍 ∣ π‘Œ) = [0.7  0.3
            1    0
            1    0  ]
𝑃(𝑍 ∣ 𝑋) = 𝑃(π‘Œ ∣ 𝑋) Β· 𝑃(𝑍 ∣ π‘Œ) = [0.91  0.09
                                  0.76  0.24]
[Figure: cascaded channel diagram with Channel 1 branch probabilities 0.3, 0.7, 0.8, 0.2 and Channel 2 branch probabilities 0.7, 0.3, 1, 1]
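The matrix product of the cascade example can be checked with a few lines of pure Python (no libraries assumed):

```python
def matmul(A, B):
    """Transition-matrix product: (n x m) . (m x k) -> (n x k)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

P_yx = [[0.3, 0.0, 0.7],   # Channel 1: 2 inputs, 3 intermediate symbols
        [0.8, 0.2, 0.0]]
P_zy = [[0.7, 0.3],        # Channel 2: 3 intermediate symbols, 2 outputs
        [1.0, 0.0],
        [1.0, 0.0]]

P_zx = matmul(P_yx, P_zy)               # overall transition matrix
row_sums = [sum(row) for row in P_zx]   # each row must still sum to 1
```

Since each factor is a stochastic matrix, the product is stochastic too, so the row-sum check is a quick sanity test on any cascade.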