In *A Mathematical Theory of Evidence*, Glenn Shafer showed how Dempster’s rule of combination generalizes Bayesian conditioning. In this document we investigate numerically how a simple Bayesian model can be encoded in the language of belief functions.

Recall the Bayes Rule of conditioning in simple terms:

\[P(H|E) = \dfrac{P(H) \cdot P(E|H)} {P(E)}\] Let’s see how this translates into the belief-function setup.

In particular, Bayesian belief functions concentrate their masses on singletons only, unlike more general basic mass assignments. For instance, on a frame \(\Theta=\{a,b,c\}\), the basic mass assignment \(m(\{a\})=0.2\), \(m(\{b\})=0.3\) and \(m(\{c\})=0.5\) defines a Bayesian belief function.

In the Bayesian language, this is the prior distribution \(P(H)\). Function *bca* is used to
set the distribution of *H*.

`## The prior distribution H`

```
## H specnb mass
## 1 a 1 0.2
## 2 b 2 0.3
## 3 c 3 0.5
```
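To keep the numbers in view, here is a minimal Python sketch (an illustration of the mathematics only, not the R functions used in this document) of the prior as a Bayesian basic mass assignment, represented as a dictionary from subsets to masses:

```python
# A basic mass assignment (bca) as a dict: frozenset(subset) -> mass.
# A *Bayesian* bca puts all of its mass on singletons.
prior_H = {
    frozenset({"a"}): 0.2,
    frozenset({"b"}): 0.3,
    frozenset({"c"}): 0.5,
}

# The masses of a bca sum to 1, and here every focal element is a singleton.
assert abs(sum(prior_H.values()) - 1.0) < 1e-12
assert all(len(A) == 1 for A in prior_H)
```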

Bayes_Rule.R

The law of conditional probability is a special case of Dempster’s rule of combination in which one of the two basic mass assignments focuses all its mass on the conditioning event. Here, the basic mass assignment focuses all the mass on the subset \(E =\{b,c\}\): using function *bca*, we set \(m(\{b,c\})=1\).

`## Setting an Event E = {b,c} with mass = 1`

```
## Event specnb mass
## 1 b + c 4 1
```


Now we set the computation of Bayes’s Theorem in motion.

As a first step, we use function *dsrwon* to combine our two
basic mass assignments *H* and *Event*. The non-normalized Dempster’s rule of
combination gives a mass distribution *H_Event* composed of two
parts:

- the distribution of the product \(P(H) \cdot P(E|H)\) on \(\Theta\);
- a mass allotted to the empty set \(m(\varnothing)\).

`## The combination of H and Event E`

```
## H_Event specnb mass
## 1 ø 1 0.2
## 2 b 3 0.3
## 3 c 4 0.5
```
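The combination above can be reproduced with a small Python sketch of the unnormalized rule (again an illustration, not the *dsrwon* implementation): each pair of focal elements contributes the product of its masses to their intersection, including the empty set.

```python
from collections import defaultdict

def dsr_unnormalized(m1, m2):
    """Unnormalized Dempster's rule: each pair of focal elements
    contributes the product of its masses to their intersection
    (possibly the empty set)."""
    out = defaultdict(float)
    for A, mA in m1.items():
        for B, mB in m2.items():
            out[A & B] += mA * mB
    return dict(out)

prior_H = {frozenset({"a"}): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}
event_E = {frozenset({"b", "c"}): 1.0}  # all mass on the conditioning event E

H_Event = dsr_unnormalized(prior_H, event_E)
# m(empty) = 0.2, m({b}) = 0.3, m({c}) = 0.5, as in the output above
```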

It turns out that we can obtain the marginal \(P(E)\) from \(m(\varnothing)\): \[P(E) = 1 - m(\varnothing).\]

Hence, \(P(E)\) is nothing else than the normalization constant of Dempster’s rule of combination.

In the second step of the computation, we use function *nzdsr* to
apply the normalization constant to the distribution *H_Event*, which
gives the posterior distribution \(P(H|E)\).

`## The posterior distribution P(H|E)`

```
## H_given_E specnb mass
## 1 b 2 0.375
## 2 c 3 0.625
```
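In the same Python sketch notation, the normalization step drops the mass of the empty set and rescales the remaining masses by \(1/(1 - m(\varnothing))\), which is exactly a division by \(P(E)\):

```python
def normalize(m):
    """Dempster normalization: drop the empty set and rescale the
    remaining masses by 1 / (1 - m(empty))."""
    k = m.get(frozenset(), 0.0)
    return {A: v / (1.0 - k) for A, v in m.items() if A}

H_Event = {frozenset(): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}
posterior = normalize(H_Event)
# {b}: 0.3 / 0.8 = 0.375 and {c}: 0.5 / 0.8 = 0.625, i.e. P(H|E)
```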


Note that *H_given_E* is defined on singletons only, and the mass
allocated to \(\Theta\) is zero. Hence \(bel(\cdot) = P(\cdot) =
Pl(\cdot)\), as shown by the following table.

```
## bel disbel unc plau rplau
## a 0.000 1.000 0 0.000 0.000
## b 0.375 0.625 0 0.375 0.600
## c 0.625 0.375 0 0.625 1.667
## frame 1.000 0.000 0 1.000 Inf
```
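This equality can be checked directly from the definitions of belief and plausibility, again in the Python sketch notation (an illustration, not the R code behind the table):

```python
def bel(m, A):
    """Belief of A: total mass of focal elements contained in A."""
    return sum(v for B, v in m.items() if B <= A)

def pl(m, A):
    """Plausibility of A: total mass of focal elements intersecting A."""
    return sum(v for B, v in m.items() if B & A)

posterior = {frozenset({"b"}): 0.375, frozenset({"c"}): 0.625}

# With mass on singletons only, bel and pl coincide on every singleton.
for x in ("a", "b", "c"):
    s = frozenset({x})
    assert abs(bel(posterior, s) - pl(posterior, s)) < 1e-12
```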


In the first example, the conditioning event was a subset of the
frame \(\Theta\) of variable
*H*. We now show the computation of Bayes’s rule of conditioning
by Dempster’s Rule in the case of two variables.

Let’s say we have the variable H defined on \(\Theta = \{a, b, c\}\) as before.

`## The prior distribution`

```
## X specnb mass
## 1 a 1 0.2
## 2 b 2 0.3
## 3 c 3 0.5
```

Let’s add a second variable *E* with three outcomes, \(\Lambda =\{d, e, f\}\), and the conditional probabilities

\(P(d|a)=0.1\), \(P(d|b)=0.2\) and \(P(d|c)=0.7\).

This distribution will be encoded in the product space \(\Theta \times \Lambda\) by setting

\(m(\{(a,d)\}) = 0.1\); \(m(\{(b,d)\}) = 0.2\); \(m(\{(c,d)\}) = 0.7\).

We now do this using function *bcaRel*.

`## Specify information on variables, description matrix and mass vector`

`## Identifying variables and frames`

```
## varnb size
## [1,] 1 3
## [2,] 4 3
```

`## Note that variables numbers must be in increasing order`

`## The description matrix of the relation between X and E`

```
## a b c d e f
## [1,] 1 0 0 1 0 0
## [2,] 0 1 0 1 0 0
## [3,] 0 0 1 1 0 0
## [4,] 1 1 1 1 1 1
```

`## Note Columns of matrix must follow variables ordering.`

`## Mass specifications`

```
## specnb mass
## [1,] 1 0.1
## [2,] 2 0.2
## [3,] 3 0.7
## [4,] 4 0.0
```

`## The relation between Evidence E and X`

```
## rel_EX specnb mass
## 1 a d 1 0.1
## 2 b d 2 0.2
## 3 c d 3 0.7
```
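In the Python sketch notation, this relation is simply a basic mass assignment on the product space whose focal elements are singleton pairs:

```python
# The relation rel_EX as a bca on the product space Theta x Lambda:
# each focal element is the singleton pair {(x, d)}.
rel_EX = {
    frozenset({("a", "d")}): 0.1,
    frozenset({("b", "d")}): 0.2,
    frozenset({("c", "d")}): 0.7,
}

# In this example the three masses happen to sum to 1.
assert abs(sum(rel_EX.values()) - 1.0) < 1e-12
```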


Now we combine the prior \(P(X)\) with
*rel_EX*. But first, we need to extend *X* to the product space \(\Theta \times \Lambda\).

`## Prior X extended in product space of (X,E)`

```
## X_xtnd specnb mass
## 1 a d + a e + a f 1 0.2
## 2 b d + b e + b f 2 0.3
## 3 c d + c e + c f 3 0.5
```
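The extension shown above is the vacuous extension: each focal element of the prior becomes its cylinder in the product space. A Python sketch (an illustration of the operation, not the R function used here):

```python
LAMBDA = ("d", "e", "f")

def vacuous_extension(m_X):
    """Extend a bca on Theta to Theta x Lambda: each focal element A
    becomes the cylinder {(x, e) : x in A, e in Lambda}."""
    return {
        frozenset((x, e) for x in A for e in LAMBDA): v
        for A, v in m_X.items()
    }

prior_X = {frozenset({"a"}): 0.2, frozenset({"b"}): 0.3, frozenset({"c"}): 0.5}
X_xtnd = vacuous_extension(prior_X)
# e.g. {a} -> {(a,d), (a,e), (a,f)} with mass 0.2
```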

We now combine *X* extended and *rel_EX* in the product space \(\Theta \times \Lambda\).

`## Mass distribution of the combination of X extended and E_X`

```
## comb_X_EX specnb mass
## 1 ø 1 0.57
## 2 a d 2 0.02
## 3 b d 3 0.06
## 4 c d 4 0.35
```
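The same unnormalized combination as in the one-variable case, now applied on the product space, reproduces these numbers (Python sketch, not the *dsrwon* implementation):

```python
from collections import defaultdict

def dsr_unnormalized(m1, m2):
    """Unnormalized Dempster's rule, here applied on Theta x Lambda."""
    out = defaultdict(float)
    for A, mA in m1.items():
        for B, mB in m2.items():
            out[A & B] += mA * mB
    return dict(out)

LAMBDA = ("d", "e", "f")
X_xtnd = {frozenset((x, e) for e in LAMBDA): m
          for x, m in (("a", 0.2), ("b", 0.3), ("c", 0.5))}
rel_EX = {frozenset({("a", "d")}): 0.1,
          frozenset({("b", "d")}): 0.2,
          frozenset({("c", "d")}): 0.7}

comb_X_EX = dsr_unnormalized(X_xtnd, rel_EX)
# m(empty) = 0.57, m({(a,d)}) = 0.02, m({(b,d)}) = 0.06, m({(c,d)}) = 0.35
```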

As we can see, we have

- the distribution of the product \(P(X) \cdot P(E|X)\) on \(\Theta \times \Lambda\);
- a mass allotted to the empty set \(m(\varnothing)\), which is \(1 - P(E)\).

Using function *nzdsr*, we apply the normalization constant to
obtain the desired result. Then, using function *elim*, we obtain
the marginal of *X*, which turns out to be \(P(X | E = d)\).

`## The normalized mass distribution of the combination of X extended and E_X`

```
## norm_comb_X_EX specnb mass
## 1 a d 1 0.0465116279069768
## 2 b d 2 0.13953488372093
## 3 c d 3 0.813953488372093
```

`## The posterior distribution P(X|E) for (a,d), (b,d), (c,d), after eliminating variable E`

```
## dist_XgE specnb mass
## 1 a 1 0.0465116279069768
## 2 b 2 0.13953488372093
## 3 c 3 0.813953488372093
```
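Both final steps can be checked with the Python sketch: normalize as before, then eliminate *E* by projecting each focal element onto its *X* coordinate and summing the masses (an illustration of the marginalization, not the *nzdsr*/*elim* code):

```python
def normalize(m):
    """Drop the empty set and rescale by 1 / (1 - m(empty))."""
    k = m.get(frozenset(), 0.0)
    return {A: v / (1.0 - k) for A, v in m.items() if A}

def eliminate_E(m):
    """Marginalize onto X: project each focal element onto its
    first coordinate and add up the masses."""
    out = {}
    for A, v in m.items():
        proj = frozenset(x for x, _ in A)
        out[proj] = out.get(proj, 0.0) + v
    return out

comb_X_EX = {frozenset(): 0.57,
             frozenset({("a", "d")}): 0.02,
             frozenset({("b", "d")}): 0.06,
             frozenset({("c", "d")}): 0.35}

dist_XgE = eliminate_E(normalize(comb_X_EX))
# {a}: 0.02/0.43, {b}: 0.06/0.43, {c}: 0.35/0.43, matching the output above
```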
