This differential equation leads to the solution $I(u) = k \log u$ for some $k \in \mathbb{R}$. Condition 2 leads to $k < 0$, and in particular $k$ can be chosen in the form $k = -1/\log x$ with $x > 1$, which is equivalent to choosing a specific base for the logarithm. The different units of information (bits for the binary logarithm $\log_2$, nats for the natural logarithm $\ln$, bans for the decimal logarithm $\log_{10}$, and so on) are just constant multiples of each other. For instance, in the case of a fair coin toss, heads provides $\log_2 2 = 1$ bit of information, which is approximately 0.693 nats or 0.301 decimal digits. Because of additivity, $n$ tosses provide $n$ bits of information, which is approximately $0.693n$ nats or $0.301n$ decimal digits.

Now suppose we have a distribution where event $i$ can happen with probability $p_i$. Suppose we have sampled it $N$ times, and outcome $i$ was, accordingly, seen $n_i = N p_i$ times. The total amount of information we have received is

$$\sum_i n_i \, I(p_i) = -\sum_i N p_i \log p_i .$$

Dividing by $N$, the average amount of information per event is $-\sum_i p_i \log p_i$.
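As a sanity check on the unit conversions above, here is a minimal Python sketch (the helper name `information_content` is hypothetical, not from the text); switching units amounts to nothing more than changing the base of the logarithm:

```python
import math

def information_content(p: float, base: float = 2.0) -> float:
    """Self-information -log(p) in the given base (hypothetical helper)."""
    return -math.log(p, base)

# A fair coin toss: heads has probability 1/2.
print(information_content(0.5, base=2))        # 1.0    bit
print(information_content(0.5, base=math.e))   # ~0.693 nats
print(information_content(0.5, base=10))       # ~0.301 bans (decimal digits)

# Additivity: n independent tosses carry n times the information.
n = 10
print(n * information_content(0.5, base=2))    # 10.0 bits
```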

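The displayed sum can likewise be sketched in a few lines of Python (a minimal illustration with made-up values for the distribution and $N$; `total_information` is a hypothetical name):

```python
import math

def total_information(probs, N, base=2.0):
    """Total information -sum_i N * p_i * log(p_i), using the expected
    counts n_i = N * p_i (hypothetical helper, not from the text)."""
    return -sum(N * p * math.log(p, base) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]   # made-up example distribution
N = 1000                    # made-up sample size
total = total_information(probs, N)
print(total)                # 1500.0 bits in total
print(total / N)            # 1.5 bits per event: -sum_i p_i log p_i
```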
For $m$ equally likely messages, the average amount of information $h$ is