# On ﬂoat­ing-point arith­metic and IEEE 754

Created: Sun Feb 3 09:39:23 CET 2019

Last mod­i­ﬁed: Sun Feb 3 23:50:31 CET 2019

I’m learn­ing about ﬂoat­ing-point com­pu­ta­tions.

I wrote both μ and ν with­out ac­cel­er­a­tors for the sake of sim­plic­ity.

``````(define (μ k a)
(define (helper a n)
(if (null? a)
0
(+ (/ (car a) (expt k n))
(helper (cdr a) (+ n 1)))))
(helper a 1))``````

In case you’ve ever won­dered how to en­code `1/10` in bi­nary.

``````(μ 2 '(0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1))
; => 0.10000000149011612``````

Regarding IEEE 754 stan­dard for rep­re­sent­ing ﬂoat­ing point num­bers, we’d have an ex­po­nent of `-4`, en­coded as `123`, and the man­tissa `0.6`.

``````(define (ν k a)
(define (helper a n)
(if (null? a)
0
(+ (* (car a) (expt k n))
(helper (cdr a) (- n 1)))))
(helper a (- (length a) 1)))``````

ν does the same as μ with nat­u­rals in­stead of pos­i­tive real num­bers. For ex­am­ple, `(ν 2 '(1 1 0)) => 6`, `(ν 10 '(1 1 0)) => 110` and `(ν 16 '(15 15)) => 255`.

We can use those two func­tions to eval­u­ate any en­coded fp to a nu­mer­i­cal constant.

We can test this with the num­ber 0.085, which can be en­coded as fol­lows.

``````Sign Exponent  Mantissa
0    123 (-4)  0.36
0    01111011  01011100001010001111011``````

We’ll just read the steps in re­verse or­der.

• ap­ply `μ 2` to the man­tissa and add `1`, save as `a`
• sub­stract `127` to the ex­po­nent, save as `b`
• cal­cu­late `2b`, save as `c`
• mul­ti­ply both `a` and `c`
``````(let ((exponent '(0 1 1 1 1 0 1 1))
(mantissa '(0 1 0 1 1 1 0 0 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1)))
(exact->inexact
(* (expt 2 (- (ν 2 exponent) 127))
(+ 1 (μ 2 mantissa)))))
; => 0.08500000089406967``````

Of course, you’d have to mul­ti­ply the whole re­sult by `-1` had the sign bit been set.

The same process can be ap­plied with the dou­ble pre­ci­sion form us­ing a bias of 1023 instead of `127`.

``````(let ((exponent '(0 1 1 1 1 1 1 1 0 1 1))
(mantissa '(1 0 0 1 1 0 0 1 1 0 0 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0
0 1 1 0 0 1 1 0 0 1 1 0 0
1 1 0 0 1 1 0 0 1 1 0 1 0)))
(exact->inexact
(* (expt 2 (- (ν 2 exponent) 1023))
(+ 1 (μ 2 mantissa)))))
; => 0.1``````

Rounding er­rors

How come `0.1` has no ex­act rep­re­sen­ta­tion ?

The num­ber we ap­prox­i­mate `0.1` with is of the form, `x/y` where `y` is a power of two.

We have 10-1 = xy-1. It fol­lows that y = 10x (cross prod­uct) ≡ 2·2·…·2 = 10x. Because this equa­tion has no solution what­so­ever, we know that we can never rep­re­sent 0.1 in a base-2 system. Had the de­nom­i­na­tor been a power of two, we would have got ourselves an ex­act rep­re­sen­ta­tion.

source code