 Microchip Technology Semiconductor Electronic Components Datasheet

# AN575 Datasheet

### IEEE 754 Compliant Floating-Point Routines

Author: Frank Testa
INTRODUCTION
This application note presents an implementation of
the following floating point math routines for the
PIC16/17 microcontroller family:

| Routine | Function |
|---|---|
| INTxx( A ) | float to integer conversion |
| FLOxx( A ) | integer to float conversion |
| NRMxx( A ) | normalize |
| FPAxx( A , B ) | add/subtract |
| FPMxx( A , B ) | multiply |
| FPDxx( A , B ) | divide |

Routines for the PIC16/17 families are provided in a
modified IEEE754 32-bit format, together with versions in
a truncated 24-bit format.
FLOATING POINT ARITHMETIC
Although fixed point arithmetic can usually be employed
in many numerical problems through the use of proper
scaling techniques, this approach can become complicated
and sometimes result in less efficient code than is
possible using floating point methods[1]. Floating point
arithmetic is essentially equivalent to arithmetic in
scientific notation relative to a particular base or radix.
In the special case of base two or binary arithmetic, a
number A has the floating point representation:

$$A = f \cdot 2^{e}, \qquad f = \sum_{k=1}^{n} a(k) \cdot 2^{-k}$$

where f is the fraction or mantissa, e is the exponent or
characteristic, n is the number of bits in f, and a(k),
k = 1, ..., n, are the bit values of f with a(1) = MSB.
Typically, the mantissa is in normalized sign-magnitude
representation with implicit MSB equal to one, and e is
stored in biased form, where the bias is the magnitude of
the most negative possible exponent[1,2]. If m is the number
of bits in the biased exponent eb:

$$\mathit{eb} = e + 2^{m-1}$$

Using biased exponents permits comparison of exponents
through a simple unsigned comparator, and further results
in a unique representation of zero given by f = eb = 0.
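
As a rough host-side illustration of the definitions above, the following Python sketch splits a number into f, e and eb. It assumes m = 8 exponent bits (bias 128); the names `decompose`, `biased_byte` and `twos_complement_byte` are hypothetical and are not part of the PIC16/17 library.

```python
import math

BIAS = 128  # 2**(m-1) for m = 8 exponent bits (assumed here)


def decompose(a):
    """Return (f, e, eb) with a == f * 2**e, 0.5 <= |f| < 1, eb = e + BIAS."""
    if a == 0:
        return 0.0, 0, 0  # zero is the special case f = eb = 0
    f, e = math.frexp(a)  # frexp normalizes |f| into [0.5, 1)
    return f, e, e + BIAS


def biased_byte(e):
    """Exponent stored in biased form."""
    return e + BIAS


def twos_complement_byte(e):
    """Exponent stored as a two's complement byte, for contrast."""
    return e & 0xFF


for value in (1.0, 0.15625, 1230.0):
    print(value, decompose(value))

# An unsigned comparison orders biased exponents correctly ...
print(biased_byte(-3) < biased_byte(11))
# ... but misorders the same exponents stored in two's complement.
print(twos_complement_byte(-3) < twos_complement_byte(11))
```

The contrast with two's complement storage is only meant to show why an unsigned comparator suffices for biased exponents.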
Algorithms for radix conversion are discussed in
APPENDIX A, and can be used to produce the binary
floating point representation of a given decimal number.
Examples of sign-magnitude floating point representations
of some decimal numbers are as follows:

| Decimal | e | f |
|---|---|---|
| 1.0 | 1 | .10000000 |
| 0.15625 | -2 | .10100000 |
| 0.1 | -3 | .110011001100.... |
| $1.23 \times 10^{3}$ | 11 | .10011001110 |

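
The rows of this table can be reproduced on a host computer. The sketch below is an illustration only: `mantissa_bits` is a hypothetical helper, it assumes a positive input, and it is not one of the radix conversion algorithms of Appendix A. It extracts the bits a(1)..a(n) by repeated doubling of the normalized fraction.

```python
import math
from fractions import Fraction


def mantissa_bits(a, n):
    """Return (e, bits) where bits holds a(1)..a(n) of the fraction f.

    Assumes a > 0. Fraction keeps the arithmetic exact, so each
    doubling step shifts exactly one bit out of the fraction.
    """
    f, e = math.frexp(a)  # a == f * 2**e with 0.5 <= f < 1
    f = Fraction(f)
    bits = []
    for _ in range(n):
        f *= 2
        bit = int(f >= 1)  # the bit that crossed the radix point
        bits.append(str(bit))
        f -= bit
    return e, "." + "".join(bits)


for value, n in ((1.0, 8), (0.15625, 8), (0.1, 12), (1230.0, 11)):
    print(value, mantissa_bits(value, n))
```

For 0.1 the bit pattern 1100 keeps repeating however large n is chosen, which is the non-terminating behavior the table hints at with its trailing dots.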
It is important to note that the only numbers that can be
represented exactly in binary arithmetic are those which
are finite sums of powers of two. Some simple decimal
numbers, such as 0.1 shown above, therefore have
non-terminating binary representations, leading to
truncation errors regardless of the value of n. Floating
point calculations, even involving numbers admitting an
exact binary representation, usually lose information after
truncation to an n bit result, and therefore require some
rounding scheme to minimize such roundoff errors[1].
DS00575A-page 1
ROUNDING METHODS
Truncation of a binary representation to n bits is severely
biased since it always leads to a number whose magnitude
is less than or equal to that of the exact value, thereby
possibly causing significant error buildup during a long
sequence of calculations. Simple adder-based rounding,
which adds the NSB (next significant bit) to the LSB, is
unbiased except when the value to be rounded is equidistant
from the two nearest n bit values[1]. In this case,
magnitudes are always rounded up, thereby producing a small
but still undesirable bias. This can be removed by
stipulating that in the equidistant case, the n bit value
with LSB=0 is selected. This is commonly referred to as the
round to nearest method, the default mode in the IEEE754
standard[4,5]. The number of guard bits, or extra bits of
precision, is related to the sensitivity of the rounding
method, since using more guard bits results in fewer
equidistant cases to be resolved. Since more than one
guard bit requires an extra byte in PIC16/17 arithmetic,
only one guard bit, usually handled in the carry bit, is
employed in this library of floating point routines.
Nearest neighbor rounding with one guard bit leads to the
following simple result:

| n Bit Value | Guard Bit | Result |
|---|---|---|
| A | 0 | round to A |
| A | 1 | if A,LSB=0, round to A; if A,LSB=1, round to A+1 |
| A+1 | 0 | round to A+1 |

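
The table translates almost directly into code. The following Python sketch is a host-side illustration, not the library's PIC assembly; `round_one_guard` is a hypothetical name and the mantissa is modeled as a plain integer. With a single guard bit, a set guard bit is treated as the equidistant case and resolved toward an even LSB.

```python
def round_one_guard(a, guard):
    """Round the n-bit mantissa `a` using a single guard bit.

    guard == 0: keep a unchanged (this covers both the A and A+1 rows).
    guard == 1: keep a if its LSB is 0, otherwise round up to a + 1.
    A carry out of the n-bit mantissa would need renormalization,
    which this sketch does not model.
    """
    if guard == 0:
        return a
    return a + (a & 1)


print(bin(round_one_guard(0b1010, 1)))  # LSB = 0: value is kept
print(bin(round_one_guard(0b1011, 1)))  # LSB = 1: value is rounded up
```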
Another interesting rounding method is Von Neumann
rounding, or jamming, where the exact number is truncated
to n bits and the LSB is then set to 1. Although the errors
can be twice as large as in round to nearest, it is
unbiased and requires little more effort than
truncation[1].
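
To make the comparison concrete, the sketch below measures the mean and maximum error of truncation, round to nearest and jamming over every input with n = 4 kept bits and g = 4 discarded bits. The function names and the choice of n and g are assumptions made for illustration. The exact value is modeled as an integer split into a kept part q and a discarded part r.

```python
from fractions import Fraction


def truncate(q, r, g):
    """Drop the g discarded bits."""
    return q << g


def nearest_even(q, r, g):
    """Round to nearest; on an exact tie pick the value with LSB = 0."""
    half = 1 << (g - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q << g


def jam(q, r, g):
    """Von Neumann rounding: truncate, then force the LSB to 1."""
    return (q | 1) << g


def error_stats(method, n=4, g=4):
    """Return (mean error, max |error|) in units of the n-bit LSB."""
    errors = []
    for q in range(1 << n):
        for r in range(1 << g):
            exact = (q << g) | r
            errors.append(Fraction(method(q, r, g) - exact, 1 << g))
    mean = sum(errors) / len(errors)
    return mean, max(abs(e) for e in errors)


for method in (truncate, nearest_even, jam):
    print(method.__name__, error_stats(method))
```

In this discrete model truncation shows a mean error close to half an LSB, round to nearest has zero mean error, and jamming has a worst-case error of one full LSB, twice that of round to nearest. Jamming also leaves a small residual mean error of $2^{-(g+1)}$ LSB here, which comes from sampling on a finite grid and shrinks as more discarded bits are modeled.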
FLOATING POINT FORMATS
In what follows, we use the following floating point
formats:

| Format | eb | point | f0 | f1 | f2 |
|---|---|---|---|---|---|
| IEEE754 32-bit | xxxxxxxx | . | Sxxxxxxx | xxxxxxxx | xxxxxxxx |
| truncated 24-bit | xxxxxxxx | . | Sxxxxxxx | xxxxxxxx | |

where eb is the biased 8-bit exponent, with bias = $2^{7}$ = 128 =
0x80, S is the sign bit, and bytes f0, f1 and f2 constitute
the mantissa with f0 the most significant byte with implicit
MSB = 1. It is important to note that the IEEE754
standard format[4] places the sign bit as the MSB of eb
with the LSB of the exponent as the MSB of f0. Because
of the inherent byte structure of the PIC16/17 family of
microcontrollers, more efficient code was possible by
adopting the above formats rather than conforming strictly
to the IEEE standard.
The limiting absolute values of the above floating point
formats are given as follows:

| \|A\| | 32-bit format | eb | e | f | decimal |
|---|---|---|---|---|---|
| MAX | 0xFF7FFFFF | FF | 7F | 7FFFFF | 1.7014117E+38 |
| MIN | 0x01000000 | 01 | 81 | 000000 | 2.9387359E-39 |

where the MSB is implicitly equal to one, and its bit
location is occupied by the sign bit. The 24-bit format has
the same structure but with only a 16-bit mantissa. While
24- to 32-bit conversion is trivial, requiring only an
additional zero byte in the mantissa, a 32- to 24-bit
conversion routine would typically employ nearest neighbor
rounding before truncation.
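
Both conversions can be sketched on host-side integers. The code below assumes the bytes are packed most significant first (eb, f0, f1 and, for 32 bits, f2); `to32` and `to24` are hypothetical names. Handling a carry out of the mantissa by incrementing eb, and raising an error on exponent overflow, is an assumption rather than behavior specified by this note. Unlike the one-guard-bit library routines, the sketch uses all eight discarded bits to detect exact ties.

```python
def to32(x24):
    """24-bit -> 32-bit: append a zero low-order mantissa byte (f2 = 0)."""
    return x24 << 8


def to24(x32):
    """32-bit -> 24-bit with round to nearest, ties to an even LSB."""
    if x32 == 0:
        return 0  # zero: f = eb = 0
    eb = x32 >> 24
    sign = (x32 >> 23) & 1
    kept = (x32 >> 8) & 0x7FFF   # 15 explicit mantissa bits that survive
    dropped = x32 & 0xFF         # byte f2, discarded after rounding
    if dropped > 0x80 or (dropped == 0x80 and kept & 1):
        kept += 1
        if kept == 0x8000:       # carry out of the mantissa:
            kept = 0             # renormalize to f = .1000...
            eb += 1              # and increment the exponent
            if eb > 0xFF:
                raise OverflowError("exponent overflow after rounding")
    return (eb << 16) | (sign << 15) | kept


print(hex(to32(0x810000)))    # 1.0 widened to 32 bits
print(hex(to24(0x810000C0)))  # rounds up, discarded byte above one half
```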
To produce the correct representation of a particular
decimal number, a high-level language compiler and
debugger could be used to display the internal binary
representation on a host computer and make the appropriate
conversion to the above format. If this approach is not
feasible, algorithms for producing this representation
are contained in Appendix A.
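
One way to carry out this host-side approach is sketched below in Python, using the standard `struct` module to expose the IEEE 754 single precision bits. The function names are hypothetical. The relation eb = E + 2 is derived here from the two definitions (IEEE stores 1.xxx times 2 to the power E - 127, while the format above stores .1xxx times 2 to the power eb - 128) and is not quoted from the note. IEEE denormals, infinities, NaNs and values outside the range of the above format are rejected rather than converted.

```python
import math
import struct


def ieee_to_an575(x):
    """Convert a host float to the 32-bit format described above."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF     # IEEE biased exponent E
    frac = bits & 0x7FFFFF        # 23 explicit mantissa bits
    if exp == 0 and frac == 0:
        return 0                  # zero: f = eb = 0
    eb = exp + 2                  # rebias and move the radix point
    if exp == 0 or eb > 0xFF:
        raise ValueError("value not representable in this sketch")
    return (eb << 24) | (sign << 23) | frac  # sign goes to the MSB of f0


def an575_to_float(word):
    """Decode a 32-bit word in the above format back to a host float."""
    if word == 0:
        return 0.0
    eb = word >> 24
    sign = -1.0 if (word >> 23) & 1 else 1.0
    mant = 0x800000 | (word & 0x7FFFFF)  # restore the implicit MSB
    return sign * math.ldexp(float(mant), eb - 128 - 24)


for value in (1.0, 0.15625, 1230.0, -1.0):
    print(value, hex(ieee_to_an575(value)))
print("%.7E" % an575_to_float(0xFF7FFFFF))
print("%.7E" % an575_to_float(0x01000000))
```

Decoding the two limiting words is a quick consistency check against the MAX and MIN entries of the limits table.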