floating point - Is it possible to omit rounding of intermediate results during arithmetic operation on multiple FP operands?
Is it possible to perform an arithmetic operation on multiple floating-point operands without rounding the intermediate results, rounding only the final result, and are there architectures that do it? As far as I've seen, after two floating-point operands are used in an addition/subtraction operation, the result gets rounded before being used as an operand of another operation. I've seen this.
Edit:
The examples below use the single-precision format to illustrate the concept. The three least significant bits of the intermediate 27-bit mantissas taking part in the arithmetic operation are the guard, round, and sticky bits. As the examples show, with the intermediate mantissa bit structure used in an IEEE 754 compliant FP system, avoiding the rounding of intermediate values is possible, and when it is done it gives a more accurate result:
Example 1:
a-b = 101001000010001110110100 1 0 1×2^exp
c = 100011001011001010010100×2^(exp-4) --> c = 000010001100101100101001 0 1 0×2^exp
1.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 101001000010001110110101×2^exp
a-b-c = 100110110101100010001011 1 1 0×2^exp
rounded a-b-c = 100110110101100010001100×2^exp
1.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 100110110101100010001011 0 1 1×2^exp
rounded a-b-c = 100110110101100010001011×2^exp
Example 2:
a-b = 100001001100101011100000 1 0 1×2^exp
c = 101001011010001110001000×2^(exp-5) --> c = 000001010010110100011100 0 1 0×2^exp
2.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 100001001100101011100001×2^exp
a-b-c = 011111111001110111000100 1 1 0×2^exp
a-b-c shifted 1 bit left = 111111110011101110001001 1 0 0×2^(exp-1)
rounded a-b-c = 111111110011101110001010×2^(exp-1)
2.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 011111111001110111000100 0 1 1×2^exp
a-b-c shifted 1 bit left = 111111110011101110001000 1 1 0×2^(exp-1)
rounded a-b-c = 111111110011101110001001×2^(exp-1)
Example 3:
a-b = 10000001101000110101001 1 1 1×2^exp
c = 100010100101011010010000×2^(exp-6) --> c = 000000100010100101011010 0 1 0×2^exp
3.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 10000001101000110101010×2^exp
a-b-c = 01111101010100001001111 1 1 0×2^exp
a-b-c shifted 1 bit left = 11111010101000010011111 1 0 0×2^(exp-1)
rounded a-b-c = 11111010101000010100000×2^(exp-1)
3.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 01111101010100001001111 1 0 1×2^exp
a-b-c shifted 1 bit left = 11111010101000010011111 0 1 0×2^(exp-1)
rounded a-b-c = 11111010101000010011111×2^(exp-1)
Example 4:
a-b = 101100101000111000110110 0 1 1×2^exp
c = 100110010110011101100000×2^(exp-7) --> c = 000000010011001011001110 1 1 0×2^exp
4.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 101100101000111000110110×2^exp
a-b-c = 101100010101101101100111 0 1 0×2^exp
rounded a-b-c = 101100010101101101100111×2^exp
4.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 101100010101101101100111 1 0 1×2^exp
rounded a-b-c = 101100010101101101101000×2^exp
Example 5:
a-b = 100000111011001111001010 0 1 1×2^exp
c = 110001011010010110010110×2^(exp-3) --> c = 000110001011010010110010 1 1 0×2^exp
5.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 100000111011001111001010×2^exp
a-b-c = 011010101111111100010111 0 1 0×2^exp
a-b-c shifted 1 bit left = 110101011111111000101110 1 0 0×2^(exp-1)
rounded a-b-c = 110101011111111000101110×2^(exp-1)
5.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 011010101111111100010111 1 0 1×2^exp
a-b-c shifted 1 bit left = 110101011111111000101111 0 1 0×2^(exp-1)
rounded a-b-c = 110101011111111000101111×2^(exp-1)
Example 6:
a-b = 100000000011000111001010 0 0 1×2^exp
c = 110010001100110111000000×2^(exp-8) --> c = 000000001100100011001101 1 1 0×2^exp
6.1 If a-b-c is calculated after a-b is rounded:
rounded a-b = 100000000011000111001010×2^exp
a-b-c = 011111110110100011111100 0 1 0×2^exp
a-b-c shifted 1 bit left = 111111101101000111111000 1 0 0×2^(exp-1)
rounded a-b-c = 111111101101000111111000×2^(exp-1)
6.2 If a-b-c is calculated without a-b being rounded:
a-b-c = 011111110110100011111100 0 1 1×2^exp
a-b-c shifted 1 bit left = 111111101101000111111000 1 1 0×2^(exp-1)
rounded a-b-c = 111111101101000111111001×2^(exp-1)
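The two orderings compared in the examples can be replayed with plain integer arithmetic. The following is a minimal Python sketch, not a model of any real FPU. The helper names `with_grs` and `round_grs` are made up for illustration. It follows the question's convention of treating the guard, round, and sticky bits as three ordinary low-order bits, assumes round-to-nearest with ties-to-even, and does not handle normalization or a carry out of the mantissa during rounding. It uses the operands of Example 1.

```python
def with_grs(mantissa_bits: str, grs_bits: str) -> int:
    """Build a 27-bit integer from a 24-bit mantissa string and a 'G R S' string."""
    return (int(mantissa_bits, 2) << 3) | int(grs_bits.replace(" ", ""), 2)


def round_grs(m27: int) -> int:
    """Round a 27-bit value to 24 bits (round-to-nearest, ties-to-even).

    Assumption: no carry out of the 24-bit mantissa occurs, so no
    renormalization step is modeled here.
    """
    mantissa, grs = m27 >> 3, m27 & 0b111
    if grs > 0b100 or (grs == 0b100 and mantissa & 1):
        mantissa += 1
    return mantissa


# Operands of Example 1, already aligned to the same exponent.
a_minus_b = with_grs("101001000010001110110100", "1 0 1")
c = with_grs("000010001100101100101001", "0 1 0")

# Case 1.1: round a-b first, then subtract c and round again.
double_rounded = round_grs((round_grs(a_minus_b) << 3) - c)

# Case 1.2: subtract on the unrounded 27-bit value, round once at the end.
single_rounded = round_grs(a_minus_b - c)

print(format(double_rounded, "024b"))
print(format(single_rounded, "024b"))
```

Under these assumptions the first printed line corresponds to case 1.1 and the second to case 1.2; the two results differ in the last mantissa bit.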
According to the paper referenced in the question, it is possible to calculate the dot product of a pair of length-N vectors with a single rounding operation at the end, getting the closest representable result to the exact dot product.
In practice, current computers round intermediate results, so the answer is not necessarily the closest representable one. At best, rounding is done to an extended format, which tends to reduce, but not eliminate, the intermediate rounding error.
With fused multiply-add, rounding is done N times, once for each multiply-add. Without fused multiply-add, it is done twice for each pair, once after the multiply and once again after the add.
The BibTeX info for the paper is:
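The difference between rounding at every step and rounding once at the end can be seen in a small Python sketch. It is an illustration under stated assumptions, not the architecture from the paper: it uses Python's double-precision `float` rather than single precision, and the function names `dot_naive` and `dot_single_rounding` are hypothetical. The exact dot product is accumulated with `fractions.Fraction` and converted to `float` once, so only one rounding takes place.

```python
from fractions import Fraction


def dot_naive(xs, ys):
    """Dot product with a rounding after every multiply and every add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y  # product is rounded, then the sum is rounded
    return acc


def dot_single_rounding(xs, ys):
    """Dot product computed exactly in rationals, rounded once at the end."""
    exact = sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys))
    return float(exact)  # the only rounding step


# Inputs chosen so that an intermediate rounding loses the small term.
xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

print(dot_naive(xs, ys))            # 0.0
print(dot_single_rounding(xs, ys))  # 1.0
```

Here 1e16 + 1.0 is not representable in double precision and rounds back to 1e16, so the per-step version loses the middle term, while the single-rounding version returns the exact answer.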
@inproceedings{yao:correctly,
  author = "Tao Yao and Deyuan Gao and Xiaoya Fan and Jari Nurmi",
  title = "Correctly Rounded Architectures for Floating-Point Multi-Operand Addition and Dot-Product Computation",
  booktitle = "ASAP'13",
  pages = {346-355},
  year = {2013},
}