Curious Anomaly In Python - Help Appreciated

stephen_33

I have a short program written in Python (I'm running v3.7) that inputs values from a file, adds or subtracts from a running balance and checks the float-value of that running balance, maintained by the program, with that held in the input file.

Simple enough you'd think and up until today it's worked nicely but after updating my data file and running the figures through the program, I came across this unexpected result...

41389.46 + 1.70 yielded the result 41391.159999999996

This isn't a problem and reflects the fact that decimal values can't always be represented precisely by binary-based systems. When I need to display such a value I specify two decimal places and that's usually fine.

But the next line caused the problem...

41391.159999999996 - 41391.16 yields -7.275957614183426e-12

The expected balance is 0.00 but the following formatting of -7.275957614183426e-12 gives this bizarre (to me) result...

f'{-7.275957614183426e-12:.2f}' gives -0.00 (minus zero!)

Of course, when my program compares -0.00 with 0.00, it detects a mismatch.

It looks like a possible oversight (if not a bug?) within Python but is there any easy way I can correct this, preferably without the use of additional modules? I've been looking at the Python Decimal module but I'm not sure yet if the solution is in there.
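For reference, a minimal sketch of what the decimal module (part of the standard library) does with the figures quoted above. It assumes the amounts are built from strings; constructing a Decimal from a float would carry the binary rounding error along with it.

```python
from decimal import Decimal

# Build each amount from its text form so no binary rounding is involved.
balance = Decimal("41389.46") + Decimal("1.70")
expected = Decimal("41391.16")

difference = balance - expected

print(balance)              # 41391.16
print(difference)           # 0.00
print(balance == expected)  # True
```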

7ania7

-7.275957614183426e-12 means -7.275957614183426 * 10^{-12} ~ -0.000000000007, which is just a very small negative number. You can detect it by comparing it with another very small fixed number of your choice.

You can easily write this pseudocode in Python (I don't know Python):

epsilon = 0.000001 (for example)

is_almost_zero(x) =

... if (-epsilon < x < epsilon) return true

... else return false

 

Elroch

Interesting. I see the problem, but I don't think that it is a mystery.  There are three decimal numbers here ( 41389.46, 1.70, 41391.16 )  all of which have to be rounded when converted to binary. What you have found is that the rounding happens to not make the rounded numbers add up like the decimals. The formatting doesn't affect the underlying numbers.

@7ania7's solution of replacing tests for equality with tests of being within a tolerance is surely a good one for many purposes.

Note f'{-0.001:.2f}' is '-0.00' too.  The number's not zero, the string that represents it rounded to 2 DP just looks like zero.
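A short sketch of that point, using the two values already mentioned in the thread: the format spec only produces a string, and the float behind it stays non-zero.

```python
residue = -7.275957614183426e-12  # leftover from the original post
small = -0.001

residue_text = f"{residue:.2f}"
small_text = f"{small:.2f}"

# Both strings look like "minus zero" at two decimal places...
print(residue_text, small_text)

# ...but neither underlying number is actually zero.
print(residue == 0, small == 0)
```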

stephen_33

It seems this isn't an oversight because it's mentioned in the docs...

https://docs.python.org/3.7/library/string.html#formatspec


"Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision"

7ania7

This is a different thing. There is a special value indicating +0 and -0 (meant for values beyond float number precision), and what you quote is how it is printed to the screen.

Your number is printed as -0.00 because you asked for 2 digits of precision when printing -0.000000000007, but your number is not the same thing as the -0 from your quote.

Elroch

Yes, indeed your doc quote implies that '0.00' comes from a number > 0 and '-0.00' comes from a number < zero.

stephen_33

Thanks for your help guys but I've managed to solve this one quite easily with a little reverse thinking.
Since the decimal value 0.00 is equivalent to -0.00 (a comparison returns 'True'), I've modified my code to test the float value of the running balance (but rounded to two decimal places) against the running balance given in my input file, converted to a floating decimal.

That works perfectly.
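The actual code isn't shown in the thread, so the following is only a guess at the shape of that check. The function name balances_match and the idea that the file balance arrives as text are assumptions for illustration.

```python
def balances_match(running_balance, file_balance_text):
    """Compare the running balance, rounded to 2 DP, with the file's figure as a float."""
    return round(running_balance, 2) == float(file_balance_text)


running = 41389.46 + 1.70

# Rounding brings the running balance back to the nearest float for 41391.16.
print(balances_match(running, "41391.16"))

# The tiny negative leftover rounds to -0.0, and -0.0 == 0.0 is True for floats.
print(balances_match(running - 41391.16, "0.00"))
```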

arrechea

Generally speaking, it's a bad idea to test for equality between any two floating-point numbers because the computer cannot always represent them precisely. This is true for nearly all computer languages: C, C++, C#, Java, Python, etc. Either cast the number to an integer and then compare (depending upon your application), or use a range, or round. Rounding can also fail depending on how it's done, though (e.g. 5.9 vs 6.1 are considered equal because both round to 6, but 5.4 vs 5.6 are not because they round to 5 and 6).
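The rounding pitfall in that example can be seen in a few lines. The helper within() and the tolerance of 0.25 are made up for the illustration; the point is only that a range test treats both pairs the same way, while rounding to whole numbers does not.

```python
def within(a, b, tolerance):
    """Range-style comparison: True when a and b differ by at most tolerance."""
    return abs(a - b) <= tolerance


# Rounding to whole numbers: the two pairs are about 0.2 apart, yet get different answers.
print(round(5.9) == round(6.1))  # True, both round to 6
print(round(5.4) == round(5.6))  # False, 5 versus 6

# A range test gives the same verdict for both pairs.
print(within(5.9, 6.1, 0.25), within(5.4, 5.6, 0.25))
```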

Elroch
stephen_33 wrote:

Thanks for your help guys but I've managed to solve this one quite easily with a little reverse thinking.
Since the decimal value 0.00 is equivalent to -0.00 (a comparison returns 'True'), I've modified my code to test the float value of the running balance (but rounded to two decimal places) against the running balance given in my input file, converted to a floating decimal.

That works perfectly.

Your solution seems very close (maybe equivalent) to the more conventional rounding of the numbers to the nearest 0.01 (there are various ways to achieve this, e.g. use the built-in round(), or multiply by 100 and convert to int as @arrechea suggests).

Probably not important, but it would be a bit quicker not to go via a string representation. It's worth bearing in mind that that is what your code is doing: creating a string that is related to your number and usually used for printing.

The most robust solution was the first suggested - decide what tolerance you are happy with calling numbers the same and check the numbers are that close. Even rounding to 2 DP could give a discrepancy about once in 10^10 times if the resolution of your numbers is ~10^-12 (like your example).
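One way to write such a tolerance check with the standard library is math.isclose() with an absolute tolerance. The helper name same_amount and the half-cent tolerance of 0.005 are assumptions chosen for money held to two decimal places, not something fixed by the thread.

```python
import math


def same_amount(a, b, tolerance=0.005):
    """True when two amounts differ by no more than the tolerance (half a cent by default)."""
    # rel_tol=0.0 switches off the relative test so only the absolute tolerance applies.
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)


print(same_amount(41389.46 + 1.70, 41391.16))  # floating-point noise only
print(same_amount(41391.16, 41391.17))         # a genuine one-cent mismatch
```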

stephen_33

@arrechea: But the 'error' (such as it is) is absolutely tiny and in the 17th./18th. decimal place! I require precision only to the nearest 0.01 and this is how Python stores such numbers in those increments:-

0.01
0.02
0.03
0.04
0.05
0.060000000000000005
0.07
0.08
0.09
0.09999999999999999
0.10999999999999999
0.11999999999999998
0.12999999999999998
0.13999999999999999
0.15
0.16
0.17

The Python round(value, 2) function returns rounded values perfectly precisely as far as I can see.
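A list like the one above can be produced by adding 0.01 over and over; that this is how it was generated is an assumption. The sketch prints each running total next to its round(total, 2) value. Note that round() still returns a float, so what it gives back is the nearest binary value to the two-decimal figure rather than an exact decimal.

```python
def accumulate(step, count):
    """Return the running totals obtained by adding step to zero count times."""
    totals = []
    total = 0.0
    for _ in range(count):
        total += step
        totals.append(total)
    return totals


totals = accumulate(0.01, 17)
rounded = [round(value, 2) for value in totals]

for raw, tidy in zip(totals, rounded):
    print(raw, tidy)
```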

stephen_33
arrechea wrote:

..Either cast the number to an integer then compare (depending upon your application) or use a range or round. 

Not sure how that would work? You've probably realised this is about reconciling amounts of money credited to, or debited from, an account, so values typically have two digits after the decimal point.

Do you mean render all input values into integer form before performing any operations on them?

shplorf

Another approach to this kind of problem is to consistently represent the numbers using a smaller unit, so you can use ints all the way. So instead of the number 4.55, you use 455. Then for output, you'd still need to convert back with 455/100, but you wouldn't need to worry about imprecision at any point.

stephen_33
Elroch wrote:

...

The most robust solution was the first suggested - decide what tolerance you are happy with calling numbers the same and check the numbers are that close. Even rounding to 2 DP could give a discrepancy about once in 10^10 times if the resolution of your numbers is ~10^-12 (like your example).

An accumulating error? That might be a problem if I was processing many thousands of values but the total is typically less than fifty.

jlconn

The problem you're having is due to the way floating point numbers are handled by programming languages that abide by the IEEE 754 floating point representation standard.

Yours is the same problem that computer chess programmers have when representing evaluations, and it's why they don't use floating point numbers to store the values.

On input, multiply all values by 100 to move the two relevant decimal places to the left of the decimal point, convert to int (rounding rather than truncating, since the floating-point product can land a hair below the whole number), and do all internal storage and calculations with integers. Then, on output, divide by 100 and format the string properly.

If you need to work with fractional cents, such as was once common in stores (e.g., coffee 16.5 cents per pound), simply multiply/divide by 1000 or 10000 or whatever precision you need.

This is what computer chess engines do vis-a-vis centipawn evaluations.
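A rough sketch of that integer approach for amounts with two decimal places. The helper names to_cents and format_cents are invented for the example, and the input is assumed to arrive as text such as "41389.46". round() is used instead of int() because int() truncates and the product may sit just below the intended whole number.

```python
def to_cents(amount_text):
    """Convert a two-decimal amount such as '41389.46' to a whole number of cents."""
    # round(), not int(): int() would truncate a product that lands just under the integer.
    return round(float(amount_text) * 100)


def format_cents(cents):
    """Format integer cents for display using integer arithmetic only."""
    sign = "-" if cents < 0 else ""
    whole, fraction = divmod(abs(cents), 100)
    return f"{sign}{whole}.{fraction:02d}"


balance = to_cents("41389.46") + to_cents("1.70")
expected = to_cents("41391.16")

print(balance - expected)     # 0
print(format_cents(balance))  # 41391.16
```

Integer addition and subtraction are exact, so the comparison balance == expected needs no tolerance at all.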

stephen_33

Yes, I decided to modify my code earlier today to convert all inputted values to integers and then divide results by 100 when it comes to displaying results.

Using floating point values threw up one problem after another!