Friday, July 2, 2010

Floating point (of no return)

As the song goes, "No return, from the other side of Heaven". And indeed, welcome to Floating Point Hell!


I recently got involved in GSoC (Google Summer of Code; ask Google web search for the homepage), helping a student make a wideband audio codec a reality in Sip Communicator, which is a Java application.

And as expected from such a mathematical object, one needs to do all kinds of clever calculations to compress a high-rate audio source into a low-rate bitstream. Of course, nowadays we get a lot of support from the FPU, which makes this a piece of cake as far as calculation speed is concerned.

So, what's the big deal? Let's see. As usual, it all begins with a standard. And a good one at that: the IEEE 754 standard. The problem with this standard is that it defines several bit formats for floating-point numbers, and that people decided, or not, to implement them.

And of course, as with most hardware things, there is a software counterpart that can ruin things all the same. Well, the opposite is also true, and you can ruin good software with horrible hardware, but we are getting sidetracked.

So, the wideband codec I am thinking about is CELT, and I must say that it is pretty interesting, and much more geeky than the streamlined SILK codec (the one behind Skype, also available to developers, kind of).

And a part of CELT implements the MDCT (modified discrete cosine transform) for some reason, and includes a test program. You don't see unit tests for C code so often; the last ones I can think of were in NS (the network simulator), and I can't say they left me with any fond memories.

So I try to run them, and guess what? Results can differ. A lot. Not from one run to another, but from one 'configuration' to another. For example, I get different results depending on the operating system. Or the compiler flags. An iMac gives results that differ from a Dell server (although they have the same FPU), and each can differ again depending on the compiler flags I am using.

In the end, I only noticed all this because the student is trying to adapt libcelt (the CELT codec in library form) to Sip Communicator, and gets very bad results when running the test. Well, not that bad, but much worse than the C counterpart. On the same hardware and OS, obviously, or it wouldn't be fun.

The final word? Floating-point calculation seems to have been historically complicated in the Java VM (see the references below), which resulted in the current situation with FP-strict (the strictfp modifier) and so on.

Technical explanation: on x86 machines, the x87 FPU supports an extended double-precision format that is 80 bits wide. Since it is available, the hardware can carry out calculations at that highest precision, and only round the results when they are handed back to the software.

The thing is, this setting can be disabled, and probably has been, in some of the configurations that produce different results. That includes the JVM, although according to the Java Language Specification it sounds like it could be configurable (i.e. how Java should be, not always how it is in practice).
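To see why the precision of intermediate results matters, here is a minimal sketch in Java. Java has no 80-bit type, so this uses float versus double as a stand-in for double versus extended precision; the class and method names are made up for the illustration and have nothing to do with CELT or libcelt.

```java
public class ExtendedPrecisionDemo {
    // Every intermediate result is rounded to a 32-bit float.
    static float sumNarrow(float a, float b, float c) {
        float t = a + b;
        return t + c;
    }

    // Intermediates are kept in a 64-bit double; only the final
    // result is rounded back to float.
    static float sumWide(float a, float b, float c) {
        double t = (double) a + (double) b;
        return (float) (t + (double) c);
    }

    public static void main(String[] args) {
        float big = 16777216f; // 2^24: above this, floats are spaced 2 apart

        // 2^24 + 1 is not representable as a float, so each addition
        // rounds back down to 2^24.
        System.out.println((long) sumNarrow(big, 1f, 1f)); // prints 16777216

        // In double, 2^24 + 1 + 1 is exact, and 2^24 + 2 fits in a float.
        System.out.println((long) sumWide(big, 1f, 1f)); // prints 16777218
    }
}
```

Same inputs, same operations, two different answers, and the only thing that changed is where the rounding happens. Now replace 'float or double' with 'compiler flag' or 'FPU setting' and you get the picture.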

Well, I had already caught a glimpse of the FP woes that plague Java when doing the iLBC codec implementation, where I realized that the C and Java implementations gave different results for the same sample data. I solved the problem by deciding that 'to my ear' the difference is inaudible.

This is probably the way to go, geez.
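For a test program, the less artistic version of 'to my ear' would be comparing against a tolerance instead of expecting bit-identical output. A minimal sketch, assuming the reference (C) output and the Java output are both available as float arrays; the names, the sample values and the tolerance are all hypothetical, and picking a sensible tolerance for a real codec is a separate problem.

```java
public class ToleranceCheck {
    // Largest absolute difference between two sample buffers of equal length.
    static double maxAbsDiff(float[] reference, float[] candidate) {
        if (reference.length != candidate.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double max = 0.0;
        for (int i = 0; i < reference.length; i++) {
            double diff = Math.abs((double) reference[i] - (double) candidate[i]);
            max = Math.max(max, diff);
        }
        return max;
    }

    // True when no sample differs by more than the given tolerance.
    // A NaN anywhere makes the comparison fail, which is what we want.
    static boolean closeEnough(float[] reference, float[] candidate, double tolerance) {
        return maxAbsDiff(reference, candidate) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for decoder output.
        float[] fromC = {0.5f, -0.25f, 0.125f};
        float[] fromJava = {0.5f, -0.25f, 0.1875f};

        System.out.println(maxAbsDiff(fromC, fromJava));        // prints 0.0625
        System.out.println(closeEnough(fromC, fromJava, 0.1));  // prints true
        System.out.println(closeEnough(fromC, fromJava, 0.01)); // prints false
    }
}
```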

Further reading material, if you want to have some floating fun:
  1. Ye olde "How Java Floating Point Hurts Everyone Everywhere" by William Kahan, which might not win a test-of-time award, but sounds so nostalgic. Remember the Sun vs Microsoft Java war...
  2. The floating point chapter of the 'Introduction to GCC' book
  3. A good technical article on Floating Point computations, aimed at what can (will?) go wrong
  4. This webpage on the Sun/Java website, with the simple title of "What Every Computer Scientist Should Know About Floating-Point Arithmetic", apparently triggered by a W. Kahan lecture
