## Fast math?

Discuss issues specific the Java port of Box2D
toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Fast math?

So while optimizing I've been thinking a lot about all the Math library calls, which are known to be a bit expensive, and I'm thinking of trying to put in some faster math calls, like so:

Code: Select all

`   // UNTESTED   public static final float atan2( final float y, final float x) {      // float coeff_1 = PI/4;      // float coeff_2 = 3*coeff_1;      final float abs_y = abs( y) + .0000000001f; // kludge to prevent 0/0                                       // condition      float angle, r;      if ( x >= 0) {         r = (x - abs_y) / (x + abs_y);         // angle = coeff_1 - coeff_1 * r;         angle = 0.1963f * r * r * r - 0.9817f * r + PI / 4;      }      else {         r = (x + abs_y) / (abs_y - x);         // angle = coeff_2 - coeff_1 * r;         angle = 0.1963f * r * r * r - 0.9817f * r + 3 * PI / 4;      }      if ( y < 0) {         return -angle; // negate if in quad III or IV      }      else {         return angle;      }   }   /**    * Computes a fast approximation to <code>Math.pow(a, b)</code>. Adapted    * from <url>http://www.dctsystems.co.uk/Software/power.html</url>.    *     * @param a    *            a positive number    * @param b    *            a number    * @return a^b    */   // UNTESTED   // I'm not sure if this is correct for floats, this was meant for doubles   public static final float pow( final float a, float b) {      // adapted from: http://www.dctsystems.co.uk/Software/power.html      float x = Float.floatToRawIntBits( a);      x *= 1.0f / (1 << 23);      x = x - 127;      float y = x - MathUtils.floor( x);      b *= x + (y - y * y) * 0.346607f;      y = b - MathUtils.floor( b);      y = (y - y * y) * 0.33971f;      return Float.intBitsToFloat( (int) ((b + 127 - y) * (1 << 23)));   }   // UNTESTED   public static final float sqrt( float x) {      x = invSqrt( x);      if ( x != 0.0f) {         return 1.0f / x;      }      else {         return 0;      }   }   // UNTESTED   public final static float invSqrt( float x) {      final float xhalf = 0.5f * x;      int i = Float.floatToRawIntBits( x);      i = 0x5f3759df - (i >> 1);      x = Float.intBitsToFloat( i);      x = x * (1.5f - xhalf * x * x);      return x;   }`

This has some implications, like loss of accuracy. But you would still have identical results across machines (which is also why the Math library is so costly), so anything networked wouldn't have a problem.

What do you think? We could also just make it an option, although that would cost some overhead, as we would have to check for the option every call.

Michiel
Posts: 34
Joined: Sat May 02, 2009 6:07 am

### Re: Fast math?

Couldn't you make it into an interface with two concrete implementions? One with the fast math and one wrapping the Java Math class (I assume we are talking about the java.Math class).

EDIT: I don't have the slightest idea if that overhead would be greater or less than checking for an option though. I for welcome the fast math you presented because I use them extensively.

ewjordan
Posts: 803
Joined: Sun Sep 23, 2007 2:35 pm

### Re: Fast math?

I'd probably go with a static final flag to determine which path gets taken; the JVM can optimize that out real easy, I'm not sure how well it handles interfaces with multiple implementations in contrast.

I'm not actually sure how much speed we can gain by moving to different math implementations. We should probably run some tests to see if there's any difference, which is why it would be great to be able to swap the built in math functions for custom ones without too much hassle. From what I've heard the inverse square root doesn't really gain much these days, but you never know, it's at least worth checking out...

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Re: Fast math?

I just added the fast math to MathUtils, and with the Piston Benchmark I see a fps increase from 237 to 269, which is pretty significant. The only issue is the accuracy, so to keep it pretty accurate I put a few more

Code: Select all

`      x = x * (1.5f - xhalf * x * x);`
segments onto the invSqrt method, which really didn't affect the performance at all. This was much more of an improvement that I had thought

ewjordan
Posts: 803
Joined: Sun Sep 23, 2007 2:35 pm

### Re: Fast math?

Wow, I agree, that's a lot more of an improvement than I'd expect! Sounds good, especially since it should be pretty easy to swap back and forth in case things change in future Java releases.

I'll give it a try on the Mac JVM as well, sometimes things differ there.

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Re: Fast math?

That first one was on my oldish macbook. Here's my windows computer results (20 iterations):

Regular math:
Average time: 2.10227328E9
Average FPS: 478.63745

(new 6.14 jvm: 491 fps)

Fast math:
Average time: 1.91783091E9
Average FPS: 525.5782

(new 6.14 jvm: 545 fps)

so around 10% faster on my windows computer (11% with the new jvm), and 14% faster on the macbook. Cool!

Edit: this was without -server

Edit#2: for some reason I can't get the server jvm working on my windows computer. when I do -server -version, it says the server vm in "mixed" mode, but it says the same thing for -client, and the -XX:-PrintCompilation flag never works. I have the JDK installed too, not just the jre.

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

### Re: Fast math?

Have you tried some sin & cos implementations as well? I have tried several ones but didn't get a notable improvement in ScalaBox2D.

I'm seeing about 2% improvement in the pyramid test with this sqrt though.

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Re: Fast math?

Villane wrote:Have you tried some sin & cos implementations as well? I have tried several ones but didn't get a notable improvement in ScalaBox2D.

I'm seeing about 2% improvement in the pyramid test with this sqrt though.

Make sure you have enough "x = x * (1.5f - xhalf * x * x);" lines in it, things got really unstable when there wasn't enough of those. I also suggest making your own "int floor(float)" and "float abs(float)" methods, as this was part of the group as well (and jbox2d has been using it's own max/min methods for a while now).

I hadn't really thought of cos and sin, I'll shove in a lookup table and see if there's any improvements.

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Re: Fast math?

ok so I did to version of sin, one with a linear interpolation between table values and another with just returning the closest table value. Both are significatly faster than the Math package, but I did preform some extensive tests as to accuracy vs. speed. HEre are the results:

http://pastebin.ca/1510759
edit: better test (more results) http://pastebin.ca/1510786

Here's the lookup code (precision is the step size in the table, and tableLength is TWOPI/precision).

Code: Select all

`   public final float sin(float x){      x %= TWOPI;            if(LERP_LOOKUP){                  x /= precision;                  final int index = (int)x;                  if(index != 0){            x %= index;         }                  // the next index is 0         if(index == tableLength-1){            return ( (1-x)*sinLUT[index] + x * sinLUT[0]);         }else{            return ( (1-x)*sinLUT[index] + x * sinLUT[index + 1]);         }               }else{         return sinLUT[ MathUtils.round(x / precision) % tableLength];      }   }`

From the test results, it's obvious that lerp is slower than a straight lookup, but lerp is so much more accurate, especially at low precisions. And then you have to think about table size, and how much memory it will take up. It seems as though the speed is generally constant for both methods, so it comes down whether you want a super big table with fast lookup, or a smaller table with slightly slower lookup.
Last edited by toucansam on Tue Jul 28, 2009 11:13 pm, edited 1 time in total.

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

### Re: Fast math?

I've set up the look up tables, options are in the Settings file (which are static final). The tests I made are also there, in the org.jbox2d.testbed.mathtests

Surprisingly, I got about the same performance results with it on or off on my macbook (2 fps faster), but on my windows machine it went from 530 fps to 602 fps.

Edit: It also gets slower if you have a lot of precision... I'm going to keep experimenting with this.
Edit 2: ok so even though lerp is faster than Math calls in the tests (separate one I did, not in the one you see above), it's slower than Math calls when running the piston test. The fasted I got the test to run was 602 fps, where it doesn't lerp and the precision is .00131. Higher precisions end up making jbox2d slower, as well as lower precisions (from what I tried). I'm just keeping it there and calling it a night.

Also, I updated the test.