Pixel Bender test drive
Flash 10 is out, and it delivers a host of cool new features. One in particular caught my attention: Pixel Benders! It's time to build a minimal example and see how it all works.
It's just a simple test that combines the distances from the current pixel to 9 points that I move around using sin/cos in Flash. Unfortunately, the Pixel Bender language v1.0 does not support loops, so they need to be unrolled.
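As a rough illustration of what such a kernel computes per pixel, here is a minimal Python sketch (not the actual Pixel Bender source). It assumes the nine distances are simply summed and that the points orbit the image centre; the function names, the phase formula and the 0.8 radius factor are all made up for the example. The unrolled variant mirrors what has to be written out when loops are unavailable.

```python
import math

NUM_POINTS = 9  # the test described above uses 9 points


def moving_points(t, width, height):
    """Return 9 (x, y) points moved around with sin/cos.

    The phase and radius formulas are hypothetical, chosen only
    to make the points move differently from each other.
    """
    cx, cy = width / 2.0, height / 2.0
    points = []
    for i in range(NUM_POINTS):
        phase = t * (1.0 + 0.1 * i) + i
        points.append((cx + math.cos(phase) * cx * 0.8,
                       cy + math.sin(phase) * cy * 0.8))
    return points


def shade_pixel_loop(x, y, points):
    """Sum of distances from pixel (x, y) to every point (assumed combination)."""
    total = 0.0
    for px, py in points:
        total += math.hypot(x - px, y - py)
    return total


def shade_pixel_unrolled(x, y, p):
    """Same computation with the loop written out nine times."""
    d = math.hypot
    return (d(x - p[0][0], y - p[0][1])
            + d(x - p[1][0], y - p[1][1])
            + d(x - p[2][0], y - p[2][1])
            + d(x - p[3][0], y - p[3][1])
            + d(x - p[4][0], y - p[4][1])
            + d(x - p[5][0], y - p[5][1])
            + d(x - p[6][0], y - p[6][1])
            + d(x - p[7][0], y - p[7][1])
            + d(x - p[8][0], y - p[8][1]))


points = moving_points(0.5, 320, 240)
print(shade_pixel_loop(10, 20, points))
print(shade_pixel_unrolled(10, 20, points))
```

Both functions give the same value for any pixel; the unrolled one is just more tedious to write and to change if the number of points changes.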
There has been a lot of confusion about the promised GPU acceleration in Flash Player 10. Tinic Uro ( http://kaourantin.net ), who seems to be part of the Adobe Flash development team, stated quite clearly in his May 20th blog post that GPU acceleration would not be enabled for Pixel Benders inside the Flash Player. But his post was written half a year before the release, and the Pixel Bender language is based on GLSL, so I had to give it a try and see for myself. I exported the SWF with wmode set to GPU in addition to the CPU ("no hardware acceleration") version.
Try the wmode="gpu" version:
GPU
Clearly, Pixel Benders are not executed on the GPU inside Flash Player 10, but they still deliver good performance: the JIT compiler produces much more efficient code than AS3.0, the SIMD execution units on the CPU are utilized, and the kernel is multithreaded by nature, so it executes across all CPU cores. Testing the performance on a few different systems gives the following results (results from the Pixel Bender Toolkit, which runs the kernel on the GPU, are also included):
- Ultra low-end laptop: Celeron M 370 (single core Pentium M @1.5GHz) | 10 FPS | (100% CPU usage) Flash Player 10
- Core 2 E6600 (dual core @ 2.4GHz) | 50 FPS | (90% CPU usage) Flash Player 10
- Core 2 Q6600 (quad core @ 2.4GHz) | 90 FPS | (80% CPU usage) Flash Player 10
- Geforce 7600GT | ~250 FPS | Pixel Bender Toolkit
- Geforce 8800GTS | ~1200 FPS | Pixel Bender Toolkit
About 1200 FPS on a Geforce 8800GTS. Wow! What a shame this performance isn't available through Flash. The good news is that Pixel Bender code scales really well across cores.
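To put a number on "scales really well", here is a small back-of-the-envelope calculation in Python using only the FPS figures from the list above. It compares the two Core 2 machines, which run at the same clock speed, and ignores that neither ran at 100% CPU usage, so treat the result as an estimate. The helper name `scaling_efficiency` is made up for this sketch.

```python
# FPS figures copied from the results list above: (cores, fps).
flash_fps = {
    "Core 2 E6600": (2, 50),
    "Core 2 Q6600": (4, 90),
}
# Pixel Bender Toolkit (GPU) figures, approximate as stated in the list.
toolkit_fps = {"Geforce 7600GT": 250, "Geforce 8800GTS": 1200}


def scaling_efficiency(base, scaled):
    """Return (speedup, efficiency) going from `base` to `scaled`.

    Efficiency is the measured speedup divided by the ideal speedup,
    i.e. the ratio of core counts.
    """
    (base_cores, base_fps), (cores, fps) = base, scaled
    speedup = fps / base_fps
    return speedup, speedup / (cores / base_cores)


speedup, efficiency = scaling_efficiency(flash_fps["Core 2 E6600"],
                                         flash_fps["Core 2 Q6600"])
gpu_vs_quad = toolkit_fps["Geforce 8800GTS"] / flash_fps["Core 2 Q6600"][1]

print(f"dual -> quad speedup: {speedup:.1f}x")
print(f"scaling efficiency:   {efficiency:.0%}")
print(f"8800GTS vs quad core: {gpu_vs_quad:.1f}x")
```

Doubling the cores gives about a 1.8x speedup, or roughly 90% of ideal scaling, while the 8800GTS in the Toolkit is still around 13 times faster than the quad core in Flash Player.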
On a side note: Intel will release a "many core" chip called Larrabee, but it will probably only be accessible through simulated DirectX, OpenGL or a native API. See the August 08 paper. Realtime raytracing is getting closer! :-)
Btw, nice to meet another demoscener! I used to be in the C64 demoscene :)