So I've made some really awesome progress on the renderer for 3.1 over the last few days. Wasn't sure if I would post this here or not, but it's sort of exciting, so why not.
The following are framerates for 3000 Sprites running on a 4th gen iPod Touch. I used that since it was the oldest device I had on hand. 37 different textures were used and selected randomly.
unatlassed = Sprites loaded from individual files.
atlassed = Sprites loaded from a spritesheet.
batched = Using a CCSpriteBatchNode on 3.0, not needed with 3.1.
20% visible = Only 20% of the sprites are visible on the screen.
sorted = Sorting the sprites by rendering state to force a best case scenario for 3.1 to automatically batch them.
3.0:
5 fps - unatlassed, unbatched
7 fps - atlassed, unbatched
10 fps - atlassed, batched
5 fps - unatlassed, unbatched, 20% visible
7 fps - atlassed, unbatched, 20% visible
44 fps - atlassed, batched, 20% visible
3.1:
3 fps - unatlassed
10 fps - atlassed
11 fps - unatlassed, 20% visible
37 fps - unatlassed, sorted, 20% visible
45 fps - atlassed, 20% visible
When the automatic batching is working with unatlassed textures, it’s making a draw call for nearly every sprite, and its overhead is slightly more than the 3.0 rendering. When a spritesheet is used, 3.1's performance is nearly as good as 3.0 with a batch node, but with no extra effort needed. In the 20% tests, the overhead is a little more apparent. Sorting the sprites by rendering state is nearly as good as using a spritesheet. Not too surprising, and it's good to know that you can significantly improve performance just by changing z-orders. Since most games don’t have completely random sequences of sprites, the real world performance of the render queue will probably lie somewhere between the “unatlassed” and “sorted” performance numbers even if they aren't using spritesheets.
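To make the ordering effect a bit more concrete, here is a rough Python sketch. It is not the 3.1 renderer code, just a toy model that assumes a new draw call is needed whenever the texture changes between consecutive sprites (real rendering state also covers things like blend mode and shader, which are ignored here). The function name count_draw_calls and the fixed random seed are made up for illustration.

```python
import random

def count_draw_calls(texture_ids):
    """Count draw calls, assuming consecutive sprites that share a
    texture are merged into one batch (a simplified model)."""
    calls = 0
    previous = None
    for tex in texture_ids:
        if tex != previous:
            # Texture changed, so the current batch ends and a new one starts.
            calls += 1
            previous = tex
    return calls

# 3000 sprites, each picking one of 37 textures at random, as in the test above.
rng = random.Random(0)
sprites = [rng.randrange(37) for _ in range(3000)]

random_order_calls = count_draw_calls(sprites)       # "unatlassed" case
sorted_calls = count_draw_calls(sorted(sprites))     # "sorted" case
atlassed_calls = count_draw_calls([0] * len(sprites))  # one shared spritesheet

print(random_order_calls, sorted_calls, atlassed_calls)
```

In this model the random order needs a draw call for nearly every sprite, the sorted order needs one per distinct texture, and the spritesheet case needs just one, which lines up with the pattern in the numbers above.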
I’m pretty happy with the way things are going so far. The rendering code is much simpler now. The worst case performance isn’t much worse, and the automatic batching performance is nearly as good as the existing batch nodes. Culling seems to work fantastically too. Better yet, there is still a lot of room for improvement so I only expect performance to go up even more.
So unless your app is rendering a completely random sequence of sprites that you can't put into a spritesheet, I would guess that most games will see a 2x or better rendering speedup without needing to do anything!