Yep, more little fiddly coding bits today. I’m very sorry.
Today it’s a bit more about optimization and keeping code clean and quick-running, from the perspective of someone who is admittedly not extremely good at it but is trying very hard to be. This is the sort of thing that happens when you start out doing SOME programming, leave the field for design, and then end up needing to write code for your own projects.
Moving on! Today it’s some reflections and discoveries regarding the Windows Phone 7 emulator versus native Windows builds, and profiling. As a bonus, an early shot of my phone project, currently called Nucleus.
Performance Early, Performance Often
I’ve tried to take to heart something that keeps coming up as advice for XNA game developers—which is to get concerned in a general sort of sense about performance early. Not in the sense of doing a lot of nitty-gritty optimization details, but in terms of sanity-checking and keeping on top of the big picture.
To that end, I’ve written into my nascent game library a Benchmarking namespace, which right now simply features a handy cross-platform Frames/Updates per second display. This is super-easy to put together and lets me just sort of keep a high-level eye out for problems cropping up. You can imagine my dismay when I went through some fair pains to avoid the kinds of problems I’ve talked about before and saw the results on the right.
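A counter like that can be very small. What follows is only a rough sketch of the idea, not the actual Benchmarking code: the class name `RateCounter` and its members are made up for illustration, and it deliberately avoids any XNA types so it builds anywhere.

```csharp
using System;

// Hypothetical frames/updates-per-second counter (illustrative names only).
public sealed class RateCounter
{
    private int _ticks;
    private double _elapsedSeconds;

    // Most recently measured rate; stays 0 until a full second has accumulated.
    public double PerSecond { get; private set; }

    // Call once per Draw (or once per Update) with the seconds since the previous call.
    public void Tick(double deltaSeconds)
    {
        _ticks++;
        _elapsedSeconds += deltaSeconds;

        // Publish a new reading roughly once per second, then start over.
        if (_elapsedSeconds >= 1.0)
        {
            PerSecond = _ticks / _elapsedSeconds;
            _ticks = 0;
            _elapsedSeconds = 0.0;
        }
    }
}
```

In an XNA game, the value passed in would presumably be `gameTime.ElapsedGameTime.TotalSeconds`, with one counter ticked from Draw and another from Update. Averaging over a whole second keeps the on-screen number from jittering every frame.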
37 fps, of course, is just fine for a full-on game, particularly considering the target fps for the WP7 platform is 30. The concern, though, is that I unlocked the frame rate to get an idea of my fastest-run speed, and the emulator doesn’t simulate the hardware speed. That is, I expect a lot more out of my slick gaming laptop when it’s just tossing a few sprites up and performing some basic input-processing.
Of course, it might be a little early to flip out and go on a performance crusade across very little code, so further investigation was warranted.
What Are You Comparing Against?
It occurred to me very quickly that I had no baseline for what kind of performance to expect from the emulator. This was easy enough to resolve—I made a new project called Baseline. This project has only the code that XNA projects default-generate as a skeleton, with the addition of my FPS component. So in theory, it’s doing basically nothing but the most basic of things, plus calculating the fps and displaying it. Okay, fantastic. What does this give us?
Huh. Well, it’s unexpected, but basically heartening. I’m seeing roughly the same frame rate with the baseline that I am once the logic is involved. It suggests at least that I’m not trashing the system with my code and it has more to do with the emulator itself.
I’m not really convinced yet, though, that my code is alright.
Crossing the Platform Divide
Since I don’t yet have actual hardware to test on (but boy would I like one of those development devices!), I feel like I need to investigate the performance of my code better. The only real avenue I have (that I’m aware of) is to watch how the code behaves under varied conditions and get a better feel for the impact of what I’m doing.
The solution is something we should all be doing anyway: cross-platform builds.
This is as simple as telling XNA Game Studio to make a copy of the project for Windows, and making judicious use of compiler conditionals in a few places. It is really, stupidly easy. Seriously. Do it.
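For anyone who hasn't used compiler conditionals, here is a minimal sketch of what one looks like. The `Platform` class is hypothetical; `WINDOWS_PHONE` and `WINDOWS` are the symbols the XNA Game Studio 4.0 project templates define for their respective targets, and the final branch is just a fallback in case neither is set.

```csharp
using System;

// Hypothetical helper showing per-platform code paths in a shared source file.
public static class Platform
{
    public static string Name
    {
        get
        {
#if WINDOWS_PHONE
            // Compiled only into the phone project.
            return "Windows Phone";
#elif WINDOWS
            // Compiled only into the Windows copy of the project.
            return "Windows";
#else
            // Fallback when neither symbol is defined.
            return "Unknown";
#endif
        }
    }
}
```

The same pattern works around anything that differs between targets, such as reading touch input on the phone and the mouse on Windows, while everything else stays shared.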
Anyway, this opens up some new information. The Baseline code is flying along at nearly 5000 fps, which is, yes, a surprising jump from getting 37-ish on the same code. Nucleus isn’t nearly so obscene, since it actually does something—still, it shows roughly double the speed I was getting on the emulator. So it’s starting to look like the code really is kind of alright after all.
That’s Profilin’. An’ Profilin’ is… Good, Actually
Building for Windows as a target gives us another opportunity: to use the profiler to get a better idea of the state of the code. I’m working on VS2010 Express right now, which means that for the moment, I’m not getting to use the amazing built-in profiler. That’s alright, though, since CLR Profiler is free and does what I’m interested in right now.
As I’ve talked about previously, I’m mostly concerned about the possibility that I’ve overlooked some nasty garbage generation somewhere, and CLR Profiler is going to help me with that. So I fire it up, start my Windows version of Nucleus, and just let it run through the loop for a while. After a couple of minutes’ worth of 60-ish FPS, I ask for the report. I’m seeing 4 garbage collections, which isn’t bad, though I might prefer… oh… zero. Since the Windows GC is a lot nicer than the one on the phone, I have to consider the difference and treat all 4 of those collections as if they were ugly non-generational collections (Gen 0 collections are nice and quick, but the phones don’t have generational collection).
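Between profiler runs, a cheap in-game check can flag collections as they happen. This is a sketch rather than anything taken from Nucleus: `GcWatcher` is a made-up name, and it assumes the desktop .NET Framework, where `GC.CollectionCount` is available. Whether the phone runtime offers the same call is not something this sketch relies on.

```csharp
using System;

// Hypothetical helper: reports how many garbage collections ran since the last poll.
public sealed class GcWatcher
{
    // Generation 0 is counted because every collection includes it.
    private int _lastCount = GC.CollectionCount(0);

    // Call once per frame; a non-zero result means a collection ran since the previous call.
    public int Poll()
    {
        int current = GC.CollectionCount(0);
        int delta = current - _lastCount;
        _lastCount = current;
        return delta;
    }
}
```

Adding the returned value to a running total next to the FPS display gives a rough "collections so far" readout without leaving the game.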
While I was at it, I popped up my CPU and GPU monitor tools and saw about what I’d expect: the hardware can barely tell anything’s happening. GPU use peaks at somewhere around 10%, and total load on the processor cores isn’t much elevated over the normal idle level caused by background processes.
Sidebar: The Input
Finding this little issue with the input loop brings up something worth mentioning, specifically how these issues can crop up in unexpected places. The loop in question is a foreach loop, and it seems like it should be fine, since it uses a class-level TouchCollection and the TouchLocations the loop creates are structs. But it seems that at runtime, the use of the loop also creates a TouchCollection.Enumerator, which is apparently an object and ends up on the heap. Which just goes to show that there are places the Framework code will do things you don’t expect it to.
Poring through the CLR Profiler reports, it’s looking like those GCs aren’t happening often, maybe every 60 seconds or so, which isn’t horrible. But I’d prefer not seeing any of them during gameplay, of course. The reports also seem to indicate that the culprit is a ton of TouchCollection.Enumerator objects, which are definitely coming from the input loop. So it’s likely I should be doing something other than that convenient foreach loop that’s pulling in the multi-touch information… alas. I like that code. I guess we’ll see if it becomes a serious problem.
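The usual alternative to foreach is a plain indexed for loop, which never asks the collection for an enumerator. The sketch below shows the shape of it using stand-ins: `TouchPoint` and `TouchScan` are hypothetical, with `TouchPoint` playing the role of XNA's TouchLocation struct so the example builds without the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a touch struct such as XNA's TouchLocation.
public struct TouchPoint
{
    public int Id;
    public bool Pressed;
}

public static class TouchScan
{
    // Counts pressed touches by index, so no enumerator is ever created.
    public static int CountPressed(IList<TouchPoint> touches)
    {
        int pressed = 0;
        for (int i = 0; i < touches.Count; i++)
        {
            // Indexing copies the struct out; nothing is allocated on the heap here.
            if (touches[i].Pressed)
            {
                pressed++;
            }
        }
        return pressed;
    }
}
```

With the real thing, the same loop would index the TouchCollection itself through its Count property and indexer rather than going through an interface. Whether that actually makes the profiler's collections go away is something to measure, not assume.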
So What Has Been Learned?
Well, a lot, really. First, there’s massive benefit to running some basic performance benchmarking early and regularly, before the code gets large and complicated. Second, keeping a cross-platform build going, even if we have no intent to release on any other platforms, affords us the opportunity to better understand what impact our code is having outside the external constraints of the emulator (and also lets us use some of the better tools available to us).
Finally, sometimes your code is actually doing alright.