Just a quick one here. While Google App Engine's Python implementation limits your code to a single thread, the platform certainly isn't running a single thread and servicing every request from it. When running locally (where the performance per image is about the same) it certainly does appear to be single threaded: it takes an absolute age and the logs always show one request after another. On the server, however, it's a completely different story, with multiple requests being served at the same time. That is what the Quota Breaking graphs seem to show: the app is servicing 90 requests a second, which suggests that App Engine starts a new thread (or pulls one from a thread pool) for each new request and spreads these requests across multiple CPUs. The reason I say the latter is that the performance per image is pretty much the same all the time, which indicates each request is getting dedicated CPU time.
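As a rough sanity check on that reasoning, here is a minimal sketch using Little's law (average requests in flight = throughput x time per request). The 90 requests a second comes from the graphs; the 0.5 seconds per image is a made-up placeholder, not a measured figure, and the function names are my own.

```python
def estimated_concurrency(requests_per_second, seconds_per_request):
    """Little's law: average requests in flight = throughput * time per request."""
    return requests_per_second * seconds_per_request


def single_thread_ceiling(seconds_per_request):
    """Most requests per second one thread could serve back to back."""
    return 1.0 / seconds_per_request


# 90 requests/second is the figure read off the quota graphs.
# 0.5 seconds per image is a hypothetical placeholder, not a measurement.
observed_rate = 90
assumed_seconds_per_image = 0.5

print(single_thread_ceiling(assumed_seconds_per_image))                 # 2.0
print(estimated_concurrency(observed_rate, assumed_seconds_per_image))  # 45.0
```

Under that assumed timing, a single thread would top out at about 2 requests a second, so seeing 90 would mean roughly 45 requests being worked on at once. Swap in a real per-image time to get a real estimate.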
So: lots of independent threads running on lots of different CPUs, but probably sharing the same memory and storage space.