You can obviously see the problems they are dealing with, and everyone could have predicted them. Starting small would actually be very simple: they could focus on single-player games, or games that aren't graphics-intensive. But if cloud-based FPS games require the same amount of bandwidth as YouTube, they should never have stepped into this. Weird how they went about this. I'd always assumed that if you started small with cloud-based gaming and grew it over time, the technology needed to make the format more viable would eventually arrive - that is, emerging tech for reducing latency, or a custom thin client that integrates with the service to improve performance. You keep the service going so you can continually improve it. I think Google just spent crazy Google money with little vision. Reminds me of Microsoft moving into mobile phones for some reason.
Because there is no good way to start small with streaming anything. Just video streaming requires fairly massive servers to get that compressed video to anyone at a reasonable speed (why do you think YouTube alternatives struggle so much with getting off the ground?), and streaming an entire PC or games requires even bigger servers, mainly because there you cannot cheat by bunching together X amount of server racks or throttling traffic/server usage to get the best performance. Every customer needs at least a dedicated GPU, CPU and RAM, so when you are offering 4K 60fps streaming you need as a bare minimum the EQUIVALENT of an RTX 2080, a Core i9 and 32 GB of RAM per customer. So if your server has, say, something like 1 TB of RAM, something like 270 cores and 32 RTX-level graphics processors, then that server is hard-capped at servicing 32 customers at any given time. So just offering such a service in one of the bigger American cities to, say, around 5000-10000 customers is a hefty investment that requires a lot of people paying for a subscription to be profitable. The equation changes slightly when you target a lower standard such as 1080p 60fps, but it's not like that cuts the initial investment and maintenance costs by 90%.
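To put rough numbers on that, here is a quick back-of-the-envelope sketch. It uses the 32-GPU hypothetical server from above and assumes the worst case, where every one of those 5000-10000 customers is playing at the same time (that concurrency assumption is mine, real peak usage would be some fraction of it). The function name is made up for illustration.

```python
import math

# Figure taken from the post: one hypothetical server with 32 GPUs.
GPUS_PER_SERVER = 32
# Assumption from the post: a 4K 60fps game session needs a GPU to itself.
SESSIONS_PER_GPU = 1


def servers_needed(concurrent_players: int) -> int:
    """Servers required when every concurrent player needs a dedicated GPU."""
    sessions_per_server = GPUS_PER_SERVER * SESSIONS_PER_GPU
    return math.ceil(concurrent_players / sessions_per_server)


if __name__ == "__main__":
    # Worst case: every customer in the city is online at once.
    for players in (5_000, 10_000):
        print(f"{players} concurrent players -> {servers_needed(players)} servers")
```

Under those assumptions you land at 157 to 313 of those 32-GPU boxes for a single city, before counting any spare capacity.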
Streaming just video, on the other hand, lets you use that same hypothetical server to service thousands of customers, as sending 500 MB of HD video over the net means the server won't hear from you for the next 10-20 minutes and can keep sending data packets to other customers.
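The same kind of rough math works for plain video. The sketch below takes the 500 MB chunk and the 10-20 minute gap from above and asks how many viewers one uplink could keep fed. The 40 Gbit/s uplink is purely an assumed figure for illustration, as are the function names, and it ignores protocol overhead and peak-hour bunching.

```python
def average_mbps(chunk_megabytes: int, minutes_between_chunks: int) -> float:
    """Average bitrate one viewer costs the server, in Mbit/s."""
    return chunk_megabytes * 8 / (minutes_between_chunks * 60)


def viewers_per_uplink(uplink_mbps: int, chunk_megabytes: int,
                       minutes_between_chunks: int) -> int:
    """Viewers one uplink can serve if each needs one chunk per interval."""
    chunk_megabits = chunk_megabytes * 8
    interval_seconds = minutes_between_chunks * 60
    # Megabits the uplink can push during one interval, divided by
    # the megabits each viewer needs in that interval.
    return (uplink_mbps * interval_seconds) // chunk_megabits


if __name__ == "__main__":
    UPLINK_MBPS = 40_000  # hypothetical 40 Gbit/s uplink, not from the post
    for minutes in (10, 20):
        rate = average_mbps(500, minutes)
        viewers = viewers_per_uplink(UPLINK_MBPS, 500, minutes)
        print(f"{minutes} min gap: {rate:.2f} Mbit/s per viewer, {viewers} viewers")
```

With that assumed uplink the answer comes out in the thousands of viewers per server, against 32 players for the game-streaming case.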
If their strategy is to host a PC in the cloud and have you play games via a virtual session, the strategy was terrible from the start. They are playing to the weaknesses of the internet and client-server technology, not its strengths. Your vision above would require a tremendous amount of data to be transferred. YouTube isn't even a fair comparison. When you are watching a video, you can't change the outcome - it's compressed, decompressed and streamed on the fly, and streaming video traffic is one-way.
It would also be very odd to render the same map 32 times, once for each customer. Essentially, the nature of FPS games would need to be changed to make Google's strategy work. Maybe that's why there's the huge outlay for vendors from Google. That is to say - right now in World of Warcraft, the client does the bulk of the heavy lifting. Once you connect to Battle.net, Blizzard's server maintains player interactions and such, but doesn't host the image of everyone else - your client does. FPS games are designed around limiting the work the server has to do. Creating a server with 32 video processors for 32 players sounds downright horrible. Why would the same map need to be rendered 32 times unless it was a limitation of the game itself? A multiplayer game designed around thin clients would need to be built from the ground up, and keep the amount of data transferred as skinny as possible, because Google can't control end-user data connections but will have to deal with support calls from everyone on DSL or such.
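For a feel of how skinny the traditional client-rendered model is compared with shipping finished frames, here is a minimal sketch. Every number in it is an assumption picked for illustration (64 bytes per player update, 60 updates per second, a 35 Mbit/s 4K video stream), not a measurement of any real game or service, and the function name is hypothetical.

```python
def state_sync_kbps(players: int, bytes_per_update: int,
                    updates_per_second: int) -> float:
    """Downstream kbit/s when the server sends a client every other player's state."""
    other_players = players - 1
    bytes_per_second = other_players * bytes_per_update * updates_per_second
    return bytes_per_second * 8 / 1000


if __name__ == "__main__":
    # All figures below are illustrative assumptions.
    PLAYERS = 32
    BYTES_PER_UPDATE = 64         # position, orientation, a few flags
    UPDATES_PER_SECOND = 60
    VIDEO_STREAM_KBPS = 35_000    # assumed bitrate of a 4K 60fps stream

    sync = state_sync_kbps(PLAYERS, BYTES_PER_UPDATE, UPDATES_PER_SECOND)
    print(f"state sync:   {sync:.0f} kbit/s per client")
    print(f"video stream: {VIDEO_STREAM_KBPS} kbit/s per client")
    print(f"ratio:        {VIDEO_STREAM_KBPS / sync:.1f}x")
```

Even with generous per-update sizes, the state-sync traffic stays under 1 Mbit/s per client here, which is the kind of load a DSL line can take. The rendered-frame stream is well over an order of magnitude heavier under these assumptions.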