Server Foundations
When talking about an MMORPG, or, as we like to call games like Chronicles of Elyria, a Multiplayer Evolving Online World (MEOW!), there’s nothing more foundational than the back-end server. The back-end server is responsible for managing the entire state of the game, performing all the computation for systems like AI, physics, pathfinding, and combat, and receiving and processing the input from every connected client, which in the case of CoE means hundreds of thousands of them. In January of 2017 we began the long process of taking what was mostly an offline, single-player game (designed primarily to validate user experience and gameplay feel) and turning it into a MEOW (what can I say, the Internet loves cats).
The process of building a scalable, reliable back-end server with all the gameplay mechanics of a game like Chronicles of Elyria can be divided into two primary components. First, there’s the platform, which provides infrastructure such as entity and state management, fault tolerance, load balancing, and cross-process communication. Second, there are the services, or workers, which perform the actual gameplay logic and process the input of all the clients.
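To make that split concrete, here is a minimal TypeScript sketch, assuming a single process with in-memory state and ignoring fault tolerance and load balancing entirely. Every name in it (`Platform`, `Worker`, `dispatch`, the `"move"` input kind) is hypothetical and not taken from CoE's actual code; it only illustrates the division of responsibility: the platform owns entity state and routes input, while each worker owns one piece of gameplay logic.

```typescript
// Hypothetical sketch of the platform/worker split; names are illustrative only.

// An entity is an id plus named component state, owned by the platform.
interface Entity {
  id: number;
  components: Map<string, unknown>;
}

// Input from a connected client, tagged with the kind of action it represents.
interface ClientInput {
  clientId: string;
  entityId: number;
  kind: string; // hypothetical kinds, e.g. "move" or "attack"
  payload: unknown;
}

// A worker implements the gameplay logic for one kind of input and
// returns component updates instead of mutating the entity itself.
interface Worker {
  readonly handles: string;
  process(input: ClientInput, entity: Entity): Map<string, unknown>;
}

// The platform owns all entity state and routes input to the right worker.
class Platform {
  private entities = new Map<number, Entity>();
  private workers = new Map<string, Worker>();

  spawn(id: number): Entity {
    const entity: Entity = { id, components: new Map<string, unknown>() };
    this.entities.set(id, entity);
    return entity;
  }

  register(worker: Worker): void {
    this.workers.set(worker.handles, worker);
  }

  // Returns false if the entity or a matching worker does not exist.
  dispatch(input: ClientInput): boolean {
    const entity = this.entities.get(input.entityId);
    const worker = this.workers.get(input.kind);
    if (!entity || !worker) return false;
    // Apply the worker's updates; only the platform writes entity state.
    worker.process(input, entity).forEach((value, name) => {
      entity.components.set(name, value);
    });
    return true;
  }

  getComponent(entityId: number, name: string): unknown {
    const entity = this.entities.get(entityId);
    return entity ? entity.components.get(name) : undefined;
  }
}

// A toy movement worker: adds a delta to the entity's position component.
const movementWorker: Worker = {
  handles: "move",
  process(input, entity) {
    const pos = (entity.components.get("position") ?? { x: 0, y: 0 }) as { x: number; y: number };
    const delta = input.payload as { dx: number; dy: number };
    return new Map<string, unknown>([["position", { x: pos.x + delta.dx, y: pos.y + delta.dy }]]);
  },
};

// Usage: one entity, one worker, one piece of client input.
const platform = new Platform();
platform.spawn(1);
platform.register(movementWorker);
platform.dispatch({ clientId: "client-a", entityId: 1, kind: "move", payload: { dx: 2, dy: 3 } });
console.log(platform.getComponent(1, "position")); // { x: 2, y: 3 }
```

Keeping workers limited to returning state changes, rather than mutating entities themselves, leaves the platform as the single owner of world state; in a design like this, that is what makes it possible to persist or distribute that state without touching gameplay code.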
Shortly after we announced development of Chronicles of Elyria, we started looking for ways to solve the typical problems associated with building a large, distributed simulation. In short, we were looking for ways to speed up development of the platform. As part of our research we got in contact with Improbable, the company behind SpatialOS, and as a first step toward bringing our game online, we began integrating SpatialOS in January of 2017. As a test, we built our own PhysX-based physics simulation in SpatialOS which, when integrated with our Unreal Engine 4 (UE4) client, let us move around the world using our custom physics while enjoying all the benefits SpatialOS has to offer.
Of course, SpatialOS isn't a game engine, so while it provided a platform on which we could build our game, we were still responsible for writing all the systems of CoE. As we developed those systems, we discovered we could make headway faster, enable unique gameplay features, and leverage a large ecosystem of existing code by moving from C# to JavaScript/TypeScript as the primary language of our backend. Fortunately for us, as of SpatialOS 10.3, Improbable offered an experimental JavaScript SDK, which meant we could keep using SpatialOS with our newly chosen language and architecture. So in early spring we made the move from C# to JavaScript and, as a result, saw a rapid increase in how quickly we could develop new features. This was also around the time development of the VoxElyria client (formerly referred to as ElyriaMUD) accelerated.
But then we started encountering some issues. First, while SpatialOS provided load balancing and fault tolerance for all our spatial workers, many of our workers were not spatial in nature, such as our authentication and login server, our AI system, and our Procedural Story Engine. For these, we still needed our own load balancing, fault tolerance, and cross-process communication. So we began researching technologies that, while not a single solution, would together provide the scalability, reliability, storage, and communication benefits normally provided by SpatialOS.
Second, we started to have concerns about the financial viability of SpatialOS for our needs. Using SpatialOS also means signing up to have your game hosted by Improbable. That has the benefit of lowering operations costs, but the drawback of passing on to us all the hosting fees they pay to their cloud partner. It also means we can't choose our own hosting partner (whether cloud, bare metal, or dedicated servers) to meet our performance needs. And CoE has some very specific needs! Specifically, our use of Offline Player Characters (OPCs), the extremely large size of our world, the vast number of entities in it, and the way we divide our game server into dozens of different worker types meant that SpatialOS was particularly expensive for our use cases. Our philosophy has always been to keep our hosting fees low so we can pass those savings on to you. With SpatialOS, our hosting fees would have been higher, which would have forced us to raise our prices, something we didn’t want to do.
Of course, we brought our concerns to Improbable, and over the last eight months they’ve done a fantastic job working with us to try to bring the price down. Unfortunately, it remains an expensive solution for us. To make sure we were prepared, we began looking for alternative technologies that could fill any gaps left behind if we were unable to use SpatialOS for any reason.
Since we had already started using Docker Swarm, a container orchestration tool, for load balancing and fault tolerance of our non-spatial services, we knew we could move our spatial services to a full Docker stack as well if necessary. When we realized we would need to communicate between our spatial and non-spatial services, we integrated RabbitMQ into our backend. RabbitMQ is a fast message broker for routing messages between services, and banks and other high-traffic services across the internet use it to handle tens of millions of requests each day. And because we needed persistent storage for all our non-spatial data, we integrated PostgreSQL into our backend. Fortunately, PostgreSQL supports relational SQL queries, NoSQL-style document storage (through its JSON and JSONB types), and even spatial queries, enabling it to act as the backend and persistent storage for both our spatial and non-spatial data.
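One way RabbitMQ can route messages between spatial and non-spatial services is through a topic exchange: publishers tag each message with a dot-separated routing key, and consumers bind with patterns in which `*` matches exactly one word and `#` matches zero or more words. The sketch below is an in-memory TypeScript stand-in that reproduces only those matching rules; it is not RabbitMQ's client API, and the routing keys (`spatial.combat.hit` and so on) are hypothetical rather than CoE's actual message names. A real deployment would talk to a broker through a client library such as amqplib for Node.js. Unlike a real broker, which delivers one copy per bound queue, this sketch simply calls every matching handler.

```typescript
// In-memory stand-in for a RabbitMQ-style topic exchange (not the RabbitMQ API).
// Routing keys are dot-separated words; in binding keys,
// "*" matches exactly one word and "#" matches zero or more words.
function topicMatches(bindingKey: string, routingKey: string): boolean {
  const pattern = bindingKey.split(".");
  const words = routingKey.split(".");

  const match = (p: number, w: number): boolean => {
    if (p === pattern.length) return w === words.length; // pattern used up: key must be too
    if (pattern[p] === "#") {
      // "#" either matches zero words (move past it) or consumes one word (stay on it).
      return match(p + 1, w) || (w < words.length && match(p, w + 1));
    }
    if (w === words.length) return false; // key used up but pattern still needs a word
    return (pattern[p] === "*" || pattern[p] === words[w]) && match(p + 1, w + 1);
  };

  return match(0, 0);
}

type Handler = (routingKey: string, body: unknown) => void;

class TopicExchange {
  private bindings: { bindingKey: string; handler: Handler }[] = [];

  bind(bindingKey: string, handler: Handler): void {
    this.bindings.push({ bindingKey, handler });
  }

  // Calls every handler whose binding key matches; returns how many were called.
  publish(routingKey: string, body: unknown): number {
    let delivered = 0;
    for (const { bindingKey, handler } of this.bindings) {
      if (topicMatches(bindingKey, routingKey)) {
        handler(routingKey, body);
        delivered++;
      }
    }
    return delivered;
  }
}

// Usage with hypothetical keys: a non-spatial story service listens for all
// combat events published by spatial workers.
const exchange = new TopicExchange();
const storyInbox: string[] = [];
exchange.bind("spatial.combat.#", (key) => storyInbox.push(key));
exchange.publish("spatial.combat.hit", { attacker: 7, target: 9 }); // delivered
exchange.publish("spatial.movement.update", { entity: 7 }); // not delivered
```

The appeal of pattern-based routing in a setup like this is that a publisher never needs to know which services are listening; a new non-spatial service can subscribe to existing spatial events just by adding a binding.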
All of the above technologies were integrated into our backend to solve non-spatial problems, but we chose them knowing we could lean on them if worst came to worst. And then we hit our third and final roadblock. In the most recent release of SpatialOS, Improbable deprecated their JavaScript SDK and marked it as unsupported. This left us with our biggest challenge yet: we no longer had a good way to interact with the SpatialOS backend.
When this happened, the engineering guild and I spent several meetings exploring whether we could shim an interop layer between SpatialOS and our JavaScript-based backend. In the end, we realized it would be too time-consuming and error-prone to keep using SpatialOS when we already had an efficient message routing and communication layer, an architecture that allowed for scalability and fault tolerance, and a persistent storage solution that let us track and update the state of the world. So, as of the end of 2017, Chronicles of Elyria no longer uses SpatialOS and is built entirely on the Soulborn Engine. We really enjoyed working with the folks at Improbable, and we look forward to seeing how their platform shapes the online gaming industry as a whole. Their technology remains an extremely powerful solution for the vast majority of distributed simulations out there. But for our particular technology choices and game mechanics, it just wasn’t the ideal solution.
As you can see, our non-spatial services meant spending a large part of 2017 setting up our core stack and building the components needed for our own proprietary platform. And because we were concerned about how things would turn out with SpatialOS, we kept making progress in these areas within the Soulborn Engine throughout 2017. In fact, we went to PAX West this year and showed off our jousting demo. This was the first multiplayer demo we'd taken to PAX since our combat demo back in 2016, and what few people knew is that the jousting demo was running on a locally deployed version of the Soulborn Engine. This means everyone who played the jousting demo at PAX West has already played on our new server stack.
And finally, in December 2017, we released version 0.1.0 of Chronicles of Elyria to our “Friends and Family”. This was the first time offsite users were able to play Chronicles of Elyria, and again, that milestone ran entirely on the Soulborn Engine. So, what does all this mean in a nutshell? It means that we began 2017 using SpatialOS and ended it with a stack that, while not a single solution, does everything SpatialOS did for us, giving us the same functionality while keeping our operating costs low and giving us more control over our server performance. With the transition complete, we’re now ready to move forward with more core gameplay mechanics in 2018.