Cleaning-up Operations - Development Log #414

Michi talks about the recent server instability and how the team is trying to solve it.

Michi (molp)

The performance and stability of our servers have been declining over the last weeks and even months. A typical symptom is a bunch of players suddenly getting disconnected. Another is the APEX console turning red and taking a while to reconnect. Sometimes the servers just feel slow and sluggish.

The reason behind the performance issues is that our servers are a bit short on RAM. Over time, they use up almost all the memory there is, making it harder and harder for the game to find free memory. Garbage collection, the process of freeing up unused memory, also takes longer and longer. At a certain point, when there is no memory left or response times get too long, the server is automatically killed and restarted. This means that players are disconnected, and in some cases it can cause cascading failures, where the other servers try to take over the load of the dying node and fail as well.

The solution is to reduce the memory footprint of our game objects or to buy bigger servers. In the past we did both, but buying better machines is definitely the expensive option.

This time we opted to analyze the memory usage and look for spots where we can save memory. It didn't take long to find out that companies are by far the most numerous game objects (entities) we have. Looking at the typical memory usage, one part of these company entities sticks out: the accounting behavior.

The accounting behavior is responsible for keeping track of the money. It does so by storing bookings, each of which essentially represents a money transfer from one account to another. These accounts do not necessarily have to be cash accounts. They can track any kind of revenue or expense. Bookings are grouped into accounting periods, each seven days long. In the past we kept many periods because we didn't have any memory problems. A few months ago we reduced the number of periods we keep to save some memory. Now, with more players playing and players having larger and larger companies, the sheer number of bookings can make up 30% of the memory footprint of a typical company entity.
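To make the structure a bit more concrete, here is a minimal sketch in Python of bookings grouped into seven-day periods, with only the most recent periods kept. It is not the game's actual code: the `Booking` fields, the account names and the helpers `period_start`, `group_by_period` and `keep_latest` are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Booking:
    """A transfer of `amount` from one account to another (hypothetical shape)."""
    day: date
    from_account: str
    to_account: str
    amount: int  # smallest currency unit, to avoid float rounding


def period_start(day: date) -> date:
    """Return the Monday that opens the seven-day period containing `day`."""
    return day - timedelta(days=day.weekday())


def group_by_period(bookings):
    """Group bookings into seven-day accounting periods keyed by their Monday."""
    periods = defaultdict(list)
    for booking in bookings:
        periods[period_start(booking.day)].append(booking)
    return dict(periods)


def keep_latest(periods, max_periods):
    """Drop all but the `max_periods` most recent periods."""
    if max_periods <= 0:
        return {}
    latest = sorted(periods)[-max_periods:]
    return {start: periods[start] for start in latest}
```

Dropping whole periods like this saves memory, but every booking in the periods that remain is still stored individually, which is why the bookings can still dominate a company's footprint.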

So last week I implemented a change that removes old bookings from accounting periods and replaces them with aggregated sums. The aggregated sums use far less memory and still provide enough information for the existing commands. We also keep a rolling list of cash bookings now, so the list of recent cash transactions in FIN is no longer cleared when a new accounting period starts each Monday.
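The idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the class names `Booking`, `Period` and `Ledger`, the `"cash"` account name, the window size and the choice to compact a period at the moment it closes are all hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Booking:
    """A transfer of `amount` from one account to another (hypothetical shape)."""
    from_account: str
    to_account: str
    amount: int


@dataclass
class Period:
    bookings: list = field(default_factory=list)
    # Net change per account, filled in once the period is compacted.
    totals: Counter = field(default_factory=Counter)

    def compact(self):
        """Fold the individual bookings into per-account sums, then drop them."""
        for booking in self.bookings:
            self.totals[booking.from_account] -= booking.amount
            self.totals[booking.to_account] += booking.amount
        self.bookings = []


class Ledger:
    def __init__(self, recent_cash_limit=50):
        self.current = Period()
        self.closed = []
        # Rolling window of cash bookings that survives period changes.
        self.recent_cash = deque(maxlen=recent_cash_limit)

    def book(self, booking):
        self.current.bookings.append(booking)
        if "cash" in (booking.from_account, booking.to_account):
            self.recent_cash.append(booking)

    def start_new_period(self):
        """Close the current period (assumed: compact it right away)."""
        self.current.compact()
        self.closed.append(self.current)
        self.current = Period()
```

A closed period then holds one number per account instead of one object per booking, while the bounded `deque` keeps the most recent cash transactions available regardless of period boundaries.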

I deployed the changes to our test server last week and everything looks good so far. I hope to deploy to production this week!

Last but not least: the holiday season is coming up and PiBoy314 is organizing the PrUn Secret Santa event again! You can register over here!

As always, we'd love to hear what you think. Join us on Discord or the forums!

Happy trading!