Looking for Performance Gaps - Development Log #522

In this week's devlog, Michi talks about influence reports and tracking down the cause of the performance issues.


Michi (molp)

This week I finished implementing the most important transfers of influence:

  • from company to planet
  • from planet to system
  • from system to faction.
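To illustrate the direction of these transfers, here is a minimal Python sketch of a roll-up along the company to planet to system to faction chain. The devlog doesn't describe the actual transfer rules, so everything here is an assumption: the entity names and ownership maps are hypothetical, and `share` is a made-up parameter for the fraction passed on at each step (1.0 means a plain sum).

```python
from collections import defaultdict

# Hypothetical ownership maps; the real data model is not described in the devlog.
company_planet = {"company-1": "planet-A", "company-2": "planet-A", "company-3": "planet-B"}
planet_system = {"planet-A": "system-X", "planet-B": "system-X"}
system_faction = {"system-X": "faction-1"}


def roll_up(influence, parent_of, share=1.0):
    """Pass `share` of each entity's influence to its parent and sum it per parent.

    `share` is a hypothetical parameter; 1.0 simply adds everything up.
    """
    totals = defaultdict(float)
    for entity, amount in influence.items():
        totals[parent_of[entity]] += amount * share
    return dict(totals)


company_influence = {"company-1": 10.0, "company-2": 5.0, "company-3": 2.0}
planets = roll_up(company_influence, company_planet)  # company -> planet
systems = roll_up(planets, planet_system)             # planet -> system
factions = roll_up(systems, system_faction)           # system -> faction
print(planets, systems, factions)
```

Each step only needs the output of the previous one, which is what makes it possible to inspect the totals at every level separately.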

You might remember that the plan was to deploy these changes, see how much influence is being generated and transferred over time, and use that information to adjust the influence system (and maybe its game design). Fabian will take care of that, so I can work on other topics in the meantime. We haven't implemented any visible changes to the influence system yet, so don't worry if you don't notice anything different just yet.

In order for Fabian to retrieve the necessary planet, system and faction data, I implemented a CSV report that can be triggered by an internal tool. I hope that nothing explodes when these reports retrieve the data, as they touch every planet, system, faction and population. That is a lot of entities to speak to in a short time, so fingers crossed.
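Because the report touches every planet, system, faction and population, one way to soften the load is to spread the queries out over time. The Python sketch below is hypothetical and not the actual internal tool: `fetch` stands in for however each entity is asked for its data, and `batch_size` and `pause_s` are made-up throttling parameters.

```python
import csv
import io
import time
from typing import Callable, Iterable, TextIO


def write_report(entity_ids: Iterable[str], fetch: Callable[[str], dict],
                 out: TextIO, batch_size: int = 500, pause_s: float = 0.05) -> int:
    """Write one CSV row per entity and pause after every batch, so the
    entities are queried gradually instead of all at once."""
    writer = csv.DictWriter(out, fieldnames=["id", "kind", "influence"])
    writer.writeheader()
    rows = 0
    for entity_id in entity_ids:
        writer.writerow(fetch(entity_id))  # hypothetical per-entity query
        rows += 1
        if rows % batch_size == 0:
            time.sleep(pause_s)  # give the entities a breather between batches
    return rows


# In-memory stand-in for the real entities (hypothetical data).
fake_entities = {
    "planet-A": {"id": "planet-A", "kind": "planet", "influence": 15.0},
    "system-X": {"id": "system-X", "kind": "system", "influence": 17.0},
    "faction-1": {"id": "faction-1", "kind": "faction", "influence": 17.0},
}
buffer = io.StringIO()
count = write_report(fake_entities, fake_entities.__getitem__, buffer, batch_size=2, pause_s=0.0)
print(count)
print(buffer.getvalue())
```

Throttling trades a longer report run for a smaller spike in load; whether that is needed depends on how the real entities cope.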

Remember last week (devlog #521), when I said that I was happy with the recent performance changes because they led to fewer server pod restarts? In the forums, players wrote that the game still feels sluggish and that some actions can take several seconds before the green "action succeeded" notification shows up. I rarely see such delays, but then again, my company is small compared to what others have built, so I decided to dig deeper into this topic. Before I can find and fix performance bottlenecks, I need to be able to measure performance. That way I can be sure to make changes in the right spot and actually improve things.
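Measuring can start as simply as wrapping the code path in question with a timer and looking at the slow tail rather than the average, since a few multi-second actions hide easily in an average. A minimal Python sketch; the `timed` helper, the labels and the rough 95th-percentile calculation are illustrative assumptions, not the game's actual instrumentation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Collected durations in seconds, per label (hypothetical in-memory store).
durations = defaultdict(list)


@contextmanager
def timed(label):
    """Record how long the wrapped block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[label].append(time.perf_counter() - start)


def summary(label):
    """Count, rough 95th percentile and maximum of the recorded durations."""
    samples = sorted(durations[label])
    p95 = samples[int(0.95 * (len(samples) - 1))]  # simple index-based approximation
    return {"count": len(samples), "p95_s": p95, "max_s": samples[-1]}


for _ in range(3):
    with timed("transfer"):
        time.sleep(0.01)  # stand-in for handling one player action
print(summary("transfer"))
```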

My first hunch was that snapshotting could be causing the delays. Every few thousand events that an entity has written to the database, we create a snapshot of that entity. This makes loading the entity back from the database after a server restart much faster, because only the events written after the snapshot have to be replayed. While a snapshot is being taken, all actions sent to the entity are stashed for later execution. That would fit the problem description perfectly: players perform several actions, they all seem to hang for a short while, and then they all resolve at the same time.
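The snapshot behavior described above can be modeled as a tiny state machine. In this Python sketch (an assumption, not the game's code), every handled command counts as one event, `snapshot_every` replaces the "few thousand events" threshold, and `snapshot_written` stands in for the moment the database confirms the snapshot. Commands that arrive in between are stashed and replayed in order, which is why several actions would appear to hang and then resolve together.

```python
from collections import deque


class Entity:
    """Toy event-sourced entity: one handled command = one event (an assumption)."""

    def __init__(self, snapshot_every=3000):  # "every few thousand events"; 3000 is a guess
        self.snapshot_every = snapshot_every
        self.events_since_snapshot = 0
        self.snapshotting = False
        self.stash = deque()
        self.handled = []

    def receive(self, command):
        if self.snapshotting:
            self.stash.append(command)  # held back until the snapshot is written
            return
        self.handled.append(command)
        self.events_since_snapshot += 1
        if self.events_since_snapshot >= self.snapshot_every:
            self.snapshotting = True  # snapshot write starts; new commands wait

    def snapshot_written(self):
        """Called once the database has stored the snapshot."""
        self.snapshotting = False
        self.events_since_snapshot = 0
        # Replay stashed commands in arrival order; stop if replaying triggers
        # the next snapshot, so the rest stays stashed.
        while self.stash and not self.snapshotting:
            self.receive(self.stash.popleft())


e = Entity(snapshot_every=2)
for command in ["a", "b", "c", "d"]:
    e.receive(command)  # "b" triggers a snapshot, so "c" and "d" are stashed
handled_before_snapshot = list(e.handled)
e.snapshot_written()  # "c" and "d" now resolve together
print(handled_before_snapshot, e.handled)
```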

Analyzing the measurements showed that snapshotting can indeed be slow for larger companies: in some cases, writing a snapshot to the database takes up to 1.5 seconds. But since snapshots are only taken every few thousand events, they are too rare to explain the delays players are reporting, so something else must be causing them.

The next candidate on the list of potential causes is the way we handle scheduled commands. Simply put, a command is an action that the player (or a game system) sends to an entity, expecting it to do something. An example would be "transfer x units of z from inventory a to b". A scheduled command is an internal command that can be executed right away or at some point in the near future. For example, while a fleet is flying a flight segment, the fleet behavior schedules a command to end the current segment and start the next one at the time the segment ends. Once that time has arrived, the behavior gets a notification from an internal system and can execute the command.
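Here is a Python sketch of the flight example: a min-heap keyed by due time stands in for the internal system that notifies behaviors, and the fleet behavior schedules the end of each segment when the previous one ends. `Scheduler`, `FlightBehavior`, the segment names and the polling loop are all hypothetical; a real server would more likely be woken by timers than poll a simulated clock.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledCommand:
    due: float                        # simulated time in seconds
    name: str = field(compare=False)  # only the due time decides the order


class Scheduler:
    """Hypothetical stand-in for the internal system that tells a behavior a command is due."""

    def __init__(self):
        self._queue = []

    def schedule(self, due, name):
        heapq.heappush(self._queue, ScheduledCommand(due, name))

    def due_commands(self, now):
        ready = []
        while self._queue and self._queue[0].due <= now:
            ready.append(heapq.heappop(self._queue))
        return ready


class FlightBehavior:
    """Schedules the end of each flight segment; ending one segment starts the next."""

    def __init__(self, scheduler, segments, now=0.0):
        self.scheduler = scheduler
        self.segments = list(segments)  # (name, duration in seconds)
        self.index = 0
        self.log = []
        self._schedule_end(now)

    def _schedule_end(self, start):
        name, duration = self.segments[self.index]
        self.scheduler.schedule(start + duration, f"end {name}")

    def on_due(self, command):
        self.log.append((command.due, command.name))
        self.index += 1
        if self.index < len(self.segments):
            self._schedule_end(command.due)  # next segment starts when this one ends


sched = Scheduler()
fleet = FlightBehavior(sched, [("departure", 10.0), ("jump", 30.0), ("landing", 5.0)])
for now in range(0, 61, 5):  # simulated clock ticks every 5 seconds
    for command in sched.due_commands(now):
        fleet.on_due(command)
print(fleet.log)
```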

Determining the next command for an entity can be costly: we have to go through all behaviors of an entity (ships, contracts, bases, ...) and determine which one needs to be updated next. I have greatly improved the logging around scheduled commands, and I'm sure it will help me find some performance bottlenecks. The logging is already deployed, and I should have proper results by next week.
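The devlog doesn't say how the next command is looked up, so the following Python sketch is an assumption. A naive version checks every behavior's next due time on each lookup, which costs O(n) per entity and adds up for large companies with many ships, contracts and bases. One common alternative, shown only for comparison and not necessarily what the game will use, keeps behaviors in a min-heap and drops outdated entries lazily, so updates and lookups cost O(log n) amortized instead of a full scan.

```python
import heapq


def next_due_scan(behaviors):
    """Naive lookup: check every behavior's next due time (O(n) per call)."""
    best = None
    for behavior_id, next_due in behaviors.items():
        if next_due is not None and (best is None or next_due < best[1]):
            best = (behavior_id, next_due)
    return best


class NextDueIndex:
    """Min-heap of (due time, behavior id); outdated entries are dropped lazily on peek."""

    def __init__(self):
        self._heap = []
        self._current = {}  # behavior id -> its latest due time (None = idle)

    def update(self, behavior_id, next_due):
        self._current[behavior_id] = next_due
        if next_due is not None:
            heapq.heappush(self._heap, (next_due, behavior_id))

    def peek(self):
        while self._heap:
            due, behavior_id = self._heap[0]
            if self._current.get(behavior_id) == due:
                return behavior_id, due
            heapq.heappop(self._heap)  # stale: the behavior was rescheduled or went idle
        return None


behaviors = {"ship-1": 120.0, "contract-7": 45.0, "base-2": None}  # hypothetical entries
print(next_due_scan(behaviors))

index = NextDueIndex()
for behavior_id, due in behaviors.items():
    index.update(behavior_id, due)
index.update("contract-7", 300.0)  # the contract's next step moved later
print(index.peek())
```

The heap keeps stale entries until they reach the top, so it trades some memory for avoiding a scan over every behavior.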

As always, we'd love to hear what you think: join us on Discord or the forums!

Happy trading!