2/2/2020 in devlog
In today's development log, we're addressing a few issues the past release brought with it – but more importantly, you're getting a good look at the next, much bigger one!
It has been one of those weeks where you start with a clear plan: finish the maintenance release, ship it, then work on the new named release. But things never turn out the way you think. It is now Friday afternoon and I have yet to start working on the new named release. The backend release Martin mentioned and the maintenance release left us with a lot of unexpected bugs to clean up: hanging production lines and full ship inventories with no visible cargo, to mention just two. The good news is that we fixed most of the issues and got the maintenance release out of the way.
Speaking of releases, I thought it would be about time to introduce the next named release with the code name Presence!
Presence is all about the local market shipping contracts that you have requested so many times. We want to keep it simple, keep the development time short and see how it turns out. If everything goes according to our plan (hint: it won't) we will release the feature at the end of February or in early March. Presence consists of three main parts, one of which is rather technical:
The local markets will get a new ad type called shipping. A company looking for someone to transport some of its goods will be able to post an ad. The ad will share a lot of elements with the existing ads (commodity, amount, currency, price, delivery time, ad visibility) and introduce two new ones: origin and destination. With these, the creator of the ad determines where the hauler should pick up the goods and where to drop them off. The requirements are that the principal must own fixed stores (i.e. no ships) at both locations, and that one of the locations must be the local market. Right now this would only be possible if the principal had two bases, which brings us to the next change.
We will introduce a new planetary project called warehouses (proper name pending) that works like a self-storage facility. Players can rent storage units that will work just like a storage building in a base. These units will require a weekly rental fee that is set by the planetary governor. If the warehouse runs out of available storage units, it can be expanded a limited number of times to add more.
The last change handles the representation of freight in a hauler's inventory. The freight will be represented as a generic icon with a certain weight and volume, so the hauler will not know about its contents. Clicking on it will open the corresponding contract. We might even use this method to display materials that are being provisioned for a pick-up condition, to make it more obvious that they are still in the inventory, but blocked.
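To make the three parts a bit more concrete, here is a minimal Python sketch of how they could fit together. It is not the actual implementation: every class, field and function name (`ShippingAd`, `can_post`, `Warehouse`, `FreightItem`, `pack`) is made up for illustration, and details such as how many units an expansion adds are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ShippingAd:
    """Hypothetical shipping ad; only the fields needed here are modeled."""
    commodity: str
    amount: int
    origin: str
    destination: str


def can_post(ad: ShippingAd, local_market: str, fixed_stores: set[str]) -> bool:
    """Apply the two posting requirements described above."""
    # The principal needs a fixed store (not a ship) at both ends of the route.
    owns_both = ad.origin in fixed_stores and ad.destination in fixed_stores
    # One end of the route must be the local market the ad is posted on.
    touches_market = local_market in (ad.origin, ad.destination)
    return owns_both and touches_market


@dataclass
class Warehouse:
    """Hypothetical warehouse project with rentable, expandable storage."""
    units_per_level: int  # assumption: each expansion adds this many units
    max_expansions: int
    weekly_fee: float  # set by the planetary governor
    expansions: int = 0
    renters: list[str] = field(default_factory=list)  # one entry per rented unit

    @property
    def capacity(self) -> int:
        return self.units_per_level * (1 + self.expansions)

    def rent_unit(self, company: str) -> bool:
        """Rent one unit; fails when every unit is taken."""
        if len(self.renters) >= self.capacity:
            return False
        self.renters.append(company)
        return True

    def expand(self) -> bool:
        """Add another level of units, up to the expansion limit."""
        if self.expansions >= self.max_expansions:
            return False
        self.expansions += 1
        return True


@dataclass(frozen=True)
class FreightItem:
    """What the hauler sees: totals and a contract link, no contents."""
    contract_id: str
    weight: float
    volume: float


def pack(ad: ShippingAd, contract_id: str,
         unit_weight: float, unit_volume: float) -> FreightItem:
    """Collapse the ad's goods into one opaque freight item."""
    return FreightItem(contract_id, ad.amount * unit_weight, ad.amount * unit_volume)
```

In this sketch a principal with a base at the market and a rented warehouse unit at the other end would pass `can_post`, and the hauler's inventory would only ever hold the `FreightItem`, which deliberately has no commodity field.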
This week was about nailing down the planning for the "Presence" release, including the balancing for the new planetary project and the gameplay possibilities it will enable. Other than that, I continued last week's inquiries regarding an MM rebalancing for certain products, which should also find its way into said release. Finally, since research and concept work can't start early enough, I already invested some time into thinking about the next big feature we have our eyes on after Presence. But that's a topic for another day...
A little story from the trenches today, maybe a bit technical:
A lot (well, a couple) of confused faces could be seen in the simulogics development offices this week: After we rolled out my backend release which, if anything, should have introduced some performance improvements, we saw drastic decreases in performance across the board. All server nodes were using a lot more memory and the CPUs were running at more than twice the load compared to before the patch.
So I started looking for the cause. The problem in a situation like this usually is that you are missing exactly the kind of instrumentation you need at that very moment. In our case we saw an above-average number of so-called "hydration timeouts". This basically just means that the client requested a piece of data that contains references to one or more entities (like companies), but the information needed to display those entities (like a company's name) could not be retrieved from the respective entity actor before the data was sent to the client. In that case, the general piece of data is still sent to the client, but the entity references remain just that: references. The end-user gets to see a "Hydration Timeout" error message with the respective entity ID instead.
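For the curious, the idea can be sketched in a few lines of Python with `asyncio`. This is only an illustration of the pattern, not our server code: `hydrate`, `demo_lookup`, the fallback text and the timeout value are all made up for the example.

```python
import asyncio


async def hydrate(entity_ids, lookup, timeout):
    """Resolve each entity ID to a display name, giving up after `timeout` seconds."""

    async def resolve(entity_id):
        try:
            return await asyncio.wait_for(lookup(entity_id), timeout)
        except asyncio.TimeoutError:
            # The entity did not answer in time: keep the bare reference.
            return f"Hydration Timeout ({entity_id})"

    names = await asyncio.gather(*(resolve(e) for e in entity_ids))
    return dict(zip(entity_ids, names))


async def demo_lookup(entity_id):
    """Hypothetical entity actor; the one called "busy" answers too late."""
    await asyncio.sleep(0.5 if entity_id == "busy" else 0.0)
    return f"Company {entity_id}"


result = asyncio.run(hydrate(["c1", "busy"], demo_lookup, timeout=0.05))
```

Here `result["c1"]` holds the resolved name, while `result["busy"]` holds the fallback text with the raw ID, because the slow entity missed the deadline. The rest of the response is unaffected, which is why the client still receives its data.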
There are basically two reasons why such a timeout can happen: Either the respective entity is broken and therefore can't reply to the request at all, or it's just busy and replies too late. When an entity is broken, it's pretty obvious because we see the errors in the logs, and when it's too busy, it is most likely because it is "recovering" (more on what that means in another devlog on the architecture of PrUn that I have promised for ages now). But because the latter has never been an issue before and we had optimizations to the recovery mechanism planned for the future anyway, we never put in any instrumentation to track how long it actually took.
After I hacked in a quick-and-dirty logging mechanism, we realized that many entities now take well over a minute to recover. That in itself did not explain our problem, though: They surely didn't start taking this long just this week. It must have been the case before, and users didn't notice any performance problems of this magnitude before the upgrade.
It took Michi's shrewd eyes to notice that many entities would show up more than once. That's weird because once an entity has recovered, it should remain active in memory unless it gets passivated for some reason. So after many hours of searching, I found the culprit: a single configuration flag whose default value had changed between releases of our actor framework. It killed any actor after 2 minutes of idling, forcing our whole simulation into a constant state of reloading entities. After changing "120s" to "off", the problem was gone. And so were many hours of my precious time…
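A toy model shows why that one flag hurts so much. The Python sketch below is not tied to our actor framework; `count_recoveries` and the access patterns are invented for illustration. It simply counts how often an entity has to recover, given the times at which it is used and an idle timeout (`None` standing in for "off").

```python
def count_recoveries(access_times, idle_timeout):
    """Count recoveries of one entity for a sorted list of access times (seconds).

    idle_timeout is the idle period after which the entity is passivated,
    or None if idle passivation is switched off.
    """
    recoveries = 0
    last_access = None
    for t in access_times:
        passivated = last_access is None or (
            idle_timeout is not None and t - last_access >= idle_timeout
        )
        if passivated:
            # The entity is not in memory and has to be reloaded first.
            recoveries += 1
        last_access = t
    return recoveries


# One hour of activity: an entity that is touched every five minutes.
every_five_minutes = list(range(0, 3600, 300))
with_timeout = count_recoveries(every_five_minutes, idle_timeout=120)
without_timeout = count_recoveries(every_five_minutes, idle_timeout=None)
```

With the 120 second timeout, every single one of the twelve accesses triggers a recovery (`with_timeout` is 12), while with passivation off the entity recovers once and stays in memory (`without_timeout` is 1). Multiply that by recoveries that take over a minute each and the doubled CPU load is no longer a mystery.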
Hi, guys! I'd like to take a moment to say that I'm still with simulogics and will stay around until at least the end of February. Apparently, some of you thought I had already moved on, but you're not getting rid of me that easily. I just haven't had much of interest to report over the past few weeks. Same this week, actually; I've been preparing a final round of interviews with my potential successors next week and spent a lot of time translating our new TOS and Privacy Statement. I know, fascinating stuff. (: See you next week!