Tuesday, June 26, 2018

Networking and Technical Debt

Early in the development of Broadside I was still learning the networking API and was just trying to get something to work.  The intention at the time was to get ships sailing, cannonballs flying, and prove the game was possible.  Getting things working at all took priority over getting them working efficiently, with the idea that I would revisit these systems at a later date to optimize them.

Unfortunately that time has come.  As development has gone on, more systems have been added to the game that need to be synced to the clients.  From crew to cargo to city markets, a good amount of data now has to be sent to the clients and kept in sync.  After adding some larger ships, which require even more data to sync, I hit a brick wall that took a little while to understand.

I was getting errors on the server where a client would no longer receive any server updates.  I tracked the issue down to something I'll call "reliable message pressure", where reliable messages are queued for a client faster than the server can actually send them.  The number of messages matters more here than the amount of data in each one, as most messages are actually incredibly small.  Messages would get delayed, and ultimately the buffers that hold new messages waiting to be sent, or sent messages waiting for an acknowledgement, would fill up so that no new data could be sent.  This would block even unreliable messages.  On the client it looked as if the game had frozen, but the client never actually disconnected.
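To make the failure mode concrete, here is a toy Python simulation of a bounded send buffer.  It is only a sketch under my own assumptions: the class name, buffer sizes, and per-tick rates are all hypothetical, it does not model acknowledgements, and it has nothing to do with how the real networking API is implemented.

```python
from collections import deque


class ReliableChannelSim:
    """Toy model of a reliable channel with a bounded send buffer (hypothetical)."""

    def __init__(self, buffer_size, sends_per_tick):
        self.buffer_size = buffer_size        # assumed capacity, in messages
        self.sends_per_tick = sends_per_tick  # messages the link can move per tick
        self.pending = deque()

    def queue(self, message):
        """Try to queue a message; returns False once the buffer is full."""
        if len(self.pending) >= self.buffer_size:
            return False
        self.pending.append(message)
        return True

    def tick(self):
        """Send as many pending messages as the link allows this tick."""
        for _ in range(min(self.sends_per_tick, len(self.pending))):
            self.pending.popleft()


def ticks_until_stall(buffer_size, queued_per_tick, sends_per_tick, max_ticks=10_000):
    """Return the first tick at which a message cannot be queued, or None."""
    channel = ReliableChannelSim(buffer_size, sends_per_tick)
    for tick in range(max_ticks):
        for i in range(queued_per_tick):
            if not channel.queue((tick, i)):
                return tick
        channel.tick()
    return None


print(ticks_until_stall(128, 10, 6))  # stalls at tick 30
print(ticks_until_stall(256, 10, 6))  # doubling the buffer only delays it to tick 62
print(ticks_until_stall(128, 5, 6))   # None: queuing slower than sending never stalls
```

In this model the message count per tick is the only thing that matters, and a bigger buffer just moves the stall further out.  Only lowering the queue rate below the send rate removes it.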

At first I simply increased the buffer space, but that just delayed the issue.  Then I did a proper investigation and found it came down to how I had implemented syncing the state of the ships in the first place.  Most of that code has been in place since the beginning of the project, and I had forgotten exactly what I had done for expediency.

I was syncing a timer that the clients weren't even using, every frame, over the reliable channel.  I was syncing data like crew allocations to every client when only the owner of the ship cares about that data.  I was sending requests from the client to the server to turn the ship every frame over the reliable channel.  I was syncing data like a ship's cargo to the owner of the ship every few seconds even if the contents had not changed.  And much more.

I've now eliminated the error by going through and sending data that only the ship's owner cares about to just that client, instead of to all clients.  I've also moved non-critical syncing and player commands, such as turning the ship, over to the unreliable channel (if you're holding a turn key it doesn't matter if a frame or two gets lost, and there is no need to send an acknowledgement back to the client).
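The general shape of these changes can be sketched in a few lines of Python.  This is an illustration only, not the game's actual code: `ShipSync`, `cargo_messages`, `turn_command`, and the channel constants are hypothetical names, and each message is just a tuple rather than a real network packet.

```python
RELIABLE = "reliable"      # hypothetical channel labels, not a real API
UNRELIABLE = "unreliable"


class ShipSync:
    """Decides what cargo data to send, and to whom (illustrative only)."""

    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.cargo = {}
        self._last_sent_cargo = None  # snapshot of what the owner last received

    def cargo_messages(self):
        """Return (client_id, channel, payload) tuples for cargo.

        Cargo goes to the owner only, and only when it has changed
        since the last send.
        """
        if self.cargo == self._last_sent_cargo:
            return []
        self._last_sent_cargo = dict(self.cargo)
        return [(self.owner_id, RELIABLE, ("cargo", dict(self.cargo)))]


def turn_command(direction):
    """Per-frame steering input: losing a frame is harmless, so no acks needed."""
    return (UNRELIABLE, ("turn", direction))


ship = ShipSync(owner_id=7)
ship.cargo["rum"] = 3
print(ship.cargo_messages())  # one reliable message, addressed to client 7
print(ship.cargo_messages())  # [] because nothing changed
print(turn_command(-1))       # ('unreliable', ('turn', -1))
```

The point of the snapshot comparison is that an unchanged cargo hold costs zero messages, instead of one reliable message every few seconds.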

What I've been able to improve in a day has already cut the number of sync messages seen by the client by over 60%, and the real reduction is probably much higher once you account for the messages the server was never even putting on the wire due to reliable message pressure on that client's connection.  I'm continuing to work on the issue, aiming to reduce the count by another 50% by calculating things like ship speed and wind angle on the clients for all ships instead of syncing them from the server, and by improving my in-house network transform syncing script.
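For a sense of how those two figures combine, here is a quick calculation.  It assumes the "another 50%" applies to the already reduced message count rather than to the original one, and it treats "over 60%" as exactly 60%, so the result is a rough lower bound.

```python
def remaining_fraction(*reductions):
    """Fraction of the original message count left after successive cuts."""
    remaining = 1.0
    for reduction in reductions:
        remaining *= 1.0 - reduction
    return remaining


left = remaining_fraction(0.60, 0.50)
print(f"{left:.0%} of the original messages remain")  # 20%
print(f"{1 - left:.0%} total reduction")              # 80%
```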

Currently I use my in-house network transform syncing script to sync the position of cannonballs, which needs fewer position/velocity updates than the script built into the networking API.  With my own script I only need to update cannonballs about 4 times per second, compared to at least 15 with the built-in version.  Unfortunately, in testing it doesn't work smoothly when attached to ships, so I'm still using the built-in script for those.  The built-in script needs to sync position 29 times per second to get smooth movement and turning without any noticeable jitter or snapping, but my goal is to be able to drop that to 10 or fewer per second.  I'll be working on this again in the near future.
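One common way to get away with a low update rate for projectiles is to have each client extrapolate from the last position/velocity update.  The sketch below shows that general idea for a ballistic object.  It is not the in-house script described above, and the function name, the gravity constant, and the sample values are all assumptions for illustration.

```python
GRAVITY = (0.0, -9.81, 0.0)  # assumed world gravity in m/s^2, y is up


def extrapolate(position, velocity, elapsed, gravity=GRAVITY):
    """Predict a position `elapsed` seconds after the last synced update.

    Uses constant-acceleration motion: p + v*t + 0.5*g*t^2 on each axis.
    """
    return tuple(
        p + v * elapsed + 0.5 * g * elapsed * elapsed
        for p, v, g in zip(position, velocity, gravity)
    )


# At 4 updates per second, the client covers at most 0.25 s between updates.
last_position = (0.0, 10.0, 0.0)
last_velocity = (20.0, 0.0, 0.0)
print(extrapolate(last_position, last_velocity, 0.25))
```

A cannonball follows a predictable arc, so the prediction stays close between updates.  A ship that the player is actively steering is much harder to predict, which may be part of why the same approach is less smooth there.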

The slow rocking of the ship is also currently synced entirely from the server to all clients.  I'm going to change that to be calculated locally on the clients, with periodic updates to keep them in sync with the server.  So instead of several times per second, the rocking will be updated by the server maybe once per minute.  We'll see how that works in testing and adjust from there.
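A minimal sketch of locally calculated rocking, assuming the motion can be modeled as a simple sine wave driven by a shared clock.  The class name, the amplitude, and the period are hypothetical, and the real rocking may well use a different formula.

```python
import math


class RockingClock:
    """Computes ship roll locally from an estimate of server time (illustrative)."""

    def __init__(self, amplitude_deg=3.0, period_s=6.0):
        self.amplitude_deg = amplitude_deg  # assumed maximum roll angle
        self.period_s = period_s            # assumed seconds per full rock
        self.offset = 0.0                   # estimated server_time - local_time

    def apply_server_sync(self, server_time, local_time):
        """Called on the occasional server update, e.g. about once per minute."""
        self.offset = server_time - local_time

    def roll_at(self, local_time):
        """Roll angle in degrees, with no network traffic needed per frame."""
        t = local_time + self.offset
        return self.amplitude_deg * math.sin(2.0 * math.pi * t / self.period_s)


# Two clients with different local clocks agree once both have synced.
a, b = RockingClock(), RockingClock()
a.apply_server_sync(server_time=101.5, local_time=100.0)
b.apply_server_sync(server_time=101.5, local_time=40.0)
print(a.roll_at(100.0), b.roll_at(40.0))
```

Between syncs the only error is clock drift, which is tiny over a minute, so the periodic update only has to correct the offset rather than carry the rocking itself.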

The ultimate goal is to support approximately 60 players connected per zone, along with at least a dozen AI-controlled ships.  Actual play testing so far has been limited to far smaller numbers, but as more improvements are implemented we'll be involving more play testers and monitoring performance.  At this point I'm pretty confident we'll be able to hit those numbers.
