Friday, August 30, 2019

Networking Code Gets Complicated

Disclaimer: You'll only find this interesting if you're curious about network game programming.  Otherwise you might as well skip this one.  

So I noticed a couple months ago that there was a problem with the positioning of ships far away from the player.  Their positions were not updating properly, or at all, until the player got within a distance of about 2/3 of the way to the edge of the minimap.  Then they would suddenly snap to the correct position.  Note that this is the distance as seen on the server, not the client.  The biggest problem on the client shows up when ships have moved far from their starting position: the player can approach what they believe is a nearby ship and get right on top of it, when that ship is actually very far away and the client is just not getting updates.  

So this was confusing to me, because this was not always the case.  When I originally wrote and tested the network API this appeared to be working fine.  I've also seen it working fine in Broadside previously.  But I have made some updates to the network API since then, and changes to the ships themselves, so I was not sure when this broke or why.  I did know, though, that it had something to do with being subscribed or unsubscribed.  

Object Subscriptions


So to reduce network traffic and support larger numbers of networked objects I created a system built into the network API I call Subscriptions.  The way it works is that at a set interval the zone server checks the range from every connected player to every networked object.  The player's connection gets subscribed to any object within that range, and unsubscribed from any object beyond it.  Networked objects are anything the server controls and syncs with the clients.  Ships and cannonballs are the best examples, but there's actually even more stuff.  
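A minimal sketch of that periodic range check, written in Python purely for illustration (it is not the network API's actual code).  The names NetObject, PlayerConnection and update_subscriptions, the 2D positions, and the single subscribe_range value are all assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NetObject:
    """Hypothetical stand-in for a server-controlled object (ship, cannonball...)."""
    object_id: int
    x: float
    y: float


@dataclass
class PlayerConnection:
    """Hypothetical stand-in for a connected player and its subscriptions."""
    player_id: int
    x: float
    y: float
    subscriptions: set = field(default_factory=set)


def update_subscriptions(players, objects, subscribe_range):
    """Run one periodic pass: subscribe in-range objects, unsubscribe the rest.

    Returns the list of (player_id, object_id, action) changes made this pass.
    """
    changes = []
    for player in players:
        for obj in objects:
            distance = math.hypot(obj.x - player.x, obj.y - player.y)
            in_range = distance <= subscribe_range
            subscribed = obj.object_id in player.subscriptions
            if in_range and not subscribed:
                player.subscriptions.add(obj.object_id)
                changes.append((player.player_id, obj.object_id, "subscribe"))
            elif not in_range and subscribed:
                player.subscriptions.discard(obj.object_id)
                changes.append((player.player_id, obj.object_id, "unsubscribe"))
    return changes
```

Calling update_subscriptions on a timer rather than every frame keeps the cost of the all-pairs distance check down, which matches the "set interval" described above.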

Currently I'm running the zone servers at a frame rate of 25 frames per second, and sending position updates for ships specifically every 1/25 seconds, or about every server frame.  That is the rate for subscribed clients.  If the client is unsubscribed from the object, I was sending an update once every 2 seconds.  This significantly reduces the amount of data the server needs to send to each client, but shouldn't look that odd since the ships getting slow updates are far away.  
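To put rough numbers on the savings, here is a small back-of-the-envelope calculation.  The function name and the ship counts in the usage note are made up for illustration, and it gives an upper bound that assumes every ship is moving all the time.

```python
def updates_per_second(subscribed, unsubscribed,
                       subscribed_hz=25.0, unsubscribed_interval_s=2.0):
    """Upper bound on position updates sent to ONE client per second.

    subscribed / unsubscribed are counts of ships in each state for that
    client. The rates default to the ones quoted in the post.
    """
    return subscribed * subscribed_hz + unsubscribed / unsubscribed_interval_s
```

For a hypothetical zone of 100 ships where a client is subscribed to 10 of them, that is 10 * 25 + 90 / 2 = 295 updates per second, versus 2500 if all 100 were sent at the full rate.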

There's more to subscriptions than just that too.  For example, some variables are synced only to subscribers, like how much flooding a ship has or how many crew it has.  When a client gets subscribed, all these "SyncVars" are automatically synced at once to the newly subscribed client, so for information the client only needs when it is close to the object it makes no real difference that it wasn't synced earlier (those 2 specifically are just displayed above the ship at close range).  

Investigation


So I determined the positioning problem was only affecting objects the player was unsubscribed from, primarily ships.  What is going on with the unsubscribed ships?  Looking at the code I didn't pick up on anything obvious.  I first tried raising the unsubscribed update frequency from once every 2 seconds to once a second.  No change.  I found a bug that in certain circumstances could cause a network object which snapped to a new position on the client to be moved back to a previous position, which could make it look like it never actually moved.  That's the bug I fixed for 0.7.12 mentioned in the dev log.  That turned out not to be the cause though.  

I investigated if sending messages targeting unsubscribed clients in general was broken (I don't use this in many places in the code, so it was possible), but no dice.  The network API uses a channel system, where each channel has different settings.  Settings include whether to send with encryption, to send reliably, to hold short messages for a short time to allow for other messages on the same channel to be combined, etc.  The JCGTransformSync component that handles syncing positions actually is the only thing which uses channels 2 and 3.  Channel 2 is used for subscribed clients, and channel 3 is used for unsubscribed.  Almost everything else in Broadside is either on channel 0 or channel 1.  Ah ha!  Something might be set up wrong with channel 3!  
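For readers who haven't seen a channel system before, here is a rough Python sketch of the idea.  ChannelSettings, combine_hold_ms, transform_channel and every setting value below are placeholders invented for illustration; the post doesn't say how channels 2 and 3 are actually configured.  The only parts taken from the text are the kinds of settings a channel has and which channel each audience uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSettings:
    encrypted: bool        # send with encryption?
    reliable: bool         # resend until acknowledged?
    combine_hold_ms: int   # hold short messages this long so they can be combined


# Placeholder values only: the real configuration is not given in the post.
CHANNELS = {
    2: ChannelSettings(encrypted=False, reliable=False, combine_hold_ms=0),
    3: ChannelSettings(encrypted=False, reliable=False, combine_hold_ms=0),
}

SUBSCRIBED_TRANSFORM_CHANNEL = 2    # from the post: position sync, subscribed clients
UNSUBSCRIBED_TRANSFORM_CHANNEL = 3  # from the post: position sync, unsubscribed clients


def transform_channel(is_subscribed):
    """Pick the channel a position update goes out on for a given client."""
    if is_subscribed:
        return SUBSCRIBED_TRANSFORM_CHANNEL
    return UNSUBSCRIBED_TRANSFORM_CHANNEL
```

Since nothing else uses channel 3, a misconfiguration there would only have shown up in unsubscribed position updates, which is why it was a reasonable suspect.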

Nope, nothing wrong with channel 3.  

I looked at a lot more areas of code, more investigating, and today I finally found the issue as I was walking through all the code involved.  

The Cause


So the issue ended up being in JCGTransformSync itself.  Even though I send position updates 25 times per second to subscribers and now 1 per second to unsubscribed clients, I only do so if the object has moved since the last time an update was sent.  I send the update and save those values.  The next time I try to send an update I check if the object has moved more than a certain amount (very small amount) and only if so do I send the update.  This means that a large fleet, even if you are subscribed, actually generates next to no network traffic if they are not moving.  

The problem was the order of operations.  I send the update to all subscribed clients and save the position that was sent.  Then I try to send to all unsubscribed clients, first comparing the current position against the last sent position.  But that comparison is against the position just sent to the subscribed clients a moment ago, which is obviously exactly the same as the current position.  So no unsubscribed updates are sent to anyone.  
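The bug is easy to reproduce in a few lines.  This is a Python sketch of the pattern, not the real JCGTransformSync code; the class name, method names and MOVE_THRESHOLD value are assumptions.

```python
import math

MOVE_THRESHOLD = 0.01  # assumed "very small amount" an object must move


class SharedLastSentSync:
    """Illustrative model of the bug: one last-sent position for both audiences."""

    def __init__(self, position):
        self.position = position
        self.last_sent = position
        self.sent_log = []  # (audience, position) for every update actually sent

    def _has_moved(self):
        return math.dist(self.position, self.last_sent) > MOVE_THRESHOLD

    def send_to_subscribed(self):
        if self._has_moved():
            self.sent_log.append(("subscribed", self.position))
            self.last_sent = self.position  # shared state is overwritten here

    def send_to_unsubscribed(self):
        # BUG: if send_to_subscribed() already ran this frame, last_sent equals
        # the current position, so this check fails and nothing goes out.
        if self._has_moved():
            self.sent_log.append(("unsubscribed", self.position))
            self.last_sent = self.position


ship = SharedLastSentSync((0.0, 0.0))
ship.position = (5.0, 0.0)   # the ship clearly moved
ship.send_to_subscribed()    # sent, and last_sent becomes (5.0, 0.0)
ship.send_to_unsubscribed()  # silently skipped
```

After those calls sent_log holds only the subscribed entry, even though the ship moved 5 units.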

This bug has been there since I originally wrote JCGTransformSync though, so why didn't I see it back then?  I've only noticed the issue for a few months at most.  

Ahh, the bug has existed since Broadside 0.6.0, since September 2018, but it only became really noticeable starting in 0.7.2 when I changed the update frequency from every 1/10 second to every 1/25 second.  You see, at a 25 FPS frame rate when sending updates to subscribers every 1/10 second, there will be some frames when I don't try to send an update to subscribers but I do try to send an update to the unsubscribed clients.  The objects are all really distant so I probably didn't notice the unsubscribed updates weren't happening as often as they should, but the positions wouldn't have ended up as wildly off as they have been lately.  Now that I'm sending subscriber updates every frame (even when there aren't actually any clients subscribed it still runs through the code as if it is going to but just doesn't send anything out) there never is a frame when attempting to send to the unsubscribed in which I didn't already attempt to send to the subscribed.  
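A tiny frame-by-frame simulation shows the timing effect.  It is a Python sketch under stated assumptions: timers are tracked in whole milliseconds and reset on every attempt, the ship moves a little every frame, and the unsubscribed interval is the old 2 seconds.  The real timer logic may differ, so the exact counts are only indicative.

```python
def simulate(subscribed_interval_ms, unsubscribed_interval_ms=2000,
             frame_ms=40, duration_ms=10000):
    """Count unsubscribed updates that get through vs. get swallowed.

    Models the shared last-sent position: the subscribed attempt runs first
    each frame, and the unsubscribed attempt only sends if the position
    still differs from the last one sent.
    """
    now = 0
    last_sub_attempt = 0
    last_unsub_attempt = 0
    position = 0.0
    last_sent_position = 0.0
    delivered = 0
    blocked = 0
    while now < duration_ms:
        now += frame_ms          # one server frame at 25 FPS
        position += 1.0          # the ship keeps moving
        if now - last_sub_attempt >= subscribed_interval_ms:
            last_sub_attempt = now
            if position != last_sent_position:
                last_sent_position = position
        if now - last_unsub_attempt >= unsubscribed_interval_ms:
            last_unsub_attempt = now
            if position != last_sent_position:
                last_sent_position = position
                delivered += 1
            else:
                blocked += 1
    return delivered, blocked


old_rate = simulate(subscribed_interval_ms=100)  # subscribers every 1/10 second
new_rate = simulate(subscribed_interval_ms=40)   # subscribers every frame
```

In this model, over 10 seconds the 1/10 second setting lets 4 of the 5 unsubscribed updates through, while the every-frame setting lets through none.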

So now I changed the code so whenever it tries to update the unsubscribed it ignores whether the object has moved or not.  It just sends the update.  Sending an update every second or two for each object isn't a lot of traffic, but I may revisit this later to track last sent subscribed and unsubscribed positions separately.  But for now this feature should be working.  
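For reference, the "track them separately" idea could look something like the following.  This is a Python sketch of the possible later revision, not the fix that actually shipped; the class name, try_send and MOVE_THRESHOLD are assumptions.

```python
import math

MOVE_THRESHOLD = 0.01  # assumed minimum movement before an update is worth sending


class SeparateLastSentSync:
    """Illustrative only: one last-sent position per audience."""

    def __init__(self, position):
        self.position = position
        self.last_sent = {"subscribed": position, "unsubscribed": position}
        self.sent_log = []  # (audience, position) for every update actually sent

    def try_send(self, audience):
        """Send to one audience if the object moved since THAT audience's last update."""
        if math.dist(self.position, self.last_sent[audience]) > MOVE_THRESHOLD:
            self.sent_log.append((audience, self.position))
            self.last_sent[audience] = self.position
            return True
        return False
```

With separate state, a subscribed send no longer hides movement from the unsubscribed check, and a stationary fleet still generates no traffic for either audience.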
