Wednesday, November 14, 2018

Continued progress, networking API success, market updates, current development

Networking API Updates


The new in house networking API has been very successful.  Only relatively minor bugs and improvements have been needed after putting it through testing in the project.  The biggest issue found was that some testers were unable to receive certain packets, even though they were not disconnecting.  I tracked the most likely cause down to some ISPs dropping larger UDP packets without fragmenting them, even though the packets should not have been sent with the don't-fragment flag set (I was using packet sizes near the maximum MTU). 

When the API was first integrated into Broadside, the sizes of different messages and the overall UDP payload size were hard coded all over the place with ample buffer room so nothing would accidentally overflow.  I went and completed the packet size feature: on one of the main scripts I can set the exact maximum UDP payload size (packet size equals UDP payload + UDP header + IP header), and all other scripts size their messages to fit exactly within that maximum number of bytes.  Then I set the default UDP payload size to 508 bytes, which from my research is a safe maximum that ISPs won't drop (or even fragment).  Testing by the testers with the problem ISPs was successful. 
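
To make the idea concrete, here is a minimal sketch of deriving every message budget from a single payload constant.  The class name and the protocol header size are illustrative assumptions rather than the actual Broadside scripts; the 508-byte figure is the value from this post, and the comment shows the usual reasoning behind it.

    // A minimal sketch (illustrative names): one constant defines the maximum UDP
    // payload, and every other script derives its message budget from it instead
    // of hard coding its own sizes.
    public static class PacketSizing
    {
        // 508 is the usual "safe" payload: 576 (minimum IPv4 reassembly size)
        // - 60 (maximum IPv4 header) - 8 (UDP header) = 508 bytes.
        public const int MaxUdpPayload = 508;

        // Assumed per-packet overhead for the protocol itself (channel id,
        // sequence number, etc.); the real value would come from the transport.
        public const int ProtocolHeaderBytes = 8;

        // Every message-building script asks for this instead of guessing.
        public const int MaxMessageBytes = MaxUdpPayload - ProtocolHeaderBytes;
    }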

I'm seriously considering releasing JCGNetwork to other developers for free, either to use directly or as example code showing how they can utilize the Socket class for their own projects. 

City Markets

Buying/selling at city markets is now complete, as well as the supply/demand system.  The lists of all items, as well as what is available from NPCs at the various markets, are still being worked on, but the core mechanics and the UI are complete and successfully tested.  It was kind of exciting making the first trade runs while trying to avoid NPC pirates.  Default pricing of various ships and other ship related items, which had previously been set at 1 real, has been changed to more realistic pricing.  The game is starting to feel like a real MMO.  

Still to do are player created buy/sell orders.  The back end for these is complete, but the UI still needs to be created.  

Server Performance

I did a small redesign of how server instances are launched to support a separate development mode build just for the zone servers, to which I can attach the profiler and examine what is burning so much CPU.  CPU performance on the server has been a bubbling issue, where the current CPU usage on the main thread would prevent hitting the target of 60 to 100 concurrent users per zone.  As it is, AI ships produce the same CPU load as player ships, so this was easy to test by increasing the number of AI ship spawns.  

Zone servers are multithreaded, but the main thread was using approximately 2/3 of the CPU cycles.  I'm using a 5 year old AMD 8 core CPU on the test server, so a newer CPU with better single core performance would raise the top end, but on the test server I'd see performance issues cropping up at around 40 ships, significant issues around 60 ships, and the server would become completely unresponsive much beyond that.  

The profiler exposed that, to my surprise, it was the physics system using the majority of the CPU.  I'm going to experiment with disabling colliders on AI ships that aren't near any players, but I also found that the default fixed time step of 0.02 (a 50 FPS physics frame rate) is unnecessary for this game.  I was able to significantly drop physics usage by going to a 0.04 fixed time step, cutting the physics CPU usage by roughly half with no noticeable negative change for the user.  
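
For reference, the timestep change itself is a one liner in Unity (it can also be set under Edit > Project Settings > Time).  A minimal sketch, run once on the server build:

    using UnityEngine;

    public class ServerTimeSettings : MonoBehaviour
    {
        void Awake()
        {
            // 0.04 s per physics step = 25 physics updates per second,
            // down from Unity's default of 0.02 s (50 per second).
            Time.fixedDeltaTime = 0.04f;
        }
    }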

In addition, I had previously assumed that all of my various systems that update every ship on every frame were the primary cause of the high CPU usage, but my previous optimization passes were apparently more successful than I had thought.  These systems used a very low amount of CPU.  JCGNetwork also used almost no CPU (granted, I was testing with low user counts), and GC (garbage collection - managed code's automatic memory management) was insignificant.  Still, once the physics system's usage was reduced, my scripts were taking up a little over half of the CPU cycles used.  I tracked the issue to a 3rd party script related to the water system I am using, specifically the ship wake feature on the surface of the water as ships move.  Even when ships are idle this script was burning CPU, and I don't even activate the wakes on the server.  

A simple "if (JCGNetworkManager.IsClient())" line so none of the code runs on the server anymore, and this bug was resolved, dropping script usage by well over half.  And because I dropped the physics frame rate down to 25, I didn't see a reason to keep the overall frame rate at 60, so I went ahead and dropped that to 25 as well for now.  The result is CPU usage that may very well support 100 clients per zone as it is now, with several avenues for further improvement to be investigated.  

Current Development

My volunteer testers are getting excited about the game, I'm getting excited about the game, and things are going generally very well.  One issue that has bugged the testers is that I am currently just killing the server processes, which causes corruption of the database (not on-disk data corruption, but certain updates from the zone servers get written to disk by the database while other related data has not been updated yet, leaving the two in conflict).  The biggest issue is ships: for example, if a player switches to a different ship and then logs off, that player's record is immediately updated in the database with the ship they are sailing, but the ships in the city are not.  If I kill the server processes before the cities update the database, the same ship can be recorded as both being sailed by the player and stored at the dock in a city, causing problems in the game because these situations are unexpected and aren't explicitly handled.  So for now I've just been frequently wiping the database, annoying the testers who put work into their characters.  Some of this is unavoidable, but once a week is an unnecessary irritation.  

So I'm currently working on the Command Console, which is a separate application for controlling the server cluster.  The first feature added is the server cluster spin down command, where the Command Console tells the Master Server to switch to auto spin down mode.  The Master Server then manages a safe power down of the entire cluster.  It starts by shutting down the Login Server, followed by all the Zone Servers.  Before shutting down, the zone servers disconnect all players and write all player characters, ships, city storage, and city markets to the database.  This is followed by shutting down the Tracker Server, and then the Database Server, which writes to disk before exiting.  After that the Master Server powers off.  The server cluster is then safely ready for a build update.  
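
A simplified sketch of that shutdown order is below.  All of the type and method names are illustrative stand-ins; the actual Master Server code isn't shown here and is certainly more involved (asynchronous, with timeouts and error handling).

    using System.Collections.Generic;

    public interface IServerProcess
    {
        void Shutdown();
    }

    public class ZoneServerHandle : IServerProcess
    {
        public void DisconnectAllPlayers() { /* notify and drop clients cleanly */ }
        public void FlushToDatabase()      { /* characters, ships, city storage, markets */ }
        public void Shutdown()             { /* exit the zone server process */ }
    }

    public class MasterServer
    {
        IServerProcess login, tracker, database;
        List<ZoneServerHandle> zones = new List<ZoneServerHandle>();

        public void SpinDownCluster()
        {
            login.Shutdown();                  // stop new logins first
            foreach (var zone in zones)
            {
                zone.DisconnectAllPlayers();
                zone.FlushToDatabase();        // persist everything the zone owns
                zone.Shutdown();
            }
            tracker.Shutdown();
            database.Shutdown();               // the database writes to disk before exiting
            // finally, this Master Server process powers off as well
        }
    }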

After this is added, I'll implement automatic spin down of empty zones, which is just a single zone server doing the same process as above; the Master Server powering on zones when requested by the login server or zone servers because a player needs to enter them; and finally transitions between zones as the player travels the map.  That will be exciting because a lot of the fun in Broadside is just sailing and exploring, but you're currently limited to zone 0's English Channel area map.  I'll then work on implementing the rest of Northern Europe, followed by the Mediterranean and West Africa, eventually getting to India and East Asia.  This will allow a lot of the major trade routes to be tested and bring in more feedback on the game, as well as allowing a lot of exploration, with hours of sailing possible without hitting the edge of the implemented world, even though this will be only approximately 1/3 or less of the planned game world.  

Tuesday, September 18, 2018

Integration of JCGNetwork is complete! Quick discussion, and then quick dev roadmap.

The integration of the new JCGNetwork API, completely replacing the Unet networking API, has been completed.  Problems appeared with the design of the smooth syncing code regarding the turning of objects, so that has been disabled for now pending a future redesign.  Overall the process was time consuming and stressful, but relatively free of issues.  I'm still watching for issues, but significantly fewer bugs were found in the conversion and in the new networking API than I had anticipated. 

So in the immediate term I'm still looking for new issues that have appeared as a result of the switch, but otherwise I'm returning to new feature development, specifically the market trading system and AI ships.  The market trading system is already about 2/3 done.  It is core to making money in the game, so it is very important.  AI control of ships is core to a lot of the fun in the game outside of player vs player combat, so it is also very important. 

Currently with market trading, players can make purchases from cities, but the supply and demand system is not fully implemented, and players cannot yet sell.  You also cannot yet place your own persistent market orders (either buy or sell).  With the AI, right now a ship will follow the closest player, but more advanced behavior such as pulling alongside for a broadside is not yet implemented, and it does not yet know how to fire cannons.  Once that part is implemented there will be a bit more fun to be had. 

After those are implemented, the next feature will be scooping up loot from item drops, which ties in both with fighting AI ships (which already produce item drops when destroyed, just as player ships do) and with selling that loot on the various city markets. 

The next feature after that will be completing the randomized map spawn points feature, which handles the semi-random appearance of things like AI spawn points, free loot drops, resources such as schools of fish, etc.  This is about 1/3 implemented already.  It was held back partly by Unet due to the inefficiency of spawning a high number of AI ships in a zone even at a large distance from any players: all players would get the full set of updates for these AI ships.  Unet did have a feature to handle this, the network proximity checker, but it was not a good fit for the game's design.  I believe this should be solved with JCGNetwork's subscription system. 

At that point the game will be in a good place to finish the zone transfer system, where a player who reaches the edge of a zone is automatically transferred to the adjacent zone server.  This will allow for very long distance travel, differences in AI difficulty between zones, and leveraging the market trade system for the player to make money in game.  Then I can begin the work of building out new zones for the player to enter and explore, eventually covering the entire world.  The focus initially will be on the British Isles, followed by Europe as a whole, then Africa and on to India. 

Further in the future will be resource gathering, manufacturing, player skill training and skill trees, and player owned trading companies.  At that point the game may be in a state to start getting some public feedback through a private or even public beta. 

Monday, September 10, 2018

New Networking API Complete! - JCGNetwork

The new in house networking API has now been completed and tested, and I've started on the task of ripping out Unet and replacing it with the new API - JCGNetwork.  I took some inspiration from the Unet HLAPI but no code, and really did everything quite differently.  I'm considering releasing it as its own product to help other developers.

JCGNetwork uses a layered transport on top of UDP, supporting multiple channels, per-channel settings, and flexible message sizes.  All channels enforce packet ordering and remove duplicates, and channels can be reliable or unreliable.  JCGNetwork is multithreaded: the actual sending/receiving occurs on a separate thread, while the processing of message contents occurs on the main thread.
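
A common shape for that threading split, shown here purely as an illustration and not the actual implementation, is a background receive thread that queues raw datagrams while the main thread drains the queue:

    using System.Collections.Concurrent;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading;

    // Illustration only: a socket thread receives datagrams and queues them,
    // while the game's main thread dequeues and processes their contents.
    public class ReceiveLoop
    {
        readonly UdpClient socket = new UdpClient(7777);          // example port
        readonly ConcurrentQueue<byte[]> inbox = new ConcurrentQueue<byte[]>();

        public void Start()
        {
            new Thread(() =>
            {
                var remote = new IPEndPoint(IPAddress.Any, 0);
                while (true)
                    inbox.Enqueue(socket.Receive(ref remote));    // network thread
            }) { IsBackground = true }.Start();
        }

        // Called from the main thread (e.g. once per frame).
        public void ProcessPending()
        {
            while (inbox.TryDequeue(out var payload))
            {
                // parse the channel header and hand the message to game code here
            }
        }
    }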

With Unet I was implementing encryption on a message by message basis, but in JCGNetwork I've integrated a fast but weak encryption scheme as a channel option.  The encryption is by no means hack proof, but it will make casual reading or spoofing of packets more difficult.  Additionally I've integrated both hardware and IP ban lists.

With Unet, message sizes, message send rates, and message buffer sizes were all limited in how they could be configured.  In fact, I suspect a bug in the fragmented message system in the Unet LLAPI was the primary problem I was hitting in the end.  Even without that possible issue, Unet's fragmented message system supported only a small number of fragments, meaning that message size was always a concern beyond just performance.  Standard channels couldn't support more than a single packet of data (up to 1500 bytes), and fragmented channels supported only up to 64 fragments, which were chopped down to 500 bytes each by default.  Messages would also be held for up to 0.1 seconds by default to wait for additional messages to combine with, and this was only configurable globally.  Message buffers could overfill without any way of seeing their status, without any notification other than the console log, and without any remedy other than possibly slowing down sends, even though your code had no way to know it should do so.

With JCGNetwork, message fragmentation supports up to max int fragments for a single message (2 billion+), and it occurs automatically without having to configure the channel for fragmentation.  Not that you should try to send a 2-billion-fragment message, but it basically gets the networking API out of the way as a blocker to sending large messages when needed.  JCGNetwork supports small message combining, again with a default hold of up to 0.1 seconds, but this is configurable per channel.  You can have a separate high performance channel with a 0 wait time that sends immediately.  This is nice because when you need performance for things like close by objects updating their positions, the performance is there, and when an object is distant you can use a channel with a holding time for more efficient use of the network when speed isn't that important.  JCGNetwork also places no arbitrary limits on outgoing or incoming buffer sizes, using basic lists and queues that can again hold up to max int items, so you'll run out of memory before you run out of buffer space, but it should never get to that point.
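
To illustrate the two points above (per-channel hold times and automatic fragmentation), here is a sketch with assumed type names, header sizes, and a 508-byte payload budget; none of this is the actual JCGNetwork API.

    public enum ChannelReliability { Reliable, Unreliable }

    public class ChannelConfig
    {
        public ChannelReliability Reliability;
        public float SendHoldSeconds;   // 0 = send immediately, 0.1 = combine small messages

        // Frequent position updates: unreliable, no hold, lowest latency.
        public static readonly ChannelConfig Movement =
            new ChannelConfig { Reliability = ChannelReliability.Unreliable, SendHoldSeconds = 0f };

        // Bulk state where latency matters less: reliable, combined for up to 0.1 s.
        public static readonly ChannelConfig State =
            new ChannelConfig { Reliability = ChannelReliability.Reliable, SendHoldSeconds = 0.1f };
    }

    public static class Fragmentation
    {
        // Assumed budget: 508-byte UDP payload minus a 12-byte fragment header.
        public const int PayloadPerFragment = 508 - 12;   // 496 bytes

        // Ceiling division: a 10,000-byte message would need 21 fragments here.
        public static int FragmentCount(int messageBytes) =>
            (messageBytes + PayloadPerFragment - 1) / PayloadPerFragment;
    }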

The high level networking uses a scheme similar in concept to the Unet HLAPI but functioning entirely differently.  Rather than using a not very performance friendly automatic serialization/deserialization system, with JCGNetwork the developer manually writes the serialize/deserialize functions for all messages, RPCs, and sync variables.  This takes a little longer during development, but results in a clearer understanding of what is going on, more control, and should be higher performance.
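
A sketch of what a hand-written serializer might look like.  The actual JCGNetwork reader/writer types aren't shown in this post, so standard .NET BinaryWriter/BinaryReader and an invented message type stand in for them here.

    using System.IO;

    // Example message with manual serialize/deserialize (illustrative fields).
    public struct ShipStateMessage
    {
        public uint NetId;
        public float Speed;
        public float Heading;

        public void Serialize(BinaryWriter writer)
        {
            writer.Write(NetId);
            writer.Write(Speed);
            writer.Write(Heading);
        }

        public static ShipStateMessage Deserialize(BinaryReader reader)
        {
            return new ShipStateMessage
            {
                NetId = reader.ReadUInt32(),
                Speed = reader.ReadSingle(),
                Heading = reader.ReadSingle()
            };
        }
    }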

The high level API of JCGNetwork still uses the concept of a Player Object, and uses a unified RPC-like system in place of Unet's Command, ClientRpc, and TargetRpc.  In JCGNetwork the server can always send RPCs to clients, but a client can only send an RPC to the server on an object owned by that client's connection.  That restriction can be a weakness in Unet, so in JCGNetwork I've added a per-object option to allow unsafe client RPCs, where any client can send an RPC on that object even without being the owner.  Similar to Unet, sync variables only go from server to client, but unlike Unet the server sends all of the variables instead of just what changed, and the server must manually set a bool to true when it wants to send those variables to the clients.
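
On the server side, the ownership rule and the sync variable flag might look roughly like this; all names here are placeholders for illustration, not the real API.

    // Illustrative server-side bookkeeping for the rules described above.
    public class NetworkedObjectRules
    {
        public int OwnerConnectionId;
        public bool AllowUnsafeClientRpcs;   // per-object opt-in

        // The server sets this to true when it wants to push all sync variables
        // to clients on the next send.
        public bool SyncVarsDirty;

        public bool CanClientInvokeRpc(int senderConnectionId)
        {
            // Normally only the owning connection may send RPCs on this object;
            // "unsafe" objects accept RPCs from any client.
            return AllowUnsafeClientRpcs || senderConnectionId == OwnerConnectionId;
        }
    }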

Unet had a network visibility system that would only instantiate network objects within a certain range.  This had many limitations: it was based on the physics system, so it needed a collider, and it may not be ideal to wait to instantiate an object until it is close.  There were also a good number of reported issues with that system that were never resolved.  JCGNetwork uses a completely different system called object subscriptions.  Every few seconds, all networked objects check their distance against all Player Objects and add or remove those players' connections from their subscriber lists.  All connected clients instantiate all networked objects, but unsubscribed connections get fewer or even no updates on those objects, depending on the scripts running on them.  This allows for things like frequent position updates for objects close by, while objects far in the distance are still visible but their positions aren't updated as frequently, because it doesn't really matter at that distance.
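
A simplified sketch of the periodic subscription pass.  The radius, interval, and all type names here are assumptions for illustration rather than JCGNetwork's actual code.

    using System.Collections.Generic;
    using UnityEngine;

    public class SubscribedObject : MonoBehaviour
    {
        public readonly HashSet<int> Subscribers = new HashSet<int>();
    }

    public class PlayerObjectMarker : MonoBehaviour
    {
        public int ConnectionId;
    }

    public class SubscriptionManager : MonoBehaviour
    {
        public List<SubscribedObject> Objects = new List<SubscribedObject>();
        public List<PlayerObjectMarker> Players = new List<PlayerObjectMarker>();
        public float SubscribeRadius = 2000f;   // assumed distance
        public float CheckInterval = 3f;        // "every few seconds"

        float timer;

        void Update()
        {
            timer += Time.deltaTime;
            if (timer < CheckInterval) return;
            timer = 0f;

            foreach (var obj in Objects)
                foreach (var player in Players)
                {
                    bool inRange = Vector3.Distance(obj.transform.position,
                                                    player.transform.position) <= SubscribeRadius;
                    if (inRange) obj.Subscribers.Add(player.ConnectionId);   // HashSet: no duplicates
                    else         obj.Subscribers.Remove(player.ConnectionId);
                }
            // Scripts on each object can then send frequent updates only to subscribers
            // and occasional (or no) updates to everyone else.
        }
    }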

So the work is on integrating this new system.  I expect the work to be completed within a few weeks.  I'm really excited about it, so I'll probably put in a lot of extra time so I can see the results.

Tuesday, July 31, 2018

Custom Networking API

So over the last few months I've been having difficulty with Unity's Unet networking API.  Contrary to the experience of some, the High Level API (HLAPI) has been working pretty well.  The issues have all come down to the Unet Low Level API (LLAPI). 

It is difficult to tell what the problems are, because the LLAPI is kept largely as a black box.  The problems may be possible to resolve, but it is almost impossible to see what is getting stuck in the LLAPI buffers and there is no way to directly clear them.  The buffer sizes even when maxed out all turn out to be arbitrarily low.  This means, in my opinion, Unet is good for small games with limited data transfers, but does not scale up well to an MMO. 

So after months of fighting with it I've decided to take the worst tasting medicine and develop a custom networking API for Broadside.  This will add approximately 3-5 months of additional work, but should eliminate any further networking issues caused by lack of visibility. 

Tuesday, June 26, 2018

Networking and Technical Debt

Early in the development of Broadside I was still learning the networking API and was just trying to get something to work.  The intention at the time was to get ships sailing, cannonballs flying, and prove the game was possible.  Getting things working at all was put as a priority over getting things to work the most efficient way, with the idea being that I would revisit these systems at a later date to optimize them.

Unfortunately that time has come.  As time has gone on, more systems have been added to the game that need to be synced to the clients.  From crew to cargo to city markets, a good amount of data has accumulated that needs to be sent to the clients and kept in sync.  After adding some larger ships, which necessitate even more data to sync, I hit a brick wall that took a little while to understand.

I was getting errors on the server where a client would no longer receive any server updates.  I tracked the issue down to something I'll call "reliable message pressure": reliable messages are queued for a client faster than the server can actually send them (the number of messages matters more here than the amount of data in each message, as the amount of data in most messages is actually incredibly small).  Messages get delayed, and ultimately the buffers available for either storing new messages to send or waiting on acknowledgement messages to be returned fill up, and no new data can be sent.  This prevents even unreliable messages from sending.  On the client this appears as a freeze, even though it hasn't actually disconnected.

At first I simply attempted to increase the buffer space, but that just delayed the issue.  Then I did a proper investigation and found it had to do with how I implemented syncing the state of the ships in the first place - mostly code that has been in place since the beginning of the project, where I had forgotten exactly what I had done for expediency.

I was syncing a timer every frame over the reliable channel that the clients weren't even using.  I was syncing data like crew allocations to every client when only the owner of the ship cares about that data.  I was sending requests from the client to the server to turn the ship every frame over the reliable channel.  I was syncing data like a ship's cargo to the owner of the ship every few seconds even if the contents had not changed.  And much more.

So I've eliminated the error now by going through and syncing any data that only the owner of the ship cares about just to that client, instead of to all clients.  I've also moved any non-critical syncing or commands from the player, such as turning the ship, over to the unreliable channel (if you're holding a turn key it doesn't matter if a frame or two gets lost, and there is no need to send an acknowledgement back to the client).
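
The turn input change, sketched with a placeholder send call since the actual message and channel APIs aren't shown in this post:

    using UnityEngine;

    public class ShipTurnInput : MonoBehaviour
    {
        void Update()
        {
            float turn = Input.GetAxis("Horizontal");
            if (Mathf.Approximately(turn, 0f)) return;

            // Sent every frame while the key is held, so losing one or two packets
            // is harmless; an unreliable channel avoids acknowledgements and resends.
            SendTurnToServer(turn);
        }

        void SendTurnToServer(float turn)
        {
            // Placeholder for a client-to-server message on an unreliable channel.
        }
    }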

Just what I've been able to improve in a day has cut the number of sync messages seen by the client by over 60%, and probably much more than that when taking into account dropped messages the server never even put on the wire due to reliable message pressure on that client's connection.  I'm continuing to work on the issue, aiming to reduce the amount by another 50% by calculating things like ship speed and wind angle on the clients for all ships instead of syncing that from the server, and by improving my in house network transform syncing script.

Currently I use my in house network transform syncing script to sync the positions of cannonballs, reducing the frequency of position/velocity updates needed compared to the script built into the networking API.  With my own custom script I only need to update cannonballs about 4 times per second, compared to at least 15 with the built in version.  Unfortunately, in testing it doesn't work smoothly when attached to ships, so I'm still using the built in script for those.  The built in script needs to sync position 29 times per second to get smooth movement and turning without any noticeable jitter or snapping, but my goal is to be able to drop that to 10 or fewer per second.  This will be worked on again in the near future.
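
One common way to get away with so few updates for projectiles is dead reckoning: sync position and velocity occasionally and extrapolate the ballistic path locally in between.  This is shown here as a generic technique, not necessarily what the in house script actually does.

    using UnityEngine;

    public class ExtrapolatedProjectile : MonoBehaviour
    {
        Vector3 lastSyncedPosition;
        Vector3 lastSyncedVelocity;
        float timeSinceSync;

        // Called roughly 4 times per second, when a network update arrives.
        public void OnNetworkUpdate(Vector3 position, Vector3 velocity)
        {
            lastSyncedPosition = position;
            lastSyncedVelocity = velocity;
            timeSinceSync = 0f;
        }

        void Update()
        {
            timeSinceSync += Time.deltaTime;
            // Simple ballistic extrapolation between network updates.
            transform.position = lastSyncedPosition
                               + lastSyncedVelocity * timeSinceSync
                               + 0.5f * Physics.gravity * timeSinceSync * timeSinceSync;
        }
    }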

The slow rocking of the ship is also something currently synced entirely from the server to all clients.  I'm going to change that to being calculated locally on the clients, with periodic updates to keep it in sync with the server.  So instead of several times per second, the rocking will be updated by the server maybe once per minute.  We'll see how that works in testing and adjust from there.
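
A sketch of what locally computed rocking with an occasional server correction could look like; the amplitude, period, and sine formula here are illustrative, not the game's actual wave model.

    using UnityEngine;

    public class LocalShipRocking : MonoBehaviour
    {
        public float RollAmplitude = 3f;   // degrees, assumed
        public float RollPeriod = 6f;      // seconds, assumed

        float phaseOffset;                 // kept roughly in sync with the server

        // Called on the rare server update (e.g. about once per minute).
        public void OnServerRockingSync(float serverPhase)
        {
            phaseOffset = serverPhase - Time.time;
        }

        void Update()
        {
            float roll = RollAmplitude *
                         Mathf.Sin((Time.time + phaseOffset) * 2f * Mathf.PI / RollPeriod);
            transform.localRotation = Quaternion.Euler(0f, 0f, roll);
        }
    }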

So the ultimate goal is to be able to support approximately 60 players connected per zone, with at least a dozen AI controlled ships.  Actual play testing so far has been limited to far fewer numbers, but as more improvements are implemented we'll be involving more play testers and monitoring performance.  At this point I'm pretty confident we'll be able to hit those numbers.

Wednesday, June 20, 2018

Firing a full broadside

Now that the majority of functionality for what you can do when docked at the cities has been implemented, including making market purchases, development is moving on to implementing additional ships and items, more combat mechanics, filling out NPC ship AI behavior, and gathering items in the game. 

Part of the updates to the combat mechanics is an alternate way to fire your ship's cannons.  The standard way of firing cannons is to select a side of your ship, aim the cannons, and press the space bar to fire an individual cannon.  You can hold down the space bar for continuous fire. 

Some ships have multiple gun decks on the port and starboard sides.  This allows for installing different guns on these decks.  All guns on a single deck need to be of the same type, and load the same ammunition, but multiple gun decks allows for varying your weaponry on the same side of the ship.  When firing using the space bar you manually switch between gun decks. 

The new "Full Broadside" button, available under the Guns status tab in the lower left of the screen, ignores most of that and automatically queues up all guns on the entire side of the ship to fire as fast as possible.  This has advantages and disadvantages compared to firing using the space bar.  Normally guns on a single deck are fired in sequence starting from the front of the ship and working their way to the rear.  When firing a full broadside, all guns are fired in a random order.  In addition, when firing by space bar you can continually adjust your aim for each shot.  In full broadside mode, the aim of all guns is set when you press the Full Broadside button and can't be adjusted again until all guns have fired. 

Those are some of the disadvantages, but on to the advantages.  Once you initiate a full broadside, you can switch to an alternate camera, either out to 3rd person or looking down another cannon sight.  You can even fire from a second side of the ship while your crew is still firing your broadside from the first side.  It is even possible to initiate what is called a double broadside, where you do a full broadside from both port and starboard simultaneously. 

Another large advantage is specifically for ships with multiple gun decks on the sides.  Gun decks are limited to a rate of fire, but when you have more than one gun deck firing at once, the overall rate of fire is increased.  If you have 3 gun decks firing at once, the effective time between shots is cut to 1/3 of normal. 
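
The arithmetic, for clarity, assuming the decks reload independently and their shots interleave evenly:

    public static class BroadsideMath
    {
        // With one deck firing a gun every reloadSeconds, N decks firing together
        // give an effective interval of reloadSeconds / N.  For example, 3 decks
        // with a 6 second reload produce a shot roughly every 2 seconds.
        public static float EffectiveSecondsBetweenShots(float reloadSeconds, int firingDecks)
        {
            return reloadSeconds / firingDecks;
        }
    }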

This all adds up to a way to effectively deal a fast volley of damage, most effective at close range due to limitations to your aim. 

Saturday, June 2, 2018

Current Progress

Just a quick update.  Implementation of the City UI for all things regarding moving items and outfitting ships is now complete.  This was a complicated and time consuming feature that is pretty central to the game.  Development is now focused on finishing the market system and implementing its UI.