
News - EQ Latency Being Addressed

Sic



For those of you upset and frustrated by lag, we understand and hear you. It frustrates us, too. This post is meant to provide some insight into what causes these issues, how we got to where we are today, and what is being done to improve performance now and in the future.

Game performance is something we work on continuously and is very challenging to solve completely. EQ has a lot of complex systems, and at least 25 years of code. More than 50 different programmers have come and gone over the years. One of the many challenges of working on EQ is understanding what someone was trying to do when they wrote the code we now are modifying, and doing so correctly without breaking existing things. It is common that we do break things (despite thorough testing), and we fix them as soon as we can.

Unfortunately, in the past, EQ hasn't always reliably measured how changes to the game affected performance over time. As more and more things were added to the game, performance declined. Where we stand today is that we need to fix numerous issues that can be quite difficult to fix (a minor improvement may take one programmer a week to complete), and each fix may yield only a small improvement that is imperceptible to players.

Over the past few years, we have made a lot of changes to alleviate performance issues, but it is still a significant problem, and is well known and cared about by the developers. However, it is understandable that these changes feel as though nothing has been done. It is as frustrating for you as it is for us to have poor server performance. We want nothing more than to allow more players to be on at the same time, for zones to run smoothly in raids, and for playing the game to be a good experience. This year, the top issue for the engineering team to tackle is raid zone performance. It is important to us to improve things, and we know very well the impact it has on your enjoyment of and ability to play the game.

Some of the hits to performance in EQ in the past were made to help with other problems, like lack of memory. Now that EQ is 64-bit, this is no longer an issue. It is possible these memory-saving changes can be reversed in favor of game performance. But just as the original work was an enormous task to accomplish and polish so that the game still worked, reversing it would be a significant effort, again with no visible change to players (which is sometimes the whole point). For these reasons, a lot of the time it looks as though nothing is being done. For every live patch there are numerous internal patch notes not visible to players, often a much longer list than the live patch notes.

There has been some speculation on the forums that EQ's performance issues are caused by poor-quality hardware. In the past few years, EQ has had a complete turnover in the hardware running our world and zone servers. In the next month, we are planning to add additional hosts and memory to run zones. These kinds of changes are made all the time and are mostly invisible to players, but do help improve the game and how it performs. Running good hardware is something EQ invests heavily in and improves continuously in small increments. There are teams who assist EQ behind the scenes (and Daybreak as a whole) with this kind of work. A lot of people contribute positively to the uptime, performance, and success of this game.

There are two major types of lag in EQ: world/server lag (e.g. Antonius Bayle or Zek) and zone lag (raid performance issues). Symptoms of world lag are things like chat lag, slow zone times, and poor overall performance. World lag also affects how quickly zones respond to things that rely on the world; if world lag is bad, zones will also begin to fail. Symptoms of zone lag include delays on casting spells and on the movement of NPCs (like rubberbanding). Zone lag is the main issue regarding raid zone performance, but it is partially impacted by the world's performance as well.

So what causes world and zone lag? One of the issues that affects both the zone and world is a large quantity of packets. The zone and world will only process packets for a maximum of 500ms before moving on. This causes a cascade of issues where players see there was no response to their action, and so they do it again, or they spam it. The world/zone gets more and more requests and can't handle them. A good example of this is when PoP releases on a TLP server. At the launch in Plane of Knowledge, each request to enter and leave the zone is fairly taxing and it takes the zone an astounding amount of time to catch up. It is barely able to transfer players to the next zone due to the number of requests it is getting. This bottleneck affects other zone functions, like the library door not responding, and players are unable to do almost anything. If the world gets too many requests, it will fail to update important things. If the world does not touch base with a running zone at least once every 5 minutes, that zone is considered disconnected. When this happens, we see issues like zones going down and players getting disconnected; those players then log in again, which is taxing on the world, causing more zones to go down and aggravating the issue to the point where the world itself may go down. This occurs on new TLP servers if there is no limit on the players who are allowed to log in. It can also cause issues on live servers when many players log in at the same time after a patch.
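The 500ms cap described above can be pictured as a time-budgeted loop. This is only a rough sketch of the idea, not EQ's actual code: the names `process_packets`, `FakeClock`, and `simulate_backlog`, and the per-packet costs, are all made up for illustration. Only the 500ms figure comes from the post.

```python
import time
from collections import deque

BUDGET_MS = 500  # per-loop packet budget quoted in the post


def now_ms():
    """Monotonic wall clock in milliseconds."""
    return time.monotonic() * 1000.0


def process_packets(queue, handle, budget_ms=BUDGET_MS, clock=now_ms):
    """Handle queued packets until the queue is empty or the budget is spent.

    Returns how many packets were handled. Anything left over stays in the
    queue for the next loop, which is how a backlog builds up.
    """
    deadline = clock() + budget_ms
    handled = 0
    while queue and clock() < deadline:
        handle(queue.popleft())
        handled += 1
    return handled


class FakeClock:
    """Hypothetical test clock: time only advances when a handler spends it."""

    def __init__(self):
        self.ms = 0

    def __call__(self):
        return self.ms


def simulate_backlog(packet_count, cost_ms):
    """Run one loop where every packet costs cost_ms; return (handled, left_over)."""
    clock = FakeClock()
    queue = deque(range(packet_count))

    def handle(_packet):
        clock.ms += cost_ms  # pretend the packet took this long to process

    handled = process_packets(queue, handle, clock=clock)
    return handled, len(queue)
```

With an invented cost of 200ms per packet, `simulate_backlog(10, 200)` handles three packets and leaves seven waiting. If players respond to the delay by repeating their actions, the leftover queue only grows, which is the cascade the post describes.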

World lag is also impacted by the number of players online and the number of zones running. The more players online and the more zones, the worse performance gets. This becomes worse as the game advances through expansions. Early TLP servers can support more players than servers with all expansions released. As more systems are active, all that data is being handled, and it takes processing time. One of the major culprits of world performance is real estate - each time your character zones, all the items you own in any real estate need to be reloaded, sent to the world, and sent to you on the destination zone. The lookup of this data has a minimum response time of 60ms. When a lot of players are on a server, that time adds up.
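To get a feel for how a 60ms lookup adds up, here is a back-of-the-envelope sketch. It assumes lookups are answered one after another, which the post does not actually state, and the function name and the figure of 1,000 zoning events are invented for illustration.

```python
LOOKUP_MS = 60  # minimum real estate lookup time quoted in the post


def serial_lookup_seconds(zoning_events, lookup_ms=LOOKUP_MS):
    """Lower bound on total lookup time, ASSUMING lookups run one at a time."""
    return zoning_events * lookup_ms / 1000


# Hypothetical load: 1,000 characters zoning in a short window.
print(serial_lookup_seconds(1000))  # 60.0 seconds of lookup time at minimum
```

Under that assumption, a thousand zoning events cost at least a full minute of lookup time in total, before any of the item data is actually sent anywhere.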

Issues unique to zone lag are primarily caused by spells, combat, and the movement of NPCs. The large quantity of spells that go off in the modern raiding game causes the zone to hit the limit of 500ms of processing time repeatedly. This is when you start to see performance issues. In each zone loop, if the zone doesn't process fast enough, NPC movement isn't processed in time, and you see things like rubberbanding and event mechanics failing to occur in a reasonable time (such as the auras in Oubliette of Light and Aten Ha Ra). When you think about the number of pets (specifically swarm pets) that can be created in a raid, think also about the amount of time the zone needs to spend processing their movement. When the zone loop takes so long to occur, NPCs can barely move (even if you exclude pets), because the zone is busy processing hundreds of spells in each loop. When you think about the numerous SPAs and the complexity of the things they can do in EQ, that is part of why spells are so taxing, and also why making improvements to how the spell system performs is so difficult.

This issue is compounded by melee procs, spells that land on many targets (Splash/Squall), spells that trigger other spells, sympathetic procs, and Twincast. Along with spells, melee has an impact: think about the number of swings you make compared to the early game. As more and more abilities were added that allow players to get additional hits, the zone has had more and more hits to process. Melee combat code is not as complex as spell code, but it is not great. Think about each hit checking for hit/miss, riposte, parry, block, dodge, and strikethrough, and calculating the impact of heroic stats on each one. One of the especially negative impacts on melee performance has been the SPA that returns HP on swings. Each hit is now going through spell code as well as melee code. Compared to spells, an enormous number of melee hits occur in the same timespan. By itself, no spell or ability seems outlandishly bad for performance. But put them all together in a raid with 54 players, and the zone is brought to its knees.

One question players have about raid zone performance is why it varies in how bad it is. Each time you request an instance, it can be started on any one of numerous zone hosts. These are very powerful machines that have load balancing to try to prevent too many raid instances and high-load zones from running on the same host. Despite this, if one host is running several raid zones that are very taxing on the CPU, it will negatively affect other zones running on the same host. If you have a raid instance that performed unusually well, the zone host it was running on was not running as many other raid instances (if any). This is part of why raid lag is much worse on Sundays, as that is the most popular raid time in the game, and gives you the highest probability of having your instance run on a taxed zone host. Changes being made in this month’s update will also help zone performance by fixing a bug that allowed instances to stay active longer than they should have. This bug caused zone performance to degrade the longer a server was up.

What is being done about game performance? Last year, we completed a project to implement a telemetry system for accurately measuring performance. It allows us to see detailed information about what and where in EQ’s code things are running poorly. We periodically capture performance metrics from worlds, zones (especially raid zones under a lot of strain), and the game client. This year we are focusing on improvements specific to raid zone performance, as that is the most significant issue in the modern game. These performance captures allow us to identify performance bottlenecks, so we can focus our efforts on things that will make the most positive impact.

Aside from that, one of the most common types of performance issue is something that “blocks” a process from doing anything else. For example, when you log into a world your character may not be cached, and the world will do what is called a “blocking load” on the character database. While this load is being processed, the world is not able to do anything else. These types of blocking load issues are fixed (in a rough sense) by making them asynchronous. We create a request list for data from another process and allow things to continue being processed until the data we requested arrives. When it arrives, the process continues to where it needs to go, as the original code did previously. These types of changes are sometimes more difficult than they sound, and are often risky depending on the different ways that data can be requested and handled after the fact. Some examples of blocking loads we’ve fixed this way are fellowship experience sharing, loading guilds when entering the game from character select, real estate definitions (aside from items), and applying account-wide lockouts. EQ has a surprising amount of complexity when it comes to how different game mechanics perform.
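The "request list" approach to blocking loads can be sketched roughly as follows. This is a toy model under loud assumptions, not how EQ is implemented: `FakeDatabase`, `run_world`, the tick-based latency, and the character name are all hypothetical. The point is only that the world keeps ticking while a request is outstanding and resumes the login when the data arrives.

```python
class FakeDatabase:
    """Hypothetical stand-in for the character database.

    Each request takes `latency` world ticks to answer.
    """

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = []  # list of [ticks_left, key]

    def request(self, key):
        """Queue a load without waiting for the result."""
        self.in_flight.append([self.latency, key])

    def poll(self):
        """Advance one tick and return the keys whose data has arrived."""
        for entry in self.in_flight:
            entry[0] -= 1
        arrived = [key for ticks_left, key in self.in_flight if ticks_left <= 0]
        self.in_flight = [e for e in self.in_flight if e[0] > 0]
        return arrived


def run_world(logins, ticks, db):
    """Run the world loop for `ticks` iterations with asynchronous logins."""
    log = []
    pending = {}  # the request list: key -> what to do once the data arrives

    for name in logins:
        db.request(name)
        pending[name] = lambda n=name: log.append(f"{n} enters world")

    for tick in range(ticks):
        log.append(f"tick {tick}")  # other world work is not blocked
        for key in db.poll():
            pending.pop(key)()  # resume the login where it left off

    return log


print(run_world(["Soandso"], 3, FakeDatabase(latency=2)))
# ['tick 0', 'tick 1', 'Soandso enters world', 'tick 2']
```

In a blocking version, ticks 0 and 1 would not run at all until the load finished. The risk the post mentions shows up even in this toy: anything that touches the character between the request and its arrival has to cope with the data not being there yet.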

We hope these details have provided some insight into the intricacy of EQ's performance. Know that the development team does care and wants to help improve how the game runs. We make improvements as we are able, and will continue to do so in the future. It is very challenging, but also a very important investment into the health and future of the game.

—EverQuest Team
 