Early Retry Mechanism for SteamNetworkingSockets Authentication #301

Open
nicopaes opened this issue Oct 17, 2023 · 9 comments

Comments

@nicopaes

Hey GameNetworkingSockets maintainers!

As a developer working with the Steam API through the Steamworks.NET wrapper in Unity, I'd like to bring an issue to your attention and ask how we can improve the authentication process in our integration.

Technical Info:
Unity Version -> LTS 2021.3.17f1
Steamworks.NET Version -> 20.2.0

Issue Description:
Our users have been encountering authentication issues when trying to connect to the Steam backend.

Currently, we employ the SteamNetworkingSockets.InitAuthentication() function to initiate and, if necessary, retry the authentication process. The challenge we face is that, according to the documentation, this function allows unlimited retries, but only after the previous attempt has failed entirely.

The crux of the matter is the substantial delay before the authentication process is deemed a failure, resulting in unnecessary waiting periods. We're trying to figure out a method for introducing early retry attempts without the need to wait for a complete failure – much like a timeout mechanism.

We kindly request your expertise and advice on this matter. Are there any suggestions, ideas, or potential solutions you could recommend to help us implement retries without having to wait for the current process to fail entirely?

@zpostfacto
Contributor

Do you know which step is taking a long time and stalling the retry?

@nicopaes
Author

> Do you know which step is taking a long time and stalling the retry?

Yes. When we call SteamNetworkingSockets.GetAuthenticationStatus(out netAuthenticationStatus_T) to check the current status, we get back the enum value k_ESteamNetworkingAvailability_Retrying.

While the authentication status is in this state, calling InitAuthentication() doesn't restart the process. We have to wait until the status becomes ESteamNetworkingAvailability.k_ESteamNetworkingAvailability_Failed, which sometimes takes 5 to 10 seconds depending on the user.

@zpostfacto
Contributor

Can you tell which step it is waiting on? Maybe the contents of SteamNetAuthenticationStatus_t::m_debugMsg will say what it's doing?

@nicopaes
Author

I print m_debugMsg whenever the status changes, via the callback.

It usually goes:

-> k_ESteamNetworkingAvailability_Retrying :::: Attempt #X to fetch config from https://api.steampowered.com/ISteamApps/GetSDRConfig/v1?appid=X

-> k_ESteamNetworkingAvailability_Failed :::: No response from server
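To see how long each status actually lasts, the callback log can be timestamped and turned into per-phase durations. The following is only a sketch: `StatusEvent`, `Phase`, and `PhaseDurations` are hypothetical names that are not part of the SDK, and it assumes the application records the time, status name, and m_debugMsg text itself on every status-changed callback.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one status-changed callback, filled in by the app.
struct StatusEvent {
    std::chrono::steady_clock::time_point when;
    std::string status;    // e.g. "k_ESteamNetworkingAvailability_Retrying"
    std::string debugMsg;  // copy of m_debugMsg at that moment
};

// Time spent in one status before the next callback arrived.
struct Phase {
    std::string status;
    std::string debugMsg;
    std::chrono::milliseconds duration;
};

// Converts a chronological list of events into per-status durations.
// The last event has no successor yet, so it produces no phase.
std::vector<Phase> PhaseDurations(const std::vector<StatusEvent>& events) {
    std::vector<Phase> phases;
    for (std::size_t i = 0; i + 1 < events.size(); ++i) {
        // Events must be appended in the order the callbacks fired.
        assert(events[i + 1].when >= events[i].when);
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            events[i + 1].when - events[i].when);
        phases.push_back({events[i].status, events[i].debugMsg, elapsed});
    }
    return phases;
}
```

Logging the resulting phases next to the debug message shows whether the time goes into the GetSDRConfig fetch or into some other step.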

@zpostfacto
Contributor

I would really like to debug this. How often is it happening? We designed that endpoint to have extremely high availability. It's actually served by Akamai, and we set all sorts of aggressive HTTP caching headers so that Akamai will serve stale data if Steam is down, etc.

I think the answer to your immediate question is that there isn't much more we can do to "kick" the API. You can listen for the callback when the authentication status changes and immediately retry if it fails. You can also just sit in a loop and keep asking it to initialize until you get back a success status. But if one step is stalling, it's just not working, and we cannot really retry before the previous attempt fails. (I don't want to change the code so that there can be more than one request in flight at a time.)
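The retry-on-failure approach described above could be sketched roughly as follows. This is an illustration under assumptions, not SDK code: `Availability` is a reduced stand-in for ESteamNetworkingAvailability with illustrative values, `AuthRetrier` is a hypothetical helper, and the callback passed to it is assumed to wrap the real InitAuthentication() call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Reduced stand-in for ESteamNetworkingAvailability; the real enum has more
// values, and these are not copied from the SDK headers.
enum class Availability { Failed, Retrying, Attempting, Current };

// Hypothetical helper: re-issues the init call only after the previous
// attempt has fully failed, since a call made while an attempt is still in
// flight does not restart anything.
class AuthRetrier {
public:
    explicit AuthRetrier(std::function<void()> initAuthentication)
        : init_(std::move(initAuthentication)) {
        assert(init_ && "an init callback is required");
    }

    // Call this from the status-changed callback (or from a polling loop).
    void OnStatus(Availability status) {
        if (status == Availability::Failed) {
            ++retries_;
            init_();  // assumed to wrap InitAuthentication()
        }
    }

    int retries() const { return retries_; }

private:
    std::function<void()> init_;
    int retries_ = 0;
};
```

Feeding every status change into `OnStatus` restarts authentication as soon as a failure is reported, but it cannot shorten the time an attempt spends in the retrying state.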

One thing I can investigate is adjusting that WebAPI fetch to use a shorter timeout. It really should be nearly instant. But I think if we're waiting on that fetch, there isn't much more we can do if that isn't working.

Also, the very first fetch might fail legitimately. We use the only-if-cached header on it, so it only checks the local cache; that can fail immediately, which is expected and normal. The idea is that if we have cached data, we apply it immediately, and then we immediately issue a real request to check for an up-to-date version.

If you have any tools at your disposal to help me understand why that API fetch is failing, I would really appreciate it. It does seem to be failing more than I would expect given the significant measures we have taken to make it highly available.
(I am looking into the same basic problem in CSGO.) Where are you in the world? When you do a DNS lookup on api.steampowered.com, which Akamai edge hosts serve the request? Have you noticed any patterns that cause it?
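One way to collect the DNS information asked for here is to log the addresses that an affected client resolves for api.steampowered.com. The sketch below is POSIX-only and uses the standard getaddrinfo and getnameinfo calls; `ResolveHost` is a hypothetical helper name, and mapping the returned addresses to specific Akamai edge hosts would still need a separate lookup.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cassert>
#include <string>
#include <vector>

// Resolves `host` and returns the numeric addresses the local resolver
// hands back. Returns an empty list if the lookup fails.
std::vector<std::string> ResolveHost(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6 answers
    hints.ai_socktype = SOCK_STREAM;  // avoid one duplicate entry per socket type

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &results) != 0) {
        return {};
    }

    std::vector<std::string> addresses;
    for (const addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
        assert(ai->ai_addr != nullptr);
        char text[128];  // comfortably larger than any numeric address string
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, text, sizeof text,
                        nullptr, 0, NI_NUMERICHOST) == 0) {
            addresses.emplace_back(text);
        }
    }
    freeaddrinfo(results);
    return addresses;
}
```

Calling `ResolveHost("api.steampowered.com")` on an affected machine and logging the result alongside the failed fetch would show which addresses that client was given.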

@nicopaes
Author

> How often is it happening?

According to our tests, this is happening 90% of the time. The game we're working on has a large following in China, and players from that region are experiencing this issue while players in the rest of the world are not.

> Where are you in the world, when you do a DNS lookup on api.steampowered.com, what Akamai edge hosts will serve the request, have you noticed any patterns that cause it, etc?

We are using a function from a library (Heathen Steamworks) to determine the host's country from their IP (it is highly likely that it uses a Steam API call internally). We can see that the players experiencing this issue are in China, Hong Kong, Taiwan, and Singapore, and are not using VPNs.

With the forced-retry method, authentication succeeds within a 10-minute window 85% of the time. Based on the logs, it takes an average of 9 calls to the "GetSDRConfig" endpoint. That is a good success rate, but an average of 10 minutes to connect is too much to ask of players, so an early-retry mechanism would help us shorten the wait.

We can conduct some additional testing and try to log more information as you requested to provide more context for you. For now, I thought it would be important to share this additional information.

@zpostfacto
Contributor

Got it, that makes sense. I have made some improvements for China specifically. I've shipped them in CSGO and will try to get them into the full Steam client ASAP.

@nicopaes
Author

> Got it, that makes sense. I have made some improvements for China specifically. I've shipped them in CSGO and will try to get them into the full Steam client ASAP.

That's great to hear, looking forward to it! Can you give us a heads up when it goes live so we can run more tests?

@bravarda

bravarda commented Nov 8, 2023

Hey there, any news on this?
