10.49.1 failing to connect to Atlas Device Sync #8531

Closed
andreasley opened this issue Mar 29, 2024 · 26 comments

@andreasley

andreasley commented Mar 29, 2024

Description

I'm using MongoDB Atlas with Device Sync.
The bug appeared after upgrading from Realm v10.49.0 to v10.49.1.

After launching the app, the sync error RLMSyncErrorClientSessionError is passed into the closure at app.syncManager.errorHandler. The error does not contain any additional details.

The error is triggered by a RLM_ERR_CONNECTION_CLOSED that's originating here:

https://github.com/realm/realm-core/blob/4eef991270a5e2161e37e8c227404327ac61f618/src/realm/sync/network/websocket.cpp#L1021

The SDK keeps trying to connect to Atlas Device Sync every few seconds and all attempts keep failing.

No log entries are shown in Atlas Device Sync for the device in question.

The error is reproducible on both Macs I've tested (Mac Studio and MacBook Pro, both with Apple Silicon) and on an iPad. One Mac is using wired Ethernet and the other devices are on Wi-Fi.

Can you reproduce the bug?

95% of the time. It first only appeared on the Macs, but after a few hours, the iPad started exhibiting the same behavior.
If the app and all associated data is deleted from the device and then reinstalled, it will work fine the first time it is launched.

Version

10.49.1

What Atlas Services are you using?

Atlas Device Sync

Are you using encryption?

Yes

Platform OS and version(s)

macOS Sonoma 14.4.1 (23E224)
iPadOS 17.4.1

Build environment

Xcode 15.4
Swift Package Manager


sync-by-unito bot commented Mar 29, 2024

➤ PM Bot commented:

Jira ticket: RCOCOA-2324

@andreasley andreasley changed the title 10.49.1 failing to connect to Atlas Device Sync on macOS 10.49.1 failing to connect to Atlas Device Sync Mar 29, 2024
@andreasley
Author

andreasley commented Mar 29, 2024

The problem seems to be that the SDK is trying to connect to the wrong host.
In SyncSession::create_sync_session, session_config.server_address is...

  • ws.realm.mongodb.com when the connection fails
  • ws.europe-west1.gcp.realm.mongodb.com when the connection succeeds

My suspicion: The URL is only updated if the user token is missing or expired.

This bug may have been introduced in realm/realm-core@255cb33.

@Jaycyn

Jaycyn commented Mar 29, 2024

I think I duplicated this issue and was seeing the same thing.

However, right in the middle of typing this, it seemed to resolve itself.

The issue appears to revolve around running the app with an already authenticated user: it seemed to be trying to connect to an incorrect endpoint. I had to log out and log back in, and then it worked and connected to the correct endpoint.

Can you try your code again and see if it's an ongoing issue?

@Jaycyn

Jaycyn commented Mar 29, 2024

Well, let me retract my above comment as it's doing it again.

After updating to 10.49.1 the app does not connect and the only error shown (in the app) was

Info: Connection[1]: Closing the websocket with error code=WebSocket: Internal Server Error, message='error', was_clean=true

Upon checking the Logs in the Realm Console, there was no error.

I then re-ran the app (with the user logged out), logged in, and it connected and worked correctly. No further issues.

However, if I quit the app while still authenticated, and then re-run the app, it attempts to connect but then fails with the same above error. I have to log out and log back in to connect.

I can verify the endpoint it attempts to connect to when starting the app with an already-authenticated user is different than the connection if I log out and log back in - maybe that's by design?

Here's what I see if logging in, quitting and re-running the app
Info: Connected to endpoint '3.18.10.3:443' (from 'my ip address')

Here's what I see if I log out, quit the app, re-run it and log back in
Info: Connected to endpoint '35.168.174.121:443' (from 'my ip address')

There are no errors in the console at all. That being said, the endpoint doesn't seem to be the correct one if the user is already authenticated when the app tries to connect.

My endpoint looks like

Info: Connection[1]: Connecting to 'wss://ws.us-east-1.aws.realm.mongodb.com:443/api/client/v2.0/app/my_cool_app-xyzzy/realm-sync'

Is this a coding issue on our end or an SDK issue? If it's determined to be a coding issue, we are using the login code from the documentation and can supply it.

@KudeusVince

We are facing exactly the same issue, please help

@vishaldeshai

After migrating version to 10.49.1, we also faced the same error.

Error : Info: Connection[1]: Closing the websocket with error code=WebSocket: Internal Server Error, message='error', was_clean=true

@anton-plebanovich

I have the same error after updating to 10.49.1

04.02 18:53:14.345 | R | [Info] Connection[1]: Connecting to 'wss://ws.realm.mongodb.com:443/api/client/v2.0/app/***/realm-sync'
04.02 18:53:14.346 | R | [Detail] Resolving 'ws.realm.mongodb.com:443'
04.02 18:53:14.348 | R | [Detail] Connecting to endpoint '3.7.61.93:443' (1/3)
04.02 18:53:14.556 | R | [Info] Connected to endpoint '3.7.61.93:443' (from '192.168.0.102:55873')
04.02 18:53:15.457 | R | [Detail] Connection[1]: Negotiated protocol version: 11
04.02 18:53:15.457 | R | [Info] Connection[1]: Closing the websocket with error code=WebSocket: Internal Server Error, message='error', was_clean=true
04.02 18:53:15.458 | R | [Detail] Connection[1]: Allowing reconnection in 31856 milliseconds
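
When comparing logs like the ones above, it can help to pull the websocket host out of each "Connecting to" line and check whether it is one of the global hosts or a regional one. The following is a minimal sketch using only Foundation. The function names `syncHost(fromLogLine:)` and `isGlobalSyncHost(_:)` are hypothetical helpers, not part of the Realm SDK, and the list of global hosts is taken only from the log lines quoted in this thread. The app id in the sample line is a placeholder.

```swift
import Foundation

/// Returns the host of the URL quoted after "Connecting to '" in a sync log line,
/// or nil if the line does not contain such a URL.
func syncHost(fromLogLine line: String) -> String? {
    // Locate the marker that precedes the quoted websocket URL.
    guard let marker = line.range(of: "Connecting to '") else { return nil }
    let rest = line[marker.upperBound...]
    // The URL ends at the next single quote.
    guard let close = rest.firstIndex(of: "'") else { return nil }
    return URLComponents(string: String(rest[..<close]))?.host
}

/// Assumption: these two hosts are the "global" endpoints seen in this thread.
/// Any other host is treated as regional.
func isGlobalSyncHost(_ host: String) -> Bool {
    return host == "ws.realm.mongodb.com" || host == "ws.services.cloud.mongodb.com"
}

// Sample line modeled on the logs above (placeholder app id).
let sampleLine = "[Info] Connection[1]: Connecting to 'wss://ws.realm.mongodb.com:443/api/client/v2.0/app/my-app-id/realm-sync'"

if let host = syncHost(fromLogLine: sampleLine) {
    print(host, isGlobalSyncHost(host) ? "(global)" : "(regional)")
}
```

A global host followed by an immediate "Closing the websocket" line is the pattern reported in this issue.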


sync-by-unito bot commented Apr 3, 2024

➤ marysiapietraszewska commented:

[~jason.flax@mongodb.com] [~jonathan.reams@mongodb.com] can you take a look at it?

@michael-wb

Hi @andreasley,

This looks like it is related to the latest base URL changes added recently - the websocket will try to connect to the default websocket address determined by the provided App::Config::base_url value, or based on the default (https://realm.mongodb.com, soon to be https://services.cloud.mongodb.com) if none is provided. If this fails, it will try to query the location endpoint to try to retrieve the correct localized websocket URL for the app and use that when reconnecting.

When you log in before starting the sync session with the cloud, the location info was already queried/updated per the log in request to the server. The websocket connection is a bit different, since it uses a different code path/underlying driver and updating the location takes a bit more coordination.

I am working on a couple of fixes around this area, but a temporary workaround, for now, would be to specify the localized base_url value (e.g. https://us-east-1.aws.realm.mongodb.com) when creating the App object in your client app.

@andreasley
Author

Thanks for looking into this, @michael-wb.

RealmSwift.App doesn't have an initializer that allows specifying the URL, but I can deploy with v10.49.0 in the meantime.

@michael-wb

You can configure the baseUrl in swift using AppConfiguration:
https://www.mongodb.com/docs/atlas/device-sdks/sdk/swift/app-services/connect-to-app-services-backend/#configuration

@Jaycyn

Jaycyn commented Apr 3, 2024

Thank you @michael-wb

For those of us not familiar with directly working with App Services, since the workaround is to create a custom AppConfiguration, what values do the other parameters need to be set to? From the docs

let configuration = AppConfiguration(
   baseURL: "https://services.cloud.mongodb.com", // point this to https://us-east-1.aws.realm.mongodb.com
   transport: nil, // Custom RLMNetworkTransportProtocol
   localAppName: "My App",
   localAppVersion: "3.14.159",
   defaultRequestTimeoutMS: 30000
)

let app = App(id: "my-app-services-app-id", configuration: configuration)

can they stay the same as in the docs per above? If not what should the values be?

@michael-wb

Hi @Jaycyn
If you don't need to configure those values, then you can just not define them and they will use default values. e.g.:

let configuration = AppConfiguration(
   baseURL: "https://us-east-1.aws.realm.mongodb.com/"
)

let app = App(id: "my-app-services-app-id", configuration: configuration)

API reference: https://www.mongodb.com/docs/realm-sdks/swift/latest/Extensions/AppConfiguration.html

@Jaycyn

Jaycyn commented Apr 3, 2024

Original code which logs in and out correctly

let gTodoApp = App(id: Constants.REALM_APP_ID)

new code

let configuration = AppConfiguration(
   baseURL: "https://us-east-1.aws.realm.mongodb.com/"
)

let gTodoApp = App(id: Constants.REALM_APP_ID, configuration: configuration)

results in

Error: App: request location failed (MalformedJson): [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '<'
Error: App: log_in_with_credentials failed: 0 message: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - invalid literal; last read: '<'

I am sure I am overlooking something but it's not clear where I should begin looking since only one line of code was changed.

@michael-wb

Oh, sorry - try removing the / from the end of baseUrl and see if that fixes it.
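
To guard against this when the base URL comes from a build setting or remote config, the string can be normalized before it is handed to AppConfiguration. This is a minimal sketch; `normalizedBaseURL(_:)` is a hypothetical helper, not an SDK function, and the assumption that only trailing slashes and surrounding whitespace need removing is based solely on the failure described above.

```swift
import Foundation

/// Strips surrounding whitespace and any trailing "/" characters from a base URL string.
func normalizedBaseURL(_ raw: String) -> String {
    var result = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    // Remove every trailing slash so that paths appended later do not produce "//".
    while result.hasSuffix("/") {
        result.removeLast()
    }
    return result
}

let baseURL = normalizedBaseURL("https://us-east-1.aws.realm.mongodb.com/")
print(baseURL)  // prints "https://us-east-1.aws.realm.mongodb.com"
```

The result can then be passed as the baseURL argument, as in the AppConfiguration examples in this thread.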

@Jaycyn

Jaycyn commented Apr 3, 2024

That corrected the issue and it appears to be working. Thank you for the super fast responses.

@vishaldeshai

That corrected the issue and it appears to be working. Thank you for the super fast responses.

However, there was an issue related to the metadata version in an earlier version.

@andreasley
Author

You can configure the baseUrl in swift using AppConfiguration: https://www.mongodb.com/docs/atlas/device-sdks/sdk/swift/app-services/connect-to-app-services-backend/#configuration

Unfortunately, my app is using User/Password authentication and I have to use user.configuration() to get an AppConfiguration. There doesn't seem to be a way to set the baseUrl later (perhaps an oversight in the API?).

@Jaycyn

Jaycyn commented Apr 12, 2024

@andreasley Not sure the issue being described is clear.

We also use User/Password Authentication and generally, getting an AppConfiguration is unrelated to the login process.

This code is working for us

let configuration = AppConfiguration(
   baseURL: "https://us-east-1.aws.realm.mongodb.com"
)

let gTodoApp = App(id: Constants.REALM_APP_ID, configuration: configuration)

and then the login is something like this

@MainActor
func login() async throws -> User {
    let creds = Credentials.emailPassword(email: "some email", password: "some password")
    let user = try await gTodoApp.login(credentials: creds)
    print("successfully logged in user: \(user)")
    return user
}

@andreasley
Author

andreasley commented Apr 12, 2024

We also use User/Password Authentication and generally, getting an AppConfiguration is unrelated to the login process.

You're absolutely right – I was looking at Realm.Configuration (which also takes a URL) instead of AppConfiguration. Using AppConfiguration(baseURL: "https://europe-west1.gcp.realm.mongodb.com") works perfectly fine as a workaround for me.

Apologies for the oversight. Thankfully, the weekend's almost here. ;)

@michael-wb

@andreasley / @Jaycyn / @vishaldeshai / @anton-plebanovich / @KudeusVince
FYI - there was a server fix to address this failure that you were seeing.

Under certain circumstances (e.g. starting a sync websocket session with a cached user), the client will connect to the "global" MongoDB realm endpoint "https://realm.mongodb.com" (or "https://services.cloud.mongodb.com" after the domain update), which DNS resolves to a localized server based on your area. If the cloud app happened to reside on the localized server you connected to, everything was fine. If not, the server responded with the WebSocket: Internal Server Error error and an arbitrary "error" message.

The server fix was to tell the client to connect to the correct localized server via a redirect response from the server instead of returning the error you were seeing.

As a result, your client app should no longer be required to specify the localized server base URL and it should "just work" without crashing.
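
The redirect flow described above can be pictured with a toy model. This is only an illustrative sketch, not realm-core code: `AttemptResult`, `connectFollowingRedirects`, the redirect limit, and the simulated server closure are all hypothetical, and the two URLs are shortened forms of hosts that appear in logs in this thread.

```swift
/// Outcome of one simulated websocket connection attempt.
enum AttemptResult {
    case connected
    case movedPermanently(to: String)
}

/// Tries `startURL`, following "moved permanently" responses until a connection
/// succeeds. Returns the URLs tried in order, or nil if the redirect limit is hit.
func connectFollowingRedirects(
    from startURL: String,
    maxRedirects: Int = 3,
    attempt: (String) -> AttemptResult
) -> [String]? {
    var tried: [String] = []
    var current = startURL
    for _ in 0...maxRedirects {
        tried.append(current)
        switch attempt(current) {
        case .connected:
            return tried
        case .movedPermanently(let next):
            current = next
        }
    }
    return nil
}

// Simulated server: the app lives only on the regional host (assumption for the demo).
let regionalURL = "wss://eu-west-1.aws.ws.services.cloud.mongodb.com"
let globalURL = "wss://ws.services.cloud.mongodb.com"
let simulatedServer: (String) -> AttemptResult = { url in
    if url == regionalURL {
        return AttemptResult.connected
    }
    return AttemptResult.movedPermanently(to: regionalURL)
}

let viaGlobal = connectFollowingRedirects(from: globalURL, attempt: simulatedServer)
let direct = connectFollowingRedirects(from: regionalURL, attempt: simulatedServer)
print(viaGlobal?.count ?? 0, direct?.count ?? 0)  // prints "2 1"
```

In this model, starting from the global URL costs one extra attempt compared with starting from the regional URL, which is why specifying the localized base URL avoids the additional round trip.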

@anton-plebanovich

@michael-wb I have tried with the latest 10.50.0 version and while I confirm it works the solution still seems to be a workaround. Are there plans to improve it in the future? It works fine on the first install but on the second run (probably due to a cached user?) I see the issue happening in the logs again and the first connection even retried it once before the second connection succeeded (third run).

First run

05.05 08:59:16.629 | R | [Info] Connection[1]: Session[1]: Binding ***
05.05 08:59:16.629 | R | [Info] Connection[1]: Session[1]: client_reset_config = false, Realm exists = true 
05.05 08:59:16.630 | R | [Info] Connection[1]: Connecting to 'wss://eu-west-1.aws.ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 08:59:17.517 | R | [Detail] Connection[1]: Negotiated protocol version: 12
05.05 08:59:17.517 | R | [Info] Connection[1]: Connected to app services with request id: "6636f605aa360504a1c2d301"

Second run

05.05 09:01:01.654 | R | [Info] Connection[1]: Session[1]: Binding ***
05.05 09:01:01.654 | R | [Info] Connection[1]: Session[1]: client_reset_config = false, Realm exists = true 
05.05 09:01:01.655 | R | [Info] Connection[1]: Connecting to 'wss://ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:01:02.434 | R | [Detail] Connection[1]: Negotiated protocol version: 12
05.05 09:01:02.434 | R | [Info] Connection[1]: Connected to app services with request id: "6636f66e7eac53ca6f1a93bc"
05.05 09:01:02.435 | R | [Info] Connection[1]: Closing the websocket with error code=WebSocket: Moved Permanently, message='moved permanently', was_clean=true
05.05 09:01:02.436 | R | [Detail] Connection[1]: Allowing reconnection in 803 milliseconds
05.05 09:01:03.176 | R | [Detail] Connection[1]: Canceling reconnect delay
05.05 09:01:03.176 | R | [Info] Connection[2]: Session[2]: Binding ***
05.05 09:01:03.176 | R | [Info] Connection[2]: Session[2]: client_reset_config = false, Realm exists = true 
05.05 09:01:03.177 | R | [Info] Connection[2]: Connecting to 'wss://eu-west-1.aws.ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:01:04.074 | R | [Detail] Connection[2]: Negotiated protocol version: 12
05.05 09:01:04.075 | R | [Info] Connection[2]: Connected to app services with request id: "6636f670c7cd8566b84eb775"

Third run

05.05 09:01:40.776 | N | VPN connection required: false
05.05 09:01:41.019 | R | [Info] Connection[1]: Session[1]: Binding ***
05.05 09:01:41.020 | R | [Info] Connection[1]: Session[1]: client_reset_config = false, Realm exists = true 
05.05 09:01:41.021 | R | [Info] Connection[1]: Connecting to 'wss://ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:01:41.522 | R | [Detail] Connection[1]: Negotiated protocol version: 12
05.05 09:01:41.522 | R | [Info] Connection[1]: Connected to app services with request id: "6636f6957eac53ca6f1c4c71"
05.05 09:01:41.523 | R | [Info] Connection[1]: Closing the websocket with error code=WebSocket: Moved Permanently, message='moved permanently', was_clean=true
05.05 09:01:41.524 | R | [Detail] Connection[1]: Allowing reconnection in 798 milliseconds
05.05 09:01:42.323 | R | [Info] Connection[1]: Connecting to 'wss://ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:01:42.331 | R | [Info] Connection[2]: Session[2]: Binding ***
05.05 09:01:42.331 | R | [Info] Connection[2]: Session[2]: client_reset_config = false, Realm exists = true 
05.05 09:01:42.332 | R | [Info] Connection[2]: Connecting to 'wss://eu-west-1.aws.ws.services.cloud.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:01:42.812 | R | [Detail] Connection[1]: Negotiated protocol version: 12
05.05 09:01:42.812 | R | [Info] Connection[1]: Closing the websocket with error code=WebSocket: Moved Permanently, message='moved permanently', was_clean=true
05.05 09:01:42.940 | R | [Detail] Connection[2]: Negotiated protocol version: 12
05.05 09:01:42.940 | R | [Info] Connection[2]: Connected to app services with request id: "6636f697aa360504a1c448e3"

@nirinchev
Member

This is the intended behavior. The websocket connection is established against the default url, after which the server responds with the local url that should be used. Why do you believe this is a workaround that needs to be fixed?

@anton-plebanovich

@nirinchev Longer overall connection duration: 2-3 seconds for subsequent runs versus 1 second for the initial run in my case. As an optimization, the SDK could preserve the returned URL instead of always trying the default URL and failing. Currently, my logic uses the cached user to speed up application startup, but with the new behavior, logging in every time actually gives a faster startup flow, which is counterintuitive and defeats the optimizations I have.

This is the old framework behavior which takes only 1 second to finish the connection without errors:

05.05 09:50:09.540 | R | [Info] Connection[1]: Session[1]: Binding ***
05.05 09:50:09.543 | R | [Info] Connection[1]: Session[1]: client_reset_config = false, Realm exists = true 
05.05 09:50:09.543 | R | [Info] Connection[1]: Connecting to 'wss://ws.eu-west-1.aws.realm.mongodb.com:443/api/client/v2.0/app/dtsandboxanton-nmfsd/realm-sync'
05.05 09:50:10.480 | R | [Detail] Connection[1]: Negotiated protocol version: 11
05.05 09:50:10.480 | R | [Info] Connection[1]: Connected to app services with request id: "663701f2c7cd8566b872cc77"

Updating to the new framework version causes a longer overall connection time, so it is a regression and an update blocker from my point of view.

@michael-wb

@anton-plebanovich - understood. If the sync time from app start is a priority/concern, the best method is to specify the localized server, since you will connect directly to the appropriate server on the first attempt.
We have some other work going on around the app services and sync components of the client which may help at some point in the future to reduce the sync connection time on app start, but it still won't be as fast as specifying the localized server.
FYI: if you ever decide to change the deployment model (server) in the future, any clients that haven't updated and still try to connect to the original server will be redirected to the appropriate one.

@anton-plebanovich

Thank you @michael-wb I verified the solution and it works for me 👍
Knowing it will work in a deployment model change case is also helpful. Thank you for the info 🤝
