Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VPN server failure detection recovery #786

Merged
merged 44 commits into from
Apr 30, 2024
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
bf94293
Small tidy up to remove dependency cycle
graeme Apr 8, 2024
369c273
Inject settings rather than environment to client
graeme Apr 17, 2024
c1ff07d
Add failure recovery register request
graeme Apr 17, 2024
5b802d8
Add log category for failure recovery
graeme Apr 17, 2024
a21a3dd
Label the config+server tuple values
graeme Apr 17, 2024
bbaaa81
Store server not just info
graeme Apr 17, 2024
5f51518
Abstract reasserting from PacketTunnelProvider
graeme Apr 17, 2024
94e094a
Create failurerecoveryHandler + empty tests
graeme Apr 17, 2024
beedbde
Pull updating adapter config into helper function
graeme Apr 17, 2024
5159e38
Pull deviceManager into a lazy var
graeme Apr 17, 2024
4a52c88
Add failure monitoring and recovery to PTP
graeme Apr 17, 2024
727c271
Remove location from request
graeme Apr 18, 2024
78d3286
Don't regenerate the key on failure recovery
graeme Apr 18, 2024
95ba950
Fix broken existing tests
graeme Apr 18, 2024
ec7a3a1
Write tests for failureRecoveryHandler
graeme Apr 18, 2024
18d6a59
Add comment about todo pixel
graeme Apr 18, 2024
d82cea6
Merge branch 'main' into graeme/vpn-server-failure-detection-recovery
graeme Apr 23, 2024
c729eb1
Update function name to be more consistent
graeme Apr 23, 2024
5513d30
Make sure tunnelUpdateAttempt pixels are fired
graeme Apr 23, 2024
72e07ab
Fix swiftlint
graeme Apr 18, 2024
ce6f8d9
Add initial retry logic
graeme Apr 19, 2024
390d5bb
Add more logging
graeme Apr 19, 2024
b33debe
Fixed no recovery errors
graeme Apr 19, 2024
f76ecc8
Update tests for failure recovery
graeme Apr 19, 2024
d3f387e
Stop failure monitor when handshake succeeds
graeme Apr 19, 2024
ee7ad24
Tidy up from PR feedback
graeme Apr 24, 2024
4ae3962
Add events for pixels
graeme Apr 22, 2024
81ba57a
Tidy up tests
graeme Apr 23, 2024
58a2fbf
Swiftlint fixes
graeme Apr 24, 2024
e82dc3f
Tidy up error handling
graeme Apr 24, 2024
cb9505f
Fix reasserting problem
graeme Apr 24, 2024
4df8447
Swiftlint
graeme Apr 24, 2024
62e5ae1
Another swiftlint fix
graeme Apr 25, 2024
a88f8ea
Make devicemanager property private
graeme Apr 25, 2024
b26a039
Remove... x? (wtf)
graeme Apr 25, 2024
a13e884
Use convenience serverInfo property
graeme Apr 26, 2024
7c9ce07
Use the typealiased tuple
graeme Apr 26, 2024
5766031
Revert to injecting environment not settings
graeme Apr 26, 2024
7e80423
Always stop reasserting regardless of result
graeme Apr 26, 2024
77b69d0
Fix retry logic
graeme Apr 26, 2024
6d3410b
Better handle task cancellation
graeme Apr 26, 2024
0406ce0
Improve readability of GenrateTunnelConfigurationResult
graeme Apr 26, 2024
acb5485
Make sure netP convertible errors are converted
graeme Apr 29, 2024
e0a1456
Merge remote-tracking branch 'origin/main' into graeme/vpn-server-fai…
graeme Apr 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ public actor NetworkProtectionTunnelFailureMonitor {
task?.isCancelled == false
}

private weak var tunnelProvider: PacketTunnelProvider?
private let handshakeReporter: HandshakeReporting
diegoreymendez marked this conversation as resolved.
Show resolved Hide resolved

private let networkMonitor = NWPathMonitor()

Expand All @@ -61,8 +61,8 @@ public actor NetworkProtectionTunnelFailureMonitor {

// MARK: - Init & deinit

init(tunnelProvider: PacketTunnelProvider) {
self.tunnelProvider = tunnelProvider
init(handshakeReporter: HandshakeReporting) {
self.handshakeReporter = handshakeReporter
self.networkMonitor.start(queue: .global())

os_log("[+] %{public}@", log: .networkProtectionMemoryLog, type: .debug, String(describing: self))
Expand Down Expand Up @@ -110,7 +110,7 @@ public actor NetworkProtectionTunnelFailureMonitor {
return
}

let mostRecentHandshake = await tunnelProvider?.mostRecentHandshake() ?? 0
let mostRecentHandshake = (try? await handshakeReporter.getMostRecentHandshake()) ?? 0

guard mostRecentHandshake > 0 else {
os_log("⚫️ Got handshake timestamp at or below 0, skipping check", log: .networkProtectionTunnelFailureMonitorLog, type: .debug)
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
//
// WireGuardAdapter+HandshakeReporting.swift
//
// Copyright © 2023 DuckDuckGo. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

import Foundation

protocol HandshakeReporting {
/// Retrieves the number of seconds of the most recent handshake for the previously added peer entry, expressed relative to the Unix epoch.
///
/// - Throws: ConfigReadingError
/// - Returns: Interval between the most recent handshake and the Unix epoch.
///
func getMostRecentHandshake() async throws -> TimeInterval
}

extension WireGuardAdapter: HandshakeReporting {}
Original file line number Diff line number Diff line change
Expand Up @@ -33,12 +33,12 @@ public final class NetworkProtectionCodeRedemptionCoordinator: NetworkProtection
private let isManualCodeRedemptionFlow: Bool
private let errorEvents: EventMapping<NetworkProtectionError>

convenience public init(environment: VPNSettings.SelectedEnvironment,
convenience public init(settings: VPNSettings,
tokenStore: NetworkProtectionTokenStore,
isManualCodeRedemptionFlow: Bool = false,
errorEvents: EventMapping<NetworkProtectionError>,
isSubscriptionEnabled: Bool) {
self.init(networkClient: NetworkProtectionBackendClient(environment: environment, isSubscriptionEnabled: isSubscriptionEnabled),
self.init(networkClient: NetworkProtectionBackendClient(settings: settings, isSubscriptionEnabled: isSubscriptionEnabled),
tokenStore: tokenStore,
isManualCodeRedemptionFlow: isManualCodeRedemptionFlow,
errorEvents: errorEvents)
Expand Down
7 changes: 7 additions & 0 deletions Sources/NetworkProtection/Logging/Logging.swift
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,10 @@ extension OSLog {
Logging.networkProtectionTunnelFailureMonitorLoggingEnabled ? Logging.networkProtectionTunnelFailureMonitor : .disabled
}

public static var networkProtectionServerFailureRecoveryLog: OSLog {
diegoreymendez marked this conversation as resolved.
Show resolved Hide resolved
Logging.networkProtectionServerFailureRecoveryLoggingEnabled ? Logging.networkProtectionServerFailureRecovery : .disabled
}

public static var networkProtectionConnectionTesterLog: OSLog {
Logging.networkProtectionConnectionTesterLoggingEnabled ? Logging.networkProtectionConnectionTesterLog : .disabled
}
Expand Down Expand Up @@ -90,6 +94,9 @@ struct Logging {
fileprivate static let networkProtectionTunnelFailureMonitorLoggingEnabled = true
fileprivate static let networkProtectionTunnelFailureMonitor: OSLog = OSLog(subsystem: subsystem, category: "Network Protection: Tunnel Failure Monitor")

fileprivate static let networkProtectionServerFailureRecoveryLoggingEnabled = true
fileprivate static let networkProtectionServerFailureRecovery: OSLog = OSLog(subsystem: subsystem, category: "Network Protection: Server Failure Recovery")

fileprivate static let networkProtectionConnectionTesterLoggingEnabled = true
fileprivate static let networkProtectionConnectionTesterLog: OSLog = OSLog(subsystem: subsystem, category: "Network Protection: Connection Tester")

Expand Down
16 changes: 11 additions & 5 deletions Sources/NetworkProtection/NetworkProtectionDeviceManager.swift
Original file line number Diff line number Diff line change
Expand Up @@ -31,13 +31,16 @@ public enum NetworkProtectionServerSelectionMethod: CustomDebugStringConvertible
"avoidServer: \(serverName)"
case .preferredLocation(let location):
"preferredLocation: \(location)"
case .failureRecovery(serverName: let serverName):
"failureRecovery: \(serverName)"
}
}

case automatic
case preferredServer(serverName: String)
case avoidServer(serverName: String)
case preferredLocation(NetworkProtectionSelectedLocation)
case failureRecovery(serverName: String)
}

public protocol NetworkProtectionDeviceManagement {
Expand All @@ -46,7 +49,7 @@ public protocol NetworkProtectionDeviceManagement {
includedRoutes: [IPAddressRange],
excludedRoutes: [IPAddressRange],
isKillSwitchEnabled: Bool,
regenerateKey: Bool) async throws -> (TunnelConfiguration, NetworkProtectionServerInfo)
regenerateKey: Bool) async throws -> (tunnelConfig: TunnelConfiguration, server: NetworkProtectionServer)

}

Expand All @@ -59,12 +62,12 @@ public actor NetworkProtectionDeviceManager: NetworkProtectionDeviceManagement {

private let isSubscriptionEnabled: Bool

public init(environment: VPNSettings.SelectedEnvironment,
public init(settings: VPNSettings,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above this should be rolled back - we can't support updating the environment without restarting the connection. This change can give the reader the false impression that's possible.

tokenStore: NetworkProtectionTokenStore,
keyStore: NetworkProtectionKeyStore,
errorEvents: EventMapping<NetworkProtectionError>?,
isSubscriptionEnabled: Bool) {
self.init(networkClient: NetworkProtectionBackendClient(environment: environment, isSubscriptionEnabled: isSubscriptionEnabled),
self.init(networkClient: NetworkProtectionBackendClient(settings: settings, isSubscriptionEnabled: isSubscriptionEnabled),
tokenStore: tokenStore,
keyStore: keyStore,
errorEvents: errorEvents,
Expand Down Expand Up @@ -115,7 +118,7 @@ public actor NetworkProtectionDeviceManager: NetworkProtectionDeviceManagement {
includedRoutes: [IPAddressRange],
excludedRoutes: [IPAddressRange],
isKillSwitchEnabled: Bool,
regenerateKey: Bool) async throws -> (TunnelConfiguration, NetworkProtectionServerInfo) {
regenerateKey: Bool) async throws -> (tunnelConfig: TunnelConfiguration, server: NetworkProtectionServer) {
graeme marked this conversation as resolved.
Show resolved Hide resolved
var keyPair: KeyPair

if regenerateKey {
Expand Down Expand Up @@ -153,7 +156,7 @@ public actor NetworkProtectionDeviceManager: NetworkProtectionDeviceManagement {
includedRoutes: includedRoutes,
excludedRoutes: excludedRoutes,
isKillSwitchEnabled: isKillSwitchEnabled)
return (configuration, selectedServer.serverInfo)
return (configuration, selectedServer)
} catch let error as NetworkProtectionError {
errorEvents?.fire(error)
throw error
Expand Down Expand Up @@ -194,6 +197,9 @@ public actor NetworkProtectionDeviceManager: NetworkProtectionDeviceManagement {
case .preferredLocation(let location):
serverSelection = .location(country: location.country, city: location.city)
excludedServerName = nil
case .failureRecovery(serverName: let serverName):
serverSelection = .recovery(server: serverName)
excludedServerName = nil
}

let requestBody = RegisterKeyRequestBody(publicKey: keyPair.publicKey,
Expand Down
26 changes: 23 additions & 3 deletions Sources/NetworkProtection/Networking/NetworkProtectionClient.swift
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ struct RegisterKeyRequestBody: Encodable {
let server: String?
let country: String?
let city: String?
let mode: String?

init(publicKey: PublicKey,
serverSelection: RegisterServerSelection) {
Expand All @@ -103,14 +104,29 @@ struct RegisterKeyRequestBody: Encodable {
server = nil
self.country = country
self.city = city
case .recovery(server: let server):
self.server = server
self.country = nil
self.city = nil
}
mode = serverSelection.mode
}
}

enum RegisterServerSelection {
case automatic
case server(name: String)
case location(country: String, city: String?)
case recovery(server: String)

var mode: String? {
switch self {
case .recovery:
"failureRecovery"
case .automatic, .location, .server:
nil
}
}
}

struct RedeemInviteCodeRequestBody: Encodable {
Expand Down Expand Up @@ -175,12 +191,16 @@ final class NetworkProtectionBackendClient: NetworkProtectionClient {
return decoder
}()

private let endpointURL: URL
private var endpointURL: URL {
settings.selectedEnvironment.endpointURL
}

private let isSubscriptionEnabled: Bool
private let settings: VPNSettings

init(environment: VPNSettings.SelectedEnvironment = .default, isSubscriptionEnabled: Bool) {
init(settings: VPNSettings, isSubscriptionEnabled: Bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allowed us to instantiate the client so that we don’t need to re-init the NetworkProtectionDeviceManager every time it's used. Should make testing a bit easier further down the line + also means we can guarantee the client always gets the most up-to-date environment setting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I love this, though. I don't think we should support real-time updating of the environment while the VPN is connected.

If anything, we should discourage it.

If we want to change the environment the whole VPN should be stopped instead.

  1. How necessary is this change? Can we roll it back?
  2. Do we ever expect to support changing the selected environment while the connection is already established?

Note: this wasn't a problem before because it was possible to change environments, but we no longer can support this, and should in fact remove all the code that allows this to happen when the connection is up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also feel like replacing the environment param with settings should at least be considered outside the scope of this PR/branch. It already does a lot and refactoring this might lead to side effect.

Copy link
Contributor Author

@graeme graeme Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t fully explain the above. The reason why it’s important to not recreate the device manager every time is that we could end up with two register requests happening at the same time which could create racey behaviour. The NetworkProtectionDeviceManager is an actor and it makes sense to just have one instance which will queue up these requests rather than creating new instances every time start, stop or failure recovery happens.

Rreplacing the environment param with settings was so that we could still support changing the environment without needing to restart the PacketTunnelProvider, but @diegoreymendez and I had a chat and we think it’s actually probable more sensible to require restarting as the token we’re using would become invalid.

I’m going to revert the injection of settings instead of environment, but I will still create a single instance of the NetworkProtectionDeviceManager. This will mean that we need to manually restart the tunnel for the env change to take effect.

self.isSubscriptionEnabled = isSubscriptionEnabled
self.endpointURL = environment.endpointURL
self.settings = settings
}

public enum GetLocationsError: CustomNSError {
Expand Down