Achieving Tighter Finite-Time Rates for Heterogeneous Federated Stochastic Approximation under Markovian Sampling
- This is the second paper in pursuit of my PhD degree.
- This paper has been submitted to NeurIPS 2025.
- About this paper
- There are two families of algorithms in reinforcement learning (RL): policy-iteration and value-iteration methods.
- Recall that in my last paper, Towards Fast Rates for Federated and Multi-Task Reinforcement Learning, we focused on heterogeneous FRL problems with policy-iteration methods, policy gradient methods to be precise.
- We propose FedHSA, an algorithm designed for value-iteration methods in heterogeneous FRL settings, complementing the previous paper's coverage of FRL algorithms.
- We consider the generic federated stochastic approximation framework, where the goal is for the agents to communicate intermittently via a server and find the fixed point of an average of different contractive operators. This formulation captures many RL algorithms such as TD-learning and Q-learning.
- The proposed FedHSA algorithm is proved to converge precisely to the desired point with collaborative speedup and no heterogeneity bias under Markovian sampling.
- The Markovian data and heterogeneous operators account for its applicability to FRL algorithms.
- The framework we study in this paper is the stochastic approximation (SA) framework, which captures a large class of problems in optimization, control and RL such as SGD, TD-learning and Q-learning.
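As a concrete toy instance of the SA template, the sketch below iterates $\theta_{t+1} = \theta_t + \alpha_t\,(G(\theta_t, o_t) - \theta_t)$ to find the fixed point of a contractive linear operator. The operator $(A, b)$ and the i.i.d. Gaussian noise are illustrative stand-ins (the paper's noise is Markovian), not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contractive operator G(theta) = A @ theta + b with spectral
# radius of A below 1; its unique fixed point solves theta = A @ theta + b.
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
b = np.array([1.0, -1.0])
theta_star = np.linalg.solve(np.eye(2) - A, b)  # exact fixed point

theta = np.zeros(2)
for t in range(1, 20001):
    noise = rng.normal(scale=0.1, size=2)   # i.i.d. stand-in for Markovian noise
    g = A @ theta + b + noise               # noisy operator evaluation G(theta, o_t)
    alpha = 2.0 / (t + 10)                  # diminishing step size
    theta += alpha * (g - theta)            # SA update toward the fixed point

print(np.linalg.norm(theta - theta_star))   # small residual error
```

With a diminishing step size, the iterate forgets its initialization geometrically while the noise contribution is averaged out, which is the mechanism behind the finite-time rates discussed below.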
- We wish to speed up the learning process by collecting data collaboratively from different environments, taking inspiration from federated learning (FL).
- However, in practice, data usually come from different (heterogeneous) environments, and data samples are temporally correlated, as is the case in practical RL algorithms.
- The problem of interest: find $\theta^\star$ satisfying $\bar{G}(\theta^\star) = \theta^\star$, where $\bar{G} = \frac{1}{M}\sum_{i=1}^{M} \bar{G}_i$ is the average of the agents' true local operators.
- Agent $i$ only has access to a sequence of noisy operators $G_i(\cdot, o_{i,t})$.
- The observations $\{o_{i,t}\}$ are generated from an ergodic Markov chain specific to agent $i$, capturing temporal correlations between data.
- Agents communicate intermittently through a central server.
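For intuition on the observation model, here is a minimal sketch of one agent's observations being generated by an ergodic two-state Markov chain; the transition matrix `P` is an arbitrary illustrative choice. Ergodicity means the empirical state frequencies approach the chain's stationary distribution despite the temporal correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ergodic two-state chain for one agent's observations o_{i,t}
# (illustrative only; the paper allows general agent-specific chains).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def sample_trajectory(T, s0=0):
    states, s = [], s0
    for _ in range(T):
        s = rng.choice(2, p=P[s])   # next state depends on the current one
        states.append(s)
    return np.array(states)

traj = sample_trajectory(50_000)
# For this P, the stationary distribution is pi = (2/3, 1/3).
print(np.bincount(traj, minlength=2) / len(traj))
```

Consecutive samples are correlated through `P`, which is exactly the temporal correlation the finite-time analysis has to handle on top of cross-agent heterogeneity.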
- In the single-agent (centralized) setting, the finite-time rate for SA is $\tilde{O}\left(\exp(-T) + \frac{\sigma^2}{T}\right)$, where $T$ is the number of iterations and $\sigma^2$ captures the variance of the noise.
- Key Question: Is it possible to converge exactly to $\theta^\star$ in the federated case, with an $M$-fold reduction in the variance term, capturing the benefit of collaboration?
- The local noisy operators $G_i$ are $L$-Lipschitz.
- The average true operator $\bar{G}$ is 1-point strongly monotone w.r.t. $\theta^\star$.
- Each agent's underlying Markov chain is ergodic.
- Observations generated for different agents are independent.
- Even when each agent has access to its true local operator $\bar{G}_i$ and the number of local steps is only $H=2$, we can prove that the vanilla FedAvg-like local SA algorithm fails to converge to the true optimum $\theta^\star$, the ''client-drift'' issue known from federated optimization. - To be specific, there is a heterogeneity-induced bias term in the final bound that impedes exact convergence. Can we get rid of that bias while still enjoying the benefit of collaboration?
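The client-drift effect can be reproduced numerically. The sketch below runs vanilla FedAvg-style local SA with $H=2$ deterministic local steps on two hypothetical heterogeneous linear operators (the $A_i$, $b_i$ values are made up for illustration): the algorithm settles at a point strictly away from the fixed point $\theta^\star$ of the average operator, even with no noise at all:

```python
import numpy as np

# Two agents with heterogeneous *deterministic* linear operators
# G_i(theta) = A_i @ theta + b_i (illustrative instance, not from the paper).
A1, b1 = np.array([[0.5]]), np.array([2.0])
A2, b2 = np.array([[0.1]]), np.array([-2.0])
Abar, bbar = (A1 + A2) / 2, (b1 + b2) / 2
theta_star = np.linalg.solve(np.eye(1) - Abar, bbar)  # fixed point of Gbar

alpha, H = 0.1, 2                       # constant step, H = 2 local steps
theta = np.zeros(1)
for _ in range(5000):                   # communication rounds
    local_iterates = []
    for A, b in [(A1, b1), (A2, b2)]:
        th = theta.copy()
        for _ in range(H):              # H local SA steps per round
            th += alpha * (A @ th + b - th)
        local_iterates.append(th)
    theta = sum(local_iterates) / 2     # server averages the local iterates

print(theta, theta_star)  # the limit point differs from theta_star
```

The bias arises because averaging the *composed* $H$-step local maps is not the same as composing the map of the *averaged* operator; with $H=1$ the two coincide and the bias vanishes, which is why the issue only appears with multiple local steps.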
- Two types of data correlation have to be tackled: i) temporal correlations within any given agent's data, and ii) spatial correlations induced by exchanging data across agents.
- Matching centralized rates, with the variance term reduced $M$-fold.
- Linear speedup w.r.t. the number of agents with no heterogeneity bias.
- FedHSA has a lower error floor compared to the vanilla federated SA algorithm.
- The error floor improves further as the number of agents grows.