
Traveling Salesman Problem solved with a linear programming (network) model, Q-learning (reinforcement learning), and simulated annealing


adames-ouro/NetworkModel-QLearning-SimulatedAnnealing


Network Model, Q-Learning & Simulated Annealing

Network Image


Problem Statement:

In the Traveling Salesman Problem (TSP), we have a set of cities, and the distances between each pair of cities are known. The problem is to find the shortest possible route that visits each city exactly once and returns to the starting city (often referred to as the depot).
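The objective all three approaches minimize is the total length of a closed tour. A minimal sketch, using made-up coordinates (the repository's actual data may differ):

```python
import math

# Hypothetical city coordinates; any symmetric distance matrix works the same way.
cities = [(0, 0), (3, 0), (3, 4), (0, 4)]

def tour_length(tour, coords):
    """Total distance of a closed tour that returns to the depot."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]  # wrap back to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

print(tour_length([0, 1, 2, 3], cities))  # rectangle perimeter: 14.0
```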


Math Image


Q Image

Algorithm

  1. Initialize Q-values arbitrarily for all state-action pairs.
  2. For each episode:
    1. Start at a random state s.
    2. For each step in the episode:
      1. Choose an action a from state s using a policy derived from the current Q (e.g., epsilon-greedy).
      2. Take the action, and observe the reward r and the new state s'.
      3. Update the Q-value using the Bellman equation.
      4. Set s = s'.
    3. Repeat until a terminal state is reached.
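The steps above can be sketched as a single tabular Q-learning episode. The hyperparameters and the negative-distance reward are illustrative assumptions, not the repository's exact implementation:

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # illustrative hyperparameters

def q_learning_episode(Q, n_cities, dist):
    """Run one episode of tabular Q-learning over a TSP-like state space."""
    state = random.randrange(n_cities)            # 1. start at a random state s
    visited = {state}
    while len(visited) < n_cities:                # terminal: all cities visited
        actions = [a for a in range(n_cities) if a not in visited]
        if random.random() < EPSILON:             # epsilon-greedy policy
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[state][a])
        reward = -dist[state][action]             # shorter hops earn more reward
        future = max((Q[action][a] for a in actions if a != action), default=0.0)
        # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state = action                            # s = s'
        visited.add(state)
```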

Exploration vs. Exploitation

A significant challenge in Q-learning is the trade-off between exploration (trying new actions) and exploitation (relying on known information). A common solution is the epsilon-greedy policy:

  • With probability epsilon, choose a random action.
  • With probability 1 - epsilon, choose the action with the highest Q-value.

Over time, epsilon is decreased so that the agent explores less and exploits more.


Simulated Annealing Image

Simulated Annealing (SA) is a probabilistic optimization technique inspired by the annealing process in metallurgy. Annealing involves heating a material and then cooling it slowly. Simulated annealing applies this concept to find an approximation to the global optimum of a function.

The Traveling Salesman Problem (TSP) is a classic optimization problem where a salesman wishes to visit a number of cities exactly once and return to the starting city while minimizing the total distance traveled.

Key Concepts:

  • Temperature T: A control parameter that determines the probability of accepting a solution worse than the current one. A high temperature increases the chance of accepting worse solutions, promoting exploration, while a low temperature tends to accept only better solutions, promoting exploitation.

  • Cooling Schedule: How the temperature decreases over time. A typical method is geometric cooling, where the temperature is multiplied by a constant factor alpha.

  • Neighbor Solution Generation: In TSP, a neighbor can be generated by various methods such as swapping two cities, reversing a subsection of the tour, or other perturbations.
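Putting the three concepts together, a minimal SA loop for TSP might look like the sketch below. The temperature, cooling factor, and two-city-swap neighborhood are illustrative choices, not necessarily the repository's:

```python
import math
import random

def simulated_annealing(coords, T=100.0, alpha=0.995, T_min=1e-3):
    """Approximate a short tour with geometric cooling and two-city swaps."""
    n = len(coords)

    def length(tour):
        return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
                   for i in range(n))

    tour = list(range(n))
    random.shuffle(tour)
    cur_len = length(tour)
    best, best_len = tour[:], cur_len
    while T > T_min:
        i, j = random.sample(range(n), 2)         # neighbor: swap two cities
        tour[i], tour[j] = tour[j], tour[i]
        delta = length(tour) - cur_len
        # Accept better tours always; worse tours with probability exp(-delta / T).
        if delta < 0 or random.random() < math.exp(-delta / T):
            cur_len += delta
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        else:
            tour[i], tour[j] = tour[j], tour[i]   # undo the swap
        T *= alpha                                # geometric cooling schedule
    return best, best_len
```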