
Traveling Salesman Problem solved with a linear programming (network) model, Q-learning (reinforcement learning), and simulated annealing


adames-ouro/NetworkModel-QLearning-SimulatedAnnealing


Network Model, Q-Learning & Simulated Annealing

Network Image


Problem Statement:

In the Traveling Salesman Problem (TSP), we have a set of cities, and the distances between each pair of cities are known. The problem is to find the shortest possible route that visits each city exactly once and returns to the starting city (often referred to as the depot).
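The objective all three approaches minimize is the total length of a closed tour. A minimal sketch, using made-up coordinates (the repository's actual data may differ):

```python
import math

# Hypothetical city coordinates; any symmetric distance matrix works the same way.
cities = [(0, 0), (3, 0), (3, 4), (0, 4)]

def tour_length(tour, coords):
    """Total distance of a closed tour that returns to the depot."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]  # wrap back to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

print(tour_length([0, 1, 2, 3], cities))  # rectangle perimeter: 14.0
```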


Math Image


Q Image

Algorithm

  1. Initialize Q-values arbitrarily for all state-action pairs.
  2. For each episode:
    1. Start at a random state s.
    2. For each step in the episode:
      1. Choose an action a from state s using a policy derived from the current Q (e.g., epsilon-greedy).
      2. Take the action, and observe the reward r and the new state s'.
      3. Update the Q-value using the Bellman equation.
      4. Set s = s'.
    3. Repeat until a terminal state is reached.
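The steps above can be sketched as a single tabular Q-learning episode. The hyperparameters and the negative-distance reward are illustrative assumptions, not the repository's exact implementation:

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # illustrative hyperparameters

def q_learning_episode(Q, n_cities, dist):
    """Run one episode of tabular Q-learning over a TSP-like state space."""
    state = random.randrange(n_cities)            # 1. start at a random state s
    visited = {state}
    while len(visited) < n_cities:                # terminal: all cities visited
        actions = [a for a in range(n_cities) if a not in visited]
        if random.random() < EPSILON:             # epsilon-greedy policy
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[state][a])
        reward = -dist[state][action]             # shorter hops earn more reward
        future = max((Q[action][a] for a in actions if a != action), default=0.0)
        # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state = action                            # s = s'
        visited.add(state)
```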

Exploration vs. Exploitation

A significant challenge in Q-learning is the trade-off between exploration (trying new actions) and exploitation (relying on known information). A common solution is the epsilon-greedy policy:

  • With probability epsilon, choose a random action.
  • With probability 1 - epsilon, choose the action with the highest Q-value.

Over time, epsilon is decreased so that the agent explores less and exploits more.


Simulated Annealing Image

Simulated Annealing (SA) is a probabilistic optimization technique inspired by the annealing process in metallurgy. Annealing involves heating a material and then cooling it slowly. Simulated annealing applies this concept to find an approximation to the global optimum of a function.

The Traveling Salesman Problem (TSP) is a classic optimization problem where a salesman wishes to visit a number of cities exactly once and return to the starting city while minimizing the total distance traveled.

Key Concepts:

  • Temperature T: A control parameter that determines the probability of accepting a solution worse than the current one. A high temperature increases the chance of accepting worse solutions, promoting exploration, while a low temperature tends to accept only better solutions, promoting exploitation.

  • Cooling Schedule: How the temperature decreases over time. A typical method is geometric cooling, where the temperature is multiplied by a constant factor alpha.

  • Neighbor Solution Generation: In TSP, a neighbor can be generated by various methods such as swapping two cities, reversing a subsection of the tour, or other perturbations.
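Putting the three concepts together, a minimal SA loop for TSP might look like the sketch below. The temperature, cooling factor, and two-city-swap neighborhood are illustrative choices, not necessarily the repository's:

```python
import math
import random

def simulated_annealing(coords, T=100.0, alpha=0.995, T_min=1e-3):
    """Approximate a short tour with geometric cooling and two-city swaps."""
    n = len(coords)

    def length(tour):
        return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
                   for i in range(n))

    tour = list(range(n))
    random.shuffle(tour)
    cur_len = length(tour)
    best, best_len = tour[:], cur_len
    while T > T_min:
        i, j = random.sample(range(n), 2)         # neighbor: swap two cities
        tour[i], tour[j] = tour[j], tour[i]
        delta = length(tour) - cur_len
        # Accept better tours always; worse tours with probability exp(-delta / T).
        if delta < 0 or random.random() < math.exp(-delta / T):
            cur_len += delta
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        else:
            tour[i], tour[j] = tour[j], tour[i]   # undo the swap
        T *= alpha                                # geometric cooling schedule
    return best, best_len
```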