
DQN report. [QUESTION] #1180

Open
smbrine opened this issue Mar 29, 2023 · 0 comments

smbrine commented Mar 29, 2023

Introduction
My model acts like a compulsive masochist. Great beginning, innit? I'll attach my parameters a bit further down in the text, but don't rely on them too strictly, because I keep changing them all the time due to the following:

Describe the bug
I have a very simple ping-pong env (a custom one, not gym) and I set up an agent without any issues, except possibly one: the reward system might be the problem, but even so the agent shouldn't act the way it does. My reward system is based on a simple

if not done:
    reward = 1
else:
    reward = 0

so the agent should presumably try to collect as many reward points as possible, and it does so, but only for the first 10k steps. None of the parameters affects this behavior; of course the hyperparameters change its performance, but nothing more. After 10k steps it starts dodging the ball; occasionally it scores about 5-10 points, but then dodges for a hundred episodes afterwards.
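For context, the snippet above would presumably live inside the env's step(). A minimal sketch of such an env, assuming an image observation for the CnnPolicy and a three-action paddle (the observation shape, action space, and the _ball_missed helper are my guesses, not the actual code):

import gym
import numpy as np
from gym import spaces

class PingPongEnv(gym.Env):
    """Hypothetical skeleton; the real env drives the arcade game
    via direct input and grabs frames with win32gui."""

    def __init__(self):
        super().__init__()
        # Image observations for the CnnPolicy (shape is a guess).
        self.observation_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(3)  # e.g. stay / up / down

    def reset(self):
        return self.observation_space.sample()  # placeholder for a frame grab

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder for a frame grab
        done = self._ball_missed()             # hypothetical game-over check
        reward = 1 if not done else 0          # +1 for every surviving step
        return obs, reward, done, {}           # gym 0.21 four-tuple API

    def _ball_missed(self):
        return False  # stub; the real check would read the game state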
Code example
I will put everything important (imo) in a single logical sequence, but I can invite you to the repo if needed. rew_mean looks like this [graph attachment]: as you can see, it crashes after 10k steps. Btw, after the learning_starts threshold is passed it crashes even lower, and I don't know how that's even possible. Here's one more graph [graph attachment].

from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecTransposeImage

framebuffer = 5
learning_rate = 0.0001
total_timesteps = 10000000  # something like infinity. I have a callback every 5k steps (sketched below).

env = PingPongEnv()
env = DummyVecEnv([lambda: env])
env = VecTransposeImage(env)
env = VecFrameStack(env, n_stack=framebuffer)

model = DQN('CnnPolicy', env, verbose=1, tau=0.001, tensorboard_log=LOG_DIR,
            learning_rate=learning_rate, buffer_size=10000, learning_starts=100000,
            train_freq=1000, target_update_interval=20000, exploration_initial_eps=1,
            exploration_final_eps=0.00001, exploration_fraction=0.001)
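The 5k-step callback mentioned above isn't shown in the issue; a minimal sketch, assuming it is SB3's CheckpointCallback (the save path and name prefix are placeholders):

from stable_baselines3.common.callbacks import CheckpointCallback

# Hypothetical stand-in for the 5k-step callback: save a checkpoint every 5000 steps.
checkpoint_callback = CheckpointCallback(save_freq=5000, save_path='./checkpoints/',
                                         name_prefix='dqn_pingpong')

model.learn(total_timesteps=total_timesteps, callback=checkpoint_callback)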

System Info
Describe the characteristics of your environment:

  • Since I'm using a hell of a lot of libraries for a single purpose, I can't list each and every one, but generally I'm using conda when available and pip when conda can't find the required package.
  • I have a single GPU, a GTX 1060 6G, but it's only about 10-15% utilized and memory usage is around 3-4 GB. RAM isn't maxed out either, nor are the CPU and disks (just in case).
  • Python 3.10.10, conda 23.1.0 (latest at the moment).
  • I'm not using TensorFlow, so it's not even installed. PyTorch is 2.0.0, the latest stable at the moment.
  • stable-baselines3 v1.7.0, pytorch-cuda v11.8, gym v0.21.0.

Additional context
The ping-pong game was written with arcade by my brother, but I'm not sure that's useful info because I'm not diving into his code; I use direct input instead.
I use win32gui to grab images, but it returns about 150-200 images per second, so it's definitely not the problem.
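For reference, a win32gui-based frame grab typically looks roughly like the sketch below; the window title and the grab_window name are my assumptions, not the actual capture code from the issue.

import numpy as np
import win32con
import win32gui
import win32ui

def grab_window(title='Ping Pong'):  # window title is a guess
    hwnd = win32gui.FindWindow(None, title)
    left, top, right, bottom = win32gui.GetClientRect(hwnd)
    w, h = right - left, bottom - top

    # Copy the window contents into an in-memory bitmap.
    hwnd_dc = win32gui.GetWindowDC(hwnd)
    src_dc = win32ui.CreateDCFromHandle(hwnd_dc)
    mem_dc = src_dc.CreateCompatibleDC()
    bmp = win32ui.CreateBitmap()
    bmp.CreateCompatibleBitmap(src_dc, w, h)
    mem_dc.SelectObject(bmp)
    mem_dc.BitBlt((0, 0), (w, h), src_dc, (0, 0), win32con.SRCCOPY)

    # BGRA bytes -> (h, w, 3) uint8 array, dropping the alpha channel.
    frame = np.frombuffer(bmp.GetBitmapBits(True), dtype=np.uint8)
    frame = frame.reshape(h, w, 4)[:, :, :3].copy()

    # Release GDI resources to avoid leaking handles.
    win32gui.DeleteObject(bmp.GetHandle())
    mem_dc.DeleteDC()
    src_dc.DeleteDC()
    win32gui.ReleaseDC(hwnd, hwnd_dc)
    return frame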
