simple_dqn_tf2.py doesn't allow for multiple return actions #56

If you change the `n_actions` parameter, `agent.learn()` fails as soon as the model tries to learn:

```
164/164 [==============================] - 0s 998us/step
164/164 [==============================] - 0s 887us/step
[[[nan nan nan ... nan nan nan]]

 [[nan nan nan ... nan nan nan]]

 [[nan nan nan ... nan nan nan]]

 ...

 [[nan nan nan ... nan nan nan]]

 [[nan nan nan ... nan nan nan]]

 [[nan nan nan ... nan nan nan]]] [   0    1    2 ... 5245 5246 5247] [list([2, 2, 5]) list([2, 1, 6]) list([3, 0, 6]) ... list([3, 0, 7])
 list([3, 8, 5]) list([3, 0, 3])]
Traceback (most recent call last):
  File "main.py", line 30, in <module>
    agent.learn()
  File "simple_dqn_tf2.py", line 95, in learn
    self.gamma * np.max(q_next, axis=1)*dones
ValueError: operands could not be broadcast together with shapes (5248,82) (5248,)
```

This definitely has to do with the shape of the stored action. I'm just not sure how to fix it. 
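The `list([2, 2, 5])` entries in the output above suggest the actions ended up in an object-dtype array of Python lists. As a rough sketch only (not the repository's actual buffer code), a multi-component action could instead be stored in a fixed-shape integer array. The names `mem_size`, `n_components`, and `action_memory` here are hypothetical, and three components per action is an assumption read off the printed lists.

```python
import numpy as np

# Hypothetical sizes: 4 stored transitions, 3 components per action
# (the component count is assumed from the printed lists such as [2, 2, 5]).
mem_size = 4
n_components = 3

# One row per transition, one column per action component.
action_memory = np.zeros((mem_size, n_components), dtype=np.int32)

sample_actions = [[2, 2, 5], [2, 1, 6], [3, 0, 6], [3, 0, 7]]
for index, action in enumerate(sample_actions):
    action_memory[index] = action

# A regular 2-D integer array, not an object array of lists.
print(action_memory.dtype, action_memory.shape)
```

Storing the actions this way keeps their shape predictable, though the indexing of the Q-values in `learn()` would still need to be adapted to more than one component per action.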

5248 = n_actions * batch_size (82 * 64)
82 = n_actions
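The shapes in the traceback can be reproduced with NumPy alone. The sketch below assumes that `q_next` carries an extra singleton axis, i.e. shape `(5248, 1, 82)`, which is what the `[[[nan ...]]]` nesting in the printed output suggests; that is an inference from the log, not something confirmed in `simple_dqn_tf2.py`. It also shows that dropping that axis makes the reduction line up with `dones`. The NaN values are a separate symptom that this sketch does not address.

```python
import numpy as np

n_actions = 82
n_rows = 5248  # leading dimension reported in the traceback

# Assumption: the predictions have a singleton middle axis, (5248, 1, 82).
q_next = np.zeros((n_rows, 1, n_actions))
dones = np.ones(n_rows)

# Reducing over axis=1 removes the singleton axis, not the action axis.
reduced = np.max(q_next, axis=1)
print(reduced.shape)

try:
    reduced * dones
    broadcast_ok = True
except ValueError as exc:
    # Same class of error as in the traceback: (5248, 82) vs (5248,)
    broadcast_ok = False
    print(exc)

# Dropping the singleton axis first gives one max Q-value per row.
q_next_2d = q_next.reshape(n_rows, n_actions)
target_term = np.max(q_next_2d, axis=1) * dones
print(target_term.shape)
```

If this assumption holds, the extra axis most likely comes from the shape of the states passed to the model, so checking the shape of the stored states before calling `predict` would be a reasonable first step.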
