We’re analyzing the mech-int-ungodly Impala architecture from the paper. Basically:
=== Impala block
conv
maxpool2D
---- residual block (x2):
  relu
  conv
  relu
  conv
  residual add from this residual block's input
=== /Impala block
(repeat for 2 more Impala blocks)
---
relu
flatten
fully connected
relu
---
linear policy and value heads
So this mess has fifteen conv layers (five per Impala block, three blocks) and was trained on pixels. We're not doing CoinRun for this MATS sprint, although a good amount of tooling should cross over.
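For reference, here's a minimal PyTorch sketch of that stack. The channel widths (16, 32, 32), 3x3 kernels, stride-2 maxpool, and 256-unit hidden layer are the IMPALA-paper defaults; treat them as assumptions about this particular checkpoint rather than confirmed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """relu -> conv -> relu -> conv, then add the block's input back in."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = self.conv0(F.relu(x))
        out = self.conv1(F.relu(out))
        return x + out  # residual add from the block's input

class ImpalaBlock(nn.Module):
    """conv -> maxpool -> two residual blocks."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res0 = ResidualBlock(out_channels)
        self.res1 = ResidualBlock(out_channels)

    def forward(self, x):
        x = self.pool(self.conv(x))
        return self.res1(self.res0(x))

class ImpalaCNN(nn.Module):
    """Three Impala blocks, then relu -> flatten -> fc -> relu -> heads.
    5 convs per block x 3 blocks = 15 conv layers total."""
    def __init__(self, num_actions: int):
        super().__init__()
        self.blocks = nn.Sequential(
            ImpalaBlock(3, 16), ImpalaBlock(16, 32), ImpalaBlock(32, 32)
        )
        self.fc = nn.LazyLinear(256)  # lazy, so the flattened size isn't hardcoded
        self.policy = nn.Linear(256, num_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, x):
        x = self.blocks(x)
        x = F.relu(torch.flatten(x, start_dim=1))
        x = F.relu(self.fc(x))
        return self.policy(x), self.value(x)

if __name__ == "__main__":
    net = ImpalaCNN(num_actions=15)  # procgen obs are 64x64 RGB, 15 actions
    logits, value = net(torch.randn(1, 3, 64, 64))
    print(logits.shape, value.shape)  # torch.Size([1, 15]) torch.Size([1, 1])
```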
This has presented some challenges: there's no linearity to exploit by decomposing an ongoing residual stream into per-head contributions, the way you can in a transformer, because ReLUs sit in the path between residual adds.
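A toy illustration of what's missing (my framing, not from our codebase): a purely linear readout distributes over the sum of component writes, so per-component attribution is exact, but one ReLU in the path breaks that additivity.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 4)
write_a = torch.randn(4)  # pretend "head A" writes this into the stream
write_b = torch.randn(4)  # pretend "head B" writes this

# Linear readout: each write's contribution is well defined, because the
# map distributes over the sum.
assert torch.allclose(W @ (write_a + write_b), W @ write_a + W @ write_b)

# Put a ReLU in the path (as the conv stack does between residual adds)
# and additivity fails, so exact per-component attribution is gone.
lhs = torch.relu(write_a + write_b)
rhs = torch.relu(write_a) + torch.relu(write_b)
print(torch.allclose(lhs, rhs))  # False in general
```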
Are you using decision transformers or other RL agents on Procgen environments? Also, do you plan to work on CoinRun?