If I may ask, has anyone been able to replicate their original results? I’ve been hesitant to sink many resources into it because it’s not clear.
Also, how much compute do you need? Are you that averse to simple real-world applications such as sidewalk delivery robots?
I can’t be sure I haven’t missed something, but I haven’t found anything, and I did another search just now. Their code is all on GitHub here, so it could be checked quite easily with access to a cluster of GPUs.
Perhaps the more interesting question, though, from a production perspective, is how well this system scales up to medium and large amounts of data and training. Does it fall behind MuZero, and if so, when? Which of the algorithm changes cause this? What if the model complexity were brought back up to the size of MuZero?
It depends on how broken my system is. They claim a single run on a single game takes 7 hours on four 20GB GPUs, so a proper evaluation would take perhaps a week. In reality I’d first need to get it set up, and for a full replication I’d need to port the MCTS code to C, which would be new to me, and get the system distributing work across the GPUs correctly. Then it’d be a case of seeing how well it trains and debugging failures. It all seems to work on simple games, but even with the limited training I can do I think it should be doing better on Atari, though it’s hard to be sure. In total I guess I’d need a couple of months with access. It’s all pretty new to me, which is why I’m doing it!
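As a rough sanity check on the "perhaps a week" figure, here is a back-of-envelope sketch. The 7 hours per run is the claimed number quoted above; the 26 games (the size of the usual Atari 100k suite), the single seed per game, and running everything back to back on one 4-GPU machine are assumptions for illustration, not figures from this thread.

```python
def evaluation_days(hours_per_run: float, n_games: int,
                    n_seeds: int = 1, n_machines: int = 1) -> float:
    """Wall-clock days to run every (game, seed) pair, assuming runs are
    spread evenly across identical machines with no setup overhead."""
    total_hours = hours_per_run * n_games * n_seeds
    return total_hours / n_machines / 24


HOURS_PER_RUN = 7   # claimed: one run on one game, on 4 GPUs
N_GAMES = 26        # assumption: full Atari 100k suite

days = evaluation_days(HOURS_PER_RUN, N_GAMES)
print(f"{days:.1f} days for one seed per game")
```

Under those assumptions the total comes out a little over a week; multiple seeds per game would scale it linearly.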
In terms of potential jobs? Yeah, I’m sufficiently worried about AGI that I plan to at least exhaust my options for working on alignment full time before considering other employment, apart from short-term ML work to help build skills.
Would an AWS G4ad.16xlarge instance be sufficient to match their setup? My open source robotics startup is not particularly well funded, but I am extremely interested in seeing someone replicate their results and could potentially help with some of the compute costs.
I don’t have the funds to offer a full-time position anyway. It’s more that I would like to see reinforcement learning become practical for solving problems such as “get from your current GPS position to this one without crashing”, and your previous comments about improvements seem to indicate some opposition to that sort of thing because of concerns about where it could lead.
I would be interested in collaboration, but I am trying to solve an immediate real-world problem.
Looking at:
Setup Property | AWS G4ad.16xlarge | Claimed Eff0 setup
n GPU | 4 | 4
n CPU | 64 | 96
memory/GPU | 8GB | 20GB
So I’m not sure you could do a perfect replication, but I think you should be able to do a run similar to their 100K-step runs in less than a day.
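To make the "less than a day" guess a bit more concrete, here is a crude sketch using only the numbers in the table above and the claimed 7 hours per run. The scaling rule is an assumption for illustration: it pretends wall-clock time grows in proportion to the scarcest resource (here GPU memory, standing in for a smaller batch size) and ignores differences in GPU model and throughput, so treat the result as an order-of-magnitude estimate only.

```python
# Figures copied from the table above.
g4ad = {"n_gpu": 4, "n_cpu": 64, "gpu_mem_gb": 8}
claimed = {"n_gpu": 4, "n_cpu": 96, "gpu_mem_gb": 20}
CLAIMED_HOURS = 7  # claimed time for one 100K-step run


def resource_ratios(ours: dict, theirs: dict) -> dict:
    """Fraction of each claimed resource that our machine provides."""
    return {key: ours[key] / theirs[key] for key in theirs}


ratios = resource_ratios(g4ad, claimed)

# Assumption: the scarcest resource sets the slowdown (a pessimistic,
# hypothetical rule, not something measured).
slowdown = 1 / min(ratios.values())
est_hours = CLAIMED_HOURS * slowdown

print(ratios)
print(f"estimated {est_hours:.1f} hours per run")
```

Even with this pessimistic rule the estimate stays under 24 hours, though whether the code runs unchanged with 8GB per GPU is a separate question this sketch does not answer.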
I would also potentially be interested in collaboration. There are some things I’m not keen to help with, and especially to publish on, but I think we could probably work something out. I’ll send you a DM tomorrow.
Sounds good. Thanks