Preliminary benchmarks showed poor results. It seems the dataset quality is much worse than what LLaMA was trained on, or perhaps there is some other issue.
Yet more evidence that top-notch LLMs are not just data + compute; they require some black magic.
Generally, I'm not sure this is bad for safety in the notkilleveryoneism sense: such failures prevent agent overhang and make current (non-lethal) problems more visible.
Hard to say whether it's net good or net bad; there are too many factors, and the impact of each is unclear.