Agree that some way of talking is useful, but disagree with where you ended up. “Human-level” is part of an ontology from far away, and the need you’re pointing to is for an ontology from close up.
Here’s what I mean by “far away” vs “close up”:
When I’m two miles from my house, “towards my house” is a well-defined direction. When I’m inside my house, “towards my house” is meaningless as a direction to move. The close-up ontology needs to expand details that the far-away one compresses.
For operating when we’re close to human-level AI, we don’t want “human level” to remain a basic location on the map; we want to split it into finer detail. We could split cognition into skills like working memory, knowledge, planning, creativity, learning, etc., splitting them further, or adding dimensions for their style and not just their quality, as needed.
Yeah, that’s a good point.
But I also think there may be a small refinement of these definitions that preserves most of their value for the purposes of planning for safety.
If you simply narrow the focus of the terms to “world optimization power”, then I think the framing basically holds.
Subskills like working memory may or may not be present in a given entity being evaluated; the evaluation metric doesn’t care, since it only measures the downstream output of world optimization.
Admittedly, this isn’t something we have particularly good measures of. It’s a wide target to try to cover with a static eval. But I still think it works conceptually, so long as we acknowledge that our measures of it are limited approximations.