But then one needs to factor in “simplicity” or the prior penalty from description length:
Note also that these are average effects; they are just for forming intuitions.
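For concreteness, the tradeoff being visualized can be written as a prior that discounts both description length and computation. This is a sketch in my own notation, not necessarily the exact form used in the paper:

```latex
% Sketch: a prior over world-models \nu penalizing both
% description length \ell(\nu) and per-episode computation c(\nu),
% with speed-penalty parameter 0 < \beta < 1.
w(\nu) \;\propto\; 2^{-\ell(\nu)}\,\beta^{\,c(\nu)}
```

Smaller β penalizes slow world-models more heavily, which is the lever the safety argument turns on.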
Your concern was:
is there a β such that BoMAI is both safe and intelligent enough to answer questions like “how to build a safe unbounded AGI” [after a reasonable number of episodes]?
This was the sort of thing I assumed could be improved upon later once the asymptotic result was established. Now that you’re asking for the improvement, here’s a proposal:
Set β safely. Once enough observations have been provided that you believe human-level AI should be possible, exclude world-models that use fewer than s computation steps per episode, initializing s←1. Every episode, increase s until human-level performance is reached. Assume the average computation time of a malign world-model is at least a constant times that of the "corresponding" benign one (corresponding in the sense of using the same coarse, approximate simulation of the world). Under that assumption, the update s←αs should be safe for some α>1 (with α−1 bounded away from 0).
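The schedule above can be sketched as follows. This is a hedged illustration, not BoMAI's actual training loop: `run_episode` and `performance_is_human_level` are hypothetical stand-ins for whatever episode mechanism and performance test one would actually use.

```python
def grow_compute_floor(run_episode, performance_is_human_level,
                       alpha=1.5, max_episodes=1000):
    """Illustrative sketch of the proposed schedule: each episode,
    exclude world-models using fewer than s computation steps, then
    grow s geometrically (s <- alpha * s, alpha > 1 and bounded away
    from 1) until human-level performance is reached."""
    s = 1  # initial floor on per-episode computation steps
    for _ in range(max_episodes):
        result = run_episode(min_computation_steps=s)
        if performance_is_human_level(result):
            return s  # stop raising the floor once performance suffices
        s = int(alpha * s) + 1  # s <- alpha * s, rounded up
    return s
```

The geometric growth means the floor overshoots the benign world-model's compute cost by at most a factor of α, which is where the "constant factor" assumption about malign world-models does its work.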
I need to think more carefully about what happens here, but I think the design space is large.
Some visualizations which might help with this:
Fixed your images. You have to press space after you use that syntax for the images to actually get fetched and displayed. Sorry for the confusion.
Thanks!