You need to ensure that your model is aligned, robust, and reliable (at least if you want to deploy it and get economic value from it).
I think it suffices for the model to be inner aligned (or deceptively inner aligned) for it to have economic value, at least in domains where (1) there is a usable training signal that corresponds to economic value (e.g. users’ time spent in social media platforms, net income in algo-trading companies, or even the stock price in any public company); and (2) the downside economic risk from a non-robust behavior is limited (e.g. an algo-trading company does not need its model to be robust/reliable, assuming the downside risk from each trade is limited by design).
Sure, I mean, logistic regression has had economic value and it doesn’t seem meaningful to me to say whether it is “aligned” or “inner aligned”. I’m talking about transformative AI systems, where downside risk is almost certainly not limited.
We might get TAI due to efforts by, say, an algo-trading company that develops trading AI systems. The company can limit the mundane downside risks that it faces from non-robust behaviors of its AI systems (e.g. by limiting the fraction of its fund that the AI systems control). Of course, the actual downside risk to the company includes outcomes like existential catastrophes, but it’s not clear to me why we should expect that prior to such extreme outcomes their AI systems would behave in ways that are detrimental to economic value.
I predict that this will not lead to transformative AI; I don’t see how an algorithmic trading system leads to an impact on the world comparable to the industrial revolution.
You can tell a story in which you get an Eliezer-style near-omniscient superintelligent algorithmic trading system that then reshapes the world because it is a superintelligence, and in which the researchers thought it was not a superintelligence and so assumed that the downside risk was bounded, but both clauses (an Eliezer-style superintelligence, and researchers being horribly miscalibrated) seem unlikely to me.
My point here is that in a world where an algo-trading company has the lead in AI capabilities, there need not be a point in time (prior to an existential catastrophe or existential security) where investing more resources into the company’s safety-indifferent AI R&D does not seem profitable in expectation. This claim can be true regardless of researchers’ observations, beliefs, and actions in given situations.