With respect to the RLNF idea, we are definitely very sympathetic to wireheading concerns. We think the approach is promising if we can obtain better reward signals from all of the sub-symbolic information that neural signals offer, in order to better understand human intent; but, as you correctly pointed out, that same information could also be used to better trick the human evaluator. We think this already happens to a lesser extent, and we expect that both current and future methods will have to account for this particular risk.
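To make the wireheading concern concrete, here is a minimal, purely illustrative sketch (our own toy example, not AE's actual RLNF design) of a reward model that conditions on both a response embedding and a vector of sub-symbolic neural features recorded from the evaluator, trained with a standard pairwise-preference loss. All names, shapes, and architectural choices are assumptions made up for illustration.

```python
# Toy sketch: a pairwise-preference ("RLHF-style") reward model that also
# ingests neural-signal features. Everything here is illustrative; the
# dimensions, architecture, and feature pipeline are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralFeedbackRewardModel(nn.Module):
    def __init__(self, text_dim: int = 768, neural_dim: int = 64, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + neural_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar reward
        )

    def forward(self, text_emb: torch.Tensor, neural_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate the response embedding with neural features recorded
        # while the evaluator reads the response, then score the pair.
        return self.net(torch.cat([text_emb, neural_feats], dim=-1)).squeeze(-1)


def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: push the preferred response's reward above
    # the rejected one's. Nothing here distinguishes "the response was
    # genuinely better" from "the response elicited an approving neural
    # signature" -- which is exactly the trick-the-evaluator risk above.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    model = NeuralFeedbackRewardModel()
    batch = 8
    text_chosen, text_rejected = torch.randn(batch, 768), torch.randn(batch, 768)
    neural_chosen, neural_rejected = torch.randn(batch, 64), torch.randn(batch, 64)
    loss = preference_loss(
        model(text_chosen, neural_chosen),
        model(text_rejected, neural_rejected),
    )
    loss.backward()
    print(f"toy preference loss: {loss.item():.3f}")
```

The point of the sketch is simply that the richer signal and the richer attack surface enter through the same interface, so any mitigation has to operate on the neural features themselves rather than being bolted on afterwards.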
More generally, we strongly agree that building out BCI is like a tightrope walk. Our original theory of change explicitly focuses on this: in expectation, BCI is not going to be built safely by the giant tech companies of the world, largely because of short-term, profit-related incentives—which is why we want to build it ourselves as a bootstrapped company whose revenue has come from things other than BCI. Accordingly, we can focus on walking this BCI developmental tightrope safely and for the benefit of humanity, without worrying about whether we profit from this work.
We do call some of these concerns out in the post, e.g.:
“We also recognize that many of these proposals have a double-edged sword quality that requires extremely careful consideration—e.g., building BCI that makes humans more competent could also make bad actors more competent, give AI systems manipulation-conducive information about the processes of our cognition that we don’t even know, and so on. We take these risks very seriously and think that any well-defined alignment agenda must also put forward a convincing plan for avoiding them (with full knowledge of the fact that if they can’t be avoided, they are not viable directions).”
Overall—in spite of the double-edged nature of alignment work potentially facilitating capabilities breakthroughs—we think it is critical to avoid base rate neglect in acknowledging how unbelievably aggressively people (who are generally alignment-ambivalent) are now pushing forward capabilities work. Against this base rate, we suspect our contributions to inadvertently pushing forward capabilities will be relatively negligible. This does not imply that we shouldn’t be extremely cautious, have rigorous info/exfohazard standards, think carefully about unintended consequences, etc—it just means that we want to be pragmatic about the fact that we can help solve alignment while being reasonably confident that the overall expected value of this work will outweigh the overall expected harm (again, especially given the incredibly high, already-happening background rate of alignment-ambivalent capabilities progress).
I can push back on this somewhat by noting that most risks from BCI may lie outside the scope of control of any company that builds it and “plugs people in”, residing instead in the wider economy and social ecosystem. The only thing that may matter is the bandwidth and the noisiness of the information channel between the brain and the digital sphere, and that seems agnostic to whether a profit-maximising, risk-ambivalent, or risk-conscious company is building the BCI.
It’s a great point that the broader social and economic implications of BCI extend beyond the control of any single company, AE no doubt included. Still, while the bandwidth and noisiness of the tech are potentially orthogonal to the builder’s intentions, companies with unambiguous humanity-forward missions (like AE) are far more likely to actually care about the societal implications, and therefore to build BCI that attempts to address these concerns at the ground level.
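As a rough way of making “bandwidth and noisiness” precise (our gloss, not something the commenter stated), one can think of the brain–digital link in Shannon–Hartley terms, where the achievable information rate grows linearly with bandwidth but only logarithmically with signal-to-noise ratio:

$$C = B \log_2\!\left(1 + \frac{S}{N}\right)$$

Here C is channel capacity, B is bandwidth, and S/N is the signal-to-noise ratio. Who builds the device does not change the physics of that channel, which is the force of the point above; our claim is only that the builder's intentions shape everything layered on top of it: what gets recorded, how it is stored, and what it is optimized against.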
In general, we expect the by-default path to powerful BCI (i.e., one where we are completely uninvolved) to be negative: rife with s-risks, significant invasions of privacy and autonomy, and so on. That is why we are actively working to nudge the developmental trajectory of BCI in a more positive direction—i.e., one where the only major incentive is to build the most human-flourishing-conducive BCI tech we possibly can.