This post feels like many words in it came from a language model. I’d like to know whose opinion is whose.

You’re right, I used Claude in some parts, like the energy bottleneck section (for example, I fed it some of the Zuck interview transcript to digest the info for the post), but I rewrote what I thought needed rewriting and agree with the resulting post. Though, as I said, I didn’t want to spend more time on this to clarify things further, so I just posted it.
Overall, I wanted to put something out quickly regarding photonic computing (and share thoughts on the OAI hiring) because I rarely hear compute and governance people talk about it. (However, someone working on governance did message me to say the same thing, and they’ve recently started looking into this, so I’m happy it got me connected with them!)
Energy requirements are an issue locally, when you need to build a single large datacenter on short notice. With distributed training, you only need to care about a global energy budget. The world generates about 20,000 GW of power, and the H100s that 1% of that could power would cost trillions of dollars (a rough back-of-the-envelope version is sketched after this comment).
I think the crux for feasibility of further scaling (beyond $10-$50 billion) is whether systems with currently-reasonable cost keep getting sufficiently more useful, for example by enabling economically valuable agentic behavior: things like preparing pull requests based on feature/bug discussion on an issue tracker, or fixing failing builds.
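For concreteness, here’s a rough back-of-the-envelope version of the “1% of world power” estimate above; the per-H100 power draw and unit price used below are my own ballpark assumptions, not figures from the comment.

```python
# Back-of-the-envelope sketch of the "1% of world power" estimate above.
# The per-GPU power draw and price are ballpark assumptions.
world_power_w = 20_000e9          # ~20,000 GW of global generating capacity
budget_w = 0.01 * world_power_w   # 1% of that, ~200 GW

h100_power_w = 700       # assumed board power per H100, in watts
h100_price_usd = 30_000  # assumed price per H100, in dollars

n_gpus = budget_w / h100_power_w    # ~2.9e8 GPUs
cost_usd = n_gpus * h100_price_usd  # ~$8.6e12

print(f"~{n_gpus:.1e} H100s, costing ~${cost_usd / 1e12:.1f} trillion")
```

Even if datacenter overhead roughly doubled the effective per-GPU power draw, the implied hardware cost would still be in the trillions.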
Agreed and thanks for sharing that comment.
To clarify, my point was not that the energy itself would be unavailable (which is what I was getting at by giving the percentage of US and China energy usage it would represent) but rather, as you point out, the cost of that energy and whether companies will seek out more energy-efficient forms of computing to reduce it. That is, if that amount of computing even becomes necessary for the capabilities you mentioned and more.
Scott Aaronson (a quantum computing expert) also works at OpenAI, but I don’t think he’s doing any quantum-computing-related research there. As far as I know, he’s on the superalignment team.
Yeah, I knew that Scott was at OpenAI, but I wasn’t sure if he was still there. His focus is on AI safety, though I don’t know whether he’s on the superalignment team (does that include all OpenAI AI safety researchers now?). However, it looks like he is (and may be leaving soon?):
For the 2022-2023 and 2023-2024 academic years, I’m on leave to work at OpenAI on the theoretical foundations of AI safety.
Photonic devices can perform certain operations, such as matrix multiplications (obviously important for deep learning), more efficiently than electronic processors.
In practice, no, they can’t. Optical transistors are less efficient. Analog matrix multiplies using light are less efficient. There’s no recent lab-scale approach that’s more efficient than semiconductor transistors either.
I agree; I meant that this is the promise (which has yet to be realized).

OK. Why would you consider it a realistic enough prospect to study, or to write this post about? I know there were people doing analog multiplies with light absorption, but even 8-bit analog data transmission with light uses more energy than an 8-bit multiply with transistors. The physics of optical transistors doesn’t seem compatible with using less energy than electrons do. What hope do you think there is?
I think you are assuming optical transistors and photonic computing are the same thing, but they are not. Optical transistors are a component that could be used for photonic computing, but they are not necessary, and companies may have a better shot at getting photonic computing to work at scale without them.
Optical transistors try to function similarly to electronic transistors but use photons instead of electrons for signal processing. You are correct that optical transistors are currently not great, and it’s an active area of research to get them to work.
However, photonic computing is a broader concept that may or may not involve optical transistors among its components. Given the limitations of current optical transistors (as you point out), my understanding is that companies working on this typically use alternative photonic techniques to make it feasible and practical for deep learning matrix multiplication.
Optical transistors are just not as technologically mature (and may never be) as photonic components like modulators and waveguides. For example, the paper I linked in the post, “Experimentally realized in situ backpropagation for deep learning in photonic neural networks”, does not use optical transistors. Instead, it uses components such as Mach-Zehnder interferometers, thermo-optic phase shifters, photonic integrated circuits, and silicon photonic waveguides.
The final setup allows the matrix operations needed for backpropagation to be performed in the photonic hardware itself.
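To make this concrete, here’s a minimal sketch, my own illustration rather than code from the paper, of how a mesh of Mach-Zehnder interferometers with phase shifters applies a matrix to a vector of optical field amplitudes in a single pass of light; the 2x2 MZI parameterization is one common convention and an assumption on my part.

```python
# Minimal sketch: a mesh of Mach-Zehnder interferometers (MZIs) with phase
# shifters applies an N x N unitary matrix to the optical field amplitudes
# as light propagates through it, i.e. a matrix-vector multiply in one pass.
import numpy as np

def mzi(theta, phi):
    """2x2 unitary realized by one MZI (internal phase theta, external phase phi)."""
    return np.array([
        [np.exp(1j * phi) * np.cos(theta), -np.sin(theta)],
        [np.exp(1j * phi) * np.sin(theta),  np.cos(theta)],
    ])

def embed(u2, n, i):
    """Embed a 2x2 block acting on adjacent waveguides (i, i+1) into an n x n identity."""
    m = np.eye(n, dtype=complex)
    m[i:i + 2, i:i + 2] = u2
    return m

n = 4
rng = np.random.default_rng(0)

# Layers of MZIs on alternating waveguide pairs (a Clements-style layout);
# on hardware, thermo-optic phase shifters set theta and phi.
mesh = np.eye(n, dtype=complex)
for layer in range(n):
    for i in range(layer % 2, n - 1, 2):
        theta, phi = rng.uniform(0.0, 2.0 * np.pi, size=2)
        mesh = embed(mzi(theta, phi), n, i) @ mesh

# Input optical amplitudes on the n waveguides; the mesh applies the matrix
# as the light passes through, with no sequential multiply-accumulates.
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = mesh @ x

print(np.allclose(mesh.conj().T @ mesh, np.eye(n)))  # True: the mesh is unitary
```

Arbitrary (non-unitary) weight matrices are typically handled by combining such unitary meshes with amplitude modulation, e.g. via a singular value decomposition.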