Could you describe the experiment you ran on all these models? Like 'if there are three boxes side by side in a line, and each can hold one item, and the red triangle is not in the middle, and the blue circle is not in the box next to the box with the red triangle in it, where is the green circle?'. ChatGPT was not able to solve logic puzzles like this a year ago and can do it now.
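(For reference, a minimal brute-force check of that puzzle, sketched here in Python with illustrative item names of my own choosing, shows it pins down the green circle's position uniquely:)

```python
from itertools import permutations

items = ["red_triangle", "blue_circle", "green_circle"]

solutions = []
for arrangement in permutations(items):  # positions 0, 1, 2, left to right
    red = arrangement.index("red_triangle")
    blue = arrangement.index("blue_circle")
    if red == 1:                # the red triangle is not in the middle
        continue
    if abs(blue - red) == 1:    # the blue circle is not next to the red triangle
        continue
    solutions.append(arrangement)

print(solutions)
# Every surviving arrangement puts the green circle in the middle box.
```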
I don’t really “run experiments” on models in any systematic personal capacity. Other people are much better at that, and I believe I’d linked a few examples in the post. I do replicate the occasional experiment, and run some myself if there’s something I’d like to check… But broadly, at this point, I don’t expect any compact, self-contained puzzle to be a good measure of “are we getting AGIer yet?”.
My direct engagement with models mostly consists of feeding them research papers to process them faster, asking clarifying questions about math/physics, using Deep Research for variously targeted literature surveys, and chatting with them about whatever theoretical/philosophical problems I happen to be working on at a given moment. Those function pretty well as a measure of insight/innovativeness: of whether the AI is assembling a precise model of what’s happening and what we’re doing, and then running internal queries on that model to move the interaction in the direction of greater understanding, vs. producing very sophisticated remixes of existing templates in a fundamentally sleepwalk-y manner.
It’s been that second one every time so far.