I’m putting this here rather than in the collapsed thread, but I really think the initial post (before the edit) was at the very least careless. There is a widespread habit in tech publications, especially in AI, of pretending results are better than they actually are. I would hope that LessWrong, with its commitment to truth-seeking and distrust of the media, would do better...
So, the edit says: “However the Yudkowsky lines were also cherry picked. I ran several iterations, sometimes modifying my prompts, until I got good responses.” How were they cherry-picked, exactly? Did you take the best one out of 2? Out of 10? Out of 100? Did you pick half an answer, then complete it with half an answer from another prompt? How bad were the rejected answers?
I don’t see the answer that eventually made it into the article among the answers to prompt 2 in your comment with the uncurated answers. How was it obtained?
Without this kind of information, it is just impossible to evaluate how good GPT-3 is at what it does (it is certainly good, but how good?).
So, how were they cherry-picked, exactly? Did you take the best one out of 2? Out of 10? Out of 100?
I wasn’t counting. Rarely more than 10. Sometimes the first answer just worked. Never did I come anywhere close to 100.
More important than throwing out answers was how often I changed prompts. Some prompts prompt much better output than others.
Did you pick half an answer, then complete it with half an answer from another prompt?
No. Though I did sometimes keep half an answer I liked and then autocompleted from there instead of rerolling the dice from scratch. Sometimes the answer kept going and going and I truncated it early.
There were lots of edge cases. In one instance, Robin Hanson butted in, which I edited out.
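For readers who haven’t done this themselves, here is a minimal sketch of the kind of reroll-and-curate loop described above. It assumes the GPT-3-era openai Python package (pre-1.0) and the davinci completion endpoint; it is not lsusr’s actual tooling (the writing may well have been done in the Playground), and the complete and curate helpers are hypothetical names for illustration.

```python
# Sketch of an interactive curation loop: generate a completion, let a human
# accept it, keep only a prefix and continue from there, or reroll entirely.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supplied by the user


def complete(prompt: str, max_tokens: int = 200) -> str:
    """Request one GPT-3 completion for the given prompt."""
    response = openai.Completion.create(
        engine="davinci",      # GPT-3 base model of that era
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.8,
    )
    return response.choices[0].text


def curate(prompt: str, max_rolls: int = 10) -> str:
    """Reroll up to max_rolls times. The human can accept a completion,
    keep only a prefix of it and continue generating from that point,
    or discard it and reroll from the same place."""
    kept = ""
    for _ in range(max_rolls):
        candidate = complete(prompt + kept)
        decision = input(f"---\n{candidate}\n[a]ccept / [k]eep a prefix / [r]eroll? ")
        if decision == "a":
            return kept + candidate
        if decision == "k":
            kept += candidate[: int(input("keep how many characters? "))]
        # any other input: reroll without keeping anything new
    return kept  # whatever survived the editing passes
```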
…your comment with the uncurated answers. How was it obtained?
I didn’t keep the answers I threw out. The uncurated answers were created specially for that comment.
This is true inasmuch as posts written with help from GPT-3 are meant to be evidence about the capabilities of GPT-3.
Sometimes posts are primarily intended to be fun, and success is measured by how fun they are; in that case I don’t care how much iteration you put into it, I just want it to be fun.
I guess this was a combo, because it’s about simulation? So your question is reasonable. FYI, from having played with GPT-3 myself, I assumed Lsusr had run multiple (3-15) iterations and partial iterations on each segment, thrown bits out, and thrown whole other segments out. That said, it was probably clearer to me because I’ve written with GPT-3 myself, and someone who hasn’t could’ve been under the impression this was just the first pass.