Yep. Apart from having to run forward prop n times to generate an output of length n, we can just optimise the mean probability of the target tokens across the output positions; it's already implemented in the code. It does take way longer to find optimal completions, though.
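For anyone who wants the idea spelled out, here is a rough sketch of that objective, not the actual code from the repo. `toy_next_token_probs` and `weights` are made-up stand-ins for a real model's forward pass, and feeding the target token back in at each step (teacher forcing) is my assumption; the real implementation may feed back the generated token instead.

```python
import numpy as np

def toy_next_token_probs(tokens, weights):
    """Hypothetical stand-in for one forward pass.

    Returns a probability distribution over the vocabulary for the next token.
    """
    logits = weights[tokens].sum(axis=0)      # toy "model": sum of per-token rows
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

def mean_target_prob(prompt, target, weights):
    """Mean probability assigned to each target token at its output position."""
    tokens = list(prompt)
    probs = []
    for t in target:                          # one forward pass per output position
        p = toy_next_token_probs(tokens, weights)
        probs.append(p[t])
        tokens.append(t)                      # assumption: feed the target token back in
    return float(np.mean(probs))

rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 10))           # toy vocabulary of 10 tokens
score = mean_target_prob([1, 2, 3], [4, 5], weights)
```

An optimiser would then search over the prompt to maximise `score`. Each evaluation costs n forward passes for a target of length n, which is part of why the search gets slower for longer outputs.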