Do you mean the model’s policy as it works on a query, or learning as it works on a dataset? Or something specific to Stable Diffusion? What is the sample space here, and what are the actions that decisions choose between?
Score-based models, such as diffusion models, work by modeling the derivative of the utility function (i.e., the density function) over examples, I believe?
See, e.g., https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ or any of the other recommended posts at the top.
Actions are denoising steps. The sample space is the output space, i.e., image space for Stable Diffusion.
You’re talking about the score function, right? Which is the derivative of the log probability density function. I dunno how to get from there to a utility function interpretation. Like, we don’t produce samples from the model by globally maximizing over the PDF (at worst, trying that might produce an adversarial example, and at best, that would sample the “most modal” image).
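To make that distinction concrete, here is a minimal sketch on a toy 1D Gaussian mixture. Everything in it is my own illustrative assumption (the names `score`, `WEIGHTS`, `MEANS`, `STDS`, the step size, the step count), and it is not how any real diffusion model is implemented. It uses the same score function two ways: plain gradient ascent, which collapses every starting point onto a mode, and unadjusted Langevin dynamics, which adds noise at each step and ends up with samples spread around the modes.

```python
import numpy as np

# Hypothetical toy density (an assumption for illustration): two 1D Gaussians.
WEIGHTS = np.array([0.7, 0.3])
MEANS = np.array([-2.0, 3.0])
STDS = np.array([0.5, 1.0])


def score(x):
    """Score function d/dx log p(x) of the mixture, elementwise over x."""
    x = np.asarray(x, dtype=float)[..., None]
    # Weighted component densities w_k * N(x; mu_k, sigma_k^2).
    comp = WEIGHTS * np.exp(-0.5 * ((x - MEANS) / STDS) ** 2) / (STDS * np.sqrt(2 * np.pi))
    # Derivative of each component with respect to x.
    dcomp = comp * (-(x - MEANS) / STDS**2)
    # p'(x) / p(x)
    return dcomp.sum(axis=-1) / comp.sum(axis=-1)


rng = np.random.default_rng(0)
start = rng.uniform(-6.0, 8.0, size=2000)  # shared starting points
step, n_steps = 0.01, 2000

# (a) Gradient ascent on log p: deterministic, ends at a local maximum.
ascent = start.copy()
for _ in range(n_steps):
    ascent = ascent + step * score(ascent)

# (b) Unadjusted Langevin dynamics: same score, plus Gaussian noise each step.
langevin = start.copy()
for _ in range(n_steps):
    noise = rng.standard_normal(langevin.shape)
    langevin = langevin + step * score(langevin) + np.sqrt(2 * step) * noise

print("distinct ascent endpoints:", np.unique(np.round(ascent, 2)))
print("Langevin sample std near the left mode:", langevin[langevin < 0].std())
```

The point of the comparison is only that the score tells you which direction increases log density locally. Whether you get "the most modal point" or a sample depends on what you do with it. In this short run the Langevin chain is not guaranteed to reproduce the mixture weights exactly, since crossing between the two modes is slow.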
ah, okay. yup, you’re right, that’s what I was referring to. I am now convinced I was wrong in my original comment!