This seems like a major case study for interpretability.
What you’d really want is to be able to ask the network “In what ways is this woman similar to the prompt?” and have it output a causality chain or something.
This seems like a major case study for interpretability.
What you’d really want is to be able to ask the network “In what ways is this woman similar to the prompt?” and have it output a causality chain or something.