Pretty fun! The feature-browsing UI did get slightly irritating after a while, but hey, it’s a research project. I think this is about 15 to 20 features, each of which I added at high magnitude to test, then turned down to background levels. Was hoping for something significantly more surreal, but eventually got bored of trying to browse features. Would be cool if the browse view showed a range of features spanning high to low similarity.
As is typical for current work, the automatic feature labeling and previewing seems to leave something to be desired in terms of covering the variety of meanings a feature can take, and Gemini’s views on some of the features were… well, I found one feature that seemed to be casual photos of trans ladies in nice dresses, and Gemini had labeled it “Harmful stereotypes”. Thanks google, sigh. Anyway, I might suggest Claude next time; I’ve been pretty happy with Claude’s understanding of sensitive issues, as well as its general skill. Though of course none of these AIs are really properly there yet in terms of not being kind of dorks about this stuff.
If this were steering, e.g., a diffusion planning model controlling a high-strength robot, it wouldn’t be there yet in terms of getting enough nines of safety; but being able to control the features directly, bypassing words, is a pretty fun capability, and definitely fun to see for images. The biggest issue I notice is that trying to control with features has a tendency to rapidly push you way out of distribution and break the model’s skill. Of course I also have my usual reservations about anything being improved without being part of a backchained plan. I wouldn’t call this mutation-robust alignment or ambitious value learning or anything, but it’s pretty fun to play with.
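For concreteness, the kind of feature control described above can be sketched as adding a feature direction to a model’s hidden activations at a chosen strength. This is a toy illustration, not the project’s actual implementation: the function name, the bare numpy arrays, and the strengths are all made up; a real setup would hook into the model’s residual stream.

```python
# Hypothetical sketch of steering with a feature direction.
# Names and array shapes are illustrative, not from the actual project.
import numpy as np

def steer(activations: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Add a unit-normalized feature direction to each activation row."""
    unit = direction / np.linalg.norm(direction)
    return activations + strength * unit

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 16))   # toy batch of hidden activations
feat = rng.normal(size=16)        # toy feature direction

boosted = steer(acts, feat, strength=8.0)  # high magnitude: easy to push out of distribution
subtle = steer(acts, feat, strength=0.5)   # turned down to background level
```

The single `strength` knob is the whole interface: crank it up to verify the feature does anything, then dial it back before the model’s outputs break down.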
A section I was writing but then removed due to time constraints involved setting inference-time rules. I found that they can actually work pretty well: you can ban features entirely, or ban features conditionally on some other feature being present. For instance, not showing natural disasters when certain subjects are in the image. But I thought this was pretty obvious, so I got bored of it.
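The unconditional and conditional bans described above can be sketched as a small rule pass over a feature-activation vector. Everything here is a made-up illustration, including the feature indices and the rule format; it is not the tool’s actual API.

```python
# Hypothetical sketch of inference-time feature rules.
# A rule is (banned_idx, condition_idx): ban banned_idx unconditionally
# when condition_idx is None, otherwise only when that feature is active.
import numpy as np

def apply_rules(features: np.ndarray, rules: list) -> np.ndarray:
    out = features.copy()
    for banned, cond in rules:
        if cond is None or out[cond] > 0:
            out[banned] = 0.0  # zero out the banned feature's activation
    return out

feats = np.array([0.0, 2.5, 1.2, 0.0])
# e.g. ban feature 1 ("natural disaster") only when feature 2 ("subject present") is active
filtered = apply_rules(feats, [(1, 2)])
```

Here feature 2 is active, so feature 1 gets zeroed; with `(1, None)` it would be zeroed regardless of context.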
Thanks for trying it out!
Definitely right on the Gemini point!