They’re terrible at hands though (which has ruined many otherwise good images for me). That post used Stable Diffusion 1.5, but even the latest SD 3.0 (with versions 2.0, 2.1, XL, Stable Cascade in between) is still terrible at them.
Don’t really know how relevant this is to your point/question about fragility of human values, but thought I’d mention it since it seems plausibly as relevant as AIs being able to generate photorealistic human faces.
I think it actually points to convergence between human and NN learning dynamics. Human visual cortices are also bad at hands and text, to the point that lucid dreamers often look for issues with their hands / nearby text to check whether they’re dreaming.
One issue that I think causes people to underestimate the degree of convergence between brain and NN learning is comparing the behaviors of entire brains to the behaviors of individual NNs. Brains consist of many different regions which are “trained” on different internal objectives, then interact with each other to collectively produce human outputs. In contrast, most current NNs contain only one “region”, which is all trained on the single objective of imitating certain subsets of human behaviors.
We should thus expect NN learning dynamics to most resemble those of single brain regions, and expect that the best match for humanlike generalization patterns will arise from putting together multiple NNs that interact with each other in a manner similar to human brain regions.
Scale basically solves this too, with some other additions (not part of any released version of MJ yet) really putting a nail in the coffin, but I can’t say too much here w/o divulging trade secrets.
I can say that I’m surprised to hear that SD3 is still so much worse than Dalle3 and Ideogram on that front. I wonder if they just didn’t train it long enough?