I took an hour or so this morning to play with Stable Diffusion. Stable Diffusion is the most recent of the text-to-image AI models. It’s free and open source, and you can run it without a beefy computer or writing any code. In short, I think just about everyone should give it a shot.
I followed this tutorial to get it up and running in a free Google Colab session.
Here are a few of the prompts I used, the images that were generated, and some thoughts on each:
an elevated scenic overlook of a wheat field in the style of monet
I’m a sucker for Monet. 🙂 I was trying to get something a little more of a scenic vista, but this seemed lovely. Stable Diffusion was trained on the body of work of lots and lots of famous artists. You can get great looking output by replacing Monet with Rembrandt, Picasso, Van Gogh, etc.
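If you want to try the artist swap without retyping the prompt each time, here is a minimal sketch that just builds the prompt strings. The template and artist list come from the prompt above; the `build_prompts` helper is a name I made up for illustration, and nothing here talks to Stable Diffusion itself. You would paste each resulting prompt into whatever notebook cell you are using.

```python
# Hypothetical helper (not part of any Stable Diffusion tooling):
# fill one prompt template with several artist names.
PROMPT_TEMPLATE = "an elevated scenic overlook of a wheat field in the style of {artist}"
ARTISTS = ["monet", "rembrandt", "picasso", "van gogh"]


def build_prompts(template, artists):
    """Return one prompt string per artist, in the same order as the input."""
    return [template.format(artist=artist) for artist in artists]


prompts = build_prompts(PROMPT_TEMPLATE, ARTISTS)

# Print the prompts so they can be copied into the image generator one at a time.
for prompt in prompts:
    print(prompt)
```

Keeping everything but the artist fixed makes it easier to see what the style change alone does to the image.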
a magic the gathering card of a flying hippo
This isn’t what I expected. I wanted to see if it would generate something that looks like a MtG card, with the border, frame, and game text. But, no dice. Stylistically, the art does look like something that could be on a MtG card.
a photo of a ninja turtle statue by michelangelo
It took me several tries to get something out of this prompt. Stable Diffusion refuses to return images that trigger a pornography/nsfw filter. I guess it generated several of these sans fig leaf. Another note – I only mentioned the statue. I’m not sure where the brick building in the background came from.
a dogfight in space in the style of battlestar galactica with a blackhole in the background
Again, not what I was expecting. I think that if I had spent more time figuring out the prompt engineering process I could get something that is closer to the input, but this is still a pretty neat image. I think it is fascinating to see the explosion and the planet in the background since those are not mentioned at all in the prompt.
a photo of a green logo that looks like a mile marker for a company named mile two
I love that it looks like a sticker, sitting on the grass. 🙂 Stable Diffusion (like the other text-to-image models) does not do well with including text in the image. Still, I sorta like this logo. 🙂
a bright futuristic cityscape during the day in the style of a soviet propaganda poster
Another style that I love! The rich pastels and stark lines look lovely.
a 1950s style rocketship in space with saturn in the background in the style of picasso
I fiddled around with this prompt for a while. Originally, I had Van Gogh instead of Picasso, but the images all seemed super derivative of Starry Night. This doesn’t scream Picasso to me, but I thought it was a fun, almost child-like image.
a color photo of godzilla destroying dayton ohio
I generated a bunch of images from this prompt. It took several tries to get something that looks like both Godzilla and Dayton. That is not the Dayton skyline, but a couple of those buildings look similar to buildings in Dayton. I like the almost collage look of this image.
a photo of a board game box about collecting butterflies and building railroads
Another example of nonsensical text. I thought a board game box might be easier to generate than a Magic card. It was. I generated several of these, and I was surprised by the variety of the results. The box was at different angles, some had the box staged behind some game components on a table, etc. I want to play this game now. 🙂
My kids are at a school musical rehearsal this morning. I plan on having them generate some images later today.
How long do you think it will be until we have prompt-generated metaverses for AR?