Theory and Application of Image Generation

Image generation, theory and application

Luke Wood

The Code, Slides, Demos

About me

  • From San Diego
  • Work on the Keras team
  • The last year or so on KerasCV
  • Pursuing Doctorate at UC San Diego

Background in Generative Modeling

  • ML since 2015
  • Generative modeling since 2016 (off & on)
  • Recent work on StableDiffusion in KerasCV

Generative modeling, why should you care...

Historically, you could...

Generate fake shoe pictures

Learn the latent space of a dataset!

(More on this later...)

Generate DeepFakes

All quite interesting...

  • but nothing particularly useful
  • too difficult to control

Until... DALL-E 2!

And then... StableDiffusion!

Stable Diffusion is a deep learning, text-to-image model released by the startup Stability AI in 2022.

Most importantly, StableDiffusion is 100% open source... and generously licensed

"A gentleman otter in a 19th century portrait"

"A cute magical flying dog, fantasy art drawn by Disney concept artists"

"pencil sketch of robots playing poker"

"Multicolor hyperspace"

But that's not all!

Image to image workflows GUIDED by text

Image to image inpainting (as seen in the intro)!

... and outpainting!

... and variation generation!

Now that I have your attention...

Let's take a step back! How does this all work?

Representations & Continuity

AutoEncoders

  • AutoEncoders: travel back to 1987
  • early days of ML
  • no large scale data
  • unfortunately, no good visual results for you!
  • backprop "without a teacher"

Flash forward to the 2010s

TensorFlow, GPUs, large datasets

AutoEncoders are a form of compression
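To make that concrete, here is a minimal sketch (mine, not from the original slides) of a fully connected autoencoder in Keras; the layer sizes and the single training epoch are illustrative only:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: squeeze each 784-pixel digit down to a tiny latent code.
encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(2),  # the compressed (latent) representation
])

# Decoder: reconstruct the original pixels from the latent code alone.
decoder = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255
# The "teacher" is the input itself: learn to compress and reconstruct it.
autoencoder.fit(x_train, x_train, epochs=1, batch_size=128)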

Caveats

  • data specific
  • lossy
  • "They are rarely used in practical applications" - Keras blog in 2016

... but what happens in between real samples?

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras


def plot_label_clusters(vae, data, labels):
    # display a 2D plot of the digit classes in the latent space
    # (`vae` is a trained variational autoencoder with an `encoder` sub-model)
    z_mean, _, _ = vae.encoder.predict(data)
    plt.figure(figsize=(12, 10))
    plt.scatter(z_mean[:, 0], z_mean[:, 1], c=labels)
    plt.colorbar()
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()


(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype("float32") / 255

plot_label_clusters(vae, x_train, y_train)

Generate new images!

Continuity!

Latent space walking, or latent space exploration, is the process of sampling a point in latent space and incrementally changing the latent representation. Its most common application is generating animations where each sampled point is fed to the decoder and is stored as a frame in the final animation. For high-quality latent representations, this produces coherent-looking animations. These animations can provide insight into the feature map of the latent space, and can ultimately lead to improvements in the training process.
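As a sketch of what that looks like in code (a hypothetical helper, assuming the `vae` from the snippet above with `encoder` and `decoder` sub-models), a linear walk between two encoded images might be:

import numpy as np

def latent_walk(vae, image_a, image_b, steps=30):
    # Encode both endpoints, then move from one latent point to the other.
    z_a, _, _ = vae.encoder.predict(image_a[None, ...])
    z_b, _, _ = vae.encoder.predict(image_b[None, ...])
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1 - t) * z_a + t * z_b               # incremental step through latent space
        frames.append(vae.decoder.predict(z)[0])  # decode the point into one frame
    return frames  # stitch these together to get the animation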

Panda ➡️ Plane

Dog ➡️ Bowl of fruit

A quick aside on Variational AutoEncoders (VAEs)...

import tensorflow as tf
from tensorflow.keras import layers


class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        # Reparameterization trick: z = mean + std * epsilon, so gradients can
        # flow back through the (otherwise random) sampling step.
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
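
For context, a sketch (roughly following the standard keras.io VAE recipe) of how `Sampling` slots into the encoder, plus the KL-divergence term that keeps the latent space smooth and continuous:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2

# The encoder predicts a distribution (mean + log-variance) for each input,
# and the Sampling layer above draws the actual latent vector z from it.
encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(encoder_inputs)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")

# During training, a KL-divergence penalty is added to the reconstruction loss,
# pushing the latent distribution toward a standard normal:
#   kl_loss = -0.5 * mean(1 + z_log_var - z_mean**2 - exp(z_log_var))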

Any Questions?

(on continuity only please)

Congratulations!

You now understand approximately 1/4 of StableDiffusion.

Diffusion Models

Denoising Diffusion Probabilistic Models, 2020

Super-resolution

Push super resolution to the limit!

  • start from pure noise and denoise it step by step (sketch below)
  • proposed in 2020
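
A heavily simplified sketch of that reverse process (the `denoiser` model and the `alphas` noise schedule are stand-ins passed in by the caller; the real DDPM sampler has a few more details):

import numpy as np

def ddpm_sample(denoiser, alphas, shape=(1, 64, 64, 3)):
    """Toy reverse-diffusion loop: start from pure noise, denoise step by step."""
    alpha_bars = np.cumprod(alphas)
    x = np.random.normal(size=shape).astype("float32")  # start from pure Gaussian noise
    for t in reversed(range(len(alphas))):
        eps = denoiser(x, t)  # the model predicts the noise present in x at step t
        x = (x - (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a little noise so each intermediate step stays a proper sample.
            x = x + np.sqrt(1 - alphas[t]) * np.random.normal(size=shape)
    return x  # an image "hallucinated" from the training distribution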

More reading on keras.io

Any questions?

(On diffusion models)

Latent diffusion models

  • run the diffusion process in a compressed latent space (improves efficiency)
  • use the VAE decoder to map latents back to pixel space
  • the denoiser itself is a UNet (sketch below)
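
Put together (a sketch only; the component models are passed in as callables rather than taken from any particular library): the expensive iterative denoising happens in a small latent space, and the VAE decoder maps the result back to pixels.

import numpy as np

def latent_diffusion_sample(unet_step, vae_decoder, text_encoding, num_steps=50):
    # The UNet denoises a tiny 64x64x4 latent instead of a 512x512x3 image,
    # which is where the big efficiency win comes from.
    latent = np.random.normal(size=(1, 64, 64, 4)).astype("float32")
    for t in reversed(range(num_steps)):
        latent = unet_step(latent, t, text_encoding)  # one denoising step, guided by the text
    return vae_decoder(latent)  # VAE decoder: 64x64x4 latent -> 512x512x3 image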

CLIP

... what you need to know

We just need the text encoder
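
In KerasCV you can see this directly (a short sketch): the prompt is tokenized and run through the CLIP text encoder, and that encoding, not the raw text, is what conditions the diffusion model.

import keras_cv

model = keras_cv.models.StableDiffusion(jit_compile=True)

# CLIP's text encoder turns the prompt into a sequence of token embeddings...
encoding = model.encode_text("a gentleman otter in a 19th century portrait")

# ...and generation is driven entirely by that encoding.
images = model.generate_image(encoding, batch_size=3)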

CLIP

More reading available on the OpenAI blog post

The Final Piece...

Conditioning!

Conditioning

  • a classic deep learning trick
  • concatenate the conditioning signal onto the input
  • e.g. 64x64x3 ➡️ 64x64x4 (sketch below)
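
A tiny sketch of the trick in isolation (generic, not Stable Diffusion's exact wiring): the conditioning signal is simply concatenated onto the input along the channel axis, so the network sees it like any other channel.

import tensorflow as tf

images = tf.random.normal((8, 64, 64, 3))     # batch of inputs
condition = tf.random.normal((8, 64, 64, 1))  # per-pixel conditioning signal

# Classic deep learning: concatenate along the channel axis and carry on.
conditioned = tf.concat([images, condition], axis=-1)
print(conditioned.shape)  # (8, 64, 64, 4)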

That's All!

You now know how StableDiffusion works!

How do I use it?

Text to Image Generation

"An astronaut riding a horse"

Code:

from tensorflow import keras
import keras_cv

keras.mixed_precision.set_global_policy("mixed_float16")
model = keras_cv.models.StableDiffusion(jit_compile=True)

images = model.text_to_image(
    "Teddy bears conducting machine learning research",
    batch_size=4,
)
plot_images(images)
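
(`plot_images` is not defined on the slide; a minimal matplotlib helper along these lines, an assumption rather than the talk's exact helper, is enough to follow along.)

import matplotlib.pyplot as plt

def plot_images(images):
    # Lay the generated images out in a single row.
    plt.figure(figsize=(20, 20))
    for i, image in enumerate(images):
        plt.subplot(1, len(images), i + 1)
        plt.imshow(image)
        plt.axis("off")
    plt.show()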

Variation generation

Remember CLIP?

Switch it out!

It's really that easy!
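
The point here is that the diffusion model only ever sees a CLIP embedding, so swapping the text encoder for CLIP's image encoder yields image variations. KerasCV does not ship the image encoder, but a simpler stand-in demonstrates the same property (reusing `model` and `plot_images` from above; the noise scale is illustrative): perturb the embedding and re-generate.

import tensorflow as tf

# Encode a prompt once, then perturb the encoding to get variations.
encoding = tf.cast(model.encode_text("a cute magical flying dog, fantasy art"), tf.float32)

for _ in range(4):
    noisy = encoding + tf.random.normal(tf.shape(encoding), stddev=0.2)
    plot_images(model.generate_image(noisy, batch_size=1))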


Textual Inversion

Teach new concepts to StableDiffusion!

Step 1: collect 3-5 images of your object

import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras_cv import visualization

urls = [
    "https://i.imgur.com/VIedH1X.jpg",
    "https://i.imgur.com/iLkM4Ar.jpg",
    "https://i.imgur.com/eBw13hE.png",
]
files = [tf.keras.utils.get_file(origin=url) for url in urls]

# Resize images to the 512x512 resolution StableDiffusion works with
resize = keras.layers.Resizing(height=512, width=512, crop_to_aspect_ratio=True)
images = [keras.utils.load_img(img) for img in files]
images = [keras.utils.img_to_array(img) for img in images]
images = np.array([resize(img) for img in images])
visualization.plot_gallery(images, value_range=(0, 255), rows=1, cols=3)

Step 2: add a special token to the model vocabulary

your_token = '<any-special-name>'
tokenizer.add_token(your_token)

Step 3: construct an image-caption dataset

your_token = '<any-special-name>'
templates = [
    "a photo of a {}",
    "a rendering of a {}",
    "a cropped photo of the {}",
    "the photo of a {}",
    # ...
]
templates = [t.format(your_token) for t in templates]

# Construct a TensorFlow dataset of the images + tokens
image_dataset = tf.data.Dataset.from_tensor_slices(images)
text_dataset = tf.data.Dataset.from_tensor_slices(templates)
# ... there is a bit more boilerplate to pre-process the text
# shuffle() needs an explicit buffer size
train_ds = tf.data.Dataset.zip((
    image_dataset.shuffle(buffer_size=len(images)),
    text_dataset.shuffle(buffer_size=len(templates)),
))

Step 4: Fine Tune the TextEncoder with your new dataset!

# Freeze the diffusion model and decoder; only the text encoder is updated.
stable_diffusion.diffusion_model.trainable = False
stable_diffusion.decoder.trainable = False
stable_diffusion.text_encoder.trainable = True

# StableDiffusionFineTuner is a custom keras.Model defined in the full example
# (not shown on this slide); it wires the frozen components into a training loop.
trainer = StableDiffusionFineTuner(stable_diffusion, name="trainer")
optimizer = keras.optimizers.SGD(learning_rate=5e-4)
trainer.compile(optimizer=optimizer, loss="mse")

# trainer trains the StableDiffusion model for you.
trainer.fit(
    train_ds,
    epochs=10,
    steps_per_epoch=200,
)

Results

images = stable_diffusion.text_to_image(
    "a photo of <any-special-name> wearing a top hat",
    batch_size=4,
)
plot_images(images)

Results

images = stable_diffusion.text_to_image(
    "An app icon of <any-special-name>.",
    batch_size=4,
)
plot_images(images)

Demo Time

Prompt requests?

Follow along on Colab!

Conclusions

  • limitless possibilities
  • the power of multi-modal models
  • how fast the field is evolving

More Workflows Coming Soon

Other workflows are coming to KerasCV soon!

Other links

Thank you!

Speaker notes:

Hello, I am Luke Wood and this is "Theory and Application of Image Generation". This talk covers a little history of generative models, modern image generation models, and more specifically the architecture of the popular StableDiffusion model. For the technical parts, I will assume you have some knowledge of general machine learning, but not generative modeling.

Let's start out with an anecdote: - I was in Bruges this past week - wanted to get a picture by the famous waterway - there were a few boats in the water - normally you would wait and take another picture

So while this might look photoshopped, it is not! The edits made to this photo were actually done by a generative image model. - used DALL-E 2 - realized I had no image of the landscape itself! - the weather was no longer good

There we go, now we have just the landscape. Then I decided, well, while I'm using a generative image model I may as well try something a bit more fun!

- as you can see, these models are quite capable - DALL-E 2 is a "multi-modal" model - introduce multi-modal models - point out the wide variety of use cases

Follow along with the slides linked above. If you have a laptop, use the web version; if you have a phone, use the PDF version. There is some sort of bug in the slide rendering system I use on mobile devices, but you can read the PDF version as a workaround.

- Got started with generative modeling around 2016 - These are the GitHub avatars of our ML group - Actually this guy (Ian) works on KerasCV now in a full-time role - Still do some ML research with this professor in my spare time - Keep an eye out for these guys a little later in the talk

But you could still do some fun stuff with generative modeling.

Girl with a Pearl Earring, painting by Johannes Vermeer

This is why they are not used everywhere

So we just covered the concept of a latent representation of an image dataset, and the fact that latent representations are continuous. Any questions?

Next, we will cover the diffusion model.

You may be familiar with the idea of super-resolution: it's possible to train a deep learning model to denoise an input image -- and thereby turn it into a higher-resolution version. The deep learning model doesn't do this by magically recovering the information that's missing from the noisy, low-resolution input -- rather, the model uses its training data distribution to hallucinate the visual details that would be most likely given the input.

Next we will discuss CLIP

Note that the ResNet blocks don't directly look at the text; the attention layers merge the text representations into the latents, and the next ResNet block can then use that incorporated text information in its processing.