Project #3: Aesthetic Selection in AI Image Generation
GitHub Repo
Built With: HPC (Slurm) | Python 3 | PyTorch | Diffusers (SDXL) | Matplotlib | X11 Forwarding
Practical Objective: Create an interactive workshop for the 2025 Envisioning AI at Yale Symposium that applies evolutionary principles to AI image generation.
Learning Objective: Build fluency with high-performance computing and deployment of pre-built generative AI models.
Generative AI as an Evolutionary System
To bridge a conceptual gap between machine learning and evolution, I designed this pipeline to apply selection to the latent space of a diffusion model. The core loop of the exhibit:
- txt2img: I feed in three prompts and generate three images per prompt (nine total), displayed on a large television
  - For example: "a beach at sunset", "a river in a mountain valley", "a futuristic cityscape at night"
- Voting: Symposium attendees use a keyboard to vote for their favorite image in each category
- img2img: Once an image reaches a vote threshold, it is fed into an img2img pass to create three new variants for that prompt (a minimal sketch of the loop follows)
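A minimal sketch of that loop, assuming the SDXL pipelines from `diffusers`; the checkpoint ID and helper names are illustrative, not the exhibit's exact code:

```python
# Illustrative sketch of the core loop: generation 0 comes from txt2img,
# and winners are bred with img2img. The checkpoint ID is an assumption.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

MODEL = "stabilityai/stable-diffusion-xl-base-1.0"  # assumed SDXL checkpoint

txt2img = StableDiffusionXLPipeline.from_pretrained(
    MODEL, torch_dtype=torch.float16).to("cuda")
img2img = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    MODEL, torch_dtype=torch.float16).to("cuda")
for pipe in (txt2img, img2img):
    pipe.enable_xformers_memory_efficient_attention()  # memory-efficient attention

prompts = ["a beach at sunset",
           "a river in a mountain valley",
           "a futuristic cityscape at night"]

# Generation 0: three candidates per prompt, straight from text (nine total).
gallery = {p: txt2img(prompt=p, num_images_per_prompt=3).images for p in prompts}

def next_generation(prompt, winner, strength=0.5):
    """Breed three variants of the winning image; `strength` is the denoising
    strength, which plays the role of mutation rate (see below)."""
    return img2img(prompt=prompt, image=winner,
                   strength=strength, num_images_per_prompt=3).images
```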
As the loop continues, we theoretically generate increasingly pleasing images (at least, according to the crowd). We can map the parameters directly to evolutionary principles:
- Genotype and Inheritance: The seed image acts as the genetic code. We use `img2img` generation to pass phenotypic traits to the next generation.
- Mutation Rate: The denoising strength functions as the mutation rate. Too low and the image is visually identical; too high and the image loses its lineage.
- Selection Pressure: Symposium attendees act as the environment. Through the voting interface, they apply selective pressure, determining which phenotypes survive to reproduce (see the sketch after this list).
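To make the mapping concrete, here is a hedged sketch of the selection step. It builds on the loop sketch above (`gallery`, `next_generation`), and `read_vote` is a hypothetical stand-in for the exhibit's keyboard handler:

```python
# Hypothetical selection step: tally keyboard votes until one image in a
# category crosses the threshold, then let that phenotype reproduce.
from collections import Counter

VOTE_THRESHOLD = 5  # assumed survival cutoff, not the exhibit's real value

def selection_round(gallery, read_vote):
    votes = Counter()
    while True:
        prompt, idx = read_vote()          # attendee keypress -> (category, image index)
        votes[(prompt, idx)] += 1
        if votes[(prompt, idx)] >= VOTE_THRESHOLD:
            winner = gallery[prompt][idx]  # the surviving phenotype
            # strength ~0.2 yields near-clones; ~0.8 loses the lineage
            gallery[prompt] = next_generation(prompt, winner, strength=0.5)
            return
```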
HPC Architecture and Headless Interaction
Generating multiple Stable Diffusion XL images in real time requires more VRAM than standard local hardware provides. I deployed this workflow on Yale’s McCleary High-Performance Computing cluster, which presented unique engineering challenges around resource allocation and interactive visualization in a headless environment.
While I built the backend, YCRC's Sam Friedman helped me set up X11 forwarding so attendees had an interactive display:
- Backend (Compute): A Python script using `accelerate` and `xformers` runs the heavy tensor operations and memory-efficient attention on A100 GPU nodes, implementing the core logic described above.
- Frontend (Visualization): A lightweight Matplotlib viewer script uses X11 forwarding over SSH to project the generated gallery from the headless cluster onto the local displays, enabling real-time audience feedback (a minimal viewer sketch follows).
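A sketch of what that frontend can look like, assuming an interactive Matplotlib backend is available over the X11 tunnel (`ssh -X`); the layout and function name are illustrative:

```python
# Hypothetical viewer: draws the 3x3 gallery in a window that X11 forwards
# from the headless cluster to the local display.
import matplotlib
matplotlib.use("TkAgg")  # an interactive backend that works over X11
import matplotlib.pyplot as plt

def show_gallery(gallery):
    """Render the current population as a 3x3 grid, one prompt per row."""
    fig, axes = plt.subplots(3, 3, figsize=(12, 12))
    for row, (prompt, images) in zip(axes, gallery.items()):
        row[0].set_title(prompt, loc="left")
        for ax, img in zip(row, images):
            ax.imshow(img)
            ax.set_axis_off()
    fig.tight_layout()
    plt.pause(0.1)  # non-blocking draw so generation can continue in the loop
```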
The biggest engineering hurdle was dependency management. Getting the "shifting sands" of modern AI libraries to play nicely with the rigid environment of an HPC cluster was a crash course in dependency hell.