This Bird Illustration Does Not Exist: Using Machine Learning and BHL Flickr Images to Produce “New” Bird Images
A version of this post was originally published on the Cogapp blog on 30 March 2021.
I work as a web developer for the agency Cogapp, which is based in Brighton, UK. We create websites and other digital services for museums, art galleries, archives and the like, but every couple of months we hold a “hack day”: a day spent working on projects that revolve around a particular theme and that can ideally be completed in a single day. This gets the creative juices flowing and furthers our agenda of innovation.
The theme this past hack day at Cogapp was “Museum APIs,” but the looser interpretation was that we were to use open data provided by museums in our projects. I was inspired by the Biodiversity Heritage Library’s Flickr, which is a massive collection of free-to-use scientific images. I immediately knew I wanted to utilise this resource as I love scientific illustrations of nature.
I’ve also had an interest in Machine Learning (ML) for a while, and I recently discovered Derrick Schultz and his YouTube channel Artificial Images. Here, he publishes videos of his Machine Learning courses which he runs for people who want to use ML for creative purposes.
I watched Derrick’s tutorials on training a StyleGAN Neural Network and the things he was saying made a degree of sense to me, plus he had published a handy Google Colab notebook with step-by-step code, so I decided it was something I might be able to have a go at.
What is a StyleGAN?
StyleGAN stands for Style Generative Adversarial Network and was originally developed by NVlabs. A StyleGAN learns from a set of input images and produces its own “fake” images in return. You may have seen the site ThisPersonDoesNotExist.com, which features fake photos of people produced by a StyleGAN. That network was trained on thousands of images of people’s faces and in turn learned to produce new images which (for the most part) look like real photographs of people’s faces.
A StyleGAN does this by repeatedly processing the set of real images it is given, producing its own images as it goes. The “adversarial” part of the name refers to the fact that a StyleGAN actually comprises two Neural Networks which work against each other: the first (the generator) produces the fake images, and the second (the discriminator) looks at those images and tries to determine whether they are “real” or “fake.” The longer the networks train, the better each gets at its job, and the more likely the generated images are to fool the discriminator, and ultimately humans too, into thinking they’re real.
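The adversarial setup can be sketched in a few lines. This is a deliberately tiny, hypothetical stand-in — real StyleGAN networks are deep convolutional networks, and the toy “generator” and “discriminator” below are just matrix multiplications I made up for illustration — but it shows the two opposing objectives:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins: the "generator" maps random noise to a fake "image"
# (here just a 4-dimensional vector), and the "discriminator" maps an
# image to a probability of it being real. Both are invented for
# illustration; real StyleGAN networks are deep CNNs.
def generator(z, w_g):
    return z @ w_g

def discriminator(x, w_d):
    return sigmoid(x @ w_d)

w_g = rng.normal(size=(8, 4))  # toy generator weights
w_d = rng.normal(size=(4,))    # toy discriminator weights

z = rng.normal(size=(16, 8))              # a batch of noise vectors
fake = generator(z, w_g)                  # the generator's fakes
real = rng.normal(loc=1.0, size=(16, 4))  # a batch of "real" images

p_real = discriminator(real, w_d)
p_fake = discriminator(fake, w_d)

# The discriminator wants p_real -> 1 and p_fake -> 0; the generator
# wants p_fake -> 1. These opposing objectives are the "adversarial"
# part of the name.
d_loss = -np.mean(np.log(p_real + 1e-9) + np.log(1 - p_fake + 1e-9))
g_loss = -np.mean(np.log(p_fake + 1e-9))
print(fake.shape, round(float(d_loss), 3), round(float(g_loss), 3))
```

In the real thing, each network's weights are repeatedly updated to reduce its own loss, which is exactly what makes the other network's job harder over time.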
StyleGAN has been updated a few times, and in 2020 StyleGAN2-ADA was released. It can be trained on very few images, as few as 500–1000 compared with the tens of thousands required just a couple of years earlier, and it learns to produce good-quality fakes in a very short time (a few hours of training). This advance made me feel I could attempt to train my own model: 1000 images was something I could collect, and I felt I could train the model over the course of a hack day and have some kind of result to show at the end.
Creating the dataset
The biggest (active) part of this project time-wise was collecting the images and making them roughly uniform: cropping them to the same size and placing the subject in roughly the same spot in each one. This mattered because I wanted the network to focus on the subject of each image itself and not on, for example, where in the image the subject sat, as that would waste precious training time.
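As a rough idea of the kind of preprocessing involved, here is a small sketch using the Pillow library. The helper name and the 512-pixel output size are my own choices for illustration, not something the BHL images or any particular notebook prescribes:

```python
from PIL import Image

def center_crop_square(img, size=512):
    """Crop the largest centred square from an illustration and
    resize it, so every image in the dataset is the same shape
    with the subject roughly in the middle."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.LANCZOS)

# Example: a hypothetical scan that is taller than it is wide.
scan = Image.new("RGB", (800, 1200), "white")
prepared = center_crop_square(scan)
print(prepared.size)  # (512, 512)
```

In practice a lot of the work is manual anyway: choosing which scans to keep and nudging crops so the bird sits in the centre.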
I looked through the BHL Flickr and decided to use scientific illustrations of birds. I love birds and the style of scientific illustrations, and I was excited at the idea of the new and weird birds my StyleGAN model might produce! I chose illustrations in which the bird was in the centre of the page and the rest of the space was ideally empty. Due to time constraints, I did allow some illustrations with more than one bird in them and some with background features such as grass. This wasn’t ideal for the above reasons, but I decided to give it a go and see what the results would be like anyway.
Training my model
Once I had my dataset ready to go, I used Derrick’s Google Colab notebook to train my model. In order to run StyleGAN2-ADA, you need “1–8 high-end NVIDIA GPUs with at least 12 GB of GPU memory,” which is more memory than most home computers have. Using something like Google Colab, where you’re essentially borrowing Google’s super powerful computers, means that even an average joe like me can have a go at running memory-heavy code like this, and the basic version is free, which is enough for this purpose! The other benefit is that you can connect it to a Google Drive for storage. This was useful in my case because I needed to store the 1000 images to train my model with, and the model itself outputs large images.
The code I was using was set to train a new StyleGAN model based on one that had already been trained for many hours. A model in Machine Learning is essentially a file that has been trained to recognise patterns. Using a pre-trained model as a starting point like this is called transfer learning, and it can drastically reduce the time it takes for your StyleGAN to learn from your images and start producing something which looks good. The base model doesn’t have to be similar in subject matter to the images you will be training with; in this case I used the FFHQ model by NVlabs, which was trained on a set of images of faces. This meant that when training started, the network initially produced images which looked like a mixture of faces and my illustrations.
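For a sense of what kicking off transfer learning looks like, here is a hypothetical invocation of the `train.py` script from NVlabs’ stylegan2-ada-pytorch repository. The paths are placeholders, and `--resume=ffhq1024` is the option that tells it to start from the pre-trained FFHQ faces model rather than from scratch:

```shell
# Hypothetical example; your paths and options will differ.
python train.py \
  --outdir=/content/drive/MyDrive/results \
  --data=/content/drive/MyDrive/birds-dataset.zip \
  --gpus=1 \
  --snap=4 \
  --resume=ffhq1024
```

Derrick’s notebook wraps steps like this up for you, which is a big part of why it lowers the barrier to entry.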
Every few iterations through the dataset, the model produced a 7 by 4 grid of sample images so I could see how it was getting on. The image below is the first sample grid I got, which shows the original faces mixing with some features of my images!
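Assembling a contact sheet like this is simple to do yourself. Here is a toy sketch with Pillow; the helper and the 64-pixel placeholder tiles are my own illustration, not StyleGAN’s actual sample-grid code:

```python
from PIL import Image

def make_grid(images, cols=7, rows=4):
    """Tile sample images into a cols x rows contact sheet, the
    kind of grid StyleGAN training emits so you can eyeball its
    progress. Assumes all images share the same size."""
    w, h = images[0].size
    sheet = Image.new("RGB", (cols * w, rows * h), "white")
    for i, img in enumerate(images[: cols * rows]):
        sheet.paste(img, ((i % cols) * w, (i // cols) * h))
    return sheet

# 28 coloured placeholder tiles standing in for generated samples.
tiles = [Image.new("RGB", (64, 64), (i * 9 % 256, 100, 150)) for i in range(28)]
grid = make_grid(tiles)
print(grid.size)  # (448, 256)
```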
As time went on, the shapes it produced looked more and more bird-like. By the end of the hack day, I had only managed to train my model for about 3–4 hours, but by that time the faces were starting to disappear and bird features were appearing. Pretty impressive for just a few hours, but also still pretty abstract!
I wanted to see whether the model could produce more recognisable bird images, so I continued to train it after the hack day ended and the images started to look better and better…
I left the model training over the weekend and the fakes actually did start looking like they could be illustrations of birds in many cases! Below is the final set of sample images I produced with my model. I would guess I spent about 40 hours training to get to this point.
Okay, so there are a few images in there that maybe wouldn’t fool a human into thinking they were real illustrations of birds… ahem…
But the majority kind of do look like birds. I’m pretty sure if you didn’t know this was generated by a Neural Network you would have no trouble believing it was a real bird illustration.
Here are a few more images produced which I think look very convincing.
This is so impressive considering I only had 1000 images and around a couple of days’ worth of training!
Creating interpolation animations
While learning from the given dataset, a StyleGAN looks for patterns or “features” in the images and organises them into a multi-dimensional space. This is called the latent space.
When I talk about “features,” in the simplest terms I mean things like brown vs white feathers. So, brown feathers might be point A in this space and white feathers might be point B. If you move from point A to point B and produce images at regular intervals along the way, you would see the feathers turn from brown to white over time. This is called interpolation.
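A straight-line walk from point A to point B can be sketched with NumPy. The 512-dimensional vectors below are hypothetical latent codes (512 happens to be StyleGAN’s latent size, but the “brown” and “white” points here are just random stand-ins for illustration):

```python
import numpy as np

def lerp(a, b, steps):
    """Linearly interpolate between two latent vectors a and b,
    returning `steps` points with the endpoints included. Feeding
    each point to the generator would yield the morphing frames."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - t) * a + t * b

rng = np.random.default_rng(42)
z_brown = rng.normal(size=512)  # hypothetical point A (brown feathers)
z_white = rng.normal(size=512)  # hypothetical point B (white feathers)

frames = lerp(z_brown, z_white, steps=10)
print(frames.shape)  # (10, 512)
```

Decoding each of the ten rows through the trained generator would give the in-between images, with the feathers gradually changing from the first frame to the last.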
Using another of Derrick Schultz’s Google Colab notebooks, I generated a couple of videos of interpolations through my model. I used the “circular” interpolation, which gives a nice “walk” through a few diverse points in the latent space, so you see different birds morph into one another. Here is one of the outcomes:
As it “walks” through the latent space, different features morph from one to another.
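The “circular” interpolation can be sketched the same way: pick a random 2-D plane in the latent space and trace a closed circle in it, so the resulting video loops back to its starting bird. This is a minimal sketch of the idea, not the code from Derrick’s notebook:

```python
import numpy as np

def circular_walk(center, radius, steps, seed=0):
    """Trace a closed circle in a random 2-D plane of the latent
    space around `center`. Decoding each point would give a looping
    video in which the birds morph and return to the start."""
    rng = np.random.default_rng(seed)
    # Build two orthonormal directions spanning the plane of the circle.
    u = rng.normal(size=center.shape)
    u /= np.linalg.norm(u)
    v = rng.normal(size=center.shape)
    v -= (v @ u) * u          # remove the component along u
    v /= np.linalg.norm(v)
    angles = np.linspace(0.0, 2 * np.pi, steps, endpoint=False)
    return center + radius * (np.outer(np.cos(angles), u)
                              + np.outer(np.sin(angles), v))

path = circular_walk(np.zeros(512), radius=1.5, steps=8)
print(path.shape)  # (8, 512)
```

Because the path is closed, the last frame interpolates back to the first, which is what makes the animation loop seamlessly.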
The fact that with StyleGAN2-ADA you can use such a limited dataset, need so little training time, and do the training on a free service like Google Colab, yet still get results like this, is incredibly impressive, and it opens the world of Machine Learning and StyleGANs to many more people than before. It is also very exciting to think about all of the creative, as well as practical, applications of this technology. Practical applications of GANs include “image-to-image translation,” which might involve converting a photograph taken in the daytime to night-time, a sort of AI-led Photoshop; generating higher-resolution versions of images; or even generating missing parts of an incomplete image.
I obviously have to give a shout-out to Derrick Schultz for his Colab notebooks and tutorials, which also helped lower the barrier to entry. They meant I was able to start training much more quickly than I would have otherwise, while the code is still there for me to learn from, amend and dissect. I encourage you to check out his YouTube channel if you’re interested in learning more about training StyleGANs. He also runs regular courses on Machine Learning for creative purposes, and you can find out more about them on his website.
I also encourage you to have a look through the Biodiversity Heritage Library’s 250,000+ images on their Flickr. It’s an impressive and incredibly interesting collection of images from the natural world, organised into different categories. It’s the perfect place to find datasets to use in a StyleGAN model or just for inspiration!
I’m looking forward to using this technology with more datasets and seeing what new and weird flora, fauna and other images I might be able to create!