This Bird Illustration Does Not Exist: Using Machine Learning and BHL Flickr Images to Produce “New” Bird Images

A version of this post was originally published on the Cogapp blog on 30 March 2021.

A set of six images on a grid. Each looks like a scientific illustration of a different bird.

AI-generated illustrations of birds, created with a StyleGAN trained on illustrations from the BHL Flickr.

I work as a web developer for the agency Cogapp, which is based in Brighton, UK. We create websites and other digital services for museums, art galleries, archives and the like, and every couple of months we hold a “hack day.” A hack day involves working on projects which generally revolve around a particular theme and which we can ideally complete within the day. This allows us to get the creative juices flowing and to further our agenda of innovation.

The theme of this past hack day at Cogapp was “Museum APIs,” interpreted loosely to mean that we should use open data provided by museums in our projects. I was inspired by the Biodiversity Heritage Library’s Flickr, which is a massive collection of free-to-use scientific images. I immediately knew I wanted to use this resource, as I love scientific illustrations of nature.

I’ve also had an interest in Machine Learning (ML) for a while, and I recently discovered Derrick Schultz and his YouTube channel Artificial Images. Here, he publishes videos of his Machine Learning courses which he runs for people who want to use ML for creative purposes.

I watched Derrick’s tutorials on training a StyleGAN Neural Network and the things he was saying made a degree of sense to me, plus he had published a handy Google Colab notebook with step-by-step code, so I decided it was something I might be able to have a go at.

What is a StyleGAN?

StyleGAN stands for Style-based Generative Adversarial Network and was originally developed by NVlabs (NVIDIA Research). A StyleGAN trains on and learns from a set of input images and produces its own “fake” images in return. You may have seen the site ThisPersonDoesNotExist.com, which features fake photos of people which a StyleGAN has produced. This network has been trained on thousands of images of people’s faces and has in turn learned to produce new images which (for the most part) look like real photographs of people’s faces.

A set of three photos of faces. The first two look like real photos; the third one, however, looks at first like the person might have a headscarf or hat on, but when you look closer you realise it’s more of a weird, melty, abstract shape and the head is floating in space.

Three images from ThisPersonDoesNotExist.com; two successful, one less so…

A StyleGAN does this by working through the set of real images it is given numerous times, producing its own images as it goes. The “adversarial” part of the name refers to the fact that a StyleGAN actually comprises two Neural Networks which work against each other. The first network (the generator) produces the fake images, and the second (the discriminator) looks at those images and tries to determine whether they are “real” or “fake.” The longer each network trains, the better each gets at its job, and the more likely the generated images are to trick the second network, and ultimately humans too, into thinking they’re real.
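To make that concrete, here is a minimal sketch of one adversarial training step, written in PyTorch. It is a generic toy GAN rather than StyleGAN itself (the two tiny networks and the random “real” batch are placeholders), but it shows the generator-versus-discriminator tug of war:

```python
# Minimal, generic GAN training step (a toy stand-in, not StyleGAN itself).
import torch
import torch.nn as nn

latent_dim = 64
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 28 * 28), nn.Tanh()
)
discriminator = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
)
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

# Placeholder batch of "real" images; in practice this would come from the dataset.
real_images = torch.rand(16, 28 * 28)

# 1. Train the discriminator: real images should score "real", fakes should score "fake".
fake_images = generator(torch.randn(16, latent_dim)).detach()
d_loss = loss_fn(discriminator(real_images), torch.ones(16, 1)) + \
         loss_fn(discriminator(fake_images), torch.zeros(16, 1))
d_opt.zero_grad()
d_loss.backward()
d_opt.step()

# 2. Train the generator: it is rewarded when its fakes are scored as "real".
g_loss = loss_fn(discriminator(generator(torch.randn(16, latent_dim))), torch.ones(16, 1))
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
```

StyleGAN adds a great deal on top of this basic loop (most notably its style-based generator architecture), but the adversarial back-and-forth is what the “GAN” part refers to.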

StyleGAN has been updated a few times, and in 2020 StyleGAN2-ADA was released. It can now be trained on very few images, as few as 500–1000 as opposed to the tens of thousands needed just a couple of years ago, and it will learn to produce good-quality fakes in a very short period of time (a few hours of training). This advancement made me feel I could attempt to train my own: 1000 images was something I could gather, and I felt I could train the model during the course of a hack day and get some kind of result to show at the end.

Creating the dataset

The biggest (active) part of this project, time-wise, was collecting the images and making them roughly uniform: cropping them to the same size and placing the subject in roughly the same spot in each one. This was important because I wanted the network to focus on the subject of each image itself and not on, for example, where in the image the subject sat, as that would waste precious training time.

I looked through the BHL Flickr and decided to use scientific illustrations of birds. I love birds and the style of scientific illustrations, and I was excited at the idea of the new and weird birds my StyleGAN model might produce! I chose illustrations in which the bird was in the centre of the page and the rest of the space was ideally empty. Due to time constraints, I did allow some illustrations with more than one bird in them and some with background features such as grass. This wasn’t ideal for the above reasons, but I decided to give it a go and see what the results would be like anyway.
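For illustration, a rough Pillow sketch of the kind of clean-up involved might look like the following (this is not the exact code I used; the folder names and the 1024 × 1024 target size are just examples). It centre-crops each downloaded illustration to a square and resizes it so every training image has the same dimensions:

```python
# Rough sketch: centre-crop each illustration to a square and resize it so all
# training images share the same dimensions. Folder names are examples only.
from pathlib import Path
from PIL import Image

src = Path("bhl_downloads")   # hypothetical folder of images saved from the BHL Flickr
dst = Path("dataset")         # hypothetical folder for the prepared training set
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    # Crop a centred square, assuming the bird sits roughly in the middle of the page.
    left, top = (w - side) // 2, (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    square.resize((1024, 1024), Image.LANCZOS).save(dst / path.name, quality=95)
```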

A contact sheet type layout with 28 real illustrations of different birds of different shapes, colours and positions. Some perching on branches, some on grass.

A contact sheet of “reals” or real images taken from my dataset from the BHL Flickr. This is produced by the StyleGAN at the beginning of training.

Training my model

Once I had my dataset ready to go, I used Derrick’s Google Colab notebook to train my model. In order to run StyleGAN2-ADA, you need “1–8 high-end NVIDIA GPUs with at least 12 GB of GPU memory,” which is more than most home computers have. Using something like Google Colab, where you’re essentially borrowing Google’s super-powerful computers, means that even an average Joe like me can have a go at running memory-heavy code like this, and the basic version is free, which is enough for this purpose! The other benefit is that you can connect it to a Google Drive for storage. This was useful in my case because I needed to store the 1000 images to train my model with, and the model itself outputs large images.
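In a Colab notebook the setup boils down to a couple of cells along these lines (a general sketch rather than Derrick’s exact notebook; the ! prefix runs a shell command inside Colab):

```python
# Check which GPU Colab has allocated to this session.
!nvidia-smi

# Mount Google Drive so the dataset and the generated images persist between sessions.
from google.colab import drive
drive.mount('/content/drive')
```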

The code I was using was set to train a new StyleGAN model based on one that had already been trained for many hours. A model in Machine Learning is essentially a file that has been trained to recognise patterns. Using a pre-trained model as a starting point like this is called transfer learning, and it can drastically speed up the time it takes for your StyleGAN to learn from your images and start producing something which looks good. The model you use as a base doesn’t have to be similar in subject matter to the images you will be training it with; in this case I used the FFHQ model by NVlabs, which is trained on a set of images of faces. This meant that when training started, it initially produced images which looked like a mixture of faces and my illustrations.
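The training itself then comes down to a single cell. The exact flags vary between the NVlabs StyleGAN2-ADA repository and Derrick’s fork of it, so treat this as an indicative sketch (the paths are made up) rather than the precise command I ran; the important part is the --resume flag, which points at the pre-trained FFHQ model so training starts from its weights:

```python
# Indicative sketch only: flag names differ slightly between StyleGAN2-ADA forks.
# --data points at the prepared dataset, --outdir at a Drive folder for results,
# and --resume at the pre-trained FFHQ model so training transfer-learns from it.
!python train.py \
    --outdir=/content/drive/MyDrive/stylegan-results \
    --data=/content/drive/MyDrive/dataset \
    --gpus=1 \
    --snap=1 \
    --resume=ffhq1024
```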

After every few passes through the dataset, the model produced a 7-by-4 grid of sample images so I could see how it was getting on. The image below is the first sample grid I got, which shows the original faces mixing with some features of my images!

A contact sheet style set of 28 images. You can recognise faces in most of them, but the faces are washed out in colour and are starting to morph into more abstract shapes. Some features are being exaggerated by the StyleGAN where they resemble features from the bird images, for example curly hair texture turning into something almost like branches on a tree. Other features are being washed out or removed over time as they are not useful for producing images of birds.

Totally not terrifying image produced by the StyleGAN model after one of the first cycles of training.

As time went on, the shapes it was producing started looking more and more bird-like. By the end of the hack day, I had only managed to train my model for about 3–4 hours, but by that time I was starting to see the faces disappear and bird features appear. Pretty impressive for just a few hours, but also still pretty abstract!

The same contact sheet style 28 images. You can now see almost no face-like features at all. At this stage the images look very abstract. You can see the background colour is a pale beige like on a lot of the illustrations of birds. You can see the model learning that the birds are generally in the centre of the image. But we have abstract blobby shapes in the foreground at the moment, rather than anything that looks like birds. The colours of these blobs are starting to look bird-like, though.

The fakes produced by the end of the hack day in around 3–4 hours of training.

I wanted to see whether the model could produce more recognisable bird images, so I continued to train it after the hack day ended and the images started to look better and better…

The contact sheet of images. Here the foreground shapes in a lot of cases are starting to look bird-shaped! There is no detail like feathers or eyes or legs, but the outlines look like birds in about half of the images. Some are still abstract, though, like just some wavy lines.

The fakes produced after about 3–4 hours more training.

I left the model training over the weekend and the fakes actually did start looking like they could be illustrations of birds in many cases! Below is the final set of sample images I produced with my model. I would guess I spent about 40 hours training to get to this point.

The contact sheet of images. Here, if you don’t look closely, you’d think it’s just a collection of real illustrations of birds. Most of them look like birds and we now have details like legs and eyes. In some cases a bird is a bit warped, like it’s melting, and in other cases a bird might have three legs or eyes. But for the most part they look like real bird illustrations.

The final set of sample images generated by the StyleGAN after around 40 hours of training.

Okay, so there are a few images in there that maybe wouldn’t fool a human into thinking they were real illustrations of birds… ahem…

Two screenshots of images the StyleGAN has produced in which the birds are a weird shape, almost like they’re melting. The legs, tree branches and feather textures look okay, but the birds have many eyes and their shapes are all wrong.

A single screenshot of a bird image in which the bird has two legs which look real but the body is tubular and upright unlike any bird I’ve seen. It also has no eyes and one beak on one side of its head and another beak on the other side.

But the majority kind of do look like birds. I’m pretty sure if you didn’t know this was generated by a Neural Network you would have no trouble believing it was a real bird illustration.

A single screenshot of an image produced by the StyleGAN. This looks like a real bird illustration. Its body is almost duck-like and it has two eyes and a beak and its feathers look like a real feather texture. It has two legs and it’s standing on some grass. It does look very happy and alert though…

It does maybe look overly happy for a bird though?

Here are a few more images produced which I think look very convincing.

A set of six images on a grid. Each looks like a scientific illustration of a different bird.

AI-generated illustrations of birds, created with a StyleGAN trained on illustrations from the BHL Flickr.

This is so impressive considering I only had 1000 images and it took around a couple of days’ worth of training!

Creating interpolation animations

While learning from the given dataset, a StyleGAN looks for patterns or “features” in the images and organises them into a multi-dimensional space. This is called the latent space.

When I talk about “features,” in the simplest terms I mean things like brown vs white feathers. So, brown feathers might be point A in this space and white feathers point B. If you move from point A to point B and produce images at regular steps in between, you would see the feathers turn from brown to white over time. This is called interpolation.
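As a simplified sketch of what that means in code (the generate_image function below is just a placeholder for the trained generator, and 512 is StyleGAN’s usual latent vector size), a straight-line walk from point A to point B looks something like this:

```python
# Simplified sketch of interpolating between two points in latent space.
import numpy as np

def generate_image(z):
    # Placeholder: in the real notebook this would run the trained StyleGAN
    # generator on the latent vector z and return the resulting illustration.
    return np.zeros((1024, 1024, 3), dtype=np.uint8)

latent_dim = 512                        # StyleGAN latent vectors have 512 dimensions
point_a = np.random.randn(latent_dim)   # e.g. a latent giving a brown-feathered bird
point_b = np.random.randn(latent_dim)   # e.g. a latent giving a white-feathered bird

num_frames = 60
frames = []
for i in range(num_frames):
    t = i / (num_frames - 1)             # t runs from 0.0 to 1.0 across the animation
    z = (1 - t) * point_a + t * point_b  # step along the straight line from A to B
    frames.append(generate_image(z))     # frames can then be stitched into a video
```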

Using another of Derrick Schultz’s Google Colab notebooks, I generated a couple of videos of interpolations through my model. I used the “circular” interpolation, which gives a nice “walk” through a few diverse points in the latent space, and you see different birds morph into one another. Here is one of the outcomes:

As it “walks” through the latent space, different features morph from one to another.

Conclusion

The fact that with StyleGAN2-ADA you can use such a limited dataset, need so little time and do the training on a free service like Google Colab, and still get results like this, is incredibly impressive, and it opens the world of Machine Learning and StyleGANs to many more people than before. It is also very exciting to see and think about all of the creative, as well as practical, applications of this technology. Practical applications of GANs include “image-to-image translation,” which might involve converting a photograph taken in the daytime to night-time, for example, a sort of AI-led Photoshop; generating higher-resolution versions of images; or even generating missing parts of an incomplete image.

I obviously have to shout out Derrick Schultz for his Colab notebooks and tutorials, which also helped lower the barrier to entry. They meant I was able to get training much quicker than I would have otherwise, but the code is still there for me to learn from, amend and dissect. I encourage you to check out his YouTube channel if you’re interested in learning more about training StyleGANs. He also runs regular courses on Machine Learning for creative purposes, and you can find out more about them on his website.

I also encourage you to have a look through the Biodiversity Heritage Library’s 250,000+ images on their Flickr. It’s an impressive and incredibly interesting collection of images from the natural world, organised into different categories. It’s the perfect place to find datasets to use in a StyleGAN model or just for inspiration!

I’m looking forward to using this technology with more datasets and seeing what new and weird flora, fauna and other images I might be able to create!
