Of Seals, Dogs and Dolphins with Python
Every now and then I have come across prints of image collages or ‘mosaics’. Often it’s been ads, say a Greenpeace collage of small images portraying poorly treated puppies - that is, if you’re able to squint your eyes real hard - making up for the overall image of a happy dolphin, or so.
Anyhow! - I’ve almost always found these collages to be aesthetically displeasing. At the same time they somehow caught and kept my attention. The last time I saw one of them I wondered how one would go about creating one of these. I figured some kind of a nearest neighbor search would come in handy.
Here we go!
Method
The way I went about it is conceptually very simple.
First, the reference collection of component images (e.g. 10,000 images of puppies) should be preprocessed. On the one hand, they should be scaled down to a suitable size, e.g. 20x20 pixels. On the other hand, an index over these images is created. On a high level the index should help answer to the following query fast:
Given an actual pixel in the target image (e.g. the eye of the dolphin), which image from the reference collection (e.g. which puppy) works best as a substitute?
This question could be framed as that of a minimal distance, i.e.
Between which image in the reference collection and the pixel of the target image is the distance least?
This, in turn, bares the question of how one might define a notion of distance. My impression is that both the human eye as well as human perception more generally are serious business. I have no clue whether
- within a channel, average values capture the essence
- in how far channels interact with each other
- the euclidean distance between pixel/patches captures the notion of similitude
when it comes to visual human perception.
Nevertheless, I tried this and it seems to work okay. Slightly more formally, the distance between a reference image \(I^{ref}\) and a target image pixel \(I_{ij}^{target}\) looks as follows:
$$ d(I^{ref}, I_{ij}^{target}) = (\sum_{c \in \{r, g, b\}} (I_{ijc}^{target} - avg(I_c^{ref}))^2)^{1/2} $$
Given this notion of a distance, an index on the reference images can be built.
Given that our queries and image representations only rely on three dimensions, an exact nearest neighbor method, such as k-d trees, certainly works just fine. I opted to use annoy, a library for approximate nearest neighbor retrieval. Just like k-d trees, annoy relies on building a forest of trees splitting the input space by hyperplanes.
Once the reference images can be efficiently queried, one must only define a way to use the reference images to create a target image. The idea here is to create a 1-to-1 correspondence between an individual pixel in the target image and an entire reference image. Based on aforementioned distance, we simply substitute every target image by the most suitable reference image.
Putting it all together, the process looks as follows:
- Pre-processing of reference images
for every image in refrence collection:
downscale the image
compute average value per rgb channel
insert image into index based on average rgb channel values
- Building of collage imitating target image
for pixel in target image:
pick rgb values of this pixel
find reference image in index that is closest to these rgb values
substitute pixel by retrieved reference image
One might want to extend the approach in many ways. For instance, one might want to impose a reference image specific budget. In other words, one might want that a specific reference is used at most \(k\) times in the collage.
Similarly, one might want to ensure that every image from the reference collection has been used at least once.
Usage
I condensed this into a small python package and CLI utility called pycollage
. Its code
is public on GitHub, see here.
One can install it via PyPI
$ pip install pycollage
or via conda-forge
$ conda install pycollage -c conda-forge
It comes with a fairly simple interface:
$ pycollage process-collection /users/Anne/image_collection
$ pycollage build /users/Anne/index /users/Anne/target_image.png
where e.g. users/Anne/image_collection
is a directory with many images of
puppies and e.g. users/Anne/target_image.png
is the path of the dolphin image.
The readme comes with more information.
Example
Since I’m not Greenpeace and since I like seals slightly better than dolphins, here is a seal with a heart-shaped nose made up of dogs:
https://photos.app.goo.gl/DKbrLQo4p6QaG4Q96
Update 22/10/10
It seems as though it would make more sense to use the CIELAB space, rather than rgb for retrieval.