Choosing Photos

I like to take wildlife photographs. I find it meditative, and over many years I've photographed some interesting and uncommon animals. Some highlights so far include:

For each of these trips I accumulated thousands of photographs, and had to spend a lot of time "deduplicating" and picking favourites. To keep things fun, I built a CLI called Badger to speed up this work.

The algorithm is simple:

I detected image blur using the variance of a photo's Laplacian. This value has no global meaning, but for two very similar images it reliably indicates which of the two is blurrier.
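To make the blur measure concrete, here is a minimal sketch in plain Python. It is not Badger's code: the function name `laplacian_variance`, the 4-neighbour kernel, and the list-of-rows image format are all assumptions chosen to keep the example self-contained. A real tool would more likely lean on an image library.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a list of equal-length rows of grayscale intensities
    (a hypothetical format picked for this sketch).
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("need at least a 3x3 image")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    # Sharp edges give large positive and negative responses, so high variance.
    return pvariance(responses)


# A crisp checkerboard versus a featureless grey frame.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
flat = [[128] * 4 for _ in range(4)]
print(laplacian_variance(sharp) > laplacian_variance(flat))
```

The absolute numbers depend on the subject and exposure, which is why the score is only useful for comparing near-identical frames.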

To group similar images together, I clustered them by timestamp using DBSCAN. DBSCAN creates as many clusters as needed, and allows one-off photos to be added to a noise cluster.
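As an illustration of how timestamp clustering behaves, here is a small, unoptimised DBSCAN over one-dimensional timestamps. It is a sketch under assumptions, not Badger's implementation: `dbscan_1d` is a hypothetical name, and the `eps` and `min_samples` values below are invented for the example (the settings I actually use are not shown here).

```python
def dbscan_1d(timestamps, eps, min_samples):
    """Return one label per timestamp: 0, 1, ... for clusters, -1 for noise."""
    n = len(timestamps)

    def neighbours(i):
        # As in the usual DBSCAN definition, a point counts as its own neighbour.
        return [j for j in range(n) if abs(timestamps[i] - timestamps[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_samples:
                queue.extend(reach)  # j is a core point, so keep expanding
    return labels


# Seconds since the start of a trip: two bursts and one stray shot.
shots = [0, 1, 2, 100, 101, 102, 500]
print(dbscan_1d(shots, eps=5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never specified up front, and the lone photo at 500 seconds ends up labelled as noise rather than being forced into a burst.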

After grouping and labelling the photos, I manually pick favourites in each cluster, using the blur label to find "keepers" more quickly.
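One way to present the results for manual review is to order each cluster from sharpest to blurriest. The sketch below assumes each photo is a hypothetical `(name, cluster_label, blur_score)` tuple, where a higher score means a sharper image; Badger's real data structures may differ.

```python
from collections import defaultdict


def rank_clusters(photos):
    """Group photos by cluster label and sort each group sharpest-first."""
    clusters = defaultdict(list)
    for name, label, score in photos:
        clusters[label].append((score, name))
    return {
        label: [name for _, name in sorted(members, reverse=True)]
        for label, members in clusters.items()
    }


photos = [
    ("a.jpg", 0, 120.0),
    ("b.jpg", 0, 40.0),
    ("c.jpg", 1, 75.0),
    ("d.jpg", -1, 10.0),  # -1 marks a one-off (noise) photo in this sketch
]
print(rank_clusters(photos))
```

Scores are only compared within a cluster, which matches the point that the blur value is meaningful only between very similar images.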

Badger has a few other features; it handles video (which I don't check for blurriness) and raw images associated with JPG files output by my camera.
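For the raw files, a simple approach is to match them to JPGs by filename stem, so that a decision made about the JPG can carry over to its raw sibling. This is a guess at the general idea rather than Badger's actual logic: the suffix sets and the function name `pair_raw_with_jpg` are assumptions, and the sketch expects all files to sit in one directory.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed suffixes; adjust for whatever your camera writes.
RAW_SUFFIXES = {".cr2", ".nef", ".arw", ".dng", ".raf"}
JPG_SUFFIXES = {".jpg", ".jpeg"}


def pair_raw_with_jpg(filenames):
    """Map each filename stem to the JPG and raw files that share it."""
    by_stem = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() in RAW_SUFFIXES | JPG_SUFFIXES:
            by_stem[path.stem].append(name)
    # Videos and other file types are ignored here.
    return {stem: sorted(names) for stem, names in by_stem.items()}


files = ["IMG_1.JPG", "IMG_1.CR2", "IMG_2.JPG", "clip.MOV"]
print(pair_raw_with_jpg(files))
```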

Does it work? Yes, with caveats. The library I use computes blur slowly (1-2 s per image), and it still takes a little time to work through the remaining images manually. The clustering method is imperfect; in future I'll research a better method of clustering similar images. But, on the whole, I've sorted hundreds of thousands of images using this system, and for most day trips I have images sorted within thirty minutes.

Takeaway Points

Addendum, October 2024

Years later, I'm still using this project. It's sorted hundreds of photo albums for me and cuts my processing time down to roughly a quarter. It's due a rewrite (it's a little buggy) but the fundamentals are good.