Hey PaperLedge learning crew, Ernis here! Get ready to dive into some seriously cool tech that's helping computers spot things that are... well, just not quite right. Today, we're unpacking a paper about anomaly detection using something called diffusion models.
Now, diffusion models might sound like something out of a sci-fi movie, but think of them like this: Imagine you have a perfectly clear photo. Then, you slowly add more and more noise – like static on an old TV – until it's completely unrecognisable. That's the "diffusion" part. A diffusion model is then trained to reverse that process – starting from the noisy image and carefully removing the noise step by step to get back to the original, clear picture.
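If you like seeing ideas in code, here's a minimal sketch of that "add noise" step, using a standard DDPM-style noise schedule. The function name and schedule values are my own illustration, not anything from the paper:

```python
import torch

def forward_diffuse(x0: torch.Tensor, t: int, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Add t steps' worth of Gaussian noise to a clean image x0.

    alpha_bar is the cumulative product of the per-step noise schedule,
    so alpha_bar[t] tells us how much of the original signal survives at step t.
    """
    noise = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise

# Example: a linear beta schedule over 1000 steps (a common default).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, dim=0)

x0 = torch.rand(1, 3, 64, 64)  # pretend this is our perfectly clear photo
x_noisy = forward_diffuse(x0, t=500, alpha_bar=alpha_bar)  # halfway to pure static
```

Training the model is then just teaching it to undo that corruption, one step at a time.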
These models are amazing at understanding the normal, everyday stuff they're trained on. So, what happens when you show them something that's not normal – something anomalous? That's where the anomaly detection magic happens.
The old way of doing this, called reconstruction-based anomaly detection, was kind of clunky. It involved taking the anomalous image, adding a bunch of noise, and then having the diffusion model try to "reconstruct" the original. The idea was that if the model struggled to rebuild the image perfectly, it was probably because something was wrong. The bigger the difference between the original and the reconstructed image (the "reconstruction error"), the more likely it was an anomaly.
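To make that concrete, here's a rough sketch of what a reconstruction-based score looks like in code. It reuses the noise schedule from the earlier sketch, and `model.denoise_from` is a placeholder I'm assuming exists for the model's iterative reverse process:

```python
import torch

def reconstruction_score(x: torch.Tensor, model, alpha_bar: torch.Tensor,
                         t: int = 400) -> float:
    """Classic reconstruction-based anomaly score.

    1. Corrupt the input with t steps of noise.
    2. Ask the diffusion model to denoise all the way back.
    3. Score = how badly the reconstruction matches the input.

    `model.denoise_from(x_t, t)` is a stand-in for the model's iterative
    reverse process: t network evaluations, which is the slow part.
    """
    noise = torch.randn_like(x)
    x_t = alpha_bar[t].sqrt() * x + (1 - alpha_bar[t]).sqrt() * noise
    x_recon = model.denoise_from(x_t, t)      # expensive: t denoising steps
    return (x - x_recon).abs().mean().item()  # reconstruction error
```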
But there were two big problems with this. First, you had to be super careful about how much noise you added. Too little, and the model would faithfully reconstruct the anomaly right along with everything else, so nothing would look wrong. Too much, and it would wipe out the normal details too, flagging perfectly fine images. Second, it took a lot of computational power, because the model had to run dozens or hundreds of denoising steps for every single image. Imagine having to rewind and replay a VHS tape (remember those?) ten times just to check if something looks off. Slow, right?
"The old way was like trying to fix a broken vase by smashing it into even smaller pieces and then gluing it back together. It's messy, time-consuming, and you might not even get a perfect result."
This new research paper comes up with a much smarter approach. Instead of trying to rebuild the image, they go straight to the source: the latent variables. Think of latent variables as the hidden DNA of an image – the core information that defines what it is, but in a compressed, abstract form. For a diffusion model, every image maps to a list of numbers, and for normal images those numbers should follow a known, well-behaved distribution – a standard Gaussian, the classic bell curve.
So, instead of reconstructing, they take the anomalous image, add just a little bit of noise (only 2-5 steps!), and then estimate the latent variables behind it. Then, they check whether those variables actually fit that standard Gaussian. It's like checking if someone's DNA matches the standard human genome. If the latent variables land way outside the norm, that's a big red flag – anomaly detected!
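Here's a rough sketch of that idea – my own simplified illustration, not the authors' exact algorithm. `model.estimate_latent` is a placeholder for the few-step (2-5) latent estimation; the score is just how improbable the resulting latents are under a standard Gaussian:

```python
import torch

def latent_anomaly_score(x: torch.Tensor, model, n_steps: int = 3) -> float:
    """Score an image by how poorly its latents fit a standard Gaussian.

    `model.estimate_latent(x, n_steps)` is a stand-in for the paper's
    few-step latent estimation; I'm assuming it returns a tensor z that
    should look like N(0, I) noise when the image is normal.
    """
    z = model.estimate_latent(x, n_steps)
    # Under N(0, I), the negative log-likelihood is (up to a constant)
    # just half the squared norm of z -- a big norm means "doesn't fit".
    return 0.5 * (z ** 2).sum().item()

# A per-element map like z**2 could also localise *where* the image
# looks wrong, since anomalous regions push their latents off-distribution.
```

Note how there's no sampling loop at all: a few steps to estimate the latents, one statistical check, done.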
This is super clever because it skips the whole reconstruction process, making it much faster. And because it focuses on the underlying structure of the image, it's also remarkably accurate. In fact, they got state-of-the-art results on the MVTecAD benchmark dataset, achieving an AUC of 0.991 at 15 FPS. In plain terms: near-perfect detection accuracy (a flawless detector would score an AUC of 1.0) at a speed of 15 images per second.
So, why does this matter? Well, imagine you're building self-driving cars. You need to be able to quickly and accurately detect anything unusual on the road – a pedestrian stepping out, a fallen object, etc. Or, think about manufacturing. You want to be able to spot defects in products before they ship to customers. This technology could also be used for medical imaging, fraud detection, and all sorts of other applications where spotting something out of the ordinary is critical.
Here are some things that pop into my mind:
- Could this approach be used to detect anomalies in other types of data, like audio or text?
- How can this technology be made even more robust to adversarial attacks, where someone intentionally tries to fool the system?
- What are the ethical implications of using AI to detect anomalies, and how can we ensure that it's used responsibly?
This is just the tip of the iceberg, learning crew! But hopefully, this gives you a good sense of how diffusion models can be used for anomaly detection and why this research is so exciting. Until next time, keep learning and stay curious!
Credit to Paper authors: Shunsuke Sakai, Tatsuhito Hasegawa