Why is it necessary?
Anomaly detection (AD) methods can automatically identify defective or atypical inputs or outputs in industrial and other processes, while anomaly localization (AL) methods additionally point out where in the input the anomaly lies. Unsupervised methods are particularly valuable because they require only normal (non-anomalous) data for training. Discrepancy Scaling is a computationally light unsupervised AD and AL method that nevertheless achieves good accuracy.
How does it work?
Discrepancy Scaling builds upon the Student-Teacher Feature Pyramid Matching (STFPM) [1] method for AD and AL. STFPM uses two convolutional neural networks (CNNs) of identical architecture. One CNN, the teacher, is pre-trained and frozen, while the other, the student, is trained to mimic the teacher's activations on normal data. At inference time, when the model is shown an anomalous image, the student fails to mimic the teacher in the activations that correspond to the anomalous region; this discrepancy is used to determine both whether the image as a whole is anomalous and where the anomaly lies and how large it is.
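The sketch below illustrates the core of STFPM in PyTorch: the student-teacher discrepancy at each pyramid level is the squared distance between channel-normalized feature vectors, the per-level maps are upsampled and combined into a single anomaly map, and the image-level score is the map's maximum. This is a simplified illustration rather than the reference implementation, and the function and variable names are ours.

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feats, student_feats, out_size):
    """Combine per-level student-teacher discrepancies into one anomaly map.

    teacher_feats / student_feats: lists of feature maps of shape (1, C, H, W),
    one per pyramid level, taken from corresponding layers of both networks.
    out_size: (H, W) of the input image, used to upsample each level's map.
    """
    combined = torch.ones(1, 1, *out_size)
    for t, s in zip(teacher_feats, student_feats):
        # L2-normalize along the channel axis, then measure how far the
        # student's activation vector is from the teacher's at each pixel.
        t_n = F.normalize(t, dim=1)
        s_n = F.normalize(s, dim=1)
        level_map = 0.5 * (t_n - s_n).pow(2).sum(dim=1, keepdim=True)  # (1, 1, H, W)
        level_map = F.interpolate(level_map, size=out_size,
                                  mode="bilinear", align_corners=False)
        combined = combined * level_map  # multiply the maps across pyramid levels
    image_score = combined.amax()        # image-level anomaly score
    return combined, image_score
```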
We have demonstrated that STFPM's way of calculating student-teacher discrepancies leaves information on the table. Namely, when the discrepancies are computed on normal data, individual elements of the discrepancy array may have non-zero means, and different elements may have different variances. In Discrepancy Scaling, we calculate the mean and standard deviation of each discrepancy array element on normal training data. At inference time, we use these statistics to standardize the student-teacher discrepancy values, producing more accurate anomaly scores.
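In code, the scaling step can be sketched as follows. This is an illustrative PyTorch snippet, not our actual implementation: the function names are chosen for exposition, and the aggregation of the scaled discrepancies into final anomaly scores is omitted.

```python
import torch

def fit_scaling_stats(discrepancy_maps):
    """Estimate per-element mean and std of the discrepancies on normal data.

    discrepancy_maps: tensor of shape (N, ...) holding the raw student-teacher
    discrepancies collected over N normal training images.
    """
    mu = discrepancy_maps.mean(dim=0)
    sigma = discrepancy_maps.std(dim=0) + 1e-8  # guard against division by zero
    return mu, sigma

def scaled_discrepancy(discrepancy, mu, sigma):
    """Standardize a test-time discrepancy map with the training statistics."""
    return (discrepancy - mu) / sigma
```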
[1] Guodong Wang et al. “Student-Teacher Feature Pyramid Matching for Anomaly Detection”, arXiv:2103.04257, 2021.