Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition

04/10/2023
by Rayson Laroca, et al.

This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at https://raysonlaroca.github.io/supp/lpr-train-on-test/
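The idea of a duplicate-free split can be illustrated with a minimal sketch. It is not the authors' procedure (their near-duplicate lists and proposed splits are at the URL above); it simply assumes each sample is a hypothetical `(image_path, plate_text)` pair and assigns all images of the same plate to one side of the split. The function names `duplicate_free_split` and `shared_plates` are illustrative only.

```python
import random
from collections import defaultdict


def duplicate_free_split(samples, test_fraction=0.2, seed=0):
    """Split (image_path, plate_text) pairs so no plate is in both sets.

    Assumption: two images are near-duplicates when they share the same
    plate text. Real datasets may need a looser matching rule.
    """
    # Group image paths by the plate they show.
    groups = defaultdict(list)
    for path, plate in samples:
        groups[plate].append(path)

    # Shuffle the plates (not the images) so a plate is never divided.
    plates = sorted(groups)
    random.Random(seed).shuffle(plates)

    n_test = max(1, round(len(plates) * test_fraction))
    test = [(p, pl) for pl in plates[:n_test] for p in groups[pl]]
    train = [(p, pl) for pl in plates[n_test:] for p in groups[pl]]
    return train, test


def shared_plates(train, test):
    """Return the plates that appear in both sets (empty for a fair split)."""
    return {pl for _, pl in train} & {pl for _, pl in test}
```

Calling `shared_plates` on an existing train/test split is a quick way to check whether it leaks plates between the two sets; a non-empty result means some plates are seen during both training and testing.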


