RECON Label Quality Report
Eldridge, Bryce D.; Porter-Garcia, Brisa M.
The quality of any AI/ML system depends directly on the quality of the data used to train it. Here, the goal is to build a reliable image classifier that correctly identifies electrical components in x-ray images, and its classification confidence depends directly on the quality of the labels in the training data. Incorrect or incomplete labels can substantially degrade performance, because during training the system tries to compensate for variations that should not exist. Image labels are entered by subject matter experts and can generally be assumed to be correct. This is not guaranteed, however, so methods for measuring label quality and for identifying or rejecting bad labels are important, especially as the database continues to grow. Given the current size of the database, a full manual review of each component is not feasible. This report describes the current state of the “RECON” x-ray image database and summarizes several recent developments intended to help ensure high-quality labeling both now and in the future. The questions we hope to answer with this development include: 1) Are there any components with incorrect labels? 2) Can we suggest labels for components that are marked “Unknown”? 3) How much overall confidence do we have in the quality of the existing labels? 4) What systems or procedures can we put in place to maximize label quality?