Evaluating Generatively Synthesized Diabetic Retinopathy Imagery
Publicly available data for training diabetic retinopathy classifiers is unbalanced. Generative adversarial networks can successfully synthesize retinal fundus imagery. For synthetic imagery to be of benefit, the images need to be of high quality and diverse. Presently, several evaluation metrics are used to evaluate the quality and diversity of imagery synthesized by generative adversarial networks. This work contributes the first empirical assessment of the suitability of evaluation metrics used in the literature for evaluating generative adversarial networks that generate retinal fundus images in the context of diabetic retinopathy. The capacity of Fréchet Inception Distance, Peak Signal-to-Noise Ratio and Cosine Distance to assess the quality and diversity of synthetic proliferative diabetic retinopathy imagery is investigated. A quantitative analysis is performed to enable an improved methodology for selecting the synthetic imagery to be used for augmenting a classifier's training dataset. Results indicate that Fréchet Inception Distance is suitable for evaluating the diversity of synthetic imagery, and for identifying whether the imagery has features corresponding to its class label. Peak Signal-to-Noise Ratio is suitable for indicating whether the synthetic imagery has valid diabetic retinopathy lesions and whether its features correspond to its class label. These results demonstrate the importance of performing such empirical evaluation, especially in biomedical domains where use in applied settings is intended.
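For readers unfamiliar with the three metrics named above, the following is a minimal sketch of their textbook definitions in Python with NumPy. It is not the authors' implementation. The function names (`psnr`, `cosine_distance`, `frechet_distance`) are hypothetical, and the sketch assumes that feature vectors have already been extracted elsewhere. Fréchet Inception Distance is conventionally computed on Inception-v3 features; here the function simply accepts any two arrays of feature vectors.

```python
import numpy as np


def psnr(reference, synthetic, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    synthetic = np.asarray(synthetic, dtype=np.float64)
    mse = np.mean((reference - synthetic) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def cosine_distance(a, b):
    """One minus the cosine similarity of two flattened arrays."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def frechet_distance(feats_real, feats_synth):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features), with
    n_features >= 2. Assumption: the features were extracted beforehand,
    for example by an Inception-v3 network.
    """
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_synth = np.asarray(feats_synth, dtype=np.float64)
    mu_r, mu_s = feats_real.mean(axis=0), feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)
    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_s) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)
```

Higher PSNR indicates a synthetic image that is closer, pixel by pixel, to its reference, while lower cosine and Fréchet distances indicate closer agreement between the compared features or feature distributions.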