Generating Sex- and Age-Transformed Retinal Images Using Diffusion Models
Abstract
Purpose :
This study evaluates the ability of a conditioned diffusion model to generate synthetic retinal images simulating transformations in sex (female to male) and age (under 40 to over 40). The generated images were assessed for clinical relevance, realism, and structural fidelity using a classifier, human evaluation, and quantitative metrics.
Methods :
Synthetic 256×256 retinal images were created using a conditioned diffusion model. Two datasets were generated: 128 images for sex transformation (female to male) and 256 images for age transformation (under 40 to over 40). Validation was performed using an internally developed ConvNeXt-based multitask classifier, trained on 620,864 images for sex prediction and 1,220,134 images for age prediction. Baseline classifier accuracy on real images was 96.9% for identifying female images and 86.7% for identifying under-40 images. In a blinded human evaluation, three ophthalmologists and three other physicians each reviewed 50 randomized images (25 synthetic, 25 real) and were asked to identify the synthetic ones. Quantitative evaluation included Fréchet Inception Distance (FID) and Structural Similarity Index Measure (SSIM).
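As a rough illustration of the SSIM metric named above, the following sketch computes a single global SSIM value between two grayscale images given as flat lists of pixel intensities. It is a simplification and not the study's pipeline: standard SSIM averages the same expression over local sliding windows, the function name `global_ssim` is hypothetical, the constants are the commonly used defaults (K1 = 0.01, K2 = 0.03), and the abstract does not state which image pairs were compared.

```python
def global_ssim(x, y, data_range=255.0):
    """Global (single-window) SSIM between two equal-length pixel lists.

    Simplified sketch: standard SSIM averages this quantity over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population variances and covariance of the two images
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilizing constants (commonly used defaults, assumed here)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy 2x2 "images" flattened to lists (illustrative values only)
original = [10, 50, 90, 130]
reversed_copy = [130, 90, 50, 10]
print(global_ssim(original, original))
print(global_ssim(original, reversed_copy))
```

An image compared with itself scores 1, and structurally dissimilar images score lower, so a mean SSIM near 1 indicates that the compared images share most of their structure.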
Results :
The classifier labeled 88.3% of synthetic male images as male (95% CI: 82.7-93.9) and 69.1% of synthetic over-40 images as over 40 (95% CI: 63.5-74.8). In the human evaluation, overall accuracy in identifying synthetic images was 57%, with ophthalmologists at 56.7% and other physicians at 57.4%, underscoring the difficulty of distinguishing synthetic from real images. Quantitative metrics indicated high realism (FID: 10.52) and strong structural fidelity (mean SSIM: 0.84).
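The reported confidence intervals are consistent with a normal-approximation (Wald) interval for a binomial proportion. The sketch below is an assumption-laden check, not the authors' stated method: the abstract does not say which interval was used, and the success counts (113 of 128 and 177 of 256) are inferred from the reported percentages and dataset sizes. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Returns (proportion, lower bound, upper bound) as fractions.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts are inferred from the reported percentages (assumption):
# 113/128 is about 88.3% and 177/256 is about 69.1%.
for label, k, n in [("sex (female to male)", 113, 128),
                    ("age (under 40 to over 40)", 177, 256)]:
    p, lo, hi = wald_ci(k, n)
    print(f"{label}: {100 * p:.1f}% (95% CI: {100 * lo:.1f}-{100 * hi:.1f})")
```

With these inferred counts, the sketch gives 82.7-93.9 for the sex transformation and 63.5-74.8 for the age transformation, matching the intervals in the abstract. For proportions close to 0 or 1, a Wilson interval is often preferred over the Wald approximation.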
Conclusions :
Diffusion models show strong potential for generating realistic retinal images with sex and age transformations, as supported by classifier-based validation, human evaluation, and quantitative metrics. Improving the conditioning process and incorporating additional clinical and demographic variables could further improve image fidelity and variability. Beyond sex- and age-related changes, this approach may reveal retinal alterations linked to clinical features (e.g., increases in blood pressure), providing insight into subtle phenotypic variations that existing feature-based knowledge may not fully capture.
This abstract was presented at the 2025 ARVO Annual Meeting, held in Salt Lake City, Utah, May 4-8, 2025.