# Sketch of the curriculum schedule. Full code in the repo.
stages = [
    {"task": "quadrant_localization", "epochs": 30, "data": "quadrant_labels"},
    {"task": "tooth_enumeration", "epochs": 40, "data": "tooth_labels"},
    {"task": "disease_diagnosis", "epochs": 60, "data": "disease_labels"},
]

Curriculum Learning for Dental Disease Detection
A three-stage YOLOv8 pipeline on the DENTEX 2023 dataset, and an honest negative result.
Summary. This project tests a three-stage curriculum learning framework (quadrant localization, then tooth enumeration, then disease diagnosis) on the DENTEX 2023 panoramic X-ray dataset (2,032 hierarchically labeled images) using YOLOv8m segmentation models. Against a matched single-stage baseline, the curriculum approach reached an mAP@0.5 of 0.394 versus the baseline’s 0.417, a small but real regression. The empirical takeaway is that on a dataset of this size, additional weakly related supervision didn’t help fine-grained detection. Class imbalance was the dominant limitation, not the training schedule.
This was my final project for DSAN 6600, Neural Networks & Advanced Deep Learning at Georgetown (Spring 2026).
The question
Curriculum learning, training models on easier sub-tasks before harder ones, has a strong intuitive appeal, especially for hierarchical labels. Dental panoramic X-rays are a near-perfect test bed. Every tooth lives in a quadrant, has a number, and may or may not have one of several conditions. Does staging the supervision in that order actually help fine-grained disease detection on a small medical dataset?
Approach
[TODO 1 to 2 paragraphs on data prep, augmentation, model config. Pull from the report. Keep it concrete around image sizes, batch size, loss, and schedule.]
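One way to picture the staged schedule is as a loop that trains each stage in order and hands its checkpoint to the next. The sketch below is illustrative, not the code from the repo: `run_curriculum`, `fake_train`, and the `"pretrained"` starting label are hypothetical names, and it assumes each stage warm-starts from the previous stage's weights. The trainer is passed in as a function so the chaining logic can be read (and run) without any deep learning library installed.

```python
from typing import Callable, Dict, List, Tuple

# Stage list copied from the schedule sketch in this post.
STAGES: List[Dict] = [
    {"task": "quadrant_localization", "epochs": 30, "data": "quadrant_labels"},
    {"task": "tooth_enumeration", "epochs": 40, "data": "tooth_labels"},
    {"task": "disease_diagnosis", "epochs": 60, "data": "disease_labels"},
]

# Hypothetical trainer signature: (init_weights, data, epochs) -> new checkpoint.
TrainFn = Callable[[str, str, int], str]


def run_curriculum(
    stages: List[Dict], train_fn: TrainFn, init_weights: str
) -> List[Tuple[str, str]]:
    """Train stages in order, warm-starting each from the previous checkpoint."""
    weights = init_weights
    history = []
    for stage in stages:
        # Assumption: the output of one stage initializes the next.
        weights = train_fn(weights, stage["data"], stage["epochs"])
        history.append((stage["task"], weights))
    return history


def fake_train(init_weights: str, data: str, epochs: int) -> str:
    """Stand-in trainer: records checkpoint lineage, does no real training."""
    return f"{init_weights}->{data}@{epochs}"


history = run_curriculum(STAGES, fake_train, "pretrained")
for task, checkpoint in history:
    print(task, checkpoint)
```

Swapping `fake_train` for a real training call is where the actual work happens. The stages have different label sets, so in practice the prediction head usually cannot be carried over unchanged between stages; only the shared backbone weights transfer cleanly.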
Results
[TODO drop in the table comparing curriculum versus single-stage baseline across mAP@0.5, precision, recall, and per-class F1. If the predictions are saved as CSV, render the table here from a pd.read_csv() cell so it stays in sync with the source data.]
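Until the full table is in place, the headline comparison can at least be put in relative terms. The snippet below uses only the two mAP@0.5 values quoted in the summary; the variable names are mine, and nothing here says anything about statistical significance.

```python
# mAP@0.5 values quoted in the summary of this post.
curriculum_map50 = 0.394
baseline_map50 = 0.417

# Absolute gap in mAP points, and the gap relative to the baseline.
abs_delta = curriculum_map50 - baseline_map50
rel_delta = abs_delta / baseline_map50

print(f"absolute change: {abs_delta:+.3f}")  # absolute change: -0.023
print(f"relative change: {rel_delta:+.1%}")  # relative change: -5.5%
```

So the curriculum model trails the baseline by 0.023 mAP@0.5, or roughly 5.5% relative to the baseline.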
What I learned
The interesting part of this project wasn’t the architecture. It was sitting with a result that didn’t go the way I expected and figuring out why. Two things stood out.
- The class distribution was doing more work than the schedule. A handful of disease classes dominated the labels. A curriculum that doesn’t address that imbalance just front-loads the easy stages without solving the actual problem.
- “More supervision” is not a free lunch on small datasets. Each curriculum stage adds variance from its own labels. If those labels are only weakly related to the downstream task, you can pay the variance cost without earning the bias reduction.
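To make the imbalance point concrete, here is a minimal sketch of how skew translates into per-class weights using the common "balanced" inverse-frequency heuristic. The class names and counts are made up for illustration and are not the DENTEX label counts; `inverse_frequency_weights` is a hypothetical helper, not something from the project code.

```python
from collections import Counter
from typing import Dict, List


def inverse_frequency_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * count).

    Every class gets weight 1.0 when the classes are perfectly balanced;
    rare classes get weights above 1.0, common classes below 1.0.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Illustrative label list only: an 80/15/5 split across three classes.
labels = ["class_a"] * 80 + ["class_b"] * 15 + ["class_c"] * 5
weights = inverse_frequency_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.2f}")
```

With this toy split, the rarest class ends up weighted 16 times more heavily than the most common one, which is the same ratio as their counts. Weights like these can feed either a weighted loss or a rebalanced sampler.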
What I’d do differently
[TODO for example focal loss or class-rebalanced sampling, pretraining on a related larger dataset, ablating which curriculum stages help versus hurt.]
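For reference, here is what the focal loss idea looks like for a single binary prediction, written in plain Python so the formula is easy to inspect. This is a sketch of the standard formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), not something the project used; the defaults `gamma=2.0` and `alpha=0.25` are commonly cited values and would need tuning.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one prediction.

    p: predicted probability of the positive class.
    y: true label, 0 or 1.
    gamma: focusing parameter; 0 recovers alpha-weighted cross-entropy.
    alpha: weight on the positive class (1 - alpha on the negative class).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction is down-weighted far more than a wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(f"easy example loss: {easy:.5f}")
print(f"hard example loss: {hard:.5f}")
```

The `(1 - p_t) ** gamma` factor is what matters for imbalance: well-classified examples, which the dominant classes supply in bulk, contribute very little to the gradient, so the rare, hard cases carry more of the training signal.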