<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Bryce Grover</title>
<link>https://brycegrover.com/projects/</link>
<atom:link href="https://brycegrover.com/projects/index.xml" rel="self" type="application/rss+xml"/>
<description>Data scientist (Georgetown M.S., Chapman B.S.) working in ML, computer vision, and statistical modeling.</description>
<generator>quarto-1.7.32</generator>
<lastBuildDate>Wed, 15 Apr 2026 04:00:00 GMT</lastBuildDate>
<item>
  <title>Curriculum Learning for Dental Disease Detection</title>
  <link>https://brycegrover.com/projects/dental-curriculum.html</link>
  <description><![CDATA[ 




<blockquote class="blockquote">
<p><strong>Summary.</strong> A three-stage curriculum learning framework (quadrant localization, then tooth enumeration, then disease diagnosis) trained with YOLOv8m segmentation models on the DENTEX 2023 panoramic X-ray dataset (2,032 hierarchically labeled images). Against a matched single-stage baseline, the curriculum approach reached an mAP@0.5 of 0.394 versus 0.417, a small but real regression of 0.023. The empirical takeaway is that, on a dataset of this size, additional and only weakly related supervision did not help fine-grained detection. Class imbalance, not the training schedule, was the dominant limitation.</p>
</blockquote>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>This was my final project for <strong>DSAN 6600, Neural Networks &amp; Advanced Deep Learning</strong> at Georgetown (Spring 2026).</p>
</div>
</div>
<section id="the-question" class="level2">
<h2 class="anchored" data-anchor-id="the-question">The question</h2>
<p>Curriculum learning (training a model on easier subtasks before harder ones) has strong intuitive appeal, especially when the labels are hierarchical. Dental panoramic X-rays are a near-perfect test bed: every tooth lives in a quadrant, has a number, and may or may not have one of several conditions. Does staging the supervision in that order actually help fine-grained disease detection on a small medical dataset?</p>
</section>
<section id="approach" class="level2">
<h2 class="anchored" data-anchor-id="approach">Approach</h2>
<p>[TODO 1 to 2 paragraphs on data prep, augmentation, model config. Pull from the report. Keep it concrete around image sizes, batch size, loss, and schedule.]</p>
<div id="training-config" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Sketch of the curriculum schedule. Full code in the repo.</span></span>
<span id="cb1-2">stages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [</span>
<span id="cb1-3">    {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"task"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quadrant_localization"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epochs"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"quadrant_labels"</span>},</span>
<span id="cb1-4">    {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"task"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tooth_enumeration"</span>,     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epochs"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tooth_labels"</span>},</span>
<span id="cb1-5">    {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"task"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"disease_diagnosis"</span>,     <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"epochs"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"disease_labels"</span>},</span>
<span id="cb1-6">]</span></code></pre></div>
</div>
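<p>The schedule above only lists the stages. One way to drive it is a loop that trains each stage and hands the resulting weights to the next. The sketch below is a minimal, hypothetical version: <code>run_curriculum</code> and the injected <code>train_stage</code> callable are illustrative names, not code from the repo, and the sketch assumes each stage warm-starts from the previous stage’s checkpoint.</p>

```python
from typing import Callable

# Same schedule as above: each stage names its task, epoch budget, and label set.
stages = [
    {"task": "quadrant_localization", "epochs": 30, "data": "quadrant_labels"},
    {"task": "tooth_enumeration",     "epochs": 40, "data": "tooth_labels"},
    {"task": "disease_diagnosis",     "epochs": 60, "data": "disease_labels"},
]


def run_curriculum(
    stages: list,
    train_stage: Callable[[str, str, int], str],
    init_weights: str,
) -> str:
    """Train each stage in order, warm-starting from the previous stage.

    `train_stage(weights, data, epochs)` is a hypothetical hook that trains
    from `weights` on `data` for `epochs` epochs and returns the path of the
    resulting checkpoint.
    """
    weights = init_weights
    for stage in stages:
        # The checkpoint from this stage becomes the starting point of the next.
        weights = train_stage(weights, stage["data"], stage["epochs"])
    return weights
```

<p>With Ultralytics, <code>train_stage</code> might wrap <code>YOLO(weights).train(data=data_yaml, epochs=epochs)</code>, where <code>data_yaml</code> is a hypothetical per-stage dataset config. Because the stages predict different class sets (quadrants, tooth numbers, diseases), only layers whose shapes match, such as the backbone, would be expected to carry over from one stage to the next.</p>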
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">Results</h2>
<p>[TODO drop in the table comparing curriculum versus single-stage baseline across mAP@0.5, precision, recall, and per-class F1. If the predictions are saved as CSV, render the table here from a <code>pd.read_csv()</code> cell so it stays in sync with the source data.]</p>
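<p>A minimal sketch of the CSV-backed table the note above describes, assuming a hypothetical <code>results.csv</code> with one row per model and one column per metric. Only the two mAP@0.5 values reported in the summary are filled in; precision, recall, and per-class F1 would be further columns computed from the saved predictions.</p>

```python
import io

import pandas as pd

# Stand-in for results.csv; on the real page this would be pd.read_csv("results.csv").
csv_text = """model,map50
single_stage_baseline,0.417
curriculum,0.394
"""

results = pd.read_csv(io.StringIO(csv_text)).set_index("model")

# Difference of each model against the baseline, so the regression is explicit.
baseline = results.loc["single_stage_baseline", "map50"]
results["delta_vs_baseline"] = (results["map50"] - baseline).round(3)

print(results.to_string())
```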
</section>
<section id="what-i-learned" class="level2">
<h2 class="anchored" data-anchor-id="what-i-learned">What I learned</h2>
<p>The interesting part of this project wasn’t the architecture. It was sitting with a result that didn’t go the way I expected and figuring out <em>why</em>. Two things stood out.</p>
<ol type="1">
<li><strong>The class distribution was doing more work than the schedule.</strong> A handful of disease classes dominated the annotations. A curriculum that doesn’t address that imbalance just front-loads the easy stages without solving the actual problem.</li>
<li><strong>“More supervision” is not a free lunch on small datasets.</strong> Each curriculum stage adds variance from its own labels. If those labels are only weakly related to the downstream task, you can pay the variance cost without earning the bias reduction.</li>
</ol>
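<p>Checking the imbalance is cheap once labels are in YOLO text format, where each line of a label file starts with an integer class id. The sketch below counts instances per class and turns the counts into normalized inverse-frequency weights. The label strings are toy examples and the weighting scheme is one illustrative choice, not the project’s actual numbers or method.</p>

```python
from collections import Counter


def class_counts(label_texts: list) -> Counter:
    """Count object instances per class id across YOLO-format label files.

    Each non-empty line looks like "class_id x y w h" (detection) or
    "class_id x1 y1 x2 y2 ..." (segmentation); only the first token matters.
    """
    counts: Counter = Counter()
    for text in label_texts:
        for line in text.splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts


def inverse_frequency_weights(counts: Counter) -> dict:
    """Weight each class by 1 / count, rescaled so the weights average to 1."""
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {cls: w / mean for cls, w in raw.items()}


# Two toy label files: class 0 is common, class 3 is rare.
labels = [
    "0 0.51 0.40 0.05 0.10\n0 0.62 0.41 0.05 0.10\n3 0.30 0.55 0.04 0.08\n",
    "0 0.48 0.39 0.05 0.10\n0 0.70 0.45 0.05 0.10\n",
]
counts = class_counts(labels)
weights = inverse_frequency_weights(counts)
print(counts, weights)
```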
</section>
<section id="what-id-do-differently" class="level2">
<h2 class="anchored" data-anchor-id="what-id-do-differently">What I’d do differently</h2>
<p>[TODO for example focal loss or class-rebalanced sampling, pretraining on a related larger dataset, ablating which curriculum stages help versus hurt.]</p>
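<p>Focal loss, the first option listed above, scales each example’s cross-entropy by <code>(1 - p_t)^γ</code>, so confidently correct examples (mostly from common classes) contribute little and hard examples dominate the gradient. A minimal binary version in plain Python follows; the defaults <code>alpha=0.25</code> and <code>gamma=2.0</code> are the commonly cited values, used here as assumptions rather than tuned settings.</p>

```python
import math


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss FL = -alpha_t * (1 - p_t)**gamma * log(p_t) for one prediction.

    p is the predicted probability of the positive class and y is 0 or 1.
    """
    eps = 1e-12  # keep log() finite when p is exactly 0 or 1
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# An easy, confidently correct example is down-weighted far more than a hard one.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.30, 1)
print(easy, hard)
```

<p>With <code>gamma=0</code> the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which makes <code>gamma</code> the knob that controls how aggressively easy examples are ignored.</p>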
</section>
<section id="code" class="level2">
<h2 class="anchored" data-anchor-id="code">Code</h2>
<p><a href="https://github.com/brycegrover/%5BTODO-repo-name%5D">Repository on GitHub</a></p>


</section>

 ]]></description>
  <category>computer vision</category>
  <category>deep learning</category>
  <category>medical imaging</category>
  <category>YOLOv8</category>
  <category>PyTorch</category>
  <guid>https://brycegrover.com/projects/dental-curriculum.html</guid>
  <pubDate>Wed, 15 Apr 2026 04:00:00 GMT</pubDate>
  <media:content url="https://brycegrover.com/assets/projects/dental-curriculum.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Predictive Modeling of U.S. Oral Health Outcomes</title>
  <link>https://brycegrover.com/projects/nhanes-oral-health.html</link>
  <description><![CDATA[ 




<blockquote class="blockquote">
<p><strong>Summary.</strong> Working in a team of three, I led the shallow-learning analysis on NHANES 2017–2018 (n=5,265 adults), benchmarking logistic regression, random forests, and XGBoost on two binary classification tasks and one regression task. The best models reached 5-fold cross-validated ROC-AUCs of 0.849 (self-rated oral health) and 0.844 (clinician-recommended care). Using only socioeconomic predictors, a two-stage regression cut the mean absolute error for DMFT (decayed, missing, and filled teeth) from 6.98 to 4.67 teeth, a 33% reduction.</p>
</blockquote>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>DSAN 5300, Statistical Learning, Spring 2026. I owned the data preprocessing pipeline and co-authored the manuscript.</p>
</div>
</div>
<section id="the-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-setup">The setup</h2>
<p>[TODO 1 paragraph framing. Why NHANES, why these three tasks, what makes oral-health prediction interesting from a public-health standpoint. The economic angle (predicting need for care from socioeconomic features alone) is the strongest hook.]</p>
</section>
<section id="data-and-preprocessing" class="level2">
<h2 class="anchored" data-anchor-id="data-and-preprocessing">Data and preprocessing</h2>
<p>[TODO describe the merged NHANES tables (oral exam, demographics, SES), the imputation strategy, and the train/test splitting decisions. If you can render a sample DataFrame here it’s a great signal of the data wrangling work.]</p>
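<p>NHANES component files share the respondent sequence number <code>SEQN</code>, so merging the tables is a key join on that column. The sketch below uses small synthetic stand-in frames with two real NHANES variable names (<code>RIDAGEYR</code> for age, <code>INDFMPIR</code> for the family income-to-poverty ratio). The outcome column, the median imputation, and the 80/20 stratified split are illustrative assumptions, not the project’s documented choices. Fitting the imputer on the training split only keeps test-set statistics out of preprocessing.</p>

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the demographics and oral-health tables, keyed by SEQN.
demo = pd.DataFrame({
    "SEQN": np.arange(n),
    "RIDAGEYR": rng.integers(20, 80, n).astype(float),
    "INDFMPIR": rng.uniform(0, 5, n),
})
demo.loc[rng.choice(n, 20, replace=False), "INDFMPIR"] = np.nan  # simulate missing income

oral = pd.DataFrame({
    "SEQN": np.arange(n),
    "needs_care": rng.integers(0, 2, n),  # hypothetical binary outcome
})

# Inner join keeps respondents present in both tables; validate guards against duplicate keys.
df = demo.merge(oral, on="SEQN", how="inner", validate="one_to_one")

X = df[["RIDAGEYR", "INDFMPIR"]]
y = df["needs_care"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the imputer on training rows only, then apply it to both splits.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
```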
</section>
<section id="models" class="level2">
<h2 class="anchored" data-anchor-id="models">Models</h2>
<div id="model-grid" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The three model families benchmarked across all three tasks.</span></span>
<span id="cb1-2">models <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb1-3">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"logistic"</span>:      LogisticRegression(...),</span>
<span id="cb1-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"random_forest"</span>: RandomForestClassifier(...),</span>
<span id="cb1-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"xgboost"</span>:       XGBClassifier(...),</span>
<span id="cb1-6">}</span></code></pre></div>
</div>
<p>[TODO a few sentences on hyperparameter tuning approach (grid versus random versus Bayesian) and any cross-validation specifics.]</p>
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">Results</h2>
<p>[TODO a results table, ideally rendered from saved CSV so it stays accurate. Highlight the headline numbers, ROC-AUC of 0.849 and 0.844, and the 33% MAE reduction.]</p>
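<p>The 33% figure is the relative drop in mean absolute error, (6.98 − 4.67) / 6.98 ≈ 0.331. A small helper makes that arithmetic explicit and keeps the headline number reproducible from the two MAEs; the function names are illustrative.</p>

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error, here in units of teeth."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement from `before` to `after` (0.33 means 33% lower)."""
    return (before - after) / before


# Reported DMFT MAEs: 6.98 teeth before, 4.67 teeth after the two-stage model.
print(f"{relative_reduction(6.98, 4.67):.1%}")  # prints 33.1%
```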
</section>
<section id="what-surprised-me" class="level2">
<h2 class="anchored" data-anchor-id="what-surprised-me">What surprised me</h2>
<p>[TODO 1 to 2 specific surprises. Examples to consider include which predictors mattered most, where XGBoost beat or didn’t beat logistic regression, and what the residuals told you about who the model misses.]</p>
</section>
<section id="caveats" class="level2">
<h2 class="anchored" data-anchor-id="caveats">Caveats</h2>
<p>A model that predicts oral-health outcomes from socioeconomic predictors is also, implicitly, a model of structural inequity. The accuracy is real, and so is the responsibility to think hard about how a result like this gets used.</p>
</section>
<section id="code" class="level2">
<h2 class="anchored" data-anchor-id="code">Code</h2>
<p><a href="https://github.com/brycegrover/%5BTODO-repo-name%5D">Repository on GitHub</a></p>


</section>

 ]]></description>
  <category>statistical learning</category>
  <category>classification</category>
  <category>regression</category>
  <category>public health</category>
  <category>XGBoost</category>
  <guid>https://brycegrover.com/projects/nhanes-oral-health.html</guid>
  <pubDate>Wed, 01 Apr 2026 04:00:00 GMT</pubDate>
  <media:content url="https://brycegrover.com/assets/projects/nhanes.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Residential Electricity Demand Forecasting from Weather</title>
  <link>https://brycegrover.com/projects/electricity-demand.html</link>
  <description><![CDATA[ 




<blockquote class="blockquote">
<p><strong>Summary.</strong> Solo end-to-end project. I built a pipeline that integrates DSGrid synthetic residential demand profiles for New York City with ERA5 daily weather data retrieved through the Open-Meteo API, producing a multi-year aligned dataset. I benchmarked supervised regression and classification baselines (linear, logistic, gradient boosting) alongside unsupervised methods (PCA, t-SNE, K-means, DBSCAN, hierarchical clustering), and published the full reproducible workflow.</p>
</blockquote>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>DSAN 5000, Data Science &amp; Analytics, Fall 2025. My first end-to-end project at Georgetown.</p>
</div>
</div>
<section id="why-this-project" class="level2">
<h2 class="anchored" data-anchor-id="why-this-project">Why this project</h2>
<p>[TODO one paragraph on the practical motivation. Utility planning, demand response, the tension between weather-driven peaks and grid stability. Make it about a real-world question, not just “I wanted to learn the pipeline.”]</p>
</section>
<section id="data-engineering" class="level2">
<h2 class="anchored" data-anchor-id="data-engineering">Data engineering</h2>
<p>The unglamorous half of this project was getting two data sources with very different shapes to align cleanly. Synthetic demand profiles at one resolution, ERA5 daily weather at another, all keyed to NYC.</p>
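<p>On the weather side, Open-Meteo’s historical archive endpoint returns daily reanalysis variables as JSON, with a <code>daily</code> object holding a <code>time</code> array alongside one array per requested variable. The sketch below builds such a request for New York City and parses a response into a DataFrame. The coordinates, date range, and variable list are illustrative assumptions, and the parser is exercised on a hand-written toy payload rather than a live call.</p>

```python
from urllib.parse import urlencode

import pandas as pd

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(start: str, end: str) -> str:
    """Request daily mean/max/min temperature for NYC (approx. 40.71 N, 74.01 W)."""
    params = {
        "latitude": 40.71,
        "longitude": -74.01,
        "start_date": start,  # YYYY-MM-DD
        "end_date": end,
        "daily": "temperature_2m_mean,temperature_2m_max,temperature_2m_min",
        "timezone": "America/New_York",  # daily aggregates follow local calendar days
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def parse_daily(payload: dict) -> pd.DataFrame:
    """Turn the JSON 'daily' block into a date-indexed DataFrame."""
    daily = pd.DataFrame(payload["daily"])
    daily["time"] = pd.to_datetime(daily["time"])
    return daily.set_index("time")


url = build_weather_url("2018-01-01", "2018-12-31")
# A real run would fetch `url` (e.g. with requests) and pass the decoded JSON to parse_daily.
sample = {"daily": {"time": ["2018-01-01", "2018-01-02"], "temperature_2m_mean": [1.0, 2.5]}}  # toy values
weather = parse_daily(sample)
```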
<p>[TODO quick paragraph on the joining strategy, time-zone handling, and missing-data treatment.]</p>
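<p>A minimal sketch of one possible joining strategy, assuming the demand profiles are hourly and stamped in UTC (an assumption; the actual resolution is not stated above): convert to New York local time, sum to daily energy so each row lines up with one local weather day, then left-join the daily weather and flag days with missing weather rather than silently dropping them. Column names and values are hypothetical.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical hourly demand in UTC (kWh per hour) covering three local days.
hours = pd.date_range("2018-07-01 04:00", periods=72, freq=pd.offsets.Hour(), tz="UTC")
demand = pd.DataFrame({"kwh": np.full(len(hours), 1.5)}, index=hours)

# Hypothetical daily weather keyed by local calendar date; one day is missing.
weather = pd.DataFrame(
    {"temperature_2m_mean": [27.0, 29.5]},
    index=pd.to_datetime(["2018-07-01", "2018-07-03"]),
)

# 1. Shift to local time so "a day" means a New York day, not a UTC day.
local = demand.tz_convert("America/New_York")

# 2. Aggregate hourly energy to daily totals, then drop the time zone for the join.
daily = local.resample("D").sum()
daily.index = daily.index.tz_localize(None)

# 3. Left-join weather onto demand and flag gaps instead of dropping them.
merged = daily.join(weather, how="left")
merged["weather_missing"] = merged["temperature_2m_mean"].isna()
```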
</section>
<section id="models" class="level2">
<h2 class="anchored" data-anchor-id="models">Models</h2>
<p>[TODO brief tour through the supervised baselines, then the unsupervised clustering, and what each was for. The interesting story is usually the <em>contrast</em> between supervised performance and what the clusters revealed about the residuals.]</p>
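<p>As a sketch of the unsupervised side, the pipeline below standardizes daily feature vectors, projects them with PCA, and picks the number of K-means clusters by silhouette score. The features are synthetic blobs standing in for per-day demand and weather summaries, and both the 90% variance threshold and the candidate range of <code>k</code> are arbitrary assumptions.</p>

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 365 "days" x 6 features (e.g. demand and temperature summaries).
X, _ = make_blobs(n_samples=365, n_features=6, centers=3, random_state=0)

# Standardize, then keep enough components to explain 90% of the variance.
embed = make_pipeline(StandardScaler(), PCA(n_components=0.9, svd_solver="full"))
Z = embed.fit_transform(X)

# Fit K-means for several k and keep the one with the best silhouette score.
silhouettes = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    silhouettes[k] = silhouette_score(Z, labels)

best_k = max(silhouettes, key=silhouettes.get)
print(best_k, round(silhouettes[best_k], 3))
```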
</section>
<section id="findings" class="level2">
<h2 class="anchored" data-anchor-id="findings">Findings</h2>
<p>[TODO 2 to 3 concrete results. Lead with effect sizes, not p-values.]</p>
</section>
<section id="what-this-project-taught-me" class="level2">
<h2 class="anchored" data-anchor-id="what-this-project-taught-me">What this project taught me</h2>
<p>[TODO honest reflection. This was your first big end-to-end project. What did you do right, what would you do differently now that you know more?]</p>
</section>
<section id="code" class="level2">
<h2 class="anchored" data-anchor-id="code">Code</h2>
<p><a href="https://github.com/brycegrover/%5BTODO-repo-name%5D">Repository on GitHub</a></p>


</section>

 ]]></description>
  <category>time series</category>
  <category>regression</category>
  <category>clustering</category>
  <category>energy</category>
  <category>end-to-end</category>
  <guid>https://brycegrover.com/projects/electricity-demand.html</guid>
  <pubDate>Wed, 10 Dec 2025 05:00:00 GMT</pubDate>
  <media:content url="https://brycegrover.com/assets/projects/electricity.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Geospatial Crime Pattern Analysis</title>
  <link>https://brycegrover.com/projects/geospatial-crime.html</link>
  <description><![CDATA[ 




<blockquote class="blockquote">
<p><strong>Summary.</strong> Statistical and geospatial analysis of urban crime data, looking for correlations between residential density, commercial zoning, and crime incidence. I built a Python pipeline (pandas, scikit-learn, GeoPandas) that handles normalization, K-means clustering, and choropleth visualization.</p>
</blockquote>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p>Coursework at Chapman University.</p>
</div>
</div>
<section id="the-question" class="level2">
<h2 class="anchored" data-anchor-id="the-question">The question</h2>
<p>[TODO 1 paragraph framing the question. Be specific. Which city, which crime categories, what years.]</p>
</section>
<section id="pipeline" class="level2">
<h2 class="anchored" data-anchor-id="pipeline">Pipeline</h2>
<p>[TODO walk through the steps. Geocoding and spatial join, the normalization choice (per-capita or per-area), and the clustering decision (why K-means rather than DBSCAN here, or vice versa).]</p>
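<p>The normalization choice changes what the clusters mean: per-capita rates measure incidents relative to residents (and tend to inflate commercial districts with few residents), while per-area density shows where incidents concentrate on the map. A minimal sketch with a hypothetical tract-level table computes both and clusters on the standardized rates. Column names, values, and <code>k=2</code> are illustrative; in a real pipeline the incident counts would presumably come from a GeoPandas spatial join of incident points to polygons.</p>

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical tract-level table: incident counts after the spatial join.
tracts = pd.DataFrame({
    "tract": ["A", "B", "C", "D", "E", "F"],
    "incidents": [120, 95, 30, 25, 300, 280],
    "population": [8000, 7500, 6000, 5500, 900, 1100],  # residents
    "area_km2": [2.0, 1.8, 3.5, 4.0, 0.6, 0.7],
    "commercial_share": [0.10, 0.12, 0.05, 0.04, 0.80, 0.75],
})

# Two normalizations of the same counts.
tracts["per_1k_residents"] = tracts["incidents"] / tracts["population"] * 1000
tracts["per_km2"] = tracts["incidents"] / tracts["area_km2"]

# Standardize so no feature dominates K-means' Euclidean distance, then cluster.
features = ["per_1k_residents", "per_km2", "commercial_share"]
X = StandardScaler().fit_transform(tracts[features])
tracts["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(tracts[["tract", "per_1k_residents", "per_km2", "cluster"]])
```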
</section>
<section id="findings" class="level2">
<h2 class="anchored" data-anchor-id="findings">Findings</h2>
<p>[TODO 1 to 2 concrete patterns the analysis revealed, with a choropleth or scatter to back them up.]</p>
</section>
<section id="what-id-do-differently-now" class="level2">
<h2 class="anchored" data-anchor-id="what-id-do-differently-now">What I’d do differently now</h2>
<p>[TODO this is one of the earlier projects. A short reflection on what an upgraded version would look like is a great signal of growth.]</p>
</section>
<section id="code" class="level2">
<h2 class="anchored" data-anchor-id="code">Code</h2>
<p><a href="https://github.com/brycegrover/%5BTODO-repo-name%5D">Repository on GitHub</a></p>


</section>

 ]]></description>
  <category>geospatial</category>
  <category>clustering</category>
  <category>urban analytics</category>
  <category>GeoPandas</category>
  <guid>https://brycegrover.com/projects/geospatial-crime.html</guid>
  <pubDate>Sun, 01 Dec 2024 05:00:00 GMT</pubDate>
  <media:content url="https://brycegrover.com/assets/projects/geospatial.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
