Research

Papers and ongoing investigations

Peer-reviewed work, work-in-progress papers, and research-grade builds.

Recent

Match Your Loss to Your Cost

Submitted

CNSM 2026 Submission

Decision-aware traffic forecasting for backbone capacity planning. Asymmetric losses and conformal capacity bands trained against operator cost, not RMSE. Three real backbones, 20 seeds, paired-bootstrap CIs.

  • Cusp-linear loss matched to operator ratio: +76% Abilene, +75% GÉANT, +54% CESNET vs MSE at top operator asymmetry. L1 is the canonical consistent scoring rule for the τ-quantile (Gneiting 2011); squared asymmetric collapses on heavy-tailed GÉANT.
  • Cross-architecture: matched 5:1 win reproduces on DLinear (+30 to +97%) and iTransformer (+28 to +79%) across Abilene/GÉANT/CESNET.
  • ACI vs split CQR: overload rate 155× lower on Abilene, 9.1× lower on GÉANT, 3.8× lower on CESNET. ACI's across-seed coverage variance is 30 to 200× smaller.

Scheduled Partial-Credit RL for Reliable Code Generation with Small Language Models (WIP)

Published

LCTES 2026

Reliability-first RL for small language models in code generation. Joint reward R = 0.6·R_func + 0.4·R_sec with a five-stage partial-credit functional ladder.

  • On DeepSeek-Coder-1.3B over 100 APPS+ prompts: SFT 44% syntax / 3% ≥1-pass. Binary-reward PPO degrades to 18% / 0%. Partial-credit from scratch reaches 27% / 2%.
  • Binary-to-partial-credit schedule (PPO-continue) wins: 63% syntax, 9% ≥1-pass, 2% all-pass (single attempt). Curriculum learning the schedule matters more than the reward shape alone.
  • LoRA r=16 (6.3M trainable params, 0.47%), single V100 16GB, Bandit-graded R_sec. Security null on APPS+ (algorithmic); CWE-mapped partial credit is the next step.

Fixing Performance Bugs Through LLM Explanations

Published

IEEE AITest 2025

Using LLM explanations as a training signal (not just labels) to detect Java performance bugs. Peer-reviewed at IEEE AITest 2025.

  • Curated dataset of 490 performance bugs across 17 Defects4J projects, 5-category taxonomy (algorithmic, memory, CPU, redundant, I/O).
  • Fine-tuned GPT-4o-mini to produce explanations alongside predictions. Detection accuracy 67.3% → 83.7%, F1 64.6% → 82.3%.
  • Full reproduction stack public: extraction, categorization, fine-tuning, evaluation harness.