Traditional tools (static analyzers, profilers) are limited to known patterns and struggle with complex code.
Fine-tune a large language model to not only fix performance bugs but also generate human-readable explanations of them for developers.
| Category | Count | Percentage | Common Patterns |
|---|---|---|---|
| Algorithmic Inefficiency | 165 | 33.7% | Nested loops, wrong data structures |
| Memory Usage | 116 | 23.7% | Memory leaks, large allocations |
| CPU Overhead | 98 | 20.2% | Redundant computations |
| Redundant Computation | 54 | 11.0% | Repeated calculations |
| I/O Inefficiency | 56 | 11.4% | Excessive file operations |
Using multiple contextual signals (code diffs, comments, bug reports) improves both accuracy and interpretability.
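One way to picture how the contextual signals listed above could be combined is to join them into a single labeled prompt. The sketch below is only an illustration: the `BugContext` class, its field names, and the section titles are assumptions, not the format used in this work.

```java
// Hypothetical container for the three contextual signals (names are assumptions).
final class BugContext {
    final String codeDiff;
    final String comments;
    final String bugReport;

    BugContext(String codeDiff, String comments, String bugReport) {
        this.codeDiff = codeDiff;
        this.comments = comments;
        this.bugReport = bugReport;
    }

    // Joins the available signals into one labeled prompt, skipping missing ones.
    String toPrompt() {
        StringBuilder sb = new StringBuilder();
        appendSection(sb, "Bug report", bugReport);
        appendSection(sb, "Developer comments", comments);
        appendSection(sb, "Code diff", codeDiff);
        return sb.toString().trim();
    }

    // Appends a titled section unless the body is null or only whitespace.
    private static void appendSection(StringBuilder sb, String title, String body) {
        if (body == null || body.trim().isEmpty()) {
            return;
        }
        sb.append("### ").append(title).append('\n')
          .append(body.trim()).append("\n\n");
    }
}
```

Skipping empty sections keeps the prompt well-formed when a bug has, for example, a diff and a report but no developer comments.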
Detection accuracy: bugs correctly detected out of total
Explanation quality: generated explanations matching the actual issue (weighted score of 0.75 or higher)
F1 score: harmonic mean of precision and recall
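The harmonic mean mentioned above can be written out directly. This is a minimal sketch; the class and method names are illustrative, and the zero guard is an added assumption for the degenerate case where both inputs are 0.

```java
final class F1Score {
    // Harmonic mean of precision and recall; returns 0 when both are 0.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Base-model precision and recall as reported in the results table.
        System.out.printf("Base F1: %.3f%n", f1(0.651, 0.642));
    }
}
```

With the base model's precision (0.651) and recall (0.642), this evaluates to roughly 0.646, in line with the reported 64.6%.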
| Metric | Base GPT-4o-mini | Fine-tuned Model | Improvement (percentage points) |
|---|---|---|---|
| Accuracy | 67.3% | 83.7% | +16.4 |
| Precision | 65.1% | 83.0% | +17.9 |
| Recall | 64.2% | 81.8% | +17.6 |
| F1 Score | 64.6% | 82.3% | +17.7 |
// Buggy: O(n²) complexity - unnecessary full table scan
protected void removeEntry(...) {
    // Step 1: Normal O(1) removal
    if (previous == null) {
        data[hashIndex] = entry.next;
    } else {
        previous.next = entry.next;
    }
    // Step 2: Problematic full scan (lines 11-20)
    for (HashEntry element : data) {
        // Redundant search through entire table
    }
}
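For contrast, here is a minimal sketch of the same removal without the full table scan. It is not the actual patch: the original parameter list is elided, so the `ChainedIntSet` class, the `removeEntry(entry, hashIndex, previous)` signature, and the fixed bucket count are all assumptions made to keep the example self-contained.

```java
// Minimal chained hash set of int keys, written only to illustrate the removal path.
final class ChainedIntSet {
    private static final class HashEntry {
        final int key;
        HashEntry next;

        HashEntry(int key, HashEntry next) {
            this.key = key;
            this.next = next;
        }
    }

    private final HashEntry[] data = new HashEntry[16];
    private int size;

    private int indexFor(int key) {
        return (key & 0x7fffffff) % data.length;
    }

    boolean add(int key) {
        if (contains(key)) {
            return false;
        }
        int hashIndex = indexFor(key);
        data[hashIndex] = new HashEntry(key, data[hashIndex]);
        size++;
        return true;
    }

    boolean contains(int key) {
        for (HashEntry e = data[indexFor(key)]; e != null; e = e.next) {
            if (e.key == key) {
                return true;
            }
        }
        return false;
    }

    boolean remove(int key) {
        int hashIndex = indexFor(key);
        HashEntry previous = null;
        // Walk only the bucket that can contain the key.
        for (HashEntry entry = data[hashIndex]; entry != null; entry = entry.next) {
            if (entry.key == key) {
                removeEntry(entry, hashIndex, previous);
                return true;
            }
            previous = entry;
        }
        return false;
    }

    // Unlinks the entry from its own bucket chain; no other bucket is visited.
    private void removeEntry(HashEntry entry, int hashIndex, HashEntry previous) {
        if (previous == null) {
            data[hashIndex] = entry.next;
        } else {
            previous.next = entry.next;
        }
        entry.next = null;
        size--;
    }

    int size() {
        return size;
    }
}
```

Because the caller already knows the bucket index and the predecessor, unlinking touches a constant number of references regardless of how many entries the table holds.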
| Actual \ Predicted | Algorithmic | Memory | Redundant | CPU Overhead | I/O Inefficiency |
|---|---|---|---|---|---|
| Algorithmic | 30 | 1 | 1 | 1 | 0 |
| Memory | 2 | 19 | 0 | 2 | 0 |
| Redundant | 0 | 1 | 9 | 1 | 0 |
| CPU Overhead | 2 | 1 | 1 | 16 | 0 |
| I/O Inefficiency | 1 | 0 | 1 | 1 | 8 |
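The confusion matrix above can be checked against the headline accuracy with a few lines of code. The sketch below copies the cell values from the table; the class and method names are illustrative. Per-class precision and recall are included for reference only, since the poster does not state how its aggregate precision and recall were averaged.

```java
final class ConfusionMatrixMetrics {
    // Rows are actual categories, columns are predicted categories, in table order.
    static final int[][] MATRIX = {
        {30, 1, 1, 1, 0},   // Algorithmic
        {2, 19, 0, 2, 0},   // Memory
        {0, 1, 9, 1, 0},    // Redundant
        {2, 1, 1, 16, 0},   // CPU Overhead
        {1, 0, 1, 1, 8},    // I/O Inefficiency
    };

    // Sum of the diagonal: samples whose predicted category equals the actual one.
    static int correct(int[][] m) {
        int sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    static int total(int[][] m) {
        int sum = 0;
        for (int[] row : m) {
            for (int cell : row) {
                sum += cell;
            }
        }
        return sum;
    }

    static double accuracy(int[][] m) {
        return (double) correct(m) / total(m);
    }

    // Recall for class k: diagonal cell divided by the row (actual) total.
    static double recall(int[][] m, int k) {
        int rowSum = 0;
        for (int cell : m[k]) {
            rowSum += cell;
        }
        return (double) m[k][k] / rowSum;
    }

    // Precision for class k: diagonal cell divided by the column (predicted) total.
    static double precision(int[][] m, int k) {
        int colSum = 0;
        for (int[] row : m) {
            colSum += row[k];
        }
        return (double) m[k][k] / colSum;
    }

    public static void main(String[] args) {
        System.out.printf("correct=%d total=%d accuracy=%.4f%n",
                correct(MATRIX), total(MATRIX), accuracy(MATRIX));
    }
}
```

The diagonal sums to 82 of 98 test samples, which is about 83.7% and agrees with the reported accuracy of the fine-tuned model.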
First to use multiple contextual signals (diffs, comments, bug reports) for LLM-based performance bug detection
Largest public Java performance bug dataset with 5 categories and detailed metadata
83.7% detection accuracy with 90.2% explanation quality, bridging automated tools and developer understanding
Primary contact: Suryansh Singh Sijwali (sss6371@psu.edu)
Research Team: A.M. Colom, A. Guo, S. Saha
Feel free to reach out to any team member