BreachT5: Ensembling CodeT5+ Models for Multi-Label Vulnerability Detection in Smart Contracts

Abstract

Ethereum smart contracts manage billions of dollars in digital assets, and because deployed code is immutable and transactions are irreversible, detecting vulnerabilities before deployment is critical. However, existing tools such as Slither rely on rigid, rule-based analysis, and general-purpose language models such as ChatGPT often miss rare or context-dependent bugs. To address these limitations, this paper presents BreachT5, an ensemble of two fine-tuned CodeT5+ models designed for multi-label vulnerability detection in Solidity contracts. We first fine-tune a 220M-parameter model on over 67,000 real contracts labeled with the Smart Contract Weakness Classification (SWC), revealing inherent differences in how reliably individual vulnerability types can be detected. We then evaluate a 770M-parameter variant, which improves accuracy on frequent classes but underperforms on rare ones. To balance this trade-off, BreachT5 combines both models via soft voting with per-class thresholds. Our results on the BCCC-SCsVuls2024 dataset show that BreachT5 achieves 0.556 Macro-F1 and 0.612 Micro-F1, outperforming the two standalone models, Slither, and GPT-5 in multi-label vulnerability detection.
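The abstract names the ensembling technique but not its parameters. The sketch below illustrates soft voting with per-class thresholds under stated assumptions: an equal-weight average of the two models' per-class sigmoid probabilities, and a per-class threshold chosen by grid search to maximize F1 on validation data. All function names (`soft_vote`, `apply_thresholds`, `tune_threshold`, `macro_micro_f1`) are hypothetical and are not BreachT5's actual code. The last function shows why the two reported metrics differ: Macro-F1 averages per-class F1, so rare SWC classes weigh as much as frequent ones, while Micro-F1 pools counts across classes and is dominated by frequent ones.

```python
from typing import List, Sequence, Tuple


def soft_vote(probs_small: Sequence[float], probs_large: Sequence[float],
              weight_small: float = 0.5) -> List[float]:
    # Assumed scheme: weighted average of the two models' per-class probabilities.
    return [weight_small * ps + (1.0 - weight_small) * pl
            for ps, pl in zip(probs_small, probs_large)]


def apply_thresholds(probs: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    # Each vulnerability class gets its own cut-off instead of a global 0.5.
    return [1 if p >= t else 0 for p, t in zip(probs, thresholds)]


def _counts(pred: Sequence[int], gold: Sequence[int]) -> Tuple[int, int, int]:
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> float:
    # Assumed tuning rule: pick the candidate cut-off with the best validation F1
    # for a single class (ties keep the earliest candidate).
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        pred = [1 if s >= t else 0 for s in scores]
        f1 = _f1(*_counts(pred, labels))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def macro_micro_f1(preds: Sequence[Sequence[int]],
                   golds: Sequence[Sequence[int]]) -> Tuple[float, float]:
    # preds/golds: one 0/1 vector per contract, one entry per vulnerability class.
    n_classes = len(golds[0])
    per_class = []
    total_tp = total_fp = total_fn = 0
    for c in range(n_classes):
        tp, fp, fn = _counts([p[c] for p in preds], [g[c] for g in golds])
        per_class.append(_f1(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    macro = sum(per_class) / n_classes            # every class weighs equally
    micro = _f1(total_tp, total_fp, total_fn)     # pooled counts favor frequent classes
    return macro, micro


if __name__ == "__main__":
    probs_220m = [0.70, 0.20, 0.40]   # illustrative scores, not from the paper
    probs_770m = [0.90, 0.10, 0.20]
    fused = soft_vote(probs_220m, probs_770m)
    print(apply_thresholds(fused, [0.5, 0.5, 0.25]))
```

With a lowered threshold on the third (for example, rare) class, the fused score of about 0.30 is still flagged, which is the kind of recall gain on rare classes that per-class thresholds are meant to provide. The actual weights and thresholds used by BreachT5 are not given in the abstract.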

Type: Publication
The 4th Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB)