Meta-heuristics
A Systematic Comparison of Search-Based Approaches for LDA Hyperparameter Tuning
Context: Latent Dirichlet Allocation (LDA) has been successfully used in the literature to extract topics from software documents and support developers in various software engineering (SE) tasks. While LDA has mostly been used with default settings, previous studies showed that default hyperparameter values generate sub-optimal topics from software documents. Objective: Recent studies applied meta-heuristic search (mostly evolutionary algorithms) to configure LDA in an unsupervised and automated fashion. However, previous work advocated for different meta-heuristics and different surrogate metrics to optimize. The objective of this paper is to shed light on the influence of these two factors when tuning LDA for SE tasks. Method: We empirically evaluated and compared seven state-of-the-art meta-heuristics and three alternative surrogate metrics (i.e., fitness functions) to solve the problem of identifying duplicate bug reports with LDA. The benchmark consists of ten real-world, open-source projects from the Bench4BL dataset. Results: Our results indicate that (1) meta-heuristics are mostly comparable to one another (except for random search and CMA-ES), and (2) the choice of the surrogate metric impacts the quality of the generated topics and the tuning overhead. Furthermore, calibrating LDA helps identify twice as many duplicates as untuned LDA when inspecting the top five past similar reports. Conclusion: Contrary to what prior studies advocated, no single meta-heuristic or fitness function outperforms all the others. However, we can recommend some combinations of meta-heuristics and fitness functions over others for practical use. Future work should focus on improving the surrogate metrics used to calibrate LDA in an unsupervised fashion.
Annibale Panichella
JCOMIX: A Search-Based Tool to Detect XML Injection Vulnerabilities in Web Applications
Dimitri Michel Stallenberg, Annibale Panichella
Search-Based-LDA
R Scripts to configure LDA using meta-heuristics
Jun 26, 2019
A Test Case Prioritization Genetic Algorithm guided by the Hypervolume Indicator
Dario Di Nucci, Annibale Panichella, Andy Zaidman, Andrea De Lucia
Search-Based Crash Reproduction and Its Impact on Debugging
Mozhan Soltani, Annibale Panichella, Arie van Deursen
A Search-based Approach for Accurate Identification of Log Message Formats
Salma Messaoudi, Annibale Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas
Automatic Generation of Tests to Exploit XML Injection Vulnerabilities in Web Applications
Sadeeq Jan, Annibale Panichella, Andrea Arcuri, Lionel Briand
A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls
Dennis Appelt, Cu D. Nguyen, Annibale Panichella, Lionel Briand
Parameterizing and Assembling IR-based Solutions for Software Engineering Tasks using Genetic Algorithms
Annibale Panichella, Bogdan Dit, Rocco Oliveto, Massimiliano di Penta, Denys Poshyvanyk, Andrea De Lucia
A Search-based Training Algorithm for Cost-aware Prediction
Annibale Panichella, Carol V. Alexandru, Sebastiano Panichella, Alberto Bacchelli, Harald Gall