My interests span Software Engineering and Artificial Intelligence broadly, with a particular focus on Empirical Software Engineering, Mining Software Repositories, AI4SE and SE4AI.
Research tools: 5
Open datasets: 3
Years of focus: 2016-2025
Research areas
Empirical Software Engineering
Advancing evidence‑based software practice through rigorous empirical methods—large‑scale mining, experiments and mixed‑method studies of developer workflows, artefacts and socio‑technical systems—to evaluate and improve tools and processes.
Mining Software Repositories
Analysing version control, issue and code review repositories, Q&A archives, CI/build servers and runtime telemetry with data science, ML/AI and qualitative methods to surface actionable insights that improve software engineering practice and guide project evolution.
AI4SE
Advancing AI‑driven Software Engineering through human‑centred, trustworthy, sustainable and collaborative methods, spanning LLM‑enabled automation, AI‑assisted design, recommender systems for code and repair, prompt engineering, and rigorous efficacy measurement beyond traditional metrics.
SE4AI
Engineering reliable, reproducible AI‑enabled systems—spanning SE for models and AI‑infused systems; AI code, libraries and datasets; autonomic and self‑healing systems with automated model repair; rigorous testing, verification, validation and user‑based evaluation; and requirements engineering.
Ongoing projects
Since 2025
Design Pattern Identification and Summarisation
A feature-based and LLM-based design pattern summarisation system that parses Java systems with JavaParser, produces JSON knowledge graphs and turns them into readable English narratives.
Captures both the structural context and usage intent for every detected pattern.
A machine-learning model that detects Gang-of-Four patterns inside Java projects, combining structural fingerprints with semantic cues so teams can inventory reuse opportunities.
Ships both a classic pipeline and a Python 3 refactor for modern toolchains.
Provides a reproducible corpus for benchmarking new detection heuristics.
Feeds summaries and diagrams into the Design Pattern Summariser pipeline.
A web-based annotation environment where researchers and practitioners label Java design pattern instances and summaries to bootstrap supervised learning datasets.
Streamlines the end-to-end labelling workflow for machine-learning-ready corpora.
Supports collaborative review cycles so multiple experts can converge on gold data.
Exports datasets compatible with the detection and summarisation pipelines.