Research

My interests span Software Engineering and Artificial Intelligence broadly, with a particular focus on Empirical Software Engineering, Mining Software Repositories, AI4SE and SE4AI.

Research tools: 5
Open datasets: 3
Years of focus: 2016-2025

Research areas

Empirical Software Engineering

Advancing evidence‑based software practice through rigorous empirical methods—large‑scale mining, experiments and mixed‑method studies of developer workflows, artefacts and socio‑technical systems—to evaluate and improve tools and processes.

Mining Software Repositories

Analysing version control, issue and code review repositories, Q&A archives, CI/build servers and runtime telemetry with data science, ML/AI and qualitative methods to surface actionable insights that improve software engineering practice and guide project evolution.

AI4SE

Advancing AI‑driven Software Engineering through human‑centred, trustworthy, sustainable and collaborative methods, spanning LLM‑enabled automation, AI‑assisted design, recommender systems for code and repair, prompt engineering, and rigorous efficacy measurement beyond traditional metrics.

SE4AI

Engineering reliable, reproducible AI‑enabled systems—spanning SE for models and AI‑infused systems; AI code, libraries and datasets; autonomic and self‑healing systems with automated model repair; rigorous testing, verification, validation and user‑based evaluation; and requirements engineering.

Ongoing projects

Since 2025

Design Pattern Identification and Summarisation

A feature-based and LLM-based design pattern summarisation system that parses Java systems with JavaParser, produces JSON knowledge graphs and turns them into readable English narratives.

  • Captures both the structural context and usage intent for every detected pattern.
  • Exports enriched JSON artifacts, enabling downstream reasoning pipelines.
  • Automates summary text generation so reviewers can skim complex codebases quickly.
AI4SE Empirical SE MSR

Completed projects

2022-2024

Feature-based Design Pattern Detection

A machine-learning model that detects Gang-of-Four patterns inside Java projects, combining structural fingerprints with semantic cues so teams can inventory reuse opportunities.

  • Ships both a classic pipeline and a Python 3 refactor for modern toolchains.
  • Provides a reproducible corpus for benchmarking new detection heuristics.
  • Feeds summaries and diagrams into the Design Pattern Summariser pipeline.
Empirical SE MSR

2023-2024

CodeLabeller

A web-based annotation environment where researchers and practitioners label Java design pattern instances and summaries to bootstrap supervised learning datasets.

  • Streamlines the end-to-end labeling workflow for machine-learning-ready corpora.
  • Supports collaborative review cycles so multiple experts can converge on gold data.
  • Exports datasets compatible with the detection and summarisation pipelines.
Empirical SE MSR

2016-2017

Source Code Fragment Summarisation (CFS)

The first effort to blend supervised learning with crowdsourcing for summarising source code fragments harvested from Eclipse and NetBeans FAQs.

  • Builds a 127-fragment corpus plus feature lists curated by nine expert volunteers.
  • Applies SVM and Naive Bayes classifiers trained on crowd-sourced features.
  • Makes both the corpus and feature sets openly available for replication.
MSR Empirical SE

2017

PRST: PageRank-based Bug Report Summaries

A PageRank-inspired approach for summarising duplicate-heavy bug reports drawn from Eclipse, Mozilla, KDE and GNOME tracker conversations.

  • Constructs the modified BRC corpus (28 reports) and OSCAR corpus (59 reports).
  • Engages human annotators to label sentence-level extracts for evaluation.
  • Releases both the raw bug corpus and the manually annotated summaries.
Empirical SE MSR
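
To illustrate the general idea behind PageRank-inspired extractive summarisation, here is a minimal sketch in Python. It is not the PRST implementation: it assumes a generic TextRank-style setup in which sentences are graph nodes, edges are weighted by bag-of-words cosine similarity, and the top-ranked sentences form the summary. The function names (tokenize, cosine, rank_sentences, summarise) and the damping and iteration defaults are hypothetical choices made for this example.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a similarity graph.

    Assumption: lexical cosine similarity stands in for whatever
    sentence-relatedness measure a real system would use.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    # Edge weight = similarity between two sentences; no self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score
            # proportional to the weight of the edge j -> i.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarise(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Invented example comments, not drawn from any of the corpora above.
    comments = [
        "The editor crashes when opening a large file.",
        "Crash occurs in the editor on opening any large file.",
        "Opening a large file makes the editor crash.",
        "Thanks for the report.",
    ]
    for line in summarise(comments, k=2):
        print(line)
```

In a duplicate-heavy thread, sentences that restate the same problem reinforce one another in the graph, so they tend to outrank one-off remarks such as acknowledgements.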