MSc Defences Spring 2024

See the list of MSc defences at DIKU this spring. The list will be updated continuously.

Information about each thesis, its supervisor, the location of the defence, etc. can be found in the respective entries below.

Computer Science

 

Name of student(s)  

Maria Elkjær Montgomery

Study Programme  

Computer Science

Title  

Prediction of Breast Cancer Risk using Transformer Model

Abstract  

Breast cancer is the most common cancer in women worldwide. Early diagnosis of the disease significantly decreases mortality rates, driving the widespread adoption of mammography screening. In Denmark, screening is restricted to women aged 50 to 70, targeting those at highest risk. However, a growing body of research suggests initiating screening at age 40. To address this, we developed a series of BERT-based models using Electronic Health Record (EHR) data to identify women aged 40 to 50 with elevated breast cancer risk, aiming to facilitate their referral for mammography screening.
The models were developed utilising data describing the diagnostic, medication, and lab test history of approximately 1.8 million Danish patients from 2016 to 2022. Four main variants of the model were implemented, each incorporating progressively more data. Model training consisted of two main phases: pretraining and fine-tuning. Additionally, three different baselines were implemented: a random forest classifier utilising the embeddings from the pre-trained BERT, an XGBoost classifier, and a RETAIN model.
Performance was evaluated using the area under the receiver operating characteristic curve (ROC AUC). Results revealed that the model incorporating diagnostic codes, medicinal codes, and the presence of unique daily lab tests achieved the highest ROC AUC scores of 0.629 ± 0.029 for individuals aged 40-50, 0.639 ± 0.012 for those aged 50-69, and 0.785 ± 0.005 for the entire cohort. Two thresholds were selected for the model: the threshold where the average risk for the high-risk 40-50-year-olds matched the average risk for 50-55-year-olds and the threshold where the average risk for the 40-50-year-olds had doubled. This resulted in a precision and recall of 0.037 and 0.248 for the higher threshold and 0.032 and 0.267 for the lower threshold.
In conclusion, the BERT-based model showed promising performance as a surveillance tool for the detection of breast cancer risk for women in the 40 to 50 age range, warranting further development and validation on a larger dataset.

Supervisor(s)  

Mads Nielsen

External examiner(s)  

Rasmus Reinhold Paulsen

Date and time  

03.04.2024 13:15-14:15

Room  

P1, Øster Voldgade 3

 

 

 

 

Name of student(s)  

Anne Holst Padkjær

Study Programme  

Computer Science

Title  

Combining the MWEM Mechanism and the Matrix Mechanism

Abstract  

This thesis explores the intersection of differential privacy and data utility, focusing on leveraging the Matrix Mechanism to extend the utility bound of the Multiplicative Weights Exponential Mechanism (MWEM) by employing the MWEM Mechanism as the foundational differential privacy mechanism within the Matrix Mechanism’s framework.
The primary contributions of this master’s thesis are:
• A modified version of the MWEM Mechanism that generates synthetic data vectors with respect to linear counting queries.
• A thorough assessment of the MWEM Mechanism and its limitations.
• An exploration of how the MWEM Mechanism can be used as the differentially private mechanism in the Matrix Mechanism, including a discussion of the limitations of this approach.

The empirical investigation of the modified MWEM Mechanism reveals notable limitations, especially in cases where the sums of the data vectors are low. A critical discovery is that there is a significant divergence between the theoretical maximum absolute error bound and the much lower actual error observed, suggesting a potential overestimation of error in theoretical models. The utility of the modified MWEM Mechanism was found to depend heavily on the query set available when generating the synthetic data vector, highlighting an important consideration for practical applications.

Integrating the MWEM Mechanism as the differentially private mechanism in the Matrix Mechanism revealed no significant benefits regarding the expanded utility bound and limited the MWEM Mechanism’s scope to a simpler histogram approach.

Overall, the research in this thesis indicates that the combination of the MWEM Mechanism and Matrix Mechanism does not consistently enhance their utility when compared to their isolated applications.

Supervisor(s)  

Rasmus Pagh

External examiner(s)  

Daniele Dell’Aglio

Date and time  

04.04.2024 09:30-10:30

Room  

Store UP1

 

 

Name of student(s)  

Lukas Uhrenholdt

Study Programme  

Computer Science

Title  

Geometric Packing Algorithms: An Examination

Abstract  

In computational geometry, packing problems ask whether a set of rigid pieces can be placed inside a target region such that no two pieces overlap. These optimization problems are known to be NP-hard [7], and the field of study has a long history, with numerous contributions made over the years, from packing axis-parallel rectangles using shelf algorithms to packing convex polygons with more complex algorithms. There are multiple categories of packing problems, such as BIN-PACKING, STRIP-PACKING, AREA-MINIMIZATION, PERIMETER-MINIMIZATION and MINIMUM-SQUARE, all of which I will examine. Because the problems are NP-hard, they must be solved using heuristics and approximation algorithms rather than exact algorithms. In this thesis, I will examine the contributions of several papers in the field. I will investigate, visualize, and compare the evolution of their main ideas and the respective approximation bounds they have obtained.

Supervisor(s)  

Mikkel Vind Abrahamsen

External examiner(s)  

Inge Li Gørtz

Date and time  

08.04.2024 10:00-11:00

Room  

HCØ, Aud. 8

 

 

 

 

Name of student(s)  

Mathias Niebuhr Bjerregaard

Study Programme  

Computer Science

Title  

Autonomous Digitization of Museum Objects

Abstract  

The Natural History Museum of Denmark (NHMD) has embarked on a journey to digitize its museum collection, which includes an estimated 14 million samples. This thesis outlines the design and experimental evaluation of a system that aims to autonomously ensure focused and centered images of a museum object. The system uses a single camera mounted on a UR5e robot arm. The project explored the use of a contrast-based method to measure the focus of an image and developed two methods for measuring the centering of an object within an image.
This project implemented two simple control principles and three optimization algorithms as control principles. The three optimization algorithms implemented were the direct search Pattern Search algorithm, the stochastic Covariance Matrix Adaptation Evolution Strategy algorithm, and a gradient-based naive Quasi-Newton algorithm.

The system implemented in this project has an expected throughput of 3.7 museum objects per operator per hour, which does not make it competitive with alternative system designs.
Furthermore, the image quality generated by the system is not considered sufficiently consistent for its intended use. The experiments also revealed decreasing image quality for objects with little texture. Nonetheless, a system using a camera mounted on a robot arm still holds enormous potential because it can relieve human operators and generate potentially infinite camera views.

Supervisor(s)  

Kim Steenstrup Pedersen and Hang Yin

External examiner(s)  

Rasmus Reinhold Paulsen

Date and time  

08.04.2024 14:00-15:00

Room  

SCI-DIKU-UP1-2-0-06

 

 

Name of student(s)  

Nóra Püsök

Study Programme  

Computer Science

Title  

FEDT: Fabrication Experiment Design Tool

Abstract  

A subset of fabrication research in human-computer interaction involves characterization-type experiments. Despite the presence of identifiable common computer-controlled elements, there is large variation across the design space and in the reporting of results. There exists no automated support to ensure that these experiments are conducted in a consistent, easy-to-follow and reproducible manner. To help researchers design and run experiments in digital fabrication, we seek to answer the question of whether we can express these experiments in a unified way. We develop FEDT, the Fabrication Experiment Design Tool, with a focus on the 3D printing experiment workflow. FEDT is an automated system for digital fabrication experiments, and it produces the files necessary for the components of the experiment (3D model geometry, machine code, recording of measurement, interaction, post-processing and statistical tests). These files enable replicability of experiments and ease future pre-registration. We evaluate the toolkit by demonstrating three workflows of classic experiments in the FEDT architecture. The evaluation reveals that FEDT’s modular architecture, exposed to the user together with supporting documentation, ensures extensibility within the class of experiments sharing this dual nature of automated fabrication involving human intervention. Finally, we discuss the expressivity of the FEDT system.

Supervisor(s)  

Valkyrie Savage

External examiner(s)  

Nanna Inie Strømberg-Derczynski

Date and time  

09.04.2024 09:00-10:00

Room  

Sigurdsgade 41, 0-11 conference room

 

 

Name of student(s)  

Michael Antonios Kruse Ayoub

Study Programme  

Computer Science

Title  

Information Retrieval with Large Language Models

Abstract  

Classical query expansion methods such as Pseudo Relevance Feedback (PRF) have shown promising capabilities for improving the recall of search systems in information retrieval. Recently, Large Language Models (LLMs), such as ChatGPT, have gained increased popularity across various domains due to their generative abilities. This thesis aims to investigate how LLMs can be leveraged for query expansion to improve the effectiveness of both sparse and dense information retrieval systems. PRF relies heavily on the accuracy of the initially obtained ranking of documents that can be used for query expansion. This dependency does not exist in the proposed LLM query expansion approach. An LLM can instead be used as a black-box model that transforms queries into, for example, a paraphrased query or an answer to the query (pseudo document), depending on the prompting strategy used. It is believed that the LLM-generated outputs can help minimize the query-document mismatch that is inherent in information retrieval and therefore improve the effectiveness of various types of retrieval systems. The experimental results show that LLM query expansion can significantly improve recall and top-heavy ranking metrics for both sparse and dense IR systems compared to applying no query expansion. Furthermore, the results indicate that query expansion using LLMs can outperform traditional PRF methods, such as RM3, across various ad-hoc IR datasets including MS-MARCO Dev Passage, Natural Questions, and TREC.

Supervisor(s)  

Qiuchi Li

External examiner(s)  

Troels Andreasen

Date and time  

18.04.2024 10:00-11:00

Room  

UP1-2-0-06

 

Mathematics

 

Name of student(s)  

Mathias Schack Rabing

Study Programme  

Mathematics

Title  

Nonuniform (co)datatypes and (co)recursion in Isabelle

Abstract  

Nonuniform (also known as nested or heterogeneous) (co)datatypes are recursive types where the type argument can vary. Currently, there is no supported way of instantiating these nonuniform (co)datatypes in Isabelle, and therefore the user needs to do a significant amount of work in order to reason about nonuniform (co)datatypes.
The theoretical background for implementing nonuniform (co)datatypes in Isabelle has been completed, and significant parts of it have been implemented [2]. In this thesis we completed the implementation of nonuniform codatatypes. Additionally, we continued the work on implementing nonuniform (co)recursive functions.

Supervisor(s)  

Dmitriy Traytel

External examiner(s)  

Jesper Andreas Bengtson

Date and time  

22.03.2024 10:00-11:00

Room  

Sigurdsgade 41, room 2-03

 

 

Name of student(s)  

Mikkel Kristian Rathmann Mølbak

Study Programme  

Mathematics

Title  

Euler-Poincaré equations with generalisation to Vortex sheets and diffeomorphism groupoids

Abstract  

In this thesis we show the Euler-Poincaré reduction of equations describing the motion of physical systems. We give a proof of this reduction in the Lagrangian or Hamiltonian formulation, and apply the theory to give the motion of a manifold under the action of the full diffeomorphism group. We then give a generalisation of this theory to the motion of manifolds with a discontinuity along a hypersurface under the full diffeomorphism group. For this we introduce the Lie groupoid of discontinuous diffeomorphisms and its corresponding Lie algebroid. By constructing a Poisson structure on the dual Lie algebroid, we can deduce the equations of motion, called the groupoid Euler-Arnold equations.

Supervisor(s)  

Stefan Sommer

External examiner(s)  

David Brander

Date and time  

05.04.2024 13:00-14:00

Room  

Seminar room, Øster Voldgade 3