- Hexagonal Fourier transform for Compression of Plenoptic video
-
Promotor, co-promotor, advisor : gauthier.lafruit@ulb.be, - , sarah.fernandes.pinto.fachada@ulb.be, daniele.bonatto@ulb.be, eline.soetens@ulb.be
Research Unit : LABORATORY OF IMAGE SYNTHESIS AND ANALYSIS - VIRTUAL REALITY (LISA-VR)
Description
Plenoptic cameras (such as Raytrix) possess a main lens, a micro-lens array, and a CMOS sensor. This special design makes it possible to capture directional light rays and thus 3D information about the scene. These cameras are called light field cameras and are theoretically more suitable for 3D and VR applications than conventional cameras. Due to their structure, they capture an image composed of many micro-images placed on a hexagonal grid, creating patterns that are non-optimal to compress with the JPEG algorithm, even though the image itself contains redundancies that remain unexploited.
The JPEG algorithm divides the image into blocks and then uses the discrete cosine transform (a Fourier-related transform) to represent each block in the frequency domain. Only the frequencies most significant to the human eye are then encoded, creating a low-storage representation of the image. To decompress the image, the inverse operations are performed.
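As an illustration of this block-based transform coding principle, here is a minimal Python sketch (not the thesis implementation): it transforms one 8x8 block with a 2D DCT, applies a crude uniform quantization in place of JPEG's perceptual tables, and reconstructs the block. The function name and quantization step are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q=20):
    """Transform an 8x8 block to the frequency domain, quantize it, and reconstruct it (toy JPEG-like pipeline)."""
    coeffs = dctn(block.astype(float) - 128.0, norm="ortho")   # 2-D DCT: JPEG's Fourier-related transform
    quantized = np.round(coeffs / q)                            # crude uniform quantization (JPEG uses a perceptual table)
    reconstructed = idctn(quantized * q, norm="ortho") + 128.0  # decoder side: dequantize and inverse transform
    return quantized, reconstructed

# Toy usage on one random 8x8 block
block = np.random.randint(0, 256, (8, 8))
kept, rec = compress_block(block)
print("non-zero coefficients kept:", np.count_nonzero(kept), "of 64")
print("max reconstruction error:", np.abs(rec - block).max())
```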
Context
The aim of this thesis is to design a compression scheme using a hexagonal lattice for images in plenoptic format and to explore its efficiency. Using block sizes corresponding to the micro-images will simplify the encoding of the hexagonal image structure. Several datasets captured with different plenoptic cameras (differing in micro-image size, resolution, and depth of field) will be tested and compared with the MPEG explorations of lenslet video coding activities.
Objective
At the end of the year, the student must present:
- An implementation of a hexagonal block-based adaptation of the JPEG compression scheme
- An evaluation of its efficiency compared to the classical image compression frameworks used in MPEG LVC activities
Prerequisite
- Good knowledge of C++ programming
- Any multimedia course (INFO-H502, INFO-H503, or similar)
- Compression knowledge (INFO-H516)
Contact person
gauthier.lafruit@ulb.be, mehrdad.teratani@ulb.be, sarah.fernandes.pinto.fachada@ulb.be, daniele.bonatto@ulb.be, eline.soetens@ulb.be
References
Hexagonal image processing: L. Middleton and J. Sivaswamy, Hexagonal Image Processing: A Practical Approach, Advances in Pattern Recognition. London: Springer, 2005.
Plenoptic camera: C. Perwass and L. Wietzke, "Single lens 3D-camera with extended depth-of-field," IS&T/SPIE Electronic Imaging, Burlingame, California, USA, 2012, p. 829108. doi: 10.1117/12.909882.
- View synthesis with Gaussian splatting and Plenoptic cameras
-
Promotor, co-promotor, advisor : gauthier.lafruit@ulb.be, - , sarah.dury@ulb.be, daniele.bonatto@ulb.be, eva.dubar@ulb.be
Research Unit : LABORATORY OF IMAGE SYNTHESIS AND ANALYSIS - VIRTUAL REALITY (LISA-VR)
Description
View synthesis aims to generate novel viewpoints of a real-world scene from a limited set of input images.
Plenoptic cameras [1], equipped with a micro-lens array, capture both spatial and angular information in a single shot - providing depth cues without requiring multiple viewpoints. This makes them a compact and promising solution for 3D reconstruction and view synthesis.
However, the current state-of-the-art in view synthesis - 3D Gaussian Splatting [2] - has been developed for traditional cameras and requires many images to perform well. It does not natively support the plenoptic camera model.
The goal of this thesis is to adapt Gaussian Splatting to plenoptic camera data, enabling realistic view synthesis from fewer inputs. This work will bridge a gap between modern rendering techniques and emerging camera technologies, and contribute to making efficient, high-quality 3D capture more accessible.
Context
The project will take place in the LISA-VR research unit, which focuses on volumetric rendering and view synthesis. You will have access to the lab's infrastructure, including plenoptic cameras and datasets specifically captured for this kind of research.
Objective
The main objective is to implement Gaussian Splatting for Plenoptic cameras.
By the end of the project, the system should be able to optimize a scene using one or more plenoptic cameras, and optionally also integrate traditional images. Although a single plenoptic camera theoretically provides all the required data for 3D reconstruction, its limited spatial resolution may require additional views to improve accuracy.
A comparative study will be conducted to evaluate reconstruction quality between plenoptic and conventional cameras. This includes analyzing how many input images are needed to achieve high quality depending on scene characteristics such as lighting, material properties, and geometric complexity.
Methods
Standard Gaussian Splatting implementations [2] use rasterization, which assumes a simple pinhole camera model and makes it difficult to incorporate plenoptic data directly.
Recent work on ray tracing-based Gaussian rendering [3] provides a more flexible framework that supports arbitrary camera models. This ray tracing approach will be the starting point for adapting the method to handle plenoptic cameras.
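To make the camera-model issue concrete, below is a minimal numpy sketch of how rays might be generated per micro-lens for a ray-traced renderer. The hexagonal spacing, the pinhole-per-lenslet assumption, and all parameter values are illustrative assumptions, not the calibration model of an actual plenoptic camera.

```python
import numpy as np

def microlens_centers(rows, cols, pitch):
    """Hexagonally packed micro-lens centers in sensor coordinates (illustrative layout)."""
    centers = []
    for r in range(rows):
        x_offset = (pitch / 2) if r % 2 else 0.0        # every other row shifted by half a pitch
        for c in range(cols):
            centers.append((c * pitch + x_offset, r * pitch * np.sqrt(3) / 2))
    return np.array(centers)

def rays_through_microlens(center, main_lens_z, samples=4):
    """Sample ray directions from one micro-lens toward the main lens aperture (toy pinhole-per-lenslet model)."""
    origins = np.tile(np.array([center[0], center[1], 0.0]), (samples, 1))
    aperture = np.random.uniform(-1.0, 1.0, (samples, 2))       # random points on the main lens (assumed square here)
    targets = np.hstack([aperture, np.full((samples, 1), main_lens_z)])
    directions = targets - origins
    return origins, directions / np.linalg.norm(directions, axis=1, keepdims=True)

centers = microlens_centers(rows=4, cols=4, pitch=0.1)
origins, directions = rays_through_microlens(centers[0], main_lens_z=5.0)
print(origins.shape, directions.shape)
```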
Prerequisites
- Proficient in C++
- Experience with CUDA (preferred but not required, INFO-H503)
Contact
For more information, please contact: Sarah Dury sarah.dury@ulb.be
References
[1] C. Perwass and L. Wietzke, "Single lens 3D-camera with extended depth-of-field," IS&T/SPIE Electronic Imaging, 2012.
[2] B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis, "3D Gaussian Splatting for Real-Time Radiance Field Rendering," ACM Transactions on Graphics, 2023.
[3] N. Moenne-Loccoz et al., "3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes," ACM Transactions on Graphics (SIGGRAPH Asia), 2024.
- Gaussian splat semantic segmentation
-
Promotor, co-promotor, advisor : gauthier.lafruit@ulb.be, - , Patrice Rondao Alface
Research Unit : LABORATORY OF IMAGE SYNTHESIS AND ANALYSIS - VIRTUAL REALITY (LISA-VR)
Description
Project title
Segmentation in 3D scenes represented by Gaussian Splats
Context
The project is done in collaboration with Nokia (Antwerp).
Objective
Gaussian splats have recently revolutionized the way to capture and represent 3D scenes and models in real time with photorealistic quality on devices such as mobile phones. This opens the way for truly 3D immersive communication in augmented reality, where one can view the world with the navigation freedom of a video game. Gaussian splats are primitives that locally model the geometry and appearance of a region of space as a 3D Gaussian, and are represented by a number of attributes describing their position, shape, transparency, color, and reflectivity. These Gaussian splats are typically learned from camera views using approaches as simple as gradient descent, which allows fast learning times with high quality.
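As a purely illustrative sketch of the attribute set mentioned above (field names and shapes are assumptions, not the layout of any particular Gaussian-splatting implementation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """Illustrative attribute set of one 3D Gaussian splat."""
    position: np.ndarray   # (3,) center of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent of the ellipsoid (shape)
    rotation: np.ndarray   # (4,) quaternion orienting the ellipsoid
    opacity: float         # transparency of the splat
    sh_coeffs: np.ndarray  # spherical-harmonic coefficients encoding view-dependent color/reflectivity

splat = GaussianSplat(
    position=np.zeros(3),
    scale=np.ones(3) * 0.01,
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),
    opacity=0.8,
    sh_coeffs=np.zeros((16, 3)),
)
print(splat.position, splat.opacity)
```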
Such Gaussian splats can be rendered in real time and are able to represent complex visual effects due to transparent materials such as glass or reflective surfaces such as mirrors. One issue with this representation is the lack of the connectivity found in meshes: without additional structural information, it is challenging to animate such Gaussian splats. Another issue is the amount of data these Gaussian splats occupy in GPU memory or during data transfers.
To address these issues, this master thesis will investigate the semantic segmentation of Gaussian splats representing scenes, people, and objects. This can be performed by segmenting the captured images used to learn the Gaussian splats and federating this information, or by directly segmenting the Gaussian splats themselves, for example by extending point-cloud segmentation approaches. A next step may include rendering a Gaussian splat scene while enabling or removing segmented objects or parts of the scene.
Prerequisite
- C/C++
Contact person
For more information please contact : gauthier.lafruit@ulb.be
- Dynamic Gaussian splat learning and tracking
-
Promotor, co-promotor, advisor : gauthier.lafruit@ulb.be, - , Patrice Rondao Alface
Research Unit : LABORATORY OF IMAGE SYNTHESIS AND ANALYSIS - VIRTUAL REALITY (LISA-VR)
Description
Project title
The project aims at solving temporal instability in dynamic scenes represented with Gaussian Splats.
Context
The project is done in collaboration with Nokia (Antwerp).
Objective
Gaussian splats have recently revolutionized the way to capture and represent 3D scenes and models in real time with photorealistic quality on devices such as mobile phones. This opens the way for truly 3D immersive communication in augmented reality, where one can view the world with the navigation freedom of a video game. Gaussian splats are primitives that locally model the geometry and appearance of a region of space as a 3D Gaussian, and are represented by a number of attributes describing their position, shape, transparency, color, and reflectivity. These Gaussian splats are typically learned from camera views using approaches as simple as gradient descent, which allows fast learning times with high quality.
While capturing static datasets and learning Gaussian splats is well mastered and can be done using a mobile device, the capture of dynamic scenes still faces challenges. In this master thesis, the goal is to explore existing solutions that are able to track Gaussian splats, learned on the first temporal frame of the capture, throughout the duration of the capture. The focus will be on testing different strategies that increase the reconstruction quality of the learned dynamic Gaussian splats by using, adapting, and extending an existing framework.
Prerequisite
- C/C++
Contact person
For more information please contact : gauthier.lafruit@ulb.be
- Weakly Supervised Segmentation of Malignant Epithelium in Digital Breast Pathology
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, jennifer.dhont@hubruxelles.be, younes.jourani@hubruxelles.be
Research Unit : LISA - IMAGE
Description
Project title
Weakly Supervised Segmentation of Malignant Epithelium in Digital Breast Pathology
Background
Tumor segmentation in digital pathology plays a crucial role in breast cancer diagnosis and prognosis [1], [2]. Precise delineation of malignant epithelial regions in hematoxylin and eosin (H&E)-stained or immunohistochemistry (IHC)-stained slides enables downstream analyses, such as cellularity estimation and biomarker quantification for diagnostic pathological examination, therapeutic response assessment, treatment selection, and survival prediction [3]–[8]. Deep learning-based segmentation approaches overcome the inefficiency of manual assessment, enabling high-throughput analysis of histopathological datasets. However, current approaches predominantly rely on supervised learning, which requires labor-intensive pixel-level manual annotations that are impractical at scale [9]–[11].
Weakly supervised learning has emerged as a promising alternative, leveraging coarse-grained labels to reduce annotation burdens. Yet, existing solutions are constrained by the restriction to whole-slide image (WSI)-level classification [12], reliance on partial cell-level annotations [13], and unproven generalizability across diverse breast cancer cohorts and staining protocols [14], [15]. These challenges underscore the need for a weakly supervised segmentation method that is trained using only image-level annotations while achieving pixel-level precision in malignant epithelium delineation and generalizing to heterogeneous breast cancer datasets.
Specific tasks
Literature study to get familiar with the different topics.
Perform data preprocessing, including extracting patches from whole-slide images, applying color deconvolution to separate the Hematoxylin stain from H&E and IHC images using ImageJ, and applying data augmentation techniques such as flipping, rotation, and brightness/contrast adjustment to address class imbalance (a minimal Python sketch of these steps follows this task list).
Implement prevalent convolutional neural network (CNN) and Transformer models, as described in Table 4 and Table 5 of Ref. [16], and conduct training and inference of these models using Python, preferably with PyTorch.
Validate the segmentation results predicted by these models across various breast cancer datasets, including H&E and IHC images, by comparing them to the ground truth segmentation mask (e.g., on the MHCI and BCSS datasets) or the ground truth cellularity (e.g., on the BreastPathQ and Post-NAT-BRCA datasets).
[Optional] Develop multiple instance learning (MIL) techniques to improve segmentation performance across diverse breast cancer datasets, aiming to achieve accuracy comparable to that of supervised semantic segmentation methods.
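Hedged sketch of the preprocessing steps listed above: the task description mentions ImageJ for color deconvolution, and scikit-image offers an equivalent H&E stain separation (rgb2hed) that can be scripted, shown here together with a toy augmentation. File paths are placeholders.

```python
import numpy as np
from skimage import io
from skimage.color import rgb2hed

def extract_hematoxylin(patch_rgb):
    """Separate stains with color deconvolution and return the Hematoxylin channel."""
    hed = rgb2hed(patch_rgb)     # H&E-DAB stain separation (Ruifrok & Johnston stain vectors)
    return hed[..., 0]           # channel 0 = Hematoxylin

def augment(patch):
    """Simple flips/rotations used to rebalance classes (brightness/contrast jitter could be added similarly)."""
    ops = [patch, np.fliplr(patch), np.flipud(patch), np.rot90(patch)]
    return ops[np.random.randint(len(ops))]

patch = io.imread("patch_0001.png")[..., :3]   # placeholder path to a WSI patch, alpha channel dropped if present
h_channel = extract_hematoxylin(patch)
augmented = augment(h_channel)
print(h_channel.shape, augmented.shape)
```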
Resources
BreastPathQ dataset: a public dataset consisting of 69 H&E-stained WSIs collected from the resection specimens of 37 post-neoadjuvant therapy patients with invasive residual breast cancer. 2579 image patches with an ROI of 512 × 512 pixels are manually annotated with an estimated cellularity ranging between [0, 1].
Other public datasets: https://github.com/maduc7/Histopathology-Datasets
IHC datasets in NEOCHECKRAY. There are 109 IHC patches stained with an MHC-I antibody with pixel-level manual annotations.
Prerequisite
- Python
Contact persons
Dr. Ir. Jennifer Dhont (jennifer.dhont@hubruxelles.be), Head of Data Science & AI Research Unit at Hôpital Universitaire de Bruxelles (Erasme campus)
Pr O. Debeir (olivier.debeir@ulb.be)
References
[1] D. Yan, X. Ju, et al., "Tumour stroma ratio is a potential predictor for 5-year disease-free survival in breast cancer," BMC Cancer, vol. 22, no. 1, p. 1082, Oct. 2022.
[2] L. Priya C V, B. V G, V. B R, and S. Ramachandran, "Deep learning approaches for breast cancer detection in histopathology images: A review," Cancer Biomarkers, vol. 40, no. 1, pp. 1–25, May 2024.
[3] M. Peikari, S. Salama, et al., "Automatic Cellularity Assessment from Post-Treated Breast Surgical Specimens," Cytometry A, vol. 91, no. 11, pp. 1078–1087, Nov. 2017.
[4] S. Akbar, M. Peikari, et al., "Automated and Manual Quantification of Tumour Cellularity in Digital Slides for Tumour Burden Assessment," Sci Rep, vol. 9, no. 1, p. 14099, Oct. 2019.
[5] X. Catteau, E. Zindy, et al., "Comparison Between Manual and Automated Assessment of Ki-67 in Breast Carcinoma: Test of a Simple Method in Daily Practice," Technol Cancer Res Treat, vol. 22, p. 15330338231169603, Jan. 2023.
[6] E. H. Allott, S. M. Cohen, et al., "Performance of Three-Biomarker Immunohistochemistry for Intrinsic Breast Cancer Subtyping in the AMBER Consortium," Cancer Epidemiology, Biomarkers & Prevention, vol. 25, no. 3, pp. 470–478, Mar. 2016.
[7] T. Vougiouklakis, B. J. Belovarac, et al., "The diagnostic utility of EZH2 H-score and Ki-67 index in non-invasive breast apocrine lesions," Pathology - Research and Practice, vol. 216, no. 9, p. 153041, Sep. 2020.
(Remaining references: see attached pdf document.)
- RAG (Retrieval-Augmented Generation) for Patents
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, Julien.Cabay@ulb.be, Thomas.Vandamme@ulb.be
Research Unit : LISA-IMAGE
Description
RAG (Retrieval-Augmented Generation) for Patents
This project consists of the design, development, and testing of a RAG system (an AI chatbot with a specific knowledge base) for a dataset of patents.
Context
Patents are an invaluable economic asset, enabling inventors to protect their inventions for a set duration of time. These assets, in the form of patent documents, represent an enormous challenge for the administrations responsible for the protection processes (i.e. Intellectual Property Offices). The documents are highly technical, composed of different modalities (text and schematics), and particularly numerous (there were more than 35 million patents in force worldwide as of 2023, source: WIPO statistics database).
Recent technological advancements in the field of Artificial Intelligence (AI), namely Large Language Models (LLMs) and the chatbots they power, carry enormous promise of automating these complex tasks. One such technique, Retrieval-Augmented Generation (RAG), is frequently presented as a solution to hallucination in LLMs, as well as a way to specialize a model relatively easily using a knowledge library.
Objective
In this project, you will design, develop and test such a solution on a large corpus of patents.
Methods
Different open-source LLMs can be used and benchmarked, as well as the different RAG techniques. The dataset can be sourced from Google Patents.
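A minimal sketch of the retrieve-then-generate loop, assuming a sentence-transformers embedding model for retrieval; the model name, the toy patent snippets, and the final llm.generate call are illustrative placeholders for whichever open-source LLM and corpus are actually benchmarked.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small open-source embedding model (illustrative choice)

patents = [
    "Patent A: a micro-lens array arranged hexagonally over a CMOS sensor ...",
    "Patent B: a method for compressing light-field video using block transforms ...",
]
doc_vectors = embedder.encode(patents, normalize_embeddings=True)

def retrieve(query, k=1):
    """Return the k patent passages most similar to the query (cosine similarity on normalized vectors)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q
    return [patents[i] for i in np.argsort(scores)[::-1][:k]]

query = "hexagonal sensor layout for light-field capture"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# The prompt would then be passed to the chosen LLM, e.g. llm.generate(prompt)  (placeholder call)
print(prompt)
```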
Prerequisite
- Python
- Machine Learning / Deep Learning
Contact person
For more information please contact : Thomas.Vandamme@ulb.be
- Design and Implementation of a viewer for IP (Intellectual Property) datasets
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, Julien.Cabay@ulb.be, Thomas.Vandamme@ulb.be
Research Unit : LISA-IMAGE
Description
Design and Implementation of a viewer for IP (Intellectual Property) datasets
This project consists of the design, development, and implementation of a viewer website/software for IP datasets (Trade Marks, Patents, ...). The viewer will enable users, developers, and researchers to search, label, and extract different relevant aspects of the datasets.
Context
Current dataset viewer tools, such as Label Studio (https://labelstud.io/), have demonstrated their relevance in research and development ecosystems, especially those related to data and deep learning. However, these tools are not perfect, and several kinds of datasets, such as those related to the legal field (especially text documents), are left out of such solutions.
Objective
In this project, you will develop a complete tool (ideally web-based), or an open-source plug-in for another viewer/labeler (such as Label Studio, for example), capable of handling multimodal information, such as that relevant to IP (e.g. images, 3D volumes, schematics, text, sound, ...).
Prerequisite
- Web Technologies
- Python
Contact person
For more information please contact : Thomas.Vandamme@ulb.be
- Automated web scraping for dataset compilations
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, Julien.Cabay@ulb.be, Thomas.Vandamme@ulb.be
Research Unit : LISA-IMAGE
Description
Automated web scraping for dataset compilations
This project consists of the design, development, implementation, and testing of a series of automated web scrapers. The ultimate goal is to develop a set of tools enabling the acquisition and synchronisation of different datasets.
Context
Deep Learning relies on voluminous and (ideally) good-quality datasets. Those are, unfortunately, hard to gather and label.
In the field of Intellectual Property (IP, including, e.g., Patents, Trade Marks, Designs), relevant information is curated by public IP offices (tasked with administering the associated rights: registration, protection, ...). These public bodies make a range of information publicly available through various search engines (for example, see https://www.euipo.europa.eu/en/search and https://ipportal.wipo.int/home). The offices do not allow bulk download, and curating these datasets by hand is a particularly tedious task (there are millions of registered rights, for example).
Objective
In this project, you will develop tools to enable the fast development of web scrapers, implementing various measures to circumvent anti-scraping protections on the websites. A new, untested use case will be chosen to illustrate the tools' capabilities.
Methods
You will create web-based applications and interface with (simple) databases and external providers if needed. Your end product will enable non-developers to choose the elements they want to retrieve automatically from a webpage, along with various other settings.
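A minimal sketch of the core retrieval step such a tool would wrap, assuming requests and BeautifulSoup; the URL and CSS selector are placeholders for what a non-developer would configure through the interface.

```python
import requests
from bs4 import BeautifulSoup

def scrape(url, css_selector):
    """Fetch a page and return the text of every element matching a user-chosen CSS selector."""
    response = requests.get(url, headers={"User-Agent": "research-prototype"}, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(css_selector)]

# Placeholder usage: the URL and selector would be picked through the web interface
records = scrape("https://example.org/registry", "div.result-title")
print(records)
```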
Prerequisite
- Web Technologies
- Python
- (optional) Selenium or other automation software
Contact person
For more information please contact : Thomas.Vandamme@ulb.be
- Assessing legally-relevant similarities between goods and services of Trade Marks using Large Language Models (LLMs)
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, Julien.Cabay@ulb.be, Thomas.Vandamme@ulb.be
Research Unit : LISA-IMAGE
Description
Assessing legally-relevant similarities between goods and services of Trade Marks using Large Language Models (LLMs)
In this project, you will evaluate the capabilities of current state-of-the-art AI models to assess similarity between goods and services in the field of Trade Marks.
Context
In the field of Trade Mark (TM) Law, similarity is paramount. TMs that are too similar (under a legal criterion called the "Likelihood of Confusion", LoC) to another already registered TM may not be registered.
The LoC test, a complex and multifactorial assessment performed by judicial authorities, involves two sub-assessments: the similarities between the signs (the images or text of the TM, e.g. "Coca-Cola", or the "ULB" logo), and the similarities between the goods and services for which those are registered (e.g. "cold beverages", and "Higher Education").
Technological solutions have been developed to alleviate the administrative burden caused by the massive amount of registered TMs combined with ever-increasing TM applications. For estimating the similarity between 2D signs, the arguably most promising solution is image search engines; for assessing the similarity between goods and services (an equally important test), the solution could take the form of a classifier or of a semantic search system.
Objective
For this project, you will investigate the extent to which current LLMs are able to perform this legal test of similarity of goods and services, conclude on the state of the technology for these tasks, and propose avenues to improve performance.
Methods
A dataset of decisions, of which you will be able to extract the portion relevant to the assessment of similarity between goods and services, is already available.
A possible course of action for the experimentation involves a first zero-shot learning approach, possibly combined with Retrieval-Augmented Generation (RAG) and/or a mixture of experts. Building on these early experiments, you will then investigate the capacity of those tools hands-on by designing and training your own AI models to solve this task specifically, potentially based on pre-trained LLMs.
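As a possible zero-shot baseline (an assumption of this sketch, not a prescribed method), the similarity of two goods/services descriptions can be scored with sentence embeddings; the model name and example strings are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative open-source embedding model

goods_a = "cold beverages"
goods_b = "higher education services"

emb = model.encode([goods_a, goods_b], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"embedding similarity between '{goods_a}' and '{goods_b}': {similarity:.2f}")
# A low score suggests dissimilar goods/services; how such scores map onto the legal LoC criterion
# is exactly what the thesis should investigate.
```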
Prerequisite
- Machine Learning / Deep Learning
- Python
Contact person
For more information please contact : Thomas.Vandamme@ulb.be
- AI Perception of Design Similarity: 2D Views versus 3D Designs
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, Julien.Cabay@ulb.be, Thomas.Vandamme@ulb.be
Research Unit : LISA-IMAGE
Description
AI Perception of Design Similarity: 2D Views versus 3D Designs
In this project, you will investigate technico-legal considerations relative to AI algorithms and how they perceive objects. To do this, you will design, implement, train and test various algorithms and AI models to perform a variety of atomic tasks.
Context
Designs are an important part of Intellectual Property, protecting the appearance of goods (e.g. the shape of the Coca-Cola bottle). These designs must be registered to be granted protection, and their registration is done through a series of 2D views (of these 3D objects), for administrative reasons.
In an infringement context, a judicial authority must assess whether two designs differ sufficiently or not. This authority is capable of inferring, from several 2D views, the general shape of the object, in order to perform this assessment.
Recent advances in AI have enabled various tools in this field, most importantly a design search tool, that enables individuals seeking registration of their design to search for potentially infringing designs. Those private tools are probably based on these 2D views, and may therefore lack a 3D understanding.
Objective
In this master thesis, you will investigate whether design search tools indeed suffer from this 2D/3D semantic gap, and whether they are able to accurately implement the legal rules pertaining to this field. Additionally, you will assess whether judicial authorities truly need to reason about the 3D shape, or whether the 2D views are sufficient for those tasks.
Methods
You will develop AI models capable of (1) inferring 2D views from a 3D mesh object and (2) creating a 3D mesh from different 2D views, and finally implement a research protocol to test the hypotheses.
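As a toy illustration of task (1), the sketch below projects the vertices of a 3D mesh into a single 2D view with a simple pinhole camera; the mesh path, intrinsics, and camera placement are illustrative assumptions.

```python
import numpy as np
import trimesh   # assumed available for loading mesh files

def project_vertices(vertices, focal=800.0, image_size=(640, 480)):
    """Project 3D vertices into a 2D view with a simple pinhole camera looking along +Z (illustrative intrinsics)."""
    cx, cy = image_size[0] / 2, image_size[1] / 2
    # Center the object and push it in front of the camera
    v = vertices - vertices.mean(axis=0) + np.array([0.0, 0.0, 3.0])
    u = focal * v[:, 0] / v[:, 2] + cx
    w = focal * v[:, 1] / v[:, 2] + cy
    return np.stack([u, w], axis=1)

mesh = trimesh.load("design.obj", force="mesh")   # placeholder path to a registered design model
points_2d = project_vertices(mesh.vertices)
print(points_2d.shape)   # one 2D point per mesh vertex; several such views would form the registration images
```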
Prerequisite
- Machine Learning / Deep Learning
- Python
Contact person
For more information please contact : Thomas.Vandamme@ulb.be
- Evaluation of a Feature-Based Registration Pipeline for Whole-Slide Images
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, adrien.foucart@ulb.be, arthur.elskens@ulb.be
Research Unit : LISA-IMAGE
Description
Evaluation of a Feature-Based Registration Pipeline for Whole-Slide Images
This project focuses on the evaluation of various configurations of a feature-based registration pipeline, with particular attention to their robustness under varying initial conditions in the domain of Whole-Slide Image (WSI) registration.
Context
In digital pathology, the integration of information from multiple WSIs is often required to obtain a comprehensive understanding of complex biological processes, such as cancer development and progression. This integration requires accurate spatial alignment of corresponding tissue regions across WSIs, a process referred to as WSI registration. Due to challenges such as image artefacts, varying staining, and large tissue deformations, WSI registration remains a difficult problem in computer vision.
Recent challenges, such as ANHIR [Borovec2020] and ACROBAT [Weitz2024], have brought attention to the state-of-the-art in WSI registration. These challenges evaluate the performance of registration algorithms on real-world datasets comprising multi-stained slides, such as Hematoxylin & Eosin and Immunohistochemistry. Successful approaches generally follow a two-stage process: an initial low-resolution rigid or affine alignment, followed by a high-resolution non-rigid (deformable) registration. This two-step approach effectively constrains the parameter space within which the non-linear elastic transformation is subsequently estimated, thereby significantly improving the quality of the final alignment.
Among the strategies for estimating initial alignment, feature-based methods have emerged as the most widely used, with six out of eight top-performing teams in the ACROBAT challenge employing such techniques [Weitz2024]. Feature-based registration typically involves three key stages: (i) pre-processing, (ii) local feature extraction, and (iii) robust matching. Each of these stages can be performed using a variety of algorithms, ranging from traditional approaches to deep learning-based methods [Marzahl2021, Gatenbee2023, Elskens2023, Wodzinski2024, Elskens2025].
Objectives
The primary objective of this project is to:
- Evaluate different configurations of a feature-based registration pipeline, focusing on the robust matching stage.
- Assess the robustness of different configurations of feature-based registration pipelines to diverse initial conditions, including large spatial displacements and staining artefacts.
Methods
The project will involve a comparative study of several state-of-the-art feature detection and matching algorithms found in recent literature. These may include: SuperPoint [DeTone2018], LightGlue [Lindenberger2023], LoFTR [Sun2021] and OmniGlue [Jiang2024]. Each algorithm will be implemented and evaluated under controlled experimental settings.
A novel evaluation protocol will be developed, grounded in existing metrics from the literature while integrating innovative criteria specifically tailored to the tasks explored in this project. The evaluation will include, but not be limited to, assessments of robustness to large displacements (e.g., rotation, translation, and shear) as well as the ability of each pipeline configuration to effectively filter outliers during robust matching. The goal is to design a comprehensive and reproducible benchmarking approach focused on critical aspects of feature-based registration pipelines within the context of digital pathology.
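For reference, a classical feature-based baseline (SIFT descriptors, Lowe's ratio test, and RANSAC-based affine estimation with OpenCV) already exhibits the stages described above; the learned detectors and matchers listed in the Methods would slot into the same structure. File names are placeholders and pre-processing is omitted.

```python
import cv2
import numpy as np

def register_affine(fixed_gray, moving_gray):
    """Estimate an affine alignment from local features: detection/description, matching, robust estimation."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(fixed_gray, None)
    kp2, des2 = sift.detectAndCompute(moving_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe's ratio test

    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    affine, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)  # robust matching stage
    return affine, int(inliers.sum())

# Usage on two downsampled grayscale WSI thumbnails (placeholder file names)
fixed = cv2.imread("he_thumbnail.png", cv2.IMREAD_GRAYSCALE)
moving = cv2.imread("ihc_thumbnail.png", cv2.IMREAD_GRAYSCALE)
transform, n_inliers = register_affine(fixed, moving)
print(transform, n_inliers)
```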
Prerequisites
Candidates should have:
- A solid foundation in Python programming
- A willingness to work with containerization tools such as Docker (prior experience is a plus but not mandatory)
- Successfully completed INFO-H500 or an equivalent course in image processing, computer vision, or machine learning
Contact person
For further information or to express interest in this project, please contact: arthur.elskens@ulb.be, adrien.foucart@ulb.be and olivier.debeir@ulb.be.
References
[Borovec2020] Borovec, J., Kybic, J., Arganda-Carreras, I., Sorokin, D.V., Bueno, G., Khvostikov, A.V., . . . Munoz-Barrutia, A. (2020, October). ANHIR: Automatic Non-Rigid Histological Image Registration Challenge IEEE Transactions on Medical Imaging, 39 (10), 3042-3052, https://doi.org/10.1109/TMI.2020.2986331
[Weitz2024] Weitz, P., Valkonen, M., Solorzano, L., Carr, C., Kartasalo, K., Boissin, C., . . . Rantalainen, M. (2024, October). The ACROBAT 2022 challenge: Automatic registration of breast cancer tissue. Medical Image Analysis, 97, 103257, https://doi.org/10.1016/j.media.2024.103257
[Marzahl2021] Marzahl, C., Wilm, F., Dressler, F., Tharun, L., Perner, S., Bertram, C., . . . Breininger, K. (2021). Robust Quad-Tree based Registration on Whole Slide Images. Proceedings of Machine Learning Research (Vol. 156, pp. 181-190).
[Gatenbee2023] Gatenbee, C.D., Baker, A.-M., Prabhakaran, S., Swinyard, O., Slebos, R.J.C., Mandal, G., . . . Anderson, A.R.A. (2023, July). Virtual alignment of pathology image series for multi-gigapixel whole slide images. Nature Communications, 14 (1), 4502, https://doi.org/10.1038/s41467-023-40218-9
[Elskens2023] Elskens, A., Foucart, A., Zindy, E., Debeir, O., Decaestecker, C. (2023, November). Assessing Local Descriptors for Feature-Based Registration of Whole-Slide Images. 2023 19th International Symposium on Medical Information Processing and Analysis (SIPAIM) (pp. 1-4). Mexico City, Mexico: IEEE. https://doi.org/10.1109/SIPAIM56729.2023.10373514
[Wodzinski2024] Wodzinski, M., Marini, N., Atzori, M., Müller, H. (2024, June). RegWSI: Whole slide image registration using combined deep feature- and intensity-based methods: Winner of the ACROBAT 2023 challenge. Computer Methods and Programs in Biomedicine, 250, 108187, https://doi.org/10.1016/j.cmpb.2024.108187
[Elskens2025] Elskens, A., Foucart, A., Debeir, O., Decaestecker, C. (2025, May). Impact of Pre-processing and Local Feature Extraction on Feature-Based Registration of Whole-Slide Images. Preprint at http://dx.doi.org/10.13140/RG.2.2.29399.79521
[DeTone2018] DeTone, D., Malisiewicz, T., Rabinovich, A. (2018, June). SuperPoint: Self-Supervised Interest Point Detection and Description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 337-33712). Salt Lake City, UT, USA: IEEE. https://doi.org/10.1109/CVPRW.2018.00060
[Lindenberger2023] Lindenberger, P., Sarlin, P.-E., Pollefeys, M. (2023, October). LightGlue: Local Feature Matching at Light Speed. 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 17581-17592). https://doi.org/10.1109/ICCV51070.2023.01616
[Sun2021] Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X. (2021, June). LoFTR: Detector-Free Local Feature Matching with Transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 8918-8927). Nashville, TN, USA: IEEE. https://doi.org/10.1109/CVPR46437.2021.00881
[Jiang2024] Jiang, H., Karpur, A., Cao, B., Huang, Q., Araujo, A. (2024, June). OmniGlue: Generalizable Feature Matching with Foundation Model Guidance. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 19865-19875). Seattle, WA, USA: IEEE. https://doi.org/10.1109/CVPR52733.2024.01878
- Automated Interpretation of P&ID Drawings
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, feras.almasri@ulb.be,
Research Unit : LISA-IMAGE
Description
Automated Interpretation of P&ID Drawings
The project aims to address a key challenge in industrial automation by developing AI and computer vision methods to automatically interpret and digitize Piping and Instrumentation Diagrams (P&IDs).
Context
The project is carried out in collaboration with Engie and TRACTEBEL.
Annotated P&ID drawings are already available for model training and validation.
Objective
The objective is to develop a robust pipeline capable of detecting and recognizing symbols and text, associating them based on their spatial and contextual relationships, and detecting and tracking lines within P&ID drawings. The ultimate goal is to produce a structured digital representation of the diagrams for use in automation and further analysis.
Methods
Various methods will be explored, including convolutional neural networks (CNNs) for symbol and text recognition, and specialized algorithms for line and path detection. Post-processing techniques will be designed to accurately associate and structure the extracted elements.
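As a small illustration of the line-detection component (not the full pipeline), a probabilistic Hough transform over an edge map is a classical starting point; the thresholds and file path below are illustrative, and symbol/text recognition would rely on separate detection and OCR models.

```python
import cv2

def detect_lines(drawing_path):
    """Detect straight pipe/line segments in a P&ID raster image with the probabilistic Hough transform."""
    image = cv2.imread(drawing_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(image, 50, 150)                       # edge map feeding the Hough transform
    lines = cv2.HoughLinesP(edges, rho=1, theta=3.1416 / 180,
                            threshold=80, minLineLength=40, maxLineGap=5)
    return [] if lines is None else [tuple(l[0]) for l in lines]   # (x1, y1, x2, y2) per segment

segments = detect_lines("pid_sheet_01.png")   # placeholder path to an annotated P&ID sheet
print(f"{len(segments)} line segments detected")
```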
Prerequisite
Image processing
Deep Learning
Python
Contact person
For more information please contact : feras.almasri@ulb.be, olivier.debeir@ulb.be
- User Behavior Analysis via Tweet Detection and Interaction Tracking in Mobile Screen Recordings
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, feras.almasri@ulb.be,
Research Unit : LISA-IMAGE
Description
User Behavior Analysis via Tweet Detection and Interaction Tracking in Mobile Screen Recordings
The project aims to explore user interaction patterns on social media by detecting tweets in screen recordings, recognizing tweet content, and tracking user actions such as clicks and follows.
Context
The project is conducted in collaboration with the faculty of psychology.
A dataset of mobile screen recordings is available, capturing real user interactions on platforms like Twitter.
Objective
The goal is to develop a system that can detect tweets within mobile screen recordings, apply text recognition to extract tweet content, and track user interactions (e.g., clicks, scrolling, and follows). This information will help study behavior patterns and engagement strategies on social media.
Methods
Multiple techniques will be tested, including object detection for tweet localization, OCR methods for text recognition, and temporal tracking algorithms to monitor user interactions over time. Event detection and classification will also be explored to analyze engagement behavior.
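As a minimal illustration of the text-recognition step on a single frame, assuming Tesseract OCR is available; the bounding box stands in for a detection produced by the tweet detector, and the file path is a placeholder.

```python
import cv2
import pytesseract   # requires the Tesseract binary to be installed

def read_tweet_text(frame_bgr, box):
    """Crop a detected tweet region from a screen-recording frame and run OCR on it."""
    x, y, w, h = box
    crop = frame_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)   # OCR is usually more stable on grayscale
    return pytesseract.image_to_string(gray)

cap = cv2.VideoCapture("screen_recording.mp4")      # placeholder recording from the available dataset
ok, frame = cap.read()
if ok:
    text = read_tweet_text(frame, box=(50, 300, 980, 400))  # illustrative box from the tweet detector
    print(text)
cap.release()
```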
Prerequisite
Image processing
Deep Learning
Python
Flask/FAST API
Docker
Contact person
For more information please contact: feras.almasri@ulb.be, olivier.debeir@ulb.be
- Multimodal Deep Learning for Deauville Score Prediction in Lymphoma Using [18F]FDG-PET/CT Imaging and Clinical Reports
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, erwin.woff@ulb.be,
Research Unit : LISA-IMAGE
Description
Multimodal Deep Learning for Deauville Score Prediction in Lymphoma Using [18F]FDG-PET/CT Imaging and Clinical Reports
Subject:
Advancing Automated Deauville Scoring Through Vision-Language Models
Description:
Context and Aim of the Project:
The Deauville Score (DS) is a critical tool in the assessment of treatment response in Hodgkin and non-Hodgkin lymphoma, based on [18F]FDG-PET/CT imaging. In current clinical practice, DS assignment is done visually by physicians, which is time-consuming and subject to interobserver variability. While deep learning models trained on PET maximum intensity projections (MIPs) have shown strong potential for binary classification of DS (1–3 vs. 4–5), challenges remain in achieving higher granularity, robustness, and clinical interpretability.
Recent advances in large language models and domain-adapted transformers (e.g., BioClinicalBERT, RadBERT) have shown promise in extracting Deauville scores directly from nuclear medicine reports (Huemann et al., 2023). Building on this, the goal of this project is to explore multimodal deep learning architectures that integrate PET images and their corresponding medical reports to enhance Deauville score classification. However, image-only or text-only architectures remain viable and simpler baselines, so the student will have the flexibility to pursue a purely visual or a multimodal path depending on interest and feasibility.
Ultimately, the core objective is to improve the robustness and granularity of Deauville score predictions, potentially moving from binary to full five-class classification. The thesis also opens opportunities to explore interpretability techniques, such as lesion localization or clinical explanation generation, and sets the stage for external validation and eventual clinical deployment.
Objectives:
Improve Deauville score prediction:
- Move beyond binary classification toward full five-class DS prediction by combining PET imaging and text reports.
Integrate vision and language modalities:
- Extend the current framework with transformer-based encoders (e.g., BioClinicalBERT) for the report and CNN or ViT-based encoders for the image, evaluating different fusion strategies. Pretrained models from Huemann et al. are available as a starting point.
Enhance interpretability and clinical insight (bonus):
- Add auxiliary tasks such as lesion localization or lesion description generation to improve model explainability and potentially accuracy.
Evaluate Generalization and Clinical Usefulness:
- Possibility to test the model performance on potential external cohorts for validation.
- Compare predictions with expert-assigned scores and evaluate agreement.
Methodology:
- Use existing anonymized FDG-PET/CT data and matched reports from the Institut Jules Bordet.
- Preprocess MIP images and radiology reports using standardized pipelines.
- Fine-tune pre-trained BERT-based models on nuclear medicine language, guided by work such as Huemann et al. (2023), or directly use their domain-adapted model available on Hugging Face (a minimal loading sketch follows this list).
- Experiment with text and image-text models using MIP images and transformer-encoded reports.
- Evaluate performance on internal and (if available) external validation cohorts.
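A hedged sketch of the text branch: loading a clinical-domain BERT checkpoint as a five-class classifier with the transformers library. The checkpoint shown (emilyalsentzer/Bio_ClinicalBERT) is a generic BioClinicalBERT model used as an assumption; the domain-adapted weights from Huemann et al. would be substituted once retrieved, and the classification head below starts untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint: swap in the domain-adapted model released by Huemann et al. once its identifier is confirmed.
CHECKPOINT = "emilyalsentzer/Bio_ClinicalBERT"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=5)  # five Deauville classes

report = "PET/CT shows residual uptake in the mediastinum slightly above liver background ..."
inputs = tokenizer(report, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_score = logits.argmax(dim=-1).item() + 1   # map class index 0-4 to Deauville score 1-5
print("predicted Deauville score:", predicted_score)
```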
Available Resources:
- Annotated dataset with Deauville scores and clinical reports (in French).
- Existing codebase and models from the prior thesis with preprocessing scripts and baseline models for PET MIPs (available on GitHub: Alichnikof/Deauville_DeepLearning).
- Huemann et al.'s code for report processing and fine-tuning (zhuemann/NuclearMedicineDomain_Adaptation).
- Preprocessed anonymized dataset with corresponding clinical reports.
- Scripts for visualization, evaluation, and data augmentation.
- Potential future datasets enabling external validation.
Expected Outcomes:
- A refined and possibly multimodal AI model for Deauville score classification.
- Deeper understanding of the complementarity of PET imaging and narrative reports.
- Foundations for future applications in AI-assisted lymphoma response assessment and model interpretability.
Supervision and Collaboration:
This project will be conducted under the supervision of Prof. Olivier Debeir (ULB) and Prof. Erwin Woff (Institut Jules Bordet), fostering interdisciplinary collaboration between biomedical engineering and nuclear medicine.
Key References
- Mezher A. Deep Learning for Binary Deauville Scoring in Lymphoma [18F]FDG-PET/CT: Transfer Learning and External Validation, ULB 2025.
- Huemann Z. et al., Domain-adapted large language models for classifying nuclear medicine reports, NPJ Digit. Med. 2023.
- Häggström I. et al., Deep learning for [18F]FDG-PET/CT classification in patients with lymphoma, Lancet Digit. Health, 2023.
- Transformers and Attention Maps for Semantic Analysis of Vectorized Engineering Blueprints
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, - ,
Research Unit : LISA-IMAGE
Description
Transformers and Attention Maps for Semantic Analysis of Vectorized Engineering Blueprints
Context
This project addresses an ongoing challenge in the field of automatic document analysis. While Optical Character Recognition (OCR) is now widely regarded as a solved problem, the automated interpretation of more complex documents, particularly technical and engineering drawings, remains difficult.
Recent advances in machine learning have enabled higher-level abstraction in the analysis of image-based documents. Most existing methods for blueprint interpretation adopt raster-based approaches, analyzing 2D pixel representations of documents. However, many technical drawings are also available in vector format (e.g., as embedded objects in PDF files). Although these vector representations are typically designed for visualization and printing, and thus contain limited semantic metadata, they may offer structural advantages that can be exploited for semantic analysis.
Objective
The primary objective of this project is to evaluate the potential advantages of vector-based analysis over traditional raster-based approaches in the semantic interpretation of engineering blueprints.
Methods
A recent study by Carrara et al. [Carrara2024] proposes a novel vector-based framework leveraging Convolutional Neural Networks (CNNs) and Graph Attention Networks (GATs). This project will build upon that work to further investigate the role of attention mechanisms and transformer architectures in processing vectorized technical documents.
A publicly available annotated dataset will serve as the foundation for experiments. Additional real-world examples provided through industrial collaboration will be used for further testing and validation.
Prerequisites
- Proficiency in Python programming
- Knowledge of deep neural networks (DNNs) and attention-based models
Contact
For further information, please contact: olivier.debeir@ulb.be
References
Carrara, Andrea, Stavros Nousias, and André Borrmann. "VectorGraphNET: Graph Attention Networks for Accurate Segmentation of Complex Technical Drawings." arXiv preprint arXiv:2410.01336 (2024). https://arxiv.org/pdf/2410.01336
- GDPR-Compliant People Counting Device
-
Promotor, co-promotor, advisor : olivier.debeir@ulb.be, - ,
Research Unit : LISA-IMAGE
Description
GDPR-Compliant People Counting Device
Context
This project addresses the challenge of estimating the real-time occupancy of public indoor spaces, such as auditoriums or laboratory rooms.
Several methods for people counting currently exist. Some rely on the detection and tracking of mobile devices, while others use machine learning techniques for direct visual detection of individuals. Due to significant privacy concerns associated with monitoring technologies, any proposed solution must adhere strictly to legal and ethical standards, particularly those outlined by the General Data Protection Regulation (GDPR).
Objective
The goal of this project is to identify and implement an appropriate machine learning model for real-time people detection in indoor environments. The focus will be on ensuring compliance with privacy regulations while maintaining high accuracy under real-world constraints.
The project involves both hardware prototyping and software system integration, targeting a complete, embedded solution for occupancy estimation.
Methods
Numerous deep learning models for human detection are available, as discussed in [Zhang2016]. The main challenge will be to select and optimize a model that performs reliably under various real-world conditions, such as differing lighting environments, variable image resolutions, and limited computational resources available on embedded platforms.
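As a minimal starting point (an assumption, not the final embedded solution), a pretrained torchvision detector can count persons in a single frame; the model choice, confidence threshold, and image path are illustrative, and the GDPR-compliant design (on-device processing, no image retention) is precisely what the thesis must address.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained COCO detector; in the COCO label map, class id 1 corresponds to "person"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def count_people(image_path, score_threshold=0.7):
    """Return the number of detected persons in a single frame (the frame is discarded after counting)."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    keep = (output["labels"] == 1) & (output["scores"] > score_threshold)
    return int(keep.sum())

print(count_people("room_snapshot.jpg"))   # placeholder path to a camera frame
```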
Prerequisites
- Proficiency in Python
- Understanding of deep neural networks (DNNs)
- Familiarity with containerization tools such as Docker
Contact
For more information, please contact: olivier.debeir@ulb.be
References
Zhang, Kaipeng, et al. "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks." IEEE Signal Processing Letters 23.10 (2016): 1499–1503. https://ieeexplore.ieee.org/abstract/document/7553523
- Benchmarking Compression Techniques for Multi-Layers 3D Video
-
Promotor, co-promotor, advisor : gauthier.lafruit@ulb.be, - , Eline Soetens
Research Unit : LISA-VR
Description
Benchmarking Compression Techniques for Multi-Layers 3D Video
The project aims to evaluate the performance of existing compression techniques when applied to multi-layer video content used in tensor displays.
Context
Tensor displays are glass-free 3D displays composed of multiple stacked LCD panels. A user in front of the display will see 3D content with full parallax thanks to the layered structure. However, each 3D frame requires one 2D image per layer, resulting in significantly higher data volume compared to standard 2D video. Existing compression methods were not designed specifically for this format, and traditional metrics (PSNR) might not accurately reflect perceived quality in multi-layer 3D video.
Objective
Compare and evaluate existing 2D and 3D video compression methods when applied to multi-layer video intended for tensor displays. Define or select a set of evaluation metrics that better capture the perceived visual quality on tensor displays.
Key goals:
Evaluate the efficiency, quality, and suitability of different codecs.
Investigate how well these codecs preserve depth perception and visual fidelity in multi-layer 3D video.
Propose relevant and multi-dimensional performance metrics for such evaluations.
Methods
Different methods are to be tested, including:
- HEVC (standard 2D compression) [1]
- VVC multi-layer (layer-aware compression) [2]
- MIV (multi-view compression) [3]
Evaluation metrics will be defined during the thesis; they might include (a small measurement sketch is given at the end of this section):
- Rate-distortion analysis (bitrate vs. PSNR)
- Structural and perceptual metrics: SSIM, MS-SSIM, VMAF
- Temporal consistency metrics
Experiments will use standard datasets and reference implementations.
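A minimal sketch of the per-layer quality measurements mentioned above (PSNR and SSIM via scikit-image); the rate side of the rate-distortion analysis would come from the bitstream sizes produced by each codec, and the synthetic frames here are placeholders for real decoded tensor-display layers.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def layer_metrics(reference_layers, decoded_layers):
    """Per-layer PSNR/SSIM between original and decoded tensor-display frames (one 2D image per LCD layer)."""
    results = []
    for ref, dec in zip(reference_layers, decoded_layers):
        psnr = peak_signal_noise_ratio(ref, dec, data_range=255)
        ssim = structural_similarity(ref, dec, data_range=255)
        results.append((psnr, ssim))
    return results

# Toy usage with synthetic 3-layer frames; real experiments would read decoded frames from each codec under test
reference = [np.random.randint(0, 256, (480, 640), dtype=np.uint8) for _ in range(3)]
decoded = [np.clip(r.astype(int) + np.random.randint(-3, 4, r.shape), 0, 255).astype(np.uint8) for r in reference]
for i, (psnr, ssim) in enumerate(layer_metrics(reference, decoded)):
    print(f"layer {i}: PSNR={psnr:.1f} dB, SSIM={ssim:.3f}")
```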
Prerequisite
- Programming in C++ and scripting languages (e.g., Python)
- Recommended but not mandatory: familiarity with video coding pipelines and concepts (e.g., INFO-H516, Visual Media Compression)
Contact person
Eline Soetens (supervisor) : eline.soetens@ulb.be
Bibliography
[1] Sullivan, Gary J., Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. "Overview of the High Efficiency Video Coding (HEVC) Standard." IEEE Transactions on Circuits and Systems for Video Technology 22, no. 12 (December 2012): 1649–68. https://doi.org/10.1109/TCSVT.2012.2221191.
[2] Bross, Benjamin, Ye-Kui Wang, Yan Ye, Shan Liu, Jianle Chen, Gary J. Sullivan, and Jens-Rainer Ohm. "Overview of the Versatile Video Coding (VVC) Standard and its Applications." IEEE Transactions on Circuits and Systems for Video Technology 31, no. 10 (October 2021): 3736–64. https://doi.org/10.1109/TCSVT.2021.3101953.
[3] "Reference software - MPEG Immersive video (MIV)". https://mpeg-miv.org/index.php/reference-software/.