Interactive Web System for Population Medical Data Analysis
Authorship
S.I.M.G.
University Master in Computer Vision
Defense date
02.04.2026 10:30
Summary
This work presents a web-based platform for the visualization, analysis, and segmentation of medical imaging and structured clinical data. The system integrates multiple modules, including statistical analysis, interactive image viewing, image management, and automated segmentation, to support exploratory research workflows. The experimental results confirm the correct operation of all modules and demonstrate the platform’s ability to handle heterogeneous medical data in an interactive and user-friendly manner. The overall design is modular and flexible, allowing the system to be extended and adapted for future research and potential clinical use.
Direction
NUÑEZ GARCIA, MARTA (Tutorships)
Court
GARCIA TAHOCES, PABLO (Chairman)
BREA SANCHEZ, VICTOR MANUEL (Secretary)
López Martínez, Paula (Member)
Few-Shot Segmentation for Medical Imaging Using Foundation Models
Authorship
J.M.G.D.
University Master in Computer Vision
Defense date
02.04.2026 10:10
Summary
Medical image segmentation is a critical prerequisite for diagnosis and treatment planning. While supervised deep learning models have established state-of-the-art performance, they suffer from a heavy reliance on large-scale, pixel-level annotated datasets. This dependency is a significant bottleneck in medical imaging due to the scarcity of expert annotations and the heterogeneity of image modalities. This thesis proposes a novel Few-Shot Segmentation (FSS) framework designed to address these challenges by leveraging Foundation Models (FMs). The proposed method combines the robust feature extraction of the self-supervised DINOv3 with final boundary refinement by the Segment Anything Model 3 (SAM 3). We evaluate this framework across five distinct medical imaging datasets. The experimental results demonstrate that our approach not only generalizes better to unseen classes in low-data scenarios but also surpasses the Dice similarity coefficient of the standard supervised U-Net, marking a significant step forward in label-efficient medical image analysis.
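The core idea behind feature-based few-shot segmentation of this kind can be sketched in a few lines: average the support features under the support mask into a class prototype, then threshold the cosine similarity between that prototype and every query feature. This is a generic illustration, not the thesis's implementation; random arrays stand in for DINOv3 patch features, and the SAM 3 refinement stage is omitted.

```python
import numpy as np

def prototype_mask(support_feats, support_mask, query_feats, thresh=0.5):
    """Coarse few-shot mask: masked-average support prototype,
    cosine similarity against every query feature vector."""
    # support_feats, query_feats: (H, W, C); support_mask: (H, W) in {0, 1}
    w = support_mask[..., None]                                   # (H, W, 1)
    proto = (support_feats * w).sum((0, 1)) / max(w.sum(), 1e-6)  # (C,)
    proto /= np.linalg.norm(proto) + 1e-6
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-6)
    sim = q @ proto                                               # (H, W) cosine map
    return (sim > thresh).astype(np.uint8), sim

# Toy example: random "features" stand in for DINOv3 patch embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 16, 32))
mask = np.zeros((16, 16))
mask[4:10, 4:10] = 1
coarse, sim = prototype_mask(feats, mask, feats)
print(coarse.shape, sim.shape)  # (16, 16) (16, 16)
```

In a real pipeline the coarse binary mask would then seed a promptable refiner such as SAM, which is where the boundary quality comes from.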
Direction
VILA BLANCO, NICOLAS (Tutorships)
CORES COSTA, DANIEL (Co-tutorships)
Court
GARCIA TAHOCES, PABLO (Chairman)
BREA SANCHEZ, VICTOR MANUEL (Secretary)
López Martínez, Paula (Member)
Monocular 6D Object Pose Estimation for Mixed Reality Applications
Authorship
D.P.
University Master in Computer Vision
Defense date
02.04.2026 09:50
Summary
This master's thesis presents a complete pipeline for real-time 6D object pose estimation on standalone mixed reality hardware, specifically the Meta Quest 3 headset. Unlike traditional 2D object detection, 6D pose estimation recovers both the 3D position and 3D orientation of an object, enabling the precise spatial understanding that is crucial for mixed reality applications such as assembly guidance, accessibility, and entertainment. This work leverages Meta's recently released Passthrough Camera API to run computer vision tasks directly on the device. The proposed system consists of three main components: (1) a procedural synthetic data generation pipeline utilizing Python and Blender to create photorealistic training images with pixel-perfect 6D annotations; (2) an implementation of the lightweight YOLOX-6D-Pose architecture optimized for edge inference; and (3) a Unity-based mixed reality application using the Unity Sentis inference engine. Experimental results demonstrate successful Sim-to-Real transfer, achieving a BOP Mean Average Recall (AR) of 62.79% on real-world data without using any real training images. An ablation study confirms the importance of domain randomization, which improves performance by over 12%. Furthermore, dynamic INT8 quantization reduces the model size by around 75% and brings inference latency down to 201 ms with minimal accuracy loss. This work demonstrates the feasibility of 6D pose estimation on consumer VR headsets, opening the way for spatially aware MR applications across many domains.
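As a minimal illustration of what a 6D pose provides, the sketch below projects 3D model points into the image using a rotation, a translation, and pinhole camera intrinsics. This is generic projective geometry, not the thesis's YOLOX-6D-Pose pipeline, and the intrinsics values are made up for the example.

```python
import numpy as np

def project_points(points, R, t, K):
    """Project 3D model points to pixel coordinates given a 6D pose
    (rotation R, translation t) and camera intrinsics K."""
    cam = points @ R.T + t           # (N, 3) points in the camera frame
    uv = cam @ K.T                   # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]    # perspective divide -> (N, 2) pixels

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # toy intrinsics
R = np.eye(3)                      # identity rotation
t = np.array([0.0, 0.0, 2.0])      # object 2 m in front of the camera
pts = np.array([[0.0, 0.0, 0.0]])  # the object's origin
print(project_points(pts, R, t, K))  # [[320. 240.]]
```

Estimating the pose is the inverse problem: a network predicts 2D-3D correspondences or the pose directly so that projections like this align with the observed image.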
Direction
FLORES GONZALEZ, JULIAN CARLOS (Tutorships)
Glowacki, David Ryan (Co-tutorships)
Court
GARCIA TAHOCES, PABLO (Chairman)
BREA SANCHEZ, VICTOR MANUEL (Secretary)
López Martínez, Paula (Member)
Scanpath Prediction from Implicit Cues in Noisy Gaze Data
Authorship
L.U.F.
University Master in Computer Vision
Defense date
02.04.2026 09:30
Summary
Eye-tracking is a vital tool for psychological and psychophysiological research; however, obtaining reliable data typically requires expensive equipment and controlled laboratory environments. While more affordable alternatives have been developed, they often lack the precision and sampling rates necessary for rigorous scientific study. In this work, we propose a model that integrates noisy, low-sampling-rate eye-tracking data with stimulus image features to reconstruct sequences of fixation centroids and their corresponding durations. Our approach aims to produce data that maintains the statistical properties of high-end tracking systems. To achieve this, we utilized the CocoFreeView dataset to generate realistic eye-tracking samples and developed a noise model that simulates the characteristics of widely used commercial eye-trackers. Finally, we leverage a Transformer-based architecture featuring a DINOv3 image encoder to recover the original fixation information.
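A common way to obtain the fixation centroids and durations mentioned above is dispersion-threshold (I-DT) fixation detection. The sketch below is a standard textbook version of that algorithm, not the proposed Transformer-based model; the thresholds and the toy gaze trace are purely illustrative.

```python
import numpy as np

def _dispersion(win):
    # I-DT dispersion: x extent plus y extent of the sample window
    return np.ptp(win[:, 0]) + np.ptp(win[:, 1])

def idt_fixations(gaze, max_dispersion=30.0, min_samples=5):
    """Group consecutive gaze samples whose dispersion stays under a
    threshold; report each group's centroid and sample count
    (a duration proxy at a fixed sampling rate)."""
    fixations, start, n = [], 0, len(gaze)
    while start + min_samples <= n:
        end = start + min_samples
        if _dispersion(gaze[start:end]) <= max_dispersion:
            # grow the window while dispersion stays below the threshold
            while end < n and _dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            win = gaze[start:end]
            fixations.append((win.mean(axis=0), end - start))
            start = end
        else:
            start += 1  # saccade sample: slide past it
    return fixations

# Toy trace: one fixation cluster, a saccade jump, a second cluster.
rng = np.random.default_rng(1)
a = rng.normal([100, 100], 2, size=(20, 2))
b = rng.normal([400, 300], 2, size=(20, 2))
fixes = idt_fixations(np.vstack([a, b]))
print(len(fixes))  # 2
```

With noisy, low-sampling-rate input such a rule-based detector degrades quickly, which is precisely the gap a learned reconstruction model aims to close.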
Direction
CORES COSTA, DANIEL (Tutorships)
Court
GARCIA TAHOCES, PABLO (Chairman)
BREA SANCHEZ, VICTOR MANUEL (Secretary)
López Martínez, Paula (Member)