Please use this identifier to cite or link to this item:
https://olympias.lib.uoi.gr/jspui/handle/123456789/39237
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Papazis, Stergios | en |
dc.contributor.author | Παπάζης, Στέργιος | el |
dc.date.accessioned | 2025-07-29T07:15:06Z | - |
dc.date.available | 2025-07-29T07:15:06Z | - |
dc.identifier.uri | https://olympias.lib.uoi.gr/jspui/handle/123456789/39237 | - |
dc.rights | Attribution-NonCommercial 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | * |
dc.subject | Computer vision | en |
dc.subject | Keyword spotting | en |
dc.subject | Handwritten document analysis | en |
dc.subject | Segmentation-free retrieval | en |
dc.subject | Vision-language models | en |
dc.subject | Re-ranking | en |
dc.subject | Semantic relevance feedback | en |
dc.subject | Semantic embeddings | en |
dc.subject | Zero-shot learning | en |
dc.title | Post-Retrieval Semantic Re-Ranking via Zero-shot LLMs for Segmentation-Free Document Image Keyword Spotting | en |
dc.title | Σημασιολογική αναδιάταξη αποτελεσμάτων εντοπισμού λέξεων χωρίς Κατάτμηση σε Εικόνες Χειρογράφων με Μεγάλα Γλωσσικά Μοντέλα | el |
dc.type | masterThesis | en |
heal.type | masterThesis | el |
heal.type.en | Master thesis | en |
heal.type.el | Μεταπτυχιακή εργασία | el |
heal.classification | Computer Vision | en |
heal.classification | Content-based Visual Information Retrieval | en |
heal.classification | Handwritten Document Analysis | en |
heal.classification | Keyword Spotting | en |
heal.dateAvailable | 2025-07-29T07:16:06Z | - |
heal.language | en | el |
heal.access | free | el |
heal.recordProvider | Πανεπιστήμιο Ιωαννίνων. Πολυτεχνική Σχολή | el |
heal.publicationDate | 2025-07-20 | - |
heal.bibliographicCitation | S. Papazis, "Post-Retrieval Semantic Re-Ranking via Zero-shot LLMs for Segmentation-Free Document Image Keyword Spotting", M.S. thesis, Dept. of Computer Science and Engineering, Univ. of Ioannina, 2025. | en |
heal.abstract | The digitization of historical handwritten documents plays a crucial role in their preservation. With preservation largely addressed, the focus has shifted toward enhancing accessibility. This has led to the development of Keyword Spotting (KWS), a Content-Based Image Retrieval (CBIR) task that retrieves and ranks word images based on their similarity to a given query, without requiring prior transcription. Traditional KWS methods typically treat words as visual patterns, relying solely on appearance-based features and often neglecting their underlying semantic content. Even when semantics are considered, it is often within the segmentation-based setting, which assumes prior word-level segmentation, a non-trivial and error-prone requirement, particularly for historical manuscripts. To address these limitations, we propose a novel unsupervised mechanism for semantic relevance feedback to re-rank the initial output of segmentation-free KWS systems. Our approach operates in three stages: (1) decoding the retrieved word images into text using a neural decoder; (2) projecting the transcriptions into a semantic space using pre-trained transformer-based language models such as RoBERTa, MPNet, and MiniLM, where semantic similarity is measured by cosine distance; (3) re-ranking the retrieved items by combining visual and semantic similarity. We evaluate our method on the widely used historical George Washington (GW) dataset and the modern IAM Handwriting Database (IAM), using the ranked retrieval lists of two cutting-edge segmentation-free KWS baseline models. We further assess the performance across two decoder architectures and two naive fusion strategies through an extensive ablative analysis. Numerical results show consistent improvements in Mean Average Precision (mAP) across all tested configurations, with gains of up to +2.3 percentage points (from 94.31% to 96.59%) on GW and +3 percentage points (from 79.15% to 82.12%) on IAM. Notably, even in scenarios with minimal mAP improvement, we observe significant qualitative gains: semantically relevant but inexact matches are retrieved more frequently. This behavior, known as semantic KWS, is particularly beneficial in real-world scenarios where users may not know beforehand the precise query needed to locate relevant content. These findings demonstrate the effectiveness of incorporating semantic feedback from large language models into visual keyword spotting pipelines. By complementing appearance-based retrieval with NLP-driven semantic re-ranking, our approach enables more flexible and meaningful document search, even in challenging segmentation-free settings. Moreover, it highlights the potential of hybrid vision-language models to advance document image analysis, especially for noisy, heterogeneous, or low-resource historical archives. | en |
heal.advisorName | Nikou, Christophoros | en |
heal.committeeMemberName | Lykas, Aristeidis | en |
heal.committeeMemberName | Blekas, Konstantinos | en |
heal.academicPublisher | Πανεπιστήμιο Ιωαννίνων. Πολυτεχνική Σχολή. Τμήμα Μηχανικών Ηλεκτρονικών Υπολογιστών και Πληροφορικής | el |
heal.academicPublisherID | uoi | el |
heal.numberOfPages | 79 | el |
heal.fullTextAvailability | true | - |
Appears in Collections: | Διατριβές Μεταπτυχιακής Έρευνας (Masters) - ΜΗΥΠ |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
MSc Papazis 2025.pdf | Master thesis | 3.55 MB | Adobe PDF |
This item is licensed under a Creative Commons License
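
The three-stage re-ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: stage (1) is assumed to have already produced a transcription for each retrieved word image, the embedding function `toy_embed` and its vectors are hypothetical stand-ins for a pre-trained language model, and the convex combination controlled by `alpha` is only one plausible "naive fusion" (this record does not specify the two strategies that were evaluated). Visual scores are assumed to be similarities in [0, 1].

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query: str,
    hits: List[Tuple[str, float]],
    embed: Callable[[str], Vector],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (transcription, visual_score) pairs by a fused score.

    ASSUMPTION: fused = alpha * visual + (1 - alpha) * semantic, where
    semantic is the cosine similarity between query and transcription
    embeddings. This is an illustrative fusion, not the thesis's method.
    """
    query_vec = embed(query)
    fused = []
    for text, visual in hits:
        semantic = cosine_similarity(query_vec, embed(text))
        fused.append((text, alpha * visual + (1.0 - alpha) * semantic))
    # Highest fused score first; ties keep their original relative order.
    return sorted(fused, key=lambda item: item[1], reverse=True)


# Hypothetical 2-D embeddings standing in for a real sentence encoder.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "letter": (1.0, 0.0),
    "letters": (0.9, 0.1),
    "latter": (0.0, 1.0),
}


def toy_embed(text: str) -> Vector:
    """Look up a toy embedding; unknown words map to the zero vector."""
    return TOY_EMBEDDINGS.get(text, (0.0, 0.0))


# Initial visual ranking puts the look-alike "latter" first.
hits = [("latter", 0.80), ("letter", 0.75), ("letters", 0.70)]
reranked = rerank("letter", hits, toy_embed, alpha=0.5)
print([text for text, _ in reranked])
```

In this toy example the visually similar but semantically unrelated "latter" drops from first to last place, while "letter" and "letters" move up. In a real pipeline, `embed` would wrap a pre-trained encoder and `alpha` would be tuned on validation data.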