Please use this identifier to cite or link to this item: https://olympias.lib.uoi.gr/jspui/handle/123456789/39237
Full metadata record
DC Field | Value | Language
dc.contributor.author | Papazis, Stergios | en
dc.contributor.author | Παπάζης, Στέργιος | el
dc.date.accessioned | 2025-07-29T07:15:06Z | -
dc.date.available | 2025-07-29T07:15:06Z | -
dc.identifier.uri | https://olympias.lib.uoi.gr/jspui/handle/123456789/39237 | -
dc.rights | Attribution-NonCommercial 3.0 United States | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | *
dc.subject | Computer vision | en
dc.subject | Keyword spotting | en
dc.subject | Handwritten document analysis | en
dc.subject | Segmentation-free retrieval | en
dc.subject | Vision-language models | en
dc.subject | Re-ranking | en
dc.subject | Semantic relevance feedback | en
dc.subject | Semantic embeddings | en
dc.subject | Zero-shot learning | en
dc.title | Post-Retrieval Semantic Re-Ranking via Zero-shot LLMs for Segmentation-Free Document Image Keyword Spotting | en
dc.title | Semantic Re-Ranking of Segmentation-Free Word Spotting Results in Manuscript Images with Large Language Models | el
dc.type | masterThesis | en
heal.type | masterThesis | el
heal.type.en | Master thesis | en
heal.type.el | Master's thesis | el
heal.classification | Computer Vision | en
heal.classification | Content-based Visual Information Retrieval | en
heal.classification | Handwritten Document Analysis | en
heal.classification | Keyword Spotting | en
heal.dateAvailable | 2025-07-29T07:16:06Z | -
heal.language | en | el
heal.access | free | el
heal.recordProvider | University of Ioannina. School of Engineering | el
heal.publicationDate | 2025-07-20 | -
heal.bibliographicCitation | S. Papazis, "Post-Retrieval Semantic Re-Ranking via Zero-shot LLMs for Segmentation-Free Document Image Keyword Spotting", M.S. thesis, Dept. of Computer Science and Engineering, Univ. of Ioannina, 2025. | en
heal.abstract | The digitization of historical handwritten documents plays a crucial role in their preservation. With preservation largely addressed, the focus has shifted toward enhancing accessibility. This has led to the development of Keyword Spotting (KWS), a Content-Based Image Retrieval (CBIR) task that retrieves and ranks word images based on their similarity to a given query, without requiring prior transcription. Traditional KWS methods typically treat words as visual patterns, relying solely on appearance-based features and often neglecting their underlying semantic content. Even when semantics are considered, it is often within the segmentation-based setting, which assumes prior word-level segmentation, a non-trivial and error-prone requirement, particularly for historical manuscripts. To address these limitations, we propose a novel unsupervised mechanism for semantic relevance feedback to re-rank the initial output of segmentation-free KWS systems. Our approach operates in three stages: (1) decoding the retrieved word images into text using a neural decoder; (2) projecting the transcriptions into a semantic space using pre-trained transformer-based language models such as RoBERTa, MPNet, and MiniLM, where semantic similarity is measured by cosine distance; (3) re-ranking the retrieved items by combining visual and semantic similarity. We evaluate our method on the widely used historical George Washington (GW) dataset and the modern IAM Handwriting Database (IAM), using the retrieval ranked lists of two cutting-edge segmentation-free KWS baseline models. We further assess the performance across two decoder architectures and two naive fusion strategies through an extensive ablative analysis. Numerical results show consistent improvements in Mean Average Precision (mAP) across all tested configurations, with gains of up to 2.28 percentage points (from 94.31% to 96.59%) on GW and 2.97 percentage points (from 79.15% to 82.12%) on IAM. Notably, even in scenarios with minimal mAP improvement, we observe significant qualitative gains: semantically relevant but inexact matches are retrieved more frequently. This behavior, known as semantic KWS, is particularly beneficial in real-world scenarios wherein users may not know beforehand the precise query needed to locate relevant content. These findings demonstrate the effectiveness of incorporating semantic feedback from large language models into visual keyword spotting pipelines. By complementing appearance-based retrieval with NLP-driven semantic re-ranking, our approach enables more flexible and meaningful document search, even in challenging segmentation-free settings. Moreover, it highlights the potential of hybrid vision-language models to advance document image analysis, especially for noisy, heterogeneous, or low-resource historical archives. | en
heal.advisorName | Nikou, Christophoros | en
heal.committeeMemberName | Lykas, Aristeidis | en
heal.committeeMemberName | Blekas, Konstantinos | en
heal.academicPublisher | University of Ioannina. School of Engineering. Department of Computer Science and Engineering | el
heal.academicPublisherID | uoi | el
heal.numberOfPages | 79 | el
heal.fullTextAvailability | true | -
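
The three-stage re-ranking summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the record does not specify the two fusion strategies, so a weighted linear combination with a hypothetical weight `alpha` is assumed here. The decoder stage is assumed to have already produced the transcriptions, and `toy_embed` with its hand-made 2-D vectors is a stand-in for a pre-trained sentence encoder such as MPNet or MiniLM. All names and scores are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query: str,
    candidates: List[Tuple[str, float]],
    embed: Callable[[str], Vector],
    alpha: float = 0.5,
) -> List[int]:
    """Return candidate indices sorted by fused score, best first.

    candidates: (decoded transcription, visual similarity in [0, 1]) pairs,
                listed in the order of the initial visual ranking.
    alpha:      assumed fusion weight; 1.0 keeps the visual ranking,
                0.0 ranks by semantic similarity only.
    """
    query_vec = embed(query)
    scored = []
    for index, (text, visual_score) in enumerate(candidates):
        semantic_score = cosine_similarity(query_vec, embed(text))
        fused = alpha * visual_score + (1.0 - alpha) * semantic_score
        scored.append((fused, index))
    # Sort by descending fused score; ties keep the original visual order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


# Hand-made vectors standing in for a real sentence encoder (assumption).
TOY_VECTORS = {
    "soldier": (1.0, 0.0),
    "troops": (0.8, 0.6),
    "orders": (0.0, 1.0),
}


def toy_embed(text: str) -> Vector:
    """Look up a toy vector; unknown words map to the zero vector."""
    return TOY_VECTORS.get(text, (0.0, 0.0))


# Initial visual ranking: decoded word and made-up visual similarity.
candidates = [("orders", 0.80), ("troops", 0.78), ("soldier", 0.70)]

reranked = rerank("soldier", candidates, toy_embed, alpha=0.5)
visual_only = rerank("soldier", candidates, toy_embed, alpha=1.0)
```

In this toy setting the exact match and the semantically related word move ahead of the visually similar but unrelated one, which is the qualitative behavior the abstract calls semantic KWS. In practice `toy_embed` would be replaced by a real encoder, and the weight would need tuning per dataset.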
Appears in Collections: Master's Research Theses (Masters) - ΜΗΥΠ

Files in This Item:
File | Description | Size | Format
MSc Papazis 2025.pdf | Master thesis | 3.55 MB | Adobe PDF


This item is licensed under a Creative Commons License.