From OCR to Quantitative Analysis: A Multi-Method Framework for AI-Driven Archive Enrichment and Accessibility
Evangelos Varthis (Ευάγγελος Βαρθής) - Postdoctoral Researcher, Department of Archives, Library Science and Museology, Ionian University

Digital archives hold immense cultural value, yet their utility is often limited by unstructured data, linguistic barriers, and a lack of semantic metadata. This presentation proposes a comprehensive framework that integrates Artificial Intelligence, quantitative textual analysis, and human-in-the-loop crowdsourcing to overcome these barriers. We present three complementary case studies that demonstrate this ecosystem: (1) an AI-powered workflow for newspaper archives that uses multilingual OCR and LLMs to enable automatic summarization, translation, and semantic search without overhauling existing infrastructure; (2) a statistical analysis of the Patrologia Graeca that applies distributional modeling (Modified Zipf-Mandelbrot) and concentration laws (Bradford's Law) to map biblical referencing patterns and theological affinities across centuries; and (3) a cost-effective crowdsourcing platform that combines AI error correction with user-generated metadata to create navigable Tables of Contents for challenging scanned documents. Together, these studies illustrate how hybrid methodologies, which blend automated computational power with rigorous quantitative methods and human insight, can significantly enhance the accessibility, usability, and scientific understanding of historical texts. The session concludes with a roadmap for implementing these tools in library and academic settings to promote interdisciplinary collaboration.
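
As a rough illustration of the two quantitative tools named in the second case study, the sketch below computes a standard Zipf-Mandelbrot rank distribution and splits ranked sources into Bradford-style zones. It is a minimal sketch under stated assumptions, not the method used in the study: the abstract does not specify the "modified" form of the model, so the textbook form f(r) ∝ 1 / (r + q)^s is used here, and the function names, parameters, and equal-share zoning rule are all hypothetical choices made for illustration.

```python
def zipf_mandelbrot_pmf(n_ranks, s, q):
    """Normalized standard Zipf-Mandelbrot probabilities for ranks 1..n_ranks.

    Assumption: textbook form f(r) proportional to 1 / (r + q) ** s.
    The study's "modified" variant is not specified in the abstract.
    """
    weights = [1.0 / (rank + q) ** s for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def bradford_zones(counts, n_zones=3):
    """Split sources into zones that each hold roughly 1/n_zones of all items.

    Sources are ranked by descending count; a zone is closed as soon as the
    cumulative count reaches its share, and the last zone takes the remainder.
    (Hypothetical zoning rule, chosen for simplicity.)
    """
    ranked = sorted(counts, reverse=True)
    target = sum(ranked) / n_zones
    zones, current, running = [], [], 0.0
    for count in ranked:
        current.append(count)
        running += count
        if running >= target * (len(zones) + 1) and len(zones) < n_zones - 1:
            zones.append(current)
            current = []
    zones.append(current)
    return zones
```

Under Bradford's Law, the number of sources per zone is expected to grow roughly geometrically (about 1 : n : n²), so a small core of heavily cited sources is followed by progressively larger, less productive zones. In this setting, `counts` could stand for how often each biblical book is cited, although that mapping is itself an assumption.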

Poster file of E. Varthis
Size: 2.23 MB :: Type: PDF file
