Toppan Group develops AI-OCR to decipher medieval Greek manuscripts

Printing Museum, Tokyo to present research findings at special exhibition

Transcribing medieval Greek to modern Greek using AI-OCR technology. Image: Toppan

Toppan Holdings and its subsidiary Toppan have developed an AI-based optical character recognition (AI-OCR) engine that can decipher medieval Greek, a form of writing that non-specialists generally find difficult to read.

Going forward, through its collaborative relationship with the Vatican Apostolic Library, the Printing Museum, Tokyo (operated by Toppan Holdings) plans to use scanned images and digitized text of Greek manuscripts belonging to the Vatican Library as training data, with the goal of raising the recognition accuracy of Toppan’s AI-OCR to more than 95%.

Visuals showing the findings of this approach will be displayed during the ‘How Masterpieces Are Born: Biblioteca Apostolica Vaticana Ⅲ+’ special exhibition running from 25 April 2026, at the Printing Museum, Tokyo.

Toppan’s AI-OCR engine development

Old texts hold a wealth of information about important historical facts and regional culture. However, many such texts are handwritten and difficult for most people today to read. Correctly interpreting these texts to preserve culture is now an important challenge not only in Japan, but also globally.

For roughly 30 years, the Toppan Group has collaborated with the Vatican Library on several projects to safeguard culture for future generations. During that time, the Vatican Library has released over two million items from part of its collection to the public for research and educational purposes as high-resolution IIIF (International Image Interoperability Framework) images.

A total of nine million images have been released, and that number continues to grow. Additional information such as transcriptions and annotations has been added to some of the Greek manuscripts, but covering the entire collection this way would take experts in medieval Greek a long time.
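For readers unfamiliar with IIIF, the following minimal sketch shows how an image request is addressed under the IIIF Image API, whose URL pattern is {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The base URL and identifier below are hypothetical placeholders, not the Vatican Library’s actual endpoints, and the "max" size keyword assumes Image API 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that characters such as "/" are safe.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and page identifier, for illustration only.
    base = "https://example.org/iiif"
    # Whole page at the largest size the server offers.
    print(iiif_image_url(base, "manuscript-page-001"))
    # A 1024x1024 pixel region from the top-left corner, scaled to 512 px wide.
    print(iiif_image_url(base, "manuscript-page-001",
                         region="0,0,1024,1024", size="512,"))
```

Because every region of a page can be requested by URL, a line or word of a manuscript can be paired with its transcription without copying the full image.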

Toppan has already improved deciphering capabilities for cursive-style classical Japanese texts, making script that modern Japanese readers find difficult more accessible. This work has supported the research and use of valuable historical documents all over Japan.

In 2015, Toppan began research and development on OCR for historical Japanese cursive writing using AI image recognition. It went on to collaborate with various research bodies and organize related events. Toppan continues to work on fuminoha, its classical Japanese reading service launched in 2021, and Komonjo Camera, a mobile application launched in 2023 that allows anyone to read classical Japanese easily.

With the knowledge and technology gained from working on historical Japanese cursive writing recognition projects such as these, Toppan was well-equipped to develop an AI-OCR engine with the capability of deciphering medieval Greek.

Toppan’s AI-OCR engine characteristics

Deciphering medieval Greek: The appearance of medieval Greek varies with the period and the writer, the text can be fragmentary, and spelling can differ from modern Greek, so there is no single standard form. In some cases there are also no spaces between words, making the text extremely difficult for anyone without expertise to read. By preparing a training dataset of one million characters, however, Toppan has trained its AI-OCR engine to decipher the medieval Greek alphabet.

Training on Vatican Library resources: From among the approximately 5,000 Greek manuscripts held by the Vatican Library, Toppan is using 50 annotated items (approx. 400 IIIF images) and their transcribed text as training data. In addition to training on manuscript images and transcribed text, expert review is part of the process, which balances the drive to improve accuracy against the need to ensure quality in the deciphering process. With this approach, Toppan hopes to speed up digitization of the enormous Greek manuscript collection while achieving an accuracy rate of over 95% when transcribing medieval Greek with its AI-OCR engine.
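The article does not say how the 95% figure is measured. One common convention, assumed here purely for illustration, is character-level accuracy: one minus the edit distance between the OCR output and an expert transcription, divided by the length of the expert transcription. The function names below are hypothetical and are not part of Toppan’s engine.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy = 1 - (edit distance / reference length), floored at 0.

    Both strings are normalized to NFC first so that precomposed and
    decomposed forms of accented Greek letters compare as equal.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


if __name__ == "__main__":
    expert = "ἐν ἀρχῇ ἦν ὁ λόγος"    # expert transcription (reference)
    ocr_out = "ἐν ἀρχῇ ἦν ὁ λόγοσ"   # hypothetical OCR output, wrong final letter
    print(f"{character_accuracy(expert, ocr_out):.1%}")
```

Under this definition, an accuracy above 95% means fewer than one character-level error per 20 characters of the reference transcription.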

Initiatives between Toppan & the Vatican Library

Toppan has collaborated with the Vatican Library on several projects over the past 29 years, reaching back to the preparatory stages of establishing the Printing Museum, Tokyo in 1997. The two parties have worked together on deciphering ancient texts and passing down culture on various occasions, including a high-resolution digital archive of the Gutenberg 42-line Bible, the Cicero Project, and joint exhibitions at the Printing Museum, Tokyo. The third collaborative exhibition to be held at the Printing Museum, Tokyo, ‘How Masterpieces Are Born: Biblioteca Apostolica Vaticana Ⅲ+’, will begin on 25 April 2026. Visuals of OCR being applied to Greek will be displayed alongside items on loan from the Vatican Library.

Future Activities

By building an advanced AI-OCR engine capable of transcribing medieval Greek, a form of writing that is difficult for non-experts to decipher, the Toppan Group aims to stimulate and contribute to the advancement of Greek language research. By creating digital archives of documents, primarily for the Vatican Library but also for cultural heritage worldwide, Toppan also hopes to advance technological innovation that makes these items accessible to people all over the world and helps pass them on to the next generation.


Founded in 1979 as a technical newsletter, Indian Printer and Publisher is the oldest B2B trade publication in the multi-platform and multi-channel IPPGroup. IppStar [www.ippstar.org] is our Services, Training and Research organization.

Naresh Khanna – 12 January 2026
