Translation - Dealing with a non-Latin alphabet as an amateur historical researcher


Gemini from a 12th-century Georgian astronomy manuscript held at the Georgian National Center for Manuscripts

I enjoy my Georgian puzzle, but the Georgian language can present problems. I'm long past my professional research days, and even then I never read journals in anything other than English. So it has been a matter of trial and error to come up with an effective method for dealing with Georgian and Russian texts.

The initial issues are threefold:

1. Scanned PDFs

These are difficult because online translation tools such as Google Translate in a browser generally cannot handle images, and a scanned page is just an image. The usual suggestion is to run them through OCR (optical character recognition) first, but I have not found a free OCR tool that works with Georgian and produces useful output. In fact, most do not work with Georgian at all.

2. Text PDFs

These feel like they should be easier to work with, but the output is almost always useless. Google Translate (and the various other translation websites I have tried) handles Georgian at the character level: instead of translating it, it transliterates it, producing Georgian written in Latin characters. Such a transliteration can be useful in some cases (I do not know Korean, but I can understand a limited amount of it when transliterated into Latin characters), but it is generally unhelpful for making sense of an article. Unfortunately, I have had no success finding a way to translate the romanized Georgian into English, and learning to read it is a long task.
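To show what character-level transliteration means, here is a minimal Python sketch. It assumes the Georgian national romanization system; which scheme Google actually uses is not stated here, so real output may differ in details such as the apostrophes marking ejective consonants (under this scheme kaba is written k'aba). The reverse function, to_georgian, is a hypothetical helper that tries to turn romanized text back into Georgian script so it could be pasted into a translator that reads the script. Going backwards is less reliable than going forwards: a digraph such as "ts" could in principle be ც or თ followed by ს, and the sketch always picks the longer match.

```python
# Georgian national romanization (assumed scheme): Mkhedruli letter -> Latin.
GEORGIAN_TO_LATIN = {
    "ა": "a", "ბ": "b", "გ": "g", "დ": "d", "ე": "e", "ვ": "v", "ზ": "z",
    "თ": "t", "ი": "i", "კ": "k'", "ლ": "l", "მ": "m", "ნ": "n", "ო": "o",
    "პ": "p'", "ჟ": "zh", "რ": "r", "ს": "s", "ტ": "t'", "უ": "u", "ფ": "p",
    "ქ": "k", "ღ": "gh", "ყ": "q'", "შ": "sh", "ჩ": "ch", "ც": "ts", "ძ": "dz",
    "წ": "ts'", "ჭ": "ch'", "ხ": "kh", "ჯ": "j", "ჰ": "h",
}

# Every Latin value is unique, so the table can be inverted directly.
LATIN_TO_GEORGIAN = {latin: geo for geo, latin in GEORGIAN_TO_LATIN.items()}
# Try longer spellings first so "sh" wins over "s" followed by "h".
_KEYS_LONGEST_FIRST = sorted(LATIN_TO_GEORGIAN, key=len, reverse=True)


def to_latin(text: str) -> str:
    """Transliterate letter by letter; non-Georgian characters pass through."""
    return "".join(GEORGIAN_TO_LATIN.get(ch, ch) for ch in text)


def to_georgian(text: str) -> str:
    """Hypothetical reverse step: greedy longest-match back to Mkhedruli."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS_LONGEST_FIRST:
            if text.startswith(key, i):
                out.append(LATIN_TO_GEORGIAN[key])
                i += len(key)
                break
        else:
            # Spaces, digits and punctuation are copied unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, to_latin("ქართული") gives "kartuli", and to_georgian("kartuli") turns it back into ქართული.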

3. Images

Google Lens, and perhaps other apps, can translate images either live through the camera or from a saved picture. This is great because both scanned and text PDFs are just pictures as far as Lens is concerned. It is still machine translation, so it isn't perfect, but it is true translation and the output is generally understandable and very consistent. The drawback is that you have to either aim your phone or tablet at another screen, aim it at a printed page, or display the image on the phone or tablet and take a screenshot to open in the Lens app. The translation is then read on the phone or tablet screen.

My workflow is awkward. I pull up one page at a time on screen. If I am using my phone with a desktop, I read through the phone. If something looks interesting or I want to take a note, I take a picture with the Lens app to freeze the translation on screen while I put down my phone to type or write the note. If I am using a tablet, I take a screenshot of each page and open the screenshot in Lens.

This is surprisingly fast and robust. As expected with machine translation, there are issues when an article or book discusses how things are named. In Georgian, the word kaba means dress. Most of the time Lens simply translates the word, but in articles about the terms for medieval clothing it has recognized the issue and correctly produced constructions such as "called a kaba, which means dress but in medieval times did not indicate gender."

The holy grail would be a script that automates the process: take a PDF, break it into images of single pages, get the Lens translation of each image, and finally stitch the translated images back together into one PDF. If anyone knows of a way to do this, I'm all ears!
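A minimal Python sketch of the plumbing for such a script follows. Each stage is passed in as a function because the translation step is the missing piece: the names split_pages, translate_image and stitch_images are hypothetical placeholders, not a Google Lens API, and whether Lens can be scripted at all is exactly the open question. The other two stages could plausibly be backed by existing libraries, for example a PDF renderer such as PyMuPDF or pdf2image for splitting, and Pillow's image save with save_all=True and append_images for writing several images into one PDF.

```python
from pathlib import Path
from typing import Callable, List

# Hypothetical stage signatures; each would be backed by a real tool.
SplitPages = Callable[[Path], List[Path]]          # PDF -> one image per page
TranslateImage = Callable[[Path], Path]            # page image -> translated image
StitchImages = Callable[[List[Path], Path], None]  # images -> single output PDF


def translate_pdf(
    pdf: Path,
    out_pdf: Path,
    split_pages: SplitPages,
    translate_image: TranslateImage,
    stitch_images: StitchImages,
) -> List[Path]:
    """Split, translate page by page, then stitch, keeping the page order."""
    page_images = split_pages(pdf)
    translated: List[Path] = []
    for number, image in enumerate(page_images, start=1):
        try:
            translated.append(translate_image(image))
        except Exception as err:
            # One failed page should not sink the whole document:
            # keep the untranslated page so the page count still matches.
            print(f"page {number}: translation failed ({err}); keeping original")
            translated.append(image)
    stitch_images(translated, out_pdf)
    return translated
```

Keeping the original page image when a translation fails means the stitched PDF always has the same number of pages as the source, so page references in notes still line up.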

The secondary problem

I call it the secondary problem, but real researchers would probably make it the primary one. Any translated text has been filtered through the translation. Most of my work has been with modern Georgian texts (20th or 21st century), so Google Lens does a good job of producing readable text. However, there are times when you move the camera just a little and the translated text changes. Usually it is only a few words and doesn't change the meaning much. Sometimes it turns things into nonsense! Manuscript text is, of course, much harder to deal with.
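One way to catch this camera-movement effect is to capture the same page twice and compare the two translations word by word. Below is a minimal Python sketch using the standard library's difflib; the function name changed_spans is just an illustration, and splitting on whitespace is a simplifying assumption that treats punctuation as part of the neighbouring word.

```python
import difflib
from typing import List, Tuple


def changed_spans(first: str, second: str) -> List[Tuple[str, str]]:
    """Return (first version, second version) word runs that differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace", "delete" and "insert" all mark unstable wording.
        if tag != "equal":
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans
```

For example, comparing "the old church was built in stone" with "the ancient church was built of stone" returns [("old", "ancient"), ("in", "of")]. Any pair it reports is a spot where the translation is unstable and worth a closer look.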

Even with the best translation, you are going to run into issues of connotation, shades of meaning, and outright bias. Some of these issues are introduced by the translation. Others are inherent to the source (in my case, I am generally aware of some of the issues related to Georgian nationalism). Unless you read the language yourself, and are thus doing your own translation in your head, there is no way to completely escape the problem of meaning in translation. Multiple independent translations would help, of course, but as an amateur looking at scholarly sources it is hard to justify paying for a translation!

Davit's Wandering
