Foreign Language Text Recognition for the Layman

May 1, 2009

It’s not often that I look at a foreign text and can’t determine what language it is. Most of us can tell French, German, or Spanish apart, but what about the various languages written in Cyrillic, or Southeast Asian ones like Thai and Vietnamese? I recently came across something that might help us.

US Army Field Manual (FM) 34-54, Battlefield Technical Intelligence, is freely available on the Web. Here is its description, taken from the manual itself:

This field manual provides guidance to commanders and staffs of military intelligence (MI) and other units responsible for technical intelligence (TECHINT) or having an association with TECHINT. It provides general guidance and identifies the tactics, techniques, and procedures (TTP) used in the collection, exploitation, and dissemination of TECHINT in satisfying the warfighter’s requirements.

Appendix G, entitled “Foreign Language Text Recognition”, is a concise, educational lesson on identifying the language of an unknown text.

Quoting from Appendix G:

When TECHINT personnel are able to correctly identify foreign languages used in documents or equipment, it has two immediate benefits. First, it helps identify the equipment or type of document and where or who is using it. Second, it ensures that TECHINT personnel request the correct linguistic support.

This appendix contains language identification hints that will enable TECHINT personnel to quickly identify some of the many languages used in documents, on equipment plates, and on other materiel. TECHINT personnel can speed up the entire battlefield TECHINT process by following the guidance herein.

For those of us who are, um, a little rusty and have forgotten the difference between a cedilla and a circumflex, this appendix will set us straight. No more excuses for failing to recognize a foreign language when we see one. 🙂