There is no dearth of Tamil content on the internet, but only a negligible share of it is searchable, as it evades artificial intelligence (AI) algorithms, unlike content in English.
To address this gap, the Tamil Valarchi Kazhagam (Tamil Academy), a non-governmental body of scholars, has taken up a project to convert the content available on the internet into searchable matter, using optical character recognition (OCR).
“The problem with the Tamil content is that it is all in the form of scanned version or images or PDF. You can keep the matter like a photograph. Unless it is converted into word format, you cannot have searchable data in Tamil,” says M. Rajendran, president of the Academy and former Vice-Chancellor of Tamil University.
Pointing out that the Academy has teamed up with Wikipedia and the Central Institute of Classical Tamil, Dr. Rajendran says the plan is to train 30 post-graduate students of Tamil, drawn from different institutions in and around Chennai, for three to seven days in converting into word format all that is freely available on the net. This includes materials placed by the State government in the public domain after acquiring copyrights, and those with no copyright issues.
To be held in October, the training programme will enable the students to correct errors that may creep in during the process of content conversion. “After a point of time, the selected students can continue doing their assignment from their homes. We will be providing an incentive to them,” he says. If required, the training programme can be extended to districts.
The Academy, which steps into its 80th year later this month, was established by former Education Minister T.S. Avinashilingam Chettiar in 1946 under the Registration of Societies Act. Former Union Minister for Home Affairs and Finance P. Chidambaram is the chairperson of the Board of Trustees of the Tamil Valarchi Kazhagam, which has been functioning from the Chepauk campus of the University of Madras since its inception.
Till now, the body has brought out 10 volumes of a Tamil encyclopaedia, an equal number of volumes on children’s literature, 13 volumes on medicine, and seven volumes on Siddha medicine. Former Union Minister for Agriculture and Food C. Subramaniam and veteran educationist V.C. Kulandaisamy have headed the Academy in the past. A couple of months ago, Tamil Nadu Chief Minister M.K. Stalin presented a cheque for ₹2.15 crore to the Tamil Valarchi Kazhagam’s chief as the government’s assistance for the body’s smooth functioning.
Besides the project on making Tamil AI-enabled, the Tamil Valarchi Kazhagam is executing six others: a students’ dictionary (Tamil-Tamil-English); a universal catalogue covering Tamil books and publications on Tamilology; a collection of articles published by the Academy on Tamilology; the coining of new Tamil words; an encyclopaedia of thoughts in Tamil; and another on Tamil dramaturgy.
Elaborating, Dr. Rajendran says the dictionary is meant for children of the Tamil diaspora. The catalogue will help book lovers find out which library, and in which country, holds the titles of their choice. Regarding the encyclopaedia of thoughts in Tamil, he clarifies that “we are not getting into the domain of identifying the age in which a given thought evolved.” Barring the dictionary, the projects will be completed in eight months. The dictionary project may take one year, Dr. Rajendran adds.