Assamese and Bodo Parallel Corpora
The RCILTS-II (Assamese and Bodo) team has built large Assamese–Bodo parallel corpora in three domains (Generic, Health and Tourism) and annotated them using Sanchay, a tagger designed and developed at the International Institute of Information Technology (IIIT), Hyderabad, India. All the corpora are drawn from written texts from a variety of sources. The sources of the Generic Corpus are the book "It Happened Tomorrow", published by NBT India, together with its Assamese and Bodo translations by Dinesh Chandra Goswami and Aleendra Brahma respectively; "Burhi Aair Sadhu" by Lakshminath Bezbarua and its Bodo translation by D.N. Boro and R. Machahary; and a few sentences taken from the Assamese dailies "Dainik Janambhumi" and "Asomiya Pratidin". The source texts of the Health and Tourism corpora, by contrast, were all provided by TDIL and IIT Bombay.
A few screenshots of the corpora are given below: