Outline of the tutorial by Dr. Sudeshna Sarkar

Title: Search Engines in Indian Languages

Abstract and Outline:
In this talk we will discuss the issues that Indian language users face in search. We will cover both monolingual search, where the query language and the target language are the same, and cross-language search, where the query and target languages may differ. I will first give a brief overview of a search engine and its various components. I will then discuss issues related to monolingual search in Indian languages. For example, we would like to discuss which components of the search engine must be modified so that a searcher can query in Hindi and retrieve Hindi documents that are present on the Web. The role of a stemmer, spelling variation rules, a dictionary, a thesaurus, and other modules will be discussed in this connection. The next part of the talk will focus on cross-language retrieval and access, where the query is given in one language but the information may be retrieved from documents in other languages. Various techniques for effective cross-language retrieval will be discussed, including dictionary-based approaches as well as approaches based on comparable corpora. We will also discuss the status of these systems and components in various Indian languages.