Outline of the tutorial by Dr. Sudeshna Sarkar
Title: Search Engines in Indian Languages
Abstract and Outline:
In this talk we will discuss issues related to search by Indian language users.
We will cover both monolingual search, where the query language and the target
language are the same, and cross-language search, where the query and
target languages may differ. I will first give a brief overview of
a search engine and its various components. I will then discuss issues related to
monolingual search in Indian languages. For example, we would like to discuss which
components of the search engine must be modified so that a searcher can query in Hindi
and retrieve Hindi documents that are available on the Web. The role
of a stemmer, spelling variation rules, dictionary, thesaurus and other modules will
be discussed in this connection. The next part of the talk will focus on cross-language
retrieval and access, where the query is given in one language but the
information may be retrieved from documents in other languages. Various techniques for
effective cross-language retrieval will be discussed, including dictionary-based approaches
as well as approaches based on comparable corpora. We will also discuss the status of
these systems and components in various Indian languages.