Sentiment analysis across languages
Social media sentiment analysis across 18 official languages in India, without the need for machine translation, could give new insights into public opinion in the world’s second most populous nation.
IITB-Monash Research Academy PhD student Balamurali A.R. is developing a method to make data collection for building sentiment analysis systems in multiple Indian languages much easier. Ultimately, this will allow the opinions of a larger cross-section of the Indian population to be analysed, leading to more meaningful information about public sentiment.
Sentiment analysis is the process of determining whether social media posts are positive or negative. Because of ambiguities inherent in language, it can be very challenging to program software tools that accurately resolve whether a word is being used positively or negatively. The programs gradually “learn” from thousands of labelled examples.
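To make the idea of learning from examples concrete, here is a minimal sketch of a word-count (naive Bayes) classifier. It is an illustration only, not the method used in the research: the four training sentences and the function names `train` and `classify` are invented for this example, and a real system would need thousands of examples.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each word appears under each label.

    examples: list of (text, label) pairs.
    Returns (word_counts, label_counts).
    """
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior for this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data; assumption for illustration only.
EXAMPLES = [
    ("great phone love it", "positive"),
    ("wonderful service very happy", "positive"),
    ("terrible battery hate it", "negative"),
    ("awful service very slow", "negative"),
]
model = train(EXAMPLES)
```

Calling `classify("love the wonderful phone", *model)` picks the label whose training examples share the most words with the new text. With so few examples the result is fragile, which is why data collection matters so much.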
Balamurali said this learning process is even more challenging in a large and multi-lingual country like India.
“Most of the sentiment analysis materials available are in English. So, to interpret sentiment in Hindi, for example, which is spoken by approximately 40 per cent of the population, involves a time-consuming and often unreliable process of machine translation before analysis can take place,” Balamurali said.
“My research is addressing this resource constraint by identifying cross-lingual features of languages. By linking our analysis with online dictionaries that are focused on meaning and concept, rather than specific words, my supervisors and I have been able to effectively capture sentiment across many languages.”
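The idea of analysing meaning rather than specific words can be sketched in a few lines of code. The example below is a simplified illustration, not the researchers' implementation: the concept IDs, the tiny English and romanised Hindi word lists, and the function names are all hypothetical. Words from either language are mapped to shared concept identifiers, so a model trained only on English examples can label Hindi text.

```python
from collections import Counter

# Hypothetical toy dictionary: words in each language mapped to
# language-independent concept IDs (Hindi is romanised here).
CONCEPTS = {
    "en": {"good": "C_GOOD", "bad": "C_BAD", "movie": "C_FILM", "very": "C_VERY"},
    "hi": {"acchi": "C_GOOD", "buri": "C_BAD", "film": "C_FILM", "bahut": "C_VERY"},
}

def to_concepts(text, lang):
    """Replace each known word with its concept ID; skip unknown words."""
    lookup = CONCEPTS[lang]
    return [lookup[word] for word in text.lower().split() if word in lookup]

def train(examples):
    """Record, for each concept, how often it appears under each label.

    examples: list of (text, lang, label) triples.
    """
    votes = {}
    for text, lang, label in examples:
        for concept in to_concepts(text, lang):
            votes.setdefault(concept, Counter())[label] += 1
    return votes

def classify(text, lang, votes):
    """Sum the label votes of every concept found in the text."""
    tally = Counter()
    for concept in to_concepts(text, lang):
        tally.update(votes.get(concept, {}))
    if not tally:
        return None  # no known concepts, so no prediction
    return tally.most_common(1)[0][0]

# Train on English only (invented examples).
votes = train([
    ("very good movie", "en", "positive"),
    ("bad movie", "en", "negative"),
])
```

Because the Hindi words resolve to the same concept IDs seen in training, `classify("film bahut acchi hai", "hi", votes)` can be answered without translating the sentence or collecting labelled Hindi data. Coverage of the dictionary is the limiting factor in such a scheme.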
The research will have direct application in cross-lingual sentiment analysis, allowing the analysis programs to learn much faster. Balamurali said the methods could also be applied to languages beyond the initial 18.
“Ultimately, I feel sophisticated sentiment analysis could be used to predict share market spikes or drops, or to help predict disease outbreaks. My research will prove invaluable in harnessing this information for languages other than English,” Balamurali said.
CEO of the IITB-Monash Research Academy, Professor Mohan Krishnamoorthy, said Balamurali’s work was particularly relevant given the increasing popularity of social media.
“The research being undertaken by Balamurali A.R. in sentiment analysis is capable of opening up a new world on how interactions on modern mediums can be mined to identify interesting and useful trends with potential uses across a spectrum of topics,” Professor Krishnamoorthy said.