
2023 saw a surge of interest in Large Language Models (LLMs) in India, driven largely by the mass adoption of ChatGPT, whose first public version was built on GPT-3.5. Experts noted that GPT-3.5 was trained largely on English-language text drawn from vast repositories of books, website articles and other publicly available digital data. According to BBC Science Focus, the model was trained on internet datasets totalling a massive 570 GB, sourced from Wikipedia, books, research articles, websites, web texts and other forms of content. To give a sense of scale, approximately 300 billion words were fed into the system. Then, in mid-2023, Sam Altman, CEO of OpenAI, expressed his belief that it would be challenging for Indian companies to compete with OpenAI on training foundation models. Many Indian experts went on to discuss the need for LLMs built for the Indian context, sparking significant debate.

Indian entrepreneurs emphasized the need to train LLMs on local datasets. Many argued that the country should leverage the large amounts of data being generated by companies and government bodies to train these models. Vijay Shekhar Sharma, founder of Paytm, said at the GPAI Summit: “I fundamentally believe that India should have been paying a lot more attention to AI than we have till today. It is more up to the entrepreneurs… and I would say AI … for Indians … will be more important because we lack the resources available for citizen services like education, health, financial services.”

Although India had no foundational models in the first half of calendar year 2023 (H1 CY2023), more than 17 distinct models, each with its own strengths, had emerged by 2024, according to India’s Gen AI Startup Landscape Report.

  • Seetha Mahalaxmi Healthcare (SML), in partnership with the IIT Bombay-led Bharat GPT initiative, unveiled Hanooman, which is trained on 22 Indic languages. Hanooman is set to offer multimodal capabilities, including text-to-text, text-to-speech and text-to-video generation, and vice versa.
  • OdiaLlama, Kannada Llama, Bharat GPT and Krutrim are Indic LLMs that aim to make AI accessible across India regardless of linguistic diversity or socio-economic background, helping to bridge the digital divide.
  • Setu, a leading Indian fintech company and part of the Pine Labs group, launched Sesame, an LLM for the banking, financial services and insurance (BFSI) sector. It leverages India’s digital infrastructure, making it both domain- and region-specific.
  • Jivi MedX, a medical LLM, outperformed OpenAI’s GPT-4 and Google’s Med-PaLM 2 to rank first on the Open Medical LLM Leaderboard. The startup was recently funded by Andrew Ng’s AI Fund.
  • Dhenu, launched by Kissan AI, is an agriculture-focused LLM suited to tasks such as crop disease detection. The model is bilingual and can understand queries in English, Hindi and Hinglish.
  • Sarvam 2B is the first open, small-sized foundational model built entirely for Indic languages. It supports 10 Indic languages and was trained on 4 trillion tokens.

Source: Economic Times, India's Gen AI Startup Landscape Report


Disclaimer: The contents of third-party articles/blogs published on this website, the interpretation of all information in the articles/blogs (such as data, maps, numbers and opinions), and the views or opinions expressed within the content are solely the author's and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability with respect to the content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party articles/blogs are provided solely as a convenience, and the presence of these articles/blogs should not, under any circumstances, be considered an endorsement of their contents by NASSCOM in any manner. If you choose to access these articles/blogs, you do so at your own risk.


Madhumay
Deputy Manager - Research


© Copyright nasscom. All Rights Reserved.