Published – December 15, 2023 01:52 pm IST
The words "Artificial Intelligence" are seen in this illustration taken March 31, 2023. | Photo Credit: REUTERS/Dado Ruvic
Indian AI startup Sarvam AI has released the first open-source Hindi language model, called OpenHathi-Hi-0.1. According to the company, the model is the first in a series that will “make contributions to the ecosystem with open models and datasets to encourage innovation in Indian language AI.”
The model is built on Meta AI’s Llama 2-7B model. In a blog post, the company stated that it performs on par with GPT-3.5 for Indic languages.
The blog explained that tokenisation, a crucial part of processing text in large language models, is much more costly for Hindi than for English because there is very little training text in Hindi. The model was trained in two phases, and the team behind it worked to make this process cheaper.
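One way to get a feel for why Hindi text can be costlier to tokenise is to look at how it is encoded. The sketch below is only a rough proxy and is not Sarvam AI's method or any real tokeniser: it counts UTF-8 bytes per character, on the assumption that a tokeniser whose vocabulary contains few Hindi entries has to fall back on smaller pieces, in the limit individual bytes. The function name and both sample sentences are hypothetical illustrations, not taken from the blog.

```python
# Rough proxy only: this counts UTF-8 bytes, not tokens from a real tokeniser.


def utf8_bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes needed per character."""
    return len(text.encode("utf-8")) / len(text)


# Illustrative sample sentences (assumptions, not from the Sarvam AI blog).
english = "India is a large country."
hindi = "भारत एक बड़ा देश है।"

print(f"English: {utf8_bytes_per_char(english):.2f} bytes per character")
print(f"Hindi:   {utf8_bytes_per_char(hindi):.2f} bytes per character")
```

ASCII text needs one byte per character, while each Devanagari character needs three, so a byte-level fallback produces several times as many pieces for the same amount of Hindi text. Actual token counts depend on the tokeniser's vocabulary and would have to be measured with that tokeniser.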
It was then tested on a variety of benchmarks including standard ones like translation as well as several new ones like toxicity classification and text classification.
The base model has been made available on the Hugging Face platform so developers can finetune it and use it for specific use-cases.
Co-founders Pratyush Kumar and Vivek Raghavan had previously worked with another homegrown AI venture, AI4Bharat. Sarvam AI has partnered with AI4Bharat to use their language resources and benchmarks to train OpenHathi.
Sarvam AI, which currently employs around 18 people, wants to build large language models that use voice as the common interface, making them more accessible and better suited to the demands of the Indian market.
Last week, the five-month-old startup raised $41 million in Series A funding led by Lightspeed Ventures, with participation from Peak XV and Khosla Ventures. The startup is also working on a range of enterprise-grade models on its full-stack generative AI platform, which is also due for release soon.
Copyright© 2026, THG PUBLISHING PVT LTD. or its affiliated companies. All rights reserved.