
Creating India’s DeepSeek moment: Getting the approach and mindset right

In the journey of large language models, some 33,000 LLMs have been created, as per a recent analysis. One moment, on 10th January 2025, called the DeepSeek moment, took everyone by surprise. OpenAI, Google, Meta, Anthropic all took note, initially dismissed it as merely Chinese, and then acknowledged the innovation it brought.

Since then, one question has gained prominence: should India build its own foundation model? The answer has come with MeitY declaring that by the end of 2025, India will have its own foundation model!

Why, then, has the news not triggered massive excitement? Are we too late in even analyzing what DeepSeek changed, since DeepSeek itself has already been disrupted?

Transformation in AI has a familiar history: waves of AI springs and AI winters. Except that this time, the spring has shown rapidly changing hues, witnessed in the speed at which models are becoming efficient. Model efficiency is assessed mainly through two costs:

  1. Cost of compute and storage – The cost of a million output tokens has come down from $60+ for the GPT-3 series to $0.14 with DeepSeek, within a span of two years. This should not be confused with the scale of compute and storage required to build these models, which, according to Nvidia chief Jensen Huang in his GTC opening keynote, has easily gone up 100X. A low cost per unit of output, therefore, can be negated by the far greater number of output units required (see the arithmetic sketch after this list).

 

  2. Cost of accuracy – This is the cost that DeepSeek addressed, and in doing so it also helped lower the cost of compute. According to experts, model accuracy can be managed through training in three ways: pre-training, post-training, and test-time scaling. Most models until DeepSeek focused on pre-training, a data-, compute-, storage-, and energy-guzzling activity in which the cost of accuracy is directly proportional to the cost of compute. With post-training, the two decouple: costs come down significantly as iterations focus on better-reasoned output. Test-time scaling is still at an experimental stage (see the sampling sketch after this list).
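A back-of-the-envelope reading of the first cost, as a minimal Python sketch. The figures are the ones quoted in this article, not independently verified; whether the benefit is fully negated depends on how output demand actually scales.

```python
# Figures quoted above in this article (not independently verified).
gpt3_cost_per_m_tokens = 60.0      # USD per million output tokens, GPT-3 era
deepseek_cost_per_m_tokens = 0.14  # USD per million output tokens, DeepSeek

unit_cost_drop = gpt3_cost_per_m_tokens / deepseek_cost_per_m_tokens
compute_demand_rise = 100          # Jensen Huang's "100X easily" figure

print(f"Unit cost fell roughly {unit_cost_drop:.0f}x")            # ~429x
print(f"Demand growth claws back {compute_demand_rise / unit_cost_drop:.0%} of that gain")
```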
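To make the test-time scaling idea concrete, here is a minimal, purely illustrative Python sketch of best-of-n sampling, one common form of test-time scaling. The `generate` and `score` functions are hypothetical stand-ins for an LLM's sampling call and a reward model or verifier; they are not real APIs.

```python
import random

def generate(prompt: str) -> str:
    # Hypothetical stand-in for sampling one answer from an LLM.
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a reward model or automatic verifier.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Test-time scaling: spend more inference compute (n samples)
    # to buy accuracy, instead of spending more training compute.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("What is 17 * 24?"))
```

The trade-off is the point: accuracy improves with n at inference time, without touching the pre-training or post-training budget.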

 

DeepSeek optimized on both costs, and its strategy was sown well before any of us had heard the terms generative AI or GPT. Summarizing all three, the strategy and the two costs:

 

  1. Strategy that culminated in a cost-effective DeepSeek 

 

a. The foundation of DeepSeek and its founder – The first GPT was built in 2017 and released in 2018, trained on about 7,000 books. This was trigger enough for the Chinese algorithmic trader Liang Wenfeng, who had established a quantitative algorithmic trading firm, Ningbo High-Flyer Quantitative Investment Management Partnership (Limited Partnership), trading as High-Flyer, back in 2016. Liang Wenfeng holds a master's degree in electronic information and communication engineering and wrote his thesis on algorithms for tracking moving targets using low-cost PTZ cameras. In a nutshell: a hi-tech, analytics-oriented founder with a bent for math, statistics, and algorithms.

 

The India take: With one of the largest STEM pipelines coming out of its schools and engineering colleges every year, India should not have to worry. The question is whether our graduates retain the inclination to pursue careers rooted deep in science and mathematics. The writer of this blog humbly accepts his share of the blame!

b. Monetizing the opportunity cost – Liang had been collecting financial data and systematizing it for algorithmic trading. This was a crucial step, taken not for AI adoption per se but because his company operated in BFSI, the most data-standardized industry. By 2019, High-Flyer AI had been founded to conduct cutting-edge research in AI. Liang had acquired 10,000 A100s by the time the first US restrictions on China arrived, much to the shock of his peer founders worldwide. DeepSeek was most likely the end result of long experimentation on less costly GPUs; once a cost-effective training technique was found, Liang acquired the H800s needed to do the job.

 

The India take: The IndiaAI Mission will be procuring 18,000 GPUs under its centrally budgeted USD 1.3 billion initiative.

 

c. Effective utilization of local talent – Morale-boosting is easier said than done, especially when you choose to work with humanities graduates to reverse engineer some of the world's top closed-source LLMs! That's what Liang did. Having launched DeepSeek soon after the GPT-3.5-powered ChatGPT was released, he realized that China did not lack capital; it lacked confidence in its own knowledge and in its ability to harness talent. He decided to reverse engineer the problem.

 

The India take: India already has the third-largest installed base of AI talent, but it is likely over-confident in its "jugaad" abilities. Jugaad cannot solve the data problem, and Indian companies are learning this the hard way.

 

  2. Cost of compute/accuracy: Far less is known about the actual cost of DeepSeek. But from the sequence of events, whether the founder meticulously planned each next step or serendipity played its part, the cost that has rattled the world is largely the cost of applying smart accuracy-boosting measures.

The India take: DeepSeek's cost effectiveness will certainly help Indian LLM makers think afresh about ways to improve model training techniques, and perhaps lean much more on reinforcement learning without supervised fine-tuning to reduce the costs of pre-training and post-training. Yet the base costs of good, robust datasets, sound tech infrastructure and, most importantly, purposeful talent to drive innovation will still have to be borne. A toy sketch of the reinforcement learning idea follows.
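To illustrate what "reinforcement learning without supervision" means here, below is a toy Python sketch of learning from a verifiable, rule-based reward, in the spirit of (but not identical to) DeepSeek-R1-Zero's approach. Everything in it, the two "strategies", their accuracy numbers, and the tasks, is invented for illustration; a real pipeline would update an LLM's weights with an algorithm such as GRPO.

```python
import random

random.seed(0)  # reproducible toy run

# Toy tasks whose answers can be checked automatically: the reward
# needs no human labels, only a rule-based check of the final answer.
TASKS = [("2+3", 5), ("7*6", 42), ("10-4", 6), ("9*9", 81)]

p_careful = 0.5   # policy parameter: probability of the "careful" strategy
LR = 0.01         # learning rate

def solve(answer: int, careful: bool) -> int:
    # Invented behavior: the careful strategy is right 90% of the time,
    # the hasty one only 40% of the time.
    accuracy = 0.9 if careful else 0.4
    return answer if random.random() < accuracy else answer + 1

for _ in range(5000):
    _, answer = random.choice(TASKS)
    careful = random.random() < p_careful               # sample an action
    reward = 1.0 if solve(answer, careful) == answer else 0.0
    baseline = 0.65                                     # rough mean reward
    # Crude sign-based policy-gradient step: nudge p_careful toward
    # whichever strategy beats the baseline on this sample.
    grad = (1.0 if careful else -1.0) * (reward - baseline)
    p_careful = min(0.99, max(0.01, p_careful + LR * grad))

print(f"P(careful strategy) after training: {p_careful:.2f}")  # approaches 0.99
```

The point mirrors the article's: once the reward can be computed automatically, the expensive human-supervision loop of post-training largely drops out of the cost equation.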

 

According to India's GenAI Startup Landscape 2024, released by Nasscom, India already has 17+ LLMs in place, including a few open-source models. We could make some key optimizations to already available foundation models to produce a single general-purpose model for all AI applications outside the critical sectors. In short, the chorus is rightly building that India should eventually have its own foundation model.

 

The bigger question to address is the set of creative innovations that can emerge from India, given the richness of its languages, dialects, accents, population segments, consumption patterns, learning behaviors, societal factors, and the uniqueness of its geography and positioning. All of these are crucial data elements that need to be captured, digitized, and democratized for experimentation, at least at the rate at which AI is evolving today!




Madhumay
Deputy Manager - Research
