
10 Data Science concepts for Beginners

September 22, 2022


 

Introduction to Data Science

 

Although the field of data science is still evolving rapidly, a core set of fundamental principles remains crucial. Here, ten of these principles are highlighted as being worth reviewing before a job interview, or simply to refresh your understanding of the fundamentals.

 

Dataset

Data science, as its name implies, is a branch of research that analyses data using the scientific method to discover relationships between different attributes and draw inferences from these connections. Data is thus the central element of data science.

 

A dataset is a specific instance of data that is currently utilized for analysis or model construction. A dataset can be composed of several types of information, including categorical and numerical data as well as text, image, audio, and video data. A dataset may be static (it stays the same) or dynamic (it changes with time, for example, stock prices). Additionally, a dataset could be space-dependent.
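As a minimal sketch, a small tabular dataset mixing categorical and numerical columns might look like this in pandas (the column names and values below are hypothetical):

```python
import pandas as pd

# A tiny static, tabular dataset: one categorical and one numerical column.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Pune"],   # categorical attribute
    "temperature_c": [31.5, 29.0, 27.2],   # numerical attribute
})

print(df.dtypes)  # shows one object (categorical) and one float column
```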
 

Data Wrangling

The act of transforming data from an unorganized state into one that is ready for analysis is known as "data wrangling." Data import, cleaning, structuring, string processing, HTML parsing, managing dates and times, handling missing data, and text mining are just a few of the procedures that make up the crucial stage of data wrangling in the data preparation process.

 

The practice of data wrangling is a crucial skill for any data scientist. In a data science project, data is rarely ready for analysis out of the box; it is far more likely to sit in a file, a database, or an extract from a document such as a web page, tweet, or PDF. Knowing how to manage and clean data lets you extract important insights that would otherwise remain hidden.
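A minimal wrangling sketch in pandas, covering a few of the steps mentioned above (the messy column names and values are hypothetical):

```python
import pandas as pd

# Raw data with untidy column names, inconsistent strings,
# string-typed numbers, and a missing value.
raw = pd.DataFrame({
    " Name ": ["alice", "BOB", None],
    "joined": ["2022-01-05", "2022-02-10", "2022-03-15"],
    "score": ["10", "20", "30"],
})

df = raw.rename(columns=lambda c: c.strip().lower())  # tidy column names
df["name"] = df["name"].str.strip().str.title()       # normalize strings
df["joined"] = pd.to_datetime(df["joined"])           # parse dates
df["score"] = pd.to_numeric(df["score"])              # fix numeric dtype
df = df.dropna(subset=["name"])                       # handle missing data

print(df)
```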

 

Data Visualization

Data visualization is one of the most important areas of data science. It is one of the primary methods used to examine and research the connections between various variables. Descriptive analytics makes heavy use of data visualization (such as scatter plots, line graphs, bar plots, histograms, Q-Q plots, smooth densities, box plots, pair plots, heat maps, etc.).

Additionally, machine learning employs data visualization for feature selection, model construction, model testing, and model assessment.
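A small sketch of two of the plot types named above, using matplotlib on synthetic data (the filename is an arbitrary choice for this example):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=500)  # one synthetic numeric feature

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=20)   # histogram: overall shape of the distribution
ax1.set_title("Histogram")
ax2.boxplot(data)         # box plot: median, quartiles, potential outliers
ax2.set_title("Box plot")
fig.savefig("eda_plots.png")
```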

 

Outliers

A data point that deviates significantly from the rest of the dataset is known as an outlier. Outliers are frequently just faulty data, such as those caused by a broken sensor, corrupted experiments, or human mistakes in data recording. Outliers can occasionally point to an actual problem, like a flaw in the system. In huge datasets, outliers are expected and quite common. A common method for identifying outliers in a dataset is a box plot.
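The box-plot rule mentioned above flags points outside 1.5 interquartile ranges from the quartiles. A minimal sketch on hypothetical data:

```python
import numpy as np

def iqr_outliers(x):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (the box-plot rule)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (x < lo) | (x > hi)

x = np.array([10, 12, 11, 13, 12, 11, 95.0])  # 95 is an obvious outlier
mask = iqr_outliers(x)
print(x[mask])  # -> [95.]
```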

 

Data Imputation

Missing values are common in datasets. The easiest way to handle missing data is to discard the affected data items. However, simply removing samples or eliminating entire feature columns is often not viable, since we risk losing an excessive amount of important data. In this case, we may approximate the missing values from the other training samples in our dataset using various interpolation or imputation approaches.
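One of the simplest imputation approaches is mean imputation: each missing entry is replaced with the mean of its column, computed from the observed values. A minimal NumPy sketch:

```python
import numpy as np

# A small feature matrix with two missing entries (NaN).
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

col_means = np.nanmean(X, axis=0)    # per-column mean, ignoring NaNs
idx = np.where(np.isnan(X))          # row/column positions of missing entries
X[idx] = np.take(col_means, idx[1])  # fill each NaN with its column's mean

print(X)  # no NaNs remain
```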

 

Data Scaling

Scaling your features will help your model become more accurate and predictive. As an illustration, imagine that you want to create a model that uses predictor variables like income and credit score to forecast creditworthiness (the target variable). Without scaling your features, the model will be skewed towards the income component, as credit scores range from 0 to 850 while yearly income might be between Rs.25,000 and Rs.5,00,000 (depending on your location).

 

As a result, the weight learned for the credit score will be very small relative to income's sheer magnitude, which means the predictive model will effectively estimate creditworthiness using the income parameter alone.
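Standardization, one common scaling technique, rescales each feature to zero mean and unit variance so that the large income numbers no longer dominate the credit score. A sketch with hypothetical values (this mirrors what sklearn's `StandardScaler` does):

```python
import numpy as np

# Columns: [annual income in Rs, credit score] -- very different scales.
X = np.array([[250_000.0, 700.0],
              [400_000.0, 550.0],
              [325_000.0, 820.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std  # standardize: zero mean, unit variance per column

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```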

 

Principal Component Analysis (PCA)

In large datasets with hundreds or thousands of features, redundancy is frequently the result of features being correlated with one another. Training a model on a high-dimensional dataset with an excessive number of features can lead to overfitting (the model captures both real and random effects).

 

A model with too many features, or one that is extremely complicated, can also be challenging to interpret. Redundancy can be addressed with dimensionality reduction and feature selection approaches like PCA. A PCA transformation:

  • Reduces the number of features needed in the final model by concentrating on the components that contribute the bulk of the dataset's variance.
  • Removes the correlation between features.
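A minimal PCA sketch using NumPy's SVD (in practice one might use `sklearn.decomposition.PCA`; the synthetic data below deliberately contains a redundant third feature):

```python
import numpy as np

rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
# Third feature is a near-copy of the first -> redundancy / correlation.
X = np.column_stack([base[:, 0],
                     base[:, 1],
                     base[:, 0] + 0.01 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                       # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = S**2 / np.sum(S**2)               # variance explained per component
X_reduced = Xc @ Vt[:2].T                     # project onto top 2 components

print(var_ratio)  # top 2 components capture nearly all of the variance
```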

 

Linear Discriminant Analysis (LDA)

PCA and LDA are two linear transformation methods frequently employed in data preprocessing for dimensionality reduction, in order to choose pertinent features that may be incorporated into the final machine learning algorithm. The key difference is that PCA is unsupervised (it ignores class labels and finds the directions of maximum variance), whereas LDA is supervised: it uses the class labels to find the directions that best separate the classes.
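A minimal LDA sketch using scikit-learn on synthetic two-class data; note that LDA can produce at most (number of classes - 1) components:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two well-separated 4-dimensional classes, 50 samples each.
X = np.vstack([rng.normal(0, 1, size=(50, 4)),
               rng.normal(3, 1, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1)  # 2 classes -> 1 component
X_lda = lda.fit_transform(X, y)                   # supervised: uses y

print(X_lda.shape)  # (100, 1)
```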

 

Data Partitioning

When used for machine learning, the dataset is frequently divided into training and testing sets. The training dataset is used to develop the model, while the testing dataset is used to evaluate it. As a result, the testing dataset is the unknown dataset, which is used to calculate a generalization error.
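The hold-out split described above can be sketched with plain NumPy (the 20% test fraction is a conventional choice; `sklearn.model_selection.train_test_split` does the same thing):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(100).reshape(50, 2)   # 50 samples, 2 features
y = np.arange(50)

idx = rng.permutation(len(X))       # shuffle before splitting
n_test = int(0.2 * len(X))          # reserve 20% as the unseen test set
test_idx, train_idx = idx[:n_test], idx[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (40, 2) (10, 2)
```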

 

Supervised Learning

Supervised learning algorithms learn the relationship between the feature variables and a predetermined (labelled) target variable. Two types of supervised learning are available:

  1.  Continuous Target Variables

Linear Regression, K-Neighbors Regression (KNR), and Support Vector Regression (SVR) are algorithms for forecasting continuous target variables.

  2. Discrete Target Variables

There are several algorithms for forecasting discrete target variables:

  • Perceptron classifier
  • Logistic regression classifier
  • Decision tree classifier
  • Support Vector Machines (SVM)
  • K-nearest neighbors (KNN) classifier
  • Naive Bayes classifier
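As a minimal sketch of the discrete-target (classification) case, here is logistic regression from scikit-learn fitted on a small synthetic two-class problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Two well-separated 2-D classes, 40 samples each.
X = np.vstack([rng.normal(0, 1, size=(40, 2)),
               rng.normal(4, 1, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

clf = LogisticRegression().fit(X, y)           # learn features -> discrete target
print(clf.predict([[0, 0], [4, 4]]))           # -> [0 1]
```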

 

Conclusion

 

Hope this article was helpful and informative for you as a beginner. Applied properly, these techniques can lead you to sound solutions. If you're a data science aspirant looking for resources to learn, Learnbay has the perfect Data science and AI Bootcamp.

 




© Copyright nasscom. All Rights Reserved.