Big Data Analytics

Importance of data analytics in understanding consumer behavior

By Mr. Vikram Kumar, Co-Founder and Managing Director, SRV Media Pvt. Ltd

To understand data analytics and why it is crucial, we first need to understand what data is and why it needs to be analyzed. Data, in common terms, is the representation of information. It can be qualitative or quantitative, coded or formatted, depending on how that information is used. Data analysis, on the other hand, means deriving meaning from the information for which the data has been collected. It is the logical method of evaluating the data and determining the most accurate and appropriate interpretation of it so that the knowledge extracted can be put to good use. Without the analysis of data, we would not have a clear understanding of what the market needs and what can help the market g...

Qubole brings machine learning to the data warehouse.

Machine Learning Juggernaut

Machine learning continues to be on a tear, fueled by the availability of data and compute capacity, especially in the cloud. A wide range of industry metrics points to the same conclusion. Here are some that I found insightful:

International Data Corporation (IDC) forecasts that spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.
Machine learning patents grew at a 34% Compound Annual Growth Rate (CAGR) between 2013 and 2017, the third-fastest growing category of all patents granted.
Deloitte Global predicts the number of machine learning pilots and implementations will double in 2018 compared to 2017, and double again by 2020.

In our customer base, I see a similar growth trend in machine learning use cases. Of our 200+ customers, almost everyon...
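
The growth figures quoted in this excerpt can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names `cagr` and `growth_multiple` are made up for this example, and the only inputs are the numbers quoted above (the IDC spending forecast and the 34% patent CAGR).

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: int) -> float:
    """Total growth factor after compounding `rate` once per year for `years` years."""
    return (1.0 + rate) ** years


# IDC forecast quoted above: $12B in 2017 growing to $57.6B by 2021 (4 years).
idc_rate = cagr(12.0, 57.6, 2021 - 2017)

# Patent figure quoted above: 34% CAGR between 2013 and 2017 (4 years).
patent_multiple = growth_multiple(0.34, 2017 - 2013)

print(f"Implied AI/ML spending CAGR: {idc_rate:.1%}")
print(f"Patent growth multiple over the period: {patent_multiple:.1f}x")
```

Read this way, the IDC forecast implies spending growing by roughly 48% a year, and a 34% CAGR sustained over four years means a bit more than a threefold increase in patents granted.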

Dive into Your Data Lake with Self-Service Analytics

The concept of self-service is one that dominates much of our lives today — we can ring up our own groceries, pump our gas, and answer support or inventory queries with the help of an automated system. Self-service is also sweeping through the business world, with promises of increased employee productivity and more accurate reporting. Self-service analytics help solve a critical problem many organizations face: the unbridgeable gap between the demand for data support and the existing capabilities of a data team. This gap — what we refer to as the Activation Gap — occurs due to the combined increase in the number of users, user expectations, use cases, data volume and variety, and data security concerns. Today, organizations simply don’t have a large enough supply of big data skills or IT ...

Best Web Scraping or Web Crawling Ethics to Follow

Many of us wonder which best practices to follow when undertaking a web scraping project. Although there have been no major legal hurdles to speak of in scraping publicly available data (other than the one-off case of Ryanair), it is advisable to follow a few steps that will keep you on the right side of the law.

1. Never swamp the targeted site to the extent of denying access to other legitimate users. You can do this by limiting your access to the site's off-peak hours, ramping up from evening till dawn, on weekends, and on public holidays. Some popular sites like Google, Yahoo, Amazon, Facebook etc. warn you if you access the content too fast. That is a warning signal for you to slow your scraper down.

2. Never download the same content more than once as you are...
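
The two points visible in this excerpt (do not flood the site, and do not fetch the same content twice) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the article's own code: the class name `PoliteFetcher`, the five-second default delay, and the injectable `download`, `clock`, and `sleep` parameters are all assumptions made for the example.

```python
import time
import urllib.request
from typing import Callable, Dict, Optional


def _download(url: str) -> bytes:
    """Default downloader: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


class PoliteFetcher:
    """Fetches pages with a minimum delay between requests and an in-memory cache."""

    def __init__(
        self,
        min_interval_s: float = 5.0,  # assumed delay; tune it to the target site
        download: Callable[[str], bytes] = _download,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._download = download
        self._clock = clock
        self._sleep = sleep
        self._last_request_at: Optional[float] = None
        self._cache: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        # Point 2: never download the same content more than once.
        if url in self._cache:
            return self._cache[url]

        # Point 1: space out requests so other users are not crowded out.
        if self._last_request_at is not None:
            wait = self._min_interval_s - (self._clock() - self._last_request_at)
            if wait > 0:
                self._sleep(wait)

        body = self._download(url)
        self._last_request_at = self._clock()
        self._cache[url] = body
        return body
```

A scraper would create one `PoliteFetcher` per target site and route every request through `fetch`. If the site starts warning about request speed, raising `min_interval_s` is the simplest way to slow down.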

Top 5 Reasons to Move Enterprise Data Science Off the Laptop and to the Cloud

We live in a world that is inundated with data. Data science and machine learning (ML) techniques have come to the rescue in helping enterprises analyze and make sense of these large volumes of data. Enterprises have hired data scientists — people who apply scientific methods to data to build mathematical software models — to generate insights or predictions that enable data-driven business decisions. Typically, data scientists are experts in statistical analysis and mathematical modeling who are proficient in programming languages such as R or Python. Barring a few large enterprises, most data science is still being carried out on laptops, leading to a very inefficient process that is prone to errors and delays. In this blog, we will explore the top 5 reasons why we think ‘laptop data sci...

Advancing BIM and GIS through 4D Digital Twins

Summary

At Bentley Systems’ Year in Infrastructure (YII 2019) Conference in Singapore, the keynote address by CEO Greg Bentley provided a company update and highlighted projects from the 2019 YII nominees that exemplify advancements that go beyond BIM (building information modeling) to digital twins. He suggested a roadmap of non-disruptive steps to help make infrastructure engineering digital twins a key business benefit of every organization’s “going digital” strategy. Mr. Bentley’s presentation, streamed live to people all over the world, spelled out the company’s strategies and collaborative efforts to accelerate the digital journey. According to Mr. Bentley, “We’ll always be talking about going digital, not a digital transformation that we’ll go through one time and have it behind u...

What is data science used for?

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Other terms, such as Business Analytics, Data Analytics, Data Mining, and Predictive Analytics, mean essentially the same thing as Data Science. Data Science is concerned with analyzing data and extracting useful knowledge from it. Building predictive models is usually the most important activity for a Data Scientist. More generally, a data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. She spends a lot of time in the process of collecting, cleaning, and munging data, because data is never clean. ...

Addressing Regulatory GDPR and CCPA frameworks with Qubole ACID and Apache Ranger

Data lakes are at the heart of digital transformation in the enterprise. As more organizations run analytics, machine learning, and ETL workloads on the data residing in data lakes, those workloads bring privacy and integrity risks. Organizations urgently need to preserve privacy and control access to this data under regulations such as GDPR (Right to be Forgotten) and CCPA (Right to Be Erased) and other frameworks. Because these regulations set deadlines for compliance, organizations are always looking for faster and more scalable ways to make their data lake(s) compliant. Organizations put different data governance measures in place at multiple levels, from data security to data accessibility. But current high-level file-level security measures and accepted best practices are not suf...

Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole

Apache Airflow is a workflow management platform used to author workflows as Directed Acyclic Graphs (DAGs). This makes it easier to build data pipelines, monitor them, and perform ETL operations. A simple machine learning task may involve complex data pipelines. Triggering and monitoring these pipelines manually may cause unnecessary overhead and errors. Qubole offers Airflow running on top of the Anaconda environment to make running machine learning pipelines and data science tasks seamless. Anaconda is an open source Python distribution for data science, machine learning, and large-scale data processing tasks with over 1,400 packages. This gives users the ease of running huge data pipelines along with better package support for their tasks. Qubole also offers Package Management, which a...
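
To make the DAG idea concrete, here is a small sketch of dependency-ordered execution using only Python's standard library. It does not use Airflow's API: the task names are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) stands in for the scheduling that a workflow manager performs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "train_model": {"clean"},
    "publish_report": {"clean"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_dag(dag: Dict[str, Set[str]]) -> bool:
    """A workflow is only schedulable if its dependency graph has no cycles."""
    try:
        run_order(dag)
        return True
    except CycleError:
        return False


for task in run_order(pipeline):
    print(f"running {task}")
```

The acyclic requirement is what makes a workflow schedulable: if two tasks depended on each other, no valid order would exist, and `is_valid_dag` would return `False`.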

Qubole and Google Join Forces to Deliver Unified User Experience for Apache Spark and Hadoop

I’m very excited to announce our expanded partnership with Google Cloud Platform (GCP). We have joined forces to offer an enterprise self-service data platform powered by optimized versions of Apache Spark and Hadoop, with unified tools for data science and data engineering running on GCP. Why Now? My cofounder Joydeep and I have always felt strongly that the future of big data is on the cloud. As a result, we created a platform with the flexibility to use the technologies and frameworks that best fit your environment today as well as how that environment will look tomorrow. In recent years we’ve seen not only the expansion of cloud usage but also a discernible shift toward a multi-cloud world where customers demand choice. We recognize that Google Cloud offers a compelling choice for cust...