Big Data Analytics


  

Advancing BIM and GIS through 4D Digital Twins

Summary: At Bentley Systems’ Year in Infrastructure (YII 2019) Conference in Singapore, the keynote address by CEO Greg Bentley provided a company update and highlighted projects from the 2019 YII nominees that exemplify advancements that go beyond BIM (building information modeling) to digital twins. He suggested a roadmap of non-disruptive steps to help make infrastructure engineering digital twins a key business benefit of every organization’s “going digital” strategy. Mr. Bentley’s presentation, streamed live to people all over the world, spelled out the company’s strategies and collaborative efforts to accelerate the digital journey. According to Mr. Bentley, “We’ll always be talking about going digital, not a digital transformation that we’ll go through one time and have it behind u...

What is data science used for?

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Related terms such as Business Analytics, Data Analytics, Data Mining, and Predictive Analytics are essentially the same as Data Science. Data Science is concerned with analyzing data and extracting useful knowledge from it. Building predictive models is usually the most important activity for a Data Scientist. More generally, a data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. She spends a lot of time in the process of collecting, cleaning, and munging data, because data is never clean. ...

Top Promising Blockchain Use Cases

In the coming fiscal years, Blockchain is going to be in high demand. That is why interest in learning Blockchain technology has increased among freshers and tech enthusiasts. To become blockchain developers, students increasingly prefer Blockchain certification courses. There are various Blockchain use cases that show why learning blockchain is a must. Here are a few areas where enterprises and governments use blockchain: Banking and Finance; Digital Identity; Energy and Sustainability; Government and the Public Sector; Healthcare and the Life Sciences; International Trade and Commodities; Law; Media and Entertainment; Real Estate; Sports and Esports; Supply Chain. A few potential Blockchain use cases are as follows. Proof of Existence: Demonstrating data ownership wi...

Addressing Regulatory GDPR and CCPA frameworks with Qubole ACID and Apache Ranger

Data lakes are at the heart of digital transformation in enterprises. As more organizations run analytics, machine learning, and ETL workloads on the data residing in data lakes, these workloads bring privacy and integrity risks. Organizations urgently need to preserve privacy and control access to this data as required by regulations such as GDPR (Right to be Forgotten), CCPA (Right to Be Erased), and other frameworks. Because these regulations set deadlines for enterprises to become compliant, organizations are always looking for faster and more scalable ways to make their data lake(s) compliant. Organizations put in place different data governance measures at multiple levels, from data security to data accessibility. But current high-level file-level security measures and accepted best practices are not suf...

Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole

Apache Airflow is a workflow management platform used to author workflows as Directed Acyclic Graphs (DAGs). This makes it easier to build data pipelines, monitor them, and perform ETL operations. A simple machine learning task may involve complex data pipelines. Triggering and monitoring these pipelines manually may cause unnecessary overhead and errors. Qubole offers Airflow running on top of the Anaconda environment to make running machine learning pipelines and data science tasks seamless. Anaconda is an open source Python distribution for data science, machine learning, and large-scale data processing tasks with over 1,400 packages. This gives users the ease of running huge data pipelines along with better package support for their tasks. Qubole also offers Package Management, which a...
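To make the DAG idea above concrete, here is a minimal sketch in plain Python. It does not use Airflow's own API; it only uses the standard-library `graphlib` module (Python 3.9+) to show how declaring dependencies between tasks yields a valid run order. The task names (`extract`, `clean`, `train_model`, `report`) and the `execution_order` helper are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "train_model": {"clean"},
    "report": {"clean", "train_model"},
}


def execution_order(dag):
    """Return one valid run order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is exactly what "acyclic" rules out.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)  # → ['extract', 'clean', 'train_model', 'report']
```

A scheduler such as Airflow adds a great deal on top of this ordering (retries, scheduling, monitoring), but the dependency graph is the core abstraction a workflow author writes down.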

Qubole and Google Join Forces to Deliver Unified User Experience for Apache Spark and Hadoop

I’m very excited to announce our expanded partnership with Google Cloud Platform (GCP). We have joined forces to offer an enterprise self-service data platform powered by optimized versions of Apache Spark and Hadoop, with unified tools for data science and data engineering running on GCP. Why Now? My cofounder Joydeep and I have always felt strongly that the future of big data is on the cloud. As a result, we created a platform with the flexibility to use the technologies and frameworks that best fit your environment today as well as how that environment will look tomorrow. In recent years we’ve seen not only the expansion of cloud usage but also a discernible shift toward a multi-cloud world where customers demand choice. We recognize that Google Cloud offers a compelling choice for cust...

Intel vs. AMD: Comparing Instance Types for Big Data Workloads

Recently AWS announced support for instances running AMD Epyc processors. While the new instances are 10 percent cheaper, cost and performance are workload dependent. As the AWS announcement notes: “We recommend that you measure performance and cost on your own workloads when choosing your instance types.” This raises an obvious question: how do these instance types fare for the big data workloads that our customers run? To answer this question, we compared the performance of AMD and Intel instances using two sets of benchmarks common in the big data space: TeraGen and TeraSort for ETL workloads, and TPC-DS for Data-Warehousing workloads. The rest of the post goes into the details of the benchmark setup, results therefrom, and conclusions. Benchmark Setup: All benchmarks were conducted using Apach...
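The trade-off between a lower hourly price and workload-dependent runtime can be sketched with simple arithmetic. The snippet below is an illustration only: the 10 percent discount comes from the excerpt above, while the hourly price, the runtimes, and the function names are hypothetical and do not reflect any benchmark result from the post.

```python
def cost_per_run(hourly_price, runtime_hours, nodes=1):
    """Total cost of one job run on a cluster of identical instances."""
    return hourly_price * runtime_hours * nodes


def break_even_slowdown(discount):
    """Largest runtime ratio (cheaper / baseline) at which the cheaper
    instance still costs no more than the baseline for the same job."""
    return 1.0 / (1.0 - discount)


# Hypothetical numbers: a baseline instance at $1.00/hour and an
# alternative that is 10 percent cheaper.
baseline_price = 1.00
cheaper_price = baseline_price * (1 - 0.10)

# Same runtime: the cheaper instance wins outright.
same_speed = cost_per_run(cheaper_price, 1.0) < cost_per_run(baseline_price, 1.0)

# 20 percent longer runtime: the discount no longer covers the slowdown.
slower = cost_per_run(cheaper_price, 1.2) > cost_per_run(baseline_price, 1.0)

print(round(break_even_slowdown(0.10), 3))  # → 1.111
```

Under these assumptions, a 10 percent cheaper instance remains the better deal only if the job runs at most about 11 percent longer on it, which is why measuring on one's own workloads matters.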

Blockchain: A Technology Fad That’s Fading Away!

Despite the hype around Blockchain in the past few years, a 2018 Gartner survey indicates Blockchain adoption rates are still as low as 1%. Even within that 1%, it is doubtful how operationally effective and efficient the technology is. This blog also explores the inherent risks of Blockchain and why it may not be the magic cure it is made out to be! Blockchain is surely a fantastic technology, as has already been proven by cryptocurrencies like Bitcoin. Cryptocurrency has been a trend since Bitcoin prices soared suddenly in 2017, reaching an all-time high of $19,783 the same year.[1] Ever since, Blockchain, which is the underlying technology of crypto, has been the talk of the town. Various large global financial institutions were quick to analyze and explore various possibilities o...

Building Digital Payments

Making Digital Payments easier in India.

Benefit From Implementing Blockchain In Public Systems

The emergence of bitcoin and other cryptocurrencies has catalyzed interest in the structure of special data chains: the blockchain. The technology has gained a second wind and continues to develop rapidly. Of course, blockchains do not have only positive features; such structures also have shortcomings. But today the advantages of blockchains attract more interest, both in general perception and from specific subjective viewpoints. They are discussed further below. A concrete example is decentralization. Closed structures are already being formed. Additionally, it is possible to form a group uniting more than 50% of users, which would allow that group to change the rules within the system. Subjective opinions on the benefits of blockchain: A useful approach ...