Airflow on Anaconda: A Match Made in Heaven, Perfected by Qubole

December 13, 2019

Apache Airflow is a workflow management platform used to author workflows as Directed Acyclic Graphs (DAGs). This makes it easier to build data pipelines, monitor them, and perform ETL operations. Even a simple machine learning task may involve complex data pipelines, and triggering and monitoring those pipelines manually adds unnecessary overhead and risk of error.
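
To make this concrete, here is a minimal sketch of how such a pipeline is expressed as an Airflow DAG. The DAG id, task ids, and commands below are hypothetical placeholders; the pattern of defining tasks and chaining them with the >> operator is standard Airflow.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# Hypothetical daily ETL pipeline: extract -> transform -> load.
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="example_etl_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 12, 1),
    schedule_interval="@daily",
)

def transform(**context):
    # Placeholder for the real transformation logic.
    print("transforming extracted data")

extract = BashOperator(task_id="extract", bash_command="echo 'extracting data'", dag=dag)
transform_task = PythonOperator(task_id="transform", python_callable=transform, provide_context=True, dag=dag)
load = BashOperator(task_id="load", bash_command="echo 'loading data'", dag=dag)

# The >> operator declares the edges of the DAG: extract, then transform, then load.
extract >> transform_task >> load

Once this file is placed in the cluster's DAGs folder, Airflow schedules it daily and the web UI shows each run of the three tasks, so there is no manual triggering or monitoring to do.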

Qubole offers Airflow running on top of the Anaconda environment to make running machine learning pipelines and data science tasks seamless. Anaconda is an open source Python distribution for data science, machine learning, and large-scale data processing, with over 1,400 packages. This lets users run large data pipelines with broad package support for their tasks. Qubole also offers Package Management, which allows users to install Anaconda packages on their clusters directly from the UI without restarting the clusters.

Running Airflow on the Anaconda environment lets users build the complex data pipelines behind machine learning and data science tasks with little effort. It also gives them the flexibility to install data science packages available within the Anaconda environment on the fly, using Qubole's Package Management feature.

How To Run Airflow On Anaconda With Qubole

Step 1: Creating a cluster

  • From the Clusters page, create an Airflow cluster with the Python version set to 3.5. This automatically attaches the cluster to an Anaconda environment.

  • The new Airflow cluster will be created and is then ready to use.

Step 2: Adding packages

  • Various Python packages can be installed on the cluster from the Qubole Environments page without restarting the cluster. Just open the page and select your cluster.

  • Add the package you require. The selected package is installed into the cluster's Anaconda environment and can be imported directly from your DAG tasks, as in the sketch after this list.
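
For example, once a library such as scikit-learn has been added through the Environments page, a DAG task can import it straight from the cluster's Anaconda environment. The DAG id and task names below are hypothetical, and the sketch assumes scikit-learn was the package installed in this step.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def train_model(**context):
    # scikit-learn is assumed to have been installed via Package Management;
    # it is imported here from the cluster's Anaconda environment.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(solver="liblinear")
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))

dag = DAG(
    dag_id="example_training_pipeline",
    start_date=datetime(2019, 12, 1),
    schedule_interval="@daily",
)

train = PythonOperator(
    task_id="train_model",
    python_callable=train_model,
    provide_context=True,
    dag=dag,
)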

Step 3: Running shell commands on the cluster

  • Qubole provides the flexibility to run shell commands directly from the Analyze page, for example to verify which packages are available in the Anaconda environment (see the sketch below).
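
The same check can also be submitted programmatically. The sketch below uses the Qubole Python SDK (qds-sdk-py) to run a shell command that lists the packages in the cluster's environment; the API token is a placeholder, and it assumes conda is on the PATH of the Airflow cluster.

from qds_sdk.qubole import Qubole
from qds_sdk.commands import ShellCommand

# Configure the SDK with your Qubole account's API token (placeholder value).
Qubole.configure(api_token="<YOUR_API_TOKEN>")

# Submit a shell command and print its final status; "conda list" is assumed
# to resolve to the Anaconda environment attached to the Airflow cluster.
cmd = ShellCommand.run(inline="conda list")
print(cmd.status)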

 

With the steps shown above, we have demonstrated how Qubole simplifies building your data pipelines. You can now build, train, and deploy machine learning and data science pipelines effortlessly, right on top of the Anaconda environment and with the support of Package Management.




Qubole Technologies
