Beginner’s guide to Apache Airflow

A workflow is a set of steps to accomplish a given data engineering task. The steps can be anything, such as downloading a file, copying data, filtering information, writing to a database, and so forth.

Workflows vary in complexity. Some have only 2 or 3 steps, while others consist of hundreds of components.

What is Airflow?

Airflow is a platform for programming workflows in general, including their creation, scheduling, and monitoring. Airflow implements workflows as DAGs, or Directed Acyclic Graphs.
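To make the DAG idea concrete without depending on Airflow itself, here is a minimal plain-Python sketch. It is not Airflow's API: the task names are hypothetical, borrowed from the example steps mentioned earlier, and the standard-library `graphlib` module (Python 3.9+) stands in for a scheduler that works out a valid run order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dag = {
    "download_file": set(),
    "copy_data": {"download_file"},
    "filter_information": {"copy_data"},
    "write_to_database": {"filter_information"},
}


def execution_order(graph):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
print(order)
```

The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, and `execution_order` would raise `CycleError` instead of returning a list.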

Airflow can be accessed and controlled via code, via the command-line, or via…

Hands-on implementation in Python to find the churn rate

  • What is Churn rate?
  • Which Dataset are we using?
  • Python implementation

What is Churn rate?

“The annual percentage rate at which customers stop subscribing to a service or employees leave a job.” — Google search result
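As a rough illustration of that definition, churn rate is often computed as the number of customers lost during a period divided by the number of customers at the start of it. The sketch below assumes that formula; the function name and the sample figures are made up for illustration and do not come from the dataset used in the article.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Percentage of customers who left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Made-up figures: 1000 customers at the start of the year, 150 of them left.
rate = churn_rate(customers_at_start=1000, customers_lost=150)
print(f"{rate:.1f}%")
```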

Python comprehensive guide for beginners

Python is an interpreted, interactive, object-oriented, and widely used programming language. After reading this article, you will understand how things work in Python.

Trust me, it’s very easy to learn, read, and implement!

  • Python Installation
  • Python - Hello World
  • Assigning Values to Variables
  • Python Strings
  • Python Lists
  • Python Tuples
  • Python Dictionary
  • Python Operator & Data Type conversion
  • User Input
  • Python Decision Making & Loops
  • Python Functions

Python Installation

You can install Python from its official site.

I personally use Spyder; it is a powerful scientific environment written in Python, for Python, and designed…

Project- Finding the most popular movies file

It’s a simple project that gives you a feel for how Spark uses DataFrames to manipulate data as required.

In this project we will try to find the most popular movie using Spark. It’s a basic project that can give you an understanding of how things work in Spark.

I will be using DataFrames for the processing, through PySpark, Spark’s Python API. First, we need to download the datasets for our small project.

I am using the dataset provided by MovieLens; you can download it from here
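Before bringing in Spark, the core logic can be sketched in plain Python. The sketch assumes that "most popular" means "rated the most times", and it uses a handful of made-up rows rather than real MovieLens data; the field layout and the function name are hypothetical.

```python
from collections import Counter

# Made-up (user_id, movie_id, rating) rows, not taken from MovieLens.
ratings = [
    (1, 50, 5), (2, 50, 4), (3, 172, 3),
    (4, 50, 4), (5, 172, 5), (6, 133, 1),
]


def most_popular(rows, top_n=1):
    """Rank movie IDs by how many ratings each one received."""
    counts = Counter(movie_id for _, movie_id, _ in rows)
    return counts.most_common(top_n)


top_two = most_popular(ratings, top_n=2)
print(top_two)
```

On a PySpark DataFrame, the same group-count-sort pattern would typically be written as `df.groupBy("movie_id").count().orderBy("count", ascending=False)`, where `movie_id` is an assumed column name.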


Basic Intro and Installation on Mac

1. Intro to Spark

According to Apache Spark, its one-line definition is “A fast and general engine for large scale data processing”.

That’s actually a good summary of what it’s all about. Spark can process massive amounts of data representing almost anything, such as weblogs or live data.

A high-level overview of data processing


Spark can distribute data across a cluster of computers and process it there. Remember that it is only an execution engine; it has no storage of its own.

Spark can collect data from various sources either real-time or batch processing…

Guide to clear Azure Fundamentals AZ-900 certification exam.

This is part 2 of Azure Fundamentals; it will help you clear the AZ-900 exam.

Link to part 1

2.2 Azure Core Services- Networking

2.2.1 Virtual Network

When we create a virtual machine in Azure, it has to be part of a virtual network. An Azure virtual network is a home for your virtual machines. In the virtual network we specify a range of IP addresses, and each VM gets an IP address from that range.

We can also subnet the IP address range; subnetting is a logical separation of resources that we…
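The address-range and subnetting ideas can be tried out locally with Python’s standard `ipaddress` module. This is only an illustration of the arithmetic, not an Azure API call; the `10.0.0.0/16` range, the subnet names, and the VM address are assumptions chosen for the example.

```python
import ipaddress

# Hypothetical address space for a virtual network.
vnet = ipaddress.ip_network("10.0.0.0/16")

# Split the range into /24 subnets and keep the first two.
frontend, backend = list(vnet.subnets(new_prefix=24))[:2]

# A hypothetical VM address; check which ranges contain it.
vm_ip = ipaddress.ip_address("10.0.1.4")

print(frontend, backend)
print(vm_ip in vnet, vm_ip in frontend, vm_ip in backend)
```

Here the VM address belongs to the virtual network as a whole and to the second subnet, but not to the first, which is the kind of logical separation subnetting provides.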

Guide to clear Azure Fundamentals AZ-900 certification exam.

1. Basic Definitions

  • 1.1 What is Cloud Computing
  • 1.2 Resources & Resource Group
  • 1.3 What is Subscription
  • 1.4 High Availability
  • 1.5 Cloud Service Models: IaaS, PaaS, SaaS

2. Azure Core Services

  • 2.1.1 Creating a Virtual Machine
  • 2.1.2 Connecting to the VM
  • 2.1.3 VM Details- Types, Series
  • 2.1.4 Deleting a VM
  • 2.1.5 Availability Sets
  • 2.1.6 Availability Zones
  • 2.2.1 Virtual Network
  • 2.2.2 Network Security Groups
  • 2.2.3 Application Security Groups
  • 2.3.1 Creating a Storage account
  • 2.3.2 Types of Storage Accounts
  • 2.3.3 Services offered by Storage Accounts
  • 2.3.4 Storage Accounts- Replications
  • 2.3.5 Access Tiers
  • 2.3.6 Working with the blob service

3. More Azure Features

  • 3.1 Load Balancer

In this post you will learn how to restore a PostgreSQL database to a point in time.

Open-source databases are overtaking conventional database software at a very high speed. Personally, I really like the way PostgreSQL is designed and its simple architecture.

I have tried to write down every step that will help you restore your Postgres database.

The steps below stay the same, but a few commands may vary with other backup tools.

PITR PostgreSQL

  • PITR (point-in-time recovery) allows you to restore a database to a specific moment in time.
  • It makes use of live database…

Multiple ways to install a PostgreSQL database

In this article, I will try to help you with the easiest ways to install a PostgreSQL database.

PostgreSQL (or Postgres) is an open-source, free-to-use relational database management system (RDBMS). It is known for its reliability, feature robustness, and performance.

PostgreSQL installation

  1. Interactive installation by Software EDB
  2. Yum Repository Based Installation

1. Interactive installation by Software EDB

EDB is a company that provides paid support for PostgreSQL, and its paid version of the PostgreSQL database is called EDB Advanced Server.

Here we are interested in the open-source, free version, so you can download the database software from the EDB official site…

What is Deep Learning, Future, Application, and Component of DL

  • What is Deep Learning
  • Why is Deep Learning becoming popular now?
  • Future of Deep Learning.
  • Applications of Deep Learning.
  • Components of Deep Learning and how it works?

If you are interested in learning the basics of Machine learning then please read here.

Machine Learning- Quick and Easy Part-1

Machine Learning Quick and Easy Part-2

What is Deep Learning

Deep Learning is a subset of machine learning in AI. It uses hierarchical layers of artificial neural networks to carry out the process of machine learning. …
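To give a feel for what "layers of artificial neural networks" means, here is a tiny forward pass in plain Python. It is only a sketch: the network shape (2 inputs, 2 hidden neurons, 1 output), the sigmoid activation, and every weight are made up for illustration, and no training is performed.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer: per neuron, a weighted sum of the inputs plus a bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Made-up parameters for the hidden layer and the output layer.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]


def forward(inputs):
    """Feed the inputs through the hidden layer, then through the output layer."""
    hidden = dense(inputs, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)


prediction = forward([1.0, 1.0])
print(prediction)
```

Each layer consumes the previous layer’s output, which is the hierarchy the definition refers to; a real deep network stacks many more layers and learns its weights from data.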

