Configure Hadoop Using Ansible Playbook

Hello Geeks 🙋🏻‍♂️, I am back with another interesting blog. In this blog I am going to automate Hadoop cluster configuration using an Ansible playbook, so let’s begin…

Nazim Dalwai
Mar 17, 2021

⚪ Prerequisites:

 → Minimum three virtual machines
     1. Controller node
     2. Master node
     3. Slave node
 → Hadoop software & an appropriate JDK on the controller node's local system
 → Good network connectivity between the controller node and the other virtual machines

What is Ansible?

Ansible is a software tool that provides simple but powerful automation for cross-platform computer support. It is primarily intended for IT professionals, who use it for application deployment, updates on workstations and servers, cloud provisioning, configuration management, intra-service orchestration, and nearly anything a systems administrator does on a weekly or daily basis. Ansible doesn’t depend on agent software and has no additional security infrastructure, so it’s easy to deploy.

Because Ansible is all about automation, it requires instructions to accomplish each job. With everything written down in simple script form, it’s easy to do version control. The practical result of this is a major contribution to the “infrastructure as code” movement in IT: the idea that the maintenance of server and client infrastructure can and should be treated the same as software development, with repositories of self-documenting, proven, and executable solutions capable of running an organization regardless of staff changes.

What is an Ansible Playbook?

An Ansible playbook is a blueprint of automation tasks — which are complex IT actions executed with limited or no human involvement. Ansible playbooks are executed on a set, group, or classification of hosts, which together make up an Ansible inventory.

Ansible playbooks are essentially frameworks: prewritten code that developers can use ad hoc or as a starting template. Ansible playbooks are regularly used to automate IT infrastructure (such as operating systems and Kubernetes platforms), networks, security systems, and developer personas (such as Git and Red Hat CodeReady Studio).

Ansible playbooks help IT staff program applications, services, server nodes, or other devices without the manual overhead of creating everything from scratch. And Ansible playbooks — as well as the conditions, variables, and tasks within them — can be saved, shared, or reused indefinitely.
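To make the idea concrete, here is a minimal sketch of what a playbook looks like. It is not part of hadoop.yml; the file name ping.yml and the task name are only illustrative. A playbook is a YAML list of plays, and each play names a group of hosts from the inventory and the tasks to run on them.

```yaml
# ping.yml (hypothetical file name): one play with one task
- hosts: all                      # run on every host in the inventory
  tasks:
    - name: Check that Ansible can reach and log in to each host
      ansible.builtin.ping:       # connectivity test module, takes no arguments here
```

It would be run the same way as the Hadoop playbook later in this post, with ansible-playbook followed by the file name.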

Steps to Configure the Ansible Playbook for Hadoop

Software Configuration on the Master Node and Slave Node:

→ Update the ansible inventory for Master IP & Slave IP

→ Copy the software to both the Master and Slave nodes and install it
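The inventory update in the first step can be sketched as follows. This uses Ansible's YAML inventory format, and the two IP addresses are placeholders that must be replaced with the real Master and Slave addresses. The group names master and slave are chosen to match the hosts: lines used in hadoop.yml.

```yaml
# Inventory sketch; the addresses below are placeholders, not real nodes
all:
  children:
    master:
      hosts:
        192.0.2.10:     # replace with the Master node IP
    slave:
      hosts:
        192.0.2.11:     # replace with the Slave node IP
```

If the inventory is kept in the INI format instead, the same grouping is written as a [master] header followed by the Master IP and a [slave] header followed by the Slave IP. Connection details such as the login user are left out here because they depend on the setup.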

Commands:
→ ansible-playbook hadoop.yml :- to run the Ansible playbook
→ java -version :- to check the installed Java version
→ hadoop version :- to check the installed Hadoop version

After running the playbook, the Hadoop and JDK software is installed on the target nodes, which the two version commands above confirm.
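Instead of logging in to each node to run the version commands, the same check could be done from the controller with a small play. This is only a sketch and not part of the original hadoop.yml; the variable names java_check and hadoop_check are made up for the example.

```yaml
# Verification sketch: run both version commands on every node
- hosts: all
  tasks:
    - name: Check the installed Java version
      ansible.builtin.command: java -version
      register: java_check        # hypothetical variable name
      changed_when: false         # a read-only check never changes the node

    - name: Check the installed Hadoop version
      ansible.builtin.command: hadoop version
      register: hadoop_check      # hypothetical variable name
      changed_when: false

    - name: Show both results
      ansible.builtin.debug:
        # java -version writes to stderr, hadoop version writes to stdout
        msg: "{{ java_check.stderr_lines + hadoop_check.stdout_lines }}"
```

If either command is missing on a node, its task fails for that host, which makes a broken install easy to spot.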

Configuration of Master Node using Playbook

→ Create the /NN directory on the Master Node: the file module is used to create directories on target nodes

→ Copy the Hadoop master configuration files from the Master Node to the Controller Node, edit them, and then start the service using the shell module: the shell module runs commands directly on the target nodes
hdfs-site.xml
core-site.xml

After editing the configuration files, save them in the current workspace (/ansible-hadoop-ws in this setup). The playbook then copies them into the master node's Hadoop configuration directory, /etc/hadoop, so that the master gets configured.
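For readers who do not have the edited files at hand, the sketch below shows roughly what they might contain, written as a play that generates them with the copy module's content parameter. This is an alternative to copying files from the workspace, not the method used in hadoop.yml. The property names dfs.name.dir and fs.default.name are the Hadoop 1.x ones; the IP address and port 9001 are assumptions and should be replaced with your own values.

```yaml
# Sketch only: writes assumed file contents directly on the master
- hosts: master
  tasks:
    - name: Write hdfs-site.xml telling the NameNode where to keep its metadata
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.name.dir</name>
              <value>/NN</value>
            </property>
          </configuration>

    - name: Write core-site.xml with the NameNode address
      ansible.builtin.copy:
        dest: /etc/hadoop/core-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>fs.default.name</name>
              <!-- assumed master IP and port; replace with your own -->
              <value>hdfs://192.0.2.10:9001</value>
            </property>
          </configuration>
```

Keeping the files in the workspace, as this post does, is easier to review; the inline form is handy when the contents are short.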

Configuration of Slave Node using Playbook

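Configuring the slave node follows the same process with a few changes: the directory is /DN instead of /NN, the slave's own hdfs-site.xml is copied, there is no format step, and the datanode service is started instead of the namenode.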
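The main difference on the slave is the hdfs-site.xml content. As a rough sketch, again generating the file inline rather than copying hdfs-site-slave.xml as the playbook does, it might look like this. dfs.data.dir is the Hadoop 1.x property for the DataNode storage directory; the exact contents of the author's file are an assumption.

```yaml
# Sketch only: assumed contents of the slave's hdfs-site.xml
- hosts: slave
  tasks:
    - name: Write hdfs-site.xml telling the DataNode where to store blocks
      ansible.builtin.copy:
        dest: /etc/hadoop/hdfs-site.xml
        content: |
          <?xml version="1.0"?>
          <configuration>
            <property>
              <name>dfs.data.dir</name>
              <value>/DN</value>
            </property>
          </configuration>
```

The core-site.xml file is the same one used on the master, since both nodes need the NameNode address.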

hadoop.yml file

# Play 1: copy the JDK and Hadoop RPMs to every node and install them
- hosts: all
  tasks:
    - copy:
        src: "/root/hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/hadoop-1.rpm"

    - copy:
        src: "/root/jdk-8u171-linux-x64.rpm"
        dest: "/root/jdk-1.rpm"

    # Install the JDK before Hadoop, which needs Java to run
    - command: "rpm -i /root/jdk-1.rpm --force"

    - command: "rpm -i /root/hadoop-1.rpm --force"

# Play 2: configure the master and start the namenode
- hosts: master
  tasks:
    - file:
        state: directory
        path: "/NN"

    - copy:
        src: "/ansible-hadoop-ws/hdfs-site-master.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - copy:
        src: "/ansible-hadoop-ws/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    # Piping Y answers the re-format prompt; formatting wipes existing namenode metadata
    - shell: "echo Y|hadoop namenode -format"

    - shell: "hadoop-daemon.sh start namenode"

# Play 3: configure the slave and start the datanode
- hosts: slave
  tasks:
    - file:
        state: directory
        path: "/DN"

    - copy:
        src: "/ansible-hadoop-ws/hdfs-site-slave.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - copy:
        src: "/ansible-hadoop-ws/core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - shell: "hadoop-daemon.sh start datanode"
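One thing to keep in mind: as written, every run of the playbook formats the namenode again. A possible variation, shown here only as a sketch, is to guard the format task with the shell module's creates parameter so it is skipped once the metadata exists. The path /NN/current/VERSION assumes the Hadoop 1.x layout under the /NN directory used above.

```yaml
# Variation sketch: format the namenode only on the first run
- hosts: master
  tasks:
    - name: Format the namenode unless it has already been formatted
      ansible.builtin.shell:
        cmd: "echo Y | hadoop namenode -format"
        creates: /NN/current/VERSION   # assumed marker file written by a successful format
```

With this guard, re-running the playbook leaves an existing namenode's data alone.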

After Running Playbook Successfully:
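One way to confirm the result from the controller is to ask the namenode for a cluster report. The play below is a sketch that is not part of hadoop.yml; the variable name dfs_report is made up for the example. The hadoop dfsadmin -report command lists the datanodes that have joined the cluster.

```yaml
# Verification sketch: print the HDFS cluster report from the master
- hosts: master
  tasks:
    - name: Ask the namenode for a cluster report
      ansible.builtin.command: hadoop dfsadmin -report
      register: dfs_report        # hypothetical variable name
      changed_when: false         # reporting does not change anything

    - name: Show the report
      ansible.builtin.debug:
        var: dfs_report.stdout_lines
```

If the slave's datanode started correctly and can reach the master, it should appear in the report.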

Thank you for reading…😀😉
