Project author: pradeepkumar-27

Project description: Ansible role to configure Hadoop Distributed File System (HDFS)
Primary language: Jinja
Project URL: git://github.com/pradeepkumar-27/AnsibleRole-HadoopHDFS.git
Created: 2021-05-01T14:09:45Z
Project community: https://github.com/pradeepkumar-27/AnsibleRole-HadoopHDFS

License:

HadoopHDFS

Ansible role to configure Hadoop Distributed File System.

Requirements

I created this role to configure my HDFS servers on Amazon Web Services (AWS), so I built a custom Elastic Compute Cloud (EC2) Amazon Machine Image (AMI) based on Amazon Linux 2 with JDK 8u171 and Hadoop 1.2.1 pre-installed. You can find this custom AMI on AWS under the ID "ami-01bb2347b233b5110".
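If you want to provision the machines the same way, a minimal sketch using the amazon.aws.ec2_instance module might look like the play below. This is not part of the role: it assumes the amazon.aws collection and boto3 are installed, and the instance type, region, and key name are placeholders you would adjust.

    - hosts: localhost
      connection: local
      gather_facts: no
      tasks:
        - name: Launch one EC2 instance per cluster node (hypothetical play)
          amazon.aws.ec2_instance:
            name: "{{ item }}"
            image_id: ami-01bb2347b233b5110
            instance_type: t2.micro       # placeholder; size to your workload
            key_name: yourKey             # pairs with private_key_file in ansible.cfg below
            region: us-east-1             # placeholder region
          loop:
            - namenode
            - datanode1
            - datanode2
            - datanode3
            - jobtracker
            - client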

Role Variables

This role has three variables: "nn_dir", the NameNode directory on the master node; "dn_dir", the DataNode directory on the slave nodes; and "hdfs_port", the port on which the cluster listens.
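For reference, these could be expressed in a defaults/main.yml like the sketch below. The values are illustrative (they match the example playbook at the end), not the role's actual file; override them to suit your cluster.

    # defaults/main.yml -- illustrative defaults
    nn_dir: /nn        # NameNode metadata directory on the master
    dn_dir: /dn        # DataNode storage directory on each slave
    hdfs_port: 7001    # port the NameNode listens on, typically used in fs.default.name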

ansible.cfg

Ansible configuration file used to run this role:

    [defaults]
    interpreter_python = auto_silent
    inventory = ./hosts
    # i.e. the path where you downloaded this role
    roles_path = ./yourRolesPath
    host_key_checking = False
    remote_user = ec2-user
    private_key_file = ./yourKey.pem

    [privilege_escalation]
    become = True
    become_method = sudo
    become_user = root
    become_ask_pass = False
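Ansible picks this file up automatically when you run ansible-playbook from the directory that contains it; an ANSIBLE_CONFIG environment variable, if set, takes precedence.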

hosts

Ansible inventory file where you put the IPs of your servers. I'm setting up this HDFS cluster with the intention of integrating it with a Hadoop MapReduce cluster for data analysis on the stored big data, so I'm also configuring JobTracker and Client systems.

    [NameNode]
    namenode

    [DataNodes]
    datanode1
    datanode2
    datanode3

    [JobTracker]
    jobtracker

    [Client]
    client

    [HDFS:children]
    NameNode
    DataNodes
    Client
    JobTracker
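The entries above are aliases. One way to map them to the actual machines (an illustrative sketch; replace the placeholders with your instances' addresses) is with ansible_host variables in the same file:

    namenode   ansible_host=<namenode-ip>
    datanode1  ansible_host=<datanode1-ip>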

Example Playbook

    - hosts: HDFS
      roles:
        - role: HadoopHDFS
          vars:
            nn_dir: /nn
            dn_dir: /dn
            hdfs_port: 7001
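After the play finishes, one quick sanity check (a sketch, assuming the hadoop binary is on the remote user's PATH) is to ask the NameNode for a cluster report:

    - hosts: NameNode
      tasks:
        - name: Query HDFS cluster status
          command: hadoop dfsadmin -report
          register: hdfs_report

        - name: Show the report
          debug:
            var: hdfs_report.stdout_lines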