Recently I needed to use Ambari to set up a Hadoop cluster again, so I recorded the build process in the hope that it can serve as a reference for anyone with the same need.
Author: Header Data

Versions used in this guide:
- Ambari: 2.2.1, the latest release for Ubuntu 14.04 at the time of writing
- HDP: 2.4.3 for Ubuntu 14.04
What is Ambari?

Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. It already supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog, and it is one of the top-level Hadoop management tools (in short, an open-source one-click installer for Hadoop services).
What can we do with it, and why should we use it?

With Ambari we can quickly build and manage Hadoop along with its commonly used service components, such as HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, ZooKeeper, Kafka, and so on. (To put it bluntly, it lets you be pleasantly lazy.)
As for why we use it:

- First, Ambari is one of the earliest Hadoop cluster management tools.
- Second, the official Hadoop site now also recommends Ambari.
- It simplifies cluster provisioning with a step-by-step installation wizard.
- It pre-configures the key operational metrics, so you can see directly whether Hadoop Core (HDFS and MapReduce) and related projects (such as HBase, Hive, and HCatalog) are healthy.
- It supports visualization and analysis of job and task execution, giving a clearer view of dependencies and performance.
- It exposes monitoring information through a complete RESTful API, so existing operations tools can integrate with it (see the curl sketch after this section).
- The user interface is intuitive, letting users view information and control the cluster easily and effectively.
Ambari uses Ganglia to collect metrics and Nagios for system alerting: when something needs the administrator's attention (for example, a node going down or disk space running low), it sends an email.

Besides that, Ambari can install secure (Kerberos-based) Hadoop clusters, providing role-based user authentication, authorization, and auditing, and it integrates with LDAP and Active Directory for user management.
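As a quick taste of that RESTful API, here is a minimal sketch. It assumes the host and credentials used later in this guide (server on master, port 8080, default admin/admin) and a hypothetical cluster name "mycluster":

curl -s -u admin:admin http://master:8080/api/v1/clusters
## Once a cluster exists, drill into a single service, e.g. HDFS ("mycluster" is a placeholder name):
curl -s -u admin:admin http://master:8080/api/v1/clusters/mycluster/services/HDFS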
1. Some preparation before installation
## First tell the servers who they are and what their nicknames are (edit the hosts file)
vim /etc/hosts
10.1.10.1 master
10.1.10.2 slave1
10.1.10.3 slave2

## Then let them walk in and out of each other's houses freely (configure passwordless SSH login)
ssh-keygen -t rsa                                  ## run on all machines
cat ~/.ssh/id_rsa.pub                              ## view the public key
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    ## append the public key to authorized_keys
### First collect every machine's public key in authorized_keys on the master server
### Then distribute master's authorized_keys to slave1 and slave2 with scp
### (you will be asked for the password; I won't tell you mine is "What time is it")
scp ~/.ssh/authorized_keys slave1:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave2:~/.ssh/authorized_keys

## Update the time zone and locale configuration
apt-get install localepurge                        ## press Enter through the prompts (removes unused locale files)
dpkg-reconfigure localepurge && locale-gen zh_CN.UTF-8 en_US.UTF-8
apt-get update && apt-get install -y tzdata
echo "Asia/Shanghai" > /etc/timezone               ## set the time zone to Shanghai
rm /etc/localtime
dpkg-reconfigure -f noninteractive tzdata

## Sync every node's clock against the master via NTP
vi /etc/ntp.conf
server 10.1.10.1
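Before moving on, it is worth confirming that the passwordless login really works. A minimal check, assuming the three hostnames from the hosts file above:

for host in master slave1 slave2; do
  ## BatchMode fails instead of prompting, so a missing key shows up as an error
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" 'hostname && date' \
    && echo "OK: $host" \
    || echo "FAILED: passwordless SSH to $host" >&2
done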
2. Then some Ubuntu system tuning
### 2.1 Turn off the swap partition
swapoff -a
vim /etc/fstab
## Delete or comment out the swap line; it looks like this:
# swap was on /dev/sda2 during installation
#UUID=8aba5009-d557-4a4a-8fd6-8e6e8c687714 none swap sw 0 0

### 2.2 Raise the open file descriptor limits
vi /etc/profile                                    ## append the ulimit setting at the end
ulimit -SHn 512000
vim /etc/security/limits.conf
## Increase the limits roughly tenfold
* soft nofile 600000
* hard nofile 655350
* soft nproc 600000
* hard nproc 655350
## Make the change take effect
source /etc/profile

### 2.3 Tune the kernel
vi /etc/sysctl.conf
## Paste in the following
fs.file-max = 65535000
net.core.somaxconn = 30000
vm.swappiness = 0
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 1024 65000
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
## Apply the configuration
sysctl -p

### 2.4 Disable transparent huge pages (THP)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
## Disable permanently via /etc/rc.local
vi /etc/rc.local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
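A few quick checks that the tuning took effect (log in again first so the new limits apply):

ulimit -n                                          ## should report the raised nofile limit
sysctl net.ipv4.tcp_max_syn_backlog                ## should print 8192
cat /sys/kernel/mm/transparent_hugepage/enabled    ## "never" should be the selected (bracketed) value
swapon --summary                                   ## prints nothing once swap is fully off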
3. Install and deploy ambari-server (environment: Ubuntu 14.04 + Ambari 2.2.1)
## Add the Ambari package repository
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
## Install ambari-server on the master node
apt-get install ambari-server -y
## Install ambari-agent on all nodes
apt-get install ambari-agent -y
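If the installation misbehaves, it is usually the repository. Two quick sanity checks (the expected output is an assumption based on the repo configured above):

apt-cache policy ambari-server    ## should list a 2.2.1 build coming from public-repo-1.hortonworks.com
dpkg -l | grep ambari             ## after installing, the ambari packages should appear here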
4. Point the ambari-agent configuration at ambari-server
vi /etc/ambari-agent/conf/ambari-agent.ini
## Change hostname to the ambari-server host
[server]
hostname=master
url_port=8440
secured_url_port=8441

## Initialize the ambari-server configuration: service database, JDK (1.7 by default), LDAP; the defaults are generally fine
ambari-server setup               ## press Enter through the prompts

## Start Ambari
ambari-server start
ambari-agent start
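If anything refuses to come up, these are handy first checks (the log path is the Ambari default):

ambari-server status                                  ## should report that the server is running
ambari-agent status                                   ## run this on every node
tail -n 20 /var/log/ambari-agent/ambari-agent.log     ## look for a successful registration against master:8440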
5. After that headache of shell commands, it's time for something more human.
Point your browser at http://10.1.10.1:8080/. The default account and password are admin/admin. Click LAUNCH INSTALL WIZARD and let the fun begin.
6. Give the cluster a name
7. Pay close attention here and confirm your HDP version, or there will be trouble later.
**8. The version I configured here is HDP 2.4.3.**
Click Next and the wizard will check whether the package sources are reachable. If an error is reported here, you can tick "Skip Repository Base URL validation (Advanced)" to skip the check.
9. Fill in the hostnames master, slave1, and slave2. Since ambari-agent is already installed on the slaves, choose manual registration instead of SSH.
10. Confirm the hosts and wait for the status checks to pass; there is some waiting involved here. If the wait drags on too long, restart ambari-server.
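If registration really hangs, restarting both sides usually clears it:

ambari-server restart             ## on master
ambari-agent restart              ## on every node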
11. Choose the services we need: HDFS, YARN, and ZooKeeper.
12. Accept Ambari's default assignment of masters and slaves, then click Next to start the installation.
13. From here on it is mostly a question of your network speed.
14. Click Next through the remaining screens. After installation, refresh the main page and you will see that our Hadoop cluster has been started by default.
15. Go into HDFS and click Restart All to restart all components.
16. Verify that the installation succeeded by clicking NameNode UI.
17. The basic information page.
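The same health check can be done from the shell; 50070 is the default NameNode HTTP port:

curl -s "http://master:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"    ## NameNode status as JSON
hdfs dfsadmin -report                                                              ## live DataNodes and capacity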
18. Hadoop is up and running. Don't you want to run a job on it?
## Log in to the server and run the following
### Create an HDFS directory (you can check it afterwards in the http://master:50070/explorer.html#/ interface)
hdfs dfs -mkdir -p /data/input
### Upload a file from the server to HDFS
hdfs dfs -put file /data/input/
### Test with the wordcount example shipped with Hadoop
hadoop jar hdfs://tesla-cluster/data/hadoop-mapreduce-examples-2.7.1.2.4.0.0-169.jar wordcount /data/input /data/output1
19. The result: the _SUCCESS marker and the output files are generated.
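To inspect the output from the shell, using the paths from the step above (the part file name assumes a single reducer):

hdfs dfs -ls /data/output1                         ## shows _SUCCESS plus the part-r-* files
hdfs dfs -cat /data/output1/part-r-00000 | head    ## the first few (word, count) pairs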
One last, less formal note: with the steps above we have built a Hadoop cluster, but some problems remain. Both the NameNode and the ResourceManager run in single-point mode. Ambari does support HA (high availability), but since space is limited, that will get a separate post later.