Many tools use Hadoop as a backend for some of their jobs. For example, we can use Kafka (or HDFS) as a staging area for Oracle Data Integrator or GoldenGate. It is usually better to set up a separate node that is used exclusively by ODI or GoldenGate, because if we install them on a Hadoop node they will interfere with the other workload. And Hadoop is a cluster: each node does its share of the work, and the whole job is not finished until the last node is finished. The caravan moves at the speed of the slowest camel.
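To make the "slowest camel" point concrete, here is a tiny sketch. The per-node timings are made-up numbers for illustration only, and `job_duration` is a hypothetical helper, not part of any Hadoop API. It just models a job that is spread over all nodes and ends when the last node ends.

```python
def job_duration(node_durations):
    """A job spread over several nodes finishes when the slowest node finishes."""
    return max(node_durations)


# Hypothetical per-node run times in minutes (illustrative values only).
balanced = [10.0, 10.5, 9.8, 10.2]
# Same job, but the last node also hosts an extra tool that competes for resources.
contended = [10.0, 10.5, 9.8, 25.0]

print("balanced cluster: ", job_duration(balanced), "min")
print("one busy node:    ", job_duration(contended), "min")
```

Three of the four nodes are just as fast as before, yet the whole job takes as long as the one overloaded node.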
Hadoop vendors call such a special node an “Edge” or “Gateway” node. These nodes don’t store any data and don’t take part in data processing, but they host client software and the Hadoop configuration. Let’s look at how to install such a node. I will use the Cloudera distribution, with Cloudera Manager as the management tool.
Why do we need to configure Edge nodes using tools like Cloudera Manager or Ambari? Because the software and configuration have to be kept up to date. We shouldn’t have to worry when somebody adds a new Kafka broker or changes a ZooKeeper host. The management tool takes care of that for us.
So let’s start.
1. First, add the host to the Hadoop cluster. Go to Hosts->All Hosts, then click «Add New Hosts To Cluster». Click Next on the first page.
2. Enter the hostname you want to use as the edge node and click «Search». Once the host has been inspected, click «Continue».
3. Agree to install the same agent version («Matched release for this Cloudera Manager Server») and click «Continue».
4. Agree to install Oracle Java SE and click «Continue».
5. Enter the root password and click «Continue».
6. Wait until the installation is finished.
Now we can add the required «client» roles to our node. All of these roles have «Gateway» in their name. For example, if the host needs to work with HDFS:
1. Click on HDFS->Instances.
3. Then press OK and Continue.
4. You will see that a redeployment is needed. Let’s do it.
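If you would rather script this than click through the UI, Cloudera Manager also has a REST API. Below is a rough sketch of what the request for adding a Gateway role could look like. It only builds the path and the JSON body and does not send anything. The API version (`v19`), the cluster name, the service name and the host ID are all assumptions for illustration, and `gateway_role_request` is a hypothetical helper of mine, so check the values and the payload against the API documentation of your own Cloudera Manager before using it.

```python
import json
from urllib.parse import quote


def gateway_role_request(cluster, service, host_id, api_version="v19"):
    """Build the path and JSON body for creating a Gateway role on one host.

    Assumption: roles are created with a POST to
    /api/<version>/clusters/<cluster>/services/<service>/roles
    and the client role type is called GATEWAY.
    """
    path = "/api/{}/clusters/{}/services/{}/roles".format(
        api_version, quote(cluster, safe=""), quote(service, safe="")
    )
    body = {"items": [{"type": "GATEWAY", "hostRef": {"hostId": host_id}}]}
    return path, json.dumps(body)


# Hypothetical names: replace them with the ones from your own cluster.
path, body = gateway_role_request("Cluster 1", "hdfs", "edge01.example.com")
print("POST", path)
print(body)
```

As in the UI, the client configuration still has to be redeployed afterwards so that the new node actually receives it.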