Thursday, August 30, 2012

Installing Oracle R Connector For Hadoop on Linux machine

Oracle R Connector for Hadoop (ORCH) is a package that provides a way of interacting with Hadoop from your local R session. With it, you can copy data between R memory, the local filesystem and HDFS. Besides this, you can also schedule R programs to execute map-reduce jobs in Hadoop and return the results to any of the above-mentioned locations.
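Once the package is installed (see the steps below), a typical ORCH session looks roughly like the sketch here. I'm using the function names `hdfs.put`, `hdfs.get`, `hadoop.exec` and `orch.keyval` as I understand them from the ORCH documentation; treat the exact arguments as illustrative rather than authoritative:

```r
# Rough sketch of an ORCH session -- check the ORCH documentation
# for the exact function signatures before relying on this.
library(ORCH)

# Copy an R data frame from local R memory into HDFS
cars.dfs <- hdfs.put(cars, dfs.name = "cars_data")

# Run a map-reduce job written in R against the data in HDFS
res <- hadoop.exec(
  dfs.id  = cars.dfs,
  mapper  = function(key, val) { orch.keyval(key, val$dist) },
  reducer = function(key, vals) { orch.keyval(key, mean(vals)) }
)

# Pull the result back from HDFS into R memory
hdfs.get(res)
```

The point to notice is the round trip: data moves from R memory to HDFS, is processed by Hadoop, and comes back to R memory, which is exactly the workflow described above.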


Prerequisites for installing ORCH
  • A JVM
  • An R distribution, version 2.13.2 or later, with all base libraries, on every node in the Hadoop cluster.

NOTE: You will need to install the ORCH package into the R distribution on each Hadoop node.
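Before starting, it is worth confirming the prerequisites from within R on each node. The check below uses only base R functions:

```r
# Quick prerequisite check -- run this in R on each Hadoop node
R.version.string            # should report 2.13.2 or later
getRversion() >= "2.13.2"   # TRUE if the version requirement is met

# JAVA_HOME and HADOOP_HOME must be visible to the shell that starts R
# (they are set in step 1 of the installation below)
Sys.getenv(c("JAVA_HOME", "HADOOP_HOME"))
```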


Steps for installing ORCH
  1. Set the environment variables for Hadoop and Java as follows:

    export HADOOP_HOME=/path/to/your/Hadoop/home

    export JAVA_HOME=/path/to/your/java/home
  2. Download the package from the Oracle website and unzip the downloaded file.

    $ unzip orch.tgz.zip

    NOTE: If you downloaded just a .zip file, then after unzipping it you will need to manually repackage the contents as a .tgz file. For example, I got a folder named 'orch' after unzipping and used the following command to create the .tgz file:

      $ tar cvzf orch.tgz orch
  3. Install the package using the following command:

    $ R CMD INSTALL orch.tgz
  4. Alternatively, you can open R and install the package from within it:

    > install.packages("/path/to/orch.tgz", repos=NULL)
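Either way, you can confirm the installation succeeded by loading the package in a fresh R session:

```r
# Verify the installation: this should load without errors
library(ORCH)

# Optionally, confirm the package is registered with this R distribution
"ORCH" %in% rownames(installed.packages())   # should be TRUE
```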

You are done with the installation now. Cheers!!