Thursday, August 30, 2012

Installing Oracle R Connector For Hadoop on Linux machine

Oracle R Connector for Hadoop (ORCH) is a package that provides a way of interacting with Hadoop from your local R session. With it, you can copy data between R memory, the local filesystem and HDFS. Besides this, you can also schedule R programs to execute map-reduce jobs in Hadoop and return the results to any of the above-mentioned locations.
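Once the package is installed (see the steps below), a typical ORCH session looks roughly like the sketch here. I'm using the function names `hdfs.put`, `hdfs.get`, `hadoop.exec` and `orch.keyval` as I understand them from the ORCH documentation; treat the exact arguments as illustrative rather than authoritative:

```r
# Rough sketch of an ORCH session -- check the ORCH documentation
# for the exact function signatures before relying on this.
library(ORCH)

# Copy an R data frame from local R memory into HDFS
cars.dfs <- hdfs.put(cars, dfs.name = "cars_data")

# Run a map-reduce job written in R against the data in HDFS
res <- hadoop.exec(
  dfs.id  = cars.dfs,
  mapper  = function(key, val) { orch.keyval(key, val$dist) },
  reducer = function(key, vals) { orch.keyval(key, mean(vals)) }
)

# Pull the result back from HDFS into R memory
hdfs.get(res)
```

The point to notice is the round trip: data moves from R memory to HDFS, is processed by Hadoop, and comes back to R memory, which is exactly the workflow described above.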


Prerequisites for installing ORCH
  • A JVM
  • An R distribution, version 2.13.2 or later, with all base libraries, on every node in the Hadoop cluster.

NOTE: You will need to install the ORCH package into the R distribution on each Hadoop node.
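Before starting, it is worth confirming the prerequisites from within R on each node. The check below uses only base R functions:

```r
# Quick prerequisite check -- run this in R on each Hadoop node
R.version.string            # should report 2.13.2 or later
getRversion() >= "2.13.2"   # TRUE if the version requirement is met

# JAVA_HOME and HADOOP_HOME must be visible to the shell that starts R
# (they are set in step 1 of the installation below)
Sys.getenv(c("JAVA_HOME", "HADOOP_HOME"))
```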


Steps for installing ORCH
  1. Set the environment variables for Hadoop and Java as follows:

    export HADOOP_HOME=/path/to/your/Hadoop/home

    export JAVA_HOME=/path/to/your/java/home
  2. Download the package from the Oracle website and unzip the downloaded file.

    $ unzip orch.tgz.zip

    NOTE: If you downloaded just a .zip file, then after unzipping it you will need to manually repackage the contents as a .tgz file. For example, I got a folder named 'orch' after unzipping and used the following command to create the .tgz file:

      $ tar cvzf orch.tgz orch
  3. Install the package using the following command:

    $ R CMD INSTALL orch.tgz
  4. Alternatively, you can open R and install the package from within it:

    > install.packages("/path/to/orch.tgz", repos=NULL)
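Either way, you can confirm the installation succeeded by loading the package in a fresh R session:

```r
# Verify the installation: this should load without errors
library(ORCH)

# Optionally, confirm the package is registered with this R distribution
"ORCH" %in% rownames(installed.packages())   # should be TRUE
```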

You are done with the installation now. Cheers!!