Apache Sqoop is a tool for bulk importing/exporting data into the Hadoop ecosystem (HDFS, HBase or Hive).
- works with a number of databases and commercial data warehouses.
- available as a command-line tool; it can also be used from Java by passing the appropriate arguments.
- graduated from the Incubator and became a top-level project at the ASF.
Fig: Sqoop 2 architecture diagram (taken from Cloudera.com)
Example:
alok@ubuntu:~/apache/sqoop-1.4.2$ bin/sqoop import --connect jdbc:mysql://<hostname>:3306/<dbname> --username <user> -P --driver com.mysql.jdbc.Driver --table <tablename> --hbase-table <hbase-tablename> --column-family <hbase-columnFamily> --hbase-create-table
Or use it like this in your Java programs:
// assuming Sqoop 1.4.x is on the classpath; the entry point is org.apache.sqoop.Sqoop
// (older releases ship it as com.cloudera.sqoop.Sqoop)
import java.util.ArrayList;
import org.apache.sqoop.Sqoop;

// build the same arguments you would pass on the command line
ArrayList<String> args = new ArrayList<String>();
args.add("--connect");
args.add("jdbc:mysql://<hostname>:3306/<dbname>");
args.add("--username");
args.add("<user>");
args.add("--driver");
args.add("com.mysql.jdbc.Driver");
args.add("--table");
args.add("<tablename>");
args.add("--hbase-table");
args.add("<hbase-tablename>");
args.add("--column-family");
args.add("<hbase-colFamilyName>");
args.add("--hbase-create-table");
args.add("--num-mappers");
args.add("2");

// runTool() parses the arguments and runs the import; it returns 0 on success
int ret = Sqoop.runTool(args.toArray(new String[args.size()]));
- Sqoop can write data directly to HDFS, HBase, or Hive (see the Hive import sketch below).
- It can also export data back to RDBMS tables from Hadoop (see the export sketch below).
- Sqoop integrates with Oozie, allowing you to schedule and automate import and export tasks.
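For example, a minimal sketch of an import into Hive, assuming the same MySQL connection details as above and a Hive installation on the same machine (the table names are placeholders):

alok@ubuntu:~/apache/sqoop-1.4.2$ bin/sqoop import --connect jdbc:mysql://<hostname>:3306/<dbname> --username <user> -P --table <tablename> --hive-import --hive-table <hive-tablename>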
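And a sketch of exporting data back from HDFS into a MySQL table, assuming the target table already exists and the export directory (here /user/alok/<tablename>, an illustrative path) holds the delimited files produced by an earlier import:

alok@ubuntu:~/apache/sqoop-1.4.2$ bin/sqoop export --connect jdbc:mysql://<hostname>:3306/<dbname> --username <user> -P --table <tablename> --export-dir /user/alok/<tablename> --num-mappers 2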