How to Import Data from MySQL to HDFS Using Sqoop
Abstract: 53 INFO mapreduce.Job//192.168.1.33/linoxide --table employee --username root -P --target-dir /sqoop_out -m 1 Enter the Password of your mysql when as
Apache Sqoop is a tool in Hadoop ecosystem which is used to import/export data between RDBMS and HDFS. This data is in structured format and has a schema. There are multiple cases where you want to analyze some data in your RDBMS, but due to huge size of data your RDBMS is not capable enough to process that big data. Hence, you can use Sqoop in such situations and send this data on HDFS where storing and processing big data is not a problem. In this blog, I will show you how to send data from MySQL to HDFS using Sqoop Import.
MySql to HDFS Using SqoopTo show this example, let me create a table in mysql which is on my windows machine and put some data in it.
Create a Database named 'linoxide', then create a Table named 'employee' by executing the below command:
How to insert data in PHP XAMPP Ser...To view this video please enable JavaScript, and consider upgrading to a web browser that supports HTML5 video
How to insert data in PHP XAMPP ServerCommand: create table employee ( id int, name varchar(20), dept varchar(20) );
Insert values in table employee and check the rows present in table employee by executing the below command:
Command: select * from employee;
Now, Open Command Prompt (CMD) on Windows and check the IPv4 Address of your LAN (most cases vmnet1) connection by executing the below command:
Find out the IPv4 address of your system using the above step. In my case it is 192.168.1.33
Grant all privileges to root@your_ipv4_address by executing the below command.
Command: grant all privileges on *.* to [email protected] IDENTIFIED BY ‘root’ WITH GRANT OPTION;
Download connector of MySQL on your machine where sqoop is installed using below links. This connector are needed to make connection between Sqoop and Mysql db.
MySQL connector : Download
Move mysql connector to the lib folder of sqoop by executing the below command:
Command: sudo mv Downloads/mysql-connector-java-5.1.26-bin.jar /home/hadoop/sqoop/lib
Now we are ready to run sqoop import command.
Required items for the command:
IPv4 Address – Your IPv4 address. In my case it is 192.168.1.33
Database Name – linoxide
Table Name – employee
Username – root
Output Directory – Could be any, I have used sqoop_out
Command: bin/sqoop import --connect jdbc:mysql://192.168.1.33/linoxide --table employee --username root -P --target-dir /sqoop_out -m 1
Enter the Password of your mysql when asked.
Import the table employee present in MySQL database to hdfs by executing the below command.
hadoop@hadoop-VirtualBox:~/sqoop$ bin/sqoop import --connect jdbc:mysql://192.168.
1.33/linoxide --table employee --username root -P --target-dir /sqoop_out -m 1
16/12/25 03:03:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
Enter password:
16/12/25 03:03:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming
resultset.
16/12/25 03:03:09 INFO tool.CodeGenTool: Beginning code generation
Sun Dec 25 03:03:10 IST 2016 WARN: Establishing SSL connection without server's
identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and
5.7.6+ requirements SSL connection must be established by default if explicit
option isn't set. For compliance with existing applications not using SSL the
verifyServerCertificate property is set to 'false'. You need either to explicitly
disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for
server certificate verification.
16/12/25 03:03:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`employee` AS t LIMIT 1
16/12/25 03:03:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`employee` AS t LIMIT 1
16/12/25 03:03:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/
hadoop-2.7.3
16/12/25 03:03:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/
compile/33bf897949f7b40e3cc1f5b5afae7cb3/employee.jar
16/12/25 03:03:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to
convertToNull (mysql)
16/12/25 03:03:21 INFO mapreduce.ImportJobBase: Beginning import of employee
16/12/25 03:03:23 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
16/12/25 03:03:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
16/12/25 03:03:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:
8032
Sun Dec 25 03:03:37 IST 2016 WARN: Establishing SSL connection without server's
identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and
5.7.6+ requirements SSL connection must be established by default if explicit
option isn't set. For compliance with existing applications not using SSL the
verifyServerCertificate property is set to 'false'. You need either to explicitly
disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for
server certificate verification.
16/12/25 03:03:37 INFO db.DBInputFormat: Using read commited transaction isolation
16/12/25 03:03:38 INFO mapreduce.JobSubmitter: number of splits:1
16/12/25 03:03:39 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1482614949106_0001
16/12/25 03:03:42 INFO impl.YarnClientImpl: Submitted application
application_1482614949106_0001
16/12/25 03:03:43 INFO mapreduce.Job: The url to track the job:
http://hadoop-VirtualBox:8088/proxy/application_1482614949106_0001/
16/12/25 03:03:43 INFO mapreduce.Job: Running job: job_1482614949106_0001
16/12/25 03:04:25 INFO mapreduce.Job: Job job_1482614949106_0001 running in
uber mode : false
16/12/25 03:04:25 INFO mapreduce.Job: map 0% reduce 0%
16/12/25 03:04:51 INFO mapreduce.Job: map 100% reduce 0%
16/12/25 03:04:52 INFO mapreduce.Job: Job job_1482614949106_0001 completed
successfully
16/12/25 03:04:53 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=127675
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=86
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=20382
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=20382
Total vcore-milliseconds taken by all map tasks=20382
Total megabyte-milliseconds taken by all map tasks=20871168
Map-Reduce Framework
Map input records=5
Map output records=5
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=210
CPU time spent (ms)=2780
Physical memory (bytes) snapshot=118214656
Virtual memory (bytes) snapshot=1904357376
Total committed heap usage (bytes)=54853632
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=86
16/12/25 03:04:53 INFO mapreduce.ImportJobBase: Transferred 86 bytes in 85.6488
seconds (1.0041 bytes/sec)
16/12/25 03:04:53 INFO mapreduce.ImportJobBase: Retrieved 5 records.
Once the job completes, check the output stored on HDFS
hadoop@hadoop-VirtualBox:~/sqoop$ hdfs dfs -cat /sqoop_out/part-m-00000
1,John,sales
2,Alex,Development
3,Jessica,Marketing
4,Luke,Operation
5,Lisa,Marketing
hadoop@hadoop-VirtualBox:~/sqoop$
Oracle to HDFS
Similarly, to send the data from Oracle database to HDFS, you will need to put oracle connector in sqoop's lib directory.
Oracle Connector : Download
Below are the required items for the command to import data from oracle to HDFS:
IPv4 Address – Your IPv4 address. In my case it is 192.168.1.33
Database Name – linoxide
Table Name – employee
Username – root
Output Directory – sqoop_out
Command - sudo bin/sqoop import –connect jdbc:oracle:thin:system/[email protected]:1521:xe --username system -P --table system.emp –columns 「ID」 –target-dir /sqoop_out-m 1
ConclusionCongratulations! You have successfully transferred data from MySQL to HDFS using Apache Sqoop. Similarly, you can transfer structured data from any RDBMS to HDFS that you want to process, make sure that you put the connectors in sqoop's lib directory before running import/export command.