Hive tables play an important role in the Hadoop ecosystem, as both Hadoop and Spark can integrate with Hive and process its tables.
Let's see how we can create a Hive table that internally stores its records in the Parquet format.
Storing a Hive table as a Parquet file with Snappy compression in the traditional Hive shell
- Create a Hive table called transaction and load it with records using the load command.
create table transaction(no int,tdate string,userno int,amt int,pro string,city string,pay string) row format delimited fields terminated by ',';
load data local inpath '/home/cloudera/online/hive/transactions' into table transaction;
- Create another Hive table named tran_snappy with Parquet as the storage format and Snappy as the compression codec.
create table tran_snappy(no int,tdate string,userno int,amt int,pro string,city string,pay string) stored as parquet tblproperties('parquet.compression' = 'SNAPPY');
- Populate the second table with the records selected from the first table.
insert into table tran_snappy select * from transaction;
- Go to the /user/hive/warehouse directory to check whether the table's data files are Snappy-compressed Parquet files; a quick check from the Hive shell is sketched below.
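As a sanity check from within the Hive shell itself, a describe formatted query should show parquet.compression set to SNAPPY among the table parameters, and the built-in dfs command lists the data files written under the table's directory. This is a minimal sketch, assuming the default warehouse location used above:
-- table parameters should include parquet.compression=SNAPPY
describe formatted tran_snappy;
-- list the data files generated for the table
dfs -ls /user/hive/warehouse/tran_snappy;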
Storing a Hive table as a Parquet file with Snappy compression in Spark SQL
1. Import the HiveContext in the Spark shell and create the Hive table that will hold the source records.
scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new HiveContext(sc)
scala> sqlContext.sql("create table transaction(no int,tdate string,userno int,amt int,pro string,city string,pay string) row format delimited fields terminated by ','")
2. Load the created table.
scala> sqlContext.sql("load data local inpath '/home/cloudera/online/hive/transactions' into table transaction")
3. Create a Snappy-compressed Parquet table.
scala> sqlContext.sql("create table tran_snappy(no int,tdate string,userno int,amt int,pro string,city string,pay string) stored as parquet tblproperties('parquet.compression' = 'SNAPPY')")
4. Insert into the new table the records selected from the table created in step 1.
scala> val records_tran = sqlContext.sql("select * from transaction")
scala> records_tran.insertInto("tran_snappy")
Now the records are inserted into the Snappy-compressed Hive table. Go to the /user/hive/warehouse directory to check whether the Parquet files were generated for the corresponding table; a quick verification from the Spark shell is sketched below.
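For a quick verification from the same Spark shell, the row count of the compressed table can be compared with the source table, and the generated Parquet files can be read back directly from the warehouse directory. This is a minimal sketch, assuming the default warehouse path and a Spark version (1.4 or later) where sqlContext.read.parquet is available:
// the row count should match the source transaction table
scala> sqlContext.sql("select count(*) from tran_snappy").collect().foreach(println)
// read the generated parquet files back and inspect the schema
scala> val parquet_check = sqlContext.read.parquet("/user/hive/warehouse/tran_snappy")
scala> parquet_check.printSchema()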