Partitioning in Hive - Demo

 


 Implement Partitioning in Hive




Partitioning in Hive means dividing a table into parts based on the values of one or more columns, such as date, course, city, or country. Each partition is stored as a separate subdirectory under the table's directory in HDFS.

Let's assume we have data on 10 million students studying at an institute, and we need to fetch the students of a particular course. With a traditional (non-partitioned) table, Hive has to read the entire dataset, which degrades performance. A better approach is to partition the table in Hive so that the data is divided into separate datasets based on a particular column, here the course. The advantage of partitioning is that, because the data is stored in slices, a query that filters on the partition column reads only the matching slices, so the query response time becomes faster.
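A minimal HiveQL sketch of the course example, assuming a hypothetical table `students_by_course` with illustrative columns `id` and `name`, and a hypothetical course value `'hadoop'`. It shows a table partitioned on `course` and a query that filters on that column, so Hive can skip every other course's subdirectory.

```sql
-- Hypothetical table: one partition (HDFS subdirectory) per course value.
CREATE TABLE IF NOT EXISTS students_by_course (
  id   INT,
  name STRING
)
PARTITIONED BY (course STRING);

-- Filtering on the partition column lets Hive prune partitions:
-- only the course=hadoop subdirectory is read, not every course's data.
SELECT id, name
FROM students_by_course
WHERE course = 'hadoop';

-- EXPLAIN DEPENDENCY lists the input partitions a query will read,
-- which is one way to confirm that pruning happened.
EXPLAIN DEPENDENCY
SELECT id, name FROM students_by_course WHERE course = 'hadoop';
```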


Types of Partitioning


There are two types of partitioning in Hive:

Static Partitioning

  • The values of the partition columns must be passed manually while loading data into the table.
  • Input data files are inserted individually into each partition of the table.

Dynamic Partitioning

  • A single insert into the partitioned table that creates all partitions in one go is known as dynamic partitioning; Hive takes the partition values from the data being inserted.
  • Usually, dynamic partitioning loads data from a non-partitioned table.
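A sketch of a dynamic-partition load, assuming two hypothetical tables: `student_staging`, an ordinary non-partitioned table that holds every row including `gender`, and `student_dyn`, partitioned on `gender`. The two `SET` properties are standard Hive settings; the table and column names are illustrative only.

```sql
-- Hypothetical non-partitioned source table holding all rows.
CREATE TABLE IF NOT EXISTS student_staging (
  id     INT,
  name   STRING,
  gender STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Hypothetical target table partitioned on gender.
CREATE TABLE IF NOT EXISTS student_dyn (
  id   INT,
  name STRING
)
PARTITIONED BY (gender STRING);

-- Enable dynamic partitioning; nonstrict mode allows every partition
-- column to be dynamic (strict mode requires at least one static one).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- One INSERT creates all partitions: for each row, Hive takes the
-- partition value from the last column of the SELECT (gender).
INSERT OVERWRITE TABLE student_dyn PARTITION (gender)
SELECT id, name, gender
FROM student_staging;
```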



Partitioning Demo - Static



Step 1. Create two files:

  • Stud_M (for male students)
  • Stud_F (for female students)




Step 2. Copy both files into HDFS.
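A minimal sketch of this step from the Hive CLI, which forwards `dfs` commands to the HDFS shell. The local paths `/tmp/Stud_M` and `/tmp/Stud_F` and the HDFS directory `/user/hive_demo/students` are hypothetical placeholders; the same operations can also be run with `hdfs dfs` from a regular shell.

```sql
-- Run inside the Hive CLI; `dfs` forwards each command to HDFS.
-- Hypothetical paths: adjust the local and HDFS locations to your setup.
dfs -mkdir -p /user/hive_demo/students;
dfs -put /tmp/Stud_M /user/hive_demo/students/;
dfs -put /tmp/Stud_F /user/hive_demo/students/;

-- Both files should now be listed in the HDFS directory.
dfs -ls /user/hive_demo/students;
```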



Step 3. Create a Hive table partitioned on gender.
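A possible table definition, assuming (hypothetically) that each line of Stud_M and Stud_F holds a comma-separated id and name; the table name `student_part` is illustrative. Note that `gender` appears only in `PARTITIONED BY`, not in the column list: Hive rejects a partition column that repeats a regular column, because the partition value is kept in the directory name rather than in the data files.

```sql
-- Hypothetical schema: id and name per line, comma-separated.
-- gender is declared only as a partition column.
CREATE TABLE IF NOT EXISTS student_part (
  id   INT,
  name STRING
)
PARTITIONED BY (gender STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```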



Step 4. Load Stud_M into the partition gender = m.
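A sketch of the static load for the male students' file, assuming the hypothetical `student_part` table (partitioned on `gender`) and the hypothetical HDFS path `/user/hive_demo/students/Stud_M`. The partition value is supplied by hand in the `PARTITION` clause, which is what makes this static partitioning.

```sql
-- Static partition: the value 'm' is given explicitly.
-- LOAD DATA INPATH moves (not copies) the HDFS file into the
-- partition directory; use LOAD DATA LOCAL INPATH for a local file.
LOAD DATA INPATH '/user/hive_demo/students/Stud_M'
INTO TABLE student_part
PARTITION (gender = 'm');
```

Hive does not inspect the file to check that every row really belongs to `gender = 'm'`; with static partitioning, loading the right file into the right partition is the user's responsibility.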



Step 5. Load Stud_F into the partition gender = f.
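The corresponding statement for the female students' file, again assuming the hypothetical `student_part` table and the hypothetical HDFS path `/user/hive_demo/students/Stud_F`:

```sql
-- Static partition: the value 'f' is given explicitly.
-- The HDFS file is moved into the gender=f partition directory.
LOAD DATA INPATH '/user/hive_demo/students/Stud_F'
INTO TABLE student_part
PARTITION (gender = 'f');
```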



Step 6. Validate the physical location of the files.
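One way to check the result, assuming the hypothetical `student_part` is a managed table in the `default` database and the warehouse directory is left at the common default `/user/hive/warehouse` (check `hive.metastore.warehouse.dir` on your cluster). Each partition should appear as its own `gender=<value>` subdirectory.

```sql
-- Partitions registered in the metastore.
SHOW PARTITIONS student_part;

-- HDFS location of a single partition (see the Location field).
DESCRIBE FORMATTED student_part PARTITION (gender = 'm');

-- Physical layout: expect subdirectories gender=f and gender=m.
-- Assumes the default warehouse path; adjust if yours differs.
dfs -ls /user/hive/warehouse/student_part;
```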




Step 7. Validate the data
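A few queries that could be used to validate the load, assuming the hypothetical `student_part` table from this demo. The `gender` column is returned in the results even though it is not stored in the data files; Hive fills it in from the partition directory.

```sql
-- All rows; gender comes from the partition directory name.
SELECT * FROM student_part;

-- Filtering on the partition column reads only the gender=f directory.
SELECT * FROM student_part WHERE gender = 'f';

-- Row count per partition.
SELECT gender, COUNT(*) AS students
FROM student_part
GROUP BY gender;
```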




