Partitioning in Hive - Demo


 Implement Partitioning in Hive

Partitioning in Hive means dividing a table into parts based on the values of a particular column, such as date, course, city, or country.

Let's assume we have data on 10 million students studying at an institute, and we need to fetch the students of a particular course. With a traditional approach, Hive has to read the entire dataset, which degrades performance. A better approach is to partition the table so that the data is divided into separate datasets based on the values of particular columns. Because the data is stored in slices, a query that filters on the partition column reads only the matching slices, so its response time is faster.

Types of Partitioning

There are two types of partitioning in Hive:

Static Partitioning

  • The values of the partition columns must be passed manually while loading data into the table.
  • Input data files are loaded individually, one into each partition of the table.

Dynamic Partitioning

  • A single insert into the partitioned table loads all partitions in one go; Hive derives each row's partition from the value of the partition column.
  • Usually, a dynamic partition load takes its data from a non-partitioned table.
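
As a rough illustration of a dynamic partition load, the sketch below writes a small HiveQL script and runs it with the Hive CLI. The table names student_raw (non-partitioned source) and student (target, partitioned on gender) and the columns id and name are assumptions made for this example, not taken from the demo screenshots. The two SET properties are the standard Hive settings that enable dynamic partitioning.

```shell
# Sketch only: student_raw, student, id and name are hypothetical names.
cat > dynamic_load.hql <<'EOF'
-- Allow Hive to create partitions from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column (gender) must come last in the SELECT list.
INSERT INTO TABLE student PARTITION (gender)
SELECT id, name, gender
FROM student_raw;
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f dynamic_load.hql
else
  echo "hive CLI not found; script written to dynamic_load.hql" >&2
fi
```

Note that no partition value appears in the PARTITION clause; that is what distinguishes this from a static load.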

Partitioning Demo - Static

Step 1. Create two files

  • Stud_M (for male students)
  • Stud_F (for female students)
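
The original file contents are not shown, so the following is only a sketch with made-up sample rows. It assumes each file holds comma-separated id and name values, and that gender is left out of the files because it will be supplied as the partition value at load time. The lowercase file names stud_m and stud_f follow the later steps.

```shell
# Hypothetical sample data: one "id,name" record per line.
printf '1,Amit\n2,Rahul\n3,Vikram\n' > stud_m   # male students
printf '4,Priya\n5,Neha\n' > stud_f             # female students

# Quick look at what was written.
cat stud_m stud_f
```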

Step 2. Copy both files to HDFS
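
A minimal sketch of the copy, assuming the two files are in the current local directory. The HDFS directory /user/cloudera/student_demo is a hypothetical staging location chosen for this example; substitute a path that exists on your cluster.

```shell
# Hypothetical HDFS staging directory for the demo files.
HDFS_DIR=/user/cloudera/student_demo

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$HDFS_DIR"                 # create the directory if missing
  hdfs dfs -put -f stud_m stud_f "$HDFS_DIR"/    # copy both local files, overwriting old copies
  hdfs dfs -ls "$HDFS_DIR"                       # confirm both files arrived
else
  echo "hdfs CLI not found; skipping copy" >&2
fi
```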

Step 3. Create a Hive table partitioned on gender
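
The table definition might look like the sketch below. The table name student, the columns id and name, and the comma delimiter are assumptions for illustration. The part that matters for this step is the PARTITIONED BY clause, which declares gender as a partition column rather than a regular column.

```shell
# Sketch only: table and column names are hypothetical.
cat > create_student.hql <<'EOF'
CREATE TABLE IF NOT EXISTS student (
  id   INT,
  name STRING
)
PARTITIONED BY (gender STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f create_student.hql
else
  echo "hive CLI not found; script written to create_student.hql" >&2
fi
```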

Step 4. Load stud_m into the partition gender = m

Step 5. Load stud_f into the partition gender = f
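
Steps 4 and 5 can be sketched together. The table name student and the HDFS path /user/cloudera/student_demo are assumptions for this example. Each LOAD DATA statement names its partition value explicitly, which is what makes this a static partition load.

```shell
# Sketch only: the table name and HDFS paths are hypothetical.
cat > load_student.hql <<'EOF'
-- LOAD DATA INPATH moves the HDFS file into the named partition's directory.
LOAD DATA INPATH '/user/cloudera/student_demo/stud_m'
INTO TABLE student PARTITION (gender = 'm');

LOAD DATA INPATH '/user/cloudera/student_demo/stud_f'
INTO TABLE student PARTITION (gender = 'f');
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f load_student.hql
else
  echo "hive CLI not found; script written to load_student.hql" >&2
fi
```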

Step 6. Validate the physical location of the files
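
One way to check the layout is to list the table's directory recursively. The sketch assumes a table named student in the default database and the default warehouse location /user/hive/warehouse; both are assumptions, and your cluster may be configured differently. Each partition should appear as its own subdirectory, such as gender=m and gender=f.

```shell
# Assumed default warehouse directory and hypothetical table name.
WAREHOUSE_DIR=/user/hive/warehouse
TABLE_NAME=student

if command -v hdfs >/dev/null 2>&1; then
  # Recursive listing: one subdirectory per partition value.
  hdfs dfs -ls -R "$WAREHOUSE_DIR/$TABLE_NAME"
else
  echo "hdfs CLI not found; skipping listing" >&2
fi
```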

Step 7. Validate the data
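
A possible set of validation queries is sketched below, again assuming a table named student. SHOW PARTITIONS lists the partitions Hive knows about, and the filtered SELECT reads only the files under the matching partition.

```shell
# Sketch only: the table name is hypothetical.
cat > validate_student.hql <<'EOF'
-- List the partitions registered for the table.
SHOW PARTITIONS student;

-- Read a single partition; gender behaves like a column in queries.
SELECT * FROM student WHERE gender = 'm';

-- Count the rows in each partition.
SELECT gender, COUNT(*) FROM student GROUP BY gender;
EOF

# Run the script only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f validate_student.hql
else
  echo "hive CLI not found; script written to validate_student.hql" >&2
fi
```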
