Hive Lab Case Study


Prepare Data Set

  • I downloaded the data set from Kaggle
  • File name: sales data-set.csv
  • You can download a data set of your choice

Copy data from Windows to Linux using FileZilla

  • Copied the data from the Windows drive to the Linux system using the FileZilla UI

Copy data from Linux to HDFS

  • Create multiple copies of the file in HDFS from Linux to try different operations in Hive
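
One way to make the copies without leaving Hive is the `dfs` command of the Hive CLI, which passes its arguments through to the HDFS shell. This is only a sketch: the local path `/home/user/sales_dataset.csv` and the name `salesdataset_copy` are hypothetical, and the local file is assumed to have been renamed without the space. The same `-put` arguments also work with `hdfs dfs` from the Linux prompt.

```sql
-- Run inside the Hive CLI. The local path below is a hypothetical example.
-- Relative HDFS paths resolve against the current user's HDFS home directory.
dfs -put /home/user/sales_dataset.csv salesdataset;

-- A second copy to experiment with (hypothetical name).
dfs -put /home/user/sales_dataset.csv salesdataset_copy;

-- Confirm that both copies arrived.
dfs -ls salesdataset;
dfs -ls salesdataset_copy;
```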

[Screenshot: -put command (Hortonworks/Cloudera) moving data from Linux to HDFS]

Create Table and Load data in Hive

  • create table sales(store int, dept int, `date` date, weekly_sales float, isholiday boolean) row format delimited fields terminated by ',';
  • load data inpath 'salesdataset' into table sales;
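
Note that `load data inpath` without `local` moves the file out of its HDFS source path into the table's directory, which is one reason to keep extra copies in HDFS. A quick check after loading might look like the sketch below. The header handling is an assumption: if the CSV starts with a header line, it loads as a row of mostly NULL values unless it is skipped.

```sql
-- Show the columns and types Hive recorded for the table.
DESCRIBE sales;

-- Peek at a few rows; NULLs in a column suggest a type or format mismatch.
SELECT * FROM sales LIMIT 5;

-- Optional, only if the file begins with a header line (assumption):
-- ALTER TABLE sales SET TBLPROPERTIES ('skip.header.line.count'='1');
```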

[Screenshot: Hive create table and data load]

Queries to try now:

  • Find total rows in table

  • Find the dept with the maximum weekly sale for a store

  • Which store had the maximum weekly sale in a week

  • Highest average weekly sale of a dept in a store

  • Find the dept of store 1 that has the 3rd highest max weekly sale

  • Which dept in each store is performing worst

  • Rank the depts in each store by their sales
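
Possible starting points for the queries above, written against the `sales` table as created earlier. These are sketches under stated assumptions, not the only valid answers: "worst" and "rank by sales" are read as total weekly sales per dept, "max weekly sale in a week" as the sum over all depts of a store for one date, and store 1 is used wherever a single store is needed. The window functions need Hive 0.11 or later.

```sql
-- Total rows in the table.
SELECT COUNT(*) AS total_rows FROM sales;

-- Dept with the maximum weekly sale for one store (store 1 as an example).
SELECT dept, weekly_sales
FROM sales
WHERE store = 1
ORDER BY weekly_sales DESC
LIMIT 1;

-- Store with the maximum sale in a single week, summed over its depts.
SELECT store, `date`, SUM(weekly_sales) AS store_week_sales
FROM sales
GROUP BY store, `date`
ORDER BY store_week_sales DESC
LIMIT 1;

-- Highest average weekly sale of a dept in a store.
SELECT store, dept, AVG(weekly_sales) AS avg_weekly_sales
FROM sales
GROUP BY store, dept
ORDER BY avg_weekly_sales DESC
LIMIT 1;

-- Dept of store 1 with the 3rd highest max weekly sale.
-- DENSE_RANK keeps ties from skipping rank 3.
SELECT dept, max_weekly_sales
FROM (
  SELECT dept,
         max_weekly_sales,
         DENSE_RANK() OVER (ORDER BY max_weekly_sales DESC) AS rnk
  FROM (
    SELECT dept, MAX(weekly_sales) AS max_weekly_sales
    FROM sales
    WHERE store = 1
    GROUP BY dept
  ) m
) r
WHERE rnk = 3;

-- Worst performing dept in each store (lowest total sales).
SELECT store, dept, total_sales
FROM (
  SELECT store, dept, total_sales,
         RANK() OVER (PARTITION BY store ORDER BY total_sales ASC) AS rnk
  FROM (
    SELECT store, dept, SUM(weekly_sales) AS total_sales
    FROM sales
    GROUP BY store, dept
  ) t
) r
WHERE rnk = 1;

-- Rank the depts in each store by total sales, best first.
SELECT store, dept, total_sales,
       RANK() OVER (PARTITION BY store ORDER BY total_sales DESC) AS sales_rank
FROM (
  SELECT store, dept, SUM(weekly_sales) AS total_sales
  FROM sales
  GROUP BY store, dept
) t;
```

Swapping SUM for AVG or MAX in the inner queries changes what "performance" means, so it is worth trying each and comparing the results.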

