Hive Lab Case Study

 




Prepare Data Set


  • I downloaded the data set from Kaggle:
  • https://www.kaggle.com/manjeetsingh/retaildataset
  • File name: sales data-set.csv
  • You can use a dataset of your choice instead




Copy data from Windows to Linux using FileZilla

  • Copied the data from the Windows drive to the Linux system using the FileZilla UI


Copy data from Linux to HDFS


  • Create multiple copies in HDFS so you can try different operations in Hive (load data inpath moves the source file into the table's directory, so each load consumes one copy)

Screenshot: the -put command moving data from Linux to HDFS (Hortonworks/Cloudera)
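
A minimal sketch of this step using the Hive CLI's built-in dfs command, which hands its arguments to the HDFS shell. The local path /home/user/sales.csv and the copy name salesdataset_copy are hypothetical, and the sketch assumes the downloaded file was renamed so that its name contains no space. The target name salesdataset is the relative HDFS path used by the load statement in the next section; relative paths resolve against the user's HDFS home directory.

```sql
-- Run inside the Hive CLI. All paths here are illustrative assumptions.

-- Upload the CSV from the local Linux filesystem into the HDFS home directory.
dfs -put /home/user/sales.csv salesdataset;

-- Keep a spare copy in HDFS for further experiments.
dfs -cp salesdataset salesdataset_copy;

-- List the HDFS home directory to confirm both files exist.
dfs -ls;
```

The same -put, -cp and -ls operations are available from the Linux shell through the HDFS command-line client if you prefer to work outside Hive.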


Create Table and Load data in Hive

  • create table sales(store int, dept int, `date` date, weekly_sales float, isholiday boolean) row format delimited fields terminated by ',';
  • load data inpath 'salesdataset' into table sales;

Screenshot: Hive create table and data load
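
After loading, it may be worth checking that the rows look right. The sketch below assumes the CSV begins with a header line, which is common for downloaded data sets but should be verified for your file. It also relies on the fact that Hive's date type only parses text in yyyy-MM-dd form, so values in any other format are read as NULL.

```sql
-- Assumption: the first line of the file is a column header.
-- Tell Hive to skip it whenever the table is read.
alter table sales set tblproperties ("skip.header.line.count"="1");

-- Look at a few rows to confirm the columns line up.
select * from sales limit 5;

-- Count rows whose date failed to parse (text not in yyyy-MM-dd form becomes NULL).
select count(*) as unparsed_dates from sales where `date` is null;
```

If the last count is high, one option is to declare the column as string and convert it inside the queries.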




Queries to try now:




  • Find the total number of rows in the table

  • Find the dept with the maximum weekly sale for a store

  • Which store had the maximum weekly sale in a week

  • Highest average weekly sale of a dept in a store

  • Find the dept of store 1 that has the 3rd-highest maximum weekly sale

  • Which dept in each store is performing worst

  • Rank the depts in each store by their sales
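
One possible set of answers is sketched below, against the sales table created earlier. Several of the questions are open to interpretation, so the readings used here are assumptions: a store's weekly sale is taken as the sum over its depts for one date, "performing worst" as the lowest total sales, and the final ranking as a ranking by total sales. Aliases such as total_sales and rnk are made up for illustration.

```sql
-- 1. Total rows in the table.
select count(*) as total_rows from sales;

-- 2. Dept with the maximum single weekly sale in each store.
select store, dept, weekly_sales
from (
  select store, dept, weekly_sales,
         rank() over (partition by store order by weekly_sales desc) as rnk
  from sales
) t
where rnk = 1;

-- 3. Store with the highest sales in a single week (summed over its depts).
select store, `date`, sum(weekly_sales) as store_weekly_sales
from sales
group by store, `date`
order by store_weekly_sales desc
limit 1;

-- 4. Store/dept pair with the highest average weekly sale.
select store, dept, avg(weekly_sales) as avg_weekly_sales
from sales
group by store, dept
order by avg_weekly_sales desc
limit 1;

-- 5. Dept of store 1 with the 3rd-highest maximum weekly sale.
select dept, max_weekly_sales
from (
  select dept, max_weekly_sales,
         dense_rank() over (order by max_weekly_sales desc) as rnk
  from (
    select dept, max(weekly_sales) as max_weekly_sales
    from sales
    where store = 1
    group by dept
  ) d
) r
where rnk = 3;

-- 6. Worst-performing dept in each store (lowest total sales).
select store, dept, total_sales
from (
  select store, dept, total_sales,
         rank() over (partition by store order by total_sales asc) as rnk
  from (
    select store, dept, sum(weekly_sales) as total_sales
    from sales
    group by store, dept
  ) s
) r
where rnk = 1;

-- 7. Rank the depts within each store by total sales.
select store, dept, total_sales,
       rank() over (partition by store order by total_sales desc) as sales_rank
from (
  select store, dept, sum(weekly_sales) as total_sales
  from sales
  group by store, dept
) s;
```

rank() returns every tied row at a given position, while dense_rank() in query 5 keeps "3rd highest" meaning the third distinct value even when higher values tie.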

