Hive - Lab - Case Study
Prepare Data Set
- I downloaded the data set from Kaggle:
- https://www.kaggle.com/manjeetsingh/retaildataset
- File name: sales data-set.csv
- You can use any other data set of your choice
Copy Data from Windows to Linux Using FileZilla
Copied the file from the Windows drive to the Linux system through the FileZilla UI.
Copy data from Linux to HDFS
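- Put multiple copies of the file from Linux into HDFS so you can try different operations in Hive. LOAD DATA INPATH moves (rather than copies) the HDFS file into the table's directory, so each load uses up one copy.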
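A small sketch of this step, assuming the CSV already sits in the current Linux directory under its original name and that the `hdfs` client is on the PATH. The names `salesdataset2` and `salesdataset3` are hypothetical; only `salesdataset` matches the `load data inpath 'salesdataset'` statement used in this lab. Relative HDFS targets resolve to the user's HDFS home directory (`/user/<username>`).

```python
import shlex
import subprocess

LOCAL_FILE = "sales data-set.csv"
# Hypothetical HDFS names: one copy per Hive load, since LOAD DATA INPATH moves the file.
HDFS_NAMES = ["salesdataset", "salesdataset2", "salesdataset3"]


def put_commands(local_file: str, hdfs_names: list[str]) -> list[list[str]]:
    """Build one 'hdfs dfs -put' argument list per copy."""
    return [["hdfs", "dfs", "-put", local_file, name] for name in hdfs_names]


def run_all(commands: list[list[str]]) -> None:
    """Run each command, stopping at the first failure (requires a configured hdfs client)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = put_commands(LOCAL_FILE, HDFS_NAMES)
for cmd in commands:
    # shlex.join quotes the file name, which contains a space.
    print(shlex.join(cmd))
# run_all(commands)  # uncomment on a machine with HDFS access
```

Passing an argument list instead of a shell string avoids quoting problems with the space in `sales data-set.csv`.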
Create Table and Load data in Hive
- create table sales(store int, dept int, `date` date, weekly_sales float, isholiday boolean) row format delimited fields terminated by ',';
- load data inpath 'salesdataset' into table sales;
Queries to try now:
Find the total number of rows in the table
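A minimal sketch of this query, runnable without a cluster: it executes the SQL against an in-memory SQLite table with the same columns as the Hive `sales` table, filled with a dozen made-up rows (not the Kaggle data). SQLite is only a stand-in; the `SELECT COUNT(*) FROM sales` string is what you would type in the Hive shell.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


QUERY = "SELECT COUNT(*) FROM sales"

conn = make_sales_table()
total_rows = conn.execute(QUERY).fetchone()[0]
conn.close()
print(total_rows)  # 12 for the sample rows above
```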
Find the dept with the maximum weekly sale in each store
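A sketch assuming "maximum weekly sale" means the single largest `weekly_sales` row in each store. It runs against an in-memory SQLite table with made-up rows standing in for the Hive table; the query itself is plain SQL of the kind Hive also accepts. `RANK()` keeps ties, so a store whose top value is shared by two depts returns both. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Rank every row within its store by weekly_sales, then keep rank 1.
QUERY = """
SELECT store, dept, weekly_sales
FROM (
    SELECT store, dept, weekly_sales,
           RANK() OVER (PARTITION BY store ORDER BY weekly_sales DESC) AS rnk
    FROM sales
) ranked
WHERE rnk = 1
ORDER BY store
"""

conn = make_sales_table()
top_depts = conn.execute(QUERY).fetchall()
conn.close()
print(top_depts)  # [(1, 1, 300.0), (2, 1, 700.0)]
```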
Find which store had the maximum weekly sale in each week
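A sketch assuming a store's sale for a week is the sum of `weekly_sales` over all of its depts for that `date`. An in-memory SQLite table with made-up rows stands in for the Hive table. `date` is back-quoted because it is also a type name; Hive and SQLite both accept back-quoted identifiers.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Total each store per week, rank stores within each week, keep the top one.
QUERY = """
SELECT `date`, store, total_sales
FROM (
    SELECT `date`, store, total_sales,
           RANK() OVER (PARTITION BY `date` ORDER BY total_sales DESC) AS rnk
    FROM (
        SELECT `date`, store, SUM(weekly_sales) AS total_sales
        FROM sales
        GROUP BY `date`, store
    ) weekly
) ranked
WHERE rnk = 1
ORDER BY `date`
"""

conn = make_sales_table()
best_store_per_week = conn.execute(QUERY).fetchall()
conn.close()
print(best_store_per_week)  # [('2010-02-05', 2, 820.0), ('2010-02-12', 1, 750.0)]
```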
Find the dept with the highest average weekly sale in each store
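A sketch that averages `weekly_sales` per (store, dept) pair and keeps the top-ranked dept of each store. The in-memory SQLite table and its rows are made-up stand-ins for the Hive table and the Kaggle data.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Average per dept, rank depts within each store by that average, keep rank 1.
QUERY = """
SELECT store, dept, avg_sales
FROM (
    SELECT store, dept, avg_sales,
           RANK() OVER (PARTITION BY store ORDER BY avg_sales DESC) AS rnk
    FROM (
        SELECT store, dept, AVG(weekly_sales) AS avg_sales
        FROM sales
        GROUP BY store, dept
    ) per_dept
) ranked
WHERE rnk = 1
ORDER BY store
"""

conn = make_sales_table()
best_avg_depts = conn.execute(QUERY).fetchall()
conn.close()
print(best_avg_depts)  # [(1, 4, 210.0), (2, 1, 400.0)]
```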
Find the dept of store 1 with the 3rd highest maximum weekly sale
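A sketch assuming "maximum weekly sale" is each dept's largest single `weekly_sales` value in store 1. Using `DENSE_RANK()` is a design choice: unlike `RANK()`, it does not skip numbers after a tie, so `rnk = 3` always means the third distinct value. An in-memory SQLite table with made-up rows stands in for the Hive table.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Max per dept in store 1, dense-rank those maxima, keep the 3rd distinct value.
QUERY = """
SELECT dept, max_sales
FROM (
    SELECT dept, max_sales,
           DENSE_RANK() OVER (ORDER BY max_sales DESC) AS rnk
    FROM (
        SELECT dept, MAX(weekly_sales) AS max_sales
        FROM sales
        WHERE store = 1
        GROUP BY dept
    ) per_dept
) ranked
WHERE rnk = 3
"""

conn = make_sales_table()
third_highest = conn.execute(QUERY).fetchall()
conn.close()
print(third_highest)  # [(4, 220.0)]
```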
Find the worst-performing dept in each store
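The question does not define "worst"; this sketch assumes it means the lowest total `weekly_sales` per dept within a store. An in-memory SQLite table with made-up rows stands in for the Hive table.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Total per dept, rank ascending within each store, keep the lowest total.
QUERY = """
SELECT store, dept, total_sales
FROM (
    SELECT store, dept, total_sales,
           RANK() OVER (PARTITION BY store ORDER BY total_sales ASC) AS rnk
    FROM (
        SELECT store, dept, SUM(weekly_sales) AS total_sales
        FROM sales
        GROUP BY store, dept
    ) per_dept
) ranked
WHERE rnk = 1
ORDER BY store
"""

conn = make_sales_table()
worst_depts = conn.execute(QUERY).fetchall()
conn.close()
print(worst_depts)  # [(1, 3, 130.0), (2, 2, 250.0)]
```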
Rank the depts in each store by their sales
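A sketch that ranks depts within each store by total `weekly_sales`, highest first. `RANK()` gives tied depts the same rank and then skips ahead (2, 2, 4); swapping in `DENSE_RANK()` would give 2, 2, 3 instead. An in-memory SQLite table with made-up rows stands in for the Hive table.

```python
import sqlite3

# Made-up sample rows (store, dept, date, weekly_sales, isholiday); NOT the Kaggle data.
ROWS = [
    (1, 1, "2010-02-05", 100.0, False), (1, 1, "2010-02-12", 300.0, False),
    (1, 2, "2010-02-05", 250.0, False), (1, 2, "2010-02-12", 150.0, False),
    (1, 3, "2010-02-05", 50.0, False), (1, 3, "2010-02-12", 80.0, False),
    (1, 4, "2010-02-05", 200.0, False), (1, 4, "2010-02-12", 220.0, False),
    (2, 1, "2010-02-05", 700.0, False), (2, 1, "2010-02-12", 100.0, False),
    (2, 2, "2010-02-05", 120.0, False), (2, 2, "2010-02-12", 130.0, False),
]


def make_sales_table() -> sqlite3.Connection:
    """Create an in-memory SQLite stand-in for the Hive 'sales' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (store INT, dept INT, `date` DATE, "
        "weekly_sales FLOAT, isholiday BOOLEAN)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# Total per dept, then rank within each store; dept breaks ties in display order.
QUERY = """
SELECT store, dept, total_sales,
       RANK() OVER (PARTITION BY store ORDER BY total_sales DESC) AS sales_rank
FROM (
    SELECT store, dept, SUM(weekly_sales) AS total_sales
    FROM sales
    GROUP BY store, dept
) per_dept
ORDER BY store, sales_rank, dept
"""

conn = make_sales_table()
dept_ranks = conn.execute(QUERY).fetchall()
conn.close()
for row in dept_ranks:
    print(row)
```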