Pig Lab - Defining Schema Data Types in PIG ETL

 



DEFINE A SCHEMA


Continue with the whitehouse dataset.

Step 1: Creating Schema for a relation


  • Load the whitehouse data again, but this time use the PigStorage loader and define a partial schema that names only the first seven fields.
    • Command: grunt> B = LOAD '/user/root/whitehouse/visits.txt' USING PigStorage(',') AS (lname:chararray, fname:chararray, mname:chararray, id:chararray, status:chararray, state:chararray, arrival:chararray);




  • Use the DESCRIBE command to view the schema of B.
    • Command: grunt> DESCRIBE B;
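
For reference, the schema that DESCRIBE prints for B should resemble the first comment below. The second relation is an illustration only (the alias C is hypothetical and not part of the lab): it sketches what happens when a field is declared without a type, in which case Pig falls back to its default type, bytearray.

```pig
DESCRIBE B;
-- Expected shape of the output:
-- B: {lname: chararray,fname: chararray,mname: chararray,id: chararray,status: chararray,state: chararray,arrival: chararray}

-- Illustration only: a field name with no type defaults to bytearray
C = LOAD '/user/root/whitehouse/visits.txt' USING PigStorage(',') AS (lname, fname:chararray);
DESCRIBE C;
-- C: {lname: bytearray,fname: chararray}
```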









Step 2: THE STORE COMMAND



  • Enter the following STORE command, which stores the B relation in a folder named whouse_tab and separates the fields of each record with tabs.
  • Command: grunt> store B into 'whouse_tab' USING PigStorage('\t');



  • Verify that the whouse_tab folder was created.
  • Command: grunt> ls whouse_tab;

  • You should see two map output files.
  • View one of the output files to verify they contain the B relation in tab-delimited format.
  • Command: grunt> cat whouse_tab/part-m-00000;

  • Each record should contain 7 fields.
  • What happened to the rest of the fields in the data that was loaded from whitehouse/visits.txt?
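
One way to investigate that question is to load the same file without an AS clause, so that no fields are named, and compare a few rows with B. This is a minimal sketch; the aliases all_fields, few_all and few_b are illustrative names, not part of the lab.

```pig
-- Load the same file with no schema, so every field in the file is kept
all_fields = LOAD '/user/root/whitehouse/visits.txt' USING PigStorage(',');
few_all = LIMIT all_fields 3;
DUMP few_all;   -- each row shows all comma-separated fields

-- A few rows through the seven-field schema defined for B
few_b = LIMIT B 3;
DUMP few_b;     -- each row shows only the fields named in the AS clause
```

LIMIT does not guarantee which rows are returned, so the two samples may not show the same visitors; the point of the comparison is the number of fields per row.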

 

Step 3: USE A DIFFERENT STORER



  • We are going to store the same relation as in the previous step, but this time in JSON format.
  • Command: grunt> STORE B INTO 'whouse_json' USING JsonStorage();






  • Verify that the whouse_json folder was created.
  • Command: grunt> ls whouse_json;

  • View one of the output files.
  • Command: grunt> cat whouse_json/part-m-00000;

  • Notice that the schema you defined for the B relation was used to create the format of each JSON record: the field names from the schema appear in the output.
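
To check that the stored JSON carries the same structure, the folder can be read back with Pig's built-in JsonLoader. This is a sketch under the assumption that the output folder is named whouse_json; the aliases J and few_j are hypothetical, and the schema string simply mirrors the one used for B.

```pig
-- Read the JSON output back, passing the same field names and types as B
J = LOAD 'whouse_json' USING JsonLoader('lname:chararray, fname:chararray, mname:chararray, id:chararray, status:chararray, state:chararray, arrival:chararray');
DESCRIBE J;       -- should list the same seven chararray fields as B

few_j = LIMIT J 3;
DUMP few_j;       -- prints a few records as ordinary Pig tuples
```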






