DEFINE A SCHEMA
Continuing with the whitehouse dataset
Step 1: CREATE A SCHEMA FOR A RELATION
- Load the whitehouse data again, but this time use the PigStorage loader and define a partial schema.
- Command: grunt> B = LOAD '/user/root/whitehouse/visits.txt' USING PigStorage(',') AS (lname:chararray, fname:chararray, mname:chararray, id:chararray, status:chararray, state:chararray, arrival:chararray);
- Use the DESCRIBE command to view the schema.
- Command: grunt> DESCRIBE B;
Step 2: THE STORE COMMAND
- Enter the following STORE command, which stores the B relation in a folder named whouse_tab and separates the fields of each record with tabs.
- Command: grunt> store B into 'whouse_tab' USING PigStorage('\t');
- Verify that the whouse_tab folder was created.
- Command: grunt> ls whouse_tab;
- You should see two map output files.
- View one of the output files to verify they contain the B relation in tab-delimited format.
- Command: grunt> cat whouse_tab/part-m-00000;
- Each record should contain 7 fields.
- What happened to the rest of the fields in the data that was loaded from whitehouse/visits.txt?
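One way to reason about the question above: when a LOAD schema names fewer fields than a row contains, Pig keeps the named fields and drops the extras, so only those 7 fields ever reach the STORE. The Python sketch below imitates that behaviour outside of Pig. It is an illustration only, not Pig's implementation; the sample row and the helper names `apply_partial_schema` and `to_tab_delimited` are made up for this example.

```python
# Field names copied from the AS clause in Step 1.
SCHEMA = ["lname", "fname", "mname", "id", "status", "state", "arrival"]


def apply_partial_schema(line, schema=SCHEMA, delimiter=","):
    """Split a delimited line and keep exactly as many fields as the schema names."""
    fields = line.split(delimiter)
    # Long rows are truncated; short rows are padded with None (Pig's null).
    kept = fields[:len(schema)]
    return kept + [None] * (len(schema) - len(kept))


def to_tab_delimited(fields):
    """Join fields with tabs, writing missing values as empty strings."""
    return "\t".join("" if f is None else f for f in fields)


# Hypothetical row with 9 comma-separated values; not taken from visits.txt.
sample = "DOE,JANE,Q,U00001,VA,X,1/1/2009 10:00,EXTRA1,EXTRA2"
row = apply_partial_schema(sample)
print(len(row))               # number of fields kept
print(to_tab_delimited(row))  # the line as it would look in tab-delimited form
```

The two trailing values in the sample row never appear in the output, which matches what you see in the part-m files: the fields beyond the schema are gone.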
Step 3: USE A DIFFERENT STORER
- Store the same relation as in the previous step, but this time in JSON format.
- Command: grunt> STORE B INTO 'whouse_json' USING JsonStorage();
- Verify the whouse_json folder is created
- Command: grunt> ls whouse_json
- View one of the output files
- Command: grunt> cat whouse_json/part-m-00000;
- Notice that the schema you defined for the B relation was used to create the format of each JSON record.
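To make the link between the schema and the JSON output concrete, the sketch below pairs each schema field name with the value in the same position and serializes the result as one JSON object per record. It is a rough Python imitation of the shape of the output, not the JsonStorage implementation, and the exact formatting of the real files may differ. The sample values and the helper name `to_json_record` are hypothetical.

```python
import json

# Field names copied from the AS clause in Step 1.
SCHEMA = ["lname", "fname", "mname", "id", "status", "state", "arrival"]


def to_json_record(fields, schema=SCHEMA):
    """Pair each schema name with the value in the same position and emit one JSON object."""
    return json.dumps(dict(zip(schema, fields)), separators=(",", ":"))


# Hypothetical record; not taken from visits.txt.
fields = ["DOE", "JANE", "Q", "U00001", "VA", "X", "1/1/2009 10:00"]
record = to_json_record(fields)
print(record)
```

Each key in the printed object is a field name from the schema, which is why renaming a field in the AS clause would change the keys in the stored JSON.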