HDFS Lab - Beginners

 


Command to Dump Data from Linux to HDFS



I have created a file stocks.csv in the Linux folder "/root/Labs/demos" using the vi editor. This file contains CSV data. We can use the put command to move the data into HDFS, but several other options are available as well; a couple of alternatives are shown below.
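As a quick illustration (we will stick with put for the rest of this lab, and the destination path here is just an example), copyFromLocal behaves like put, while moveFromLocal deletes the local copy once the upload succeeds:

Command: hadoop fs -copyFromLocal stocks.csv stocks.csv
Command: hadoop fs -moveFromLocal stocks.csv stocks.csv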



Step 1. Upload File to HDFS


1. Try putting this file into HDFS with a block size of 30 bytes, using the command below.

2. Command: hadoop fs -D dfs.blocksize=30  -put stocks.csv  stocks.csv
Notice: 30 bytes is not a valid block size. The block size needs to be at least 1048576 bytes, according to the dfs.namenode.fs-limits.min-block-size property.
Error: put: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 30 < 1048576

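If you want to confirm this minimum on your own cluster, the hdfs getconf command can print any configuration key (this assumes the default value of 1048576 has not been overridden in your hdfs-site.xml):

Command: hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size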




 
3. Try the put again, but this time use a block size of 2,000,000.

4. Command: hadoop fs -D dfs.blocksize=2000000 -put stocks.csv  stocks.csv
Error:  put: io.bytes.per.checksum(512) and blockSize(2000000) do not match. blockSize should be a multiple of io.bytes.per.checksum



Notice: 2,000,000 is not a valid block size because it is not a multiple of 512 (the checksum size), as the quick arithmetic below shows.
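A quick check makes this concrete: 2,000,000 / 512 = 3906.25, which is not a whole number, so the value is rejected; 1,048,576 / 512 = 2048 exactly, so the value used in the next step is accepted.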
 

5. Try the put again, but this time use a block size of 1,048,576.

6. Command: hadoop fs -D dfs.blocksize=1048576 -put stocks.csv  stocks.csv

7. There is no output; the prompt simply returns, which means your data has been copied from the Linux VM into HDFS.


8. Now, to verify that the data is actually stored in HDFS, we will use the "ls" command.

9. Command: hadoop fs -ls

10. Below is the output:

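As a rough illustration (the permissions, owner, group, replication factor, and timestamp below are examples only; the size matches the stocks.csv used in this lab), the listing will look something like this:

Found 1 items
-rw-r--r--   1 root hdfs    3613198 2014-04-21 14:02 stocks.csv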



11. Do I really need to specify the block size every time? No, we don't need to. Just use the command below (see the note that follows it):
  • Command: hadoop fs -put  stocks.csv  stocks1.csv
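When no block size is given, HDFS falls back to the dfs.blocksize value from the cluster configuration (commonly 128 MB, i.e. 134217728 bytes, on Hadoop 2.x). You can check the value the same way as before:

Command: hdfs getconf -confKey dfs.blocksize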
 
 
 

Step 2. View the Number of Blocks

 

1. Run the command below to view the number of blocks created for our file stocks.csv.

2. Command: hdfs fsck  /user/root/stocks.csv


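As a rough illustration (trimmed to the relevant lines; the totals shown match the stocks.csv used in this lab), the fsck summary will look something like this:

 Total size:    3613198 B
 Total files:   1
 Total blocks (validated):      4 (avg. block size 903299 B)
The filesystem under path '/user/root/stocks.csv' is HEALTHY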


3. Notice that the file has 4 blocks, with an average block size of 903299 bytes.

 

                    

 

Step 3. Find Actual Blocks

 

1. Enter the fsck command as before, this time with the -files and -blocks options.

2. Command: hdfs fsck  /user/root/stocks.csv  -files -blocks

3. The output contains the block IDs, which are also the names of the block files stored on the DataNodes.

               

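Again as an illustration only (the block pool ID below matches this lab's VM and blk_1073742589 is the block inspected later; the remaining block IDs and the generation stamps are made-up examples), the block list looks something like this:

/user/root/stocks.csv 3613198 bytes, 4 block(s):  OK
0. BP-1200952396-10.0.2.15-1398089695400:blk_1073742589_1765 len=1048576
1. BP-1200952396-10.0.2.15-1398089695400:blk_1073742590_1766 len=1048576
2. BP-1200952396-10.0.2.15-1398089695400:blk_1073742591_1767 len=1048576
3. BP-1200952396-10.0.2.15-1398089695400:blk_1073742592_1768 len=467470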



(Note: the IP address on your system may be different.)



4. Now, in FileZilla on the Linux VM, browse to the path "/hadoop/hdfs/data/current/BP-1200952396-10.0.2.15-1398089695400/current/finalized/subdir54".

 


 

5. Now we will view the content of a block using the tail command.

6. Command: tail /hadoop/hdfs/data/current/BP-1200952396-10.0.2.15-1398089695400/current/finalized/subdir54/blk_1073742589



7. If you check the path in FileZilla, there are 4 blocks. Three of them are the same size, i.e. 1048576 bytes, and the 4th is 467470 bytes. You can also confirm this from the command line, as shown below.
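If you prefer the command line to FileZilla, a plain directory listing of the same block-pool path shows the block files and their sizes (the subdirectory name and block IDs will differ from system to system):

Command: ls -l /hadoop/hdfs/data/current/BP-1200952396-10.0.2.15-1398089695400/current/finalized/subdir54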

8. Select the sandbox instance and click the Play virtual machine icon at the bottom-right corner.


9. The VM will start, which may take several minutes. Once the VM startup is complete, the console should look like the following.


