HDFS Lab - Beginners


Command to Dump data from Linux to HDFS

I have created a file, stocks.csv, in the Linux "/root/Labs/demos" folder using the vi editor. This file consists of CSV data. We can use the put command to copy the data into HDFS, and it accepts several options.

Step 1. Upload File to HDFS

1. Try putting this file into HDFS with a block size of 30 bytes using the command below.

2. Command: hadoop fs -D dfs.blocksize=30 -put stocks.csv stocks.csv
Notice: 30 bytes is not a valid block size. The block size needs to be at least 1048576 bytes, according to the dfs.namenode.fs-limits.min-block-size property.
Error: put: Specified block size is less than configured minimum value (dfs.namenode.fs-limits.min-block-size): 30 < 1048576


3. Try the put again, but this time use a block size of 2,000,000 bytes.

4. Command: hadoop fs -D dfs.blocksize=2000000 -put stocks.csv  stocks.csv
Error:  put: io.bytes.per.checksum(512) and blockSize(2000000) do not match. blockSize should be a multiple of io.bytes.per.checksum


Notice: 2,000,000 is not a valid block size because it is not a multiple of 512, the checksum chunk size set by io.bytes.per.checksum.

5. Try the put again, but this time use a block size of 1,048,576 bytes.

6. Command: hadoop fs -D dfs.blocksize=1048576 -put stocks.csv  stocks.csv

7. The command prints no output and simply returns to the prompt, which means the file was copied from the Linux VM to HDFS.

8. To verify that the data is actually stored in HDFS, we will use the "ls" command.

9. Command: hadoop fs -ls

10. The output lists the files in your HDFS home directory, which should now include stocks.csv.

11. Do we need to specify the block size every time? No. Without the -D option, the file is stored using the cluster's default dfs.blocksize:
  • Command: hadoop fs -put stocks.csv stocks1.csv
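
The two failed attempts in this step come down to two checks on the requested block size. As a rough sketch, they can be reproduced locally before running the put. The function name check_blocksize is hypothetical, and the two limits are simply the values reported in this lab's error messages, so a differently configured cluster may use other values.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hadoop: checks a candidate dfs.blocksize
# against the two constraints this lab ran into. The limits are the values
# shown in the lab's error messages and are assumptions for other clusters.
MIN_BLOCK_SIZE=1048576      # dfs.namenode.fs-limits.min-block-size
BYTES_PER_CHECKSUM=512      # io.bytes.per.checksum

check_blocksize() {
    size=$1
    # Rule 1: the block size must not be below the configured minimum.
    if [ "$size" -lt "$MIN_BLOCK_SIZE" ]; then
        echo "too small: $size < $MIN_BLOCK_SIZE"
        return 1
    fi
    # Rule 2: the block size must be a multiple of the checksum chunk size.
    if [ $((size % BYTES_PER_CHECKSUM)) -ne 0 ]; then
        echo "not a multiple of $BYTES_PER_CHECKSUM: $size"
        return 1
    fi
    echo "ok: $size"
}

# The three values tried in this lab.
check_blocksize 30 || true
check_blocksize 2000000 || true
check_blocksize 1048576
```

Only the last value passes both checks, which matches the put that succeeded above.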

Step 2. View the Number of Blocks


1. Run the command below to view the number of blocks created for our file stocks.csv.

2. Command: hdfs fsck  /user/root/stocks.csv


3. Notice that the file has 4 blocks, with an average block size of 903299 bytes.
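
The numbers reported by fsck can be checked by hand. The sketch below assumes the block sizes listed later in this lab (three blocks of 1048576 bytes and one of 467470 bytes); the total file size is inferred from those figures rather than stated anywhere in the lab.

```shell
#!/bin/sh
# Block sizes taken from this lab's own block listing: three full blocks
# plus one partial block. FILE_SIZE is derived from them (an inference).
BLOCK_SIZE=1048576
LAST_BLOCK=467470

FILE_SIZE=$((3 * BLOCK_SIZE + LAST_BLOCK))

# Ceiling division: a partial final block still counts as one block.
NUM_BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))

# Integer division, so any fractional byte is dropped.
AVG_BLOCK=$((FILE_SIZE / NUM_BLOCKS))

echo "file size:          $FILE_SIZE bytes"
echo "number of blocks:   $NUM_BLOCKS"
echo "average block size: $AVG_BLOCK bytes"
```

The average comes out at 903299 bytes, which is why the figure is smaller than the 1048576-byte block size requested during the upload: the last block is only partly filled.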




Step 3. Find Actual Blocks


1. Enter the fsck command as before, this time adding the -files and -blocks options.

2. Command: hdfs fsck  /user/root/stocks.csv  -files -blocks

3. The output contains the block IDs, which are also the names of the block files on the DataNodes.



(Note: the IP address shown in your output may be different.)

4. In FileZilla, connect to the Linux VM and browse to the block pool directory whose path begins with "/hadoop/hdfs/data/current/BP-1200952396-




5. Now we will view the contents of a block using the tail command.

6. Command: tail /hadoop/hdfs/data/current/BP-1200952396-


7. If you check this path in FileZilla, there are 4 blocks. Three of them are the same size, 1048576 bytes, and the fourth is 467470 bytes.

8. Select the sandbox instance and click the  Play virtual machine icon at right bottom corner. 

9. The VM will start, which may take several minutes.  Once the VM startup is complete, the console should look like the following
