
[Source] Download files directly from Hadoop HDFS

Posted on 7/10/2019 2:20:11 PM
Download large files from HDFS

I get a large file (about 2 GB) from the HDFS client as a DataInputStream, and I need to save it as a file on my host.

I'm thinking of using Apache Commons IO's IOUtils and doing something like this:
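The original code snippet did not survive the mirror. As a stand-in, here is a minimal JDK-only sketch of the same idea: stream the InputStream straight into a local file so the 2 GB never has to fit in memory. It swaps IOUtils for java.nio.file.Files.copy(InputStream, Path, CopyOption...), and the class and method names are hypothetical; in the real program `in` would be the stream returned by the HDFS client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of Hadoop or Commons IO.
final class HdfsDownload {
    // Streams the whole input into `target` and returns the number of bytes
    // written as a long, so sizes over 2 GB are reported correctly.
    static long saveToFile(InputStream in, Path target) throws IOException {
        // try-with-resources closes the source stream once the copy finishes
        try (InputStream source = in) {
            // REPLACE_EXISTING overwrites a previous download at the same path
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Files.copy reads through its own internal buffer, so no extra BufferedInputStream is needed around the source.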


I've been looking for a better solution than this one. My main concern is how buffering is handled, both on the input stream and inside IOUtils.copy().

For files larger than 2 GB, use IOUtils.copyLarge() instead (assuming we mean the same IOUtils: org.apache.commons.io.IOUtils).

The copy methods in IOUtils use a default buffer size of 4 KB (4096 bytes), although you can pass a different buffer size, or your own buffer, as a parameter.
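To make the buffering question concrete, here is a JDK-only sketch of the kind of loop copyLarge() runs: one caller-supplied buffer reused for every read, and a long byte counter. It is written from scratch for illustration, not copied from Commons IO, and the name `copyWithBuffer` is hypothetical. Because only this one buffer is involved, memory use stays constant regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration of a buffered copy with a long byte count.
final class BufferedCopy {
    // Copies everything from `in` to `out` through `buffer` and returns the
    // total number of bytes as a long (no 2 GB limit).
    static long copyWithBuffer(InputStream in, OutputStream out, byte[] buffer) throws IOException {
        long total = 0;
        int n;
        // read() returns -1 at end of stream, otherwise how many bytes it filled
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Convenience overload using a 4 KB buffer, like the default mentioned above.
    static long copyWithBuffer(InputStream in, OutputStream out) throws IOException {
        return copyWithBuffer(in, out, new byte[4096]);
    }
}
```

A larger buffer (for example 64 KB) means fewer read/write calls per gigabyte; whether that helps noticeably depends on the HDFS client and the local disk.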

The difference between copy() and copyLarge() is in what they return.

With copy(), a stream larger than 2 GB is still copied completely, but the method returns -1, because the byte count does not fit in an int.

copyLarge() returns the number of bytes copied as a long, so the count is correct for any stream size.
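The -1 comes from squeezing a long byte count into copy()'s int return type. The hypothetical helper below reproduces the rule described in the IOUtils documentation (any count above Integer.MAX_VALUE, roughly 2 GB, is reported as -1), which shows why only copyLarge() gives the true size of a large file.

```java
// Hypothetical helper that mimics how an int-returning copy() reports its result.
final class CopyResult {
    // Every byte has already been transferred at this point; only the
    // reported count is lost when it exceeds the int range.
    static int asCopyResult(long bytesCopied) {
        return bytesCopied > Integer.MAX_VALUE ? -1 : (int) bytesCopied;
    }
}
```

For example, asCopyResult(2_500_000_000L) returns -1, while asCopyResult(4096) returns 4096.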

See the IOUtils Javadoc for more details.





How to check whether a file was fully downloaded through a Spring REST API

I created a simple REST API that serves files from HDFS (the files are large, and I don't want to copy them to the local disk first).

I want to log that a file download completed successfully, i.e. that the entire stream was read, but I don't know how. Right now I can only log that the download started.

Any help would be greatly appreciated.


You can try wrapping the InputStream and setting a flag (or writing the log entry) when the stream is closed, i.e. in close().

For example, you can use ProxyInputStream as a basis:
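The answer's ProxyInputStream example was lost in the mirror. Below is a hedged, dependency-free sketch of the same wrapper idea built on java.io.FilterInputStream; with Commons IO's ProxyInputStream you would instead override its afterRead(int) hook (which receives -1 at end of stream) and close(). One design point: close() also runs when a client aborts the download, so the sketch records whether end of stream was actually reached and reports that flag on close. The class name and the `Consumer<Boolean>` callback are hypothetical; in the Spring service the callback would write the log entry.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

// Hypothetical wrapper: tells a callback on close() whether the stream was fully read.
final class CompletionTrackingInputStream extends FilterInputStream {
    private final Consumer<Boolean> onClose; // receives true if end of stream was reached
    private boolean reachedEnd = false;
    private boolean closed = false;

    CompletionTrackingInputStream(InputStream in, Consumer<Boolean> onClose) {
        super(in);
        this.onClose = onClose;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == -1) reachedEnd = true; // -1 means the whole file has been read
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both array reads are tracked.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) reachedEnd = true;
        return n;
    }

    @Override
    public void close() throws IOException {
        if (closed) return; // report only once, even if close() is called again
        closed = true;
        try {
            super.close();
        } finally {
            onClose.accept(reachedEnd); // true = fully downloaded, false = cut short
        }
    }
}
```

In the controller you would wrap the HDFS stream, e.g. `new CompletionTrackingInputStream(hdfsStream, done -> ...)`, and hand the wrapper to Spring (for instance inside an InputStreamResource), which reads it to the end and closes it after writing the response. The flag only becomes true once a reader asks for data past the last byte, which a normal copy loop does.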







