A Review on P2P File System Based on IPFS for Concurrency Control in Hadoop

2021 
A distributed file system stores and shares files across a peer-to-peer network, here using the InterPlanetary File System (IPFS) protocol. In a distributed file system, files are saved on multiple servers and accessed by various remote clients that hold the proper authorization rights within the network. Nowadays, a huge amount of data is generated every minute and may be accessed by many users at once, which creates problems in managing, accessing, and processing that data and ultimately leads to concurrency control issues. Concurrency control is the process of managing simultaneous operations on a shared database while ensuring serializability for multiple users. In the context of Hadoop, if multiple clients want to write updated data to the HDFS file system, a protocol must be followed to ensure that a write performed by one client does not interfere with the computation performed by another. This paper elaborates on how multiple users can access the same file without any distortion of its content, and it offers a theoretical solution to the concurrency control problem in Hadoop: implementing Hadoop's Java-based FileSystem interface for the decentralised, peer-to-peer IPFS file system. The proposed interface would allow Hadoop MapReduce jobs to operate directly on data files hosted on IPFS.
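
To make the proposed interface concrete, the following is a minimal, hypothetical Java sketch of such an adapter: a read-only org.apache.hadoop.fs.FileSystem subclass that resolves ipfs:// paths through a local IPFS HTTP gateway. The class name IpfsFileSystem, the gateway address http://127.0.0.1:8080, the ipfs://node/<CID> path layout, and the whole-object in-memory buffering are illustrative assumptions, not details taken from the paper.

// Hypothetical sketch only: the class name, gateway address, path layout,
// and whole-object buffering are illustrative assumptions, not the paper's
// actual implementation.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.fs.Seekable;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class IpfsFileSystem extends FileSystem {
  // Assumed local IPFS HTTP gateway; paths look like ipfs://node/<CID>.
  private static final String GATEWAY = "http://127.0.0.1:8080/ipfs/";
  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    this.uri = name;
  }

  @Override public String getScheme() { return "ipfs"; }
  @Override public URI getUri() { return uri; }

  // Fetch the immutable object named by the CID in the path via the gateway.
  private byte[] fetch(Path f) throws IOException {
    String cid = f.toUri().getPath().replaceFirst("^/+", "");
    try (InputStream in = new URL(GATEWAY + cid).openStream();
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
      return out.toByteArray();
    }
  }

  // A CID always names the same bytes, so a write by one client can never
  // distort what another client is reading: reads are consistent snapshots.
  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    return new FSDataInputStream(new SeekableBytes(fetch(f)));
  }

  // MapReduce uses the reported length to compute input splits; a real
  // connector would query the object size from the IPFS API, not fetch it.
  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    return new FileStatus(fetch(f).length, false, 1, 128 << 20, 0L,
        f.makeQualified(uri, workingDir));
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    return new FileStatus[] { getFileStatus(f) };
  }

  // IPFS objects are immutable: updating a file means re-adding the content
  // and publishing a new CID, so in-place mutation is rejected here.
  @Override
  public FSDataOutputStream create(Path f, FsPermission p, boolean overwrite,
      int bufSize, short repl, long blockSize, Progressable prog) {
    throw new UnsupportedOperationException("IPFS objects are immutable");
  }
  @Override
  public FSDataOutputStream append(Path f, int bufSize, Progressable prog) {
    throw new UnsupportedOperationException("IPFS objects are immutable");
  }
  @Override public boolean rename(Path src, Path dst) { return false; }
  @Override public boolean delete(Path f, boolean recursive) { return false; }
  @Override public boolean mkdirs(Path f, FsPermission p) { return false; }
  @Override public void setWorkingDirectory(Path d) { workingDir = d; }
  @Override public Path getWorkingDirectory() { return workingDir; }

  // FSDataInputStream requires a stream that is Seekable and
  // PositionedReadable; an in-memory buffer satisfies both trivially.
  private static final class SeekableBytes extends ByteArrayInputStream
      implements Seekable, PositionedReadable {
    SeekableBytes(byte[] data) { super(data); }
    @Override public void seek(long p) { pos = (int) p; }
    @Override public long getPos() { return pos; }
    @Override public boolean seekToNewSource(long p) { return false; }
    @Override public int read(long p, byte[] b, int off, int len) {
      int n = Math.min(len, count - (int) p);
      if (n <= 0) return -1;
      System.arraycopy(buf, (int) p, b, off, n);
      return n;
    }
    @Override public void readFully(long p, byte[] b, int off, int len) throws IOException {
      if (read(p, b, off, len) < len) throw new IOException("premature EOF");
    }
    @Override public void readFully(long p, byte[] b) throws IOException {
      readFully(p, b, 0, b.length);
    }
  }
}

Registering this class in core-site.xml under the standard fs.ipfs.impl key would let MapReduce jobs consume ipfs://... input paths directly. Because a CID permanently names one immutable byte sequence, concurrent readers always see a consistent snapshot; an update by another client produces a new CID rather than mutating the old one, which is how the content-addressed design sidesteps the read/write interference described above.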