How can you recover from a NameNode failure in Hadoop?
Why is the NameNode so important?
The NameNode is the most important Hadoop service. It knows the location of every block in the cluster and maintains the state of the distributed file system. Hadoop also runs a Secondary NameNode, but despite its name, the Secondary NameNode is not a backup for the NameNode. Instead, it periodically performs a checkpoint: it merges the NameNode's edit log into the filesystem image (fsimage) and keeps a copy of the result in its checkpoint directory. When the NameNode fails, you can recover from the most recent checkpoint generated by the Secondary NameNode.
How do you recover a failed NameNode?
Suppose the machine hosting the NameNode service has failed, and the Secondary NameNode is running on a separate machine. The fs.checkpoint.dir property has previously been set in core-site.xml. This property tells the Secondary NameNode where on its local file system to save checkpoints.
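Before copying anything, it helps to confirm which directory fs.checkpoint.dir actually points to on the Secondary NameNode machine. The sketch below is a hypothetical helper, `get_hadoop_property` (not part of Hadoop), that reads a property value out of a Hadoop-style XML configuration file with awk. It assumes each `<value>` element sits on one line after its `<name>`; if the property is absent from the site file, Hadoop falls back to the default in core-default.xml, which this helper does not read.

```bash
# get_hadoop_property FILE KEY
# Hypothetical helper (not part of Hadoop): prints the <value> of property KEY
# from a Hadoop-style XML configuration file. Assumes each <value> element is
# written on one line after its <name>; prints nothing if KEY is not set.
get_hadoop_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    # Remember that we have seen the matching <name> element (literal match).
    index($0, "<name>" key "</name>") { found = 1 }
    # Print the first <value> that follows it, without the tags, and stop.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$file"
}
```

For example, `get_hadoop_property /path/to/Hadoop/conf/core-site.xml fs.checkpoint.dir`, where the `conf/` location is an assumption based on a Hadoop 1.x layout.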
Carry out the following steps to recover from a NameNode failure:
1. Stop the Secondary NameNode:
$ cd /path/to/Hadoop
$ bin/hadoop-daemon.sh stop secondarynamenode
2. Bring up a new machine to act as the new NameNode. This machine should have Hadoop installed, be configured like the previous NameNode, and have passwordless SSH login configured. It should also have the same IP address and hostname as the previous NameNode, so that DataNodes and clients, which locate the NameNode through fs.default.name, can reach it without being reconfigured.
3. Copy the contents of fs.checkpoint.dir on the Secondary NameNode to the dfs.name.dir folder on the new NameNode machine.
4. Start the new NameNode on the new machine:
$ bin/hadoop-daemon.sh start namenode
5. Start the Secondary NameNode on the Secondary NameNode machine:
$ bin/hadoop-daemon.sh start secondarynamenode
6. Verify that the NameNode started successfully by looking at the NameNode status page at http://localhost:50070/ (when browsing from the new NameNode machine itself).
How it works:
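Step 3 is the only step above without a command. A minimal sketch of it follows, assuming it runs on the Secondary NameNode machine with passwordless SSH to the new NameNode (which step 2 already requires). `copy_checkpoint` is a hypothetical helper, not a Hadoop command, and the paths and host name you pass it are whatever fs.checkpoint.dir, dfs.name.dir and the NameNode hostname are on your cluster. It streams the directory through tar so permissions and timestamps are preserved, and it refuses to copy into a non-empty destination so an existing name directory is not silently mixed with the checkpoint.

```bash
#!/usr/bin/env bash
# Sketch of step 3: copy fs.checkpoint.dir into the new NameNode's dfs.name.dir.
# Assumptions (hypothetical, adjust to your cluster): runs on the Secondary
# NameNode machine, and passwordless SSH to the new NameNode works.
set -euo pipefail

# copy_checkpoint SRC_DIR DEST_DIR [HOST]
# Without HOST, DEST_DIR is a local path; with HOST, it is a path on HOST.
copy_checkpoint() {
  local src="$1" dest="$2" host="${3:-}"
  if [ ! -d "$src" ]; then
    echo "checkpoint dir not found: $src" >&2
    return 1
  fi
  if [ -z "$host" ]; then
    mkdir -p "$dest"
    # Refuse to overwrite an existing, non-empty name directory.
    if [ -n "$(ls -A "$dest")" ]; then
      echo "destination is not empty: $dest" >&2
      return 1
    fi
    tar -C "$src" -cf - . | tar -C "$dest" -xf -
  else
    # Same emptiness check, performed on the remote host before unpacking.
    tar -C "$src" -cf - . |
      ssh "$host" "mkdir -p '$dest' && [ -z \"\$(ls -A '$dest')\" ] && tar -C '$dest' -xf -"
  fi
}
```

For example, `copy_checkpoint /data/hadoop/namesecondary /data/hadoop/name new-namenode` (hypothetical paths and host name). Piping tar over ssh keeps file metadata intact and avoids a separate scp invocation per file.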
We first log in to the Secondary NameNode machine and stop its service. Next, we set up the NameNode on a new machine.
We then copy all of the checkpoint and edit log files from the Secondary NameNode to the new NameNode. This restores the filesystem state, metadata, and edits as of the last checkpoint; namespace changes made after that checkpoint are not recovered. Finally, we start the new NameNode and then restart the Secondary NameNode.
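Before starting the new NameNode, it can be worth checking that the copied directory really contains a checkpoint image. The hypothetical helper below, `check_name_dir` (not part of Hadoop), assumes the Hadoop 1.x layout, in which the `current/` subdirectory holds `VERSION`, `fsimage` and `edits` files. Later Hadoop releases name these files differently, so treat the file list as an assumption to adjust for your version.

```bash
# check_name_dir DIR
# Hypothetical pre-flight check: assumes the Hadoop 1.x layout where
# DIR/current contains VERSION, fsimage and edits. Prints each missing
# file to stderr and returns non-zero if any of them is absent.
check_name_dir() {
  local dir="$1" missing=0 f
  for f in VERSION fsimage edits; do
    if [ ! -f "$dir/current/$f" ]; then
      echo "missing: $dir/current/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_name_dir /data/hadoop/name` (hypothetical path) on the new NameNode and only run `bin/hadoop-daemon.sh start namenode` if it returns 0.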