Introduction

We have aggregated in a single post the most important things to know about Hadoop, in a concise way. Let us know if you have any comments!

Hadoop

##########
## HDFS ##
##########

NameNode # => Manages the filesystem namespace. If you lose it you have no pointers to your data, so you have practically lost your data.
DataNode # => Holds the actual data; installed on each worker.
Block # => Each file is split into blocks B1, B2, ..., each 128MB in size; replication is done per block. The NameNode knows that file X is split into B1, B2 and where each block is stored.

##########
## YARN ##
##########

ResourceManager # => Like `NameNode` for computing: tracks NodeManagers and how available they are for work.
NodeManager # => Like `DataNode` for computing: offers computational resources and runs application tasks in containers.
ApplicationMaster # => Each application has an `ApplicationMaster` process which negotiates resources with the `ResourceManager`, which delivers a `containe...
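As a rough illustration of the HDFS Block entry above, the sketch below works out how a file would be cut into 128MB blocks and how much raw storage its replicas would take. The function names are hypothetical and only for illustration, and the replication factor of 3 is an assumption (the post does not state one). This is plain arithmetic, not a call into any Hadoop API.

```python
BLOCK_SIZE_MB = 128  # block size stated in the post
REPLICATION = 3      # assumption: not given in the post


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks B1, B2, ... for a file."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever is left of the file.
        sizes.append(remainder)
    return sizes


def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Total space used across DataNodes once every block is replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    print(split_into_blocks(300))  # [128, 128, 44]
    print(raw_storage_mb(300))     # 900
```

A 300MB file therefore becomes three blocks, and the NameNode only has to remember which DataNodes hold each of them.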