Traditional Database Management Systems (DBMSs) have been used to process data for over five decades. However, within the past few years, datasets have become too large and too complex to be managed with these traditional methods. Therefore, we need a complementary data processing technology that can handle heterogeneous, massive data.
A Big Data Analytics Hadoop cluster grows horizontally and scales by adding more servers, while conventional database systems grow vertically by adding more CPU, memory, and disk. This leaves conventional systems with very limited data storage and processing capability, and therefore very limited scalability.
Question:
Hadoop® is used for distributed computing and can query large datasets based on its reliable and scalable architecture.
Two major components of Hadoop® are the Hadoop® Distributed File System (HDFS) and MapReduce.
Discuss the overall roles of these two components, including their role during system failures.
Your discussion should include the advantages of parallel processing.
Discussion Requirements:
List the various traditional database systems, methods, and tools.
List and explain various tools used to manage Big Data Analytics: NoSQL, Hadoop, and MapReduce.
Explain Hadoop and its components.
Explain MapReduce and how it processes massive data.
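As background for the MapReduce item above, the following is a minimal sketch (in plain Python, not the Hadoop API) of the map and reduce phases applied to a word count. The input strings, function names, and block layout are all illustrative assumptions; in a real cluster each block would live in HDFS and each map task would run in parallel on a different node.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each string stands in for one HDFS block of a large file.
blocks = [
    "big data hadoop",
    "hadoop mapreduce hadoop",
    "data data mapreduce",
]

def map_phase(block):
    """Map step: produce partial word counts for a single block.

    In Hadoop, one map task per block runs on the node holding that block,
    so the cluster processes all blocks in parallel."""
    return Counter(block.split())

def reduce_phase(a, b):
    """Reduce step: merge partial counts; Counter addition sums per key."""
    return a + b

partial_counts = [map_phase(b) for b in blocks]   # parallelizable across nodes
total = reduce(reduce_phase, partial_counts)      # aggregation of partial results
print(total["hadoop"])  # 3
```

Because each map task depends only on its own block, a failed task can simply be rerun on another node holding a replica of that block, which is how MapReduce tolerates machine failures during a job.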
Mandatory videos:
https://www.youtube.com/watch?v=mafw2-CVYnA
https://www.youtube.com/watch?v=DCaiZq3aBSc