Traditional Database Management Systems (DBMSs) have been used to process data for over five decades. However, within the past few years, datasets have become too large and too complex to be managed with these traditional methods. Therefore, we need a complementary data processing technology that can handle heterogeneous, massive data.
A Big Data Analytics Hadoop cluster grows horizontally and scales by adding more servers, while conventional database systems grow vertically by adding more CPU, memory, and disk. This leaves conventional systems with very limited data storage and processing capability, and therefore very limited scalability.
Question:
Hadoop® is used for distributed computing and can query large datasets based on its reliable and scalable architecture.
Two major components of Hadoop® are the Hadoop® Distributed File System (HDFS) and MapReduce.
Discuss the overall roles of these two components, including their role during system failures.
Your discussion should include the advantages of parallel processing.
Discussion Requirements:
List the various traditional database systems, methods, and tools.
List and explain various tools used to manage Big Data Analytics: NoSQL, Hadoop, and MapReduce.
Explain Hadoop and its components.
Explain MapReduce and how it processes massive data.
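As background for the MapReduce item above, the following is a minimal sketch (in plain Python, not the Hadoop API) of the map and reduce phases applied to a word count. The input strings, function names, and block layout are all illustrative assumptions; in a real cluster each block would live in HDFS and each map task would run in parallel on a different node.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each string stands in for one HDFS block of a large file.
blocks = [
    "big data hadoop",
    "hadoop mapreduce hadoop",
    "data data mapreduce",
]

def map_phase(block):
    """Map step: produce partial word counts for a single block.

    In Hadoop, one map task per block runs on the node holding that block,
    so the cluster processes all blocks in parallel."""
    return Counter(block.split())

def reduce_phase(a, b):
    """Reduce step: merge partial counts; Counter addition sums per key."""
    return a + b

partial_counts = [map_phase(b) for b in blocks]   # parallelizable across nodes
total = reduce(reduce_phase, partial_counts)      # aggregation of partial results
print(total["hadoop"])  # 3
```

Because each map task depends only on its own block, a failed task can simply be rerun on another node holding a replica of that block, which is how MapReduce tolerates machine failures during a job.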
Mandatory videos:
https://www.youtube.com/watch?v=mafw2-CVYnA
https://www.youtube.com/watch?v=DCaiZq3aBSc