Point out the wrong statement.
(a) Hadoop works better with a small number of large files than a large number of small files
(b) CombineFileInputFormat is designed to work well with small files
(c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
(d) None of the mentioned
The correct option is (d) None of the mentioned.
Easy explanation: All three statements are true. If the files are very small ("small" meaning significantly smaller than an HDFS block) and there are a lot of them, then each map task processes very little input, and there is one task per file, each of which imposes extra bookkeeping overhead; that is why Hadoop works better with a small number of large files. CombineFileInputFormat relieves this by packing many small files into each split so that each mapper has more to process, and because it takes node and rack locality into account when deciding which blocks to place in the same split, it does not compromise the speed at which it can process the input in a typical MapReduce job.
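For concreteness, below is a minimal job-driver sketch showing how split combining can be switched on with CombineTextInputFormat, the concrete CombineFileInputFormat subclass for line-oriented text. The class name SmallFilesDriver, the pass-through mapper, and the 128 MB split cap are illustrative assumptions, not part of the question.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesDriver {

    // Pass-through mapper: emits each input line unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-small-files");
        job.setJarByClass(SmallFilesDriver.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Pack many small files into each split instead of one split per file.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split at roughly one HDFS block (128 MB assumed here).
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        CombineTextInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the default TextInputFormat, a directory of 10,000 tiny files would launch 10,000 map tasks; with the combined format, the number of map tasks is bounded by the total input size divided by the split cap.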