Cloudera Certified Developer for Apache Hadoop Sample Questions:
1. What is the preferred way to pass a small number of configuration parameters to a mapper or reducer?
A) As key-value pairs in the JobConf object.
B) Through a static variable in the MapReduce driver class (i.e., the class that submits the MapReduce job).
C) As a custom input key-value pair passed to each mapper or reducer.
D) Using a plain text file via the DistributedCache, which each mapper or reducer reads.
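Background for question 1: in Hadoop's newer API the driver typically calls `job.getConfiguration().set(name, value)` and a task reads the value back with `context.getConfiguration().get(name)`. The older API uses `JobConf` and the mapper's `configure(JobConf)` method for the same purpose. Tasks usually run in separate JVMs on other machines, so a static field set in the driver is not visible to them, whereas the job configuration is serialized and shipped with the job. The Hadoop-free sketch below only mimics that round trip with `java.util.Properties`; the class name and the `wordfilter.min.length` key are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical, Hadoop-free stand-in for passing job parameters to tasks.
class ConfigPassingSketch {

    // Hypothetical parameter name; Hadoop itself does not define it.
    static final String MIN_LENGTH_KEY = "wordfilter.min.length";

    // Driver side: put a small parameter into the key-value configuration and
    // serialize it, mimicking the configuration that is shipped with the job.
    static String buildSerializedConf(int minLength) {
        Properties conf = new Properties();
        conf.setProperty(MIN_LENGTH_KEY, Integer.toString(minLength));
        StringWriter out = new StringWriter();
        try {
            conf.store(out, "job configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Task side: each mapper or reducer loads its own copy and reads the value,
    // falling back to a default when the key is absent.
    static int readMinLength(String serializedConf) {
        Properties taskConf = new Properties();
        try {
            taskConf.load(new StringReader(serializedConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Integer.parseInt(taskConf.getProperty(MIN_LENGTH_KEY, "1"));
    }
}
```

The default value in `readMinLength` plays the same role as the default argument of Hadoop's `Configuration.getInt(name, defaultValue)`.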
2. Which of the following best describes the map method input and output?
A) It accepts a single key-value pair as input and can emit only one key-value pair as output.
B) It accepts a single key-value pair as input and emits a single key and a list of corresponding values as output.
C) It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.
D) It accepts a list of key-value pairs as input but can emit only one key-value pair as output.
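To illustrate map semantics, here is a minimal plain-Java sketch of a word-count style map function. It is not Hadoop's `Mapper` class, and the class name is hypothetical. It receives one key-value pair (a byte offset and a line of text, as `TextInputFormat` would supply) and returns every pair it emits; the returned `List` stands in for calls to `Context.write`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of a word-count map function.
class MapSemanticsSketch {

    // Input: ONE key-value pair (byte offset, line of text).
    // Output: every (word, 1) pair the function emits; the returned list
    // stands in for calls to Context.write in a real Hadoop Mapper.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {           // a blank line yields no tokens
                emitted.add(new SimpleEntry<>(token, 1));
            }
        }
        return emitted;                        // zero, one, or many pairs
    }
}
```

A blank line produces an empty list, a one-word line a single pair, and a six-word line six pairs, all from one input pair.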
3. What happens in a MapReduce job when you set the number of reducers to one?
A) Setting the number of reducers to one is invalid, and an exception is thrown.
B) A single reducer gathers and processes all the output from all the mappers. The output is written to as many separate files as there are mappers.
C) A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
D) Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
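Which reducer receives a given key is decided by the partitioner. The sketch below reproduces, in plain Java, the arithmetic of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the class and method names are hypothetical. In a real driver the reducer count is set with `job.setNumReduceTasks(n)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of how keys are routed to reducers.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reducer (one output file per reducer) they reach.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partitionFor(key, numReduceTasks), p -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }
}
```

Any integer modulo 1 is 0, so with a single reduce task every key lands in partition 0.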
4. Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
A) Yes, but only if one of the tables fits into memory.
B) Yes.
C) Yes, so long as both tables fit into memory.
D) No, MapReduce cannot perform relational operations.
E) No, but it can be done with either Pig or Hive.
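One common technique for joining large tables is a reduce-side (repartition) join. The following plain-Java simulation uses hypothetical names and assumes the join key is the first CSV column and every row has at least one other column. It tags each row with its source table in the map step, groups by key as the shuffle would, and pairs rows per key in the reduce step. In Hadoop the framework streams each key's values to the reducer, so only one key's rows need to be held at a time; the `TreeMap` here merely stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of a reduce-side (repartition) join on
// two CSV tables whose join key is assumed to be the first column.
class ReduceSideJoinSketch {

    // Map step: emit (key, [tag, rest-of-row]) for every row of one table.
    // The TreeMap stands in for the shuffle, which groups values by key.
    // Assumes every row contains a comma after the key.
    private static void mapAndShuffle(List<String> rows, String tag,
                                      Map<String, List<String[]>> grouped) {
        for (String row : rows) {
            int comma = row.indexOf(',');
            String key = row.substring(0, comma);
            String rest = row.substring(comma + 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>())
                   .add(new String[] {tag, rest});
        }
    }

    // Reduce step: for each key, split values by source table and emit
    // every left/right combination (an inner join).
    static List<String> join(List<String> left, List<String> right) {
        Map<String, List<String[]>> grouped = new TreeMap<>();
        mapAndShuffle(left, "L", grouped);
        mapAndShuffle(right, "R", grouped);

        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : grouped.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (String[] tagged : group.getValue()) {
                if (tagged[0].equals("L")) {
                    lefts.add(tagged[1]);
                } else {
                    rights.add(tagged[1]);
                }
            }
            for (String l : lefts) {
                for (String r : rights) {
                    output.add(group.getKey() + "," + l + "," + r);
                }
            }
        }
        return output;
    }
}
```

Joining `1,alice` and `2,bob` with `1,order-a`, `1,order-b`, and `3,order-c` yields the two rows for key `1`; keys present in only one table produce nothing.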
5. Combiners increase the efficiency of a MapReduce program because:
A) They aggregate intermediate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.
B) They aggregate intermediate map output from a small number of nearby (i.e., rack-local) machines and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.
C) They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead.
D) They provide an optimization and reduce the total number of computations that are needed to execute an algorithm by a factor of n, where n is the number of reducers.
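The effect of a combiner can be shown by counting the records that would cross the network. In this plain-Java sketch (hypothetical names; in Hadoop a combiner is registered with `job.setCombinerClass(...)`), each inner list is one machine's word-count map output, `combine` pre-sums counts on that machine, and `reduce` merges the partial sums. Summing is safe to combine because addition is associative and commutative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of a word-count combiner. Each inner list
// is the map output (one word per emitted (word, 1) pair) of one machine.
class CombinerSketch {

    // Combiner: sums counts locally, on a single machine's map output.
    static Map<String, Integer> combine(List<String> localWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : localWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Records shuffled to reducers without a combiner: one per map output pair.
    static int shuffledWithoutCombiner(List<List<String>> perMachine) {
        int records = 0;
        for (List<String> words : perMachine) {
            records += words.size();
        }
        return records;
    }

    // Records shuffled with a combiner: one per distinct word per machine.
    static int shuffledWithCombiner(List<List<String>> perMachine) {
        int records = 0;
        for (List<String> words : perMachine) {
            records += combine(words).size();
        }
        return records;
    }

    // Reducer: merges partial sums; the totals match the uncombined result
    // because addition is associative and commutative.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }
}
```

For two machines emitting `a b a a` and `b b c`, the shuffle drops from 7 records to 4, while the final counts stay `a=3, b=3, c=1`.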
Solutions:
Question # 1 Answer: A | Question # 2 Answer: C | Question # 3 Answer: C | Question # 4 Answer: B | Question # 5 Answer: A