Google Dataproc Exam and Interview Questions

QUESTION 1

Which native output connectors are supported by Dataproc? Choose 3

BigQuery
Cloud SQL
Cloud Firestore
Cloud Storage
Cloud Bigtable

Explanation:

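Cloud Dataproc has built-in integration with BigQuery, Cloud Storage, and Cloud Bigtable, as well as Cloud Logging and Cloud Monitoring (formerly Stackdriver Logging and Stackdriver Monitoring). Of the options listed, the three supported connectors are therefore BigQuery, Cloud Storage, and Cloud Bigtable.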

QUESTION 2

A customer wants to run Spark jobs on a low-cost ephemeral Dataproc cluster, utilizing preemptible workers wherever possible, but needs to store the results of Dataproc jobs persistently. What would you recommend?

Use a secondary group of preemptible worker nodes, but ensure there is enough persistent storage on the primary (non-preemptible) worker nodes to store all of the data.
Use the Cloud Storage connector, and specify GCS locations for the input and output of jobs.
Do not use preemptible workers at all, because they prevent you from choosing any persistent storage option.
Use a secondary group of preemptible worker nodes, but add custom code to a job that copies its results to Cloud Storage.

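Explanation:

The Cloud Storage connector lets you run Apache Hadoop or Apache Spark jobs directly on data in Cloud Storage and offers a number of other benefits over HDFS. Because the data lives in Cloud Storage rather than on the cluster, job results persist after an ephemeral cluster is deleted, no matter how many of its workers are preemptible. The recommended option is therefore to use the Cloud Storage connector and specify GCS locations for the input and output of jobs.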
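As a rough sketch of this recommendation, the Python below assembles (but does not run) two gcloud command lines: one that creates a cluster with a secondary group of preemptible workers, and one that submits a PySpark job whose input and output are both gs:// locations. The cluster name, region, bucket, and script path are placeholders, and the two helper functions are hypothetical names used only for illustration.

```python
import shlex


def create_cluster_cmd(cluster, region, primary_workers, secondary_workers):
    """Build a gcloud command for a cluster with preemptible secondary workers."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--num-workers={primary_workers}",
        f"--num-secondary-workers={secondary_workers}",
        "--secondary-worker-type=preemptible",
    ]


def submit_pyspark_cmd(cluster, region, script_uri, input_uri, output_uri):
    """Build a gcloud command for a PySpark job that reads and writes GCS."""
    # Refuse anything that is not a Cloud Storage location, so results
    # never end up only on cluster-local HDFS.
    for uri in (input_uri, output_uri):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// location, got {uri!r}")
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_uri,
        f"--cluster={cluster}",
        f"--region={region}",
        "--",  # everything after this is passed to the job itself
        input_uri,
        output_uri,
    ]


# Placeholder values; substitute your own cluster, region, and bucket.
print(shlex.join(create_cluster_cmd("ephemeral-demo", "us-central1", 2, 4)))
print(shlex.join(submit_pyspark_cmd(
    "ephemeral-demo", "us-central1",
    "gs://example-bucket/jobs/report.py",
    "gs://example-bucket/input/",
    "gs://example-bucket/output/",
)))
```

The job script itself would read and write the two gs:// paths it receives as arguments, so deleting the cluster afterwards does not affect the results.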

QUESTION 3

Which features are not compatible with Dataproc autoscaling? Choose 2

MapReduce Tasks
High-Availability Clusters
Preemptible Workers
YARN Node Labels
Spark Structured Streaming

Explanation:

Autoscaling does not support YARN node labels or the property dataproc:am.primary_only, because YARN incorrectly reports cluster metrics when node labels are used.
Autoscaling is also not compatible with Spark Structured Streaming, since Spark Structured Streaming currently does not support dynamic allocation. The two correct choices are therefore YARN Node Labels and Spark Structured Streaming.
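The two restrictions above can be turned into a small pre-flight check. The sketch below is a hypothetical helper, not part of any Dataproc library: it inspects a dictionary of cluster properties and a flag describing the workload, and returns the reasons autoscaling should not be attached. The property dataproc:am.primary_only comes from the explanation above; the key yarn:yarn.node-labels.enabled is an assumption about how node labels would be switched on through cluster properties.

```python
def autoscaling_conflicts(properties, uses_structured_streaming=False):
    """Return a list of reasons this cluster should not use autoscaling."""

    def is_true(key):
        # Property values are treated as strings such as "true" or "false".
        return str(properties.get(key, "false")).strip().lower() == "true"

    conflicts = []
    # Assumed property key for enabling YARN node labels.
    if is_true("yarn:yarn.node-labels.enabled"):
        conflicts.append("YARN node labels are enabled")
    # Property named in the explanation above.
    if is_true("dataproc:am.primary_only"):
        conflicts.append("dataproc:am.primary_only is set")
    if uses_structured_streaming:
        conflicts.append("workload uses Spark Structured Streaming")
    return conflicts


# Example: a batch cluster with node labels switched on.
for reason in autoscaling_conflicts({"yarn:yarn.node-labels.enabled": "true"}):
    print("autoscaling conflict:", reason)
```

An empty list means none of the incompatibilities named in this question were detected; it does not guarantee that autoscaling suits the workload.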

QUESTION 4

Your customer would like to use Dataproc, but the standard image does not contain some additional Spark components required to run their jobs on the ephemeral clusters. What would you recommend?

Use a Dataproc cluster, but specify an initialization action that installs all of the additional components.
Create a custom Dataproc image that fulfils the customer requirements and use it to deploy a Dataproc cluster.
Split the customer workloads into 2 clusters. Where the extra components are not required, use Dataproc. Where extra components are required, build a custom image and use it to deploy a custom Spark cluster using Compute Engine.
Create an image that fulfils the customer requirements and use it to deploy a custom Spark cluster using Compute Engine.

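Explanation:

Cloud Dataproc clusters can be provisioned with a custom image that includes a user's pre-installed packages. You could alternatively use initialization actions to install the additional components, but the installation would then run again every time a cluster is created, which adds startup time to each ephemeral cluster. The recommended option is therefore to create a custom Dataproc image and use it to deploy the Dataproc cluster.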
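To make the difference between the two approaches concrete, the sketch below builds (without running) the gcloud command for each: one cluster created from a custom image, and one created from the standard image with initialization actions. The image name, script location, and the helper function name are placeholders chosen for illustration.

```python
import shlex


def ephemeral_cluster_cmd(cluster, region, image=None, init_actions=()):
    """Build a gcloud cluster-create command for either approach."""
    cmd = ["gcloud", "dataproc", "clusters", "create", cluster,
           f"--region={region}"]
    if image:
        # Components are already baked into the image, so nothing
        # extra is installed when the cluster starts.
        cmd.append(f"--image={image}")
    if init_actions:
        # Scripts in Cloud Storage that run on the nodes at creation
        # time, each time a cluster is created.
        cmd.append("--initialization-actions=" + ",".join(init_actions))
    return cmd


# Placeholder names; substitute your own image and script locations.
print(shlex.join(ephemeral_cluster_cmd(
    "spark-extra", "us-central1", image="my-custom-dataproc-image")))
print(shlex.join(ephemeral_cluster_cmd(
    "spark-extra", "us-central1",
    init_actions=["gs://example-bucket/install-components.sh"])))
```

For clusters that are created and deleted frequently, the first form moves the installation cost to image build time instead of paying it on every cluster start.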

QUESTION 5

Which primary Apache services does Dataproc run? Choose 2

Spark
Cassandra
Dataflow
Hadoop
Kafka

Explanation:

Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. The two correct choices are therefore Spark and Hadoop.

QUESTION 6

Which GCP product implements the Apache Beam SDK and is sometimes recommended as an alternative to Dataproc, particularly for streaming data?

Cloud Dataflow
Cloud Data Fusion
Cloud Composer
Cloud Datalab

Explanation:

The Apache Beam SDK is an open source programming model that enables you to develop both batch and streaming pipelines. You create your pipelines with an Apache Beam program and then run them on the Dataflow service, so the correct choice is Cloud Dataflow.

QUESTION 7

True or False: Preemptible workers in a Dataproc cluster cannot store HDFS data.

False
True

Explanation:

True. Because preemptible VMs can be reclaimed at any time, preemptible workers do not store HDFS data; they function only as processing nodes.

Algae Education Services
