Use Professional-Data-Engineer Exam Dumps (2023 PDF Dumps) To Have Reliable Professional-Data-Engineer Test Engine [Q116-Q134]



Professional-Data-Engineer PDF Recently Updated Questions Dumps to Improve Exam Score


What is the duration, cost, and format of the Google Professional Data Engineer exam?

  • Cost: $200
  • Format: Multiple choice and multiple select
  • Length of Examination: 120 minutes

 

NEW QUESTION 116
Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization requires that all BigQuery data access logs be retained for 6 months. You need to ensure that only audit personnel in your company can access the data access logs for all projects. What should you do?

  • A. Enable data access logs in each Data Analyst's project. Restrict access to Stackdriver Logging via Cloud IAM roles.
  • B. Export the data access logs via a project-level export sink to a Cloud Storage bucket in the Data Analysts' projects. Restrict access to the Cloud Storage bucket.
  • C. Export the data access logs via a project-level export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project with the exported logs.
  • D. Export the data access logs via an aggregated export sink to a Cloud Storage bucket in a newly created project for audit logs. Restrict access to the project that contains the exported logs.

Answer: D

Explanation:
https://cloud.google.com/iam/docs/roles-audit-logging#scenario_external_auditors

 

NEW QUESTION 117
Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow.
Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.
The data scientists have written the following code to read the data for the new key features in the logs.
BigQueryIO.Read
.named("ReadLogData")
.from("clouddataflow-readonly:samples.log_data")
You want to improve the performance of this data read. What should you do?

  • A. Use both the Google BigQuery TableSchema and TableFieldSchema classes.
  • B. Use .fromQuery operation to read specific fields from the table.
  • C. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
  • D. Specify the TableReference object in the code.

Answer: C

 

NEW QUESTION 118
You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available.
How should you use this data to train the model?

  • A. Train on the existing data while using the new data as your test set.
  • B. Continuously retrain the model on a combination of existing data and the new data.
  • C. Continuously retrain the model on just the new data.
  • D. Train on the new data while using the existing data as your test set.

Answer: B

Explanation:
https://cloud.google.com/automl-tables/docs/prepare

 

NEW QUESTION 119
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, you design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

  • A. Convert the streaming insert code to batch load for individual messages.
  • B. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
  • C. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
  • D. Re-write the application to load accumulated data every 2 minutes.

Answer: C

Explanation:
Streamed data first lands in the streaming buffer and is then written to storage. If queries run while the data is still in the buffer, you face the issues described above. If you wait for BigQuery to write the data to storage, you won't face them.
So you need to wait until the data has been written to storage.

 

NEW QUESTION 120
Which of the following is NOT one of the three main types of triggers that Dataflow supports?

  • A. Trigger based on element size in bytes
  • B. Trigger based on time
  • C. Trigger that is a combination of other triggers
  • D. Trigger based on element count

Answer: A

Explanation:
There are three major kinds of triggers that Dataflow supports:
1. Time-based triggers.
2. Data-driven triggers. You can set a trigger to emit results from a window when that window has received a certain number of data elements.
3. Composite triggers. These triggers combine multiple time-based or data-driven triggers in some logical way.
Reference: https://cloud.google.com/dataflow/model/triggers
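The three trigger kinds can be illustrated with a small, framework-free Python sketch. The classes below are hypothetical and only loosely modeled on Apache Beam's AfterCount, AfterProcessingTime, and AfterAny triggers; they are not the Beam API. Note that none of them fire on element size in bytes, which is why option A is the odd one out.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class Trigger(Protocol):
    def should_fire(self, element_count: int, elapsed_seconds: float) -> bool: ...


@dataclass
class AfterCount:
    """Data-driven trigger: fire once the window holds at least `count` elements."""
    count: int

    def should_fire(self, element_count: int, elapsed_seconds: float) -> bool:
        return element_count >= self.count


@dataclass
class AfterDelay:
    """Time-based trigger: fire once `delay_seconds` have passed (hypothetical name)."""
    delay_seconds: float

    def should_fire(self, element_count: int, elapsed_seconds: float) -> bool:
        return elapsed_seconds >= self.delay_seconds


@dataclass
class AnyOf:
    """Composite trigger: fire as soon as any sub-trigger would fire."""
    triggers: Tuple[Trigger, ...]

    def should_fire(self, element_count: int, elapsed_seconds: float) -> bool:
        return any(t.should_fire(element_count, elapsed_seconds) for t in self.triggers)


# Fire after 100 elements or after 60 seconds, whichever comes first.
trigger = AnyOf((AfterCount(100), AfterDelay(60.0)))
```

With this composite, `trigger.should_fire(5, 10.0)` is False, while `trigger.should_fire(150, 10.0)` and `trigger.should_fire(5, 61.0)` are both True.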

 

NEW QUESTION 121
By default, which windowing behavior does Dataflow apply to unbounded data sets?

  • A. Windows at every 1 minute
  • B. Windows at every 100 MB of data
  • C. Single, Global Window
  • D. Windows at every 10 minutes

Answer: C

Explanation:
Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.
Reference: https://cloud.google.com/dataflow/model/pcollection
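The difference between the default global window and fixed windows can be sketched in plain Python. This is a hypothetical helper, not a Dataflow or Beam API: with no window size every timestamped element lands in one global window, otherwise elements are bucketed into fixed, non-overlapping windows keyed by their start time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple, Union

GLOBAL_WINDOW = "GlobalWindow"  # hypothetical label for the single default window


def assign_windows(
    events: Iterable[Tuple[int, str]], window_size_seconds: Optional[int] = None
) -> Dict[Union[str, int], List[str]]:
    """Group (timestamp_seconds, value) pairs by window."""
    windows: Dict[Union[str, int], List[str]] = defaultdict(list)
    for timestamp, value in events:
        if window_size_seconds is None:
            key: Union[str, int] = GLOBAL_WINDOW  # default: one window for everything
        else:
            key = timestamp - (timestamp % window_size_seconds)  # fixed window start
        windows[key].append(value)
    return dict(windows)


events = [(5, "a"), (65, "b"), (130, "c")]
default_windows = assign_windows(events)     # {"GlobalWindow": ["a", "b", "c"]}
fixed_windows = assign_windows(events, 60)   # {0: ["a"], 60: ["b"], 120: ["c"]}
```

Because a global window over unbounded data never closes on its own, aggregations over it need a trigger (or a switch to fixed windows) before results are emitted.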

 

NEW QUESTION 122
Which is not a valid reason for poor Cloud Bigtable performance?

  • A. There are issues with the network connection.
  • B. The workload isn't appropriate for Cloud Bigtable.
  • C. The Cloud Bigtable cluster has too many nodes.
  • D. The table's schema is not designed correctly.

Answer: C

Explanation:
Too many nodes is not a cause of poor performance; too few nodes is. If your Cloud Bigtable cluster doesn't have enough nodes and is overloaded, adding more nodes can improve performance. Use the monitoring tools to check whether the cluster is overloaded.

 

NEW QUESTION 123
You are deploying a new storage system for your mobile application, which is a media streaming service.
You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie' the properties 'actors' and 'tags' have multiple values but the property 'date_released' does not. A typical query would ask for all movies with actor=<actorname> ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

  • A. Option A
  • B. Option B.
  • C. Option D
  • D. Option C

Answer: A
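The combinatorial explosion comes from composite indexes that include more than one multi-valued property: Datastore writes one index entry per combination of values. The hypothetical Python helper below just counts entries to make the arithmetic concrete; the actor names are placeholders. Defining separate indexes for (actors, date_released) and (tags, date_released) is one common way to serve both typical queries without multiplying entries.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def composite_index_entries(entity: Dict[str, Value], properties: List[str]) -> int:
    """Count entries one composite index writes for a single entity.

    Each list-valued property multiplies the count, because the index stores
    one row per combination of values.
    """
    count = 1
    for name in properties:
        value = entity[name]
        count *= len(value) if isinstance(value, list) else 1
    return count


movie: Dict[str, Value] = {
    "actors": ["Actor1", "Actor2", "Actor3", "Actor4", "Actor5"],  # placeholder names
    "tags": ["Comedy", "Drama", "Family", "Romance"],
    "date_released": "2020-01-01",
}

# One index over both multi-valued properties: 5 * 4 * 1 = 20 entries.
combined = composite_index_entries(movie, ["actors", "tags", "date_released"])
# Two separate indexes: 5 + 4 = 9 entries.
separate = composite_index_entries(movie, ["actors", "date_released"]) + composite_index_entries(
    movie, ["tags", "date_released"]
)
```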

 

NEW QUESTION 124
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?

  • A. Streaming job, PubSubIO, JdbcIO, side-outputs
  • B. Streaming job, PubSubIO, BigQueryIO, side-inputs
  • C. Batch job, PubSubIO, side-inputs
  • D. Streaming job, PubSubIO, BigQueryIO, side-outputs

Answer: B
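The side-input pattern in this question amounts to loading small, static reference data once and looking it up for every incoming message. The sketch below shows that join in plain Python; the field names `store_id` and `store_name` are hypothetical. In the Beam Python SDK the same idea is typically expressed by passing `beam.pvalue.AsDict(...)` as an extra argument to `beam.Map`.

```python
from typing import Dict, Iterable, Iterator


def enrich(messages: Iterable[dict], reference: Dict[str, str]) -> Iterator[dict]:
    """Join each streamed message with static reference data held in memory.

    `reference` plays the role of a side input: it is loaded once (for example
    from a small BigQuery table) and every message is enriched with a local
    dictionary lookup instead of a remote call.
    """
    for msg in messages:
        enriched = dict(msg)  # copy so the input message is not mutated
        enriched["store_name"] = reference.get(msg["store_id"], "UNKNOWN")
        yield enriched


stores = {"s1": "Downtown", "s2": "Airport"}
results = list(enrich([{"store_id": "s1", "amount": 10}, {"store_id": "s9", "amount": 3}], stores))
```

A side input fits because the reference data fits in memory on one worker; side outputs, by contrast, are for emitting extra result streams, not for reading lookup data.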

 

NEW QUESTION 125
Your neural network model is taking days to train. You want to increase the training speed. What can you do?

  • A. Subsample your test dataset.
  • B. Increase the number of input features to your model.
  • C. Subsample your training dataset.
  • D. Increase the number of layers in your neural network.

Answer: C

Explanation:
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d

 

NEW QUESTION 126
You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigtable cluster. Which two actions can you take to accomplish this? (Choose two.)

  • A. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.
  • B. Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.
  • C. Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increase in write latency.
  • D. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.
  • E. Monitor latency of read operations. Increase the size of the Cloud Bigtable cluster if read operations take longer than 100 ms.

Answer: C,D
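Option C hinges on a *sustained* increase in write latency rather than a single spike. A minimal Python sketch of that rule follows; the 1.5x factor, the 5-sample window, and the baseline are hypothetical tuning values, and in practice the samples would come from your monitoring system.

```python
from collections import deque
from typing import Deque, Iterable


def sustained_latency_increase(
    latencies_ms: Iterable[float], baseline_ms: float, factor: float = 1.5, window: int = 5
) -> bool:
    """Return True if `window` consecutive samples all exceed baseline * factor.

    A single spike is ignored; only a sustained run of elevated write
    latency signals that the cluster may need more nodes.
    """
    recent: Deque[float] = deque(maxlen=window)
    threshold = baseline_ms * factor
    for sample in latencies_ms:
        recent.append(sample)
        if len(recent) == window and all(s > threshold for s in recent):
            return True
    return False


spike_only = sustained_latency_increase([10, 10, 50, 10, 10, 10, 10], baseline_ms=10)
sustained = sustained_latency_increase([10, 20, 20, 20, 20, 20], baseline_ms=10)
```

Here `spike_only` is False and `sustained` is True.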

 

NEW QUESTION 127
If you want to create a machine learning model that predicts the price of a particular stock based on its recent price history, what type of estimator should you use?

  • A. Clustering estimator
  • B. Unsupervised learning
  • C. Regressor
  • D. Classifier

Answer: C

Explanation:
Regression is the supervised learning task for modeling and predicting continuous, numeric variables. Examples include predicting real-estate prices, stock price movements, or student test scores.
Classification is the supervised learning task for modeling and predicting categorical variables. Examples include predicting employee churn, email spam, financial fraud, or student letter grades.
Clustering is an unsupervised learning task for finding natural groupings of observations (i.e. clusters) based on the inherent structure within your dataset. Examples include customer segmentation, grouping similar items in e-commerce, and social network analysis.
Reference: https://elitedatascience.com/machine-learning-algorithms
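To make "regressor" concrete: a regression model outputs a continuous number. The toy sketch below fits a straight-line trend to recent prices by ordinary least squares and extrapolates one step ahead. It only illustrates the estimator type and is not a realistic stock-price model.

```python
from typing import Sequence, Tuple


def fit_linear_trend(prices: Sequence[float]) -> Tuple[float, float]:
    """Fit price = slope * t + intercept by ordinary least squares, t = 0..n-1."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two prices")
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_p = sum(prices) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(ts, prices))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_p - slope * mean_t
    return slope, intercept


def predict_next(prices: Sequence[float]) -> float:
    """Regression output: a continuous value for the next time step."""
    slope, intercept = fit_linear_trend(prices)
    return slope * len(prices) + intercept
```

For example, `predict_next([10, 11, 12, 13])` fits slope 1.0 and intercept 10.0 and returns 14.0. A classifier would instead output a label such as "up" or "down".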

 

NEW QUESTION 128
You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:
SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country
You check the query plan for the query and see the following output in the Read section of Stage:1:

What is the most likely cause of the delay for this query?

  • A. Users are running too many concurrent queries in the system
  • B. Either the state or the city columns in the [myproject:mydataset.mytable] table have too many NULL values
  • C. Most rows in the [myproject:mydataset.mytable] table have the same value in the country column, causing data skew
  • D. The [myproject:mydataset.mytable] table has too many partitions

Answer: A

 

NEW QUESTION 129
You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You want to update the running pipeline with the new version. You want to ensure that no data is lost during the update. What should you do?

  • A. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to the existing job name
  • B. Stop the Cloud Dataflow pipeline with the Drain option. Create a new Cloud Dataflow job with the updated code
  • C. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to a new unique job name
  • D. Stop the Cloud Dataflow pipeline with the Cancel option. Create a new Cloud Dataflow job with the updated code

Answer: A


 

NEW QUESTION 130
You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud native architecture for this scenario?

  • A. A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.
  • B. An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.
  • C. Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.
  • D. Edge TPUs as sensor devices for storing and transmitting the messages.

Answer: B
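Why a managed buffer plus autoscaling consumers handles edge bursts can be shown with a toy per-tick simulation. This is a hypothetical model (the names `per_worker_rate` and `max_workers` are invented for illustration), not the actual Pub/Sub or Dataflow autoscaling algorithm: arrivals accumulate in a durable backlog, workers scale with that backlog up to a cap, and nothing is dropped during a spike.

```python
from typing import List, Sequence, Tuple


def simulate_buffered_autoscaling(
    arrivals: Sequence[int], per_worker_rate: int, max_workers: int
) -> List[Tuple[int, int]]:
    """Return (workers, backlog_after_tick) for each tick."""
    backlog = 0
    history: List[Tuple[int, int]] = []
    for arriving in arrivals:
        backlog += arriving  # the buffer absorbs the burst
        needed = -(-backlog // per_worker_rate)  # ceiling division
        workers = min(max_workers, max(1, needed))
        processed = min(backlog, workers * per_worker_rate)
        backlog -= processed
        history.append((workers, backlog))
    return history


# A burst of 500 messages arrives at tick 2 after devices reconnect.
history = simulate_buffered_autoscaling([10, 0, 500, 0, 0], per_worker_rate=50, max_workers=4)
```

The backlog peaks and then drains to zero within a few ticks, instead of overloading a fixed-size cluster.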

 

NEW QUESTION 131
You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure.
What should you do?

  • A. Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.
  • B. Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.
  • C. Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.
  • D. Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.

Answer: D

 

NEW QUESTION 132
You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solution should you choose?

  • A. Cloud Speech-to-Text API
  • B. Cloud AutoML Natural Language
  • C. Dialogflow Enterprise Edition
  • D. Cloud Natural Language API

Answer: C

 

NEW QUESTION 133
You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

  • A. You expect future mutations to have different features from the mutated samples in the database.
  • B. You expect future mutations to have similar features to the mutated samples in the database.
  • C. There are roughly equal occurrences of both normal and mutated samples in the database.
  • D. There are very few occurrences of mutations relative to normal samples.
  • E. You already have labels for which samples are mutated and which are normal in the database.

Answer: B,D

Explanation:
Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal by looking for instances that seem to fit least to the remainder of the data set. https://en.wikipedia.org/wiki/Anomaly_detection
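A minimal unsupervised example of that assumption is a z-score detector: it uses no labels, estimates "normal" from the whole data set, and flags points far from it. The threshold of 3.0 is a conventional but hypothetical choice. If mutations were roughly as common as normal samples, they would shift the mean and inflate the spread, and this approach would stop working, which is why rare mutations favor the method.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_anomalies(values: Sequence[float], z_threshold: float = 3.0) -> List[int]:
    """Return indexes of values whose absolute z-score exceeds `z_threshold`.

    No labels are used: the mean and population standard deviation of the
    whole data set stand in for the "normal" population.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


# Twenty normal measurements and one rare outlier at index 20.
samples = [1.0] * 20 + [10.0]
flagged = flag_anomalies(samples)
```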

 

NEW QUESTION 134
......


Google Cloud Big Data & Machine Learning Fundamentals course

This course is a gateway that introduces you to Google Cloud's big data and machine learning capabilities. However, to complete this training successfully, you should have about one year of experience with SQL, extract, transform, and load (ETL) activities, data modeling, machine learning, and programming in Python. The objectives of the course are the following:

  • Use BigQuery and Cloud SQL for interactive data analysis
  • Utilize Cloud SQL & Dataproc to migrate existing MySQL, Pig, Spark, or Hive workloads to Google Cloud
  • Recognize the purpose of the key Big data and Machine Learning products in Google Cloud
  • Create ML models using BigQuery ML, APIs, and AutoML.

Operationalizing Machine Learning Models

Here the candidates need to demonstrate their expertise in using pre-built Machine Learning models as a service, including Machine Learning APIs (for instance, Speech API, Vision API), customized Machine Learning APIs (for instance, AutoML Text, AutoML Vision), and conversational experiences (for instance, Dialogflow). The candidates should also have the skills to deploy a Machine Learning pipeline. This involves the ability to ingest relevant data, retrain machine learning models (BigQuery ML, Cloud Machine Learning Engine, Spark ML, Kubeflow), and perform continuous evaluation. Additionally, the candidates should be able to choose the appropriate training and serving infrastructure and know how to measure, monitor, and troubleshoot Machine Learning models.

 

Professional-Data-Engineer Dumps Full Questions with Free PDF Questions to Pass: https://torrentpdf.exam4tests.com/Professional-Data-Engineer-pdf-braindumps.html