Google Cloud Engineer - Big Query interview Questions

 


Google Cloud Engineer - Big Query Questions


QUESTION 1

 How can you ensure that table modifications in BigQuery are ACID compliant?



  • No special accommodations are required, BigQuery table modifications are ACID compliant by design.
  • Enforce ordering of SQL statements to ensure the operation completes atomically.
  • Maintain a separate table that records transactions themselves so changes can be re-applied if any are lost.
  • Add a nominal wait time to any application queries following an update to allow for eventual consistency.


Explain:

All table modifications in BigQuery are ACID compliant. This applies to DML operations, queries with destination tables, and load jobs. A table that goes through inserts, updates, and deletes while serving user queries handles the concurrency gracefully and transitions from one state to the next in an atomic fashion.


QUESTION 2

What could you consider to improve BigQuery performance for queries that use filters or aggregation against particular columns?



  • Use clustering on non-partitioned tables.
  • Use clustered partitioned tables
  • Normalize data based on commonly used filters.
  • Use partitioned tables


Explain:

Clustering can improve the performance of certain types of queries such as queries that use filter clauses and queries that aggregate data. When you submit a query that contains a clause that filters data based on the clustering columns, BigQuery uses the sorted blocks to eliminate scans of unnecessary data.


QUESTION 3

What steps do you need to take to set up BigQuery before use?


  • You must create storage buckets for BigQuery to use.
  • BigQuery is a serverless product and all compute and storage resources are managed for you.
  • You must create a BigQuery cluster.
  • You must create processing nodes in Compute Engine.


Explain

BigQuery is a serverless product for storing and querying massive data sets without configuring or managing hardware or software.


QUESTION 4

Which option shown below is true regarding BigQuery jobs?


  • Jobs run for large batch operations but not for interactive queries in the BigQuery UI.
  • Jobs are actions that BigQuery runs to load, export, query, or copy data.
  • Jobs run for most actions, including those in the BigQuery UI, but only if they are submitted with DML.
  • Jobs run for all scheduled actions, but not for interactive queries in the BigQuery UI.


Explain

When you use the Cloud Console, the classic BigQuery web UI, or the CLI to load, export, query, or copy data, a job resource is automatically created, scheduled, and run.


QUESTION 5

When you run a query in BigQuery, what happens to the results?


  • Query results are stored in a results table, however it can not be used for querying cached results.
  • Query results are either written to a destination table specified by the user, or to a temporary cached results table.
  • Query results exist only in BigQuery memory and are not stored in a table.
  • Query results exist only within the BigQuery UI and are not stored in a table.


Explain

BigQuery writes all query results to a table. The table is either explicitly identified by the user (a destination table), or it is a temporary, cached results table.


QUESTION 6

You are required to share a subset of data from a BigQuery data set with a 3rd party analytics team. The data may change over time, and you should not grant unnecessary projects permissions to this team if you can avoid it. How should you proceed?


  • Create an export of the data subset to a Cloud Storage bucket. Provide a signed URL for the team to download the data from the bucket.
  • Add the team to your GCP project and assign them the BigQuery Data Viewer IAM role for the data set.
  • Create an authorized view based on a specific query for the subset of data, and provide access for the team only to that view.
  • Add the team to your GCP project and assign them the BigQuery Data Editor IAM role for the data set.


Explain

An authorized view allows you to share query results with particular users and groups without giving them access to the underlying tables.


QUESTION 7

Your company stores data in BigQuery for long-term and short-term analytics queries. Most of the jobs only need to study the last 7 days of data. Over time, the cost of queries keeps going up. How can you redesign the database to lower the cost of the most frequent queries?


  • Create a new table every 7 days. Maintain a separate table that duplicates the weekly tables to contain all data.
  • Create a new table every 7 days. Use JOIN statements to conduct long-term analytics queries.
  • Use Integer range partitioned tables.
  • Use DATE partitioned tables.


Explain

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.


QUESTION 8

You are required to load a large volume of data into BigQuery that does contain some duplication. What should you do to ensure the best query performance once the data is loaded?


  • Normalize the data to eliminate duplicates from being stored, and reformat the data as Avro.
  • Denormalize the data by using nested or repeated fields.
  • Normalize the data to eliminate duplicates from being stored, and reformat the data as JSON.
  • Break up the data into multiple CSV files.


Explain

BigQuery performs best when your data is denormalized. Rather than preserving a relational schema, such as a star or snowflake schema, you can improve performance by denormalizing your data and taking advantage of nested and repeated fields.


QUESTION 9

What are some ways to help control costs in BigQuery?


  • There are no ways to control the cost of BigQuery.
  • Use LIMIT clauses in SQL queries.
  • Split data across multiple projects to benefit from multiple free tier allowances.
  • Use the query validator in the Cloud Console.
  • Only query specific columns, avoiding SELECT *.


Explain

  • Using the query validator and only querying the data you need will help you to predict and control costs. 
  • Querying only the data you need and using the query validator will help you to predict and control costs


QUESTION 10


What is a federated data source?

  • An external data source that can be queried directly even though the data is not stored in BigQuery.
  • A public BigQuery data set.
  • A BigQuery table that belongs to a different data set, used in an SQL statement.
  • A BigQuery data set that belongs to a different GCP project, used in an SQL statement.

Explain

An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source.

No comments:
Write comments

Please do not enter spam links

Meet US

Services

More Services