Demystifying Bigtable - Google Cloud Platform

 





Google Cloud Bigtable is a fully managed, distributed NoSQL database service designed for large-scale, high-performance workloads. It scales to store and process massive amounts of structured and unstructured data.


Cloud Bigtable is a column-family based NoSQL database that allows you to store and manage petabytes of data with very low latency and high throughput. It is based on the Google Bigtable technology, which was developed for use within Google's own infrastructure.
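
To make "column-family based" a little more concrete, here is a minimal in-memory sketch of the data model: a row key maps to column families, each family holds column qualifiers, and each cell keeps timestamped versions. The `ToyTable` class and its method names are hypothetical and written only for illustration; this is not the Cloud Bigtable client library.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a column-family table (illustration only)."""

    def __init__(self, families):
        # Column families are declared up front; qualifiers are free-form.
        self.families = set(families)
        # row key -> family -> qualifier -> list of (timestamp, value)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def write(self, row_key, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows[row_key][family][qualifier]
        cells.append((timestamp, value))
        # Keep the newest version first.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def read_latest(self, row_key, family, qualifier):
        # .get() is used so that reads never create empty rows.
        cells = self.rows.get(row_key, {}).get(family, {}).get(qualifier, [])
        return cells[0][1] if cells else None


table = ToyTable(families=["stats"])
table.write("device#42", "stats", "temp", b"20.5", timestamp=1)
table.write("device#42", "stats", "temp", b"21.0", timestamp=2)
print(table.read_latest("device#42", "stats", "temp"))
```

The row key `device#42` and the `stats` family are made-up names; the point is only the nesting of row, family, qualifier, and versioned cell.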


Cloud Bigtable provides a number of benefits, including:


  • High scalability: 
    • You can easily scale your Bigtable cluster up or down to meet your needs, and you pay only for what you use.
  • High performance: 
    • Bigtable is designed to deliver low-latency, high-throughput performance for a wide range of workloads.
  • Fully managed service: 
    • You don't need to worry about managing the underlying infrastructure, as Google takes care of all the maintenance, upgrades, and scaling.
  • Integration with other Google Cloud services: 
    • You can easily integrate Bigtable with other Google Cloud services like BigQuery, Cloud Dataflow, and Cloud Dataproc, making it a key component of your data analytics and processing pipeline.

Cloud Bigtable is used by a variety of companies and organizations for a wide range of use cases, including IoT data processing, financial analytics, and e-commerce applications, among others.




History of Bigtable

  • Considered one of the originators of the NoSQL industry
  • Developed by Google in 2004
  • Existing database solutions were too slow
  • Needed real-time access to petabytes of data
  • Powers Gmail, YouTube, Google Maps, and others


What is it used for?

  • High-throughput analytics
  • Very large datasets, at least in the terabyte range. Below a terabyte, Bigtable may not be cost-efficient.
  • IoT data ingested from many devices


Use Cases

Cloud Bigtable suits workloads that need high scalability, low latency, and high throughput for large-scale data storage and processing. Common use cases include:

  • Time-series data storage and analysis: 
    • Cloud Bigtable is an excellent option for storing and processing time-series data, such as server logs, sensor data, and IoT data.
  • Financial analytics: 
    • Cloud Bigtable can be used for storing and analyzing large volumes of financial data, including stock prices, trades, and market data.
  • Ad tech: 
    • Ad tech companies use Cloud Bigtable for storing and processing large volumes of user data, such as ad impressions, clicks, and conversions.
  • Gaming: 
    • Game developers use Cloud Bigtable for storing game data, such as player scores, game states, and user-generated content.
  • E-commerce: 
    • E-commerce companies use Cloud Bigtable for storing and analyzing customer data, such as purchase history, browsing behavior, and user profiles.
  • Content management: 
    • Content management systems use Cloud Bigtable for storing and retrieving large volumes of content, such as images, videos, and documents.
  • Machine learning: 
    • Cloud Bigtable can be used as a data store for machine learning applications, such as image recognition, natural language processing, and recommendation engines.


Overall, Cloud Bigtable is a versatile database that fits many industries and applications with these requirements.


Access Control

  • Access can be granted project-wide or at the instance level
  • Permission levels: read, write, and manage
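
As a rough illustration of how project-wide and instance-level grants might combine, the toy model below takes the stronger of the two grants for a user. It is only a sketch: the ordering read < write < manage, the function names, and the dictionary layout are all assumptions made for this example, and none of it is the actual Cloud IAM API.

```python
# Assumed ordering for this sketch: manage implies write, write implies read.
LEVELS = {"read": 1, "write": 2, "manage": 3}


def effective_access(user, instance, project_grants, instance_grants):
    """Return the strongest level granted project-wide or on the instance."""
    candidates = [
        project_grants.get(user),               # project-wide grant
        instance_grants.get((user, instance)),  # instance-level grant
    ]
    candidates = [level for level in candidates if level is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda level: LEVELS[level])


def allowed(user, instance, action, project_grants, instance_grants):
    """True if the user's effective level covers the requested action."""
    granted = effective_access(user, instance, project_grants, instance_grants)
    return granted is not None and LEVELS[granted] >= LEVELS[action]


# Hypothetical users and instances.
project_grants = {"ana": "read"}
instance_grants = {("ana", "prod"): "write", ("bo", "dev"): "manage"}

print(allowed("ana", "prod", "write", project_grants, instance_grants))
print(allowed("ana", "dev", "write", project_grants, instance_grants))
```

Here `ana` can read every instance through the project-wide grant but can write only to `prod`, where she holds an instance-level grant.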

Architecture

Bigtable was originally built on top of the Google File System (GFS), which has since been succeeded by Colossus, and is designed to provide high scalability, availability, and performance.

Bigtable consists of the following components:


  • Tablets: 
    • Bigtable organizes data into tablets, which are partitions of the data stored on different nodes in the cluster. Each tablet contains a contiguous range of rows and a set of columns for those rows.
  • Chubby: 
    • Chubby is a distributed lock service used by Bigtable for coordination and synchronization. Chubby is responsible for managing cluster-wide metadata and coordinating tablet servers.
  • Tablet Servers: 
    • Tablet servers are responsible for serving read and write requests for a set of tablets. Each tablet server can serve multiple tablets and is responsible for maintaining the tablet's data on disk.
  • Master Server: 
    • The master server is responsible for assigning tablets to tablet servers, balancing the workload across the tablet servers, and monitoring the health of the system.
  • Clients: 
    • Clients access Bigtable through the application programming interface (API), which provides a set of functions for reading and writing data to the database.
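
Because each tablet holds a contiguous range of rows, finding the tablet (and therefore the tablet server) for a row key comes down to a search over sorted split points. The sketch below shows that idea with a binary search. The `TabletMap` class, the split keys, and the server names are hypothetical, and the real system keeps this mapping in its own metadata rather than in a Python list.

```python
import bisect


class TabletMap:
    """Maps row keys to tablets by contiguous key range (illustration only)."""

    def __init__(self, split_keys, servers):
        # split_keys: sorted row keys at which a new tablet begins.
        # Tablet i covers keys from split_keys[i-1] (inclusive) up to
        # split_keys[i] (exclusive); the first and last tablets are open-ended.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server entry per tablet")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def tablet_index(self, row_key):
        # bisect_right sends a key equal to a split point to the tablet
        # that starts at that split point.
        return bisect.bisect_right(self.split_keys, row_key)

    def server_for(self, row_key):
        return self.servers[self.tablet_index(row_key)]


# Three tablets: keys below "g", keys from "g" up to "p", and keys from "p" on.
# One tablet server can serve several tablets, so "ts-1" appears twice.
tablets = TabletMap(split_keys=["g", "p"], servers=["ts-1", "ts-2", "ts-1"])
print(tablets.server_for("kiwi"))
```

Rows that sort next to each other land in the same tablet, which is why a range scan over neighbouring keys usually touches only one or two tablet servers.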


Client Request Flow

  • Requests from clients are distributed by a frontend server pool to the nodes.
  • Each Bigtable table is sharded into blocks of contiguous rows called tablets.
  • Tablets are stored on Colossus, Google's file system, in SSTable format.
  • Nodes provide the compute that processes requests.
  • No data is stored on the nodes themselves, apart from the metadata needed to direct requests to the correct tablet.
  • Storage is separate from the compute nodes, though each tablet is associated with a node.
  • As a result, rebalancing tablets and recovering from a node failure are very fast, because only metadata pointers need to be updated.
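
The separation of storage and compute described above can be sketched with a toy cluster in which tablet data lives in one shared store and nodes hold only an assignment table. When a node fails, the sketch updates pointers and copies no rows. `ToyCluster`, its methods, and the least-loaded reassignment rule are assumptions made for illustration, not a description of Bigtable's internal scheduler.

```python
class ToyCluster:
    """Shared storage plus a tablet-to-node assignment (illustration only)."""

    def __init__(self, nodes, tablets):
        # Shared storage stands in for Colossus: tablet id -> {row key: value}.
        self.storage = {tid: dict(rows) for tid, rows in tablets.items()}
        self.nodes = list(nodes)
        # Metadata only: which node currently serves which tablet.
        self.assignment = {
            tid: self.nodes[i % len(self.nodes)]
            for i, tid in enumerate(sorted(self.storage))
        }

    def read(self, tablet_id, row_key):
        # The serving node is looked up in metadata; data comes from storage.
        node = self.assignment[tablet_id]
        return node, self.storage[tablet_id].get(row_key)

    def fail_node(self, dead):
        """Reassign the dead node's tablets; returns how many pointers moved."""
        self.nodes.remove(dead)
        if not self.nodes:
            raise RuntimeError("no surviving nodes to take over tablets")
        moved = 0
        for tid, node in list(self.assignment.items()):
            if node != dead:
                continue
            # Count tablets per surviving node and pick the least loaded one.
            load = {n: 0 for n in self.nodes}
            for owner in self.assignment.values():
                if owner in load:
                    load[owner] += 1
            self.assignment[tid] = min(self.nodes, key=lambda n: load[n])
            moved += 1
        return moved


cluster = ToyCluster(
    nodes=["n1", "n2", "n3"],
    tablets={"t1": {"a": 1}, "t2": {"m": 2}, "t3": {"x": 3}},
)
print(cluster.fail_node("n2"))
print(cluster.read("t2", "m"))
```

Note that `fail_node` never touches `self.storage`: the cost of recovery in this model depends on the number of tablets to reassign, not on how much data they contain.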



The architecture of Bigtable is designed to be highly scalable and fault-tolerant. The use of tablets and tablet servers allows Bigtable to handle large amounts of data while maintaining low latency. The use of Chubby for coordination and synchronization ensures that the system is always in a consistent state, even in the presence of failures. The master server monitors the health of the system and ensures that tablets are evenly distributed across the tablet servers.




