Google Cloud Data Engineering

GCP is broadly divided into the following categories:

  • Networking
  • Compute
  • Storage
  • Big data and ML products

Compute

GCP offers the following compute services:

  • Compute Engine
  • Kubernetes Engine
  • App Engine
  • Cloud Functions
  • Anthos
  • Cloud Run

Compute Engine is Google’s infrastructure-as-a-service (IaaS) offering. It is also the building block for other services that run on top of this compute resource.

Google Kubernetes Engine (GKE) is a managed service providing Kubernetes cluster management and Kubernetes container orchestration. Kubernetes Engine allocates cluster resources, determines where to run containers, performs health checks, and manages VM lifecycles using Compute Engine instance groups. Note that Kubernetes is often abbreviated as K8s.

App Engine Standard is a PaaS product that allows developers to run their applications in a serverless environment.

Cloud Functions is a serverless compute service well suited for event processing. The service is designed to execute code in response to events within the Google Cloud Platform.
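As a rough illustration of the event-driven model, the sketch below follows the shape of a Python background function reacting to a Cloud Storage upload. The function name `on_file_uploaded` and the sample payload are hypothetical, and the `bucket` and `name` keys are assumed to be present in the event; check the payload of the trigger you actually use.

```python
def on_file_uploaded(event, context):
    """Handle one storage event.

    `event` is a dict describing the object that changed; `context`
    carries event metadata and is not used in this sketch.
    """
    # Fall back to placeholders so a malformed event does not raise.
    bucket = event.get("bucket", "<unknown bucket>")
    name = event.get("name", "<unknown object>")
    message = f"Processing gs://{bucket}/{name}"
    print(message)
    return message
```

The function holds no state between invocations, which is what lets the platform run as many copies as the event volume requires.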

Anthos is an application management platform that builds on Kubernetes to support hybrid and multicloud deployments.

Cloud Run is a Google Cloud service for running stateless containers. Cloud Run is available as a managed service or within Anthos. When using the managed service, you pay per use and can have up to 1,000 container instances by default. 
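To make "stateless container" concrete, here is a minimal sketch of the kind of HTTP service one might package for Cloud Run, using only the Python standard library. It assumes the container is told which port to listen on through the `PORT` environment variable, with 8080 as the fallback; the names `HelloHandler`, `resolve_port`, and `build_server` are made up for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stateless handler: each response depends only on the request."""

    def do_GET(self):
        body = f"Hello from {self.path}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep the sketch quiet.
        pass


def resolve_port(env=None):
    """Read the listening port from the environment, defaulting to 8080."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


def build_server(port):
    """Create (but do not start) an HTTP server bound to all interfaces."""
    return HTTPServer(("0.0.0.0", port), HelloHandler)


# Container entry point, left commented out so the sketch can be imported:
# build_server(resolve_port()).serve_forever()
```

Because nothing is kept in memory between requests, any container instance can serve any request, which is what allows the service to scale the instance count up and down.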

Storage

Highly available storage is storage that is available and functional at nearly all times. The storage services can be grouped into the following categories:

  • Object storage
  • File and block (network-attached) storage
  • Database services
  • Caching

GCP provides four types of storage systems: object storage using Cloud Storage, network-attached storage, databases, and caching.

Cloud Storage is used for unstructured data that is accessed at the object level; there is no way to query or access subsets of data within an object. Object storage is useful for a wide array of use cases, from uploading data from client devices to storing long-term archives.

Network-attached storage is used to store data that is actively processed. Cloud Filestore provides a network filesystem, which is used to share file-structured data across multiple servers.

Cloud Storage has multiple classes: Standard, Nearline, Coldline, and Archive.
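One way to think about the classes is by how often the data is read. The helper below is a rule-of-thumb sketch, not an official sizing tool: it assumes minimum storage durations of 30, 90, and 365 days for Nearline, Coldline, and Archive, and it ignores retrieval fees and pricing. The function name `suggest_storage_class` is hypothetical.

```python
# Assumed minimum storage durations in days (Standard has none).
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def suggest_storage_class(days_between_reads):
    """Pick the coldest class whose minimum duration fits the read interval."""
    if days_between_reads < 0:
        raise ValueError("days_between_reads must be non-negative")
    chosen = "STANDARD"
    # The dict is ordered from warmest to coldest, so the last match wins.
    for storage_class, min_days in MIN_STORAGE_DAYS.items():
        if days_between_reads >= min_days:
            chosen = storage_class
    return chosen
```

For example, data read roughly every 45 days would land in Nearline under these assumptions, while data read daily stays in Standard.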

Cloud SQL is a managed relational database that can run on a single server. 

Cloud Spanner is a managed database service that supports horizontal scalability across regions.

Cloud Filestore is a network-attached storage service that provides a filesystem that is accessible from Compute Engine and Kubernetes Engine.

Cloud Firestore and Cloud Datastore are managed document databases, which are a kind of NoSQL database that uses a flexible JSON-like data structure called a document.

BigQuery is a managed data warehouse and analytical database solution.

Cloud Bigtable is designed to support petabyte-scale databases for analytic operations. 

Cloud Memorystore is a managed cache service that provides Redis and Memcached options.
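A cache like this is typically used in a cache-aside pattern: read from the cache first and fall back to the slower database on a miss. The sketch below shows the idea with a local in-memory stand-in rather than a real Redis or Memcached client; `InMemoryCache`, `get_with_cache`, and `load_from_database` are hypothetical names chosen for this example.

```python
import time


class InMemoryCache:
    """Local stand-in for a managed cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def get_with_cache(cache, key, load_from_database, ttl_seconds=60):
    """Cache-aside: try the cache, fall back to the slow source on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_database(key)
    cache.set(key, value, ttl_seconds)
    return value
```

The time-to-live bounds how stale a cached value can get; a real deployment would also need to decide what to do when the underlying record changes.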

Figure: An overview of transactional vs. analytical storage choices on GCP.
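The transactional versus analytical split can also be written down as a small decision helper. This is a deliberately coarse sketch that only encodes the one-line descriptions given in this post; real selection depends on many more factors (latency, consistency, cost, data volume), and the function name and parameters are hypothetical.

```python
def suggest_database(workload, relational=True, multi_region=False):
    """Map a coarse workload description to a GCP database service."""
    if workload == "analytical":
        # SQL-style warehouse analytics vs. very large non-relational datasets.
        return "BigQuery" if relational else "Cloud Bigtable"
    if workload == "transactional":
        if not relational:
            return "Cloud Firestore"
        # Horizontal scale across regions points to Spanner; otherwise Cloud SQL.
        return "Cloud Spanner" if multi_region else "Cloud SQL"
    raise ValueError(f"unknown workload: {workload!r}")
```

For instance, `suggest_database("transactional", relational=True, multi_region=True)` points to Cloud Spanner, matching its description as the horizontally scalable relational option.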

Data Flow

When choosing cloud compute resources and designing workflows to meet business requirements, we need to choose products to accomplish the tasks.

Compute System Provisioning

GCP provides an interactive console as well as a command-line utility for creating and managing compute, storage, and network resources. It also provides the Deployment Manager service, which allows you to specify infrastructure as code. Alternatively, we may use Terraform.
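To give a feel for infrastructure as code, here is a sketch of a Deployment Manager template written in Python that describes a single Compute Engine VM. The `generate_config(context)` entry point is how such templates are structured; the property keys `zone`, `machineType`, and `sourceImage` are assumptions about what the accompanying configuration file would pass in, and the use of the `default` network is likewise an assumption.

```python
COMPUTE_URL_BASE = "https://www.googleapis.com/compute/v1/"


def generate_config(context):
    """Return the resource list for one VM instance."""
    project = context.env["project"]
    zone = context.properties["zone"]                  # assumed property
    machine_type = context.properties["machineType"]   # assumed property
    instance = {
        "name": context.env["name"],
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": (
                f"{COMPUTE_URL_BASE}projects/{project}"
                f"/zones/{zone}/machineTypes/{machine_type}"
            ),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": context.properties["sourceImage"],
                },
            }],
            "networkInterfaces": [{
                "network": (
                    f"{COMPUTE_URL_BASE}projects/{project}"
                    "/global/networks/default"
                ),
            }],
        },
    }
    return {"resources": [instance]}
```

Because the template is an ordinary function that returns a dictionary, it can be checked locally with a fake `context` object before anything is deployed.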
