Model Deployment on Distributed Machines in Google Cloud Platform - Lab


GCP AI Platform Model Deployment on a Distributed Machine Instance: Hands-On Lab


We will go over the process of training a pre-packaged machine learning model on AI Platform, deploying the trained model, and running predictions against it using distributed machines.


  • We will focus on the AI Platform aspects, not TensorFlow
  • We will run many gcloud commands on the command line using Cloud Shell

Step Summary:

  • Submit a training job locally using AI Platform commands
  • Submit a training job on AI Platform, both single and distributed
  • Deploy the trained model and submit predictions

Machine type: 

    • 1 master
    • 4 workers
    • 3 parameter servers 
  • Trained on 4 worker machines with 3 parameter servers
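The STANDARD_1 scale tier used later in this lab provisions this layout (1 master, 4 workers, 3 parameter servers) for us. To make the layout concrete, here is a minimal sketch of an equivalent CUSTOM tier config file. The file name config.yaml and the machine type n1-standard-4 are assumptions for illustration and are not part of the lab scripts.

```shell
# Write a hypothetical config.yaml describing 1 master, 4 workers, 3 parameter servers.
# The machine type n1-standard-4 is an assumption; choose one available in your region.
cat > config.yaml <<'EOF'
trainingInput:
  scaleTier: CUSTOM
  masterType: n1-standard-4
  workerType: n1-standard-4
  parameterServerType: n1-standard-4
  workerCount: 4
  parameterServerCount: 3
EOF

# Print the file so the layout can be checked before submitting a job
cat config.yaml
```

A file like this would be passed to the training submit command with --config config.yaml in place of --scale-tier STANDARD_1.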

If you want a set of scripts to automate much of the process, the scripts and a PDF file of the commands we used can be found at the following public link:  

algae study

If you want to download all scripts and files in a terminal environment, use the below command:

  • gsutil cp gs://acg-gcp-course-exercise-scripts/data-engineer/ai-platform/\* .

(The period at the end is required to copy to your current location).

Step 1: Enable the Google Cloud API:

  • Go to the web console --> Go to Menu --> Search for AI Platform in Products --> Click on Jobs --> Enable API
  • It will take a few minutes to enable; you can refresh the page to check
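The same step can also be done from Cloud Shell. The sketch below assumes the AI Platform Training and Prediction API is registered under the service name ml.googleapis.com; the two function names are hypothetical helpers, not part of the lab scripts.

```shell
# Service name assumed for the AI Platform Training and Prediction API
AIP_SERVICE=ml.googleapis.com

# Enable the API for the current project (gcloud is preinstalled in Cloud Shell)
enable_aip_api() {
  gcloud services enable "$AIP_SERVICE"
}

# Succeed only if the API appears in the list of enabled services
aip_api_enabled() {
  gcloud services list --enabled --format="value(config.name)" | grep -qx "$AIP_SERVICE"
}
```

Usage: run enable_aip_api, then aip_api_enabled && echo "AI Platform API is enabled".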


Step 2: Prepare environment

  • Go to cloud shell and execute commands
    • ### Set region environment variable (I will run in the us-central1 region)
      • REGION=us-central1
    • ### Download and unzip the ML GitHub data for the demo
      • wget
      • unzip

Step 3: Setup Directory

    • Check the files
      • ls
    • ### Navigate to the cloudml-samples-master > census > estimator directory
      • cd ~/cloudml-samples-master/census/estimator
    • ### **Develop and validate trainer on local machine**
      • ### Get training data from public GCS bucket
        • mkdir data
        • gsutil -m cp gs://cloud-samples-data/ml-engine/census/data/* data/
    • ### Set path variables for local file paths. These will change later when we use AI Platform
      • TRAIN_DATA=$(pwd)/data/
      • EVAL_DATA=$(pwd)/data/adult.test.csv
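Before training, it can help to confirm that the local paths actually point at downloaded data. The following is a minimal sketch; check_paths is a hypothetical helper name and is not part of the sample repository.

```shell
# Report whether each given path is a non-empty file or a non-empty directory.
# Returns non-zero if any path is missing or empty.
check_paths() {
  rc=0
  for p in "$@"; do
    if [ -d "$p" ] && [ -n "$(ls -A "$p")" ]; then
      echo "ok (directory): $p"
    elif [ -f "$p" ] && [ -s "$p" ]; then
      echo "ok (file): $p"
    else
      echo "missing or empty: $p" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: check_paths "$TRAIN_DATA" "$EVAL_DATA"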

Step 4: Install the required TensorFlow version

  • ### Install from the sample requirements.txt to ensure we're using the same version of TF as the sample
    • sudo pip install -r ~/cloudml-samples-master/census/requirements.txt

Step 5: Specify output directory and clear it

  • ### Specify output directory, set as variable
    • MODEL_DIR=output
  • ### Best practice is to delete contents of output directory in case
  • ### data remains from previous training run
    • rm -rf $MODEL_DIR/*
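A slightly more defensive version of the clean-up is sketched below. It creates the directory first and uses the shell's ${VAR:?} expansion so that an unset or empty MODEL_DIR stops the command instead of expanding to a path at the filesystem root.

```shell
MODEL_DIR=output

# Create the output directory if it does not exist yet
mkdir -p "$MODEL_DIR"

# ${MODEL_DIR:?} makes the shell report an error and skip the command
# if the variable is unset or empty, so this can never become "rm -rf /*"
rm -rf "${MODEL_DIR:?}"/*
```

Quoting the variable also keeps the command safe if the directory name ever contains spaces.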

Step 6: Run a local trainer

  • Run local training using AI Platform
    • It will take the ML model from the GitHub repository and train it using our CSV data
    • It will train for 1000 steps and evaluate for 100 steps to check accuracy
  • ### Run local training using gcloud
    • This runs the training process in the local session
    • gcloud ai-platform local train --module-name trainer.task --package-path trainer/ --job-dir $MODEL_DIR -- --train-files $TRAIN_DATA --eval-files $EVAL_DATA --train-steps 1000 --eval-steps 100
  • You can view the results visually in the TensorBoard dashboard
    • tensorboard --logdir=$MODEL_DIR --port=8080
    • The output includes a URL that you can open directly in Chrome. It is a local viewer for checking accuracy, loss, and training progress

You might encounter errors due to code or version mismatches. If so, fix them on your system before continuing.

Step 7: Execute trainer on a Distributed Instance

  • To run the trainer on GCP AI Platform, we first stage the data in a Cloud Storage bucket
  • ### Create regional Cloud Storage bucket used for all output and staging
    • gsutil mb -l $REGION gs://$DEVSHELL_PROJECT_ID-aip-demo
  • ### Upload training and test/eval data to bucket
    • cd ~/cloudml-samples-master/census/estimator
    • gsutil cp -r data gs://$DEVSHELL_PROJECT_ID-aip-demo/data
  • ### Set data variables to point to storage bucket files
    • TRAIN_DATA=gs://$DEVSHELL_PROJECT_ID-aip-demo/data/
    • EVAL_DATA=gs://$DEVSHELL_PROJECT_ID-aip-demo/data/adult.test.csv
  • ### Copy test.json to storage bucket
    • It will be used later for predictions
    • gsutil cp ../test.json gs://$DEVSHELL_PROJECT_ID-aip-demo/data/test.json
  • ### Set TEST_JSON to point to the same storage bucket file
    • TEST_JSON=gs://$DEVSHELL_PROJECT_ID-aip-demo/data/test.json
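All of the bucket paths set in this step share one prefix built from the project ID. As a quick sanity check, a small sketch like the following prints what each variable should hold for a given project; aip_demo_paths is a hypothetical helper name.

```shell
# Print the lab's Cloud Storage paths for a project ID.
# With no argument, the Cloud Shell variable DEVSHELL_PROJECT_ID is used.
aip_demo_paths() {
  project=${1:-${DEVSHELL_PROJECT_ID:-}}
  bucket="gs://${project}-aip-demo"
  echo "BUCKET=$bucket"
  echo "TRAIN_DATA=$bucket/data/"
  echo "EVAL_DATA=$bucket/data/adult.test.csv"
  echo "TEST_JSON=$bucket/data/test.json"
}
```

Usage: run aip_demo_paths and compare the output with echo $TRAIN_DATA $EVAL_DATA $TEST_JSON. Listing the bucket with gsutil ls -l gs://$DEVSHELL_PROJECT_ID-aip-demo/data/ then confirms the uploads.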

Step 8: Setup Jobs

  • ### Set variables for job name and output path
    • JOB_NAME=census_dist_1
    • OUTPUT_PATH=gs://$DEVSHELL_PROJECT_ID-aip-demo/$JOB_NAME
  • ### Submit a distributed job to AI Platform (STANDARD_1 scale tier)
    • ### Job name is JOB_NAME (census_dist_1)
    • ### Output path is our Cloud Storage bucket/job_name
    • ### Training and evaluation/test data is in our Cloud Storage bucket
gcloud ai-platform jobs submit training $JOB_NAME \
--job-dir $OUTPUT_PATH \
--runtime-version 1.15 \
--module-name trainer.task \
--package-path trainer/ \
--region $REGION \
--scale-tier STANDARD_1 \
-- \
--train-files $TRAIN_DATA \
--eval-files $EVAL_DATA \
--train-steps 1000 \
--verbosity DEBUG \
--eval-steps 100
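AI Platform does not accept a job name that has already been used in the project, so submitting again under the same name fails. One way around this, sketched below, is to append a timestamp to the name; unique_job_name is a hypothetical helper, and the timestamp suffix is our addition rather than part of the lab scripts.

```shell
# Build a job name that is unique per run by appending a UTC timestamp.
# Only letters, digits, and underscores are used, starting with a letter.
unique_job_name() {
  prefix=${1:-census_dist}
  echo "${prefix}_$(date -u +%Y%m%d_%H%M%S)"
}
```

Usage: JOB_NAME=$(unique_job_name census_dist). If you name the job this way, substitute that name wherever census_dist_1 appears in the later steps.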

Step 8 (continued): View in AI Platform

  • ### Can view streaming logs/output with gcloud ai-platform jobs stream-logs $JOB_NAME
  • ### When complete, inspect output path with gsutil ls -r $OUTPUT_PATH
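If you would rather wait in a script than watch the logs, the job state can be polled. This is a minimal sketch that assumes the state reported by gcloud ai-platform jobs describe ends in SUCCEEDED, FAILED, or CANCELLED; wait_for_job is a hypothetical helper name.

```shell
# Poll a training job until it reaches a terminal state.
# Usage: wait_for_job JOB_NAME [POLL_SECONDS]
wait_for_job() {
  job=$1
  interval=${2:-30}
  while true; do
    # Stop with status 2 if the describe call itself fails
    state=$(gcloud ai-platform jobs describe "$job" --format='value(state)') || return 2
    echo "$job: $state"
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Usage: wait_for_job $JOB_NAME 60 && gsutil ls -r $OUTPUT_PATH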

Step 9: Prediction Phase

  • ### Deploy a model for prediction, setting variables in the process
    • cd ~/cloudml-samples-master/census/estimator
    • MODEL_NAME=census
  • ### Create the AI Platform model
    • gcloud ai-platform models create $MODEL_NAME --regions=$REGION

Step 10: Set the job output we want to use and create model version

  • ### This example uses census_dist_1
    • OUTPUT_PATH=gs://$DEVSHELL_PROJECT_ID-aip-demo/census_dist_1
  • ##### CHANGE census_dist_1 to use the output of a different job
  • ## IMPORTANT - Look up and set the full path for the exported trained model binaries
  • ### gsutil ls -r $OUTPUT_PATH/export
  • ### Look for directory $OUTPUT_PATH/export/census/ and copy/paste the timestamp value (without colon) into the below command
  • MODEL_BINARIES=gs://$DEVSHELL_PROJECT_ID-aip-demo/census_dist_1/export/census/<timestamp> ### CHANGE ME!
  • ### Create version 1 of your model

gcloud ai-platform versions create v1 \
--model $MODEL_NAME \
--origin $MODEL_BINARIES \
--runtime-version 1.15
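The manual timestamp lookup in this step can also be scripted. The sketch below reads a gsutil listing on standard input and returns the newest export directory; it assumes the export directories are named with numeric timestamps of equal length, so a plain text sort puts the newest last. latest_export is a hypothetical helper name.

```shell
# Read "gsutil ls" output on stdin and print the newest timestamped
# export directory, without the trailing slash.
latest_export() {
  grep -E '/[0-9]+/$' | sort | tail -n 1 | sed 's:/$::'
}
```

Usage, before creating the version: MODEL_BINARIES=$(gsutil ls $OUTPUT_PATH/export/census/ | latest_export)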

Step 11: Generate online prediction first

  • ### Send an online prediction request to our deployed model using test.json file
  • ### Results come back with a direct response

gcloud ai-platform predict \
--model $MODEL_NAME \
--version v1 \
--json-instances ../test.json

Step 12: Send a batch prediction job using same test.json file

  • ### Results are exported to a Cloud Storage bucket location
  • ### Set the job name variable (the OUTPUT_PATH set in Step 10 is reused)
    • JOB_NAME=census_prediction_2
  • ### Submit the prediction job

gcloud ai-platform jobs submit prediction $JOB_NAME \
--model $MODEL_NAME \
--version v1 \
--data-format TEXT \
--region $REGION \
--input-paths $TEST_JSON \
--output-path $OUTPUT_PATH/predictions
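Once the batch job has finished, the results can also be read from the command line. The sketch below assumes the job writes result files whose names start with prediction.results- under the output path; treat that prefix as an assumption and check it with gsutil ls first. show_predictions is a hypothetical helper name.

```shell
# Print the batch prediction result files found under a given output path.
# The "prediction.results-" file name prefix is an assumption; verify with gsutil ls.
show_predictions() {
  gsutil cat "$1/prediction.results-*"
}
```

Usage: gsutil ls $OUTPUT_PATH/predictions, then show_predictions $OUTPUT_PATH/predictions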

Step 13: View prediction results in web console at


