Agent Platform Training & Evaluation Pipeline

Kubeflow Pipelines v2 · Production‑grade MLOps on GCP

Production‑grade ML training, evaluation, gating, and conditional deployment pipeline on Google Agent Platform using Kubeflow Pipelines (KFP v2). Enforces model quality, tracks lineage, and deploys only validated models.

Project Summary

AI + MLOps + Cloud Platform Engineering

Category

AI/ML · MLOps · Platform Engineering

Industry

Cross‑industry Enterprise AI Platform

MLOps Focus

Training · Evaluation · Gating · Conditional Deploy

Key Technologies & Concepts

ML/AI platform primitives

Agent Platform Pipelines (KFP v2) · Kubeflow Pipelines SDK · Agent Platform Training · Agent Platform Metadata Store · Google Cloud Storage · Artifact Registry · Service Accounts & IAM · Workload Identity Federation · Conditional Pipelines (eval gate) · Agent Platform Endpoints · Cloud Logging · ML Governance · Lineage

Problem & Objective

Why this pipeline exists

Problem

Manual, notebook‑driven ML workflows lack reproducibility, governance, automated evaluation gates, and production discipline. There was no structured way to enforce model quality before deployment on GCP.

Objective

Build a production‑grade, automated ML training/evaluation pipeline on GCP that enforces quality gates, tracks lineage, and conditionally deploys models to Agent Platform endpoints using native MLOps primitives.

Solution & Architecture

Agent Platform native orchestration

Overview

Agent Platform Pipelines (KFP v2) orchestrates data preparation, model training (RandomForest), evaluation (ROC, confusion matrix, accuracy), quality gating, conditional deployment to Agent Platform Endpoints, and scheduled retraining.

Our platform automates machine learning workflows using Components as modular building blocks for specific tasks. These are orchestrated via a DSL (Domain Specific Language), which defines how each component's outputs feed the next component's inputs, while Conditions provide the "if‑then" logic that lets the pipeline branch at run time on actual results, for example deploying only when accuracy exceeds the threshold.

Representation: @dsl.component or @component · @dsl.pipeline · with dsl.Condition(accuracy > 0.8):
Managed training • serverless orchestration • artifact persistence in GCS • conditional gates
 GCS → Data prep → Train (Agent Platform Training) → Eval (ROC/CM) → Quality gate → Conditional deploy → Endpoint
  1. GitHub / Trigger
  2. Agent Platform Pipeline
  3. Train (RF)
  4. Eval + Gate
  5. Deploy / Registry

Skills & Technologies

ML/platform engineering stack

Primary (Advanced)

  • MLOps Architecture
  • Agent Platform Pipelines / KFP v2
  • Cloud AI Platform Engineering
  • Production ML Workflow Design

Secondary

  • Kubeflow Pipelines SDK
  • scikit‑learn · Agent Platform SDK
  • GCS · IAM · Workload Identity
  • GitHub Actions (CI trigger)

Languages & DevOps

Python · YAML · KFP components · Agent Platform · GitHub Actions

Pipeline Execution & Governance

Conditional gates, lineage, scheduling

Execution

  • Manual / CI trigger → Agent Platform Pipeline run
  • KFP v2 components: data prep, training, evaluation, deploy
  • Artifacts stored in GCS, metrics in Agent Platform Metadata

Governance

  • Explicit evaluation gate (accuracy/ROC threshold)
  • Conditional pipeline branch: deploy only if gate passes
  • Model versioning in Agent Platform Model Registry
  • IAM least‑privilege + Workload Identity Federation
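
The evaluation gate itself reduces to a threshold check over logged metrics. The sketch below is one possible shape for that check, under stated assumptions: the function name `passes_quality_gate` and the `roc_auc` threshold are hypothetical, and only the 0.8 accuracy threshold comes from the condition quoted in this document.

```python
from typing import Mapping


def passes_quality_gate(metrics: Mapping[str, float],
                        thresholds: Mapping[str, float]) -> bool:
    """Return True only if every thresholded metric is present and above its minimum."""
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        # A missing or NaN metric fails the gate instead of passing silently.
        if value is None or not value > minimum:
            return False
    return True


# Hypothetical thresholds; 0.8 mirrors "accuracy > 0.8" from the pipeline condition.
THRESHOLDS = {"accuracy": 0.8, "roc_auc": 0.85}

print(passes_quality_gate({"accuracy": 0.91, "roc_auc": 0.93}, THRESHOLDS))
print(passes_quality_gate({"accuracy": 0.91}, THRESHOLDS))
```

Failing closed on a missing metric is a deliberate choice here: a deployment branch that runs because a metric was never logged would defeat the purpose of the gate.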

Challenges & Resolutions

Wiring KFP v2 components → Agent Platform Pipelines: used native KFP interfaces.
ROC/metrics logging: sanitized inputs for Agent Platform metrics APIs.
Conditional gates: pipeline condition with threshold check.
Model format for serving: packaged as Agent Platform-compatible artifact.
Notebook to production: refactored into pipeline components.
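
The source does not say which ROC inputs needed sanitizing. One plausible case, assumed here, is non-finite values in the curve arrays (some scikit-learn versions return an infinite first threshold from `roc_curve`), which metrics APIs expecting plain finite floats may reject. The helper name `sanitize_roc` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sanitize_roc(fpr: Sequence[float], tpr: Sequence[float],
                 thresholds: Sequence[float]
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Drop ROC points with non-finite values and return plain Python float lists."""
    if not (len(fpr) == len(tpr) == len(thresholds)):
        raise ValueError("fpr, tpr and thresholds must have equal length")
    clean = [
        (float(f), float(t), float(th))
        for f, t, th in zip(fpr, tpr, thresholds)
        # Keep a point only if all three of its values are finite.
        if math.isfinite(f) and math.isfinite(t) and math.isfinite(th)
    ]
    if not clean:
        return [], [], []
    f_out, t_out, th_out = (list(column) for column in zip(*clean))
    return f_out, t_out, th_out


fpr, tpr, thr = sanitize_roc([0.0, 0.0, 1.0], [0.0, 0.5, 1.0], [float("inf"), 0.8, 0.2])
print(fpr, tpr, thr)
```

Converting to built-in `float` also strips NumPy scalar types, which is a second common source of serialization errors when logging metrics.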

GCP CI/CD · Architecture & YAML Mapping

Pipeline‑2 model training, evaluation, and governance constructs

Architecture Block | GCP CI/CD / MLOps Construct (Pipeline‑2 – Modelling) | YAML / Pipeline Spec Mapping
Source Repository | GitHub (modeling / pipelines repo) | repository, workflow.checkout
Source Trigger | Manual / CI trigger (GitHub Actions or local notebook execution) | on.workflow_dispatch, on.push, notebook_runtime
CI Runner | GitHub Actions Linux Runner (ubuntu-latest), optional for CI-driven runs | jobs.pipeline.runs-on: ubuntu-latest
Build / Pipeline Execution | Agent Platform Pipelines (KFP v2: Data → Train → Evaluate → Condition) | pipelineSpec.root, pipelineInfo.name, deploymentSpec
Training Orchestration | Agent Platform Pipelines (KFP v2) | @dsl.pipeline, tasks.train
Data Processing | Agent Platform Pipeline Component (Pandas + Scikit‑Learn preprocessing) | @dsl.component, components.data-prep
Model Training | RandomForestClassifier training pipeline / managed training runtime | components.train.container.image, args, model_output
Model Evaluation | Pipeline component for ROC, Confusion Matrix, Accuracy | components.evaluate.outputs.metrics, classificationMetrics
Artifact Storage | Google Cloud Storage (datasets, model artifacts, metrics JSON) | pipeline_root: gs://..., artifact_uri, metrics_path
Container Registry | Artifact Registry (Agent Platform managed serving container) | image.repository, image.tag
Model Registry | Agent Platform Model Registry (governed model versions) | components.upload-model, model.display_name, version_aliases
Approval Gate | Pipeline Condition (metric threshold gate for deployment) | with dsl.Condition(accuracy > 0.8), threshold
Security & Auth | GCP Service Account + IAM (least privilege for pipelines) | service_account, roles/aiplatform.user, roles/storage.objectAdmin
Secrets / Config | Environment variables + GCP IAM, optionally Secret Manager | env.PROJECT_ID, env.REGION, env.BUCKET_URI, secretEnv
Monitoring & Logs | Agent Platform Pipelines UI + Cloud Logging | pipeline_job_name, logging.enabled
Lineage & Governance | Agent Platform Pipelines lineage + Model Registry versions | metadata, metrics, artifact.uri
Infrastructure Backend | Agent Platform Managed Pipelines (no separate IaC needed) | managed_pipeline: true, location

Pipeline‑2 standardizes reproducible training workflows, centralized GCS artifacts, metric logging (ROC, confusion matrix, accuracy), and governed model versioning for controlled promotion toward deployment.
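
The Secrets / Config row maps run settings to the environment variables `PROJECT_ID`, `REGION`, and `BUCKET_URI`. A minimal sketch of reading and validating them before a run is submitted could look like this; the `PipelineConfig` class, the `load_config` helper, and the `pipeline_root` sub-path are hypothetical and not taken from the project code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    project_id: str
    region: str
    bucket_uri: str

    @property
    def pipeline_root(self) -> str:
        # Hypothetical layout: all run artifacts live under <bucket>/pipeline_root.
        return f"{self.bucket_uri.rstrip('/')}/pipeline_root"


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build the run configuration from environment variables, failing fast on gaps."""
    required = ("PROJECT_ID", "REGION", "BUCKET_URI")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"missing required environment variables: {', '.join(missing)}")
    if not env["BUCKET_URI"].startswith("gs://"):
        raise ValueError("BUCKET_URI must be a gs:// URI")
    return PipelineConfig(env["PROJECT_ID"], env["REGION"], env["BUCKET_URI"])
```

Validating in one place means a CI-triggered run stops before any cloud call is made if the workflow forgot to set a variable.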

Complete Project Details

All content from the Pipeline‑2 PDF

Project Summary

  • Project Name: AI‑GCP Pipeline‑2 – Agent Platform Training & Evaluation Pipeline
  • One‑Line Description: Production‑grade ML training, evaluation, gating, and conditional deployment pipeline on Google Agent Platform using Kubeflow Pipelines (KFP v2).
  • Category: AI + MLOps + Cloud Platform Engineering
  • Industry: Cross‑industry (Enterprise AI Platform / MLOps Infrastructure)
  • Domain: Machine Learning Platform Engineering / AI Model Lifecycle Automation

Key Words

  • Agent Platform Pipelines (KFP v2 Orchestration)
  • Kubeflow Pipelines SDK (Pipeline as Code)
  • Agent Platform Training Jobs (Managed Training Runtime)
  • Agent Platform Metadata Store (Lineage & Governance)
  • Google Cloud Storage (Datasets, Models, Metrics Artifacts)
  • Artifact Registry (Training / Inference Containers)
  • Service Accounts & IAM (Least‑Privilege MLOps Security)
  • Workload Identity Federation (GitHub → GCP Auth)
  • Conditional Pipelines (Evaluation Gate → Deploy)
  • Agent Platform Model Upload (Model Registry Equivalent)
  • Agent Platform Endpoints (Online Inference Targets)
  • Pipeline Scheduling (Agent Platform Pipeline Scheduler)
  • Cloud Logging (Training / Pipeline Logs)
  • ML Governance (Metadata, Metrics, Model Lineage)

Problem Solved

Manual, notebook‑driven ML workflows lack reproducibility, governance, automated evaluation gates, and production deployment discipline. There was no structured way to enforce model quality before deployment in GCP.

Primary Objective

Build a production‑grade, automated ML training and evaluation pipeline on GCP that enforces quality gates, tracks lineage, and conditionally deploys models to Agent Platform endpoints using platform‑native MLOps primitives.

Solution & Architecture

Implemented an Agent Platform Pipelines (KFP v2) based ML pipeline that performs data preparation, model training, evaluation (ROC, confusion matrix, accuracy), quality gating, conditional deployment to Agent Platform Endpoints, and scheduled retraining.

Our platform automates machine learning workflows using Components as modular building blocks for specific tasks. These are orchestrated via a DSL (Domain Specific Language), which defines how each component's outputs feed the next component's inputs, while Conditions provide the if‑then logic that lets the pipeline branch at run time on actual results, for example deploying only when accuracy exceeds the threshold.

  • Representation: @dsl.component or @component; @dsl.pipeline; with dsl.Condition(accuracy > 0.8):
  • Cloud Platform: Google Cloud Platform (Agent Platform)
  • Components: Agent Platform Pipelines, KFP v2 Components, managed training runtime, endpoints, GCS, Artifact Registry, Agent Platform Metadata Store, Service Accounts + IAM
  • Reliability: managed training jobs, serverless orchestration, GCS persistence, idempotent re‑runnable steps, and conditional deployment gates
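
The source does not describe how steps are made idempotent. One common pattern, sketched here under that caveat, is to derive the output location from the step name and its parameters and to skip the work when that output already exists. A local directory stands in for the GCS bucket, and the helper name `run_once` is hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def run_once(step_name: str, params: dict,
             work: Callable[[dict], dict], root: Path) -> Path:
    """Run `work` only if no output exists yet for this step and parameter set."""
    # Identical parameters always hash to the same key, hence the same output path.
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:16]
    out = root / step_name / f"{key}.json"
    if out.exists():
        return out  # Re-run with identical inputs: reuse the stored artifact.
    out.parent.mkdir(parents=True, exist_ok=True)
    tmp = out.with_suffix(".tmp")
    tmp.write_text(json.dumps(work(params)))
    tmp.replace(out)  # Rename last so readers never see a half-written file.
    return out
```

Calling `run_once("data-prep", {"rows": 100}, prepare, Path("artifacts"))` twice, where `prepare` is any function returning a JSON-serializable dict, executes `prepare` once and returns the same path both times. KFP also offers its own step-level execution caching, which serves a similar purpose at the orchestration layer.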

AI / DevOps Details

  • Focus: Supervised ML training + MLOps automation (training, evaluation, gating, deployment)
  • Implemented: RandomForestClassifier training pipeline; Data → Train → Evaluate → Gate → Deploy; ROC, confusion matrix, accuracy logging; conditional deployment logic; scheduled retraining pipelines
  • CI/CD / Orchestration: GitHub Actions, Kubeflow Pipelines v2, Agent Platform Pipelines, optional Artifact Registry for containerized components
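
The confusion matrix and accuracy logged by the evaluation step can be illustrated with a dependency-free sketch. It assumes binary labels encoded as 0 and 1, which the source does not state; a real component would more likely call scikit-learn's metric functions. The function names are hypothetical.

```python
from typing import Dict, Sequence


def binary_confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for labels encoded as 0 and 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Fraction of correct predictions; 0.0 for an empty evaluation set."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


counts = binary_confusion([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(counts)            # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(accuracy(counts))  # 0.75
```

With these illustrative labels the accuracy is 0.75, so a gate of `accuracy > 0.8` would block deployment for this run.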

Monitoring, Logging & Optimization

  • Agent Platform Pipelines UI for observability
  • Cloud Logging for job‑level logs
  • Agent Platform Metadata Store for metrics + lineage
  • Model KPI logging with accuracy thresholds for gating

Skills & Technologies Used

  • Primary: MLOps Architecture, Agent Platform Pipelines / KFP v2, Cloud AI Platform Engineering, Production ML Workflow Design — Advanced
  • Secondary: Kubeflow Pipelines SDK, scikit‑learn, Agent Platform SDK (Python), Google Cloud Storage, GitHub Actions
  • Languages: Python (primary), YAML (configuration / pipeline specs where applicable)
  • Cloud & DevOps: Google Agent Platform, GCS, Artifact Registry, IAM / Service Accounts, GitHub Actions, Workload Identity Federation

Challenges & Resolutions

  • Wiring KFP v2 components correctly with Agent Platform Pipelines → used KFP v2 native component interfaces
  • ROC / metrics logging compatibility → sanitized ROC inputs to satisfy metrics APIs
  • Conditional deployment gates → implemented explicit evaluation gates with pipeline conditions
  • Model artifact formats for serving → packaged models to match serving container expectations
  • Notebook‑level code to production pipeline → converted notebook workflows into pipeline‑native components

GCP Production‑Grade Implementation Details

Architecture: Agent Platform Pipelines → Training → Evaluation → Conditional Deployment → Endpoints; artifact persistence in GCS; lineage in Agent Platform Metadata Store.

  • High‑level flow: GitHub Trigger → Agent Platform Pipeline Execution → Data Prep Component → Training Component (Agent Platform Training) → Evaluation Component (ROC / Accuracy / Confusion Matrix) → Quality Gate → Conditional Deployment to Agent Platform Endpoint → Scheduled Re‑training
  • Architecture implemented on GCP: Raw Data → Agent Platform Pipeline (Data Prep → Train → Evaluate → Gate) → Model Artifacts (GCS) → Agent Platform Model Registry → Approved Model for Deployment
  • Top lane — Training & Evaluation Path: Raw Dataset (GCS / External Source) → Data Preparation Component → Custom Training Job → Model Evaluation → Quality Gate → Model Upload
  • Bottom lane — Experiment Tracking & Lineage: Training & Evaluation Runs → Pipelines Lineage / Experiments → Metrics, Artifacts, Parameters stored in GCS → Governed Model Versioning in Model Registry
  • The main project document has a detailed view.

Assets & References

Study Material

  • Public: official KFP documentation, YAML files for GCP, and the Python SDK; downloadable PDF if available
  • Restricted: project‑specific KFP files and Google Colab notebooks; downloadable PDF with access limited to authorised users

Pipeline‑2 Summary

Production‑grade model development and orchestration on Google Cloud using Agent Platform Pipelines (KFP v2) to automate data preparation, model training, evaluation, and quality gates. This layer standardizes reproducible training workflows, centralized artifact storage in GCS, metric logging into experiments, and governed model versioning via the Model Registry, enabling controlled promotion of validated models toward deployment.

Assets & References

Code, diagrams, study material

Repository

Full training/evaluation pipeline code, components, and deployment specs.

vertex-ai-mlops-kfp2

Notebook

KFP v2 notebook for Agent Platform pipeline implementation.

Vertex_AI_kfp2_pipeline.ipynb

Weblink

Published project page for Pipeline‑2.

rajesharigala.com/mlops/ai4/ai4.2

Proof Link

Proof link placeholder from the project brief.

Proof link: later

Study Material Resources

Official docs, restricted KFP guides, Colab notebooks


Study Material – Agent Platform MLOps

  • KFP v2 official documentation: Pipeline as Code for Agent Platform
  • YAML pipeline specs (GCP): component definitions, conditionals
  • Project‑specific KFP files + Colab notebooks: restricted, authorised users only
  • Agent Platform Metadata & lineage: governance deep dive