Agent Platform Training & Evaluation Pipeline

Kubeflow Pipelines v2 · Production‑grade MLOps on GCP

Production‑grade ML training, evaluation, gating, and conditional deployment pipeline on Google Agent Platform using Kubeflow Pipelines (KFP v2). Enforces model quality, tracks lineage, and deploys only validated models.

Project Summary

Category

AI + MLOps + Cloud Platform Engineering

Industry

Cross‑industry Enterprise AI Platform

MLOps Focus

Training · Evaluation · Gating · Conditional Deploy

Key Technologies & Concepts

ML/AI platform primitives

Agent Platform Pipelines (KFP v2) · Kubeflow Pipelines SDK · Agent Platform Training · Agent Platform Metadata Store · Google Cloud Storage · Artifact Registry · Service Accounts & IAM · Workload Identity Federation · Conditional Pipelines (eval gate) · Agent Platform Endpoints · Cloud Logging · ML Governance · Lineage

Problem & Objective

Why this pipeline exists

Problem

Manual, notebook‑driven ML workflows lack reproducibility, governance, automated evaluation gates, and production discipline. There was no structured way to enforce model quality before deployment on GCP.

Objective

Build a production‑grade, automated ML training/evaluation pipeline on GCP that enforces quality gates, tracks lineage, and conditionally deploys models to Agent Platform endpoints using native MLOps primitives.

Solution & Architecture

Agent Platform native orchestration

Overview

Agent Platform Pipelines (KFP v2) orchestrates data preparation, model training (RandomForest), evaluation (ROC, confusion matrix, accuracy), quality gating, conditional deployment to Agent Platform Endpoints, and scheduled retraining.

The platform automates machine learning workflows with three constructs. Components are modular building blocks, each responsible for one task. A DSL (domain‑specific language) defines how the components connect and in what order they run. Conditions add if‑then branching, so the pipeline decides at run time, from upstream outputs such as evaluation accuracy, whether to continue to deployment.

Representation: @dsl.component or @component · @dsl.pipeline · with dsl.Condition(accuracy > 0.8):
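
To make the three constructs concrete, here is a minimal plain‑Python sketch of the same control flow. It deliberately avoids the KFP SDK so that it runs anywhere. In a KFP v2 pipeline each step function would carry @dsl.component, run_pipeline would carry @dsl.pipeline, and the if statement would become a with dsl.Condition(...) block (recent KFP 2.x releases also provide dsl.If for the same purpose). The function names, the toy data, and the mean‑feature rule that stands in for the RandomForest are illustrative assumptions, not the project's code.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[List[float], int]
ACCURACY_THRESHOLD = 0.8  # same value as dsl.Condition(accuracy > 0.8) in the text


def prepare_data() -> List[Row]:
    """Data prep step: returns (features, label) rows. Toy data, hypothetical."""
    return [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.7, 0.9], 1)]


def train(rows: List[Row]) -> Callable[[List[float]], int]:
    """Training step: a fixed mean-feature rule stands in for the RandomForest."""
    return lambda features: int(sum(features) / len(features) > 0.5)


def evaluate(model: Callable[[List[float]], int], rows: List[Row]) -> float:
    """Evaluation step: fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def run_pipeline(threshold: float = ACCURACY_THRESHOLD) -> Dict[str, object]:
    """Pipeline function: wires the steps together and applies the gate."""
    rows = prepare_data()
    model = train(rows)
    accuracy = evaluate(model, rows)
    deployed = False
    if accuracy > threshold:  # the conditional branch
        deployed = True       # a real pipeline would run the deploy component here
    return {"accuracy": accuracy, "deployed": deployed}
```

Raising the threshold to 1.0 leaves the model undeployed even though it is trained and evaluated, which is the behaviour the gate is meant to guarantee.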

Managed training • serverless orchestration • artifact persistence in GCS • conditional gates

GCS → Data prep → Train (Agent Platform Training) → Eval (ROC/CM) → Quality gate → Conditional deploy → Endpoint

1. GitHub / Trigger
2. Agent Platform Pipeline
3. Train (RF)
4. Eval + Gate
5. Deploy / Registry

Skills & Technologies

ML/platform engineering stack

Primary (Advanced)

MLOps Architecture
Agent Platform Pipelines / KFP v2
Cloud AI Platform Engineering
Production ML Workflow Design

Secondary

Kubeflow Pipelines SDK
scikit‑learn · Agent Platform SDK
GCS · IAM · Workload Identity
GitHub Actions (CI trigger)

Languages & DevOps

Python · YAML · KFP components · Agent Platform · GitHub Actions

Pipeline Execution & Governance

Conditional gates, lineage, scheduling

Execution

Manual / CI trigger → Agent Platform Pipeline run
KFP v2 components: data prep, training, evaluation, deploy
Artifacts stored in GCS, metrics in Agent Platform Metadata

Governance

Explicit evaluation gate (accuracy/ROC threshold)
Conditional pipeline branch: deploy only if gate passes
Model versioning in Agent Platform Model Registry
IAM least‑privilege + Workload Identity Federation
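
The evaluation gate in the list above can be sketched as a small pure function that a gating component might wrap. This is an assumption about shape, not the project's implementation: the GateResult type, the evaluate_gate name, and the 0.85 ROC AUC minimum used in the usage note are hypothetical, and only the 0.8 accuracy threshold comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GateResult:
    passed: bool
    failures: List[str] = field(default_factory=list)


def evaluate_gate(metrics: Dict[str, float], thresholds: Dict[str, float]) -> GateResult:
    """Pass only if every thresholded metric is present and strictly above its minimum."""
    failures: List[str] = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif not value > minimum:  # written this way so that NaN also fails the gate
            failures.append(f"{name}: {value:.3f} is not above {minimum:.3f}")
    return GateResult(passed=not failures, failures=failures)
```

For example, evaluate_gate({"accuracy": 0.91, "roc_auc": 0.88}, {"accuracy": 0.8, "roc_auc": 0.85}) passes, while a run that reports accuracy of exactly 0.8 and no ROC AUC fails with two recorded reasons. Returning the reasons alongside the boolean keeps the decision auditable in pipeline logs.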

Challenges & Resolutions

Wiring KFP v2 components → Agent Platform Pipelines: used native KFP interfaces.
ROC/metrics logging: sanitized inputs for Agent Platform metrics APIs.
Conditional gates: pipeline condition with threshold check.
Model format for serving: packaged as Agent Platform-compatible artifact.
Notebook to production: refactored into pipeline components.
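
The source does not say what the ROC sanitization involved. One plausible version, sketched here as an assumption, is to convert array values to plain floats, clamp an infinite first threshold (recent scikit‑learn releases set the first roc_curve threshold to infinity), and drop points that contain NaN, since non‑finite values do not serialize to strict JSON. The sanitize_roc name and the clamping policy are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sanitize_roc(
    fpr: Sequence[float], tpr: Sequence[float], thresholds: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Return plain-float ROC lists that contain only finite values."""
    finite = [float(t) for t in thresholds if math.isfinite(float(t))]
    ceiling = max(finite) if finite else 1.0  # replacement for a +inf threshold

    rows = []
    for f, t, th in zip(fpr, tpr, thresholds):
        f, t, th = float(f), float(t), float(th)
        if th == math.inf:
            th = ceiling
        # Keep the point only if all three values are finite (drops NaN and -inf).
        if all(math.isfinite(v) for v in (f, t, th)):
            rows.append((f, t, th))

    if not rows:
        return [], [], []
    f_out, t_out, th_out = (list(column) for column in zip(*rows))
    return f_out, t_out, th_out
```

The three returned lists stay aligned point by point, so they can be passed together to whatever metrics logging call the evaluation component uses.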

GCP CI/CD · Architecture & YAML Mapping

Pipeline‑2 model training, evaluation, and governance constructs

Architecture Block	GCP CI/CD / MLOps Construct (Pipeline‑2 – Modelling)	YAML / Pipeline Spec Mapping
Source Repository	GitHub (modeling / pipelines repo)	`repository`, `workflow.checkout`
Source Trigger	Manual / CI trigger (GitHub Actions or local notebook execution)	`on.workflow_dispatch`, `on.push`, `notebook_runtime`
CI Runner	GitHub Actions Linux Runner (ubuntu-latest), optional for CI-driven runs	`jobs.pipeline.runs-on: ubuntu-latest`
Build / Pipeline Execution	Agent Platform Pipelines (KFP v2: Data → Train → Evaluate → Condition)	`pipelineSpec.root`, `pipelineInfo.name`, `deploymentSpec`
Training Orchestration	Agent Platform Pipelines (KFP v2)	`@dsl.pipeline`, `tasks.train`
Data Processing	Agent Platform Pipeline Component (Pandas + Scikit‑Learn preprocessing)	`@dsl.component`, `components.data-prep`
Model Training	RandomForestClassifier training pipeline / managed training runtime	`components.train.container.image`, `args`, `model_output`
Model Evaluation	Pipeline component for ROC, Confusion Matrix, Accuracy	`components.evaluate.outputs.metrics`, `classificationMetrics`
Artifact Storage	Google Cloud Storage (datasets, model artifacts, metrics JSON)	`pipeline_root: gs://...`, `artifact_uri`, `metrics_path`
Container Registry	Artifact Registry (Agent Platform managed serving container)	`image.repository`, `image.tag`
Model Registry	Agent Platform Model Registry (governed model versions)	`components.upload-model`, `model.display_name`, `version_aliases`
Approval Gate	Pipeline Condition (metric threshold gate for deployment)	`with dsl.Condition(accuracy > 0.8)`, `threshold`
Security & Auth	GCP Service Account + IAM (least privilege for pipelines)	`service_account`, `roles/aiplatform.user`, `roles/storage.objectAdmin`
Secrets / Config	Environment variables + GCP IAM, optionally Secret Manager	`env.PROJECT_ID`, `env.REGION`, `env.BUCKET_URI`, `secretEnv`
Monitoring & Logs	Agent Platform Pipelines UI + Cloud Logging	`pipeline_job_name`, `logging.enabled`
Lineage & Governance	Agent Platform Pipelines lineage + Model Registry versions	`metadata`, `metrics`, `artifact.uri`
Infrastructure Backend	Agent Platform Managed Pipelines (no separate IaC needed)	`managed_pipeline: true`, `location`

Pipeline‑2 standardizes reproducible training workflows, centralized GCS artifacts, metric logging (ROC, confusion matrix, accuracy), and governed model versioning for controlled promotion toward deployment.
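
The Secrets / Config row of the mapping names three environment variables: PROJECT_ID, REGION, and BUCKET_URI. A minimal sketch of loading and validating them before a run is submitted follows. The PipelineConfig and load_config names and the pipeline_root subfolder are assumptions made for illustration; the source only shows that pipeline_root is a gs:// location.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    project_id: str
    region: str
    bucket_uri: str

    @property
    def pipeline_root(self) -> str:
        # Hypothetical layout: a fixed subfolder under the bucket URI.
        return f"{self.bucket_uri.rstrip('/')}/pipeline_root"


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Read required settings from the environment and fail fast if any are absent."""
    env = os.environ if env is None else env
    missing = [key for key in ("PROJECT_ID", "REGION", "BUCKET_URI") if not env.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    bucket_uri = env["BUCKET_URI"]
    if not bucket_uri.startswith("gs://"):
        raise ValueError("BUCKET_URI must start with gs://")
    return PipelineConfig(env["PROJECT_ID"], env["REGION"], bucket_uri)
```

Accepting an explicit mapping makes the loader testable without touching the real environment, and failing before submission avoids a pipeline run that would only error once a component tries to write its first artifact.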

Complete Project Details

All content from the Pipeline‑2 PDF

Project Summary

Project Name: AI‑GCP Pipeline‑2 – Agent Platform Training & Evaluation Pipeline
One‑Line Description: Production‑grade ML training, evaluation, gating, and conditional deployment pipeline on Google Agent Platform using Kubeflow Pipelines (KFP v2).
Category: AI + MLOps + Cloud Platform Engineering
Industry: Cross‑industry (Enterprise AI Platform / MLOps Infrastructure)
Domain: Machine Learning Platform Engineering / AI Model Lifecycle Automation

Key Words

Agent Platform Pipelines (KFP v2 Orchestration)
Kubeflow Pipelines SDK (Pipeline as Code)
Agent Platform Training Jobs (Managed Training Runtime)
Agent Platform Metadata Store (Lineage & Governance)
Google Cloud Storage (Datasets, Models, Metrics Artifacts)
Artifact Registry (Training / Inference Containers)
Service Accounts & IAM (Least‑Privilege MLOps Security)
Workload Identity Federation (GitHub → GCP Auth)
Conditional Pipelines (Evaluation Gate → Deploy)
Agent Platform Model Upload (Model Registry Equivalent)
Agent Platform Endpoints (Online Inference Targets)
Pipeline Scheduling (Agent Platform Pipeline Scheduler)
Cloud Logging (Training / Pipeline Logs)
ML Governance (Metadata, Metrics, Model Lineage)

Problem Solved

Manual, notebook‑driven ML workflows lack reproducibility, governance, automated evaluation gates, and production deployment discipline. There was no structured way to enforce model quality before deployment in GCP.

Primary Objective

Build a production‑grade, automated ML training and evaluation pipeline on GCP that enforces quality gates, tracks lineage, and conditionally deploys models to Agent Platform endpoints using platform‑native MLOps primitives.

Solution & Architecture

Implemented an Agent Platform Pipelines (KFP v2) based ML pipeline that performs data preparation, model training, evaluation (ROC, confusion matrix, accuracy), quality gating, conditional deployment to Agent Platform Endpoints, and scheduled retraining.

The platform automates machine learning workflows with three constructs: Components are modular building blocks for specific tasks, a DSL (domain‑specific language) defines how they connect and in what order they run, and Conditions add if‑then branching so the pipeline decides at run time, from upstream outputs such as evaluation accuracy, whether to continue to deployment.

Representation: @dsl.component or @component; @dsl.pipeline; with dsl.Condition(accuracy > 0.8):
Cloud Platform: Google Cloud Platform (Agent Platform)
Components: Agent Platform Pipelines, KFP v2 Components, managed training runtime, endpoints, GCS, Artifact Registry, Agent Platform Metadata Store, Service Accounts + IAM
Reliability: managed training jobs, serverless orchestration, GCS persistence, idempotent re‑runnable steps, and conditional deployment gates
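
The idea behind idempotent, re‑runnable steps can be illustrated with a small sketch: a step writes its output under a key derived from its name and parameters, and a re‑run with the same inputs reuses that output instead of repeating the work. KFP has its own execution caching, so this is only an illustration of the principle. A local directory stands in for GCS, and run_step_once is a hypothetical helper.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def run_step_once(name: str, params: Dict, store: Path, fn: Callable[[Dict], Dict]) -> Dict:
    """Run fn(params) unless an output for this exact (name, params) already exists."""
    digest = hashlib.sha256(json.dumps([name, params], sort_keys=True).encode())
    out_path = store / name / f"{digest.hexdigest()[:16]}.json"
    if out_path.exists():
        return json.loads(out_path.read_text())  # reuse the earlier result

    result = fn(params)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(result))
    tmp_path.replace(out_path)  # rename last, so a crash never leaves a partial output
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        step = lambda p: {"status": "trained", "n_estimators": p["n_estimators"]}
        print(run_step_once("train", {"n_estimators": 100}, Path(directory), step))
```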

AI / DevOps Details

Focus: Supervised ML training + MLOps automation (training, evaluation, gating, deployment)
Implemented: RandomForestClassifier training pipeline; Data → Train → Evaluate → Gate → Deploy; ROC, confusion matrix, accuracy logging; conditional deployment logic; scheduled retraining pipelines
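
For reference, the confusion matrix and accuracy named above reduce to simple counting in the binary case. The sketch below uses only the standard library so the arithmetic is visible; the project itself lists scikit‑learn, which would normally compute these. The function names and the sample labels are illustrative.

```python
from typing import Dict, List, Sequence


def binary_confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            counts["tp" if predicted == 1 else "fn"] += 1
        else:
            counts["fp" if predicted == 1 else "tn"] += 1
    return counts


def as_matrix(cm: Dict[str, int]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes: [[tn, fp], [fn, tp]]."""
    return [[cm["tn"], cm["fp"]], [cm["fn"], cm["tp"]]]


def accuracy(cm: Dict[str, int]) -> float:
    """Share of correct predictions; 0.0 for an empty input."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0
```

With y_true = [1, 0, 1, 1, 0, 0, 1, 0] and y_pred = [1, 0, 0, 1, 0, 1, 1, 0], the counts are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, giving an accuracy of 0.75, which would not clear a 0.8 gate.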
CI/CD / Orchestration: GitHub Actions, Kubeflow Pipelines v2, Agent Platform Pipelines, optional Artifact Registry for containerized components

Monitoring, Logging & Optimization

Agent Platform Pipelines UI for observability
Cloud Logging for job‑level logs
Agent Platform Metadata Store for metrics + lineage
Model KPI logging with accuracy thresholds for gating

Skills & Technologies Used

Primary: MLOps Architecture, Agent Platform Pipelines / KFP v2, Cloud AI Platform Engineering, Production ML Workflow Design — Advanced
Secondary: Kubeflow Pipelines SDK, scikit‑learn, Agent Platform SDK (Python), Google Cloud Storage, GitHub Actions
Languages: Python (primary), YAML (configuration / pipeline specs where applicable)
Cloud & DevOps: Google Agent Platform, GCS, Artifact Registry, IAM / Service Accounts, GitHub Actions, Workload Identity Federation

Challenges & Resolutions

Wiring KFP v2 components correctly with Agent Platform Pipelines → used KFP v2 native component interfaces
ROC / metrics logging compatibility → sanitized ROC inputs to satisfy metrics APIs
Conditional deployment gates → implemented explicit evaluation gates with pipeline conditions
Model artifact formats for serving → packaged models to match serving container expectations
Notebook‑level code to production pipeline → converted notebook workflows into pipeline‑native components

GCP Production‑Grade Implementation Details

Architecture: Agent Platform Pipelines → Training → Evaluation → Conditional Deployment → Endpoints; artifact persistence in GCS; lineage in Agent Platform Metadata Store.

High‑level flow: GitHub Trigger → Agent Platform Pipeline Execution → Data Prep Component → Training Component (Agent Platform Training) → Evaluation Component (ROC / Accuracy / Confusion Matrix) → Quality Gate → Conditional Deployment to Agent Platform Endpoint → Scheduled Re‑training
Architecture implemented on GCP: Raw Data → Agent Platform Pipeline (Data Prep → Train → Evaluate → Gate) → Model Artifacts (GCS) → Agent Platform Model Registry → Approved Model for Deployment
Top lane — Training & Evaluation Path: Raw Dataset (GCS / External Source) → Data Preparation Component → Custom Training Job → Model Evaluation → Quality Gate → Model Upload
Bottom lane — Experiment Tracking & Lineage: Training & Evaluation Runs → Pipelines Lineage / Experiments → Metrics, Artifacts, Parameters stored in GCS → Governed Model Versioning in Model Registry
The main project document has a detailed view.
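
As a rough illustration of what the lineage lane ties together, the sketch below models one run as a record that links the dataset, the model artifact, the parameters, and the metrics, with a fingerprint that changes whenever any of them changes. The managed metadata store captures this automatically. The RunRecord type, its fields, and the example URIs are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    dataset_uri: str
    model_uri: str
    parameters: Dict[str, float]
    metrics: Dict[str, float]

    def fingerprint(self) -> str:
        """Stable hash over every field, so any change yields a different value."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()


record = RunRecord(
    run_id="run-001",                          # hypothetical identifiers
    dataset_uri="gs://example-bucket/data.csv",
    model_uri="gs://example-bucket/model/",
    parameters={"n_estimators": 100},
    metrics={"accuracy": 0.91},
)
```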

Assets & References

GitHub / Repository Link: https://github.com/Rajesh-Arigala/vertex-ai-mlops-kfp2
Notebook: Vertex_AI_kfp2_pipeline.ipynb
Weblink: https://rajesharigala.com/mlops/ai4/ai4.2
Proof Link: later

Study Material

Public: official KFP documentation, the YAML file for GCP, and Python SDK references; downloadable PDF if available
Restricted: KFP‑specific files and Google Colab‑specific material; downloadable PDF with access limited to authorised users

Pipeline‑2 Summary

Production‑grade model development and orchestration on Google Cloud using Agent Platform Pipelines (KFP v2) to automate data preparation, model training, evaluation, and quality gates. This layer standardizes reproducible training workflows, centralized artifact storage in GCS, metric logging into experiments, and governed model versioning via the Model Registry, enabling controlled promotion of validated models toward deployment.

Assets & References

Code, diagrams, study material

Repository

Full training/evaluation pipeline code, components, and deployment specs.

vertex-ai-mlops-kfp2

Notebook

KFP v2 notebook for Agent Platform pipeline implementation.

Vertex_AI_kfp2_pipeline.ipynb

Weblink

Published project page for Pipeline‑2.

rajesharigala.com/mlops/ai4/ai4.2

Proof Link

Proof link placeholder from the project brief.

Proof link: later

Study Material Resources

Official docs, restricted KFP guides, Colab notebooks
