AI‑Kubeflow Pipeline‑2

Training, Evaluation & Model Selection · Kubernetes‑Native MLOps

Production‑style ML training and evaluation pipelines using Kubeflow Pipelines with containerized components, artifact lineage, and metric‑driven model selection (KFP v2, MinIO, MLMD).


Project Summary

MLOps / Platform Engineering

Industry

Cross‑industry Enterprise AI Platforms / MLOps Infrastructure

Domain

Model Training Engineering / Pipeline Orchestration

Key Technologies & Concepts

Kubeflow native keywords

MLOps keywords

Kubeflow Pipelines (KFP v2)
Pipeline as Code
Containerized ML components
Training pipelines (LR, DT)
Evaluation metrics (accuracy)
Experiment tracking
Artifact lineage (MLMD)
Model comparison
Metric-based selection
MinIO artifact store
Argo Workflows

Problem & Objective

Why this project exists

Problems Solved

Notebook‑driven training lacks reproducibility & governance
Manual execution → inconsistent environments, fragmented metrics
Limited visibility into experiment lineage and outcomes

Primary Objective

Reproducible, pipeline‑driven ML training & evaluation on Kubeflow
Containerized components + standardized artifact storage + metric‑driven model selection

Solution & Architecture

Kubeflow pipeline flow

Orchestration overview

A Kubeflow Pipelines workflow orchestrates containerized components for data ingestion, model training (Logistic Regression and Decision Tree), and evaluation. Each pipeline run produces versioned artifacts and metrics, so models can be compared systematically and selected on evaluation criteria (here, accuracy).

Flow: Data → Train (LR, DT) → Evaluate → Compare Metrics → (conditional selection)

1. Data ingestion
2. Train LR
3. Train DT
4. Evaluate
5. Compare / select
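
The step ordering above can be sketched as a dependency graph. This is a minimal illustration using only the Python standard library, not the scheduler Kubeflow or Argo actually uses; the step names are hypothetical labels for the five steps listed. It shows why the two training steps can run side by side: neither depends on the other.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# Step names are illustrative labels, not identifiers from the project.
DEPENDENCIES = {
    "download_data": set(),
    "train_lr": {"download_data"},
    "train_dt": {"download_data"},
    "evaluate": {"train_lr", "train_dt"},
    "compare_select": {"evaluate"},
}


def execution_waves(dependencies):
    """Group steps into waves; steps in one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable printout
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(index, wave)
```

Both training steps land in the same wave, which corresponds to two pods that an orchestrator may schedule concurrently once the data step has finished.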

Components

Download data (component)
Training (scikit‑learn containers)
Evaluation metrics (accuracy)
Kubeflow Pipelines orchestration
MinIO artifact storage

Platform & services

Kubernetes + Kubeflow (KFP v2 runtime)
Argo Workflows (execution engine)
Docker (containerized training components)
MinIO artifact storage for datasets and outputs
Kubeflow Metadata Store (MLMD) for lineage and run metadata
Kubernetes Pods, Services, Namespaces, RBAC

Scalability & reliability

Stateless pipeline components executed as isolated pods
Retry and failure isolation at step level
Artifact durability via object storage (MinIO)
Re-runnable pipelines with deterministic environments via image versioning
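
Deterministic environments depend on every component image being pinned to an explicit version. One way to enforce that before submitting a pipeline is a small check like the sketch below. It is an assumption about how such a guard could look, not part of the project; the image names in the usage example are hypothetical.

```python
import re

# A digest reference such as name@sha256:<64 hex chars> is the strictest pin.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image: str) -> bool:
    """Return True if the image has a digest or an explicit tag other than 'latest'."""
    if _DIGEST.search(image):
        return True
    # Look only at the last path segment so a registry port (host:5000/...)
    # is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: container runtimes fall back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    # Hypothetical image references for illustration.
    for ref in ("docker.io/acme/train-lr:1.4.2", "acme/train-dt:latest", "acme/train-dt"):
        print(ref, is_pinned(ref))
```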

AI / DevOps Details

Automation, orchestration, and observability

AI/ML focus

Model Training, Evaluation & Experiment Orchestration (MLOps – Offline Training).

Pipeline automation

Automated data ingestion component
Training components (Logistic Regression, Decision Tree) that can run in parallel or in series
Metric computation and logging (accuracy)
Metric-based model comparison and selection
Pipeline as code (compiled KFP YAML)

CI/CD & orchestration

Docker training component images
Kubeflow Pipelines orchestration
Argo Workflows pipeline execution
Kubernetes runtime scheduling

Monitoring & optimisation

Kubeflow Pipelines UI for run status, DAG visualization, and step-level logs
Kubernetes pod logs for component-level debugging
ML Metadata Store for run lineage and artifact traceability
Metric visibility (accuracy) across experiments for comparative analysis
Failure isolation and targeted re-runs of failed steps
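
Comparative analysis across experiments comes down to ranking run records by their logged accuracy. The sketch below assumes the accuracy values have already been collected from run history into plain records; the RunResult type, run IDs, and numbers are made up for illustration and are not results from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    run_id: str
    model: str
    accuracy: float


def rank_runs(results):
    """Sort by accuracy, highest first; ties are broken by run_id for a stable order."""
    return sorted(results, key=lambda r: (-r.accuracy, r.run_id))


def best_per_model(results):
    """Return the top-ranked run for each model type."""
    best = {}
    for result in rank_runs(results):
        best.setdefault(result.model, result)  # first seen is the best for that model
    return best


if __name__ == "__main__":
    # Illustrative values only.
    runs = [
        RunResult("run-001", "lr", 0.91),
        RunResult("run-002", "dt", 0.88),
        RunResult("run-003", "dt", 0.93),
    ]
    for run in rank_runs(runs):
        print(run.run_id, run.model, run.accuracy)
```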

Skills & Technologies

Proficiencies demonstrated

Primary skills

Kubeflow Pipelines (advanced)
Kubernetes‑Native MLOps (advanced)
ML Training Pipeline Design (advanced)
Experiment Tracking & Lineage (advanced)

Secondary tools

scikit‑learn (model framework)
Docker (containerization)
YAML (pipeline/component specifications)
Argo Workflows
MinIO, ML Metadata
Kubeflow UI / SDK
Docker Hub

Programming languages

Python (primary), YAML (pipeline/component specs).

Kubeflow DevOps — Architecture & YAML mapping

Reference: Pipeline‑2 modelling

Architecture Block	Kubeflow CI/CD / MLOps Construct (Pipeline‑2)
Source Repository	GitHub (Kubeflow modeling / pipelines repo)
Source Trigger	Manual trigger (Kubeflow UI / SDK) or CI trigger (GitHub Actions / local pipeline submission)
CI Runner	GitHub Actions Linux Runner (ubuntu-latest) (optional for CI-driven pipeline compilation and submission)
Build / Pipeline Execution	Kubeflow Pipelines (KFP v2: Data → Train → Evaluate → Condition)
Training Orchestration	Kubeflow Pipelines (KFP v2 running on Kubernetes via Argo Workflows)
Data Processing	Kubeflow Pipeline Component (Python + scikit‑learn preprocessing in containerized step)
Model Evaluation	Kubeflow Pipeline Component (Python + scikit‑learn evaluation component; accuracy metric)
Artifact Storage	MinIO (S3-compatible object store for datasets, model artifacts, metrics JSON)
Container Registry	Docker Hub (versioned training component images)
Model Registry	Kubeflow Pipelines run history + ML Metadata Store (MLMD) + MinIO artifact versions (open-source equivalent of managed model registries)
Approval Gate	Pipeline Condition (metric threshold gate for downstream deployment pipeline)
Security & Auth	Kubernetes Service Accounts + RBAC (least-privilege execution identity for pipeline pods)
Secrets / Config	Kubernetes Secrets + Environment Variables (mounted into pipeline containers)
Monitoring & Logs	Kubeflow Pipelines UI + Kubernetes Pod Logs (per-step execution logs)
Lineage & Governance	Kubeflow Pipelines lineage + ML Metadata Store (inputs/outputs, run lineage, artifact traceability)
Infrastructure Backend	Kubernetes cluster (local Minikube or managed EKS / AKS / GKE) hosting a self-managed Kubeflow Pipelines runtime (control plane + workflow engine deployed via Kubeflow manifests)

Pipeline‑2 implements Kubernetes‑native ML training and evaluation using Kubeflow Pipelines, providing the open-source equivalent of managed Vertex AI Pipelines for model experimentation, metric-based selection, and governed promotion to deployment workflows.
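
For the Secrets / Config and Artifact Storage rows in the table above, a component might read its MinIO settings from environment variables that a Kubernetes Secret injects into the pod. The sketch below is one possible shape for that, under stated assumptions: the variable names, the default bucket name, and the object key layout are all hypothetical and are not taken from the project or from the layout KFP itself writes.

```python
import os
from dataclasses import dataclass

# Hypothetical variable names; the project does not specify them.
REQUIRED = ("MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")


@dataclass(frozen=True)
class MinioSettings:
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str


def load_settings(env=None) -> MinioSettings:
    """Build settings from the environment and fail fast if a secret is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return MinioSettings(
        endpoint=env["MINIO_ENDPOINT"],
        access_key=env["MINIO_ACCESS_KEY"],
        secret_key=env["MINIO_SECRET_KEY"],
        bucket=env.get("MINIO_BUCKET", "mlpipeline"),  # assumed default bucket
    )


def artifact_key(run_id: str, step: str, name: str) -> str:
    """Illustrative key layout: one prefix per run, then per step."""
    return f"runs/{run_id}/{step}/{name}"
```

Failing fast on a missing variable surfaces a misconfigured Secret in the step's pod log rather than as an opaque storage error later in the run.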

Challenges & Resolutions

Technical hurdles

Key challenges

Packaging training code into reproducible container images
Wiring artifact paths between components
Consistent metric logging across variants
Experiment lineage and traceability

Resolutions

Standardised Dockerfiles per component
Kubeflow artifact inputs/outputs for clean handoffs
Centralised accuracy logging for model comparison
ML Metadata Store + KFP run history
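
Consistent metric logging across model variants is easier when every training component writes the same record shape. The sketch below shows one way to centralise that with a shared helper. The field names and the six-decimal rounding are assumptions for illustration, not the project's actual metrics format.

```python
import json
import math
from pathlib import Path

FIELDS = {"model", "metric", "value"}  # assumed schema shared by all variants


def write_metrics(path, model: str, accuracy: float) -> dict:
    """Write one accuracy record with a fixed schema; reject out-of-range values."""
    value = float(accuracy)
    if not (math.isfinite(value) and 0.0 <= value <= 1.0):
        raise ValueError(f"accuracy must be within [0, 1], got {accuracy!r}")
    record = {"model": model, "metric": "accuracy", "value": round(value, 6)}
    Path(path).write_text(json.dumps(record, sort_keys=True))
    return record


def read_metrics(path) -> dict:
    """Load a record and verify it has exactly the expected fields."""
    record = json.loads(Path(path).read_text())
    if set(record) != FIELDS:
        raise ValueError(f"unexpected fields: {sorted(record)}")
    return record
```

Because both the Logistic Regression and Decision Tree components would call the same helper, the comparison step can read either file without variant-specific parsing.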

Assets & References

Code, diagrams, study material

Kubeflow Official Documentation Repo

Official Kubeflow Pipelines repository and documentation source.

kubeflow/pipelines

Kubeflow Pipelines MLOps

Project repository for Kubeflow Pipelines MLOps implementation.

Kubeflow-pipelines-MLOps

Deployment Link

Deployment link will be added later.


MLOps Study Resources

Kubeflow Pipelines – restricted & public materials

Request Study Material

Note: Additional project materials (KFP v2 YAML, component specs) are shared selectively for interview / evaluation.