AI‑Kubeflow Pipeline‑2

Training, Evaluation & Model Selection · Kubernetes‑Native MLOps

Production‑style ML training and evaluation pipelines using Kubeflow Pipelines with containerized components, artifact lineage, and metric‑driven model selection (KFP v2, MinIO, MLMD).

Project Summary

MLOps / Platform Engineering

Category

AI + MLOps + Platform Engineering (Kubernetes / Kubeflow)

Industry

Cross‑industry Enterprise AI Platforms / MLOps Infrastructure

Domain

Model Training Engineering / Pipeline Orchestration

Key Technologies & Concepts

Kubeflow native keywords

MLOps keywords

Kubeflow Pipelines (KFP v2) Pipeline as Code Containerized ML components Training pipelines (LR, DT) Evaluation metrics (accuracy) Experiment tracking Artifact lineage (MLMD) Model comparison Metric-based selection MinIO artifact store Argo Workflows

Problem & Objective

Why this project exists

Problems Solved

  • Notebook‑driven training lacks reproducibility & governance
  • Manual execution → inconsistent environments, fragmented metrics
  • Limited visibility into experiment lineage and outcomes

Primary Objective

  • Reproducible, pipeline‑driven ML training & evaluation on Kubeflow
  • Containerized components + standardized artifact storage + metric‑driven model selection

Solution & Architecture

Kubeflow pipeline flow

Orchestration overview

Kubeflow Pipelines workflow orchestrates multiple containerized components for data ingestion, model training (Logistic Regression and Decision Tree), and evaluation. Each pipeline run produces versioned artifacts and metrics, enabling systematic comparison and model selection based on evaluation criteria.

Flow: Data → Train (LR, DT) → Evaluate → Compare Metrics → (conditional selection)
Kubeflow DAG (KFP v2)
1Data ingestion
2Train LR
3Train DT
4Evaluate
5Compare / select

Components

  • Download data (component)
  • Training (scikit‑learn containers)
  • Evaluation metrics (accuracy)
  • Kubeflow Pipelines orchestration
  • MinIO artifact storage

Platform & services

  • Kubernetes + Kubeflow (KFP v2 runtime)
  • Argo Workflows (execution engine)
  • Docker (containerized training components)
  • MinIO artifact storage for datasets and outputs
  • Kubeflow Metadata Store (MLMD) for lineage and run metadata
  • Kubernetes Pods, Services, Namespaces, RBAC

Scalability & reliability

  • Stateless pipeline components executed as isolated pods
  • Retry and failure isolation at step level
  • Artifact durability via object storage (MinIO)
  • Re-runnable pipelines with deterministic environments via image versioning

AI / DevOps Details

Automation, orchestration, and observability

AI/ML focus

Model Training, Evaluation & Experiment Orchestration (MLOps – Offline Training).

Pipeline automation

  • Automated data ingestion component
  • Parallel/serial training components (Logistic Regression, Decision Tree)
  • Metric computation and logging (accuracy)
  • Metric-based model comparison and selection
  • Pipeline as code (compiled KFP YAML)

CI/CD & orchestration

  • Docker training component images
  • Kubeflow Pipelines orchestration
  • Argo Workflows pipeline execution
  • Kubernetes runtime scheduling

Monitoring & optimisation

  • Kubeflow Pipelines UI for run status, DAG visualization, and step-level logs
  • Kubernetes pod logs for component-level debugging
  • ML Metadata Store for run lineage and artifact traceability
  • Metric visibility (accuracy) across experiments for comparative analysis
  • Failure isolation and targeted re-runs of failed steps

Skills & Technologies

Proficiencies demonstrated

Primary skills

  • Kubeflow Pipelines (advanced)
  • Kubernetes‑Native MLOps (advanced)
  • ML Training Pipeline Design (advanced)
  • Experiment Tracking & Lineage (advanced)

Secondary tools

  • scikit‑learn (model framework)
  • Docker (containerization)
  • YAML (pipeline/component specifications)
  • Argo Workflows
  • MinIO, ML Metadata
  • Kubeflow UI / SDK
  • Docker Hub

Programming languages

Python (primary), YAML (pipeline/component specs).

Kubeflow DevOps — Architecture & YAML mapping

Reference: Pipeline‑2 modelling

Architecture BlockKubeflow CI/CD / MLOps Construct (Pipeline‑2)
Source RepositoryGitHub (Kubeflow modeling / pipelines repo)
Source TriggerManual trigger (Kubeflow UI / SDK) or CI trigger (GitHub Actions / local pipeline submission)
CI RunnerGitHub Actions Linux Runner (ubuntu-latest) (optional for CI-driven pipeline compilation and submission)
Build / Pipeline ExecutionKubeflow Pipelines (KFP v2: Data → Train → Evaluate → Condition)
Training OrchestrationKubeflow Pipelines (KFP v2 running on Kubernetes via Argo Workflows)
Data ProcessingKubeflow Pipeline Component (Python + scikit‑learn preprocessing in containerized step)
Model EvaluationKubeflow Pipeline Component (Python + scikit‑learn evaluation component; accuracy metric)
Artifact StorageMinIO (S3-compatible object store for datasets, model artifacts, metrics JSON)
Container RegistryDocker Hub (versioned training component images)
Model RegistryKubeflow Pipelines run history + ML Metadata Store (MLMD) + MinIO artifact versions (open-source equivalent of managed model registries)
Approval GatePipeline Condition (metric threshold gate for downstream deployment pipeline)
Security & AuthKubernetes Service Accounts + RBAC (least-privilege execution identity for pipeline pods)
Secrets / ConfigKubernetes Secrets + Environment Variables (mounted into pipeline containers)
Monitoring & LogsKubeflow Pipelines UI + Kubernetes Pod Logs (per-step execution logs)
Lineage & GovernanceKubeflow Pipelines lineage + ML Metadata Store (inputs/outputs, run lineage, artifact traceability)
Infrastructure BackendSelf-managed Kubernetes (Minikube / EKS / AKS / GKE) hosting Kubeflow Pipelines runtime (control plane + workflow engine deployed via Kubeflow manifests)

Pipeline‑2 implements Kubernetes‑native ML training and evaluation using Kubeflow Pipelines, providing the open-source equivalent of managed Vertex AI Pipelines for model experimentation, metric-based selection, and governed promotion to deployment workflows.

Challenges & Resolutions

Technical hurdles

Key challenges

  • Packaging training code into reproducible container images
  • Wiring artifact paths between components
  • Consistent metric logging across variants
  • Experiment lineage and traceability

Resolutions

  • Standardised Dockerfiles per component
  • Kubeflow artifact inputs/outputs for clean handoffs
  • Centralised accuracy logging for model comparison
  • ML Metadata Store + KFP run history

Assets & References

Code, diagrams, study material

Kubeflow Official Documentation Repo

Official Kubeflow Pipelines repository and documentation source.

kubeflow/pipelines

Kubeflow Pipelines MLOps

Project repository for Kubeflow Pipelines MLOps implementation.

Kubeflow-pipelines-MLOps

Deployment Link

Deployment link will be added later.

Later

MLOps Study Resources

Kubeflow Pipelines – restricted & public materials

Request Study Material

Note: Additional project materials (KFP v2 YAML, component specs) are shared selectively for interview / evaluation.

Study Material – Kubeflow Pipelines v2

KFP official documentation + YAML guide
Downloadable PDF (public) – KFP v2, Python SDK
Download
Restricted: KFP pipeline‑2 component specs
Access limited to authorised users – container definitions, metrics
Download
Architecture diagrams (training & evaluation)
High‑res: Data → Train(LR/DT) → Evaluate → Compare
Download
ML Metadata & lineage best practices
MLMD + MinIO artifact tracking
Download
Kubeflow + Argo workflows deep dive
Execution engine & pod isolation
Download