SageMaker Modelling Pipeline

Train · Evaluate · Register (MLOps quality gates)

Fully automated ML training pipeline on AWS: data preprocessing, XGBoost training, evaluation with a conditional quality gate (MSE), and governed model registration in SageMaker Model Registry. Triggered via GitHub Actions + OIDC.

Project Summary

MLOps · Pipeline-2 (Train/Evaluate/Register)

AI/ML type

Supervised regression (XGBoost) + automated MLOps

Domain

AWS SageMaker · Model building & registration

DevOps focus

CI/CD for ML (GitHub Actions + OIDC + SageMaker Pipelines)

Key technologies

AWS + CI/CD + containers

SageMaker Pipelines · GitHub Actions · OIDC → IAM · Amazon ECR · S3 (artifacts) · Model Registry · CloudWatch · XGBoost / Python · quality gates

Problem & Objective

Why this pipeline?

Problems solved

  • Manual training → non‑reproducible experiments, inconsistent evaluation
  • No quality gates → bad models could reach production
  • Lack of governance & traceability

Primary objectives

  • Fully automated train/eval/register pipeline on AWS
  • Conditional registration based on MSE threshold
  • Integrate CI/CD with GitHub Actions + OIDC (no static secrets)

Solution & Architecture

SageMaker Pipeline · quality gate

Overview

A SageMaker Pipeline with four steps: Process → Train → Evaluate → Conditional Register. It is triggered by a GitHub Actions workflow (Linux runner) that assumes an IAM role through OIDC. The evaluation step writes its metrics (MSE) to a JSON file; only models that pass the threshold are registered in SageMaker Model Registry, where they wait for manual approval. Model artifacts are stored in S3 and container images in ECR.

Quality gate: if MSE > threshold, the pipeline stops and the model is never registered.
Flow: GitHub → OIDC → IAM → SageMaker Pipeline (Process ▸ Train ▸ Eval ▸ Condition) → Registry
Steps: 1. Git push → 2. GitHub Actions → 3. OIDC assume → 4. SageMaker Pipeline → 5. Model Registry
[Figure: SageMaker modelling pipeline reference visual]
[Figure: SageMaker modelling pipeline YAML mapping visual]
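
The quality gate described above can be illustrated with a minimal Python sketch. It is not the project's actual condition step. The JSON layout (`regression_metrics.mse.value`), the function name, and the sample numbers are assumptions chosen for illustration. The only rule taken from the text is that a model passes when its MSE does not exceed the threshold.

```python
import json


def passes_quality_gate(metrics_json: str, mse_threshold: float) -> bool:
    """Return True when the reported MSE is at or below the threshold."""
    metrics = json.loads(metrics_json)
    # Assumed layout: {"regression_metrics": {"mse": {"value": <float>}}}
    mse = float(metrics["regression_metrics"]["mse"]["value"])
    return mse <= mse_threshold


# Hypothetical evaluation report, as the evaluation step might write it.
report = json.dumps({"regression_metrics": {"mse": {"value": 4.2}}})

print(passes_quality_gate(report, mse_threshold=6.0))  # True: model may be registered
print(passes_quality_gate(report, mse_threshold=3.0))  # False: pipeline stops
```

In a real SageMaker Pipeline the same comparison would normally live in a condition step that reads the metrics file, so the check runs inside the pipeline rather than in CI.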

Skills & Technologies

ML engineering + AWS

Primary skills

  • Amazon SageMaker Pipelines (advanced)
  • MLOps: training/evaluation/registry (advanced)
  • CI/CD with GitHub Actions + OIDC (advanced)
  • Docker, ECR, IAM, CloudWatch

Languages & tools

  • Python (SDK, processing/training scripts)
  • YAML (GitHub workflows, pipeline config)
  • XGBoost / scikit-learn

Pipeline execution & governance

conditional registration + approvals

Execution

  • Trigger: push to main or workflow_dispatch
  • GitHub runner: ubuntu-latest, assumes IAM role via OIDC
  • SageMaker Pipeline execution: processing (prep), training (XGBoost), evaluation (MSE to JSON)
  • Conditional step: if metrics pass → register model (approval status PendingManualApproval)
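
As a rough illustration of the "evaluation (MSE to JSON)" step listed above, the sketch below computes MSE in plain Python and writes it to a metrics file. The file name `metrics.json` follows the naming used elsewhere in this document. The JSON layout, the function names, and the sample values are assumptions, and a real evaluation script would load the trained XGBoost model and a held-out dataset instead of hard-coded lists.

```python
import json
import os
import tempfile


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def write_metrics(y_true, y_pred, output_dir):
    """Write the MSE to <output_dir>/metrics.json and return the file path."""
    # Assumed report layout; adjust to whatever the condition step reads.
    report = {"regression_metrics": {"mse": {"value": mean_squared_error(y_true, y_pred)}}}
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "metrics.json")
    with open(path, "w") as handle:
        json.dump(report, handle)
    return path


# Hypothetical targets and predictions, only to exercise the functions.
with tempfile.TemporaryDirectory() as tmp:
    metrics_path = write_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0], tmp)
    with open(metrics_path) as handle:
        saved = json.load(handle)

print(saved["regression_metrics"]["mse"]["value"])
```

Writing the metric to a file keeps the evaluation job decoupled from the gate: the condition step only needs to know where the file is and which key to read.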

Governance & controls

  • Quality gate on MSE – underperforming models are blocked automatically and never registered
  • Model Registry approval workflow (manual approval step before production)
  • Least‑privilege IAM roles + optional KMS encryption for S3 artifacts
  • CloudWatch logs for every job (traceability)
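
The manual approval control above could be scripted roughly as follows. This is a sketch, not the project's tooling. `update_model_package` and its `ModelPackageArn`, `ModelApprovalStatus`, and `ApprovalDescription` parameters belong to the boto3 SageMaker client. The helper name, the recording stand-in client, and the ARN are hypothetical and exist only so the example runs without AWS access.

```python
def approve_model_version(sm_client, model_package_arn, approver, note=""):
    """Move a registered model version from PendingManualApproval to Approved."""
    return sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=f"Approved by {approver}. {note}".strip(),
    )


class RecordingClient:
    """Offline stand-in for boto3.client("sagemaker"); records calls only."""

    def __init__(self):
        self.calls = []

    def update_model_package(self, **kwargs):
        self.calls.append(kwargs)
        return {"ModelPackageArn": kwargs["ModelPackageArn"]}


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:sagemaker:eu-west-1:123456789012:model-package/example-group/1"
client = RecordingClient()
response = approve_model_version(client, example_arn, approver="alice")
print(client.calls[0]["ModelApprovalStatus"])  # Approved
```

Against a real account, `boto3.client("sagemaker")` would be passed in place of `RecordingClient()`, and the caller's IAM role would need permission to update the model package.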

AWS CI/CD · YAML mapping

GitHub Actions → SageMaker Pipeline

Architecture block      | AWS / YAML construct
Source repository       | GitHub (modeling / pipeline repository)
CI trigger              | on: push / workflow_dispatch
Runner                  | GitHub Actions Linux runner (ubuntu-latest)
Authentication          | aws-actions/configure-aws-credentials (OIDC → IAM Role)
Runtime setup           | setup-python (SageMaker Pipeline SDK + dependencies)
Dependency install      | pip install requirements
Pipeline packaging      | Package pipeline code and dependencies
Pipeline create/update  | run-pipeline creates or updates SageMaker Pipeline
Pipeline execution      | Start SageMaker Pipeline execution
Processing              | SageMaker Processing Jobs for preprocessing and evaluation
Training                | SageMaker Training Jobs train model and store artifacts in S3
Evaluation metrics      | metrics.json written to S3 and checked against quality gates
Conditional gate        | Condition step blocks failed models from registration
Model registry          | SageMaker Model Registry registration when thresholds pass
Approval status         | Model version set to Pending/Approved for downstream Pipeline‑3 deployment
Artifact storage        | S3 (datasets, model.tar.gz, metrics.json)
Container registry      | Amazon ECR (custom images)
Encryption              | KMS encryption for S3 artifacts (optional)
Publish status          | Publish metrics / pipeline status back to CI
Logs                    | GitHub Actions + SageMaker Pipelines + CloudWatch Logs

YAML steps: checkout, configure AWS credentials, setup-python, pip install requirements, run-pipeline, start execution, wait for completion, publish metrics/status.
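
The "start execution" and "wait for completion" steps might look like the following when driven from a Python script in the CI job. This is a sketch under assumptions. `start_pipeline_execution` and `describe_pipeline_execution` are boto3 SageMaker client calls. The helper name, polling defaults, pipeline name, and the fake client (used so the example runs offline) are hypothetical.

```python
import time

TERMINAL_STATES = {"Succeeded", "Failed", "Stopped"}


def run_and_wait(sm_client, pipeline_name, poll_seconds=30,
                 timeout_seconds=3600, sleep=time.sleep):
    """Start a pipeline execution and poll until it reaches a terminal state."""
    started = sm_client.start_pipeline_execution(PipelineName=pipeline_name)
    arn = started["PipelineExecutionArn"]
    waited = 0
    while True:
        described = sm_client.describe_pipeline_execution(PipelineExecutionArn=arn)
        status = described["PipelineExecutionStatus"]
        if status in TERMINAL_STATES:
            return arn, status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{arn} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeSageMakerClient:
    """Offline stand-in for boto3.client("sagemaker") with scripted statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def start_pipeline_execution(self, PipelineName):
        return {"PipelineExecutionArn": f"arn:example:pipeline/{PipelineName}/execution/demo"}

    def describe_pipeline_execution(self, PipelineExecutionArn):
        return {"PipelineExecutionStatus": self._statuses.pop(0)}


fake = FakeSageMakerClient(["Executing", "Executing", "Succeeded"])
arn, status = run_and_wait(fake, "pipeline-2-demo", sleep=lambda seconds: None)
print(status)  # Succeeded
```

In the workflow, the script would exit with a non-zero code whenever the final status is not `Succeeded`, so that a failed pipeline also fails the GitHub Actions job.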

AWS DevOps CI/CD – Reference Architecture

Pipeline-2: Modelling / Train–Evaluate–Register

Architecture Flow:

  1. Developer pushes changes to GitHub (modeling / pipeline repository).
  2. GitHub Actions workflow is triggered (push / manual dispatch).
  3. GitHub Actions assumes AWS IAM Role via OIDC (no static credentials).
  4. CI job packages pipeline code and dependencies.
  5. SageMaker Pipeline is created/updated (Process → Train → Evaluate → Condition).
  6. SageMaker Processing Jobs run data preprocessing and evaluation.
  7. SageMaker Training Jobs train models and store artifacts in S3.
  8. Evaluation metrics are written to JSON and checked against quality gates.
  9. If metrics pass thresholds, model is registered in SageMaker Model Registry.
  10. Model version is registered as PendingManualApproval; once Approved, it is available for downstream deployment (Pipeline-3).
  11. Logs and status are available in GitHub Actions + SageMaker Pipelines + CloudWatch.

YAML Mapping (GitHub Actions → SageMaker Pipelines):

  • Trigger: on push / workflow_dispatch
  • Auth: aws-actions/configure-aws-credentials (OIDC → IAM Role)
  • Runtime: setup-python (pipeline SDK + deps)
  • Steps:
    • pip install requirements
    • run-pipeline (create/update SageMaker Pipeline)
    • start pipeline execution
    • wait for completion (optional)
    • publish metrics / status

Security & Guardrails:

  • OIDC-based IAM roles for CI (no secrets in CI)
  • Least-privilege IAM roles for SageMaker jobs (processing/training)
  • KMS encryption for S3 artifacts (optional)
  • Quality gates on evaluation metrics (block bad models)
  • Model Registry approvals before production promotion
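
To make the OIDC guardrail more concrete, the sketch below builds an IAM trust policy that lets one repository branch assume the CI role. The OIDC provider host, the `sts:AssumeRoleWithWebIdentity` action, and the `aud` / `sub` claim keys follow the standard GitHub-to-AWS OIDC setup. The account ID, repository name, and function name are hypothetical placeholders, not values from this project.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Build a trust policy restricting role assumption to one repo branch."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Hypothetical account ID and repository, for illustration only.
policy = github_oidc_trust_policy("123456789012", "example-org/ml-build")
print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim to a specific repository and branch is what keeps the role least-privilege on the CI side. The workflow itself also needs the `id-token: write` permission so the runner can request the OIDC token.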

Assets & references

Code, diagrams, study material

MLOps CDK GitHub Action

Infrastructure-as-code and GitHub Actions workflow for platform provisioning.

Psitron ML Build

Build repository for the ML pipeline stage and related implementation assets.

Psitron ML Deploy

Deployment repository for ML release workflows and production delivery assets.

Study Material Resources

Official docs, restricted guides, workflow notes

SageMaker MLOps · study pack

  • AWS SageMaker Pipeline deep dive (PDF): Official documentation + custom YAML examples
  • CDK for ML pipelines (PDF): Restricted · infrastructure as code for Pipeline-2
  • OIDC + GitHub Actions AWS auth (PDF): Step‑by‑step guide with IAM policies
  • Model Registry & approval workflows (PDF): Governance patterns, manual approval setup