End-to-end lineage with DVC and Amazon SageMaker AI MLflow apps

April 21, 2026

Production ML teams often struggle to trace the full lineage of a model back to the exact data and code that trained it. In this post, we close that gap by combining DVC for data versioning, Amazon SageMaker AI for scalable processing and training, and Amazon SageMaker AI MLflow Apps for experiment tracking and model registry, turning multi-day audit investigations into a single query.

Full text here, and GitHub repository here

We walk through two deployable patterns that you can run end-to-end in your own AWS account. The first is a foundational dataset-level lineage pattern, in which every MLflow run logs the DVC commit hash (data_git_commit_id) that points to the exact versioned dataset in Amazon S3. The second is a record-level lineage pattern for regulated environments (healthcare, financial services, GDPR opt-out scenarios) that adds manifests and a consent registry, so you can answer questions like “which models were trained on patient X’s data?” instantly from MLflow artifacts. The result is a clean separation of concerns: DVC owns data-to-training lineage, MLflow owns training-to-deployment lineage, and the Git commit hash ties them together.
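
To make the two patterns concrete, here is a minimal Python sketch of the kind of lookup they enable. It is not the code from the post or its repository. It assumes that each run's lineage metadata (the data_git_commit_id tag plus a manifest of record IDs) has already been fetched from MLflow into plain objects. The RunRecord class, both helper functions, the manifest layout, and all IDs and commit hashes are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical stand-in for one MLflow run and its lineage metadata."""
    run_id: str
    model_name: str
    data_git_commit_id: str  # Git commit that pins the DVC-tracked dataset
    manifest: set[str] = field(default_factory=set)  # record IDs used in training


def models_trained_on(runs: list[RunRecord], record_id: str) -> list[str]:
    """Return the sorted names of models whose manifest contains record_id."""
    return sorted({r.model_name for r in runs if record_id in r.manifest})


def runs_with_opted_out_records(runs: list[RunRecord], opted_out: set[str]) -> list[str]:
    """Return the IDs of runs whose manifest includes any opted-out record."""
    return [r.run_id for r in runs if r.manifest & opted_out]


# Made-up example data; real values would come from MLflow tags and artifacts.
runs = [
    RunRecord("run-001", "readmission-v1", "a1b2c3d", {"patient-17", "patient-42"}),
    RunRecord("run-002", "readmission-v2", "e4f5a6b", {"patient-42", "patient-99"}),
    RunRecord("run-003", "triage-v1", "e4f5a6b", {"patient-07"}),
]

print(models_trained_on(runs, "patient-42"))
print(runs_with_opted_out_records(runs, {"patient-99"}))
```

In this sketch, data_git_commit_id answers the dataset-level question (which versioned dataset a run used), while the manifest answers the record-level one. The second helper stands in for a consent-registry check, with the set of opted-out IDs assumed to come from that registry.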

Simplify ModelOps with Amazon SageMaker AI Projects using Amazon S3-based templates

January 30, 2026

Managing ModelOps workflows can be complex and time-consuming. Amazon SageMaker AI Projects now offers an easier path with Amazon S3-based templates. With this new capability, you can store AWS CloudFormation templates directly in Amazon S3 and manage their entire lifecycle using familiar S3 feat...

Track LLM model evaluation using Amazon SageMaker managed MLflow and FMEval

January 28, 2025

In this post, we show how to use FMEval and Amazon SageMaker to programmatically evaluate LLMs. FMEval is an open source LLM evaluation library, designed to provide data scientists and machine learning (ML) engineers with a code-first experience to evaluate LLMs for various aspects, including acc...

Secure MLflow in AWS: Fine-grained access control with AWS native services

May 08, 2023

MLflow and Amazon SageMaker are two of many tools on the market that help data scientists implement end-to-end machine learning workloads. SageMaker offers the possibility to run these workloads fully end-to-end on its own ecosystem as it has been designed to solve some of the common challenges t...

Paolo Di Francesco
