ML Lineage

What is it?

Artificial intelligence (AI) has matured technologically, and its applications are becoming pervasive across industrial sectors and society as a whole. At the same time, public authorities are placing increasingly complex and stringent demands on sociotechnical services that use AI in decision-making. ML lineage is a framework for holistically capturing and connecting the information required about ML model development and operations.

Why is it necessary?

Information about ML-based systems in production is needed throughout their lifecycle to ensure and monitor business value and trustworthiness, and to comply with regulations such as the EU’s AI Act. However, work on AI regulation, governance, and ethics revolves around high-level concepts, requirements, and individual practices, whereas MLOps focuses on pipelines that produce and maintain ML model artifacts. ML lineage is meant to connect these two perspectives.

The advantage of ML lineage is that it strengthens end-to-end accountability, transparency, and evidence for ML-based systems, thereby increasing business value and trustworthiness through thorough lifecycle documentation. It engages stakeholders across organizational levels and roles, clarifying accountability, enabling clearer assignment of responsibilities, and creating touchpoints for interaction. For example, developers become better informed about the business impact, while the quality assurance team gains better oversight of technical details.

How does it work?

ML lineage distinguishes between the model level and the prediction level. Conceptually, it comprises four separate yet interconnected core domains: project, experiment, model, and prediction. It integrates with existing MLOps pipelines, workflows, and tools, and often requires only minimal additional effort, such as generating model cards.
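
To make the four core domains more concrete, here is a minimal Python sketch of how project, experiment, model, and prediction records could be linked so that a single prediction can be traced back to its originating project. It is only an illustration under stated assumptions: the class names, the fields, and the LineageStore helper are hypothetical and are not part of any ML lineage specification or tool.

```python
from dataclasses import dataclass


# Hypothetical record types, one per core domain. All field names are
# illustrative assumptions, not a prescribed ML lineage schema.
@dataclass(frozen=True)
class Project:
    project_id: str
    business_goal: str


@dataclass(frozen=True)
class Experiment:
    experiment_id: str
    project_id: str       # link to the project domain
    dataset_version: str


@dataclass(frozen=True)
class Model:
    model_id: str
    experiment_id: str    # link to the experiment domain
    model_card_uri: str   # e.g. a model card produced by an existing pipeline


@dataclass(frozen=True)
class Prediction:
    prediction_id: str
    model_id: str         # link to the model domain
    output: str


class LineageStore:
    """In-memory store that keeps each domain separate but linked by IDs."""

    def __init__(self) -> None:
        self.projects: dict[str, Project] = {}
        self.experiments: dict[str, Experiment] = {}
        self.models: dict[str, Model] = {}
        self.predictions: dict[str, Prediction] = {}

    def trace(self, prediction_id: str) -> list:
        """Walk from the prediction level up through model, experiment, project."""
        prediction = self.predictions[prediction_id]
        model = self.models[prediction.model_id]
        experiment = self.experiments[model.experiment_id]
        project = self.projects[experiment.project_id]
        return [prediction, model, experiment, project]


# Usage: register one record per domain, then trace a prediction.
store = LineageStore()
store.projects["p1"] = Project("p1", "reduce customer churn")
store.experiments["e1"] = Experiment("e1", "p1", "customers-v3")
store.models["m1"] = Model("m1", "e1", "cards/m1.md")
store.predictions["pred1"] = Prediction("pred1", "m1", "churn=yes")

for record in store.trace("pred1"):
    print(type(record).__name__, record)
```

Keeping the domains as separate record types joined only by identifiers mirrors the "separate yet interconnected" idea: prediction-level records can be logged in production while model-level records come from the development pipeline, and the two are joined only when a trace is needed.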

Mikko Raatikainen