# Create a Machine Learning Workflow

Because Pachyderm is a language- and framework-agnostic platform, and
because it easily distributes analysis over large data sets, data
scientists can use any tooling to create machine learning workflows.
Even if that tooling is not familiar to the rest of an engineering
organization, data scientists can autonomously develop and deploy
scalable solutions by using containers. Moreover, Pachyderm's pipeline
logic, paired with data versioning, makes results reproducible, both
for debugging and while you develop improvements to a model.

To make the most of Pachyderm's built-in functionality, Pachyderm
recommends that you combine model training processes, persisted models,
and model utilization processes, such as making inferences or
generating results, into a single Pachyderm pipeline Directed Acyclic
Graph (DAG).

Such a pipeline enables you to achieve the following goals:

- Keep a rigorous historical record of which models were used
  on what data to produce which results.
- Automatically update online ML models when training data or
  parameterization changes.
- Easily revert to other versions of an ML model when a new model
  does not produce an expected result or when *bad data* is
  introduced into a training data set.

The following diagram demonstrates an ML pipeline:

You can update the training data set at any time to automatically
train a new persisted model. You can also use any language or
framework, including Apache Spark™, TensorFlow™, scikit-learn™, or
others, and output a persisted model in any format, such as pickle,
XML, or POJO. Regardless of the framework, Pachyderm versions the model
so that you can track the data that was used to train each model.

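
As a rough illustration of the training and inference stages that such a DAG combines, here is a minimal Python sketch. It assumes a toy least-squares model, hypothetical `training`, `model`, and `attributes` repos holding CSV files, and a hypothetical `model.pkl` file name. The only Pachyderm behavior it relies on is that input repos are mounted under `/pfs/<repo>` and that files written to `/pfs/out` become the job's output commit. The directories are function parameters, so you can also run the stages locally.

```python
import csv
import os
import pickle
import sys

# Pachyderm mounts each input repo at /pfs/<repo> and saves whatever a job
# writes to /pfs/out as its output commit. The repo names are hypothetical.
TRAINING_DIR = "/pfs/training"
MODEL_DIR = "/pfs/model"
ATTRIBUTES_DIR = "/pfs/attributes"
OUT_DIR = "/pfs/out"


def read_csv_file(path):
    """Return the non-empty rows of one CSV file as lists of floats."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f) if row]


def train(training_dir=TRAINING_DIR, out_dir=OUT_DIR):
    """Fit y = slope * x + intercept by ordinary least squares and pickle it."""
    pairs = [
        row
        for name in sorted(os.listdir(training_dir))
        for row in read_csv_file(os.path.join(training_dir, name))
    ]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    model = {"slope": slope, "intercept": mean_y - slope * mean_x}
    # The pickled model lands in this pipeline's output repo and is versioned there.
    with open(os.path.join(out_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    return model


def infer(model_dir=MODEL_DIR, attributes_dir=ATTRIBUTES_DIR, out_dir=OUT_DIR):
    """Write one prediction file per attributes file, using the persisted model."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    for name in sorted(os.listdir(attributes_dir)):
        xs = [row[0] for row in read_csv_file(os.path.join(attributes_dir, name))]
        preds = [model["slope"] * x + model["intercept"] for x in xs]
        with open(os.path.join(out_dir, name), "w") as f:
            f.writelines(f"{p:.6g}\n" for p in preds)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical entrypoint: the pipeline's cmd picks the stage,
    # for example `python3 ml.py train` or `python3 ml.py infer`.
    if sys.argv[1] == "train":
        train()
    elif sys.argv[1] == "infer":
        infer()
```

The training stage writes its pickled model to its own output repo, so each retraining produces a new commit there. That commit is what ties a model version to the training data it was built from.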

Pachyderm processes new data that arrives in the input repository with
the updated model. You can also recompute old predictions with the
updated model, or test new models on previously ingested and versioned
data. This way, you do not need to manually update historical results
or swap ML models in production.

For examples of ML workflows in Pachyderm, see
[Machine Learning Examples](../examples/examples.md#machine-learning).
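
To make the reprocessing behavior described above more concrete, the following Python sketch builds two pipeline specifications that wire a training stage and an inference stage into a single DAG. The field names (`pipeline`, `transform`, `image`, `cmd`, `input`, `pfs`, `repo`, `glob`, `cross`) follow the Pachyderm 1.x pipeline specification as we understand it, so verify them against the reference for your version. The image name, repo names, and `/code/ml.py` script path are hypothetical. A pipeline's output repo takes the pipeline's name, which is why the `inference` pipeline reads from a repo called `model`.

```python
import json

# Hypothetical image and script path; replace with your own.
IMAGE = "example/ml-workflow:1.0"


def training_pipeline():
    """Spec for a pipeline whose output repo, `model`, holds the persisted model."""
    return {
        "pipeline": {"name": "model"},
        "transform": {"image": IMAGE, "cmd": ["python3", "/code/ml.py", "train"]},
        # Glob "/" treats the whole training repo as one datum, so each
        # commit to it retrains a single model over all of the data.
        "input": {"pfs": {"repo": "training", "glob": "/"}},
    }


def inference_pipeline():
    """Spec for a pipeline that applies the latest model to each attributes file."""
    return {
        "pipeline": {"name": "inference"},
        "transform": {"image": IMAGE, "cmd": ["python3", "/code/ml.py", "infer"]},
        # Cross the model repo (one datum) with each top-level attributes file.
        "input": {
            "cross": [
                {"pfs": {"repo": "model", "glob": "/"}},
                {"pfs": {"repo": "attributes", "glob": "/*"}},
            ]
        },
    }


if __name__ == "__main__":
    # Print each spec as JSON, for example to save as a file for
    # `pachctl create pipeline -f <file>`.
    for spec in (training_pipeline(), inference_pipeline()):
        print(json.dumps(spec, indent=2))
```

With the `cross` input, a new commit to the `model` repo should cause every `attributes` datum to be reprocessed with the updated model. A new file in `attributes` should only add one new datum, which is processed against the current model.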