Project author: criteo

Project description:
A fromconfig Launcher for MlFlow
Primary language: Python
Repository: git://github.com/criteo/fromconfig-mlflow.git
Created: 2021-04-22T19:19:49Z
Project page: https://github.com/criteo/fromconfig-mlflow

FromConfig MlFlow

A fromconfig Launcher for MlFlow support.

Install

  pip install fromconfig_mlflow

Quickstart

To activate MlFlow logging, add --launcher.log=mlflow to your command:

  fromconfig config.yaml params.yaml --launcher.log=mlflow - model - train

With

model.py

  """Dummy Model."""

  import mlflow


  class Model:
      def __init__(self, learning_rate: float):
          self.learning_rate = learning_rate

      def train(self):
          print(f"Training model with learning_rate {self.learning_rate}")
          if mlflow.active_run():
              mlflow.log_metric("learning_rate", self.learning_rate)

config.yaml

  model:
    _attr_: model.Model
    learning_rate: "${params.learning_rate}"

params.yaml

  params:
    learning_rate: 0.001
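Before the model is built, the two files are combined and the "${params.learning_rate}" reference is replaced by the value from params.yaml. The following is a minimal stdlib sketch of that step, written with the two files as Python dicts. The helpers merge, lookup and resolve are hypothetical and only handle a value that is exactly one "${dotted.path}" reference; fromconfig's actual parser is more general.

```python
import re
from typing import Any, Dict

# Contents of config.yaml and params.yaml, written as Python dicts
CONFIG = {"model": {"_attr_": "model.Model", "learning_rate": "${params.learning_rate}"}}
PARAMS = {"params": {"learning_rate": 0.001}}

# Matches a value that is exactly one "${dotted.path}" reference (hypothetical rule)
REFERENCE = re.compile(r"^\$\{([\w.]+)\}$")


def merge(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge dicts; later dicts win on conflicting leaves."""
    merged: Dict[str, Any] = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge(merged[key], value)
            else:
                merged[key] = value
    return merged


def lookup(root: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "params.learning_rate"."""
    node: Any = root
    for part in path.split("."):
        node = node[part]
    return node


def resolve(node: Any, root: Dict[str, Any]) -> Any:
    """Return a copy of node where every "${...}" string is replaced by its target."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, root) for item in node]
    if isinstance(node, str):
        match = REFERENCE.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


merged = merge(CONFIG, PARAMS)
parsed = resolve(merged, merged)
print(parsed["model"]["learning_rate"])  # 0.001
```

The merged config still holds the raw "${params.learning_rate}" string. Only the resolved copy carries the float 0.001.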

It should print

  Started run: http://127.0.0.1:5000/experiments/0/runs/7fe650dd99574784aec1e4b18fceb73f
  Training model with learning_rate 0.001

If you navigate to http://127.0.0.1:5000/experiments/0/runs/7fe650dd99574784aec1e4b18fceb73f, you should see the logged learning_rate metric.
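In config.yaml, the _attr_ key names the object to build. In the command, "- model - train" selects the model entry and calls its train method, which prints the second line of the output above. The following is a rough sketch of the instantiation step. It assumes a hypothetical instantiate helper that imports the dotted path with importlib and passes the remaining keys as keyword arguments. fromconfig's real implementation does more, such as handling nested objects. So that the sketch runs without model.py, the demo builds datetime.timedelta instead of model.Model.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Build the object named by "_attr_", passing the other keys as kwargs."""
    kwargs = dict(config)  # copy so the caller's config is left untouched
    module_name, _, attr_name = kwargs.pop("_attr_").rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**kwargs)


# With model.py importable, {"_attr_": "model.Model", "learning_rate": 0.001}
# would give Model(learning_rate=0.001), on which "- model - train" calls .train().
delay = instantiate({"_attr_": "datetime.timedelta", "minutes": 2})
print(delay.total_seconds())  # 120.0
```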

MlFlow server

To set up a local MlFlow tracking server, run

  mlflow server

which should print

  [INFO] Starting gunicorn 20.0.4
  [INFO] Listening at: http://127.0.0.1:5000

We will assume that the tracking URI is http://127.0.0.1:5000 from now on.

Configure MlFlow

You can set the tracking URI either via an environment variable or via the config.

To set the MLFLOW_TRACKING_URI environment variable

  export MLFLOW_TRACKING_URI=http://127.0.0.1:5000

Alternatively, you can set the mlflow.tracking_uri config key, either on the command line with

  fromconfig config.yaml params.yaml --launcher.log=mlflow --mlflow.tracking_uri="http://127.0.0.1:5000" - model - train

or in a config file with

launcher.yaml

  # Configure mlflow
  mlflow:
    # tracking_uri: "http://127.0.0.1:5000" # Or set env variable MLFLOW_TRACKING_URI
    # experiment_name: "test-experiment" # Which experiment to use
    # run_id: 12345 # To restore a previous run
    # run_name: test # To give a name to your new run
    # artifact_location: "path/to/artifacts" # Used only when creating a new experiment

  # Configure launcher
  launcher:
    log: mlflow

and run

  fromconfig config.yaml params.yaml launcher.yaml - model - train
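The command-line flags and launcher.yaml describe the same nested config. For example, --mlflow.tracking_uri=... sets the tracking_uri key under mlflow. The following sketch shows that mapping with a hypothetical flags_to_config helper. Unlike the real command line, it keeps every value as a plain string.

```python
from typing import Any, Dict, List


def flags_to_config(flags: List[str]) -> Dict[str, Any]:
    """Turn ["--a.b=c"] into {"a": {"b": "c"}}; values stay strings."""
    config: Dict[str, Any] = {}
    for flag in flags:
        name = flag[2:] if flag.startswith("--") else flag
        key, _, value = name.partition("=")
        *parents, leaf = key.split(".")
        node: Dict[str, Any] = config
        for parent in parents:
            node = node.setdefault(parent, {})  # create nested levels as needed
        node[leaf] = value
    return config


cli = flags_to_config([
    "--launcher.log=mlflow",
    "--mlflow.tracking_uri=http://127.0.0.1:5000",
])
# Same structure as launcher.yaml with the tracking_uri line uncommented
launcher_yaml = {
    "mlflow": {"tracking_uri": "http://127.0.0.1:5000"},
    "launcher": {"log": "mlflow"},
}
print(cli == launcher_yaml)  # True
```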

Artifacts and Parameters

In this example, we also log the config files and the parameters.

Re-using the quickstart code, modify the launcher.yaml file as follows

  # Configure logging
  logging:
    level: 20

  # Configure mlflow
  mlflow:
    # tracking_uri: "http://127.0.0.1:5000" # Or set env variable MLFLOW_TRACKING_URI
    # experiment_name: "test-experiment" # Which experiment to use
    # run_id: 12345 # To restore a previous run
    # run_name: test # To give a name to your new run
    # artifact_location: "path/to/artifacts" # Used only when creating a new experiment
    # include_keys: # Only log params that match *model*
    #   - model

  # Configure launcher
  launcher:
    log:
      - logging
      - mlflow
    parse:
      - mlflow.log_artifacts
      - parser
      - mlflow.log_params

and run

  fromconfig config.yaml params.yaml launcher.yaml - model - train

which prints

  INFO:fromconfig_mlflow.launcher:Started run: http://127.0.0.1:5000/experiments/0/runs/<MLFLOW_RUN_ID>
  Training model with learning_rate 0.001
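The order of the parse list matters. mlflow.log_artifacts runs before parser, so it sees the original config. mlflow.log_params runs after parser, so it sees the resolved values. The following toy sketch illustrates that ordering with hypothetical steps. Each step is a plain function that receives the config, records what it saw, and passes a config on. These functions are not fromconfig's launcher classes, and the toy parser only resolves the single reference used in config.yaml.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Step = Callable[[Config], Config]

seen: List[str] = []  # what each logging step observed, in order


def log_artifacts(config: Config) -> Config:
    seen.append(f"log_artifacts: {config['model']['learning_rate']}")
    return config


def parser(config: Config) -> Config:
    # Toy parser: only resolves the one reference used in config.yaml
    model = dict(config["model"], learning_rate=config["params"]["learning_rate"])
    return dict(config, model=model)


def log_params(config: Config) -> Config:
    seen.append(f"log_params: {config['model']['learning_rate']}")
    return config


def run_chain(steps: List[Step], config: Config) -> Config:
    """Apply the steps in list order, feeding each one the previous output."""
    for step in steps:
        config = step(config)
    return config


config = {
    "model": {"_attr_": "model.Model", "learning_rate": "${params.learning_rate}"},
    "params": {"learning_rate": 0.001},
}
run_chain([log_artifacts, parser, log_params], config)
print(seen)
# ['log_artifacts: ${params.learning_rate}', 'log_params: 0.001']
```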

If you navigate to the MlFlow run URL, you should see

  • the parameters, a flattened version of the parsed config (model.learning_rate is 0.001 and not ${params.learning_rate})
  • the original config, saved as config.yaml
  • the parsed config, saved as parsed.yaml
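The parameters are described above as a flattened version of the parsed config, with dotted names such as model.learning_rate. The following is a minimal sketch of such flattening. It assumes a hypothetical flatten helper that joins nested dict keys with ".". The sketch keeps every leaf, including _attr_. The launcher's own flattening may handle lists and other types differently.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"a.b": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into sub-configs
        else:
            flat[name] = value
    return flat


parsed = {
    "model": {"_attr_": "model.Model", "learning_rate": 0.001},
    "params": {"learning_rate": 0.001},
}
print(flatten(parsed))
# {'model._attr_': 'model.Model', 'model.learning_rate': 0.001, 'params.learning_rate': 0.001}
```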

Usage Reference

StartRunLauncher

To configure MlFlow, add an mlflow entry to your config and set the following parameters:

  • run_id: if you wish to resume an existing run
  • run_name: if you wish to give a name to your new run
  • tracking_uri: to configure the tracking server
  • experiment_name: to use an experiment other than the default experiment
  • artifact_location: the location of the artifacts (config files), used only when creating a new experiment

Additionally, the launcher can be initialized with the following attributes:

  • set_env_vars: if True (the default), set the MLFLOW_RUN_ID and MLFLOW_TRACKING_URI environment variables
  • set_run_id: if True (default is False), set mlflow.run_id in the config

For example,

  # Configure logging
  logging:
    level: 20

  # Configure mlflow
  mlflow:
    # tracking_uri: "http://127.0.0.1:5000" # Or set env variable MLFLOW_TRACKING_URI
    # experiment_name: "test-experiment" # Which experiment to use
    # run_id: 12345 # To restore a previous run
    # run_name: test # To give a name to your new run
    # artifact_location: "path/to/artifacts" # Used only when creating a new experiment

  # Configure Launcher
  launcher:
    log:
      - logging
      - _attr_: mlflow
        set_env_vars: true
        set_run_id: true
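With set_env_vars enabled, code running inside the launched command can locate the active run through the environment. The following sketch uses a hypothetical run_info_from_env helper that reads the two variables from a mapping. In practice you would pass os.environ. MlFlow's own start_run is documented to resume the run given by MLFLOW_RUN_ID when that variable is set, but check the MlFlow documentation for your version.

```python
import os
from typing import Mapping, NamedTuple, Optional


class RunInfo(NamedTuple):
    run_id: Optional[str]
    tracking_uri: Optional[str]


def run_info_from_env(environ: Mapping[str, str] = os.environ) -> RunInfo:
    """Read the variables the launcher sets when set_env_vars is true."""
    return RunInfo(
        run_id=environ.get("MLFLOW_RUN_ID"),
        tracking_uri=environ.get("MLFLOW_TRACKING_URI"),
    )


# A fake environment standing in for os.environ inside a launched command
fake_env = {
    "MLFLOW_RUN_ID": "7fe650dd99574784aec1e4b18fceb73f",
    "MLFLOW_TRACKING_URI": "http://127.0.0.1:5000",
}
info = run_info_from_env(fake_env)
print(info.run_id)  # 7fe650dd99574784aec1e4b18fceb73f
```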

LogArtifactsLauncher

The launcher can be initialized with the following attributes

  • path_command: Name for the command file. If None, don’t log the command.
  • path_config: Name for the config file. If None, don’t log the config.

For example,

  # Configure logging
  logging:
    level: 20

  # Configure mlflow
  mlflow:
    # tracking_uri: "http://127.0.0.1:5000" # Or set env variable MLFLOW_TRACKING_URI
    # experiment_name: "test-experiment" # Which experiment to use
    # run_id: 12345 # To restore a previous run
    # run_name: test # To give a name to your new run
    # artifact_location: "path/to/artifacts" # Used only when creating a new experiment

  # Configure launcher
  launcher:
    log:
      - logging
      - mlflow
    parse:
      - _attr_: mlflow.log_artifacts
        path_command: launch.sh
        path_config: config.yaml
      - parser
      - _attr_: mlflow.log_artifacts
        path_command: null
        path_config: parsed.yaml
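In this example, the first mlflow.log_artifacts stores the command and the original config. The second one has path_command: null, so it stores only the parsed config. The following sketch shows that "None means skip" logic locally, writing files to a directory instead of sending them to MlFlow. The write_artifacts helper is hypothetical. It dumps the config as JSON to avoid a YAML dependency; for simple configs like this one, that output is also valid YAML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def write_artifacts(
    directory: Path,
    command: str,
    config: Dict[str, Any],
    path_command: Optional[str],
    path_config: Optional[str],
) -> List[str]:
    """Write the command and config files, skipping any path set to None."""
    written: List[str] = []
    if path_command is not None:
        (directory / path_command).write_text(command + "\n")
        written.append(path_command)
    if path_config is not None:
        (directory / path_config).write_text(json.dumps(config, indent=2))
        written.append(path_config)
    # With MlFlow you could then upload the folder, e.g. mlflow.log_artifacts(str(directory))
    return written


command = "fromconfig config.yaml params.yaml launcher.yaml - model - train"
with tempfile.TemporaryDirectory() as tmp:
    raw = {"model": {"learning_rate": "${params.learning_rate}"}}
    first = write_artifacts(Path(tmp), command, raw, "launch.sh", "config.yaml")
    second = write_artifacts(Path(tmp), command, {"model": {"learning_rate": 0.001}}, None, "parsed.yaml")
    names = sorted(path.name for path in Path(tmp).iterdir())
print(first, second)  # ['launch.sh', 'config.yaml'] ['parsed.yaml']
print(names)  # ['config.yaml', 'launch.sh', 'parsed.yaml']
```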

LogParamsLauncher

The launcher uses the include_keys and ignore_keys entries of the mlflow config key, if present.

  • ignore_keys: if given, do not log parameters whose name contains any of these substrings.
  • include_keys: if given, only log parameters whose name contains one of these substrings, and shorten each flattened parameter name so that it starts at the first match. For example, if the config is {"foo": {"bar": 1}} and include_keys=("bar",), then the logged parameter will be "bar".

For example,

  # Configure logging
  logging:
    level: 20

  # Configure mlflow
  mlflow:
    # tracking_uri: "http://127.0.0.1:5000" # Or set env variable MLFLOW_TRACKING_URI
    # experiment_name: "test-experiment" # Which experiment to use
    # run_id: 12345 # To restore a previous run
    # run_name: test # To give a name to your new run
    # artifact_location: "path/to/artifacts" # Used only when creating a new experiment
    include_keys: # Only log params that match *model*
      - model

  # Configure launcher
  launcher:
    log:
      - logging
      - mlflow
    parse:
      - parser
      - mlflow.log_params
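The include_keys and ignore_keys rules can be sketched on already-flattened parameter names. The following hypothetical filter_params helper is written from the description above, not taken from the launcher. Two of its choices are assumptions. It applies ignore_keys before include_keys. It reads "first match" as the leftmost position of any include key in the name.

```python
from typing import Any, Dict, Optional, Sequence


def filter_params(
    params: Dict[str, Any],
    include_keys: Optional[Sequence[str]] = None,
    ignore_keys: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Drop ignored names, keep included ones, shorten names to the first match."""
    result: Dict[str, Any] = {}
    for name, value in params.items():
        if ignore_keys and any(key in name for key in ignore_keys):
            continue  # name contains an ignored substring
        if include_keys:
            starts = [name.find(key) for key in include_keys if key in name]
            if not starts:
                continue  # no include key matches this name
            name = name[min(starts):]  # start the name at the leftmost match
        result[name] = value
    return result


print(filter_params({"foo.bar": 1}, include_keys=("bar",)))  # {'bar': 1}
params = {
    "model._attr_": "model.Model",
    "model.learning_rate": 0.001,
    "params.learning_rate": 0.001,
}
print(filter_params(params, include_keys=["model"], ignore_keys=["_attr_"]))
# {'model.learning_rate': 0.001}
```

Shortening can make two names collide. In this sketch, the later parameter silently overwrites the earlier one.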