Base MLflow API

pydantic model mlopus.mlflow.BaseMlflowApi[source]

Bases: MlflowApiContract, ABC

Base class for API clients that use “MLflow-like” backends for experiment tracking and model registry.

Important

Implementations of this interface are meant to be thread-safe and independent of env vars/globals, so multiple API instances can coexist in the same program if necessary.

field always_pull_artifacts: bool = False

When accessing a cached artifact file or dir, re-sync it with the remote artifacts repository, even on a cache hit. This prevents access to stale data if the remote artifact has changed in the meantime. The default data transfer utility (based on rclone) syncs directories efficiently, but enabling this option may still add checksum-calculation overhead for directories that contain many files.

field cache_dir: Optional[Path] = None

Root path for cached artifacts and metadata. If not specified, then a default is determined by the respective API plugin.

field cache_local_artifacts: bool = False

Use local cache even if the run artifacts repository is in the local file system. May be used for testing the cache without connecting to a remote MLflow server. Not recommended in production because of unnecessary duplicated disk usage.

field entity_serializer: EntitySerializer [Optional]

Utility for (de)serializing entity metadata (i.e.: experiments, runs, models, versions). Users may replace this with a different implementation when subclassing the API.

field file_transfer: FileTransfer [Optional]

Utility for uploading/downloading artifact files or dirs. Also used for listing files. Based on rclone by default. Users may replace this with a different implementation when subclassing the API.

field offline_mode: bool = False

If True, block all operations that attempt communication with the MLflow server (i.e.: only use cached metadata). Artifacts are still accessible if they are cached or if pull_artifacts_in_offline_mode is True.

field pull_artifacts_in_offline_mode: bool = False

If True, allow pulling artifacts from storage to cache in offline mode. Useful if caching metadata only and pulling artifacts on demand (the artifact’s URL must be known beforehand, e.g. by caching the metadata of its parent entity).
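
For artifacts, the combined effect of offline_mode and pull_artifacts_in_offline_mode can be restated as a small predicate. This is a hypothetical sketch for illustration only: the function and its artifact_cached argument are not part of mlopus, they just encode the two field descriptions above.

```python
def artifact_access_allowed(
    offline_mode: bool,
    pull_artifacts_in_offline_mode: bool,
    artifact_cached: bool,
) -> bool:
    """Hypothetical restatement of the documented artifact access rule."""
    if not offline_mode:
        # Online: the artifact can be pulled from the remote repository on demand.
        return True
    # Offline: a cache hit is always usable; a cache miss can only be served
    # if pulling from artifact storage was explicitly allowed.
    return artifact_cached or pull_artifacts_in_offline_mode
```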

field temp_artifacts_dir: Path = None

Path for temporary artifacts written by artifact dumpers before they are published. These temporary artifacts are preserved after a publish error (e.g.: an upload interruption). Defaults to a path inside the local cache.

cache_exp_meta(exp)[source]

Get latest Experiment metadata and save to local cache.

Parameters:

exp (Experiment | str | ExpApi) – Experiment ID or object.

Return type:

ExpApi

cache_model_artifact(model_version)[source]

Pull model version artifact from MLflow server to local cache.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

Return type:

Path

cache_model_meta(model)[source]

Get latest Model metadata and save to local cache.

Parameters:

model (Model | str | ModelApi) – Model name or object.

Return type:

ModelApi

cache_model_version_meta(model_version)[source]

Get latest model version metadata and save to local cache.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

Return type:

ModelVersionApi

cache_run_artifact(run, path_in_run='')[source]

Pull run artifact from MLflow server to local cache.

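Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)

Return type: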

Path

cache_run_meta(run)[source]

Get latest Run metadata and save to local cache.

Parameters:

run (Run | str | RunApi) – Run ID or object.

Return type:

RunApi

clean_all_cache()[source]

Clean all cached metadata and artifacts.

clean_cached_model_artifact(model_version)[source]

Clean cached artifact for specified model version.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

clean_cached_run_artifact(run, path_in_run='')[source]

Clean cached artifact for specified run.

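Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)
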
clean_temp_artifacts()[source]

Clean temporary artifacts.

create_exp(name, tags=None)[source]

Create Experiment and return its API.

Parameters:
Return type:

ExpApi

create_model(name, tags=None)[source]

Create registered model and return its API.

Parameters:
Return type:

ModelApi

create_run(exp, name=None, tags=None, repo=None, parent=None)[source]

Declare a new experiment run to be used later.

Parameters:
Return type:

RunApi

end_run(run, succeeded=True)[source]

End experiment run.

Parameters:
Return type:

RunApi

export_exp_meta(exp, target)[source]

Export experiment metadata cache to target.

Parameters:
Return type:

ExpApi

export_model_artifact(model_version, target)[source]

Export model version artifact cache to target path while keeping the original cache structure.

The target path can then be used as cache dir by the generic MLflow API in offline mode.

Parameters:
Return type:

Path

export_model_meta(model, target)[source]

Export model metadata cache to target.

Parameters:
Return type:

ModelApi

export_model_version_meta(mv, target)[source]

Export model version metadata cache to target.

Parameters:
Return type:

ModelVersionApi

export_run_artifact(run, target, path_in_run='')[source]

Export run artifact cache to target path while keeping the original cache structure.

The target path can then be used as cache dir by the generic MLflow API in offline mode.

Parameters:
Return type:

Path

export_run_meta(run, target)[source]

Export run metadata cache to target.

Parameters:
Return type:

RunApi

find_child_runs(parent)[source]

Find child runs.

Parameters:

parent (Run | str | RunApi) – Run ID or object.

Return type:

Iterator[RunApi]

find_exps(query=None, sorting=None)[source]

Search experiments using a query in MongoDB query language.

Parameters:
  • query (Optional[Dict[str, Any]]) – Query in MongoDB query language.

  • sorting (Optional[List[Tuple[str, Literal[1, -1]]]]) – Sorting criteria (e.g.: [("asc_field", 1), ("desc_field", -1)]).

Return type:

Iterator[ExpApi]
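
The query argument is a plain dict written in MongoDB query language. To illustrate those semantics, here is a minimal in-memory matcher supporting implicit equality, dotted field paths and a few comparison operators. It is a hypothetical sketch of MongoDB query semantics, not how mlopus evaluates queries, and field names such as tags.team are assumptions: the fields that can actually be queried depend on the entity schema.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def get_field(doc: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted path such as "tags.team" inside nested dicts (None if absent)."""
    value: Any = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# A small subset of MongoDB comparison operators.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$in": lambda value, arg: value in arg,
}


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if doc satisfies every field condition in query (implicit AND)."""
    for field, condition in query.items():
        value = get_field(doc, field)
        if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:  # a plain value means equality
            return False
    return True


def find(docs: Iterable[Dict[str, Any]], query: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield the documents that match query."""
    return (doc for doc in docs if matches(doc, query))
```

A query such as {"tags.team": "ds", "tags.priority": {"$lt": 5}} combines an equality test and a comparison on the same document.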

find_model_versions(query=None, sorting=None)[source]

Search model versions using a query in MongoDB query language.

Parameters:
  • query (Optional[Dict[str, Any]]) – Query in MongoDB query language.

  • sorting (Optional[List[Tuple[str, Literal[1, -1]]]]) – Sorting criteria (e.g.: [("asc_field", 1), ("desc_field", -1)]).

Return type:

Iterator[ModelVersionApi]

find_models(query=None, sorting=None)[source]

Search registered models using a query in MongoDB query language.

Parameters:
  • query (Optional[Dict[str, Any]]) – Query in MongoDB query language.

  • sorting (Optional[List[Tuple[str, Literal[1, -1]]]]) – Sorting criteria (e.g.: [("asc_field", 1), ("desc_field", -1)]).

Return type:

Iterator[ModelApi]

find_runs(query=None, sorting=None)[source]

Search runs using a query in MongoDB query language.

Parameters:
  • query (Optional[Dict[str, Any]]) – Query in MongoDB query language.

  • sorting (Optional[List[Tuple[str, Literal[1, -1]]]]) – Sorting criteria (e.g.: [("asc_field", 1), ("desc_field", -1)]).

Return type:

Iterator[RunApi]
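
The sorting argument lists (field, direction) pairs, most significant first, with 1 for ascending and -1 for descending, as in MongoDB. The sketch below reproduces that ordering on plain dicts using Python's stable sort. It only illustrates the semantics, assumes every document has each sort field, and is not the library's implementation.

```python
from typing import Any, Dict, List, Literal, Tuple

Sorting = List[Tuple[str, Literal[1, -1]]]


def apply_sorting(docs: List[Dict[str, Any]], sorting: Sorting) -> List[Dict[str, Any]]:
    """Return docs ordered by several fields, each ascending (1) or descending (-1).

    Because list.sort is stable, sorting by the least significant key first and
    the most significant key last yields the combined multi-key order.
    """
    result = list(docs)
    for field, direction in reversed(sorting):
        result.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    return result
```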

get_exp(exp, **cache_opts)[source]

Get Experiment API by ID.

Parameters:

exp (Experiment | str | ExpApi) – Exp ID or object.

Return type:

ExpApi

get_exp_url(exp)[source]

Get Experiment URL.

Parameters:

exp (Experiment | str | ExpApi) – Exp ID or object.

Return type:

str

get_model(model, **cache_opts)[source]

Get Model API by name.

Parameters:

model (Model | str | ModelApi) – Model name or object.

Return type:

ModelApi

get_model_artifact(model_version)[source]

Get local path to model artifact.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

Return type:

Path
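
The rule "pull on a cache miss or if always_pull_artifacts is enabled" can be sketched as follows. The function name and the pull callback are hypothetical stand-ins for the API's internal sync step, and in this sketch a path that already exists counts as a cache hit.

```python
from pathlib import Path
from typing import Callable


def get_cached_artifact(
    cache_path: Path,
    pull: Callable[[Path], None],
    always_pull_artifacts: bool = False,
) -> Path:
    """Return the local path to an artifact, pulling it into the cache when needed.

    pull is a hypothetical callback that syncs the remote artifact into cache_path.
    """
    if always_pull_artifacts or not cache_path.exists():
        # Cache miss, or a forced re-sync to avoid reading stale data.
        pull(cache_path)
    return cache_path
```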

get_model_url(model)[source]

Get URL to registered model.

Parameters:

model (Model | str | ModelApi) – Model name or object.

Return type:

str

get_model_version(model_version, **cache_opts)[source]

Get ModelVersion API by name and version.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

Return type:

ModelVersionApi

get_model_version_url(model_version)[source]

Get model version URL.

Parameters:

model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

Return type:

str

get_or_create_exp(name)[source]

Get or create Experiment and return its API.

Parameters:

name (str) – See schema.Experiment.name.

Return type:

ExpApi

get_or_create_model(name)[source]

Get or create registered Model and return its API.

Parameters:

name (str) – See schema.Model.name.

Return type:

ModelApi

get_run(run, **cache_opts)[source]

Get Run API by ID.

Parameters:

run (Run | str | RunApi) – Run ID or object.

Return type:

RunApi

get_run_artifact(run, path_in_run='')[source]

Get local path to run artifact.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled.

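Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)

Return type: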

Path

get_run_url(run, exp=None)[source]

Get Run URL.

Parameters:
Return type:

str

Caveats:
  • exp must be specified in offline mode if run is an ID and the run metadata is not in cache.

list_model_artifact(model_version, path_suffix='')[source]

List model version artifacts in repo.

Parameters:
Return type:

Union[List[ObjMeta], ObjMeta]

list_run_artifacts(run, path_in_run='')[source]

List run artifacts in repo.

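Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)

Return type: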

Union[List[ObjMeta], ObjMeta]

load_model_artifact(model_version, loader)[source]

Load model version artifact.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled.

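Parameters:
  • model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

  • loader (Callable[[Path], TypeVar(A)]) – Loader callback.

Return type: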

TypeVar(A)

load_run_artifact(run, loader, path_in_run='')[source]

Load run artifact.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled.

Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • loader (Callable[[Path], TypeVar(A)]) – Loader callback.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)

Return type:

TypeVar(A)
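
A loader is any callable that receives the local (cached) artifact path and returns the loaded object. Two hypothetical loaders are sketched below, one for a single-file artifact and one for a directory artifact. The function names are illustrative and not part of mlopus.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_json(path: Path) -> Any:
    """Loader for a single-file artifact: parse the cached file as JSON."""
    return json.loads(path.read_text())


def load_text_files(path: Path) -> Dict[str, str]:
    """Loader for a directory artifact: map each top-level file name to its text."""
    return {p.name: p.read_text() for p in sorted(path.iterdir()) if p.is_file()}
```

Assuming api is an instance of a concrete BaseMlflowApi implementation and eval/metrics.json is a hypothetical artifact path, a call would look like api.load_run_artifact(run, load_json, path_in_run="eval/metrics.json").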

log_metrics(run, metrics)[source]

Log metrics to experiment run.

Parameters:
log_model_version(model, run, source, path_in_run=None, keep_the_source=None, allow_duplication=None, use_cache=None, version=None, tags=None)[source]

Publish artifact file or dir as model version inside the specified experiment run.

Parameters:
Return type:

ModelVersionApi

Returns:

New model version metadata with API handle.

log_params(run, params)[source]

Log params to experiment run.

Parameters:
log_run_artifact(run, source, path_in_run=None, keep_the_source=None, allow_duplication=None, use_cache=None)[source]

Publish artifact file or dir to experiment run.

The flags keep_the_source, allow_duplication and use_cache are experimental and may conflict with one another. It is recommended to leave them unspecified so that this method makes a best effort to use the cache when it makes sense, keep the source files when it makes sense (possibly as a symbolic link) and avoid duplicated disk usage when possible.

Parameters:
  • run (Run | str | RunApi) –

    Run ID or object.

  • source (Union[Path, Callable[[Path], None]]) –

    Path to artifact file or dir, or a dumper callback.
    If it’s a callback and the upload is interrupted, the temporary artifact is kept.

  • path_in_run (Optional[str]) –

    Plain relative path inside run artifacts (e.g.: a/b/c)

    • If source is a Path: Defaults to file or dir name.

    • If source is a callback: No default available.

  • keep_the_source (Optional[bool]) –

    • If source is a Path: Keep that file or dir (defaults to True).

    • If source is a callback: Keep the temporary artifact, even after a successful upload (defaults to False).

  • allow_duplication (Optional[bool]) –

    If False, a source file or dir may be replaced with a symbolic link to the local cache in order to avoid duplicated disk usage.
    Defaults to True if keep_the_source is True and the run artifacts repo is local.

  • use_cache (Optional[bool]) –

    If True, keep artifact in local cache after publishing.
    Defaults to True if the run artifacts repo is remote.
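
When source is a callback, it receives a path and writes the artifact there. The sketch below shows a hypothetical dumper factory plus a simplified model of the documented flow: the temporary artifact is kept if the upload raises, and removed after a successful upload unless keep_the_source is True. The helper names and the upload callback are assumptions for illustration, not the library's internals.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Callable, Dict


def make_json_dumper(data: Dict[str, Any]) -> Callable[[Path], None]:
    """Build a dumper callback that writes data as JSON to the path it is given."""
    def dumper(path: Path) -> None:
        path.write_text(json.dumps(data, indent=2))
    return dumper


def publish_via_temp(
    dumper: Callable[[Path], None],
    upload: Callable[[Path], None],
    temp_path: Path,
    keep_the_source: bool = False,
) -> None:
    """Hypothetical model of publishing a callback source through a temp path."""
    dumper(temp_path)
    # If upload raises (e.g. an interrupted transfer), the cleanup below is
    # skipped, so the temporary artifact survives for inspection or retry.
    upload(temp_path)
    if not keep_the_source:
        if temp_path.is_dir():
            shutil.rmtree(temp_path)
        else:
            temp_path.unlink()
```

Assuming api is a concrete API instance, the dumper would be passed as the source, e.g. api.log_run_artifact(run, make_json_dumper({"acc": 0.9}), path_in_run="eval/metrics.json"). path_in_run is required here because callbacks have no default.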

place_model_artifact(model_version, target, overwrite=False, link=True)[source]

Place model version artifact on target path.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled.

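Parameters:
  • model_version (Union[ModelVersion, Tuple[str, str], ModelVersionApi]) – Model version object or (name, version) tuple.

  • target (Path) – Target path.

  • overwrite (bool) – Overwrite target path if exists.

  • link (bool) – Use symbolic link instead of copy.
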
place_run_artifact(run, target, path_in_run='', overwrite=False, link=True)[source]

Place run artifact on target path.

Triggers a cache pull on a cache miss or if always_pull_artifacts is enabled. The resulting files are always write-protected, but directories are not.

Parameters:
  • run (Run | str | RunApi) – Run ID or object.

  • target (Path) – Target path.

  • path_in_run (str) – Plain relative path inside run artifacts (e.g.: a/b/c)

  • overwrite (bool) – Overwrite target path if exists.

  • link (bool) – Use symbolic link instead of copy.
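
The overwrite and link flags can be illustrated with a stdlib-only sketch. It is a hypothetical stand-in for the API's placement step, not its implementation: the function name is illustrative, and this sketch only write-protects copied files, leaving the cached files behind a symbolic link unchanged.

```python
import shutil
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def place_artifact(source: Path, target: Path, overwrite: bool = False, link: bool = True) -> Path:
    """Hypothetical sketch: place a cached artifact (file or dir) at target."""
    if target.exists() or target.is_symlink():
        if not overwrite:
            raise FileExistsError(target)
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)
        else:
            target.unlink()  # removes a file, or only the link itself
    target.parent.mkdir(parents=True, exist_ok=True)

    if link:
        # Symbolic link pointing at the cached artifact (no data is duplicated).
        target.symlink_to(source.resolve(), target_is_directory=source.is_dir())
        return target

    if source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)
    # Write-protect copied files but leave directories writable.
    files = [target] if target.is_file() else [p for p in target.rglob("*") if p.is_file()]
    for f in files:
        f.chmod(f.stat().st_mode & ~WRITE_BITS)
    return target
```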

resume_run(run)[source]

Resume a previous experiment run.

Parameters:

run (Run | str | RunApi) – Run ID or object.

Return type:

RunApi

set_tags_on_exp(exp, tags)[source]

Set tags on experiment.

Parameters:
set_tags_on_model(model, tags)[source]

Set tags on registered model.

Parameters:
set_tags_on_model_version(model_version, tags)[source]

Set tags on model version.

Parameters:
set_tags_on_run(run, tags)[source]

Set tags on experiment run.

Parameters:
start_run(exp, name=None, tags=None, repo=None, parent=None)[source]

Start a new experiment run.

Parameters:
Return type:

RunApi

property in_offline_mode: BaseMlflowApi

Get an offline copy of this API.

pydantic model mlopus.mlflow.api.common.transfer.FileTransfer[source]

Bases: BaseModel

File transfer wrapper for MLflow API.

field prog_bar: bool = True

Show progress bar when transferring files.

field tool: Any = 'rclone_python.rclone'

Fully qualified path of module, class or object that exposes the methods/functions ls, copyto and sync, with signatures compatible with the ones exposed in rclone_python.rclone.

field extra_args: dict[str, list[str]] = {'sync': ['--copy-links']}

Dict of extra arguments to pass to each of the functions exposed by the tool.
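
The tool field is a fully qualified dotted path, and extra_args maps each tool function name to a list of arguments. The sketch below shows one way such a configuration could be resolved with importlib. Both helpers are hypothetical, and the default rclone_python.rclone is deliberately not imported, so the example runs without rclone installed.

```python
import importlib
from typing import Any, Dict, List


def resolve_dotted(path: str) -> Any:
    """Import a module, class or object from a fully qualified dotted path."""
    parts = path.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"Cannot resolve {path!r}")


def extra_args_for(func_name: str, extra_args: Dict[str, List[str]]) -> List[str]:
    """Extra arguments configured for one tool function (empty list if none)."""
    return list(extra_args.get(func_name, []))
```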

field use_scheme: Optional[str] = None

Replace remote URL schemes with this one. Incompatible with map_scheme.

field map_scheme: Optional[dict[str | Pattern, str]] = None

Replace remote URL schemes with the first value in this mapping whose key (regexp) matches the URL. Incompatible with use_scheme.
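
A sketch of the scheme replacement, under several assumptions not taken from the library: the scheme is the part of the URL before "://", map_scheme keys are tried in insertion order with re.search, and URLs without a scheme are left untouched. The helper name is hypothetical.

```python
import re
from typing import Dict, Optional, Union


def replace_scheme(
    url: str,
    use_scheme: Optional[str] = None,
    map_scheme: Optional[Dict[Union[str, re.Pattern], str]] = None,
) -> str:
    """Hypothetical sketch of the use_scheme / map_scheme behaviour."""
    if use_scheme is not None and map_scheme is not None:
        raise ValueError("use_scheme and map_scheme are incompatible")
    _, sep, rest = url.partition("://")
    if not sep:
        return url  # no scheme (e.g. a local path): nothing to replace
    new_scheme = use_scheme
    for pattern, scheme in (map_scheme or {}).items():
        if re.search(pattern, url):  # first matching key wins
            new_scheme = scheme
            break
    return f"{new_scheme}://{rest}" if new_scheme else url
```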