Francisco Jiménez Cabrera

I have always been interested in computers and technology. Currently, I spend my days working as a web developer at Canonical.

Validating JSON Outputs in MLflow with Custom Metrics

Welcome to a new post on my blog. We’ll delve into an interesting aspect of working with MLflow – adding a custom metric to validate JSON outputs. This tutorial is particularly useful for developers and data scientists looking to ensure the integrity and structure of their model’s outputs when JSON is expected.

Why Validate JSON Outputs?

In the world of machine learning and data science, models often need to output data in structured formats, JSON being one of the most popular due to its versatility and wide adoption in web services and applications. Ensuring your model reliably produces valid JSON responses is crucial, especially in production environments where data consistency and integrity are paramount.
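
To make this concrete, here is a small, purely illustrative example: a response that merely looks like JSON, say with single quotes or a trailing comma, is rejected by a strict parser, while a well-formed one parses cleanly.

import json

# A well-formed response parses without issue
json.loads('{"name": "MLflow", "features": ["tracking", "models"]}')

# A response that only looks like JSON raises json.decoder.JSONDecodeError,
# which is a subclass of ValueError
try:
    json.loads("{'name': 'MLflow', 'features': ['tracking', 'models'],}")
except ValueError as err:
    print(f"Not valid JSON: {err}")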

Introducing Custom Metrics in MLflow

MLflow, an open-source platform for the machine learning lifecycle, includes capabilities for tracking experiments, packaging code into reproducible runs, and managing models. However, it might not natively support specific validation checks like verifying JSON output. This is where custom metrics come into play.

Step-by-Step Guide to Creating a JSON Validity Metric

Below is a detailed walkthrough on how to implement a custom metric in MLflow for checking JSON validity. This script demonstrates adding such a metric and using it to evaluate a model’s predictions.

1. Define the JSON Validity Evaluation Function

First, we define an evaluation function that checks if a given string is valid JSON. This function iterates through each model prediction, validating each one and appending the result (1 for valid, 0 for invalid) to a list of scores.

def _json_validity_eval_fn(outputs, references):
    validity_scores = []
    for _, row in outputs.iterrows():
        prediction = row["prediction"]
        if _is_valid_json(prediction):
            validity_scores.append(1)
        else:
            validity_scores.append(0)
    return MetricValue(validity_scores)
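
As an optional variant: recent MLflow 2.x releases document MetricValue as accepting scores, justifications, and aggregate_results, so you can also report a single aggregate number, such as the share of valid responses, alongside the per-row scores. The following is only a sketch under that assumption, not the version used in the rest of this post.

def _json_validity_eval_fn_with_aggregate(outputs, references):
    # Same per-row scoring as above
    validity_scores = [
        1 if _is_valid_json(row["prediction"]) else 0
        for _, row in outputs.iterrows()
    ]
    # Additionally report the fraction of valid JSON responses
    return MetricValue(
        scores=validity_scores,
        aggregate_results={"mean": sum(validity_scores) / max(len(validity_scores), 1)},
    )
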
2. Implement a Helper Function to Check JSON Validity

A helper function uses Python’s json.loads method to determine whether a string is valid JSON. It returns True for valid JSON strings and False otherwise.

def _is_valid_json(s):
    try:
        json.loads(s)
        return True
    except ValueError:
        return False
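
A quick, purely illustrative check of the helper:

print(_is_valid_json('{"status": "ok"}'))  # True
print(_is_valid_json("[1, 2, 3]"))         # True: any JSON value passes
print(_is_valid_json("status: ok"))        # False

Note that json.loads also accepts bare scalars such as "42", so if you specifically need a JSON object you could additionally check that the parsed value is a dict.
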
3. Create the Custom Metric

We then turn the evaluation function into a custom metric using MLflow’s make_metric function, specifying the evaluation function, whether a higher score is better, and a name for the metric.

def json_validity() -> EvaluationMetric:
    return make_metric(
        eval_fn=_json_validity_eval_fn,
        greater_is_better=True,
        name="json_validity",
    )
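
Before wiring the metric into a full evaluation run, it can help to call the evaluation function directly with a tiny DataFrame shaped the way it expects, i.e. with a "prediction" column. This is just a local sanity check of our own function, not part of the MLflow API:

import pandas as pd

sample_outputs = pd.DataFrame(
    {"prediction": ['{"name": "MLflow"}', "not json at all"]}
)
print(_json_validity_eval_fn(sample_outputs, None).scores)  # expected: [1, 0]
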
4. Evaluate the Model

With the custom metric defined, we can now use it to evaluate a model’s output. In this example, I use a remote tracking server together with an MLflow Deployments Server endpoint; feel free to adapt this to your needs. The DataFrame contains two prompts, one that asks for a JSON object and one that does not, and we invoke mlflow.evaluate with our custom metric.

# Point the client to the local MLflow Deployments Server and set tracking URI
set_deployments_target("http://localhost:7000")
mlflow.set_tracking_uri("http://localhost:5000")

# Evaluate the model with the custom metric
with mlflow.start_run() as run:
    results = mlflow.evaluate(
        model="endpoints:/chatgpt-35-turbo",
        data=eval_data,
        inference_params={"max_tokens": 100, "temperature": 0.0},
        extra_metrics=[json_validity()],
    )

Final code

import json
import mlflow
import pandas as pd
from mlflow.deployments import set_deployments_target
from mlflow.metrics.base import MetricValue
from mlflow.models import EvaluationMetric, make_metric


def _json_validity_eval_fn(outputs, references):
    # Initialize a list to store validity scores
    validity_scores = []

    # Iterate over each row in the DataFrame
    for _, row in outputs.iterrows():
        # Extract the prediction from the current row
        prediction = row["prediction"]

        # Check if the prediction is a valid JSON
        if _is_valid_json(prediction):
            validity_scores.append(1)  # Valid JSON
        else:
            validity_scores.append(0)  # Invalid JSON

    # Return a MetricValue object with the scores
    return MetricValue(validity_scores)


def _is_valid_json(s):
    """
    Helper function to check if a string is a valid JSON.
    """
    try:
        json.loads(s)
        return True
    # json.decoder.JSONDecodeError inherits from ValueError
    except ValueError:
        return False


def json_validity() -> EvaluationMetric:
    """
    Creates a metric for evaluating the validity of JSON strings produced by a model.
    """
    return make_metric(
        eval_fn=_json_validity_eval_fn,
        greater_is_better=True,
        name="json_validity",
    )


# Point the client to the local MLflow Deployments Server
set_deployments_target("http://localhost:7000")
mlflow.set_tracking_uri("http://localhost:5000")

# Create the evaluation inputs that will be passed to the model
eval_data = pd.DataFrame(
    {
        "inputs": [
            'Convert the following description into a JSON object: MLflow is an open source platform for the machine learning lifecycle, including experimentation, reproducibility, and deployment. Structure the JSON with keys for "name", "description", and "features".',
            "Provide a brief explanation of what Apache Spark is.",
        ]
    }
)

with mlflow.start_run() as run:
    results = mlflow.evaluate(
        model="endpoints:/chatgpt-35-turbo",
        data=eval_data,
        inference_params={"max_tokens": 100, "temperature": 0.0},
        # model_type="question-answering",
        # Include the custom metric for JSON validation
        extra_metrics=[json_validity()],
    )

    # Print aggregated evaluation results
    print(f"Aggregated evaluation results: \n{results.metrics}")

    # Evaluation result for each data record is available in results.tables
    eval_table = results.tables["eval_results_table"]
    print(f"Evaluation table: \n{eval_table}")

Conclusion

Adding custom metrics to MLflow allows for flexible and precise evaluation of your models, tailored to your specific needs. By validating JSON outputs, you ensure that your model meets the requirements for structured data output, enhancing its reliability and applicability in real-world scenarios.

I hope this tutorial has been helpful. As always, I encourage you to experiment with this script and adapt it to your projects. Happy coding!
