How does it work?
The key technologies are HTTP, containers, JSON, Python, Make, FastAPI and JSON Schema.
The assumptions are:
- Request payload is presented as JSON in the HTTP request body.
- For each model, the user supplies
  - a JSON schema for the prediction and scoring inputs,
  - the few key methods that the Predictor base class leaves unimplemented, and
  - test inputs for predict and score, so that availability checks genuinely exercise the model.
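To make these assumptions concrete, here is a minimal sketch of what a user might supply for one model: a JSON schema for the prediction input and a test input that satisfies it. The schema contents, the names `PREDICT_SCHEMA` and `PREDICT_TEST_INPUT`, and the small `conforms` checker are illustrative assumptions, not part of ml-py-stevedore. The checker understands only the `type` and `items` keywords and merely stands in for a real JSON Schema validator.

```python
import json

# Hypothetical schema: a prediction request is a list of feature rows,
# each row being a list of numbers.
PREDICT_SCHEMA = {
    "type": "array",
    "items": {"type": "array", "items": {"type": "number"}},
}

# Hypothetical test input, used to check that the model answers at all.
PREDICT_TEST_INPUT = [[0.1, 0.2], [1.5, -0.3]]


def conforms(value, schema):
    """Check value against the small schema subset used above.

    Only "type" (array/number) and "items" are understood; this is a
    stand-in for a full JSON Schema validator, not a replacement.
    """
    kind = schema.get("type")
    if kind == "number":
        # bool is a subclass of int in Python, but it is not a JSON number.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "array":
        if not isinstance(value, list):
            return False
        item_schema = schema.get("items", {})
        return all(conforms(item, item_schema) for item in value)
    return True  # keywords outside the subset are not checked


# The payload travels as JSON text in the HTTP request body.
body = json.dumps(PREDICT_TEST_INPUT)
assert conforms(json.loads(body), PREDICT_SCHEMA)
```

In a real deployment a complete JSON Schema validator would do this check; the point here is only the shape of the three user-supplied pieces.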
Because the API is served with FastAPI, Swagger documentation of the API is served as well, detailed down to the user-defined JSON schemas.
The user derives a custom class from the Predictor class to wrap the model, e.g.:
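    import numpy as np

    # Predictor is the base class provided by ml-py-stevedore.
    class LogReg(Predictor):
        def convert_prediction_input(self, x):
            # np.asfarray was removed in NumPy 2.0; this is the equivalent call.
            return np.asarray(x, dtype=float)

        def convert_score_input(self, in_object):
            # Each item carries a feature row "x" and its label "y".
            X = [item["x"] for item in in_object]
            y = [item["y"] for item in in_object]
            return np.asarray(X, dtype=float), np.array(y)

        def run_scores(self, X, y):
            return self.model.score(X, y)

        def run_predict(self, X):
            return self.model.predict(X)

        def convert_output(self, res):
            # NumPy array -> plain list, so the result can be sent as JSON.
            return res.tolist()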
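How the base class might chain these methods can be sketched with a stand-in. `SketchPredictor` below is hypothetical and is not the actual Predictor class of ml-py-stevedore; the call order inside `predict` and `score`, the toy `ThresholdModel`, and the choice not to pass the score through `convert_output` are all assumptions made for illustration. The sketch uses plain lists instead of NumPy so that it runs with the standard library alone.

```python
class SketchPredictor:
    """Hypothetical stand-in for the Predictor base class."""

    def __init__(self, model):
        self.model = model

    def predict(self, payload):
        # Assumed flow: decoded JSON payload -> model input -> JSON-ready output.
        X = self.convert_prediction_input(payload)
        return self.convert_output(self.run_predict(X))

    def score(self, payload):
        # Assumed flow: split the payload into features and labels, then score.
        X, y = self.convert_score_input(payload)
        return self.run_scores(X, y)


class ThresholdModel:
    """Toy model: predicts 1 when the row sum is positive, else 0."""

    def predict(self, X):
        return [1 if sum(row) > 0 else 0 for row in X]

    def score(self, X, y):
        # Fraction of predictions that match the labels.
        hits = sum(p == t for p, t in zip(self.predict(X), y))
        return hits / len(y)


class Threshold(SketchPredictor):
    def convert_prediction_input(self, x):
        return [[float(v) for v in row] for row in x]

    def convert_score_input(self, in_object):
        X = [item["x"] for item in in_object]
        y = [item["y"] for item in in_object]
        return self.convert_prediction_input(X), y

    def run_scores(self, X, y):
        return self.model.score(X, y)

    def run_predict(self, X):
        return self.model.predict(X)

    def convert_output(self, res):
        return list(res)


wrapped = Threshold(ThresholdModel())
predictions = wrapped.predict([[1, 2], [-3, 1]])
accuracy = wrapped.score([{"x": [1, 2], "y": 1}, {"x": [-3, 1], "y": 1}])
```

The user-written subclass only converts data and delegates to the model, while the base class decides when each step runs.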
The API's services cover monitoring of the models, queries that support their use, and calls to model prediction. The services are:
- health services, such as livez, healthz, readyz and score, plus variants of these
- catalogues
  - list the models being served
  - version of each model
  - created tells the creation time
  - predict_schema and score_schema return the JSON schemas for prediction and scoring input
- predict, of course.
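As a rough illustration of calling such a service, the sketch below builds a prediction request with the payload as JSON in the HTTP body. The host, port, model name `logreg` and the path layout `/predict/<model_name>` are hypothetical and chosen only for this example; the Swagger page served by the API is the place to look up the real routes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed host and port


def build_predict_request(model_name, rows):
    """Build (but do not send) a POST request with a JSON body.

    The path "/predict/<model_name>" is a hypothetical layout chosen
    for illustration, not a documented ml-py-stevedore route.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/predict/{model_name}",
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("logreg", [[0.1, 0.2]])
# Sending requires a running server:
#     with urllib.request.urlopen(request) as response:
#         predictions = json.loads(response.read())
```

Building the request separately from sending it keeps the example runnable without a server.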
More information:
ml-py-stevedore at GitHub