# Scalable Inference Serving - Model Inference Operations
A workflow capability for ML engineers and data scientists performing model inference, health monitoring, and metadata inspection against OIP-compliant inference servers. It imports the KServe Open Inference Protocol shared definition and exposes a unified, workflow-oriented API and MCP server for AI-assisted inference workflows.
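The server-level tools below map onto the standard Open Inference Protocol (V2) HTTP endpoints. The following is a minimal Python sketch of what those calls look like when issued directly against an OIP-compliant server; the base URL, timeout values, and helper function names are illustrative placeholders rather than part of this capability.

```python
# Minimal sketch of the OIP (KServe V2) health and server-metadata endpoints
# that the server-level tools wrap. BASE_URL and timeouts are assumptions.
import requests

BASE_URL = "http://localhost:8080"  # placeholder address of an OIP-compliant server


def check_server_liveness() -> bool:
    # GET /v2/health/live -> 200 when the server is live and can receive requests
    return requests.get(f"{BASE_URL}/v2/health/live", timeout=5).status_code == 200


def check_server_readiness() -> bool:
    # GET /v2/health/ready -> 200 when all models are loaded and ready to serve
    return requests.get(f"{BASE_URL}/v2/health/ready", timeout=5).status_code == 200


def get_server_metadata() -> dict:
    # GET /v2 -> server name, version, and supported protocol extensions
    resp = requests.get(f"{BASE_URL}/v2", timeout=5)
    resp.raise_for_status()
    return resp.json()
```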
## What You Can Do

### MCP Tools
| Tool | Description |
| --- | --- |
| `check-server-liveness` | Check whether the KServe inference server is live and able to receive requests. |
| `check-server-readiness` | Check whether all models are loaded and the inference server is ready. |
| `get-server-metadata` | Get the inference server's name, version, and supported protocol extensions. |
| `check-model-readiness` | Check whether a specific model is ready for inference. |
| `get-model-metadata` | Get a model's input/output tensor specifications, available versions, and serving platform. |
| `run-inference` | Submit input tensors to a deployed model and receive inference output tensors. Use `get-model-metadata` first to discover the correct input names, shapes, and datatypes (see the sketch after this table). |
| `run-model-version-inference` | Run inference against a pinned model version for A/B testing, canary evaluation, or version-specific integration (see the version-pinned sketch below). |
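A hedged sketch of the metadata-first flow that `check-model-readiness`, `get-model-metadata`, and `run-inference` correspond to, again assuming direct HTTP access to the OIP endpoints; the model name, tensor shape, and input values are hypothetical examples.

```python
# Sketch of the metadata-first inference flow. MODEL, the shape, and the data
# values are placeholders; consult get-model-metadata output for real specs.
import requests

BASE_URL = "http://localhost:8080"
MODEL = "example-model"  # hypothetical model name

# 0. Confirm the model is ready to serve (check-model-readiness).
ready = requests.get(f"{BASE_URL}/v2/models/{MODEL}/ready", timeout=5)
ready.raise_for_status()

# 1. Discover input tensor names, shapes, and datatypes (get-model-metadata).
meta = requests.get(f"{BASE_URL}/v2/models/{MODEL}", timeout=5).json()
input_spec = meta["inputs"][0]  # e.g. {"name": "input-0", "datatype": "FP32", "shape": [-1, 4]}

# 2. Build an OIP inference request matching the reported spec (run-inference).
payload = {
    "inputs": [
        {
            "name": input_spec["name"],
            "shape": [1, 4],                # placeholder shape consistent with the metadata
            "datatype": input_spec["datatype"],
            "data": [0.1, 0.2, 0.3, 0.4],   # placeholder values
        }
    ]
}

# 3. Submit the request and read the output tensors.
resp = requests.post(f"{BASE_URL}/v2/models/{MODEL}/infer", json=payload, timeout=30)
resp.raise_for_status()
for output in resp.json()["outputs"]:
    print(output["name"], output["shape"], output["data"])
```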
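For `run-model-version-inference`, the same request can be pinned to a specific version path, which is how side-by-side A/B or canary comparisons are typically run; the version identifiers and payload below are hypothetical.

```python
# Sketch of version-pinned inference for A/B or canary comparison.
# MODEL, the versions, and the payload are placeholders.
import requests

BASE_URL = "http://localhost:8080"
MODEL = "example-model"
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 4], "datatype": "FP32",
         "data": [0.1, 0.2, 0.3, 0.4]}
    ]
}

for version in ("1", "2"):  # hypothetical versions under comparison
    url = f"{BASE_URL}/v2/models/{MODEL}/versions/{version}/infer"
    outputs = requests.post(url, json=payload, timeout=30).json()["outputs"]
    print(f"version {version}:", outputs[0]["data"])
```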