Triton Model Inference and Management
A workflow capability for deploying, managing, and running inference against machine learning models on NVIDIA Triton Inference Server. It covers the model lifecycle end to end: loading and unloading models, checking server and model health, executing inference, monitoring inference statistics, and configuring observability (request tracing and logging).
What You Can Do
MCP Tools
Server Health and Metadata

server-live: Check whether the Triton inference server is alive
server-ready: Check whether the Triton server is ready to accept inference requests
server-metadata: Get the Triton server name, version, and supported extensions
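These tools mirror Triton's standard health and metadata endpoints (`GET v2/health/live`, `GET v2/health/ready`, `GET v2`). As an illustration of the underlying calls, not necessarily how this capability is implemented, here is a minimal sketch using NVIDIA's `tritonclient` Python package; the `localhost:8000` URL is an assumption about your deployment.

```python
# Health and metadata checks via tritonclient (pip install tritonclient[http]).
# The URL below is an assumption; point it at your Triton HTTP endpoint.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print(client.is_server_live())   # server-live  -> GET v2/health/live
print(client.is_server_ready())  # server-ready -> GET v2/health/ready

meta = client.get_server_metadata()  # server-metadata -> GET v2
print(meta["name"], meta["version"], meta["extensions"])
```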
Model Discovery and Readiness

list-models: List all models available in the Triton model repository
model-metadata: Get metadata for a specific model, including input/output tensor shapes
model-config: Get the full configuration for a specific model
model-ready: Check whether a specific model is ready to accept inference requests
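A sketch of the equivalent discovery calls, under the same assumptions as above; the model name `densenet_onnx` is a placeholder.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# list-models: the repository index reports each model's name and state
for entry in client.get_model_repository_index():
    print(entry["name"], entry.get("state"))

# model-metadata: per-tensor names, datatypes, and shapes
meta = client.get_model_metadata("densenet_onnx")
for tensor in meta["inputs"]:
    print(tensor["name"], tensor["datatype"], tensor["shape"])

# model-config: the model's full configuration as JSON
config = client.get_model_config("densenet_onnx")
print(config["max_batch_size"])

# model-ready: True once the model is loaded and serving
print(client.is_model_ready("densenet_onnx"))
```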
Inference

model-infer: Run inference against a loaded model with input tensors
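Inference requests carry input tensors whose names, shapes, and datatypes must match the model's metadata (see model-metadata above). A minimal sketch, again via `tritonclient`; the model name `simple` and tensor names `INPUT0`/`OUTPUT0` are placeholders.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build an input tensor matching the model's declared name/shape/datatype.
data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", [1, 16], "FP32")
infer_input.set_data_from_numpy(data)

# model-infer -> POST v2/models/simple/infer
result = client.infer("simple", inputs=[infer_input])
print(result.as_numpy("OUTPUT0"))
```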
Model Lifecycle

model-load: Load or reload a model from the repository into Triton
model-unload: Unload a model from Triton to free resources
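Note that explicit load and unload requests only succeed when Triton runs with `--model-control-mode=explicit`; in other modes the server manages loading itself. A minimal sketch of the equivalent client calls:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# model-load: also reloads a model whose files changed in the repository
client.load_model("simple")

# model-unload: releases the CPU/GPU memory the model holds
client.unload_model("simple")
```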
Statistics

model-statistics: Get inference statistics for a specific model
all-model-statistics: Get inference statistics for all loaded models
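Per Triton's statistics extension, counts are cumulative and durations are reported in nanoseconds. A sketch of both calls, under the same placeholder assumptions:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# model-statistics: request counts and cumulative latencies for one model
stats = client.get_inference_statistics(model_name="simple")
print(stats["model_stats"][0]["inference_count"])

# all-model-statistics: omit the model name to cover every loaded model
for model in client.get_inference_statistics()["model_stats"]:
    print(model["name"], model["inference_stats"]["success"]["count"])
```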
Observability

get-trace-settings: Get the current global request-tracing configuration
update-trace-settings: Update request-tracing levels and the sampling rate
get-log-settings: Get the current server logging configuration
update-log-settings: Update the server logging level and format
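These tools correspond to Triton's trace and logging extensions. A sketch of reading and updating both; the specific setting values are illustrative, and the client methods assume a reasonably recent `tritonclient` release.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# get-trace-settings / update-trace-settings: trace timestamps for
# 1 in every 100 requests (trace values are strings in this API)
print(client.get_trace_settings())
client.update_trace_settings(
    settings={"trace_level": ["TIMESTAMPS"], "trace_rate": "100"}
)

# get-log-settings / update-log-settings: verbosity and output format
print(client.get_log_settings())
client.update_log_settings(
    settings={"log_verbose_level": 1, "log_format": "ISO8601"}
)
```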