Triton Inference Server · Capability

Triton Model Inference and Management

Workflow capability for deploying, managing, and running inference against machine learning models on NVIDIA Triton Inference Server. Enables model lifecycle management including loading, health checks, inference execution, statistics monitoring, and observability configuration.

Run with Naftiko · AI · Deep Learning · Inference · Model Serving · Machine Learning · NVIDIA · KServe

What You Can Do

GET /v1/health · Server live: check if the Triton server is alive
GET /v1/server · Server metadata: get the Triton server name, version, and extensions
GET /v1/models · List models: list all models available in the repository
GET /v1/models/{model_name} · Model metadata: get model metadata including tensor definitions
GET /v1/models/{model_name}/config · Model config: get the full model configuration
POST /v1/models/{model_name}/infer · Model infer: submit an inference request to a model (example request after this list)
POST /v1/models/{model_name}/load · Model load: load or reload a model from the repository
POST /v1/models/{model_name}/unload · Model unload: unload a model from Triton
GET /v1/models/{model_name}/stats · Model statistics: get inference statistics for a specific model
GET /v1/stats · All model statistics: get inference statistics for all loaded models
GET /v1/trace · Get trace settings: get the current global trace settings
POST /v1/trace · Update trace settings: update the request tracing configuration
GET /v1/logging · Get log settings: get the current logging settings
POST /v1/logging · Update log settings: update the server logging configuration
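
For the inference endpoint, a minimal request sketch in Python, assuming the capability is reachable at localhost:8080 and that a model named "simple" with a single FP32 input tensor is already loaded; both names are assumptions for illustration. The body shape follows Triton's KServe v2 JSON inference format, which this capability proxies, so confirm field names against the spec below.

import json
import urllib.request

BASE = "http://localhost:8080/v1"    # assumed deployment address for the REST facade
MODEL = "simple"                      # assumed model name in the repository

# Triton's KServe v2 JSON infer format: named input tensors with shape,
# datatype, and flat data, plus the output tensors to return.
payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 16], "datatype": "FP32", "data": [0.0] * 16}
    ],
    "outputs": [{"name": "OUTPUT0"}],
}

req = urllib.request.Request(
    f"{BASE}/models/{MODEL}/infer",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))

The response carries the requested tensors under an outputs array, mirroring Triton's own infer response; the facade's outputParameters mapping ("$.") passes the upstream body through as-is.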

MCP Tools

server-live: check if the Triton inference server is alive (read-only)

server-ready: check if the Triton server is ready to accept inference requests (read-only)

server-metadata: get the Triton server name, version, and supported extensions (read-only)

list-models: list all models available in the Triton model repository (read-only)

model-metadata: get metadata for a specific model, including input/output tensor shapes (read-only)

model-config: get the full configuration for a specific model (read-only)

model-ready: check if a specific model is ready to accept inference requests (read-only)

model-infer: run inference against a loaded model with input tensors

model-load: load or reload a model from the repository into Triton (idempotent; see the tools/call sketch after this list)

model-unload: unload a model from Triton to free resources (idempotent)

model-statistics: get inference statistics for a specific model (read-only)

all-model-statistics: get inference statistics for all loaded models (read-only)

get-trace-settings: get the current global request tracing configuration (read-only)

update-trace-settings: update request tracing levels and sampling rate (idempotent)

get-log-settings: get the current server logging configuration (read-only)

update-log-settings: update the server logging level and format (idempotent)
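
These tools are served over the MCP HTTP transport on port 9090. A hedged sketch of calling the model-load tool directly with a JSON-RPC 2.0 tools/call message follows; the /mcp path, the model name, and the omission of the initialize handshake are assumptions, and in practice an MCP client SDK would manage the session.

import json
import urllib.request

MCP_URL = "http://localhost:9090/mcp"   # assumed endpoint path; the spec only states transport: http

# JSON-RPC 2.0 "tools/call" message for the model-load tool. A real MCP
# session starts with an "initialize" exchange, omitted here for brevity.
message = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "model-load",
        "arguments": {"model_name": "simple"},   # assumed model name
    },
}

req = urllib.request.Request(
    MCP_URL,
    data=json.dumps(message).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    },
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))

The readOnly, destructive, and idempotent hints in the spec are surfaced to MCP clients as advisory annotations, so an assistant can judge which tools are safe to retry or call without confirmation.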

APIs Used

triton

Capability Spec

naftiko: "1.0.0-alpha1"

info:
  label: "Triton Model Inference and Management"
  description: >-
    Workflow capability for deploying, managing, and running inference against machine
    learning models on NVIDIA Triton Inference Server. Enables model lifecycle management
    including loading, health checks, inference execution, statistics monitoring, and
    observability configuration.
  tags:
    - AI
    - Deep Learning
    - Inference
    - Model Serving
    - Machine Learning
    - NVIDIA
    - KServe
  created: "2026-05-03"
  modified: "2026-05-03"

capability:
  consumes:
    - import: triton
      location: ./shared/triton-http-rest.yaml

  exposes:
    - type: rest
      port: 8080
      namespace: triton-inference-api
      description: "Unified REST API for Triton model lifecycle management and inference."
      resources:
        - path: /v1/health
          name: health
          description: "Server and model health status"
          operations:
            - method: GET
              name: server-live
              description: "Check if Triton server is alive"
              call: "triton.server-live"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/server
          name: server
          description: "Server metadata and information"
          operations:
            - method: GET
              name: server-metadata
              description: "Get Triton server name, version, and extensions"
              call: "triton.server-metadata"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models
          name: models
          description: "Model repository and management"
          operations:
            - method: GET
              name: list-models
              description: "List all models available in the repository"
              call: "triton.repository-index"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}
          name: model
          description: "Individual model operations"
          operations:
            - method: GET
              name: model-metadata
              description: "Get model metadata including tensor definitions"
              call: "triton.model-metadata"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}/config
          name: model-config
          description: "Model configuration"
          operations:
            - method: GET
              name: model-config
              description: "Get full model configuration"
              call: "triton.model-config"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}/infer
          name: model-inference
          description: "Run model inference"
          operations:
            - method: POST
              name: model-infer
              description: "Submit an inference request to a model"
              call: "triton.model-infer"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}/load
          name: model-load
          description: "Load model into server"
          operations:
            - method: POST
              name: model-load
              description: "Load or reload a model from the repository"
              call: "triton.model-load"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}/unload
          name: model-unload
          description: "Unload model from server"
          operations:
            - method: POST
              name: model-unload
              description: "Unload a model from Triton"
              call: "triton.model-unload"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/models/{model_name}/stats
          name: model-stats
          description: "Model inference statistics"
          operations:
            - method: GET
              name: model-statistics
              description: "Get inference statistics for a specific model"
              call: "triton.model-statistics"
              with:
                model_name: "rest.model_name"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/stats
          name: all-stats
          description: "Statistics for all models"
          operations:
            - method: GET
              name: all-model-statistics
              description: "Get inference statistics for all loaded models"
              call: "triton.all-model-statistics"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/trace
          name: trace
          description: "Trace configuration"
          operations:
            - method: GET
              name: get-trace-settings
              description: "Get current global trace settings"
              call: "triton.get-trace-setting"
              outputParameters:
                - type: object
                  mapping: "$."
            - method: POST
              name: update-trace-settings
              description: "Update request tracing configuration"
              call: "triton.update-trace-setting"
              outputParameters:
                - type: object
                  mapping: "$."

        - path: /v1/logging
          name: logging
          description: "Logging configuration"
          operations:
            - method: GET
              name: get-log-settings
              description: "Get current logging settings"
              call: "triton.get-log-settings"
              outputParameters:
                - type: object
                  mapping: "$."
            - method: POST
              name: update-log-settings
              description: "Update server logging configuration"
              call: "triton.update-log-settings"
              outputParameters:
                - type: object
                  mapping: "$."

    - type: mcp
      port: 9090
      namespace: triton-inference-mcp
      transport: http
      description: "MCP server for AI-assisted model deployment and inference management on Triton."
      tools:
        - name: server-live
          description: "Check if Triton inference server is alive"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.server-live"
          outputParameters:
            - type: object
              mapping: "$."

        - name: server-ready
          description: "Check if Triton server is ready to accept inference requests"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.server-ready"
          outputParameters:
            - type: object
              mapping: "$."

        - name: server-metadata
          description: "Get Triton server name, version, and supported extensions"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.server-metadata"
          outputParameters:
            - type: object
              mapping: "$."

        - name: list-models
          description: "List all models available in the Triton model repository"
          hints:
            readOnly: true
            openWorld: true
          call: "triton.repository-index"
          with:
            ready_only: "tools.ready_only"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-metadata
          description: "Get metadata for a specific model including input/output tensor shapes"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.model-metadata"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-config
          description: "Get the full configuration for a specific model"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.model-config"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-ready
          description: "Check if a specific model is ready to accept inference requests"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.model-ready"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-infer
          description: "Run inference against a loaded model with input tensors"
          hints:
            readOnly: false
            destructive: false
            idempotent: false
          call: "triton.model-infer"
          with:
            model_name: "tools.model_name"
            inputs: "tools.inputs"
            outputs: "tools.outputs"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-load
          description: "Load or reload a model from the repository into Triton"
          hints:
            readOnly: false
            destructive: false
            idempotent: true
          call: "triton.model-load"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-unload
          description: "Unload a model from Triton to free resources"
          hints:
            readOnly: false
            destructive: true
            idempotent: true
          call: "triton.model-unload"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: model-statistics
          description: "Get inference statistics for a specific model"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.model-statistics"
          with:
            model_name: "tools.model_name"
          outputParameters:
            - type: object
              mapping: "$."

        - name: all-model-statistics
          description: "Get inference statistics for all loaded models"
          hints:
            readOnly: true
            openWorld: true
          call: "triton.all-model-statistics"
          outputParameters:
            - type: object
              mapping: "$."

        - name: get-trace-settings
          description: "Get current global request tracing configuration"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.get-trace-setting"
          outputParameters:
            - type: object
              mapping: "$."

        - name: update-trace-settings
          description: "Update request tracing levels and sampling rate"
          hints:
            readOnly: false
            destructive: false
            idempotent: true
          call: "triton.update-trace-setting"
          with:
            trace_level: "tools.trace_level"
            trace_rate: "tools.trace_rate"
          outputParameters:
            - type: object
              mapping: "$."

        - name: get-log-settings
          description: "Get current server logging configuration"
          hints:
            readOnly: true
            openWorld: false
          call: "triton.get-log-settings"
          outputParameters:
            - type: object
              mapping: "$."

        - name: update-log-settings
          description: "Update server logging level and format"
          hints:
            readOnly: false
            destructive: false
            idempotent: true
          call: "triton.update-log-settings"
          with:
            log_info: "tools.log_info"
            log_verbose_level: "tools.log_verbose_level"
          outputParameters:
            - type: object
              mapping: "$."
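
Example: Model Lifecycle

As a usage sketch of the lifecycle the spec exposes, the following Python walks one model through load, metadata, statistics, and unload via the REST facade. The base URL and model name are assumptions; each call corresponds to one operation defined above.

import json
import urllib.request

BASE = "http://localhost:8080/v1"   # assumed deployment address
MODEL = "simple"                     # assumed model name

def call(method, path, body=None):
    # Small helper over urllib; returns the parsed JSON body, or {} when
    # the operation responds with an empty body (e.g. load/unload).
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(f"{BASE}{path}", data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read().decode("utf-8")
        return json.loads(raw) if raw else {}

call("POST", f"/models/{MODEL}/load")           # model-load
print(call("GET", f"/models/{MODEL}"))          # model-metadata: tensor definitions
print(call("GET", f"/models/{MODEL}/stats"))    # model-statistics
call("POST", f"/models/{MODEL}/unload")         # model-unload

The same sequence is available through the MCP tools (model-load, model-metadata, model-statistics, model-unload) for assistant-driven workflows.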