Triton Inference Server · Capability
NVIDIA Triton Inference Server HTTP/REST API: CUDA Shared Memory
A self-contained Naftiko capability covering one Triton business surface: the CUDA Shared Memory part of the NVIDIA Triton Inference Server HTTP/REST API. It exposes 4 operations. Lead operation: Triton Inference Server Register a CUDA Shared Memory Region.
What You Can Do
POST Cudasharedmemoryregister: Triton Inference Server Register a CUDA Shared Memory Region
/v1/v2/cudasharedmemory/region/{region-name}/register
POST Cudasharedmemoryunregister: Triton Inference Server Unregister a CUDA Shared Memory Region
/v1/v2/cudasharedmemory/region/{region-name}/unregister
GET Cudasharedmemorystatus: Triton Inference Server Get CUDA Shared Memory Status
/v1/v2/cudasharedmemory/status
POST Cudasharedmemoryunregisterall: Triton Inference Server Unregister All CUDA Shared Memory Regions
/v1/v2/cudasharedmemory/unregister
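The four operations above can be sketched as plain request builders. This is a minimal illustration, not the capability's own client: the base URL is a hypothetical placeholder, the helper names are invented here, and the register body is assumed to follow Triton's CUDA shared memory extension (a base64-encoded CUDA IPC handle under `raw_handle.b64`, plus `device_id` and `byte_size`). The paths are copied from the listing as shown. Nothing is sent over the network.

```python
import base64
import json
from urllib.parse import quote

# Assumption: hypothetical host and port; replace with the real endpoint.
BASE_URL = "http://localhost:8000"
PREFIX = "/v1/v2/cudasharedmemory"  # path prefix as shown in the listing


def build_register_request(region_name, raw_handle, device_id, byte_size):
    """Return (method, url, body) for registering a CUDA shared memory region.

    raw_handle is the serialized CUDA IPC memory handle as bytes; the body
    layout is an assumption based on Triton's shared memory extension.
    """
    url = f"{BASE_URL}{PREFIX}/region/{quote(region_name, safe='')}/register"
    body = {
        "raw_handle": {"b64": base64.b64encode(raw_handle).decode("ascii")},
        "device_id": device_id,
        "byte_size": byte_size,
    }
    return "POST", url, json.dumps(body)


def build_unregister_request(region_name=None):
    """Return (method, url) to unregister one region, or all if name is None."""
    if region_name is None:
        return "POST", f"{BASE_URL}{PREFIX}/unregister"
    url = f"{BASE_URL}{PREFIX}/region/{quote(region_name, safe='')}/unregister"
    return "POST", url


def build_status_request():
    """Return (method, url) for the CUDA shared memory status query."""
    return "GET", f"{BASE_URL}{PREFIX}/status"


# Placeholder handle bytes for illustration only, not a real IPC handle.
method, url, body = build_register_request("input0", b"\x00" * 64, 0, 1024)
print(method, url)
print(body)
```

Any HTTP client can send the resulting tuples. Keeping construction separate from transport makes the request shapes easy to inspect before touching a live server.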
MCP Tools
triton-inference-server-register-cuda: Triton Inference Server Register a CUDA Shared Memory Region
triton-inference-server-unregister-cuda: Triton Inference Server Unregister a CUDA Shared Memory Region
triton-inference-server-get-cuda: Triton Inference Server Get CUDA Shared Memory Status (read-only, idempotent)
triton-inference-server-unregister-all: Triton Inference Server Unregister All CUDA Shared Memory Regions
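As a rough sketch of how an MCP client might invoke one of these tools, the snippet below builds a JSON-RPC `tools/call` message for the read-only status tool. The helper name is hypothetical, and the empty `arguments` object is an assumption, since the listing does not document each tool's input schema.

```python
import json


def build_tool_call(tool_name, arguments=None, request_id=1):
    """Build an MCP tools/call JSON-RPC request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Assumption: the tool's input schema is not given in the listing.
            "arguments": arguments or {},
        },
    }


status_call = build_tool_call("triton-inference-server-get-cuda")
print(json.dumps(status_call, indent=2))
```

The status tool is the safest one to try first because the listing marks it read-only and idempotent, so repeating the call should not change server state.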