Package inference

Interface GrpcService.InferStatisticsOrBuilder

All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
All Known Implementing Classes:
GrpcService.InferStatistics, GrpcService.InferStatistics.Builder
Enclosing class:
GrpcService

public static interface GrpcService.InferStatisticsOrBuilder extends com.google.protobuf.MessageOrBuilder
  • Method Details

    • hasSuccess

      boolean hasSuccess()
      @@  .. cpp:var:: StatisticDuration success
      @@
      @@     Cumulative count and duration for successful inference
      @@     requests. The "success" count and cumulative duration include
      @@     cache hits.
      @@
       
      .inference.StatisticDuration success = 1;
      Returns:
      Whether the success field is set.
    • getSuccess

      GrpcService.StatisticDuration getSuccess()
      @@  .. cpp:var:: StatisticDuration success
      @@
      @@     Cumulative count and duration for successful inference
      @@     requests. The "success" count and cumulative duration include
      @@     cache hits.
      @@
       
      .inference.StatisticDuration success = 1;
      Returns:
      The success.
    • getSuccessOrBuilder

      GrpcService.StatisticDurationOrBuilder getSuccessOrBuilder()
      @@  .. cpp:var:: StatisticDuration success
      @@
      @@     Cumulative count and duration for successful inference
      @@     requests. The "success" count and cumulative duration include
      @@     cache hits.
      @@
       
      .inference.StatisticDuration success = 1;
    • hasFail

      boolean hasFail()
      @@  .. cpp:var:: StatisticDuration fail
      @@
      @@     Cumulative count and duration for failed inference
      @@     requests.
      @@
       
      .inference.StatisticDuration fail = 2;
      Returns:
      Whether the fail field is set.
    • getFail

      GrpcService.StatisticDuration getFail()
      @@  .. cpp:var:: StatisticDuration fail
      @@
      @@     Cumulative count and duration for failed inference
      @@     requests.
      @@
       
      .inference.StatisticDuration fail = 2;
      Returns:
      The fail.
    • getFailOrBuilder

      GrpcService.StatisticDurationOrBuilder getFailOrBuilder()
      @@  .. cpp:var:: StatisticDuration fail
      @@
      @@     Cumulative count and duration for failed inference
      @@     requests.
      @@
       
      .inference.StatisticDuration fail = 2;
    • hasQueue

      boolean hasQueue()
      @@  .. cpp:var:: StatisticDuration queue
      @@
      @@     The count and cumulative duration that inference requests wait in
      @@     scheduling or other queues. The "queue" count and cumulative
      @@     duration include cache hits.
      @@
       
      .inference.StatisticDuration queue = 3;
      Returns:
      Whether the queue field is set.
    • getQueue

      GrpcService.StatisticDuration getQueue()
      @@  .. cpp:var:: StatisticDuration queue
      @@
      @@     The count and cumulative duration that inference requests wait in
      @@     scheduling or other queues. The "queue" count and cumulative
      @@     duration include cache hits.
      @@
       
      .inference.StatisticDuration queue = 3;
      Returns:
      The queue.
    • getQueueOrBuilder

      GrpcService.StatisticDurationOrBuilder getQueueOrBuilder()
      @@  .. cpp:var:: StatisticDuration queue
      @@
      @@     The count and cumulative duration that inference requests wait in
      @@     scheduling or other queues. The "queue" count and cumulative
      @@     duration include cache hits.
      @@
       
      .inference.StatisticDuration queue = 3;
    • hasComputeInput

      boolean hasComputeInput()
      @@  .. cpp:var:: StatisticDuration compute_input
      @@
      @@     The count and cumulative duration to prepare input tensor data as
      @@     required by the model framework / backend. For example, this duration
      @@     should include the time to copy input tensor data to the GPU.
      @@     The "compute_input" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_input = 4;
      Returns:
      Whether the computeInput field is set.
    • getComputeInput

      GrpcService.StatisticDuration getComputeInput()
      @@  .. cpp:var:: StatisticDuration compute_input
      @@
      @@     The count and cumulative duration to prepare input tensor data as
      @@     required by the model framework / backend. For example, this duration
      @@     should include the time to copy input tensor data to the GPU.
      @@     The "compute_input" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_input = 4;
      Returns:
      The computeInput.
    • getComputeInputOrBuilder

      GrpcService.StatisticDurationOrBuilder getComputeInputOrBuilder()
      @@  .. cpp:var:: StatisticDuration compute_input
      @@
      @@     The count and cumulative duration to prepare input tensor data as
      @@     required by the model framework / backend. For example, this duration
      @@     should include the time to copy input tensor data to the GPU.
      @@     The "compute_input" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_input = 4;
    • hasComputeInfer

      boolean hasComputeInfer()
      @@  .. cpp:var:: StatisticDuration compute_infer
      @@
      @@     The count and cumulative duration to execute the model.
      @@     The "compute_infer" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_infer = 5;
      Returns:
      Whether the computeInfer field is set.
    • getComputeInfer

      GrpcService.StatisticDuration getComputeInfer()
      @@  .. cpp:var:: StatisticDuration compute_infer
      @@
      @@     The count and cumulative duration to execute the model.
      @@     The "compute_infer" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_infer = 5;
      Returns:
      The computeInfer.
    • getComputeInferOrBuilder

      GrpcService.StatisticDurationOrBuilder getComputeInferOrBuilder()
      @@  .. cpp:var:: StatisticDuration compute_infer
      @@
      @@     The count and cumulative duration to execute the model.
      @@     The "compute_infer" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_infer = 5;
    • hasComputeOutput

      boolean hasComputeOutput()
      @@  .. cpp:var:: StatisticDuration compute_output
      @@
      @@     The count and cumulative duration to extract output tensor data
      @@     produced by the model framework / backend. For example, this duration
      @@     should include the time to copy output tensor data from the GPU.
      @@     The "compute_output" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_output = 6;
      Returns:
      Whether the computeOutput field is set.
    • getComputeOutput

      GrpcService.StatisticDuration getComputeOutput()
      @@  .. cpp:var:: StatisticDuration compute_output
      @@
      @@     The count and cumulative duration to extract output tensor data
      @@     produced by the model framework / backend. For example, this duration
      @@     should include the time to copy output tensor data from the GPU.
      @@     The "compute_output" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_output = 6;
      Returns:
      The computeOutput.
    • getComputeOutputOrBuilder

      GrpcService.StatisticDurationOrBuilder getComputeOutputOrBuilder()
      @@  .. cpp:var:: StatisticDuration compute_output
      @@
      @@     The count and cumulative duration to extract output tensor data
      @@     produced by the model framework / backend. For example, this duration
      @@     should include the time to copy output tensor data from the GPU.
      @@     The "compute_output" count and cumulative duration do not account for
      @@     requests that were a cache hit. See the "cache_hit" field for more
      @@     info.
      @@
       
      .inference.StatisticDuration compute_output = 6;
    • hasCacheHit

      boolean hasCacheHit()
      @@  .. cpp:var:: StatisticDuration cache_hit
      @@
      @@     The count of response cache hits and cumulative duration to lookup
      @@     and extract output tensor data from the Response Cache on a cache
      @@     hit. For example, this duration should include the time to copy
      @@     output tensor data from the Response Cache to the response object.
      @@     On cache hits, Triton does not need to go to the model/backend
      @@     for the output tensor data, so the "compute_input", "compute_infer",
      @@     and "compute_output" fields are not updated. Assuming the response
      @@     cache is enabled for a given model, a cache hit occurs for a
      @@     request to that model when the request metadata (model name,
      @@     model version, model inputs) hashes to an existing entry in the
      @@     cache. On a cache miss, the request hash and response output tensor
      @@     data is added to the cache. See response cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_hit = 7;
      Returns:
      Whether the cacheHit field is set.
    • getCacheHit

      GrpcService.StatisticDuration getCacheHit()
      @@  .. cpp:var:: StatisticDuration cache_hit
      @@
      @@     The count of response cache hits and cumulative duration to lookup
      @@     and extract output tensor data from the Response Cache on a cache
      @@     hit. For example, this duration should include the time to copy
      @@     output tensor data from the Response Cache to the response object.
      @@     On cache hits, Triton does not need to go to the model/backend
      @@     for the output tensor data, so the "compute_input", "compute_infer",
      @@     and "compute_output" fields are not updated. Assuming the response
      @@     cache is enabled for a given model, a cache hit occurs for a
      @@     request to that model when the request metadata (model name,
      @@     model version, model inputs) hashes to an existing entry in the
      @@     cache. On a cache miss, the request hash and response output tensor
      @@     data is added to the cache. See response cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_hit = 7;
      Returns:
      The cacheHit.
    • getCacheHitOrBuilder

      GrpcService.StatisticDurationOrBuilder getCacheHitOrBuilder()
      @@  .. cpp:var:: StatisticDuration cache_hit
      @@
      @@     The count of response cache hits and cumulative duration to lookup
      @@     and extract output tensor data from the Response Cache on a cache
      @@     hit. For example, this duration should include the time to copy
      @@     output tensor data from the Response Cache to the response object.
      @@     On cache hits, Triton does not need to go to the model/backend
      @@     for the output tensor data, so the "compute_input", "compute_infer",
      @@     and "compute_output" fields are not updated. Assuming the response
      @@     cache is enabled for a given model, a cache hit occurs for a
      @@     request to that model when the request metadata (model name,
      @@     model version, model inputs) hashes to an existing entry in the
      @@     cache. On a cache miss, the request hash and response output tensor
      @@     data is added to the cache. See response cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_hit = 7;
    • hasCacheMiss

      boolean hasCacheMiss()
      @@  .. cpp:var:: StatisticDuration cache_miss
      @@
      @@     The count of response cache misses and cumulative duration to lookup
      @@     and insert output tensor data from the computed response to the
      @@     cache.
      @@     For example, this duration should include the time to copy
      @@     output tensor data from the response object to the Response Cache.
      @@     Assuming the response cache is enabled for a given model, a cache
      @@     miss occurs for a request to that model when the request metadata
      @@     does NOT hash to an existing entry in the cache. See the response
      @@     cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_miss = 8;
      Returns:
      Whether the cacheMiss field is set.
    • getCacheMiss

      GrpcService.StatisticDuration getCacheMiss()
      @@  .. cpp:var:: StatisticDuration cache_miss
      @@
      @@     The count of response cache misses and cumulative duration to lookup
      @@     and insert output tensor data from the computed response to the
      @@     cache.
      @@     For example, this duration should include the time to copy
      @@     output tensor data from the response object to the Response Cache.
      @@     Assuming the response cache is enabled for a given model, a cache
      @@     miss occurs for a request to that model when the request metadata
      @@     does NOT hash to an existing entry in the cache. See the response
      @@     cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_miss = 8;
      Returns:
      The cacheMiss.
    • getCacheMissOrBuilder

      GrpcService.StatisticDurationOrBuilder getCacheMissOrBuilder()
      @@  .. cpp:var:: StatisticDuration cache_miss
      @@
      @@     The count of response cache misses and cumulative duration to lookup
      @@     and insert output tensor data from the computed response to the
      @@     cache.
      @@     For example, this duration should include the time to copy
      @@     output tensor data from the response object to the Response Cache.
      @@     Assuming the response cache is enabled for a given model, a cache
      @@     miss occurs for a request to that model when the request metadata
      @@     does NOT hash to an existing entry in the cache. See the response
      @@     cache docs for more info:
      @@
      @@     https://github.com/triton-inference-server/server/blob/main/docs/response_cache.md
      @@
       
      .inference.StatisticDuration cache_miss = 8;