Class TritonModelStatistics

java.lang.Object
com.gencior.triton.core.pojo.TritonModelStatistics

public final class TritonModelStatistics extends Object
Encapsulates comprehensive statistics for a deployed Triton model.

This class aggregates performance metrics for a specific model, including inference counts, timing statistics, memory usage, and response statistics. It also provides convenience methods that calculate derived metrics such as average latency, success rate, and batching efficiency.

This is an immutable object that wraps the gRPC message ModelStatistics.

Convenience Methods:

  • getAverageComputeMs() - average inference computation time
  • getAverageQueueMs() - average queue wait time
  • getSuccessRate() - ratio of successful to total inferences
  • getBatchingEfficiency() - inferences per execution batch
  • getTotalGpuMemoryUsage() - total GPU memory usage in bytes
  • hasCacheHits() - whether any cache hits occurred
  • getModelIdentifier() - human-readable name/version string

Since:
1.0.0
Author:
sachachoumiloff
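The immutable-wrapper pattern described above (a final class holding a gRPC message, created only through a static factory) can be sketched standalone. The RawStats record here is a hypothetical stand-in for the gRPC ModelStatistics message, not the library's actual type:

```java
// Hypothetical stand-in for the gRPC ModelStatistics message.
record RawStats(String name, String version, long inferenceCount, long executionCount) {}

// Immutable wrapper: state is captured once by the factory and never mutated.
final class ModelStatsSketch {
    private final RawStats proto;

    private ModelStatsSketch(RawStats proto) {
        this.proto = proto;
    }

    // Mirrors fromProto: the only way to obtain an instance.
    static ModelStatsSketch fromProto(RawStats proto) {
        return new ModelStatsSketch(proto);
    }

    String getName() { return proto.name(); }

    long getInferenceCount() { return proto.inferenceCount(); }
}
```

Because the wrapper exposes no setters and holds a reference set once in the constructor, instances are safe to share across threads without synchronization.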
  • Method Details

    • fromProto

      public static TritonModelStatistics fromProto(GrpcService.ModelStatistics proto)
      Creates an immutable TritonModelStatistics wrapping the given gRPC ModelStatistics message.

      Parameters:
      proto - the gRPC statistics message to wrap
      Returns:
      a new TritonModelStatistics instance
    • getAverageComputeMs

      public double getAverageComputeMs()
      Returns the average inference computation time in milliseconds.

      This is calculated as total compute time divided by the number of inferences.

      Returns:
      the average computation time in milliseconds, or 0.0 if no inferences
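      The formula above (total compute time divided by inference count, with a zero guard) can be sketched standalone. The nanosecond input unit is an assumption here, based on how Triton's statistics protocol typically reports durations:

```java
final class AverageComputeSketch {
    // Average compute time in ms; 0.0 when no inferences have run.
    static double averageComputeMs(long totalComputeNs, long inferenceCount) {
        if (inferenceCount == 0) {
            return 0.0;
        }
        // Convert nanoseconds to milliseconds after dividing by the count.
        return (totalComputeNs / (double) inferenceCount) / 1_000_000.0;
    }
}
```

      The same shape applies to the queue-time average below, with the queue duration total substituted for the compute total.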
    • getAverageQueueMs

      public double getAverageQueueMs()
      Returns the average queue wait time in milliseconds.

      This represents the average time requests spent waiting before being processed.

      Returns:
      the average queue wait time in milliseconds, or 0.0 if no inferences
    • getSuccessRate

      public double getSuccessRate()
      Returns the success rate as a ratio between 0.0 and 1.0.

      This is calculated as successful inferences divided by total inferences (successful + failed).

      Returns:
      the success rate (0.0 = all failed, 1.0 = all successful)
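      The ratio described above can be sketched as plain arithmetic; successCount and failureCount are hypothetical inputs mirroring the success/failure counters the class reads from the proto:

```java
final class SuccessRateSketch {
    // success rate = successes / (successes + failures); 0.0 when no inferences.
    static double successRate(long successCount, long failureCount) {
        long total = successCount + failureCount;
        return total == 0 ? 0.0 : (double) successCount / total;
    }
}
```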
    • getBatchingEfficiency

      public double getBatchingEfficiency()
      Returns the batching efficiency ratio.

      This is calculated as total inferences divided by total execution batches. A higher ratio indicates better batching efficiency.

      Returns:
      the batching efficiency ratio (inferences per execution), or 0.0 if no executions
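      The efficiency ratio above can be sketched directly. A value of 1.0 means every execution processed a single request (no batching benefit); larger values mean more inferences were served per batch:

```java
final class BatchingEfficiencySketch {
    // Inferences per execution batch; 0.0 when nothing has executed.
    static double batchingEfficiency(long inferenceCount, long executionCount) {
        return executionCount == 0 ? 0.0 : (double) inferenceCount / executionCount;
    }
}
```

      For example, 100 inferences served by 25 execution batches yields a ratio of 4.0, i.e. four requests per batch on average.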
    • getTotalGpuMemoryUsage

      public long getTotalGpuMemoryUsage()
      Returns the total GPU memory usage in bytes across all model instances.
      Returns:
      total GPU memory usage in bytes
    • hasCacheHits

      public boolean hasCacheHits()
      Checks if there have been any cache hits for this model.
      Returns:
      true if at least one cache hit occurred, false otherwise
    • getModelIdentifier

      public String getModelIdentifier()
      Returns a human-readable identifier for this model.

      Format: "model_name (vX.Y.Z)"

      Returns:
      the model identifier string
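      The documented "model_name (vX.Y.Z)" format can be sketched with a format string. The exact formatting (spacing, "v" prefix placement) is inferred from the description above, not taken from the library's source:

```java
final class ModelIdentifierSketch {
    // Builds "model_name (vX.Y.Z)" from a name and version string.
    static String modelIdentifier(String name, String version) {
        return String.format("%s (v%s)", name, version);
    }
}
```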
    • getName

      public String getName()
      Returns the name of the model.
      Returns:
      the model name
    • getInferenceCount

      public long getInferenceCount()
      Returns the total number of inferences executed by this model.
      Returns:
      the total inference count
    • getVersion

      public String getVersion()
      Returns the version of the model.
      Returns:
      the model version string
    • getLastInference

      public long getLastInference()
      Returns the timestamp of the last inference in microseconds (Unix epoch).
      Returns:
      the last inference timestamp
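      Given the microsecond epoch units stated above, the raw timestamp can be converted to a java.time.Instant by splitting it into whole seconds plus a nanosecond remainder. This is a generic conversion sketch, not a method of this class:

```java
import java.time.Instant;

final class LastInferenceSketch {
    // Convert microseconds since the Unix epoch to an Instant.
    static Instant toInstant(long epochMicros) {
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        long micros = Math.floorMod(epochMicros, 1_000_000L);
        // Instant carries sub-second precision as nanoseconds.
        return Instant.ofEpochSecond(seconds, micros * 1_000L);
    }
}
```

      Math.floorDiv/floorMod keep the split correct even for (unlikely) negative timestamps.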
    • getExecustionCount

      public long getExecustionCount()
      Returns the total number of execution batches for this model.
      Returns:
      the execution count
    • getTritonInferStatistics

      public TritonInferStatistics getTritonInferStatistics()
      Returns detailed inference timing statistics.
      Returns:
      the inference statistics object containing compute, queue, and status timings