Class TritonModelStatistics
This class aggregates performance metrics for a specific model, including inference counts, timing statistics, memory usage, and response statistics. It provides convenient methods to calculate derived metrics such as average latency, success rate, and memory usage.
This is an immutable object that wraps the gRPC message ModelStatistics.
Convenience Methods:
- getAverageComputeMs() - Calculate average inference computation time
- getSuccessRate() - Calculate percentage of successful inferences
- getBatchingEfficiency() - Calculate batching efficiency ratio
- getTotalGpuMemoryUsage() - Sum GPU memory usage across all instances
- Since:
- 1.0.0
- Author:
- sachachoumiloff
-
Method Summary
- static TritonModelStatistics fromProto
- double getAverageComputeMs() - Returns the average inference computation time in milliseconds.
- double getAverageQueueMs() - Returns the average queue wait time in milliseconds.
- double getBatchingEfficiency() - Returns the batching efficiency ratio.
- long getExecustionCount() - Returns the total number of execution batches for this model.
- long getInferenceCount() - Returns the total number of inferences executed by this model.
- long getLastInference() - Returns the timestamp of the last inference in microseconds (Unix epoch).
- String getModelIdentifier() - Returns a human-readable identifier for this model.
- String getName() - Returns the name of the model.
- double getSuccessRate() - Returns the success rate as a ratio between 0.0 and 1.0.
- long getTotalGpuMemoryUsage() - Returns the total GPU memory usage in bytes across all model instances.
- getTritonInferStatistics() - Returns detailed inference timing statistics.
- String getVersion() - Returns the version of the model.
- boolean hasCacheHits() - Checks if there have been any cache hits for this model.
-
Method Details
-
fromProto

Builds a TritonModelStatistics instance from the underlying gRPC ModelStatistics message.
-
getAverageComputeMs
public double getAverageComputeMs()

Returns the average inference computation time in milliseconds. This is calculated as total compute time divided by the number of inferences.
- Returns:
- the average computation time in milliseconds, or 0.0 if no inferences
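The calculation described above can be sketched as plain arithmetic. The counter names below (totalComputeNs, inferenceCount) are illustrative stand-ins for the wrapper's internal fields, assuming Triton's cumulative nanosecond duration counters:

```java
// Minimal sketch of the average-compute-time calculation described above.
// Assumes a cumulative compute duration in nanoseconds, as Triton reports it.
public class AverageComputeSketch {
    static double averageComputeMs(long totalComputeNs, long inferenceCount) {
        if (inferenceCount == 0) {
            return 0.0; // matches the documented "0.0 if no inferences" behavior
        }
        return (totalComputeNs / (double) inferenceCount) / 1_000_000.0; // ns -> ms
    }

    public static void main(String[] args) {
        // 5 inferences totaling 25 ms of compute time
        System.out.println(averageComputeMs(25_000_000L, 5)); // 5.0
    }
}
```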
-
getAverageQueueMs
public double getAverageQueueMs()

Returns the average queue wait time in milliseconds. This represents the average time requests spent waiting before being processed.
- Returns:
- the average queue wait time in milliseconds, or 0.0 if no inferences
-
getSuccessRate
public double getSuccessRate()

Returns the success rate as a ratio between 0.0 and 1.0. This is calculated as successful inferences divided by total inferences (successful + failed).
- Returns:
- the success rate (0.0 = all failed, 1.0 = all successful)
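The ratio above can be sketched as follows; the counter names (successCount, failCount) are hypothetical stand-ins for the wrapper's internal state:

```java
// Sketch of the success-rate calculation: successes / (successes + failures).
public class SuccessRateSketch {
    static double successRate(long successCount, long failCount) {
        long total = successCount + failCount;
        if (total == 0) {
            return 0.0; // no inferences recorded yet
        }
        return successCount / (double) total;
    }

    public static void main(String[] args) {
        // 9 successful inferences out of 10 total
        System.out.println(successRate(9, 1)); // 0.9
    }
}
```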
-
getBatchingEfficiency
public double getBatchingEfficiency()

Returns the batching efficiency ratio. This is calculated as total inferences divided by total execution batches. A higher ratio indicates better batching efficiency.
- Returns:
- the batching efficiency ratio (inferences per execution), or 0.0 if no executions
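The inferences-per-execution ratio can be sketched like this; the parameter names are illustrative assumptions:

```java
// Sketch of batching efficiency: total inferences / total execution batches.
// A value of 1.0 means no batching; higher values mean more requests per batch.
public class BatchingEfficiencySketch {
    static double batchingEfficiency(long inferenceCount, long executionCount) {
        if (executionCount == 0) {
            return 0.0; // documented behavior when nothing has executed
        }
        return inferenceCount / (double) executionCount;
    }

    public static void main(String[] args) {
        // 100 inferences served by 25 execution batches -> 4 inferences per batch
        System.out.println(batchingEfficiency(100, 25)); // 4.0
    }
}
```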
-
getTotalGpuMemoryUsage
public long getTotalGpuMemoryUsage()

Returns the total GPU memory usage in bytes across all model instances.
- Returns:
- total GPU memory usage in bytes
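The summation over instances can be sketched as below; the per-instance byte list is a hypothetical representation of the data the wrapper aggregates:

```java
import java.util.List;

// Sketch of summing GPU memory usage across model instances, in bytes.
public class GpuMemorySumSketch {
    static long totalGpuMemoryBytes(List<Long> perInstanceBytes) {
        long total = 0;
        for (long bytes : perInstanceBytes) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // two instances at 512 MiB each
        System.out.println(totalGpuMemoryBytes(List.of(536_870_912L, 536_870_912L))); // 1073741824
    }
}
```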
-
hasCacheHits
public boolean hasCacheHits()

Checks if there have been any cache hits for this model.
- Returns:
- true if at least one cache hit occurred, false otherwise
-
getModelIdentifier
public String getModelIdentifier()

Returns a human-readable identifier for this model. Format: "model_name (vX.Y.Z)"
- Returns:
- the model identifier string
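The documented "model_name (vX.Y.Z)" format can be reproduced with a simple format string; this is a sketch, not the class's actual implementation:

```java
// Sketch of the "model_name (vX.Y.Z)" identifier format described above.
public class ModelIdentifierSketch {
    static String modelIdentifier(String name, String version) {
        return String.format("%s (v%s)", name, version);
    }

    public static void main(String[] args) {
        System.out.println(modelIdentifier("resnet50", "1.2.0")); // resnet50 (v1.2.0)
    }
}
```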
-
getName
public String getName()

Returns the name of the model.
- Returns:
- the model name
-
getInferenceCount
public long getInferenceCount()

Returns the total number of inferences executed by this model.
- Returns:
- the total inference count
-
getVersion
public String getVersion()

Returns the version of the model.
- Returns:
- the model version string
-
getLastInference
public long getLastInference()

Returns the timestamp of the last inference in microseconds (Unix epoch).
- Returns:
- the last inference timestamp
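A microsecond Unix-epoch timestamp can be converted to a java.time.Instant by splitting it into seconds and a nanosecond adjustment; this is an illustrative consumption sketch:

```java
import java.time.Instant;

// Sketch: convert a microsecond Unix-epoch timestamp (as returned by
// getLastInference()) into a java.time.Instant.
public class LastInferenceSketch {
    static Instant fromEpochMicros(long micros) {
        return Instant.ofEpochSecond(micros / 1_000_000L,
                                     (micros % 1_000_000L) * 1_000L);
    }

    public static void main(String[] args) {
        System.out.println(fromEpochMicros(1_700_000_000_123_456L)); // 2023-11-14T22:13:20.123456Z
    }
}
```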
-
getExecustionCount
public long getExecustionCount()

Returns the total number of execution batches for this model.
- Returns:
- the execution count
-
getTritonInferStatistics
Returns detailed inference timing statistics.
- Returns:
- the inference statistics object containing compute, queue, and status timings
-