Package inference

Interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder

All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
All Known Implementing Classes:
ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest, ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest.Builder
Enclosing class:
ModelConfigOuterClass.ModelSequenceBatching

public static interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder extends com.google.protobuf.MessageOrBuilder
  • Method Details

    • getMaxCandidateSequences

      int getMaxCandidateSequences()
      Maximum number of candidate sequences that the batcher
      maintains. Excess sequences are kept in an ordered backlog
      and become candidates when existing candidate sequences
      complete.

      int32 max_candidate_sequences = 1;
      Returns:
      The maxCandidateSequences.
    • getPreferredBatchSizeList

      List<Integer> getPreferredBatchSizeList()
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Returns:
      A list containing the preferredBatchSize.
    • getPreferredBatchSizeCount

      int getPreferredBatchSizeCount()
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Returns:
      The count of preferredBatchSize.
    • getPreferredBatchSize

      int getPreferredBatchSize(int index)
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Parameters:
      index - The index of the element to return.
      Returns:
      The preferredBatchSize at the given index.
    • getMaxQueueDelayMicroseconds

      long getMaxQueueDelayMicroseconds()
      The maximum time, in microseconds, a candidate request
      will be delayed in the dynamic batch scheduling queue to
      wait for additional requests for batching. Default is 0.

      uint64 max_queue_delay_microseconds = 3;
      Returns:
      The maxQueueDelayMicroseconds.
    • getPreserveOrdering

      boolean getPreserveOrdering()
      Whether the dynamic batcher should preserve the ordering of
      responses to match the order of requests received by the
      scheduler. Default is false. If true, responses are returned
      in the same order as the requests were sent to the scheduler;
      if false, responses may be returned in arbitrary order. This
      option is needed specifically when a sequence of related
      inference requests (i.e. inference requests with the same
      correlation ID) is sent to the dynamic batcher, to ensure
      that the sequence responses are returned in the correct
      order.

      When using decoupled models, setting this to true may block
      responses from independent sequences from being returned to
      the client until the previous request completes, hurting
      overall performance. When using the GRPC streaming protocol,
      the stream's ordering guarantee alone may be sufficient to
      ensure that the responses for each sequence are returned in
      sequence order without blocking on independent requests,
      depending on the use case.

      bool preserve_ordering = 4;
      Returns:
      The preserveOrdering.
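
The fields read by the getters above correspond one-to-one to the `oldest` sequence-batching strategy message in a Triton model configuration. As a sketch (the field values here are hypothetical, chosen only for illustration), a `config.pbtxt` fragment selecting this strategy might look like:

```
sequence_batching {
  oldest {
    # Up to 4 in-flight candidate sequences; excess sequences wait in the backlog.
    max_candidate_sequences: 4
    # Execute immediately when a batch of 2 or 4 candidate sequences can be formed.
    preferred_batch_size: [ 2, 4 ]
    # Wait at most 100 microseconds for additional requests before batching.
    max_queue_delay_microseconds: 100
  }
}
```

After parsing such a configuration into the generated Java classes, each value is accessible through this interface, e.g. `getMaxCandidateSequences()` returns 4 and `getPreferredBatchSizeList()` returns the list [2, 4].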