Package inference

Interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder

All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
All Known Implementing Classes:
ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest, ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest.Builder
Enclosing class:
ModelConfigOuterClass.ModelSequenceBatching

public static interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder extends com.google.protobuf.MessageOrBuilder
  • Method Details

    • getMaxCandidateSequences

      int getMaxCandidateSequences()
      Maximum number of candidate sequences that the batcher
      maintains. Excess sequences are kept in an ordered backlog
      and become candidates when existing candidate sequences
      complete.

      int32 max_candidate_sequences = 1;
      Returns:
      The maxCandidateSequences.
    • getPreferredBatchSizeList

      List<Integer> getPreferredBatchSizeList()
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Returns:
      A list containing the preferredBatchSize.
    • getPreferredBatchSizeCount

      int getPreferredBatchSizeCount()
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Returns:
      The count of preferredBatchSize.
    • getPreferredBatchSize

      int getPreferredBatchSize(int index)
      Preferred batch sizes for dynamic batching of candidate
      sequences. If a batch of one of these sizes can be formed,
      it will be executed immediately. If not specified, a
      preferred batch size will be chosen automatically based on
      model and GPU characteristics.

      repeated int32 preferred_batch_size = 2;
      Parameters:
      index - The index of the element to return.
      Returns:
      The preferredBatchSize at the given index.
    • getMaxQueueDelayMicroseconds

      long getMaxQueueDelayMicroseconds()
      The maximum time, in microseconds, a candidate request
      will be delayed in the dynamic batch scheduling queue to
      wait for additional requests for batching. Default is 0.

      uint64 max_queue_delay_microseconds = 3;
      Returns:
      The maxQueueDelayMicroseconds.
    • getPreserveOrdering

      boolean getPreserveOrdering()
      Whether the dynamic batcher should preserve the ordering of
      responses to match the order of requests received by the
      scheduler. Default is false. If true, responses are returned
      in the same order as the requests were sent to the scheduler;
      if false, responses may be returned in arbitrary order. This
      option is needed specifically when a sequence of related
      inference requests (i.e. inference requests with the same
      correlation ID) is sent to the dynamic batcher, to ensure
      that the sequence responses are returned in the correct
      order.

      When using decoupled models, setting this to true may block
      responses from independent sequences from being returned to
      the client until the previous request completes, hurting
      overall performance. When using the GRPC streaming protocol,
      the stream's ordering guarantee alone may be sufficient to
      ensure that the responses for each sequence are returned in
      sequence order without blocking on independent requests,
      depending on the use case.

      bool preserve_ordering = 4;
      Returns:
      The preserveOrdering.
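
The fields read by the getters above correspond one-to-one to the `oldest` sequence-batching strategy message in a Triton model configuration. As a sketch (the field values here are hypothetical, chosen only for illustration), a `config.pbtxt` fragment selecting this strategy might look like:

```
sequence_batching {
  oldest {
    # Up to 4 in-flight candidate sequences; excess sequences wait in the backlog.
    max_candidate_sequences: 4
    # Execute immediately when a batch of 2 or 4 candidate sequences can be formed.
    preferred_batch_size: [ 2, 4 ]
    # Wait at most 100 microseconds for additional requests before batching.
    max_queue_delay_microseconds: 100
  }
}
```

After parsing such a configuration into the generated Java classes, each value is accessible through this interface, e.g. `getMaxCandidateSequences()` returns 4 and `getPreferredBatchSizeList()` returns the list [2, 4].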