Package inference
Interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder
- All Superinterfaces:
com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder
- All Known Implementing Classes:
ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest, ModelConfigOuterClass.ModelSequenceBatching.StrategyOldest.Builder
- Enclosing class:
ModelConfigOuterClass.ModelSequenceBatching
public static interface ModelConfigOuterClass.ModelSequenceBatching.StrategyOldestOrBuilder
extends com.google.protobuf.MessageOrBuilder
-
Method Summary
Modifier and Type    Method
int    getMaxCandidateSequences()
long    getMaxQueueDelayMicroseconds()
int    getPreferredBatchSize(int index)
int    getPreferredBatchSizeCount()
java.util.List<java.lang.Integer>    getPreferredBatchSizeList()
boolean    getPreserveOrdering()
Methods inherited from interface com.google.protobuf.MessageLiteOrBuilder
isInitialized
Methods inherited from interface com.google.protobuf.MessageOrBuilder
findInitializationErrors, getAllFields, getDefaultInstanceForType, getDescriptorForType, getField, getInitializationErrorString, getOneofFieldDescriptor, getRepeatedField, getRepeatedFieldCount, getUnknownFields, hasField, hasOneof
-
Method Details
-
getMaxCandidateSequences
int getMaxCandidateSequences()
Maximum number of candidate sequences that the batcher maintains. Excess sequences are kept in an ordered backlog and become candidates when existing candidate sequences complete.
int32 max_candidate_sequences = 1;
- Returns:
- The maxCandidateSequences.
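The candidate/backlog behavior described above can be sketched in plain Java. This is an illustrative stand-in only; the real scheduler lives in the Triton core, and the `CandidatePool` class and its methods here are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of the oldest-strategy admission rule: at most
// maxCandidateSequences sequences are candidates for batching; the rest
// wait in an ordered (FIFO) backlog until a candidate completes.
class CandidatePool {
    private final int maxCandidateSequences;
    private final List<String> candidates = new ArrayList<>();
    private final Deque<String> backlog = new ArrayDeque<>();

    CandidatePool(int maxCandidateSequences) {
        this.maxCandidateSequences = maxCandidateSequences;
    }

    // A new sequence becomes a candidate if a slot is free, else it is backlogged.
    void addSequence(String seqId) {
        if (candidates.size() < maxCandidateSequences) {
            candidates.add(seqId);
        } else {
            backlog.addLast(seqId);
        }
    }

    // When a candidate completes, the oldest backlogged sequence takes its slot.
    void completeSequence(String seqId) {
        candidates.remove(seqId);
        String next = backlog.pollFirst();
        if (next != null) {
            candidates.add(next);
        }
    }

    List<String> candidates() { return candidates; }

    int backlogSize() { return backlog.size(); }
}
```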
-
getPreferredBatchSizeList
java.util.List<java.lang.Integer> getPreferredBatchSizeList()
Preferred batch sizes for dynamic batching of candidate sequences. If a batch of one of these sizes can be formed it will be executed immediately. If not specified a preferred batch size will be chosen automatically based on model and GPU characteristics.
repeated int32 preferred_batch_size = 2;
- Returns:
- A list containing the preferredBatchSize.
-
getPreferredBatchSizeCount
int getPreferredBatchSizeCount()
Preferred batch sizes for dynamic batching of candidate sequences. If a batch of one of these sizes can be formed it will be executed immediately. If not specified a preferred batch size will be chosen automatically based on model and GPU characteristics.
repeated int32 preferred_batch_size = 2;
- Returns:
- The count of preferredBatchSize.
-
getPreferredBatchSize
int getPreferredBatchSize(int index)
Preferred batch sizes for dynamic batching of candidate sequences. If a batch of one of these sizes can be formed it will be executed immediately. If not specified a preferred batch size will be chosen automatically based on model and GPU characteristics.
repeated int32 preferred_batch_size = 2;
- Parameters:
- index - The index of the element to return.
- Returns:
- The preferredBatchSize at the given index.
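The "execute immediately when a batch of a preferred size can be formed" semantics can be illustrated with a small stand-alone helper. This shows one plausible selection policy (pick the largest formable preferred size), not Triton's actual scheduling logic; `preferredSizes` stands in for the values a caller would read via getPreferredBatchSizeCount() and getPreferredBatchSize(int):

```java
import java.util.List;

// Illustrative helper: given the number of candidate requests currently
// available, return the largest preferred batch size that can be formed,
// or -1 if no preferred-size batch is possible yet.
class PreferredBatch {
    static int largestFormable(List<Integer> preferredSizes, int available) {
        int best = -1;
        // Index-based loop mirrors getPreferredBatchSizeCount()/getPreferredBatchSize(i).
        for (int i = 0; i < preferredSizes.size(); i++) {
            int size = preferredSizes.get(i);
            if (size <= available && size > best) {
                best = size;
            }
        }
        return best;
    }
}
```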
-
getMaxQueueDelayMicroseconds
long getMaxQueueDelayMicroseconds()
The maximum time, in microseconds, a candidate request will be delayed in the dynamic batch scheduling queue to wait for additional requests for batching. Default is 0.
uint64 max_queue_delay_microseconds = 3;
- Returns:
- The maxQueueDelayMicroseconds.
-
getPreserveOrdering
boolean getPreserveOrdering()
Should the dynamic batcher preserve the ordering of responses to match the order of requests received by the scheduler. Default is false. If true, the responses will be returned in the same order as the order of requests sent to the scheduler. If false, the responses may be returned in arbitrary order. This option is specifically needed when a sequence of related inference requests (i.e. inference requests with the same correlation ID) are sent to the dynamic batcher to ensure that the sequence responses are in the correct order.
When using decoupled models, setting this to true may block the responses from independent sequences from being returned to the client until the previous request completes, hurting overall performance. If using the GRPC streaming protocol, the stream ordering guarantee may be sufficient alone to ensure the responses for each sequence are returned in sequence-order without blocking based on independent requests, depending on the use case.
bool preserve_ordering = 4;
- Returns:
- The preserveOrdering.
-
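The fields exposed by this interface correspond to the `oldest` strategy block of a Triton model configuration (`config.pbtxt`). A minimal sketch with example values, assuming the standard text-protobuf layout of `model_config.proto`:

```
sequence_batching {
  oldest {
    max_candidate_sequences: 4
    preferred_batch_size: [ 2, 4 ]
    max_queue_delay_microseconds: 100
    preserve_ordering: false
  }
}
```

The field numbers shown in the method details above (1 through 4) identify these same fields in the serialized ModelSequenceBatching.StrategyOldest message.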