Package inference

Class ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators

java.lang.Object
com.google.protobuf.AbstractMessageLite
com.google.protobuf.AbstractMessage
com.google.protobuf.GeneratedMessageV3
inference.ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators
All Implemented Interfaces:
com.google.protobuf.Message, com.google.protobuf.MessageLite, com.google.protobuf.MessageLiteOrBuilder, com.google.protobuf.MessageOrBuilder, ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder, Serializable
Enclosing class:
ModelConfigOuterClass.ModelOptimizationPolicy

public static final class ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators extends com.google.protobuf.GeneratedMessageV3 implements ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
@@
@@  .. cpp:var:: message ExecutionAccelerators
@@
@@     Specify the preferred execution accelerators to be used to execute
@@     the model. Currently recognized only by the ONNX Runtime and
@@     TensorFlow backends.
@@
@@     The ONNX Runtime backend deploys the model with the execution
@@     accelerators in priority order; priority is determined by the order
@@     in which they are set, i.e. the provider listed first has the
@@     highest priority. Overall, the priority is:
@@         <gpu_execution_accelerator> (if instance is on GPU)
@@         CUDA Execution Provider     (if instance is on GPU)
@@         <cpu_execution_accelerator>
@@         Default CPU Execution Provider
@@
 
Protobuf type inference.ModelOptimizationPolicy.ExecutionAccelerators
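
The message is constructed through the generated builder. The following is a minimal sketch, not taken from this page: it assumes the standard protobuf-generated adders for the two repeated fields (addGpuExecutionAccelerator, addCpuExecutionAccelerator) and a nested Accelerator message with a name, as described by the field comments below.

      import inference.ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators;

      ExecutionAccelerators accels =
          ExecutionAccelerators.newBuilder()
              // Preferred provider when the instance runs on GPU.
              .addGpuExecutionAccelerator(
                  ExecutionAccelerators.Accelerator.newBuilder().setName("tensorrt"))
              // Preferred provider when the instance runs on CPU.
              .addCpuExecutionAccelerator(
                  ExecutionAccelerators.Accelerator.newBuilder().setName("openvino"))
              .build();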
  • Field Details

    • GPU_EXECUTION_ACCELERATOR_FIELD_NUMBER

      public static final int GPU_EXECUTION_ACCELERATOR_FIELD_NUMBER
    • CPU_EXECUTION_ACCELERATOR_FIELD_NUMBER

      public static final int CPU_EXECUTION_ACCELERATOR_FIELD_NUMBER
  • Method Details

    • newInstance

      protected Object newInstance(com.google.protobuf.GeneratedMessageV3.UnusedPrivateParameter unused)
      Overrides:
      newInstance in class com.google.protobuf.GeneratedMessageV3
    • getDescriptor

      public static final com.google.protobuf.Descriptors.Descriptor getDescriptor()
    • internalGetFieldAccessorTable

      protected com.google.protobuf.GeneratedMessageV3.FieldAccessorTable internalGetFieldAccessorTable()
      Specified by:
      internalGetFieldAccessorTable in class com.google.protobuf.GeneratedMessageV3
    • getGpuExecutionAcceleratorList

      public List<ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator> getGpuExecutionAcceleratorList()
      @@    .. cpp:var:: Accelerator gpu_execution_accelerator (repeated)
      @@
      @@       The preferred execution provider to be used if the model instance
      @@       is deployed on GPU.
      @@
      @@       For the ONNX Runtime backend, the only recognized name is
      @@       "tensorrt", and no parameters are required.
      @@
      @@       For the TensorFlow backend, possible names are "tensorrt",
      @@       "auto_mixed_precision", and "gpu_io".
      @@
      @@       For "tensorrt", the following parameters can be specified:
      @@         "precision_mode": The precision used for optimization.
      @@         Allowed values are "FP32" and "FP16". Default value is "FP32".
      @@
      @@         "max_cached_engines": The maximum number of cached TensorRT
      @@         engines in dynamic TensorRT ops. Default value is 100.
      @@
      @@         "minimum_segment_size": The smallest model subgraph that will
      @@         be considered for optimization by TensorRT. Default value is 3.
      @@
      @@         "max_workspace_size_bytes": The maximum GPU memory the model
      @@         may use temporarily during execution. Default value is 1GB.
      @@
      @@       For "auto_mixed_precision", no parameters are required. If set,
      @@       the model will try to use FP16 for better performance. This
      @@       optimization cannot be combined with "tensorrt".
      @@
      @@       For "gpu_io", no parameters are required. If set, the model will
      @@       be executed using the TensorFlow Callable API to place input and
      @@       output tensors in GPU memory when possible, which can reduce data
      @@       transfer overhead if the model is used in an ensemble. However,
      @@       the Callable object is created at model creation time and requests
      @@       all outputs on every model execution, which may hurt performance
      @@       when a request does not require all outputs. This optimization
      @@       only takes effect if the model instance is created with KIND_GPU.
      @@
       
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator gpu_execution_accelerator = 1;
      Specified by:
      getGpuExecutionAcceleratorList in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
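      As a usage sketch with illustrative parameter values (putParameters is
      assumed to be the standard protobuf-generated adder for the Accelerator
      message's parameters map; the keys and defaults come from the field
      comment above):

          ExecutionAccelerators.Accelerator trt =
              ExecutionAccelerators.Accelerator.newBuilder()
                  .setName("tensorrt")
                  .putParameters("precision_mode", "FP16")                 // default is "FP32"
                  .putParameters("max_cached_engines", "100")
                  .putParameters("minimum_segment_size", "3")
                  .putParameters("max_workspace_size_bytes", "1073741824") // 1GB
                  .build();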
    • getGpuExecutionAcceleratorOrBuilderList

      public List<? extends ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.AcceleratorOrBuilder> getGpuExecutionAcceleratorOrBuilderList()
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator gpu_execution_accelerator = 1;
      Specified by:
      getGpuExecutionAcceleratorOrBuilderList in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getGpuExecutionAcceleratorCount

      public int getGpuExecutionAcceleratorCount()
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator gpu_execution_accelerator = 1;
      Specified by:
      getGpuExecutionAcceleratorCount in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getGpuExecutionAccelerator

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator getGpuExecutionAccelerator(int index)
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator gpu_execution_accelerator = 1;
      Specified by:
      getGpuExecutionAccelerator in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getGpuExecutionAcceleratorOrBuilder

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.AcceleratorOrBuilder getGpuExecutionAcceleratorOrBuilder(int index)
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator gpu_execution_accelerator = 1;
      Specified by:
      getGpuExecutionAcceleratorOrBuilder in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getCpuExecutionAcceleratorList

      public List<ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator> getCpuExecutionAcceleratorList()
      @@    .. cpp:var:: Accelerator cpu_execution_accelerator (repeated)
      @@
      @@       The preferred execution provider to be used if the model instance
      @@       is deployed on CPU.
      @@
      @@       For the ONNX Runtime backend, the only recognized name is
      @@       "openvino", and no parameters are required.
      @@
       
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator cpu_execution_accelerator = 2;
      Specified by:
      getCpuExecutionAcceleratorList in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
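      A sketch of reading the repeated field back (accels is an
      ExecutionAccelerators instance as built in the earlier sketch; getName
      is assumed to be the standard generated accessor on Accelerator):

          for (ExecutionAccelerators.Accelerator acc : accels.getCpuExecutionAcceleratorList()) {
              System.out.println(acc.getName());  // e.g. "openvino" for the ONNX Runtime backend
          }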
    • getCpuExecutionAcceleratorOrBuilderList

      public List<? extends ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.AcceleratorOrBuilder> getCpuExecutionAcceleratorOrBuilderList()
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator cpu_execution_accelerator = 2;
      Specified by:
      getCpuExecutionAcceleratorOrBuilderList in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getCpuExecutionAcceleratorCount

      public int getCpuExecutionAcceleratorCount()
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator cpu_execution_accelerator = 2;
      Specified by:
      getCpuExecutionAcceleratorCount in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getCpuExecutionAccelerator

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator getCpuExecutionAccelerator(int index)
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator cpu_execution_accelerator = 2;
      Specified by:
      getCpuExecutionAccelerator in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • getCpuExecutionAcceleratorOrBuilder

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.AcceleratorOrBuilder getCpuExecutionAcceleratorOrBuilder(int index)
      repeated .inference.ModelOptimizationPolicy.ExecutionAccelerators.Accelerator cpu_execution_accelerator = 2;
      Specified by:
      getCpuExecutionAcceleratorOrBuilder in interface ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAcceleratorsOrBuilder
    • isInitialized

      public final boolean isInitialized()
      Specified by:
      isInitialized in interface com.google.protobuf.MessageLiteOrBuilder
      Overrides:
      isInitialized in class com.google.protobuf.GeneratedMessageV3
    • writeTo

      public void writeTo(com.google.protobuf.CodedOutputStream output) throws IOException
      Specified by:
      writeTo in interface com.google.protobuf.MessageLite
      Overrides:
      writeTo in class com.google.protobuf.GeneratedMessageV3
      Throws:
      IOException
    • getSerializedSize

      public int getSerializedSize()
      Specified by:
      getSerializedSize in interface com.google.protobuf.MessageLite
      Overrides:
      getSerializedSize in class com.google.protobuf.GeneratedMessageV3
    • equals

      public boolean equals(Object obj)
      Specified by:
      equals in interface com.google.protobuf.Message
      Overrides:
      equals in class com.google.protobuf.AbstractMessage
    • hashCode

      public int hashCode()
      Specified by:
      hashCode in interface com.google.protobuf.Message
      Overrides:
      hashCode in class com.google.protobuf.AbstractMessage
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(ByteBuffer data) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(ByteBuffer data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(com.google.protobuf.ByteString data) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(com.google.protobuf.ByteString data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(byte[] data) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(byte[] data, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws com.google.protobuf.InvalidProtocolBufferException
      Throws:
      com.google.protobuf.InvalidProtocolBufferException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(InputStream input) throws IOException
      Throws:
      IOException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
      Throws:
      IOException
    • parseDelimitedFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseDelimitedFrom(InputStream input) throws IOException
      Throws:
      IOException
    • parseDelimitedFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseDelimitedFrom(InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
      Throws:
      IOException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(com.google.protobuf.CodedInputStream input) throws IOException
      Throws:
      IOException
    • parseFrom

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators parseFrom(com.google.protobuf.CodedInputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) throws IOException
      Throws:
      IOException
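      The parseFrom overloads pair with the serialization methods above for a
      round trip. A sketch (toByteArray() is inherited from
      com.google.protobuf.MessageLite; parseFrom(byte[]) is the overload
      documented above):

          byte[] wire = accels.toByteArray();
          ExecutionAccelerators parsed = ExecutionAccelerators.parseFrom(wire);
          assert parsed.equals(accels);  // value equality, per equals() above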
    • newBuilderForType

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Builder newBuilderForType()
      Specified by:
      newBuilderForType in interface com.google.protobuf.Message
      Specified by:
      newBuilderForType in interface com.google.protobuf.MessageLite
    • newBuilder

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Builder newBuilder()
    • newBuilder

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Builder newBuilder(ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators prototype)
    • toBuilder

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Builder toBuilder()
      Specified by:
      toBuilder in interface com.google.protobuf.Message
      Specified by:
      toBuilder in interface com.google.protobuf.MessageLite
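      Messages are immutable, so toBuilder() is the way to derive a modified
      copy of an existing instance. A sketch, reusing accels from the earlier
      example:

          ExecutionAccelerators updated =
              accels.toBuilder()
                  .addCpuExecutionAccelerator(
                      ExecutionAccelerators.Accelerator.newBuilder().setName("openvino"))
                  .build();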
    • newBuilderForType

      protected ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators.Builder newBuilderForType(com.google.protobuf.GeneratedMessageV3.BuilderParent parent)
      Specified by:
      newBuilderForType in class com.google.protobuf.GeneratedMessageV3
    • getDefaultInstance

      public static ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators getDefaultInstance()
    • parser

      public static com.google.protobuf.Parser<ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators> parser()
    • getParserForType

      public com.google.protobuf.Parser<ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators> getParserForType()
      Specified by:
      getParserForType in interface com.google.protobuf.Message
      Specified by:
      getParserForType in interface com.google.protobuf.MessageLite
      Overrides:
      getParserForType in class com.google.protobuf.GeneratedMessageV3
    • getDefaultInstanceForType

      public ModelConfigOuterClass.ModelOptimizationPolicy.ExecutionAccelerators getDefaultInstanceForType()
      Specified by:
      getDefaultInstanceForType in interface com.google.protobuf.MessageLiteOrBuilder
      Specified by:
      getDefaultInstanceForType in interface com.google.protobuf.MessageOrBuilder