
TGI Client Inference Workflow

This workflow uses Hugging Face's Text Generation Inference (TGI) client to communicate with a TGI server.

Constructor Arguments

  • server_url (str): Full URL of the TGI server.
  • timeout (Optional[int]): Request timeout in seconds. Default is 30.
  • **inference_params (dict[str, Any]): Any extra keyword arguments passed to the TGI client's generate() method. These are forwarded with every inference request; see the sketch below.
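
For instance, a constructor call with extra generation parameters might look like the sketch below. The server URL is a placeholder, and max_new_tokens and temperature are standard TGI generate() options used here as illustrative examples; any such kwargs are forwarded on every request.

from infernet_ml.workflows.inference.tgi_client_inference_workflow import (
    TGIClientInferenceWorkflow,
)

# A sketch: placeholder server URL, with two common generate() options
# that will be forwarded with every inference request.
workflow = TGIClientInferenceWorkflow(
    server_url="http://your-server-url",
    timeout=60,           # override the 30-second default
    max_new_tokens=100,   # cap on generated tokens per request
    temperature=0.7,      # sampling temperature
)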

Additional Installations

This workflow depends on additional libraries, so you'll need to install infernet-ml with the [tgi_inference] optional dependencies. Alternatively, you can install those packages directly; the [tgi_inference] extra is provided purely for convenience.

To install via pip:

pip install "infernet-ml[tgi_inference]"

Input Format

The input is a dictionary with the following shape:

{
    "text": str
}
  • text (str): Prompt to be used for text generation.

Example

from infernet_ml.workflows.inference.tgi_client_inference_workflow import (
    TGIClientInferenceWorkflow,
)
 
workflow = TGIClientInferenceWorkflow(
    server_url="http://your-server-url",
)
workflow.setup()
 
results = workflow.inference({"text": "Can shrimp actually fry rice fr?"})
print(f"results: {results}")

Additional Configurations

You can configure the TGI client's retry behavior by setting the following environment variables (a usage sketch follows the list):

  • TGI_REQUEST_TRIES: Number of times to retry a failed request. Default is 3.
  • TGI_REQUEST_DELAY: Initial delay between retries, in seconds. Default is 3.
  • TGI_REQUEST_MAX_DELAY: Maximum delay between retries, in seconds. Default is None.
  • TGI_REQUEST_BACKOFF: Factor by which the delay increases after each retry.
  • TGI_REQUEST_JITTER: Adds randomness to the delay between retries, instead of waiting a fixed amount of time. Default is (0.5, 1.5).
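
As a minimal sketch, assuming these variables are read when the workflow's client is configured, you can set them from Python before constructing the workflow. The values below are illustrative, not recommendations:

import os

from infernet_ml.workflows.inference.tgi_client_inference_workflow import (
    TGIClientInferenceWorkflow,
)

# Illustrative retry settings; set before constructing the workflow.
os.environ["TGI_REQUEST_TRIES"] = "5"       # retry failed requests up to 5 times
os.environ["TGI_REQUEST_DELAY"] = "1"       # start with a 1-second delay
os.environ["TGI_REQUEST_BACKOFF"] = "2"     # double the delay after each retry
os.environ["TGI_REQUEST_MAX_DELAY"] = "30"  # cap the delay at 30 seconds

workflow = TGIClientInferenceWorkflow(server_url="http://your-server-url")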