TGI Client Inference Workflow
This workflow uses Hugging Face's TGI (Text Generation Inference) client to communicate with a TGI server.
Constructor Arguments
server_url (str)
: Full URL to the TGI server.

timeout (Optional[int])
: Timeout in seconds for the request. Default is 30 seconds.

**inference_params (dict[str, Any])
: Any extra kwargs to be passed to the generate() method of the TGI client. These are passed with every inference request.
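For instance, sampling options such as temperature or max_new_tokens could be supplied via inference_params and forwarded on every request. The sketch below illustrates this forwarding pattern with a stand-in class; the class and its internals are illustrative only, not the real workflow implementation:

```python
from typing import Any


class SketchTGIWorkflow:
    """Illustrative stand-in showing how extra kwargs are kept and forwarded."""

    def __init__(self, server_url: str, timeout: int = 30, **inference_params: Any):
        self.server_url = server_url
        self.timeout = timeout
        # Stored once, then forwarded with every inference request.
        self.inference_params = inference_params

    def inference(self, input: dict[str, Any]) -> dict[str, Any]:
        # The real workflow would call something like
        # client.generate(input["text"], **self.inference_params) here.
        return {"prompt": input["text"], **self.inference_params}


wf = SketchTGIWorkflow(
    "http://localhost:8080",  # placeholder URL
    temperature=0.7,
    max_new_tokens=64,
)
out = wf.inference({"text": "hello"})
```

Every call to `inference` carries the same `temperature` and `max_new_tokens`, which is the behavior described above for `**inference_params`.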
Additional Installations
Since this workflow uses some additional libraries, you'll need to install infernet-ml[tgi_inference]. Alternatively, you can install those packages directly; the optional dependencies "[tgi_inference]" are provided for your convenience.

To install via pip:

pip install infernet-ml[tgi_inference]
Input Format
Input format is the following dictionary:
{
"text": str
}
text (str)
: Prompt to be used for text generation.
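Because the workflow expects exactly this shape, a small validation helper can catch malformed inputs before a request is made. The helper below is illustrative, not part of the library:

```python
from typing import Any


def validate_input(payload: dict[str, Any]) -> str:
    """Return the prompt if the payload matches {"text": str}, else raise."""
    if set(payload) != {"text"}:
        raise ValueError(f"expected exactly the key 'text', got {sorted(payload)}")
    if not isinstance(payload["text"], str):
        raise TypeError("'text' must be a str")
    return payload["text"]


prompt = validate_input({"text": "Can shrimp actually fry rice fr?"})
```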
Example
from infernet_ml.workflows.inference.tgi_client_inference_workflow import (
TGIClientInferenceWorkflow,
)
workflow = TGIClientInferenceWorkflow(
server_url="http://your-server-url",
)
workflow.setup()
results = workflow.inference({"text": "Can shrimp actually fry rice fr?"})
print(f"results: {results}")
Additional Configurations
You can configure the TGI client by setting the following environment variables:
TGI_REQUEST_TRIES
: Number of times to retry a request if it fails. Default is 3.

TGI_REQUEST_DELAY
: Initial delay between retries in seconds. Default is 3.

TGI_REQUEST_MAX_DELAY
: Maximum delay between retries in seconds. Default is None.

TGI_REQUEST_BACKOFF
: Factor by which the delay increases after each retry.

TGI_REQUEST_JITTER
: Jitter introduces randomness into the delay between retries: instead of waiting a fixed amount of time, each delay is perturbed by a random value drawn from this range. Default is (0.5, 1.5).