
Bark Inference Workflow

This workflow uses Hugging Face's transformers library to perform inference on Suno's Bark text-to-speech model.

Constructor Arguments

  • model_source (Optional[str]): The source of the model, either suno/bark or suno/bark-small. Defaults to suno/bark.
  • default_voice_preset (Optional[str]): The default voice preset to use when an input does not specify one. See the list of supported presets in the Bark README: https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets. A brief construction sketch follows below.
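
As a quick illustration, here is a minimal construction sketch. The no-argument form assumes both arguments fall back to the defaults described above, and the preset value shown is just an example:

from infernet_ml.workflows.inference.bark_hf_inference_workflow import (
    BarkHFInferenceWorkflow,
)

# rely on the defaults (suno/bark)
workflow = BarkHFInferenceWorkflow()

# or pin the smaller checkpoint and a default voice preset explicitly
workflow = BarkHFInferenceWorkflow(
    model_source="suno/bark-small",
    default_voice_preset="v2/en_speaker_0",
)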

Additional Installations

Since this workflow relies on additional libraries, you'll need to install the infernet-ml[bark_inference] extra. Alternatively, you can install the underlying packages directly; the optional "[bark_inference]" dependencies simply bundle them for convenience.

To install via pip:

pip install "infernet-ml[bark_inference]"

Input Format

Input to the inference workflow is the following pydantic model:

from typing import Optional

from pydantic import BaseModel


class BarkWorkflowInput(BaseModel):
    # prompt to generate audio from
    prompt: str
    # voice preset to be used; see the list of supported presets here:
    # https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets
    voice_preset: Optional[str]
 
  • "prompt": The text prompt to generate audio from.
  • "voice_preset": The voice preset to be used. See list (opens in a new tab) of supported presets.

Output Format

The output of the inference workflow is a pydantic model with the following fields:

class AudioInferenceResult(BaseModel):
    # generated audio waveform
    audio_array: np.ndarray[Any, Any]

  • "audio_array": The audio waveform generated from the input prompt, sampled at BarkHFInferenceWorkflow.SAMPLE_RATE.

Example

In this example, we use the Bark Inference Workflow to generate audio from a prompt and then write the generated audio to a WAV file.

from scipy.io.wavfile import write as write_wav  # type: ignore
from infernet_ml.workflows.inference.bark_hf_inference_workflow import (
    BarkHFInferenceWorkflow,
    BarkWorkflowInput,
)
 
workflow = BarkHFInferenceWorkflow(
    model_source="suno/bark-small",
    default_voice_preset="v2/en_speaker_0",
)

# load the model; weights are downloaded on first run
workflow.setup()
 
# build the workflow input; this preset overrides the constructor default
input_data = BarkWorkflowInput(
    prompt="Hello, my name is Suno. I am a text-to-speech model.",
    voice_preset="v2/en_speaker_5"
)

inference_result = workflow.inference(input_data)
 
generated_audio_path = "output.wav"
 
# write output to a wav file
write_wav(
    generated_audio_path,
    BarkHFInferenceWorkflow.SAMPLE_RATE,
    inference_result.audio_array,
)
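
If you are working in a notebook, the same array can also be played back inline. A small optional sketch using IPython, which is not a dependency of this workflow:

from IPython.display import Audio

# play the generated clip at Bark's native sample rate
Audio(inference_result.audio_array, rate=BarkHFInferenceWorkflow.SAMPLE_RATE)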