Bark Inference Workflow
This workflow uses Hugging Face's transformers library to perform inference on Suno's text-to-speech Bark model.
Constructor Arguments
model_source (Optional[str])
: The source of the model. This can be either suno/bark or suno/bark-small. Default is suno/bark.
default_voice_preset (Optional[str])
: The default voice preset to be used. See the list of supported presets (https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets).
Additional Installations
Since this workflow uses some additional libraries, you'll need to install infernet-ml[bark_inference]. Alternatively, you can install those packages directly; the optional dependencies [bark_inference] are provided for your convenience.
To install via pip:
pip install infernet-ml[bark_inference]
Input Format
Input to the inference workflow is the following pydantic model:
class BarkWorkflowInput(BaseModel):
    # prompt to generate audio from
    prompt: str
    # voice to be used; a list of supported presets is here:
    # https://github.com/suno-ai/bark?tab=readme-ov-file#-voice-presets
    voice_preset: Optional[str]
"prompt"
: The text prompt to generate audio from."voice_preset"
: The voice preset to be used. See list (opens in a new tab) of supported presets.
Output Format
The output of the inference workflow is a pydantic model with the following keys:
class AudioInferenceResult(BaseModel):
    audio_array: np.ndarray[Any, Any]
"audio_array": The audio array generated from the input prompt.
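Since the result is a raw array of samples, the clip's duration follows from dividing its length by the sample rate. A minimal sketch with a stand-in array (the 24 kHz rate is an assumption mirroring `BarkHFInferenceWorkflow.SAMPLE_RATE`; in real code, read the constant from the workflow class instead):

```python
import numpy as np

# Assumption: Bark generates mono float audio at 24 kHz
# (standing in for BarkHFInferenceWorkflow.SAMPLE_RATE).
SAMPLE_RATE = 24_000

# Stand-in for AudioInferenceResult.audio_array: one second of silence.
audio_array = np.zeros(SAMPLE_RATE, dtype=np.float32)

# Duration in seconds is the number of samples divided by the sample rate.
duration_seconds = audio_array.shape[0] / SAMPLE_RATE
```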
Example
In this example, we will use the Bark Inference Workflow to generate audio from a prompt. We will then write the generated audio to a wav file.
from scipy.io.wavfile import write as write_wav  # type: ignore
from infernet_ml.workflows.inference.bark_hf_inference_workflow import (
    BarkHFInferenceWorkflow,
    BarkWorkflowInput,
)

workflow = BarkHFInferenceWorkflow(
    model_source="suno/bark-small",
    default_voice_preset="v2/en_speaker_0",
)
workflow.setup()

workflow_input = BarkWorkflowInput(
    prompt="Hello, my name is Suno. I am a text-to-speech model.",
    voice_preset="v2/en_speaker_5",
)

inference_result = workflow.inference(workflow_input)

generated_audio_path = "output.wav"

# write output to a wav file
write_wav(
    generated_audio_path,
    BarkHFInferenceWorkflow.SAMPLE_RATE,
    inference_result.audio_array,
)