====== NervProj: Setting up support for OpenAI whisper ======
* We start with the github repository: git@github.com:roche-emmanuel/whisper.git
* We need to prepare a Python environment for whisper:

```yaml
whisper_env:
  inherit: default_env
  packages:
    - numba
    - numpy
    - torch
    - tqdm
    - more-itertools
    - tiktoken==0.3.3
```
* Then we prepare the environment: `nvp pyenv setup whisper_env`
* => So far so good 👍!
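* For reference, the `inherit: default_env` line above means the whisper env extends a base package list. I haven't looked at how NVP resolves this internally, so the following is only a hypothetical sketch of such a merge (the `resolve_env` helper and the dict layout are assumptions, not the real NVP code):

```python
# Hypothetical sketch of resolving an env spec with "inherit":
# parent packages come first, then the env's own packages.
def resolve_env(envs, name):
    """Resolve an environment spec, merging inherited package lists."""
    spec = envs[name]
    packages = []
    parent = spec.get("inherit")
    if parent is not None:
        packages.extend(resolve_env(envs, parent)["packages"])
    packages.extend(spec.get("packages", []))
    return {"packages": packages}

envs = {
    "default_env": {"packages": ["requests"]},
    "whisper_env": {
        "inherit": "default_env",
        "packages": ["numba", "numpy", "torch", "tqdm",
                     "more-itertools", "tiktoken==0.3.3"],
    },
}

resolved = resolve_env(envs, "whisper_env")
print(resolved["packages"])
```

With this merge order, `pip` would see the base packages before the whisper-specific ones.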
* Hmm, actually we probably don't even need this and we can simply use the package **openai-whisper** instead.
* Just wrote the whisper_gen component to handle the conversion:

```python
import os
import logging

import torch
import whisper

# NVPComponent, NVPContext and ToolsManager come from the NervProj framework.

logger = logging.getLogger(__name__)


class WhisperGen(NVPComponent):
    """WhisperGen component class"""

    def __init__(self, ctx: NVPContext):
        """Component constructor"""
        NVPComponent.__init__(self, ctx)
        self.config = ctx.get_config()["movie_handler"]

    def process_cmd_path(self, cmd):
        """Re-implementation of process_cmd_path"""
        if cmd == "convert":
            file = self.get_param("input_file")
            model = self.get_param("model")
            return self.translate_audio(file, model)
        return False

    def translate_audio(self, file, model_name):
        """Translate an audio file to text"""
        logger.info("Should translate audio file to text: %s", file)

        # Make ffmpeg available on the PATH so whisper can load the audio:
        tools: ToolsManager = self.get_component("tools")
        ffmpeg_path = tools.get_tool_path("ffmpeg")
        ffmpeg_dir = self.get_parent_folder(ffmpeg_path)
        logger.info("Adding path to ffmpeg: %s", ffmpeg_dir)
        self.append_env_list([ffmpeg_dir], os.environ)

        # Check that CUDA is available:
        self.check(torch.cuda.is_available(), "Torch CUDA backend is not available?")

        model = whisper.load_model(model_name)
        result = model.transcribe(file)
        txt = result["text"]
        self.write_text_file(txt, file + ".txt")
        logger.info("Done")
        logger.info("Generated output: %s", txt)
        return True
```
* => And this works just fine!
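* The `append_env_list` call above is what makes ffmpeg reachable for whisper's audio decoding. I won't reproduce the real NVP helper here, but conceptually it just appends directories to the `PATH` entry of an environment mapping; a minimal sketch under that assumption (the function body below is hypothetical):

```python
import os

# Hypothetical sketch of an append_env_list-style helper: append the given
# directories to env[var], joined with the platform path separator.
def append_env_list(paths, env, var="PATH"):
    """Append the given directories to env[var]."""
    current = env.get(var, "")
    parts = [p for p in current.split(os.pathsep) if p] + list(paths)
    env[var] = os.pathsep.join(parts)

env = {"PATH": "/usr/bin"}
append_env_list(["/opt/ffmpeg/bin"], env)
print(env["PATH"])  # on POSIX: /usr/bin:/opt/ffmpeg/bin
```

Passing `os.environ` as the `env` mapping, as the component does, makes the change visible to the whisper subprocess calls.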
* **Note**: models are saved in **C:\Users\ultim\.cache\whisper** => will make a copy of those just in case ;-).
* Updating the Python env to get CUDA support:

```yaml
whisper_env:
  inherit: default_env
  packages:
    - openai-whisper
    - --extra-index-url https://download.pytorch.org/whl/cu117
    - torch
    - torchvision
    - torchaudio
```
* When converting the audio on the CPU we get for instance: `INFO: Done converting auto to text in 161.48 secs`
* Performing the same conversion on the GPU we now get: `INFO: Done converting auto to text in 33.66 secs`
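* Those two timings amount to roughly a 4.8x speedup on the GPU; a quick sanity check of the arithmetic:

```python
# CPU vs GPU transcription times reported above, in seconds:
cpu_secs = 161.48
gpu_secs = 33.66

speedup = cpu_secs / gpu_secs
print(f"GPU speedup: {speedup:.1f}x")  # → GPU speedup: 4.8x
```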
* The command to transcribe a recording is thus: `nvp audio2text -i nervproj_0006_whisper_integration.x265.mkv`
* This also works with video files as input ;-)!