====== NervProj: Setting up support for OpenAI whisper ======

  * We start with the github repository: git@github.com:roche-emmanuel/whisper.git
  * We need to prepare a python environment for whisper: <sxh yaml; highlight: []>
whisper_env:
  inherit: default_env
  packages:
    - numba
    - numpy
    - torch
    - tqdm
    - more-itertools
    - tiktoken==0.3.3
</sxh>
  * Then we prepare the environment: <sxh bash; highlight: []>nvp pyenv setup whisper_env</sxh>
  * => So far so good 👍!
  * Hmm, actually we probably don't even need this: we can simply use the **openai-whisper** package instead.
  * Just wrote the whisper_gen component to handle the conversion: <sxh python; highlight: []>import logging
import os

import torch
import whisper

# NVP framework imports (module paths assumed here):
from nvp.nvp_component import NVPComponent
from nvp.nvp_context import NVPContext
from nvp.components.tools import ToolsManager

logger = logging.getLogger(__name__)


class WhisperGen(NVPComponent):
    """WhisperGen component class"""

    def __init__(self, ctx: NVPContext):
        """Component constructor"""
        NVPComponent.__init__(self, ctx)
        self.config = ctx.get_config()["movie_handler"]

    def process_cmd_path(self, cmd):
        """Re-implementation of process_cmd_path"""
        if cmd == "convert":
            file = self.get_param("input_file")
            model = self.get_param("model")
            return self.translate_audio(file, model)
        return False

    def translate_audio(self, file, model):
        """Transcribe an audio file to text"""
        logger.info("Should translate audio file to text: %s", file)

        # Make sure whisper can find the ffmpeg binary:
        tools: ToolsManager = self.get_component("tools")
        ffmpeg_path = tools.get_tool_path("ffmpeg")
        ffmpeg_dir = self.get_parent_folder(ffmpeg_path)
        logger.info("Adding path to ffmpeg: %s", ffmpeg_dir)
        self.append_env_list([ffmpeg_dir], os.environ)

        # Check that CUDA is available:
        self.check(torch.cuda.is_available(), "Torch CUDA backend is not available?")

        model = whisper.load_model(model)
        result = model.transcribe(file)
        txt = result["text"]
        self.write_text_file(txt, file + ".txt")
        logger.info("Done")
        logger.info("Generated output: %s", txt)
        return True</sxh>
  * => And this works just fine!
  * **Note**: models are saved in **C:\Users\ultim\.cache\whisper** => will make a copy of those just in case ;-).
  * Updating the python env to get CUDA support: <sxh yaml; highlight: []>
whisper_env:
  inherit: default_env
  packages:
    - openai-whisper
    - --extra-index-url https://download.pytorch.org/whl/cu117
    - torch
    - torchvision
    - torchaudio
</sxh>
  * When converting the audio on the CPU we get for instance the duration: <code>INFO: Done converting auto to text in 161.48 secs</code>
  * Performing the same conversion on the GPU we now get (roughly a 4.8x speedup): <code>INFO: Done converting auto to text in 33.66 secs</code>
  * The command to transcribe a recording is thus: <sxh bash; highlight: []>nvp audio2text -i nervproj_0006_whisper_integration.x265.mkv</sxh>

<note>This also works with video files as input ;-)!</note>

public/projects/nervproj/0001_whisper_gen.txt · Last modified: 2023/06/21 20:00 by 127.0.0.1
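As a side note, the PATH tweak done with **append_env_list** in the component above (so that whisper's internal ffmpeg subprocess calls can find the binary) can be sketched as a small standalone helper. The name **append_env_path** and its exact behavior are assumptions for illustration, not the actual NVP implementation:

<sxh python; highlight: []>import os


def append_env_path(dirs, env):
    """Append directories to the PATH entry of an environment mapping.

    Sketch of what the append_env_list helper is assumed to do; the
    real NVP implementation may differ. Duplicates are skipped so
    calling it twice does not grow PATH.
    """
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    for folder in dirs:
        if folder not in parts:
            parts.append(folder)
    env["PATH"] = os.pathsep.join(parts)


# Usage: make a local ffmpeg folder visible (paths are illustrative):
env = {"PATH": os.pathsep.join(["/usr/bin", "/usr/local/bin"])}
append_env_path(["/opt/ffmpeg/bin"], env)
print(env["PATH"])</sxh>

Passing a plain dict here keeps the sketch testable, but in the component this is applied to **os.environ** directly, which is what child processes inherit.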