====== NervProj: Setting up support for OpenAI whisper ======

  * We start with the github repository: git@github.com:roche-emmanuel/whisper.git
  * We need to prepare a python environment for whisper: <sxh yaml; highlight: []>
whisper_env:
  inherit: default_env
  packages:
    - numba
    - numpy
    - torch
    - tqdm
    - more-itertools
    - tiktoken==0.3.3
</sxh>
  * Then we prepare the environment: <sxh bash; highlight: []>nvp pyenv setup whisper_env</sxh>
  * => So far so good 👍!
  * Hmm, actually we probably don't even need this: we can simply use the **openai-whisper** package instead.
  * Just wrote the whisper_gen component to handle the conversion: <sxh python; highlight: []>import logging
import os

import torch
import whisper

# NVP framework imports (module paths assumed here):
from nvp.nvp_component import NVPComponent
from nvp.nvp_context import NVPContext
from nvp.components.tools import ToolsManager

logger = logging.getLogger(__name__)


class WhisperGen(NVPComponent):
    """WhisperGen component class"""

    def __init__(self, ctx: NVPContext):
        """Component constructor"""
        NVPComponent.__init__(self, ctx)
        self.config = ctx.get_config()["movie_handler"]

    def process_cmd_path(self, cmd):
        """Re-implementation of process_cmd_path"""
        if cmd == "convert":
            file = self.get_param("input_file")
            model = self.get_param("model")
            return self.translate_audio(file, model)
        return False

    def translate_audio(self, file, model):
        """Transcribe an audio file to text"""
        logger.info("Should translate audio file to text: %s", file)

        # Make sure whisper can find the ffmpeg binary:
        tools: ToolsManager = self.get_component("tools")
        ffmpeg_path = tools.get_tool_path("ffmpeg")
        ffmpeg_dir = self.get_parent_folder(ffmpeg_path)
        logger.info("Adding path to ffmpeg: %s", ffmpeg_dir)
        self.append_env_list([ffmpeg_dir], os.environ)

        # Check that CUDA is available:
        self.check(torch.cuda.is_available(), "Torch CUDA backend is not available?")

        model = whisper.load_model(model)
        result = model.transcribe(file)
        txt = result["text"]
        self.write_text_file(txt, file + ".txt")
        logger.info("Done")
        logger.info("Generated output: %s", txt)
        return True</sxh>
  * => And this works just fine!
  * **Note**: models are saved in **C:\Users\ultim\.cache\whisper** => will make a copy of those just in case ;-).
  * Updating the python env to get CUDA support: <sxh yaml; highlight: []>
whisper_env:
  inherit: default_env
  packages:
    - openai-whisper
    - --extra-index-url https://download.pytorch.org/whl/cu117
    - torch
    - torchvision
    - torchaudio
</sxh>
  * When converting the audio on the CPU we get for instance the duration: <code>INFO: Done converting auto to text in 161.48 secs</code>
  * Performing the same conversion on the GPU we now get (roughly a 4.8x speedup): <code>INFO: Done converting auto to text in 33.66 secs</code>
  * The command to transcribe a recording is thus: <sxh bash; highlight: []>nvp audio2text -i nervproj_0006_whisper_integration.x265.mkv</sxh>

<note>This also works with video files as input ;-)!</note>

public/projects/nervproj/0001_whisper_gen.txt · Last modified: 2023/06/21 20:00 by 127.0.0.1
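As a side note, the PATH tweak done with **append_env_list** in the component above (so that whisper's internal ffmpeg subprocess calls can find the binary) can be sketched as a small standalone helper. The name **append_env_path** and its exact behavior are assumptions for illustration, not the actual NVP implementation:

<sxh python; highlight: []>import os


def append_env_path(dirs, env):
    """Append directories to the PATH entry of an environment mapping.

    Sketch of what the append_env_list helper is assumed to do; the
    real NVP implementation may differ. Duplicates are skipped so
    calling it twice does not grow PATH.
    """
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    for folder in dirs:
        if folder not in parts:
            parts.append(folder)
    env["PATH"] = os.pathsep.join(parts)


# Usage: make a local ffmpeg folder visible (paths are illustrative):
env = {"PATH": os.pathsep.join(["/usr/bin", "/usr/local/bin"])}
append_env_path(["/opt/ffmpeg/bin"], env)
print(env["PATH"])</sxh>

Passing a plain dict here keeps the sketch testable, but in the component this is applied to **os.environ** directly, which is what child processes inherit.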