
NervProj: Setting up support for OpenAI Whisper

  • We start with the GitHub repository: git@github.com:roche-emmanuel/whisper.git
  • We need to prepare a python environment for whisper:
      whisper_env:
        inherit: default_env
        packages:
          - numba
          - numpy
          - torch
          - tqdm
          - more-itertools
          - tiktoken==0.3.3
  • Then we prepare the environment:
    nvp pyenv setup whisper_env
    • β‡’ So far so good πŸ‘!
  • Hmm, actually we probably don't even need this: we can simply use the openai-whisper package instead.
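  • For reference, here is a minimal sketch of the openai-whisper API (the model name and input file below are just placeholder examples):
      import whisper

      # Load one of the pretrained models (downloaded on first use):
      model = whisper.load_model("base")

      # Transcribe an audio file and print the recognized text:
      result = model.transcribe("some_audio.mp3")
      print(result["text"])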
  • Just wrote the whisper_gen component to handle the conversion:
    # NOTE: NVPComponent, NVPContext (and the ToolsManager type hint below) come
    # from the NervProj codebase; their import paths here are assumed:
    import logging
    import os

    import torch
    import whisper

    from nvp.nvp_component import NVPComponent
    from nvp.nvp_context import NVPContext

    logger = logging.getLogger(__name__)


    class WhisperGen(NVPComponent):
        """WhisperGen component class"""
    
        def __init__(self, ctx: NVPContext):
            """Component constructor"""
            NVPComponent.__init__(self, ctx)
    
            self.config = ctx.get_config()["movie_handler"]
    
        def process_cmd_path(self, cmd):
            """Re-implementation of process_cmd_path"""
    
            if cmd == "convert":
                file = self.get_param("input_file")
                model = self.get_param("model")
                return self.translate_audio(file, model)
    
            return False
    
        def translate_audio(self, file, model):
            """Translate an audio file to text"""
            logger.info("Should translate audio file to text: %s", file)
    
            tools: ToolsManager = self.get_component("tools")
            ffmpeg_path = tools.get_tool_path("ffmpeg")
            ffmpeg_dir = self.get_parent_folder(ffmpeg_path)
            # sys.path.append(ffmpeg_dir)
            logger.info("Adding path to ffmpeg: %s", ffmpeg_dir)
            # whisper invokes ffmpeg as a subprocess, so its folder must be on the PATH:
            self.append_env_list([ffmpeg_dir], os.environ)
    
            # Check that cuda is available:
            self.check(torch.cuda.is_available(), "Torch CUDA backend is not available?")
    
            model = whisper.load_model(model)
    
            result = model.transcribe(file)
            txt = result["text"]
            self.write_text_file(txt, file + ".txt")
            logger.info("Done")
            logger.info("Generated output: %s", txt)
            return True
  • β‡’ And this works just fine!
  • Note: models are saved in C:\Users\ultim\.cache\whisper β‡’ will make a copy of those just in case ;-).
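  • If needed, we could also override that cache location through the download_root parameter of whisper.load_model (the target folder below is just an example):
      import whisper

      # Store/load the model weights in a custom folder instead of ~/.cache/whisper:
      model = whisper.load_model("medium", download_root="D:/whisper_models_backup")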
  • Updating python env to get CUDA support:
      whisper_env:
        inherit: default_env
        packages:
          - openai-whisper
          - --extra-index-url https://download.pytorch.org/whl/cu117
          - torch
          - torchvision
          - torchaudio
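  • To quickly confirm that the CUDA build of torch got picked up, one can run something like:
      import torch

      print(torch.version.cuda)         # should report 11.7 for the cu117 wheels
      print(torch.cuda.is_available())  # should now be True
      print(torch.cuda.get_device_name(0))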
  • When converting the audio on the CPU we get, for instance, the following duration:
    INFO: Done converting auto to text in 161.48 secs
  • Performing the same conversion on the GPU we now get:
    INFO: Done converting auto to text in 33.66 secs
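  • The timing comparison boils down to something like this sketch (timed_transcribe is a hypothetical helper, not the actual component code):
      import time

      import whisper

      def timed_transcribe(file, model_name="medium", device="cuda"):
          """Transcribe a file and report the elapsed time (pass device="cpu" to compare)."""
          model = whisper.load_model(model_name, device=device)
          start = time.perf_counter()
          result = model.transcribe(file)
          print(f"Done converting audio to text in {time.perf_counter() - start:.2f} secs")
          return result["text"]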
  • The command to transcribe a recording is thus:
    nvp audio2text -i nervproj_0006_whisper_integration.x265.mkv
  • β‡’ This also works with video files as input ;-)!