Quick project: filtering out corrupted gifs [python]

Hi guys! So here is another “quick project” article, this time about separating valid from corrupted gif images in a given folder. The context is simple: I've been downloading a collection of gifs as a torrent, but unfortunately that download is stuck at 92%, so I don't think I will ever get the full collection. Still, some of the files in there are already completely downloaded, so technically I should be able to try to read each of those gif files and only keep the valid ones 😉

As usual now, I'm going to build this as a new Python utility in my NervProj framework, so let's get started!

  • Here is the initial minimal component class I created:
    """Module for GifHandler class definition"""
    import logging
    from PIL import Image
    import PIL
    from nvp.nvp_context import NVPContext
    from nvp.nvp_component import NVPComponent
    logger = logging.getLogger(__name__)
    class GifHandler(NVPComponent):
        """GifHandler component class"""
        def __init__(self, ctx: NVPContext):
            """class constructor"""
            NVPComponent.__init__(self, ctx)
        def process_command(self, cmd):
            """Check if this component can process the given command"""
            if cmd == 'filter-valid':
                return self.filter_valid_files()
            return False
        def filter_valid_files(self):
            """Filter the valid gif files from a given folder"""
            # Should perform the filtering here.
            return True
    if __name__ == "__main__":
        # Create the context:
        context = NVPContext()
        # Add our component:
        comp = context.register_component("gifhandler", GifHandler(context))
        context.define_subparsers("main", {
            'filter-valid': None,
        })
        psr = context.get_parser('main.filter-valid')
        psr.add_argument("--output", dest="output_dir", type=str,
                         help="Output dir where to store the valid files")
        psr.add_argument("--input", dest="input_dir", type=str,
                         help="Input dir where to start the filtering")
        # Assumed final call (the end of this listing is cut off): run the component
        comp.run()
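For readers who are not using NervProj, the command line defined above maps roughly onto a plain argparse setup. This is only a minimal sketch: build_parser and the "gifs" program name are hypothetical, and it says nothing about how NVPContext builds its parsers internally.

```python
import argparse


def build_parser():
    """Hypothetical stand-alone equivalent of the 'main.filter-valid' parser."""
    parser = argparse.ArgumentParser(prog="gifs")
    subparsers = parser.add_subparsers(dest="cmd")
    # One sub-command, with the same two optional arguments as in the component:
    psr = subparsers.add_parser("filter-valid")
    psr.add_argument("--output", dest="output_dir", type=str,
                     help="Output dir where to store the valid files")
    psr.add_argument("--input", dest="input_dir", type=str,
                     help="Input dir where to start the filtering")
    return parser


args = build_parser().parse_args(["filter-valid", "--input", "my_gifs"])
print(args.cmd, args.input_dir, args.output_dir)  # filter-valid my_gifs None
```

Both arguments default to None when omitted, which is what lets the filtering code fall back to its own default folders.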
  • Then of course I defined a new script, and added a new dedicated python env for this kind of “media handling” tool:
      "custom_python_envs": {
        "defi_env": {
          "packages": ["requests", "jstyleson", "xxhash", "numpy", "psycopg2"]
        },
        "media_env": {
          // package list cut off in this listing; the component imports PIL,
          // so Pillow is needed at least
          "packages": [...]
        }
      },
      "scripts": {
        // Update the coingecko prices
        "coingecko": {
          "custom_python_env": "defi_env",
          "cmd": "${PYTHON} nvh/defi/",
          "cwd": "${PROJECT_ROOT_DIR}",
          "python_path": ["${PROJECT_ROOT_DIR}", "${NVP_ROOT_DIR}"]
        },
        "gifs": {
          "custom_python_env": "media_env",
          "cmd": "${PYTHON} ${PROJECT_ROOT_DIR}/nvh/media/",
          "python_path": ["${PROJECT_ROOT_DIR}", "${NVP_ROOT_DIR}"]
        }
      }
Note that I'm not changing the CWD for the “gifs” script above: that way I can run the script directly from inside the input folder I want to process.
  • Final step was to implement the filter_valid_files method correctly, which was in fact pretty straightforward:
        def filter_valid_files(self):
            """Filter the valid gif files from a given folder"""
            input_dir = self.get_param("input_dir")
            if input_dir is None:
                # Use the current working dir:
                input_dir = self.get_cwd()
            output_dir = self.get_param("output_dir")
            if output_dir is None:
                # We use the parent folder of the input:
                folder = self.get_filename(input_dir)
                parent_dir = self.get_parent_folder(input_dir)
                output_dir = self.get_path(parent_dir, f"{folder}_filtered")
            # logger.info("Should filter the images from %s into %s", input_dir, output_dir)
            # list all the gif files recursively
            all_files = self.get_all_files(input_dir, exp=r"\.gif", recursive=True)
            num_imgs = len(all_files)
            logger.info("Collected %d gif files", num_imgs)
            # Create the destination dir (make_folder is an assumption here,
            # the original call is missing from this listing):
            self.make_folder(output_dir)
            valid_count = 0
            # Iterate on each file:
            for i in range(num_imgs):
                fname = all_files[i]
                src_file = self.get_path(input_dir, fname)
                # Try to open that file:
                try:
                    # The "with" block closes the file again before we move it:
                    with Image.open(src_file) as img:
                        # Should have more than 1 frame:
                        nframes = getattr(img, 'n_frames', 1)
                    if nframes <= 1:
                        logger.info("%d/%d: Not enough frames in %s", i+1, num_imgs, fname)
                        continue
                    valid_count += 1
                    # Move the valid image to the output folder:
                    dest_file = self.get_path(output_dir, fname)
                    self.rename_file(src_file, dest_file, True)
                except (PIL.UnidentifiedImageError, PIL.Image.DecompressionBombError):
                    logger.info("%d/%d: Cannot open file %s", i+1, num_imgs, fname)
            # max() guards against a division by zero on an empty folder:
            logger.info("Filtered %d valid images (ie %.3f%%)", valid_count, valid_count*100.0/max(num_imgs, 1))
            return True
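The same collect-and-move loop can also be sketched without the NervProj helpers, using only the standard library. Everything here is an assumption made for illustration: filter_gifs and looks_complete are hypothetical names, and the default check is a deliberately cheap heuristic (GIF signature at the start of the file, trailer byte 0x3B at the end), not the Pillow frame count used in the method above.

```python
import os
import re
import shutil

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
GIF_TRAILER = b"\x3b"


def looks_complete(path):
    """Heuristic only: GIF signature at the start, trailer byte at the end."""
    with open(path, "rb") as handle:
        if handle.read(6) not in GIF_SIGNATURES:
            return False
        # Jump to the last byte of the file and compare it to the trailer:
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == GIF_TRAILER


def filter_gifs(input_dir, output_dir, is_valid=looks_complete):
    """Move valid .gif files into output_dir, keeping their relative paths."""
    pattern = re.compile(r"\.gif$", re.IGNORECASE)
    moved = []
    for root, _dirs, files in os.walk(input_dir):
        for name in sorted(files):
            if not pattern.search(name):
                continue
            src = os.path.join(root, name)
            if not is_valid(src):
                continue
            rel = os.path.relpath(src, input_dir)
            dest = os.path.join(output_dir, rel)
            # Recreate the sub-folder structure on the destination side:
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)
            moved.append(rel)
    return moved
```

The is_valid parameter is where a stricter test would plug in, for instance a function that opens the file with Pillow and checks n_frames as in the component. The sketch also assumes output_dir sits outside input_dir (a sibling folder), otherwise the walk could run into files it has just moved.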
  • ⇒ And this is it already: I just need to move into the root folder containing all the gif images and run the command:
    nvp run gifs filter-valid
  • This will produce a sibling folder with the suffix “_filtered” containing all the valid gif files, while the invalid files will remain in the original folder 👍!
I didn't even bother using the '--input' and '--output' command line arguments I defined above: the default behavior is fine for my usage.
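To make that default behavior concrete, here is a small pathlib sketch of how the two folders resolve when no argument is given. resolve_dirs is a hypothetical helper written for this illustration; it mirrors the logic of filter_valid_files but is not part of NervProj.

```python
from pathlib import Path


def resolve_dirs(input_dir=None, output_dir=None, cwd=None):
    """Return (input, output) paths, applying the defaults described above."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    # No input given: process the current working dir.
    src = Path(input_dir) if input_dir is not None else base
    if output_dir is not None:
        dst = Path(output_dir)
    else:
        # No output given: sibling folder with the "_filtered" suffix.
        dst = src.parent / f"{src.name}_filtered"
    return src, dst


src, dst = resolve_dirs(cwd="/data/gifs")
print(dst.as_posix())  # /data/gifs_filtered
```

So running the command from inside /data/gifs with no arguments would read from /data/gifs and write to /data/gifs_filtered.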
  • ⇒ So for once, this was really a quick project, good good 😂
  • Last modified: 2022/05/10 20:02