
NervProj: Restoring my crypto coins monitoring project

Okay so, I have a coin monitoring system I built some time ago: it's in python and it retrieves price data for many coins on a regular basis, storing that data into a postgresql database. It was producing some errors or conflicts or corruption from time to time, but on the whole it was working reasonably well. Until I decided to get rid of my fully corrupted RAID array, and thus completely messed up my main server structure lol.

Yet that tool was actually very handy (I then created a GUI on top of it to get some visuals on the coin price actions), so I think it's now high time I try and put this back on rails!

  • First I need to clarify my current status on this point, because, as usual, I have zero idea where I'm at in this mess 😅.
  • Okay, so, before, I had this cron script running every minute:
    * * * * * su kenshin -c "bash /home/kenshin/scripts/cron/every_1min_operations.sh"
  • Let's check the content of that… So, basically that script is just executing another command to collect highres data from coingecko:
    defi_gecko --updateHighResDatasets >>$lfile 2>&1
  • Let's find the defi_gecko command. [searching…] Here it is:
    defi_gecko()
    {
      cd `defi_dir`
      nv_py_setup_paths
      nv_call_python gecko.py "$@"
      cd - > /dev/null
    }
  • Now to the gecko.py file: okay, this is mostly just a launcher for the Coingecko class, which does all the heavy lifting. So the Coingecko class is the final class I should re-integrate in NervProj.
  • And in fact, this is getting me thinking a little: I could keep adding “components” into a single “NVPContext” in a given python environment,
  • But this will progressively make the application larger and larger, with more dependency packages in a single env, etc.
  • ⇒ So what I want instead is to be able to “split” this (future giant) collection of components into simpler apps in dedicated python envs.
  • In this case for instance, I need this script to work:
      "custom_python_envs": {
        "defi_env": {
          "packages": ["requests"]
        }
      },
      "scripts": {
        // Update the coingecko prices
        "defi_gecko_update": {
          "custom_python_env": "defi_env",
          "cmd": "${PYTHON} nvh/defi/coingecko.py --updateHighResDatasets",
          "cwd": "${PROJECT_ROOT_DIR}",
          "python_path": ["${PROJECT_ROOT_DIR}"]
        }
      }
  • But the launcher I'm going to build in the Coingecko file should still use an NVPContext: except that this one will run in a different python env than the main NVPContext handling the initial “nvp run defi_gecko_update” command, and it should only load the components we really need in there.
  • Let's get coding and see where we go with that… 😎
  • I thus started a minimal coingecko script:
    """Module for Coingecko class definition"""
    import logging
    
    logger = logging.getLogger(__name__)
    
    if __name__ == "__main__":
        logger.info("This is the main script for coingecko")
    
  • But then I get an error when trying to run that command:
    $ nvp run defi_gecko_update
    Traceback (most recent call last):
      File "D:\Projects\NervProj\cli.py", line 5, in <module>
        ctx.run()
      File "D:\Projects\NervProj\nvp\nvp_context.py", line 315, in run
        if comp.process_command(cmd):
      File "D:\Projects\NervProj\nvp\components\runner.py", line 39, in process_command
        self.run_script(sname, proj)
      File "D:\Projects\NervProj\nvp\components\runner.py", line 120, in run_script
        self.execute(cmd, cwd=cwd, env=env)
      File "D:\Projects\NervProj\nvp\nvp_object.py", line 383, in execute
        subprocess.check_call(cmd, stdout=stdout, stderr=stderr, cwd=cwd, env=env)
      File "D:\Projects\NervProj\tools\windows\python-3.10.1\lib\subprocess.py", line 364, in check_call
        retcode = call(*popenargs, **kwargs)
      File "D:\Projects\NervProj\tools\windows\python-3.10.1\lib\subprocess.py", line 345, in call
        with Popen(*popenargs, **kwargs) as p:
      File "D:\Projects\NervProj\tools\windows\python-3.10.1\lib\subprocess.py", line 966, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "D:\Projects\NervProj\tools\windows\python-3.10.1\lib\subprocess.py", line 1435, in _execute_child
        hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
    FileNotFoundError: [WinError 2] Le fichier spécifié est introuvable
    
  • ⇒ Actually that makes sense (“the specified file could not be found”) since I have not created the defi_env environment yet! 😅 ⇒ Let's just try to automate that in our runner component (i.e. create the pyenv if it doesn't exist yet)
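  • Just to illustrate the idea, the check basically boils down to something like this (a standalone sketch using the stdlib venv module, not the actual NVP runner code; the .pyenvs/defi_env path is only an example):
    import subprocess
    import sys
    from pathlib import Path


    def ensure_pyenv(env_dir: Path, packages: list[str]) -> Path:
        """Create the python env on the fly if it doesn't exist yet and return its python binary."""
        python = env_dir / ("Scripts/python.exe" if sys.platform == "win32" else "bin/python")
        if not python.exists():
            # The env was never created: build it now before running the script:
            subprocess.check_call([sys.executable, "-m", "venv", str(env_dir)])
            if packages:
                subprocess.check_call([str(python), "-m", "pip", "install", *packages])
        return python


    # e.g. ensure_pyenv(Path(".pyenvs/defi_env"), ["requests"])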
  • OK! Better now: script execution is OK, but I don't get anything displayed because I did not set up logging correctly yet; let's continue.
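  • For reference, a logging setup along these lines reproduces the log format you'll see further down (just my guess at a minimal config, not necessarily what NVP does internally):
    import logging

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
        datefmt="%Y/%m/%d %H:%M:%S",
    )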
  • So I made some progress on my coingecko class which currently looks like this:
    """Module for Coingecko class definition"""
    
    import logging
    from nvp.nvp_context import NVPContext
    from nvp.nvp_component import NVPComponent
    
    logger = logging.getLogger(__name__)
    
    
    class Coingecko(NVPComponent):
        """Coingecko component class"""
    
        def __init__(self, ctx: NVPContext):
            """Script runner constructor"""
            NVPComponent.__init__(self, ctx)
    
            # override the config entry with our dedicated config read
            # from the current base dir:
            self.config = self.config["coingecko"]
    
        def process_command(self, cmd):
            """Check if this component can process the given command"""
    
            if cmd == 'update-highres':
                self.update_highres_datasets()
                return True
    
            return False
    
        # def get_price_db(self, user=None, password=None):
        #     if self.priceDb is None:
        #         db_name = self.config['price_db_name']
    
        #         # logDEBUG2("Loading PriceDB from %s" % dbFile)
        #         if user is None:
        #             user = self.config['price_db_user']
        #         if password is None:
        #             password = self.config['price_db_password']
    
        #         sqldb = PostgreSQLDB(dbName=db_name, user=user, password=password)
        #         self.priceDb = PriceDB(sqldb)
    
        #     # Now return that object:
        #     return self.priceDb
    
        def get_monitored_coins(self):
            """Retrieve the full list of monitored coins"""
            return self.config["monitored_coins"]
    
        def update_highres_datasets(self, cids=None, period=60):
            """High frequency dataset update of the selected coins"""
            if cids is None:
                cids = self.get_monitored_coins()
    
            logger.info("Should update the monitored coins: %s", cids)
    
            # start_time = self.get_time()
    
            # pdb = self.get_price_db()
    
            # last_datas = {}
    
            # # check for too long gaps in the datasets:
            # now_stamp = self.get_timestamp()
    
    
    if __name__ == "__main__":
        # Create the context:
        context = NVPContext()
    
        # Add our component:
        comp = context.register_component("coingecko", Coingecko(context))
    
        context.define_subparsers("main", {'update-highres': None})
    
        # psr = context.get_parser('main')
        # psr.add_argument("--updateHighResDatasets", dest="update_highres", action="store_true",
        #                  help="Update the highres data from coingecko")
    
        comp.run()
    
  • The next step is to add support for the postgresql_db class, but this requires a dependency on the psycopg2 package (a rough sketch of that wrapper follows a bit further below)
  • Yet, of course, I don't want to keep adding more dependencies in my main NVP python environment only for those subprojects, so I definitely need a dedicated python env for this tool
  • But I also want to get the intellisense support from Visual Studio Code, so I need to set up a local python env for the NervHome project:
    $ nvp -p nvh admin init -p
      $ nvp home nvh
      # => Update the tools/requirements.txt file here.
      $ ./cli.sh --install-py-reqs
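  • As for the postgresql_db support itself, the idea is simply a thin psycopg2 wrapper, roughly in this spirit (an assumed interface for illustration, not the actual nvh.core.postgresql_db code):
    import psycopg2


    class PostgreSQLDB:
        """Bare-bones sketch of a psycopg2-backed database helper."""

        def __init__(self, db_name, user, password, host="localhost", port=5432):
            self.conn = psycopg2.connect(dbname=db_name, user=user,
                                         password=password, host=host, port=port)

        def execute(self, query, params=None):
            # Commit on success / rollback on error, and return rows for SELECT queries:
            with self.conn:
                with self.conn.cursor() as cur:
                    cur.execute(query, params)
                    return cur.fetchall() if cur.description else None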
  • OK, so after some good refactoring of the previous version of my coingecko class, I now finally have a new component for it integrated in my NervProj framework [but in a dedicated python env as described above!] yeepeee 🥳!
  • And retrieving the “highres” prices will work just fine (after a good debugging session of course):
    $ nvp run defi_gecko_update
    2022/05/07 17:52:29 [nvh.core.postgresql_db] INFO: Opening connection to crypto_prices_v2
    2022/05/07 17:52:33 [__main__] INFO: Writing 8 new price entries
    2022/05/07 17:52:33 [__main__] INFO: Done updating highres data for bitcoin
    2022/05/07 17:52:37 [__main__] INFO: Writing 8 new price entries
    2022/05/07 17:52:37 [__main__] INFO: Done updating highres data for ethereum
    2022/05/07 17:52:41 [__main__] INFO: Writing 8 new price entries
    2022/05/07 17:52:41 [__main__] INFO: Done updating highres data for litecoin
    2022/05/07 17:52:45 [__main__] INFO: Writing 8 new price entries
    2022/05/07 17:52:45 [__main__] INFO: Done updating highres data for binancecoin
    2022/05/07 17:52:48 [__main__] INFO: Writing 9 new price entries
    2022/05/07 17:52:48 [__main__] INFO: Done updating highres data for elrond-erd-2
    2022/05/07 17:52:50 [__main__] INFO: Last written timestamp for bitcoin: 1651941900
    2022/05/07 17:52:50 [__main__] INFO: Writing 1 new price observations for bitcoin
    2022/05/07 17:52:50 [__main__] INFO: Last written timestamp for ethereum: 1651941900
    2022/05/07 17:52:50 [__main__] INFO: Writing 1 new price observations for ethereum
    2022/05/07 17:52:50 [__main__] INFO: Last written timestamp for litecoin: 1651941900
    2022/05/07 17:52:50 [__main__] INFO: Writing 1 new price observations for litecoin
    2022/05/07 17:52:50 [__main__] INFO: Last written timestamp for binancecoin: 1651941900
    2022/05/07 17:52:50 [__main__] INFO: Writing 1 new price observations for binancecoin
    2022/05/07 17:52:50 [__main__] INFO: Last written timestamp for elrond-erd-2: 1651942200
    2022/05/07 17:52:50 [__main__] INFO: Updated all high res datasets in 20.780 seconds
  • Yet I find those final “Writing 1 new price observations for” lines unexpected here… maybe I should try to look a bit more carefully into this 🤔, you know, just in case…
  • OOhhh, OK, now I kind of remember how this works: the first part with the “Writing 9 new price entries” comes from the retrieval of “market range” data in case our last entry is too far away in the past. And then the single added observation comes from the price we just retrieved, which may or may not lead to a new entry. I think it sort of makes sense in the end 😁.
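  • In other words, the per-coin update flow is roughly the following (an illustrative outline only: the price_db/gecko_api helpers and their method names are stand-ins, not the real implementation):
    import time


    def update_highres_dataset(cid, price_db, gecko_api, period=60):
        """Outline of the highres update flow for a single coin."""
        now = int(time.time())
        last_stamp = price_db.get_last_timestamp(cid)

        # 1. If our last entry is too far in the past, backfill from the coingecko
        #    "market range" data => the "Writing N new price entries" lines:
        if now - last_stamp > period:
            entries = gecko_api.get_market_range(cid, last_stamp, now)
            price_db.write_entries(cid, entries)

        # 2. Then store the price we just retrieved, which may or may not
        #    produce a new row => the "Writing 1 new price observations" lines:
        price = gecko_api.get_current_price(cid)
        price_db.write_observation(cid, now, price)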
  • ⇒ So all I have left to do now is to activate running that script every 5 mins (yeah: initially I was considering running it every 1 min, but for now I don't really think I need to be that precise. And later, if I really need it, it's only a default parameter value to change in the coingecko module function):
    def update_highres_datasets(self, cids=None, period=300):
  • Created a small script every_5mins_operations.sh for this:
    source /mnt/data1/dev/projects/NervProj/cli.sh
    
    log_dir="/mnt/data1/admin/logs"
    
    lfile="${log_dir}/coingecko_highres_updates.log"
    nvp run defi_gecko_update 2>&1 | tee -a $lfile
    
  • And now deploying the support on my server machine:
    $ nvp git pull
    $ nvp -p nvh git pull
    $ nvp pyenv setup defi_env
    $ nvp run defi_gecko_update
  • Finally installing the cron script with the entry:
    */5 * * * * su kenshin -c "bash /home/kenshin/cron/every_5mins_operations.sh"
  • ⇒ Now just checking our new log file coingecko_highres_updates.log ⇒ All good in there 👍!
  • Next logical step now is to also restore the support for the history data retrieval part: basically the same thing as the highres data, but only once per day… or is it per hour? 🤔 I don't quite remember… [arrff, well, actually it's a bit of both: daily data for too old timestamps, and otherwise hourly data for recent enough timestamps (i.e. less than 3 years ago)]
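  • Put as a tiny helper, the rule I just described would look like this (purely illustrative, following my own description above, to be double checked against the real code):
    from datetime import datetime, timedelta


    def history_granularity(start_stamp: int) -> str:
        """Daily data for old start timestamps, hourly data for recent enough ones."""
        start = datetime.fromtimestamp(start_stamp)
        recent = (datetime.now() - start) < timedelta(days=3 * 365)
        return "hourly" if recent else "daily"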
  • And here there is a new small issue which I need to address: the storage of the coins from coingecko, and the retrieval of their start timestamp:
            # By default our start_stamp should be the date of release of the coin:
            start_stamp = self.getCoin(cid).getStartTimestamp()
  • ⇒ So I basically also had to restore my previous CoinDB and Coin classes.
  • And now let's create a new script to call the 'update-history' command:
        "gecko_update_history": {
          "custom_python_env": "defi_env",
          "cmd": "${PYTHON} nvh/defi/coingecko.py update-history",
          "cwd": "${PROJECT_ROOT_DIR}",
          "python_path": ["${PROJECT_ROOT_DIR}", "${NVP_ROOT_DIR}"]
        }
  • And time to check the results:
    $ nvp run gecko_update_history
    2022/05/08 15:27:47 [__main__] INFO: Default start timestamp for bitcoin is: 1367107200
    2022/05/08 15:27:47 [__main__] INFO: Creating history price table for bitcoin
    2022/05/08 15:27:51 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:27:56 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:01 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:06 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:12 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:16 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:21 [__main__] INFO: Writing 89 new price entries
    2022/05/08 15:28:26 [__main__] INFO: Writing 89 new price entries
    
  • Not bad! 🤪😎👍!
  • Next, as before, I should call this additional function once per day, let's add that in daily_operations.sh:
    lfile="${log_dir}/coingecko_history_updates.log"
    nvp run gecko_update_history 2>&1 | tee -a $lfile
    
  • Continuing on this project, I need to handle my list of “monitored coins” properly: this is currently stored in a PostgreSQL table, but in this refreshed version of my coingecko module I'm only using a small “fixed list” read from the config file.
  • I would like to restart that table and add the coins I want incrementally to keep things under control.
  • For this I need to support some additional command line arguments to:
    • List the current content of the table.
    • Remove elements from that table (drop the full table?).
    • Add elements one by one to that list.
    • Remove a single element from that list.
  • ⇒ but obviously I'm not going to create a dedicated script for each of those commands. Instead I should find a way to pass arbitrary arguments to my scripts… let's see how this can be done…
  • Okay, so that was easy in the end, now just parsing the command line arguments with support for unknown args in the NVPContext:
        def parse_args(self):
            """Parse the command line arguments"""
            # self.settings = vars(self.parsers['main'].parse_args())
            # cf. https://docs.python.org/3.4/library/argparse.html#partial-parsing
            self.settings, self.additional_args = self.parsers['main'].parse_known_args()
            self.settings = vars(self.settings)
  • And then appending those args to the script command line itself in our runner component:
            # Check if we have additional args to pass to the command:
            args = self.ctx.get_additional_args()
            if len(args) > 0:
                cmd += args
    
            # Execute that command:
            logger.info("Executing script command: %s (cwd=%s)", cmd, cwd)
            self.execute(cmd, cwd=cwd, env=env)
  • I then added a generic script defined this way:
        "coingecko": {
          "custom_python_env": "defi_env",
          "cmd": "${PYTHON} nvh/defi/coingecko.py",
          "cwd": "${PROJECT_ROOT_DIR}",
          "python_path": ["${PROJECT_ROOT_DIR}", "${NVP_ROOT_DIR}"]
        }
  • And this seems to be working just fine:
    $ nvp run coingecko update-history
    2022/05/08 19:07:07 [nvp.components.runner] INFO: Executing script command: ['D:\\Projects\\NervProj\\.pyenvs\\defi_env\\python.exe',
    'nvh/defi/coingecko.py', 'update-history'] (cwd=D:\Projects\NervHome)
    2022/05/08 19:07:09 [__main__] INFO: Default start timestamp for bitcoin is: 1367107200
    2022/05/08 19:07:13 [__main__] INFO: Writing 4 new price entries
    2022/05/08 19:07:13 [__main__] INFO: Done updating history data for bitcoin
    2022/05/08 19:07:13 [__main__] INFO: Default start timestamp for ethereum is: 1438905600
    2022/05/08 19:07:17 [__main__] INFO: Writing 4 new price entries
    2022/05/08 19:07:17 [__main__] INFO: Done updating history data for ethereum
    2022/05/08 19:07:17 [__main__] INFO: Default start timestamp for litecoin is: 1367107200
    2022/05/08 19:07:21 [__main__] INFO: Writing 4 new price entries
    2022/05/08 19:07:21 [__main__] INFO: Done updating history data for litecoin
    2022/05/08 19:07:21 [__main__] INFO: Default start timestamp for binancecoin is: 1505520000
    2022/05/08 19:07:26 [__main__] INFO: Writing 4 new price entries
    2022/05/08 19:07:26 [__main__] INFO: Done updating history data for binancecoin
    2022/05/08 19:07:26 [__main__] INFO: Default start timestamp for elrond-erd-2 is: 1599091200
    2022/05/08 19:07:30 [__main__] INFO: Writing 4 new price entries
    2022/05/08 19:07:30 [__main__] INFO: Done updating history data for elrond-erd-2
    
  • Now updating my previous cron script to replace the gecko_update_highres and gecko_update_history entries with their respective command lines directly (no need to define those specific scripts anymore): OK
  • One small issue I see with this “additional args” mechanism is that, for instance, the line nvp run coingecko --help will show the help for the run command: the --help argument is not “unknown”, so it's collected as usual… 😐 Not quite sure how to deal with this, but well, I can live with it for now anyway.
  • Update: well actually there is a simple workaround for that: I just added an optional command line arg --show-help to the runner parser, and when specified, this will add the --help to our script command line (tested and working 👍!):
            if len(args) > 0:
                cmd += args
    
            if self.get_param("show_script_help", False):
                cmd += ["--help"]
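
  • On the parser side, that flag is presumably just one extra argument registered on the runner's sub-parser, something like this (the 'main.run' parser key is an assumption on my part; the dest matches the get_param("show_script_help") call above):
            psr = self.ctx.get_parser('main.run')
            psr.add_argument("--show-help", dest="show_script_help", action="store_true",
                             help="Forward --help to the underlying script command")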
    
  • Now that I can pass arbitrary args to my scripts, I continued with adding control for the “monitored coins”:
        context.define_subparsers("main", {
            'update-highres': None,
            'update-history': None,
            'monitored': {
                'list': None,
                'add': None,
                'remove': None,
                'drop': None
            }
        })
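  • Note that the 'add' and 'remove' sub-commands also need the coin name from the command line; I would expect something along these lines next to the define_subparsers call (the parser keys are assumptions here; the dest matches the get_param("mcoin_name") calls below):
        psr = context.get_parser('main.monitored.add')
        psr.add_argument("mcoin_name", help="Comma separated list of coin ids to add")
        psr = context.get_parser('main.monitored.remove')
        psr.add_argument("mcoin_name", help="Coin id to remove from the monitored list")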
  • Implementing the corresponding action was easy:
            if cmd == 'monitored':
                cmd1 = self.ctx.get_command(1)
    
                if cmd1 == 'list':
                    # We should list all the monitored coins:
                    mcoins = self.get_coin_db().get_all_monitored_coins()
                    logger.info("List of monitored coins: %s", self.pretty_print(mcoins))
                    return True
                if cmd1 == 'remove':
                # We should remove the given coin from the monitored list:
                    mcoin = self.get_param("mcoin_name")
                    self.get_coin_db().delete_monitored_coin(mcoin)
                    logger.info("Removed monitored coin %s", mcoin)
                    return True
                if cmd1 == 'add':
                # We should add the given coin(s) to the monitored list:
                    mcoin = self.get_param("mcoin_name")
                    # Convert to list:
                    mcoin = mcoin.split(",")
                    self.get_coin_db().insert_monitored_coins(mcoin)
                    logger.info("Added monitored coins: %s", mcoin)
                    return True
                if cmd1 == 'drop':
                    logger.info("TODO: Should drop monitored coins table here.")
                    return True
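  • For completeness, the CoinDB side of those calls is basically a thin SQL wrapper, something in this spirit (the table/column names and the sqldb.execute helper are assumptions, not the actual CoinDB code):
    class CoinDB:
        """Reduced sketch of the monitored-coins part of the coin database."""

        def __init__(self, sqldb):
            # sqldb is assumed to expose an execute(query, params) helper returning rows:
            self.sql = sqldb

        def get_all_monitored_coins(self):
            rows = self.sql.execute("SELECT coin_id FROM monitored_coins ORDER BY coin_id")
            return [row[0] for row in rows]

        def insert_monitored_coins(self, coin_ids):
            for cid in coin_ids:
                self.sql.execute("INSERT INTO monitored_coins(coin_id) VALUES (%s) "
                                 "ON CONFLICT DO NOTHING", (cid,))

        def delete_monitored_coin(self, coin_id):
            self.sql.execute("DELETE FROM monitored_coins WHERE coin_id=%s", (coin_id,))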
  • And list/remove/add were tested successfully:
    $ nvp run coingecko monitored list
    2022/05/08 20:11:39 [__main__] INFO: List of monitored coins: [ 'bitcoin',
      'ethereum',
      'litecoin',
      'binancecoin',
      'elrond-erd-2',
      # ( large list here )
      'meld']
  • $ nvp run coingecko monitored remove meld
    2022/05/08 20:12:27 [__main__] INFO: Removed monitored coin meld
  • $ nvp run coingecko monitored add meld
    2022/05/08 20:12:52 [__main__] INFO: Added monitored coins: ['meld']
  • I can actually add multiple monitored coins at once by separating them with a comma on the command line.
  • Now time to use that full list of monitored coins to get the corresponding history price [this will take some time…]:
  • First, ensuring that we read the list of monitored coins from the database now:
        def get_monitored_coins(self):
            """Retrieve the full list of monitored coins"""
            # return self.config["monitored_coins"]
            return self.get_coin_db().get_all_monitored_coins()
    
  • And then executing the command to get the data:
    $ nvp run coingecko update-history
  • All good: history data restored!
  • And I actually did the same for the highres data overnight, and this is now running smoothly as well; in coingecko_highres_updates.log I get:
    2022/05/09 07:55:04 [__main__] INFO: Updated 69 highres datasets in 1.929 seconds
    2022/05/09 08:00:05 [__main__] INFO: Updated 69 highres datasets in 2.541 seconds
    2022/05/09 08:05:04 [__main__] INFO: Updated 69 highres datasets in 1.922 seconds
    2022/05/09 08:10:04 [__main__] INFO: Updated 69 highres datasets in 2.024 seconds
    2022/05/09 08:15:04 [__main__] INFO: Updated 69 highres datasets in 1.939 seconds
  • The next [critical] part to handle is the backup of those new databases.
  • First, let's check if I have an overall backup in place for my postgresql server: Yes, I have a backup element for that server in my backup config:
          // Backup of postgresql server data:
          "postgresql_server": {
            "source_dir": "/mnt/data1/containers/postgresql_server",
            "backup_dir": "${slot1}/containers",
            "repository_url": "ssh://git@gitlab.nervtech.org:22002/backups/postgresql_server.git",
            "clone_dirs": [
              "${slot1}/git/archives_1",
              "${slot2}/git/archives_1",
              "${slot3}/git/archives_1"
            ]
          },
  • So in theory, I don't really need to keep backups of the individual databases… but still, when it comes to backups, I now think that the more you have, the better, so let's have a look at how to backup individual databases anyway.
  • “Why am I getting so obsessed with backups?” you may ask… well, as I said at the beginning of this post, I got a recurring corruption issue on one disc in a RAID 6 array of 7 discs where I was basically storing all my data for, well, everything. So I kept running into new problems: broken docker containers, corrupted databases, lost files, etc… a nightmare. And I don't want this to happen to me anymore, no way lol.
  • Basically I need a command like this one I believe:
    docker exec -t postgresql_server pg_dump crypto_coins -c -U crypto_user > my_dump.sql
  • That command seems to work just fine, so now I need to integrate this as part of my “backup” component processing… let's get to work.
  • Starting with a simple backup target definition as follows:
          // Backup of individual SQL databases:
          "postgresql_crypto_databases": {
            "type": "postgresql",
            "container": "postgresql_server",
            "user": "crypto_user",
            "databases": ["crypto_coins", "crypto_prices_v2"],
            "backup_dirs": [
              "${slot1}/sql/crypto",
              "${slot2}/sql/crypto",
              "${slot3}/sql/crypto"
            ]
          },
  • And preparing the corresponding handling method in the BackupManager component:
        def backup_postgresql_databases(self, tgt_name, desc):
            """Handle a postgresql database backup target"""
    
            logger.info("Should backup postgresql databases with desc: %s", desc)
  • let's check if this works:
    $ nvp backup run postgresql_crypto_databases
    2022/05/09 08:21:58 [components.backup_manager] INFO: Should backup postgresql databases with desc: {'type': 'postgresql', 'container': '
    postgresql_server', 'user': 'crypto_user', 'databases': ['crypto_coins', 'crypto_prices_v2'], 'backup_dirs': ['${slot1}/sql/crypto', '${s
    lot2}/sql/crypto', '${slot3}/sql/crypto']}
    2022/05/09 08:21:58 [components.backup_manager] INFO: Backup target postgresql_crypto_databases processed in 0.00 seconds
  • OK, now let's add some content in there.
  • Alright, it's working just fine now 👍! Here are the main functions I added in my Backup manager component to handle this process:
        def backup_postgresql_databases(self, tgt_name, desc):
            """Handle a postgresql database backup target"""
    
            # prepare the work folder:
            backup_dirs = self.collect_valid_slot_paths(desc["backup_dirs"])
            num_bdirs = len(backup_dirs)
            if num_bdirs == 0:
                logger.warning("No valid backup dir provided for %s", tgt_name)
                return
    
            logger.info("postgresql db backup dirs: %s", backup_dirs)
    
            # Prepare the work folder:
            work_dir = backup_dirs[0]
    
            # Ensure our work folder is created:
            self.make_folder(work_dir)
    
            # Iterate on each database:
            dbs = desc["databases"]
            user = desc["user"]
            container = desc["container"]
            now = self.get_now()
            now_str = now.strftime("%Y%m%d_%H%M%S")
    
            tools = self.get_component('tools')
    
            btype = self.get_rolling_backup_type(now.date())
            suffix = f"{btype}_{now_str}.sql"
    
            for db_name in dbs:
                # Prepare the command line:
                cmd = ["docker", "exec", "-t", container, "pg_dump", db_name, "-c", "-U", user]
                filename = f"{db_name}_{suffix}"
                outfile = self.get_path(work_dir, filename)
    
                logger.info("Dumping database %s...", db_name)
    
                with open(outfile, "w", encoding="utf-8") as file:
                    self.execute(cmd, outfile=file)
    
                # Next we should compress that file:
                logger.info("Generating archive for %s...", db_name)
    
                pkg_name = f"{filename}.tar.xz"
                tools.create_package(outfile, work_dir, pkg_name)
    
                # remove the source sql file:
                self.remove_file(outfile)
                pkg_file = self.get_path(work_dir, pkg_name)
    
                dest_folder = self.get_path(work_dir, db_name)
                self.make_folder(dest_folder)
                dest_file = self.get_path(dest_folder, pkg_name)
                self.rename_file(pkg_file, dest_file)
    
                self.remove_old_backup_files(dest_folder)
    
                # Finally we also copy that file into the additional backup slots:
                for i in range(1, num_bdirs):
                    bck_dir = backup_dirs[i]
                    dest_folder = self.get_path(bck_dir, db_name)
                    self.make_folder(dest_folder)
    
                    logger.debug("Copying %s into %s", pkg_name, dest_folder)
                    self.copy_file(dest_file, self.get_path(dest_folder, pkg_name))
    
                    # Remove the old backups there too:
                    self.remove_old_backup_files(dest_folder)
    
        def get_rolling_backup_type(self, date):
            """Retrieve the type of rolling backup based on the given date"""
            if date.day == 1:
                # first day of the month:
                return "mbak"
    
            if date.weekday() == 0:
                # This is a monday:
                return "wbak"
    
            # ordinary day backup:
            return "dbak"
    
        def remove_old_backup_files(self, folder):
            """Remove the too old backup files in a given folder"""
    
            files = self.get_all_files(folder, recursive=True)
    
            now_stamp = self.get_timestamp()
    
            # Iterate on each file:
            for fname in files:
    
                if "_mbak_" in fname:
                    # keep for 366 days!
                    offset = 366*24*3600
                elif "_wbak_" in fname:
                    # keep for 28 days:
                    offset = 28*24*3600
                elif "_dbak_" in fname:
                    # Keep the files for 7 days
                    offset = 7*24*3600
                else:
                    logger.warning("Ignoring non-rolling backup file %s", fname)
                    continue
    
                min_stamp = now_stamp - offset
    
                # Check the file timestamp:
                file_path = self.get_path(folder, fname)
                mtime = self.get_file_mtime(file_path)
                if mtime < min_stamp:
                    logger.info("Removing old backup file %s", file_path)
                    self.remove_file(file_path)
                else:
                    ndays = (mtime - min_stamp)/(24*3600)
                    logger.debug("Keeping %s for %.3f more days", file_path, ndays)
  • A bit long and complex, sure, but it's doing its job:
    • Dumping the databases to an SQL file,
    • Generating a tar.xz of that sql dump,
    • Placing that into the correct backup folder,
    • Duplicating the backup file in all the provided backup slots,
    • Removing the old backup files from all those slots.
  • I also considered generating PAR2 recovery files for those SQL archives, with something like that:
    # Create The par files:
    redun = 10
    pkg_file = self.get_path(work_dir, pkg_name)
    fsize = self.get_file_size(pkg_file)
    
    # We allocate 1 block per 1kB with min=10, max=3000
    nblocks = max(min(fsize // 1024, 3000), 10)
    logger.info("Generating PAR2 files with %d source blocks...", nblocks)
    tools.create_par2_archives(pkg_file, redun, nblocks)
  • But in the end I think this is a bit overkill, so I removed that layer ;-).
  • This post is getting a bit long now, so I will stop it here for this time.
  • I now have the history and highres data stored on my postgresql server again (and continuously updated), so naturally the next step will be to restore the GUI application itself, so I can finally get back to crying in front of all the red graphs I will get out of this 🤣 [yeahh… it's the 10th of May 2022, crypto currencies are getting completely destroyed right now… UST depeg, BTC -20%, etc… you know: the usual fun part lol]
  • ⇒ So, see you next time ✌!