
# Pruning entries from an SQLite database

Hey hey! So, a very quick/short article this time, because this is a bit urgent in fact: I have an SQLite database that I'm accessing in Python, and I write cryptocurrency prices into that database… for about 45 currencies so far… every minute… lol.

So, yeah… the idea was to collect enough data to be able to perform some advanced/precise analysis on about 7 days… except that I've been collecting data for a lot more than just 7 days now 😁 So my database is getting bigger and bigger every day! It's not extraordinarily big, sure, but it's already about 130MB for a single database file, so I really think I should try to keep this under control somehow.

The solution is relatively simple: every time I write a new set of price entries, I should then also delete the entries that are too old. ⇒ So how do we do that? [… burning a few neurons here… but not too many really… 🤔] And here we go! I just created a simple function in my PriceDB class:

    def deleteOldHighResRecords(self, ts):
        # Count how many rows will be removed, for logging purposes:
        cur = self.execute("SELECT COUNT(*) from highres WHERE timestamp < %d;" % ts)
        num = cur.fetchone()[0]
        logDEBUG("Deleting %d rows from highres dataset." % num)
        # Now actually delete those rows and commit the change:
        self.execute("DELETE from highres WHERE timestamp < %d;" % ts, commit=True)
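For readers without my PriceDB wrapper (its `execute(…, commit=True)` helper isn't shown here), a minimal standalone sketch of the same idea with the plain `sqlite3` module could look like this — using parameterized queries for the timestamp value, which is generally preferable to `%`-formatting it into the SQL text (the table name, `highres` here, still has to be interpolated since placeholders only work for values):

```python
import sqlite3

def delete_old_records(conn, table, cutoff_ts):
    """Hypothetical standalone version of deleteOldHighResRecords:
    count, log, and delete all rows older than cutoff_ts."""
    # Placeholders (?) only work for values, so the table name is
    # interpolated; cutoff_ts is passed as a bound parameter instead.
    cur = conn.execute("SELECT COUNT(*) FROM %s WHERE timestamp < ?" % table,
                       (cutoff_ts,))
    num = cur.fetchone()[0]
    print("Deleting %d rows from %s dataset." % (num, table))
    conn.execute("DELETE FROM %s WHERE timestamp < ?" % table, (cutoff_ts,))
    conn.commit()
    return num
```

This is just a sketch under those assumptions, of course — in my actual class the connection handling and logging are done by the PriceDB helpers.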

And then I simply call this function on each one-minute cycle, once I'm done adding the new rows to the database:

    # Here we should also remove the entries that are too old:
    # We remove all the entries older than 8 days:
    pdb.deleteOldHighResRecords(nowStamp - 8*24*3600)
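In case the cutoff computation isn't obvious: `nowStamp` is just the current Unix timestamp in my code, and the retention window is expressed in seconds. A tiny hypothetical helper (the name `cutoff_timestamp` and the `RETENTION_DAYS` constant are my own for this sketch) would be:

```python
import time

RETENTION_DAYS = 8  # keep 8 days of minute-resolution price data

def cutoff_timestamp(now=None):
    """Oldest Unix timestamp we keep: anything before this gets pruned."""
    now = time.time() if now is None else now
    return now - RETENTION_DAYS * 24 * 3600
```

So with 45 currencies written once per minute, 8 days of retention caps the table at roughly 45 × 60 × 24 × 8 ≈ 518k rows, which is what keeps the file size bounded.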

Then, checking the outputs I get in my highres_price_dataset.log file, we see that the first pass was a bit intensive (deleting more than 603k records), but then it quickly stabilizes to deleting only 1 row per currency on each cycle:

    2021-06-12T16:26:11.898338 [DEBUG] Updated all high res datasets in 8.081 seconds
    2021-06-12T16:29:03.789121 [DEBUG] Deleting 603834 rows from highres dataset.
    2021-06-12T16:30:09.625585 [DEBUG] Updated all high res datasets in 67.816 seconds
    2021-06-12T16:31:06.311368 [DEBUG] Deleting 94 rows from highres dataset.
    2021-06-12T16:31:06.633544 [DEBUG] Updated all high res datasets in 4.774 seconds
    2021-06-12T16:32:03.574602 [DEBUG] Deleting 47 rows from highres dataset.
    2021-06-12T16:32:10.712554 [DEBUG] Updated all high res datasets in 8.377 seconds
    2021-06-12T16:33:11.018032 [DEBUG] Deleting 47 rows from highres dataset.
    2021-06-12T16:33:13.725622 [DEBUG] Updated all high res datasets in 11.107 seconds
    2021-06-12T16:34:03.695258 [DEBUG] Deleting 47 rows from highres dataset.
    2021-06-12T16:34:03.885305 [DEBUG] Updated all high res datasets in 1.326 seconds
    2021-06-12T16:35:08.617795 [DEBUG] Deleting 47 rows from highres dataset.
    2021-06-12T16:35:12.484663 [DEBUG] Updated all high res datasets in 10.447 seconds
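One caveat worth keeping in mind though, since the whole point was that 130MB file: as far as I know, DELETE in SQLite only marks the pages as free for reuse, so the file itself won't actually shrink after that big initial prune. If reclaiming the disk space matters, one could run a VACUUM once to rebuild the file — a minimal sketch (the function name is mine):

```python
import sqlite3

def compact_database(path):
    """Rebuild the database file to reclaim the space freed by DELETEs.
    (DELETE alone only marks pages as free; the file keeps its size.)
    Note: VACUUM must run outside any open transaction, so we use a
    fresh connection with nothing pending on it."""
    conn = sqlite3.connect(path)
    conn.execute("VACUUM")
    conn.close()
```

With the pruning in place, the freed pages get reused by the new minute-by-minute inserts anyway, so a one-off VACUUM after the first big delete should be enough.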

And well… this is it 😊! For once, I told you this was going to be a very short one, and I didn't lie in the end hi hi hi 🤣
