
Analysis sessions space usage #38

Open
153957 opened this issue May 15, 2013 · 5 comments

Comments

153957 (Member) commented May 15, 2013

The coincidence events use a lot of space (probably due to their traces):

$ du -h /var/lib/mysql/publicdb/*.MYD
779M    coincidences_event.MYD
659M    histograms_dailydataset.MYD
302M    histograms_dailyhistogram.MYD
<25M    [rest]

We should clean up events after they have been analysed or when a session has ended, since they will not be used again.

The exception is perhaps the 'Get Example' option in jSparc, for which there should always be some coincidences/events available (make a couple of never-ending 'ghost' sessions? a sketch of that idea follows below).
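A rough, purely hypothetical sketch of the 'ghost' session idea: give the designated example sessions an end date far in the future, so any cleanup keyed on the ends field skips them. The title field used to select the example sessions is an assumption, not an existing convention.

import datetime

from publicdb.analysissessions.models import AnalysisSession

# Hypothetical: make the example sessions effectively never-ending by
# pushing their end date far into the future; selecting them by title
# is an assumed convention.
far_future = datetime.datetime(2100, 1, 1)
AnalysisSession.objects.filter(title__startswith='Example').update(ends=far_future)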

153957 (Member, Author) commented May 16, 2013

The following script can delete events from expired sessions in one go.
However, it can use a lot of memory if there are many qualifying events:
there are currently more than 180,000 deletable events,
which does not fit in the memory of our server.
A top-down method (below) will probably work better.

import datetime

from publicdb.coincidences.models import Event

# Select all events belonging to coincidences from sessions that have ended.
now = datetime.datetime.now()
events = Event.objects.filter(coincidence__analyzedcoincidence__session__ends__lt=now)
# A single queryset delete: Django first collects the objects (and their
# related objects) in memory, which is what makes this so expensive.
events.delete()

The following might work better; it has a lot of loop overhead, but at least it should fit in memory:

import datetime

from publicdb.analysissessions.models import AnalysisSession

# Walk all expired sessions and delete their events one at a time:
# slow, but the memory footprint stays small and constant.
now = datetime.datetime.now()
old_sessions = AnalysisSession.objects.filter(ends__lt=now)

for session in old_sessions:
    for coincidence in session.analyzedcoincidence_set.all():
        for event in coincidence.coincidence.events.all():
            event.delete()
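A possible middle ground, not used here, would be to delete in fixed-size batches of primary keys: one DELETE per thousand events instead of one per event, while still keeping memory bounded. A minimal, untested sketch (the batch size of 1000 is arbitrary):

import datetime

from publicdb.coincidences.models import Event

now = datetime.datetime.now()
expired = Event.objects.filter(coincidence__analyzedcoincidence__session__ends__lt=now)

# Repeatedly fetch a slice of primary keys and delete only those rows, so
# at most one batch of events is collected in memory at a time.  Each pass
# re-evaluates the queryset, so it picks up the next batch automatically.
while True:
    pks = list(expired.values_list('pk', flat=True)[:1000])
    if not pks:
        break
    Event.objects.filter(pk__in=pks).delete()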

153957 (Member, Author) commented May 29, 2013

The second method works nicely, i.e. it does not crash the server.

After that I optimized the table to reclaim disc space:
optimize table coincidences_event;

779M -> 137M

This should be set up to happen automatically.
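A sketch of how that automation could look, assuming a Django management command (the command itself and the OPTIMIZE step are not existing code), scheduled e.g. nightly from cron as python manage.py clean_analysis_sessions:

import datetime

from django.core.management.base import BaseCommand
from django.db import connection

from publicdb.analysissessions.models import AnalysisSession


class Command(BaseCommand):
    """Hypothetical clean_analysis_sessions management command."""

    help = 'Delete events from expired analysis sessions and reclaim disc space'

    def handle(self, *args, **options):
        now = datetime.datetime.now()
        for session in AnalysisSession.objects.filter(ends__lt=now):
            for coincidence in session.analyzedcoincidence_set.all():
                # One DELETE per coincidence: small batches, bounded memory.
                coincidence.coincidence.events.all().delete()
        # MyISAM only returns disc space to the OS after an explicit OPTIMIZE.
        cursor = connection.cursor()
        cursor.execute('OPTIMIZE TABLE coincidences_event')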

153957 (Member, Author) commented Apr 15, 2014

Perhaps this should be rethought when #43 is finished.
It might require some redesign of jSparc, but it would be more flexible and faster, since the coincidences will already be available.

tomkooij (Member) commented Mar 1, 2019

ATM the SQL dump is too big to import into a publicdb VM! A database import runs out of disk space, even after deleting the 2 GB .conda/pkgs cache.
As the VM's disk is already 10 GB, I do not want to increase it further.

/srv/publicdb/publicdb_venv> sudo su - postgres
Last login: Sun Mar  3 15:08:03 CET 2019 on pts/0
-bash-4.2$ psql
psql (9.6.12)
Type "help" for help.

postgres=# \c publicdb
You are now connected to database "publicdb" as user "postgres".
publicdb=# \dt+
                                      List of relations
 Schema |                  Name                  | Type  |  Owner  |    Size    | Description
--------+----------------------------------------+-------+---------+------------+-------------
 public | analysissessions_analysissession       | table | hisparc | 72 kB      |
 public | analysissessions_analyzedcoincidence   | table | hisparc | 9720 kB    |
 public | analysissessions_sessionrequest        | table | hisparc | 80 kB      |
 public | analysissessions_student               | table | hisparc | 112 kB     |
[...]
 public | coincidences_coincidence               | table | hisparc | 10048 kB   |
 public | coincidences_event                     | table | hisparc | 967 MB     |
[...]
 public | histograms_configuration               | table | hisparc | 119 MB     |
 public | histograms_dailydataset                | table | hisparc | 108 MB     |
 public | histograms_dailyhistogram              | table | hisparc | 97 MB      |
[...]
 public | histograms_multidailydataset           | table | hisparc | 419 MB     |
 public | histograms_multidailyhistogram         | table | hisparc | 1202 MB    |
 public | histograms_networkhistogram            | table | hisparc | 6720 kB    |
 public | histograms_networksummary              | table | hisparc | 272 kB     |
 public | histograms_stationtimingoffset         | table | hisparc | 9864 kB    |
 public | histograms_summary                     | table | hisparc | 15 MB      |

Again coincidences_event is very large.

tomkooij (Member) commented Mar 1, 2019

I deleted most of the events, using the very slow script below. (The script above doesn't work anymore.)

from publicdb.analysissessions.models import AnalyzedCoincidence

# Delete the events of every analyzed coincidence whose session is over.
# Slow: one in_progress() check per coincidence and one DELETE per event.
for coinc in AnalyzedCoincidence.objects.all():
    if not coinc.session.in_progress():
        for event in coinc.coincidence.events.all():
            print('about to delete:', event)
            event.delete()
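If in_progress() boils down to comparing the session's ends field to the current time (an assumption), the check could be pushed into the database, avoiding most of the per-row overhead. A rough, untested sketch:

import datetime

from publicdb.analysissessions.models import AnalyzedCoincidence

# Assumes "not in progress" is equivalent to session.ends < now.
now = datetime.datetime.now()
finished = AnalyzedCoincidence.objects.filter(session__ends__lt=now)
for coinc in finished.select_related('coincidence').iterator():
    # One DELETE per coincidence instead of one per event.
    coinc.coincidence.events.all().delete()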

This deleted >800 MB in coincidences_event:

public | coincidences_coincidence               | table | hisparc | 10056 kB |
public | coincidences_event                     | table | hisparc | 110 MB   |

Next I vacuumed the db:

sudo su - postgres
-bash-4.2$ vacuumdb -a
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "publicdb"
vacuumdb: vacuuming database "template1"
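Note that a plain vacuumdb -a only marks dead rows as reusable within the table files; it does not normally return disc space to the OS. To actually shrink coincidences_event on disc, a VACUUM FULL would be needed (e.g. vacuumdb --full --table coincidences_event publicdb, run as the postgres user), the PostgreSQL counterpart of the OPTIMIZE TABLE step above.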
