
Analysis sessions space usage #38

Open
153957 opened this issue May 15, 2013 · 5 comments

Comments

153957 (Member) commented May 15, 2013

The coincidence events use a lot of space (probably due to their traces):

$ du -h /var/lib/mysql/publicdb/*.MYD
779M    coincidences_event.MYD
659M    histograms_dailydataset.MYD
302M    histograms_dailyhistogram.MYD
<25M    [rest]

We should clean up events after they have been analysed or when a session has ended, since they will not be used again.

The exception is perhaps the 'Get Example' option in jSparc, for which there should always be some coincidences/events available (make a couple of never-ending 'ghost' sessions? a sketch of that idea follows below).
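A rough, purely hypothetical sketch of the 'ghost' session idea: give the designated example sessions an end date far in the future, so any cleanup keyed on the ends field skips them. The title field used to select the example sessions is an assumption, not an existing convention.

import datetime

from publicdb.analysissessions.models import AnalysisSession

# Hypothetical: make the example sessions effectively never-ending by
# pushing their end date far into the future; selecting them by title
# is an assumed convention.
far_future = datetime.datetime(2100, 1, 1)
AnalysisSession.objects.filter(title__startswith='Example').update(ends=far_future)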

153957 (Member, Author) commented May 16, 2013

The following script can delete events from expired sessions in one go.
However, it can use a lot of memory if there are many qualifying events:
there are currently more than 180,000 deletable events,
which does not fit in the memory of our server.
A top-down method (below) will probably work better.

import datetime

from publicdb.coincidences.models import Event

# Select all events belonging to coincidences from sessions that have ended.
now = datetime.datetime.now()
events = Event.objects.filter(coincidence__analyzedcoincidence__session__ends__lt=now)
# A single queryset delete: Django first collects the objects (and their
# related objects) in memory, which is what makes this so expensive.
events.delete()

The following might work better; it has a lot of loop overhead, but at least it should fit in memory:

import datetime

from publicdb.analysissessions.models import AnalysisSession

# Walk all expired sessions and delete their events one at a time:
# slow, but the memory footprint stays small and constant.
now = datetime.datetime.now()
old_sessions = AnalysisSession.objects.filter(ends__lt=now)

for session in old_sessions:
    for coincidence in session.analyzedcoincidence_set.all():
        for event in coincidence.coincidence.events.all():
            event.delete()
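A possible middle ground, not used here, would be to delete in fixed-size batches of primary keys: one DELETE per thousand events instead of one per event, while still keeping memory bounded. A minimal, untested sketch (the batch size of 1000 is arbitrary):

import datetime

from publicdb.coincidences.models import Event

now = datetime.datetime.now()
expired = Event.objects.filter(coincidence__analyzedcoincidence__session__ends__lt=now)

# Repeatedly fetch a slice of primary keys and delete only those rows, so
# at most one batch of events is collected in memory at a time.  Each pass
# re-evaluates the queryset, so it picks up the next batch automatically.
while True:
    pks = list(expired.values_list('pk', flat=True)[:1000])
    if not pks:
        break
    Event.objects.filter(pk__in=pks).delete()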

153957 (Member, Author) commented May 29, 2013

The second method works nicely, i.e. it does not crash the server.

After that I optimized the table to reclaim disc space:
optimize table coincidences_event;

779M -> 137M

This should be set up to happen automatically.
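A sketch of how that automation could look, assuming a Django management command (the command itself and the OPTIMIZE step are not existing code), scheduled e.g. nightly from cron as python manage.py clean_analysis_sessions:

import datetime

from django.core.management.base import BaseCommand
from django.db import connection

from publicdb.analysissessions.models import AnalysisSession


class Command(BaseCommand):
    """Hypothetical clean_analysis_sessions management command."""

    help = 'Delete events from expired analysis sessions and reclaim disc space'

    def handle(self, *args, **options):
        now = datetime.datetime.now()
        for session in AnalysisSession.objects.filter(ends__lt=now):
            for coincidence in session.analyzedcoincidence_set.all():
                # One DELETE per coincidence: small batches, bounded memory.
                coincidence.coincidence.events.all().delete()
        # MyISAM only returns disc space to the OS after an explicit OPTIMIZE.
        cursor = connection.cursor()
        cursor.execute('OPTIMIZE TABLE coincidences_event')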

153957 (Member, Author) commented Apr 15, 2014

Perhaps this should be rethought when #43 is finished.
It might require some redesign of jSparc, but it would be more flexible and faster, since the coincidences will already be available.

tomkooij (Member) commented Mar 1, 2019

ATM the SQL dump is too big to import into a publicdb VM! A database import runs out of disk space, even after deleting the 2 GB .conda/pkgs cache.
As the VM's disk is already 10 GB, I do not want to increase it further.

/srv/publicdb/publicdb_venv> sudo su - postgres
Last login: Sun Mar  3 15:08:03 CET 2019 on pts/0
-bash-4.2$ psql
psql (9.6.12)
Type "help" for help.

postgres=# \c publicdb
You are now connected to database "publicdb" as user "postgres".
publicdb=# \dt+
                                      List of relations
 Schema |                  Name                  | Type  |  Owner  |    Size    | Description
--------+----------------------------------------+-------+---------+------------+-------------
 public | analysissessions_analysissession       | table | hisparc | 72 kB      |
 public | analysissessions_analyzedcoincidence   | table | hisparc | 9720 kB    |
 public | analysissessions_sessionrequest        | table | hisparc | 80 kB      |
 public | analysissessions_student               | table | hisparc | 112 kB     |
[...]
 public | coincidences_coincidence               | table | hisparc | 10048 kB   |
 public | coincidences_event                     | table | hisparc | 967 MB     |
[...]
 public | histograms_configuration               | table | hisparc | 119 MB     |
 public | histograms_dailydataset                | table | hisparc | 108 MB     |
 public | histograms_dailyhistogram              | table | hisparc | 97 MB      |
[...]
 public | histograms_multidailydataset           | table | hisparc | 419 MB     |
 public | histograms_multidailyhistogram         | table | hisparc | 1202 MB    |
 public | histograms_networkhistogram            | table | hisparc | 6720 kB    |
 public | histograms_networksummary              | table | hisparc | 272 kB     |
 public | histograms_stationtimingoffset         | table | hisparc | 9864 kB    |
 public | histograms_summary                     | table | hisparc | 15 MB      |

Again coincidences_event is very large.

tomkooij (Member) commented Mar 1, 2019

I deleted most of the events, using the very slow script below. (The script above doesn't work anymore.)

from publicdb.analysissessions.models import AnalyzedCoincidence

# Delete the events of every analyzed coincidence whose session is over.
# Slow: one in_progress() check per coincidence and one DELETE per event.
for coinc in AnalyzedCoincidence.objects.all():
    if not coinc.session.in_progress():
        for event in coinc.coincidence.events.all():
            print('about to delete:', event)
            event.delete()
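If in_progress() boils down to comparing the session's ends field to the current time (an assumption), the check could be pushed into the database, avoiding most of the per-row overhead. A rough, untested sketch:

import datetime

from publicdb.analysissessions.models import AnalyzedCoincidence

# Assumes "not in progress" is equivalent to session.ends < now.
now = datetime.datetime.now()
finished = AnalyzedCoincidence.objects.filter(session__ends__lt=now)
for coinc in finished.select_related('coincidence').iterator():
    # One DELETE per coincidence instead of one per event.
    coinc.coincidence.events.all().delete()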

This deleted >800 MB in coincidences_event:

public | coincidences_coincidence               | table | hisparc | 10056 kB |
public | coincidences_event                     | table | hisparc | 110 MB   |

Next I vacuumed the db:

sudo su - postgres
-bash-4.2$ vacuumdb -a
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "publicdb"
vacuumdb: vacuuming database "template1"
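Note that a plain vacuumdb -a only marks dead rows as reusable within the table files; it does not normally return disc space to the OS. To actually shrink coincidences_event on disc, a VACUUM FULL would be needed (e.g. vacuumdb --full --table coincidences_event publicdb, run as the postgres user), the PostgreSQL counterpart of the OPTIMIZE TABLE step above.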
