Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework move logic #189

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

GuillaumeSeren
Copy link
Collaborator

Following the #188 issue, we needed to change move() function,
to be able to move, and upgrade the database a bit more lightly:

  • Copy the original file to the destination
  • Add the new file to the database
  • Remove the original file
  • Remove the original mail from db

Copy link
Member

@flokli flokli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR! Please have a look at the comments.

# Open DB R/W if are not in dry-run
if not self.dry_run:
self.db = notmuch.Database(self.db_path,
mode=notmuch.Database.MODE.READ_WRITE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set database mode inside __init__, as we already know there if we want run in dry-run or not?
This would avoid opening the database a second time here…

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will move this to the __init__

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So about this point, I have tested several version, but if I open the database READ_WRITE,
in the __init__ the call hang after 2 requests, and get stuck, I have to kill manually.

$ afew --move-mails -vvvvvvvv --all
INFO:root:checking mails in 'Perso/INBOX'                                                                                                                 
DEBUG:root:query: folder:Perso/INBOX AND tag:spam AND 576540000..1522686227                                                                               
DEBUG:root:query: folder:Perso/INBOX AND tag:sent AND 576540000..1522686227

So maybe it comes with my own version of notmuch / Xappian but unless I find a way to make it work, I think it is better to let the DB READ_ONLY all the time and just switch to READ_WRITE when needed (like I finally did), but it would nicer to have it setup only once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hanging while trying to open the database r/w a second time looks like the previous connection wasn't closed (since notmuch 0.23):

When opening a database notmuch by default will wait for another process to release a write lock, rather than returning an error.

except shutil.SameFileError:
logging.warn("trying to move '{}' onto itself".format(fname))
continue
logging.info("Copy original mail: "+fname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use .format instead of concatenating strings: https://pyformat.info/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will upgrade those

logging.info("Copy to: "+destination)
dbpath = self.db.get_path()
logging.info("Path of DB: "+dbpath)
relpath=os.path.relpath(os.path.dirname(destination), dbpath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo - should be realpath and os.path.realpath.
Is realpath calculation even necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the beginning I was trying to use a relative path as explained in the doc, but after some time stuck with that I finally test and successfully made it work with full path.
So no this is not more necessary I will remove it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, there's a os.path.relpath as well - didn't know that.
But if the code now works with both relative and absolute paths, we shouldn't do this here, right.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I discovered relpath aswell really useful to generate relative path.
I agree we can get rid of that now we only use absolute path.

shutil.copy2(fname, self.get_new_name(
fname,
destination))
dest_file_rel = dbpath+'/'+relpath+'/'+filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use os.path.join, or even better pathlib to join paths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will upgrade that

# 3 REMOVE ORIGINAL
logging.info("Delete the original mail in DB")
# 4 REMOVE ORIGINAL FROM DB
notmuch.Database.remove_message(self.db, fname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do favor some logging, this whole block seems to be wayy to extensive.

Most of the statements are merely printing out variable values. Maybe one or two lines with loglevel debug (something like {} -> {}.format(oldpath, newpath)`) should be enough

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree but I will rework them to be more precise

# this is ugly, but shutil does not provide more
# finely individuated errors
if str(e).endswith("already exists"):
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with this line? Do we already check somewhere else if source and destination are the same, which is why we skip that check?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed that because this will be logged by shutil log and I think if you try to copy over the same thing it should be an error from shutil and/or your FS but I will add a test with a similar message for that case.

@@ -59,7 +62,7 @@ def move(self, maildir, rules):
messages = notmuch.Query(self.db, main_query).search_messages()
for message in messages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add below per-message code into a self-contained staticmethod? This would greatly improve readability and testability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure to fully understand, but if you talk about export the whole per-message process I agree, I think the whole class need some refactor, for example the move() method is also searching the messages which is problematic to add tests and keep simple.

So I have project to refactor the class to be more simple, and support some unit tests, but as the mail mover is a big feature I think it would be safer to do that after.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, code in the loop body should go inside it's own staticmethod, for better testability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I agree the thing is I feel it would be simpler to use the function move() to do the actual moving operation, and create an other function like getMessages() to do the notmuch query and pass the message list to the move(), that way it could be easy tested by using mock data.

@GuillaumeSeren
Copy link
Collaborator Author

GuillaumeSeren commented Apr 2, 2018

Hey @flokli I have upgraded the branch I have implemented all your feedback maybe beside the open connection ot the database in the __init__ and also tested it with a lot of mails and it work fine.
If we agree on the design I will rebase the branch to get the commit a bit cleaner.

str(
uuid.uuid1())
+ ':'
+ os.path.basename(fname).split(':')[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use PyFormat here as well?

# construct a new filename, composed of a made-up ID
# and the flags part of the original filename
filename = '{}:{}'.format(uuid.uuid1(),
                          os.path.basename(fname).split(':')[-1])
return os.path.join(destination, filename)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, I just upgraded it.

@flokli
Copy link
Member

flokli commented Apr 8, 2018

Already looks much cleaner, thanks!
Can you squash some of the follow-up commits together with the ones were I commented on? This makes reviewing a lot easier 👍
Additionally, in d3d62af you can use %-style format strings, and pass variables to the logging.* method directly, so we can strip some of the .format calls, too.

With that being done, I can have another look in why we need to open the database again for R/W.

@GuillaumeSeren
Copy link
Collaborator Author

@flokli I am squashing the non moveMail() related commit.

Not really sure about the "you can use %-style format strings, and pass variables to the logging.* method directly, " but I have cleanup a bit of messages in the log.

@GuillaumeSeren
Copy link
Collaborator Author

Hey @flokli
If you do not have more feedback to fix in this patch I think we could push it,
sure there is no rush but there is a few other PR like #201 that should also appends,
ideally after this one, because it would force us to restart the review.

def moveMail(self, mailFrom, mailTo, upgradeDatabase):
output = False
""" We need to check that we are not moving a file to itself """
if mailFrom == mailTo:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mailFrom == mailTo should not be an error. But mailTo is a dir, while mailFrom is a file. So: can this happen? If not, it should not be considered, either.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure I will rebase the branch and change that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,
after rebasing the branch, and re-reading I aggree we need to be sure that:

  • mailFrom is a file AND mailTo is a directory

If this is not I will raise an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see those:

            try:
                " mailFrom must be a file "
                if not os.path.isfile(mailFrom):
                    logging.error("mailFrom not a file: {}".format(mailFrom))
                    raise SystemExit
                " mailTo must be a directory "
                if not os.path.isdir(mailTo):
                    logging.error("mailTo not a dir: {}".format(mailTo))
                    raise SystemExit

So those tests already exist (sorry took me a second reading to be sure),
so duplicating those is not a good idea, and moving them end in the same state.

So tell me @rpuntaie

Copy link

@rpuntaie rpuntaie Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the commits I referred to have gone. There is no mailFrom anywhere anymore. So this seems to be obsolete.

As commented further below: tests are absolutely needed, to avoid breaking established behavior. E.g. I've seen {}/{}/cur/ is now {}/{}, which is an interface change: The user would have to change its config.

def move(self, maildir, rules):
'''
Move mails in folder maildir according to the given rules.
'''
# identify and move messages
logging.info("checking mails in '{}'".format(maildir))
to_delete_fnames = []
moved = False
for query in rules.keys():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you assure that the mailbox dirs do exist. I would like to only change the config, without having to additionally create a new directory manually:

    for ncs in 'tmp new cur'.split(): 
        destination = os.path.join(self.db_path,rules[query],ncs)
        os.makedirs(destination, mode=0o700, exist_ok=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that point I am not sure that behavior should be default,
to my pov, the mailmover should move from existing mailbox to an other (existing).

I am not sure, this case may append often, but if there is a real use case behind this,
we could easily add this check, feel free to add more details @rpuntaie maybe I am missing something.

@GuillaumeSeren
Copy link
Collaborator Author

GuillaumeSeren commented Sep 18, 2019

So, I have merged #200 and #201 this patch need some rebase / cleanup,
I will do it asap, maybe we can squeeze it into the next release (2.0.1).

Fix multi import
Cleanup flake8 error
Refactor MailMover.move() logic
Fix import order
Cleanup class linter error
@GuillaumeSeren
Copy link
Collaborator Author

GuillaumeSeren commented Sep 18, 2019

Ok I have rebased the branch on the actual master,
I need help to test / review the patch.

Send reviews !

@flokli
Copy link
Member

flokli commented Sep 19, 2019 via email

@GuillaumeSeren
Copy link
Collaborator Author

Hey @flokli ,
No need to be sorry, what we need is a project that can evolve and keep it's functionnal state,
yes having tests is better than not, no one is discussing that.

I will work on that and add a test around the MailMover, this issue is real and has to fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants