NPI-3197 Transfer download functions from ginan repo #18

Open
wants to merge 8 commits into
base: main

ronaldmaj
Collaborator

Recently we had an update to the auto_download_PPP.py script over on the Ginan repo. A number of new and useful functions were added there but have a more appropriate home here on gnssanalysis.

This PR transfers some of those useful functions from https://github.com/GeoscienceAustralia/ginan/blob/develop-weekly/scripts/auto_download_PPP.py to gn_download.py.

This should be the first in a series of PRs where we slowly move to using the newer download functions. This will include:

  • moving existing utilities that use the old download functions to using the new functions (on gnssanalysis and ginan)
  • deleting old functions that have been superseded with new functions
  • updating / documenting existing functions that aren't deleted

Happy to discuss this further in the comments of this PR or in the Discussions tab of this repo.

@seballgeyer seballgeyer self-requested a review April 3, 2024 04:23
Collaborator

@seballgeyer seballgeyer left a comment

A few possible changes

Comment on lines +215 to +236
def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename

    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    if filename.endswith(".tar.gz") or filename.endswith(".tar"):
        with _tarfile.open(filename, "r") as tar:
            # Get name of file inside tar.gz file (assuming only one file)
            return tar.getmembers()[0].name
    elif filename.endswith(".crx.gz"):
        return filename[:-6] + "rnx"
    elif filename.endswith(".gz"):
        return filename[:-3]
    elif filename.endswith(".Z"):
        return filename[:-2]
    elif filename.endswith(".bz2"):
        return filename[:-4]
    else:
        logging.debug(f"{filename} not compressed - extension not a recognized compression format")
        return filename

Collaborator

Maybe a bit too much if/elif

def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename

    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    # Define a dictionary to map file extensions to their corresponding actions
    actions = {
        ".tar.gz": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        ".tar": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        ".crx.gz": lambda f: f[:-6] + "rnx",
        ".gz": lambda f: f[:-3],
        ".Z": lambda f: f[:-2],
        ".bz2": lambda f: f[:-4],
    }

    # Iterate over the dictionary items
    for ext, action in actions.items():
        if filename.endswith(ext):
            return action(filename)

    # If no matching extension is found, log a debug message and return the original filename
    logging.debug(f"{filename} not compressed - extension not a recognized compression format")
    return filename

or if you assume there is no . in the filename

def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename

    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    # Define a dictionary to map file extensions to their corresponding actions
    actions = {
        "tar.gz": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        "tar": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        "crx.gz": lambda f: f[:-6] + "rnx",
        "gz": lambda f: f[:-3],
        "Z": lambda f: f[:-2],
        "bz2": lambda f: f[:-4],
    }

    # Split the filename on the '.' and use the last part as the extension
    ext = filename.split('.')[-1]

    # Use the dictionary to get the action for the extension, if it exists
    action = actions.get(ext)

    # If an action was found, execute it, otherwise log a debug message and return the original filename
    if action:
        return action(filename)
    else:
        logging.debug(f"{filename} not compressed - extension not a recognized compression format")
        return filename

Collaborator Author

Running this through testing, I realised there's a bigger problem with how this has been written: the tarfile.open function only works when there is an actual file to open. The generate_uncompressed_filename function is supposed to work on strings, so this will fail unless the filename variable actually points at an existing file. We may have to make an exception for tar.gz files, since we cannot generate their uncompressed filenames from the name alone.
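
A minimal sketch of a string-only variant, under stated assumptions, follows. The function name `uncompressed_name_from_string` and the choice to return `None` for tar archives are hypothetical, not what the PR implements. Suffixes are matched with `endswith`, longest first, because splitting on the last `.` yields only the final part (for example `gz` for a name ending in `.tar.gz`).

```python
import logging
from typing import Optional

# Suffixes whose uncompressed name follows from the string alone.
# Ordered longest first so ".crx.gz" is tested before the more general ".gz".
_SUFFIX_RULES = (
    (".crx.gz", lambda name: name[: -len(".crx.gz")] + ".rnx"),
    (".gz", lambda name: name[: -len(".gz")]),
    (".bz2", lambda name: name[: -len(".bz2")]),
    (".Z", lambda name: name[: -len(".Z")]),
)

# Archive suffixes: the member names are only known once the file is opened.
_ARCHIVE_SUFFIXES = (".tar.gz", ".tar")


def uncompressed_name_from_string(filename: str) -> Optional[str]:
    """Predict the uncompressed filename without touching the filesystem.

    Returns None for tar archives (assumption: the caller handles that case),
    and the input unchanged when no known compression suffix is found.
    """
    if filename.endswith(_ARCHIVE_SUFFIXES):
        return None
    for suffix, rule in _SUFFIX_RULES:
        if filename.endswith(suffix):
            return rule(filename)
    logging.debug(f"{filename} not compressed - extension not a recognized compression format")
    return filename
```

With this sketch, `uncompressed_name_from_string("igs.sp3.gz")` gives `"igs.sp3"`, while `uncompressed_name_from_string("bundle.tar.gz")` gives `None` rather than attempting to open a file.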

Collaborator

Having no . in the filename seems... very unlikely? I shouldn't need to handle e.g. a file named bz2.

Collaborator

@treefern treefern Apr 4, 2024

It's probably not realistic to reliably predict the filename of a compressed directory/collection, only the case where a single file is compressed. And even then, there is no universal convention for naming a compressed directory vs a compressed single file. So when provided with only a filename, we'd still have to make assumptions as to whether it decompresses to a single file of the same name (minus suffix) or a directory, based on what is typical of specific sources.
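
When the archive already exists on disk, its contents can be inspected instead of guessed. The sketch below is an illustration only: the helper name `top_level_entries` is hypothetical, and it reports just the top-level names stored in the archive, leaving the decision about what to do with several entries to the caller.

```python
import tarfile
from pathlib import Path
from typing import List


def top_level_entries(archive_path: Path) -> List[str]:
    """Return the sorted, de-duplicated top-level names stored in a tar archive.

    Mode "r" lets tarfile detect gzip/bzip2/xz compression transparently.
    """
    names = set()
    with tarfile.open(archive_path, "r") as tar:
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:  # skip entries such as "." that have no path components
                names.add(parts[0])
    return sorted(names)
```

A single returned name may still refer to a directory, so this only narrows the assumptions; it does not remove them.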

Comment on lines +326 to +336
def long_filename_cddis_cutoff(epoch: _datetime.datetime) -> bool:
    """Simple function that determines whether long filenames should be expected on the CDDIS server

    :param _datetime.datetime epoch: Start epoch of data in file
    :return bool: Boolean of whether file would follow long filename convention on CDDIS
    """
    long_filename_cutoff = _datetime.datetime(2022, 11, 27)
    if epoch >= long_filename_cutoff:
        return True
    else:
        return False

Collaborator

return epoch >= long_filename_cutoff

will have the same results as your if statement

    if epoch >= long_filename_cutoff:
        return True
    else:
        return False
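
For reference, a self-contained sketch of the function with that one-line return is given here. It uses a plain `datetime` import and a module-level constant `LONG_FILENAME_CUTOFF`, both of which are assumptions made for illustration rather than the PR's own layout; the cutoff date is the one quoted in the diff.

```python
import datetime

# Cutoff date taken from the diff under review.
LONG_FILENAME_CUTOFF = datetime.datetime(2022, 11, 27)


def long_filename_cddis_cutoff(epoch: datetime.datetime) -> bool:
    """Return True if a file starting at `epoch` is expected to use long filenames on CDDIS."""
    # The comparison already evaluates to a bool, so no if/else is needed.
    return epoch >= LONG_FILENAME_CUTOFF
```

The comparison is inclusive, so an epoch exactly at the cutoff counts as a long filename, matching the original if statement.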

Comment on lines +825 to +874
def download_file_from_cddis(
    filename: str,
    ftp_folder: str,
    output_folder: _Path,
    max_retries: int = 3,
    decompress: bool = True,
    if_file_present: str = "prompt_user",
    note_filetype: str = None,
) -> None:
    """Downloads a single file from the cddis ftp server.

    :param filename: Name of the file to download
    :ftp_folder: Folder where the file is stored on the remote
    :output_folder: Folder to store the output file
    :ftps: Optional active connection object which is reused
    :max_retries: Number of retries before raising error
    :uncomp: If true, uncompress files on download
    """
    with ftp_tls("gdc.cddis.eosdis.nasa.gov") as ftps:
        ftps.cwd(ftp_folder)
        retries = 0
        download_done = False
        while not download_done and retries <= max_retries:
            try:
                download_filepath = attempt_ftps_download(
                    download_dir=output_folder,
                    ftps=ftps,
                    filename=filename,
                    type_of_file=note_filetype,
                    if_file_present=if_file_present,
                )
                if decompress and download_filepath:
                    download_filepath = decompress_file(
                        input_filepath=download_filepath, delete_after_decompression=True
                    )
                download_done = True
                if download_filepath:
                    logging.info(f"Downloaded {download_filepath.name}")
            except _ftplib.all_errors as e:
                retries += 1
                if retries > max_retries:
                    logging.info(f"Failed to download {filename} and reached maximum retry count ({max_retries}).")
                    if (output_folder / filename).is_file():
                        (output_folder / filename).unlink()
                    raise e

                logging.debug(f"Received an error ({e}) while try to download {filename}, retrying({retries}).")
                # Add some backoff time (exponential random as it appears to be contention based?)
                _sleep(_random.uniform(0.0, 2.0**retries))
    return download_filepath

Collaborator

The return type hint in the declaration is None, but the function returns download_filepath.
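
One way the signature and the retry loop could line up is sketched below, with the return annotated as `Optional[Path]`. This is not the PR's code: `download_with_retries`, the injected `attempt` callable, the `retry_on` default of `OSError`, and the injectable `sleep` are all hypothetical stand-ins so the example runs without an FTP connection. The function under review catches `_ftplib.all_errors` instead.

```python
import logging
import random
import time
from pathlib import Path
from typing import Callable, Optional, Tuple, Type


def download_with_retries(
    attempt: Callable[[], Optional[Path]],
    max_retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Path]:
    """Call `attempt` until it succeeds or `max_retries` retries have failed.

    Returns whatever `attempt` returns: a Path, or None if the download was skipped.
    """
    retries = 0
    while True:
        try:
            return attempt()
        except retry_on as error:
            retries += 1
            if retries > max_retries:
                logging.info(f"Reached maximum retry count ({max_retries}).")
                raise
            logging.debug(f"Received an error ({error}), retrying ({retries}).")
            # Randomised exponential backoff, as in the loop under review.
            sleep(random.uniform(0.0, 2.0**retries))
```

Annotating the return as `Optional[Path]` documents both outcomes the reviewed loop can produce: a path when a file was downloaded, and a falsy value when nothing was fetched.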

3 participants