NPI-3197 Transfer download functions from ginan repo #18
…to NPI-3197-transfer-download-functions
A few possible changes
def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename

    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    if filename.endswith(".tar.gz") or filename.endswith(".tar"):
        with _tarfile.open(filename, "r") as tar:
            # Get name of file inside tar.gz file (assuming only one file)
            return tar.getmembers()[0].name
    elif filename.endswith(".crx.gz"):
        return filename[:-6] + "rnx"
    elif filename.endswith(".gz"):
        return filename[:-3]
    elif filename.endswith(".Z"):
        return filename[:-2]
    elif filename.endswith(".bz2"):
        return filename[:-4]
    else:
        logging.debug(f"{filename} not compressed - extension not a recognized compression format")
        return filename
Maybe a bit too much if/elif
def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename
    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    # Define a dictionary to map file extensions to their corresponding actions
    actions = {
        ".tar.gz": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        ".tar": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        ".crx.gz": lambda f: f[:-6] + "rnx",
        ".gz": lambda f: f[:-3],
        ".Z": lambda f: f[:-2],
        ".bz2": lambda f: f[:-4],
    }
    # Iterate over the dictionary items
    for ext, action in actions.items():
        if filename.endswith(ext):
            return action(filename)
    # If no matching extension is found, log a debug message and return the original filename
    logging.debug(f"{filename} not compressed - extension not a recognized compression format")
    return filename
or, if you assume there is no "." in the filename:
def generate_uncompressed_filename(filename: str) -> str:
    """Returns a string of the uncompressed filename given the [assumed compressed] filename
    :param str filename: Original filename of compressed file
    :return str: The uncompressed filename based on input (returns input filename if compression not recognised)
    """
    # Define a dictionary to map file extensions to their corresponding actions
    actions = {
        "tar.gz": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        "tar": lambda f: _tarfile.open(f, "r").getmembers()[0].name,
        "crx.gz": lambda f: f[:-6] + "rnx",
        "gz": lambda f: f[:-3],
        "Z": lambda f: f[:-2],
        "bz2": lambda f: f[:-4],
    }
    # Split the filename on the '.' and use the last part as the extension
    ext = filename.split('.')[-1]
    # Use the dictionary to get the action for the extension, if it exists
    action = actions.get(ext)
    # If an action was found, execute it, otherwise log a debug message and return the original filename
    if action:
        return action(filename)
    else:
        logging.debug(f"{filename} not compressed - extension not a recognized compression format")
        return filename
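One thing that may be worth checking in the second variant: taking only the last "."-separated part can never produce a two-part key such as "tar.gz" or "crx.gz", so those dictionary entries would not be reached. A minimal sketch of the difference, using a hypothetical filename that is not taken from the PR:

```python
filename = "example.crx.gz"  # hypothetical filename, for illustration only

# Splitting on "." and taking the last part only ever yields a single-part extension
last_part = filename.split(".")[-1]
print(last_part)  # gz

# Checking suffixes with endswith, multi-part suffixes first, still finds the two-part case
suffixes = ["tar.gz", "crx.gz", "tar", "gz", "Z", "bz2"]
matched = next((s for s in suffixes if filename.endswith("." + s)), None)
print(matched)  # crx.gz
```

With the split-based lookup, "example.crx.gz" would be handled by the "gz" entry and come back as "example.crx" rather than "example.rnx".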
Running this through testing I realised that there's a bigger problem with how this has been written up: the tarfile.open function only works when there is an actual file to open. The generate_uncompressed_filename function is supposed to work on strings, so this will fail unless the filename variable actually points at an existing file. We may have to make an exception for tar.gz files, in that we cannot generate their uncompressed filenames.
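A minimal sketch of what a string-only version could look like, assuming tar archives are simply reported as unpredictable. The function name predict_uncompressed_filename, the two module-level tuples, and the choice to return None for tar files are all hypothetical and not part of the PR:

```python
import logging
from typing import Optional

# Archive suffixes whose member names cannot be known from the filename alone
ARCHIVE_SUFFIXES = (".tar.gz", ".tar")

# (suffix, replacement) pairs, checked in order, so longer suffixes come first
SUFFIX_REPLACEMENTS = (
    (".crx.gz", ".rnx"),
    (".gz", ""),
    (".Z", ""),
    (".bz2", ""),
)


def predict_uncompressed_filename(filename: str) -> Optional[str]:
    """Predict the decompressed name from the string alone, without touching the filesystem.

    Returns None for tar archives (assumed behaviour for this sketch).
    """
    if filename.endswith(ARCHIVE_SUFFIXES):
        logging.debug(f"{filename} is a tar archive - member names are unknown without opening it")
        return None
    for suffix, replacement in SUFFIX_REPLACEMENTS:
        if filename.endswith(suffix):
            return filename[: -len(suffix)] + replacement
    logging.debug(f"{filename} not compressed - extension not a recognized compression format")
    return filename
```

Because nothing is opened, this works on names of files that have not been downloaded yet; callers would need to handle the None case for tar archives themselves.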
Having no "." in the filename seems... very unlikely? I shouldn't need to handle e.g. a file named bz2.
It's probably not realistic to reliably predict the decompressed name of a compressed directory/collection - only the case where a single file is compressed. And even then, there is no universal convention for naming a compressed directory vs a compressed single file. So when provided with only a filename, we'd still have to make assumptions as to whether it decompresses to a single file of the same name (minus suffix) or to a directory, based on what is typical of specific sources.
def long_filename_cddis_cutoff(epoch: _datetime.datetime) -> bool:
    """Simple function that determines whether long filenames should be expected on the CDDIS server

    :param _datetime.datetime epoch: Start epoch of data in file
    :return bool: Boolean of whether file would follow long filename convention on CDDIS
    """
    long_filename_cutoff = _datetime.datetime(2022, 11, 27)
    if epoch >= long_filename_cutoff:
        return True
    else:
        return False
    return epoch >= long_filename_cutoff
will have the same results as your if statement:
    if epoch >= long_filename_cutoff:
        return True
    else:
        return False
def download_file_from_cddis(
    filename: str,
    ftp_folder: str,
    output_folder: _Path,
    max_retries: int = 3,
    decompress: bool = True,
    if_file_present: str = "prompt_user",
    note_filetype: str = None,
) -> None:
    """Downloads a single file from the cddis ftp server.

    :param filename: Name of the file to download
    :ftp_folder: Folder where the file is stored on the remote
    :output_folder: Folder to store the output file
    :ftps: Optional active connection object which is reused
    :max_retries: Number of retries before raising error
    :uncomp: If true, uncompress files on download
    """
    with ftp_tls("gdc.cddis.eosdis.nasa.gov") as ftps:
        ftps.cwd(ftp_folder)
        retries = 0
        download_done = False
        while not download_done and retries <= max_retries:
            try:
                download_filepath = attempt_ftps_download(
                    download_dir=output_folder,
                    ftps=ftps,
                    filename=filename,
                    type_of_file=note_filetype,
                    if_file_present=if_file_present,
                )
                if decompress and download_filepath:
                    download_filepath = decompress_file(
                        input_filepath=download_filepath, delete_after_decompression=True
                    )
                download_done = True
                if download_filepath:
                    logging.info(f"Downloaded {download_filepath.name}")
            except _ftplib.all_errors as e:
                retries += 1
                if retries > max_retries:
                    logging.info(f"Failed to download {filename} and reached maximum retry count ({max_retries}).")
                    if (output_folder / filename).is_file():
                        (output_folder / filename).unlink()
                    raise e

                logging.debug(f"Received an error ({e}) while try to download {filename}, retrying({retries}).")
                # Add some backoff time (exponential random as it appears to be contention based?)
                _sleep(_random.uniform(0.0, 2.0**retries))
    return download_filepath
The type hint for the return value is None in the declaration, but the function returns download_filepath.
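A minimal sketch of the retry loop with a return annotation that matches what is returned. It assumes the download attempt yields a Path, or None when the download is skipped (which the `if download_filepath:` checks in the diff appear to allow for). The function download_with_retries and its attempt and sleep parameters are hypothetical stand-ins so the sketch runs without an FTP connection:

```python
import ftplib
import logging
import random
import time
from pathlib import Path
from typing import Callable, Optional


def download_with_retries(
    attempt: Callable[[], Optional[Path]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Path]:
    """Call attempt() until it succeeds or the retry budget is used up.

    Returns whatever attempt() returns: a Path, or None if nothing was downloaded.
    """
    retries = 0
    while True:
        try:
            return attempt()
        except ftplib.all_errors as error:
            retries += 1
            if retries > max_retries:
                logging.info(f"Reached maximum retry count ({max_retries}).")
                raise
            logging.debug(f"Received an error ({error}), retrying ({retries}).")
            # Randomised exponential backoff, mirroring the function under review
            sleep(random.uniform(0.0, 2.0**retries))
```

Passing sleep as a parameter is only there to make the loop testable without waiting; the annotation Optional[Path] is the part that addresses the review comment.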
Recently we had an update to the auto_download_PPP.py script over on the Ginan repo. A number of new and useful functions were added there but have a more appropriate home here in gnssanalysis.
This PR transfers some of those useful functions from https://github.com/GeoscienceAustralia/ginan/blob/develop-weekly/scripts/auto_download_PPP.py to gn_download.py.
This should be the first in a series of PRs where we slowly move to using the newer download functions. This will include:
Happy to discuss this further in the comments of this PR or in the discussions tab of this repo