ENH/API Separate reading of file from storing file data #335

drewejohnson · 2019-09-06T04:59:16Z

Some of our reader classes carry a lot of clutter: attributes that are not necessary after reading because they strictly pertain to file locations and to the mechanics of reading the file. Once the file is read, we don't need to store, for example, how many times we counted past a block, as we do for results and branching files. Furthermore, if we want to expand the API to export to alternative storage like HDF5, it would make sense to be able to recreate the readers and objects from such a file.

I propose that we break most if not all of the current readers into two classes: those responsible for reading files and those responsible for storing file data. Then the latter class doesn't care how the data was obtained, just that we have the data necessary for accessing and plotting etc. We can use a class method to create these file-objects using their associated readers. To demonstrate,

class ResultsReader:
    """Class for reading results files"""

    def __init__(self, filepath):
        # do some prep-work on the file, e.g. checking for existence
        self.filepath = filepath

    def read(self):
        # read the file and create a ResultsFile object
        resultsFile = ResultsFile()
        return resultsFile

class ResultsFile:
    """Class for storing results data"""
    def __init__(self):
        self.metadata = {}
        self.universes = {}
        self.resultsData = {}

    @classmethod
    def fromSerpent(cls, fileP):
        reader = ResultsReader(fileP)
        return reader.read()

Then the main read function serpentTools.read just needs to know the expected File type and we can use fileType.fromSerpent(inputFile) and we're off to the races.
This can allow us to experiment with different file readers without mucking up how file data is stored, and makes it very clear what each object does: Readers read, Files hold file data. Maybe less clear for the file...
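
A rough, runnable sketch of the dispatch described above. The `read` function here is only a stand-in for serpentTools.read, and the `FILE_TYPES` registry, the `fileType` argument, and the toy "name value" file format are all assumptions made for illustration, not the actual serpentTools implementation.

```python
import os
import tempfile


class ResultsReader:
    """Reads a file; keeps only the state needed while reading."""

    def __init__(self, filepath):
        if not os.path.exists(filepath):
            raise FileNotFoundError(filepath)
        self.filepath = filepath

    def read(self):
        resultsFile = ResultsFile()
        with open(self.filepath) as stream:
            for line in stream:
                # Toy format assumed for this sketch: "name value"
                name, value = line.split()
                resultsFile.resultsData[name] = float(value)
        return resultsFile


class ResultsFile:
    """Stores data; does not care how it was obtained."""

    def __init__(self):
        self.metadata = {}
        self.universes = {}
        self.resultsData = {}

    @classmethod
    def fromSerpent(cls, fileP):
        return ResultsReader(fileP).read()


# Hypothetical registry mapping a file-type name to its storage class
FILE_TYPES = {"results": ResultsFile}


def read(filePath, fileType):
    """Stand-in for serpentTools.read: look up the class, then delegate."""
    return FILE_TYPES[fileType].fromSerpent(filePath)


# Usage: write a tiny toy file and read it back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "input_res.m")
    with open(path, "w") as stream:
        stream.write("absKeff 1.02\nburnup 0.5\n")
    results = read(path, "results")
```

The caller only ever touches the returned ResultsFile; swapping in a different reader would mean changing `fromSerpent` and nothing else.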

Easier support for additional export/input file types

Down the road, if we add exporting to HDF5, we can have companion methods toHdf5 and fromHdf5. This would support the following workflow:

1. Run some potentially large job on a cluster
2. Compress outputs into one or more alternative files, potentially taking advantage of filtering, e.g. input_res.h5, input_dep.h5 or outputs.h5
3. Retrieve smaller compressed file[s] from cluster to local work station
4. Re-open the files using fromHdf5 for further post processing and analysis

Level of incompatibility

Ideally, anyone using serpentTools.read should not experience any difference on a basic level. The object returned from serpentTools.read is expected to behave exactly like the original. The issues will come if people are doing type checking or subclassing our readers. I'm not sure how many people are doing this, but I will reach out to the users list and ask people to comment when we think this through more seriously.

Phasing the changes in

We can rename our current readers to their file-like companions, e.g. ResultsReader ➡️ ResultsFile, so that the output of serpentTools.read will not change as we introduce these changes. Then, we can add the original readers back as aliases to their replacements, with deprecation warnings if someone tries to initialize / subclass from them.

from warnings import warn

class ResultsReader(ResultsFile):
    def __init__(self, *args, **kwargs):
        # Deprecated alias: warn, then defer to the file object
        warn("Future versions will restrict the ResultsReader simply to file reading, not long-term storage",
             FutureWarning)
        ResultsFile.__init__(self, *args, **kwargs)

We could also add in the fromSerpent class method to the ResultsFile-like objects to at least provide the method, even if it does refer back to itself.
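
As a minimal sketch of the alias idea, the following warns both when the old name is initialized and when it is subclassed, using `__init_subclass__` for the latter. The `filePath` argument on ResultsFile and the body of `fromSerpent` are assumptions for this example; the real transitional classes would keep whatever signature the current readers have.

```python
import warnings


class ResultsFile:
    """Stand-in for the renamed storage class."""

    def __init__(self, filePath=None):
        self.filePath = filePath
        self.resultsData = {}

    @classmethod
    def fromSerpent(cls, filePath):
        # During the transition this simply builds the object itself
        return cls(filePath)


class ResultsReader(ResultsFile):
    """Deprecated alias kept for backwards compatibility."""

    _MESSAGE = (
        "Future versions will restrict the ResultsReader simply to "
        "file reading, not long-term storage"
    )

    def __init__(self, *args, **kwargs):
        warnings.warn(self._MESSAGE, FutureWarning, stacklevel=2)
        super().__init__(*args, **kwargs)

    def __init_subclass__(cls, **kwargs):
        # Runs when a subclass of ResultsReader is defined
        warnings.warn(ResultsReader._MESSAGE, FutureWarning, stacklevel=2)
        super().__init_subclass__(**kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    reader = ResultsReader("input_res.m")   # warns on initialization

    class MyReader(ResultsReader):          # warns on subclassing
        pass

    plain = ResultsFile.fromSerpent("input_res.m")  # new name, no warning

categories = [w.category for w in caught]
```

Existing `isinstance` checks against the new ResultsFile keep working for objects built through the old name, which is the point of making the alias a subclass.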


Living in serpentTools/base.py, this class provides a very basic interface that will begin the transition in separating file objects and file readers [GH CORE-GATECH-GROUP#335]. Concrete classes must provide a single class method, ``fromSerpent``, that should perform the following actions:

1. Accept a string or pathlib.Path object representing the file name, or a readable stream to be read
2. Create a new instance of the SerpentFile, e.g. DetectorFile
3. Process the data in the file or stream
4. Return the SerpentFile instance with the new data

A helper function is also provided that can handle this flexibility in input arguments: serpentTools.base.getStream. Used as a context manager, this function ensures a readable stream is returned. Future commits will begin implementing concrete classes piecewise.
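
The interface described in this commit message could look roughly like the sketch below. This is not the code in serpentTools/base.py: the signature of `getStream`, the abstract base class layout, and the toy DetectorFile processing (collecting non-blank lines) are assumptions chosen only to illustrate the four steps.

```python
import io
import pathlib
from abc import ABC, abstractmethod
from contextlib import contextmanager


@contextmanager
def getStream(fileOrStream, mode="r"):
    """Yield a readable stream for a path-like or an already-open stream."""
    if isinstance(fileOrStream, (str, pathlib.Path)):
        # We opened the file, so we are responsible for closing it
        with open(fileOrStream, mode) as stream:
            yield stream
    else:
        # The caller owns the stream; hand it back untouched
        yield fileOrStream


class SerpentFile(ABC):
    """Minimal interface: concrete classes implement fromSerpent."""

    @classmethod
    @abstractmethod
    def fromSerpent(cls, source):
        """Build an instance from a file name, Path, or readable stream."""


class DetectorFile(SerpentFile):
    def __init__(self):
        self.lines = []

    @classmethod
    def fromSerpent(cls, source):
        with getStream(source) as stream:      # 1. accept path or stream
            detFile = cls()                    # 2. create a new instance
            for line in stream:                # 3. process the data (toy)
                if line.strip():
                    detFile.lines.append(line.strip())
        return detFile                         # 4. return the instance


detFile = DetectorFile.fromSerpent(io.StringIO("DETflux = [\n1 2 3\n];\n"))
```

Because `getStream` only closes streams it opened itself, a caller that passes in an open stream can keep using it after `fromSerpent` returns.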

Interface outside of the rc object for expanding variable groups. Abstracted away because the setting interface will eventually be removed (GH CORE-GATECH-GROUP#339) as each reader (built from CORE-GATECH-GROUP#335 and CORE-GATECH-GROUP#400) will control their own settings.

drewejohnson added the pythonicness, discussion, API - Compatible, API - Incompatible, and proposal labels Sep 6, 2019

drewejohnson added this to To do in Main Sep 8, 2019

drewejohnson mentioned this issue Oct 3, 2019

API Provide generalized detectors in serpentTools.detectors #341

Merged

drewejohnson removed the proposal label Nov 7, 2019

drewejohnson added this to the 0.10.0 milestone Nov 25, 2019

drewejohnson mentioned this issue Dec 29, 2019

[WIP] API Separate file objects from readers #371

Closed

13 tasks

drewejohnson moved this from To do to In progress in Main Dec 29, 2019

drewejohnson mentioned this issue Apr 27, 2020

DEV New data model and proposal for 0.10.0 #400

Merged

drewejohnson mentioned this issue Aug 6, 2020

[WIP] Start working on next results reader #410

Closed

5 tasks

drewejohnson mentioned this issue Aug 31, 2020

DEV Add variable expander in settings.py #418

Merged

This was referenced Oct 6, 2020

ENH Allow underlying results plot to be configured with keyword arguments #422

Closed

ENH Allow underlying sensitivity plot to be configured with keyword arguments #423

Closed

ENH Allow HomogUniv plot to be configured with keyword arguments #424

Closed

drewejohnson removed this from the 0.10.0 milestone Aug 7, 2023

drewejohnson mentioned this issue Aug 7, 2023

ENH Return self from reader read method #495

Open
