clean up SRDD.scala and SciSparkcontext.scala #67
Comments
Brian and I discussed this briefly today. If we look at SciSparkContext, it is a wrapper around SparkContext: it uses composition and delegates some of the loading operations to SparkContext. An example of this is NetcdfDFSFile, which uses SparkContext's binaryFiles function to read NetCDF files off of HDFS. Other functions either pull data from OpenDAP, read from the local file system, or generate a random dataset. For those functions we use the SRDD constructor and generate SRDDs.
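The composition-and-delegation shape described above can be sketched with toy stand-ins (the real Spark classes are not assumed here; `netcdfDFSFile` and the simplified `RDD`/`SparkContext` below are illustrative, not the project's actual signatures):

```scala
// Toy stand-in for Spark's RDD: just holds data locally.
class RDD[T](val data: Seq[T])

// Toy stand-in for SparkContext with a binaryFiles-like loader
// returning (path, bytes) pairs.
class SparkContext {
  def binaryFiles(path: String): RDD[(String, Array[Byte])] =
    new RDD(Seq((path, Array.empty[Byte])))
}

// SciSparkContext composes a SparkContext and delegates loading to it,
// rather than inheriting from it.
class SciSparkContext(val sc: SparkContext) {
  def netcdfDFSFile(path: String): RDD[(String, Array[Byte])] =
    sc.binaryFiles(path) // delegation, not inheritance
}
```

The point of the pattern is that SciSparkContext only adds scientific-data loading entry points; everything else stays on the wrapped SparkContext.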
A potential solution to both issues, with the least invasive changes to other parts of the code, is to rewrite binaryFiles to return an SRDD rather than an RDD. Another solution is to make SRDD a wrapper for an RDD and lose the inheritance. This is a quick-fix solution and will affect some other parts of the code.
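The second option (SRDD wraps an RDD instead of extending it) might look like the following toy sketch; the minimal `RDD` class and the `toRDD` method are assumptions for illustration, not the real Spark API:

```scala
// Toy RDD with one transformation, so forwarding is visible.
class RDD[T](val data: Seq[T]) {
  def map[U](f: T => U): RDD[U] = new RDD(data.map(f))
}

// SRDD as a wrapper: holds an RDD and forwards operations to it.
class SRDD[T](val underlying: RDD[T]) {
  def map[U](f: T => U): SRDD[U] = new SRDD(underlying.map(f))
  // callers that need a plain RDD must unwrap explicitly
  def toRDD: RDD[T] = underlying
}
```

This illustrates why the quick fix touches other code: once the inheritance is gone, any call site that passed an SRDD where an RDD was expected now has to unwrap it (here via the hypothetical `toRDD`).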
@kwhitehall @BrianWilson1 How SRDD should really be used is just to abstract away loading datasets from different sources. SciSparkContext should just return an RDD of SciDatasets. Even functions that explicitly return an SRDD should just return it as an RDD (this can be done easily, since SRDD is a subclass of RDD).
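Because SRDD is a subclass of RDD, a loader can build an SRDD internally but declare the parent type, so the upcast costs nothing. A minimal sketch with toy classes (the `Loader.loadSciDatasets` name is hypothetical):

```scala
// Toy versions of the two types, with the subclass relationship
// the comment above relies on.
class RDD[T](val data: Seq[T])
class SRDD[T](data: Seq[T]) extends RDD[T](data)

object Loader {
  // Declared return type is the parent RDD; the body still
  // constructs an SRDD, which upcasts for free.
  def loadSciDatasets(path: String): RDD[String] =
    new SRDD(Seq(path))
}
```

Callers then program against RDD only, and SRDD stays an internal detail of the loading layer.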
@rahulpalamuttam @BrianWilson1 Can I close this issue?
@kwhitehall Yes. SciSparkContext still needs some refactoring.
I'll file a new issue for the code changes once this one is closed.
Currently some methods in SciSparkContext return an RDD of SciTensors, while others return an SRDD of SciTensors. We should discuss the best approach here. @rahulpalamuttam @chrismattmann @BrianWilson1