[SCISPARK 106] Write SciDataset to local filesystem as a NetCDF file #112
Conversation
Added extra functionality to write an entire RDD of SciDatasets to HDFS. We need to find a better way to write to HDFS without first writing to the local filesystem and then transferring the files to HDFS. To do this I created a class called SRDDFunctions, which provides functions you can call on the RDD as a whole rather than within a map task.

How to use:

```scala
import org.dia.core.SRDDFunctions._

val sRDD: RDD[SciDataset] = sc.NetcdfDFSFile("hdfs://hostname:9000/path/to/files/")
```

The import implicitly gives you these functions on top of the RDD. This also addresses issue #50.
rebased
Changes Unknown when pulling 5f9db19 on rahulpalamuttam:writeSciDatasetToNetcdf into SciSpark:master.
Thanks @rahulpalamuttam. Tested. This PR is still a little dirty in that the rebased commits could have been squashed into one. Nonetheless, in the interest of moving on, I'm merging as is.
This addresses issue #106.
The key function here is SciDataset.write(name, path), which writes the contents
of the SciDataset to a NetCDF file.
Both parameters are optional. If no name is specified,
the current datasetName is used. Note that the function does not
append ".nc" by default, so the extension must be included in the name.
The path parameter points to the directory the file will be written to.
By default the file is written to the current directory.
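A minimal sketch of the calls described above, assuming a SciDataset `ds` already loaded (e.g. from a NetCDF file); the file name and output path are made up for illustration:

```scala
import org.dia.core.SciDataset

// ds: SciDataset obtained elsewhere, e.g. via sc.NetcdfDFSFile(...)
// Explicit name and path; ".nc" is NOT appended automatically,
// so it must be part of the name.
ds.write("output.nc", "/tmp/output/")

// With defaults: uses the current datasetName as the file name
// and writes to the current working directory.
ds.write()
```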
Other changes :
SciDataset.scala
SciDataset has a datasetName member variable that indicates the name of the file it was loaded from.
SciDataset also has a globalDimensions function.
globalDimensions() returns a list of strings indicating the dimensions and their lengths,
e.g. List("row(400)", "col(1440)").
The toString function in SciDataset also includes the global dimensions.
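A short sketch of inspecting the new accessor, assuming a SciDataset `ds` loaded from a file with row/col dimensions (the names and sizes are illustrative, taken from the example above):

```scala
// List the dataset's dimensions as "name(length)" strings.
val dims: List[String] = ds.globalDimensions()
// e.g. List("row(400)", "col(1440)")

// toString now also reports the global dimensions.
println(ds)
```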
Variable.scala
Variable now has a LinkedHashMap member that records each dimension name and its length as a key-value pair.
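A LinkedHashMap (rather than a plain HashMap) preserves insertion order, so the dimensions stay in the order in which they were recorded. A small standalone sketch of that behavior (the dimension names and lengths are illustrative, not from the actual code):

```scala
import scala.collection.mutable.LinkedHashMap

// Record each dimension name with its length.
val dims = LinkedHashMap[String, Int]()
dims += ("row" -> 400)
dims += ("col" -> 1440)

// Iteration order matches insertion order: "row" first, then "col".
for ((name, length) <- dims) println(s"$name($length)")
```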
NetcdfUtils.scala
The conversion function convertMa2Arrayto1dJavaArray() checks the
data type stored in the ma2.Array using case statements and converts the array to a one-dimensional Double array.
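The type dispatch might look roughly like the following sketch built on the NetCDF-Java ma2 API; the exact set of cases and the method name's capitalization are assumptions, not the actual implementation:

```scala
import ucar.ma2.{Array => Ma2Array, DataType}

// Hypothetical sketch: dispatch on the stored data type,
// then copy out a 1-D Java array and widen it to Double.
def convertMa2ArrayTo1dJavaArray(arr: Ma2Array): Array[Double] =
  arr.getDataType match {
    case DataType.DOUBLE =>
      arr.copyTo1DJavaArray().asInstanceOf[Array[Double]]
    case DataType.FLOAT =>
      arr.copyTo1DJavaArray().asInstanceOf[Array[Float]].map(_.toDouble)
    case DataType.INT =>
      arr.copyTo1DJavaArray().asInstanceOf[Array[Int]].map(_.toDouble)
    case other =>
      throw new IllegalArgumentException(s"Unsupported data type: $other")
  }
```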
Future work:
Writing the SciDataset directly to HDFS rather than only to the local file system.