
[SCISPARK 106] Write SciDataset to local filesystem as a NetCDF file #112

Merged
merged 25 commits into SciSpark:master from rahulpalamuttam:writeSciDatasetToNetcdf on Aug 16, 2016

Conversation

rahulpalamuttam (Member) commented:

This addresses issue #106

The key function here is SciDataset.write(name, path), which writes the contents
of the SciDataset to a NetCDF file.

The name parameter is optional. If no name is specified,
the current datasetName is used. Note that the function does not
append ".nc" by default, so the extension must be included in the name.

The path parameter is also optional.
It specifies the directory the file will be written to.
By default the file is written to the current directory.
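
A minimal usage sketch (the dataset value and the file and directory names below are made up for illustration):

// Assuming `dataset` is a SciDataset already loaded from a NetCDF file.
// Write to /some/output/dir/output.nc; note ".nc" must be given explicitly.
dataset.write("output.nc", "/some/output/dir/")
// With no arguments, the current datasetName is used as the file name
// and the file is written to the current directory.
dataset.write()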

Other changes:

SciDataset.scala
SciDataset has a datasetName member variable that indicates the name of the file it was loaded from.
SciDataset also has a globalDimensions() function, which returns a list of strings indicating each dimension and its length,
e.g. List("row(400)", "col(1440)").
The toString function in SciDataset also includes the global dimensions. A quick sketch of these additions in use follows below.
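
// Hypothetical session, for a dataset with 400 rows and 1440 columns:
println(dataset.globalDimensions())  // List("row(400)", "col(1440)")
println(dataset)                     // toString output now includes these dimensions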

Variable.scala
Variable now has a LinkedHashMap member that records each dimension name and its length as a key-value pair.
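
Roughly the shape of that map (the value name below is an assumption, not the actual field name in the diff):

import scala.collection.mutable.LinkedHashMap

// Maps each dimension name to its length, preserving insertion order.
val dims: LinkedHashMap[String, Int] = LinkedHashMap("row" -> 400, "col" -> 1440)
dims("row")  // 400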

NetcdfUtils.scala
The conversion function convertMa2Arrayto1dJavaArray() checks the data type stored
in the ma2.Array with a pattern match and converts the array to a Double array.
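
A hedged sketch of that conversion (the exact cases handled in the PR may differ; the ucar.ma2 calls are from the standard NetCDF-Java API):

import ucar.ma2

// Convert a ucar.ma2.Array of any supported primitive element type to Array[Double].
def convertMa2Arrayto1dJavaArray(ma2Array: ma2.Array): Array[Double] =
  ma2Array.getElementType.toString match {
    case "double" => ma2Array.copyTo1DJavaArray().asInstanceOf[Array[Double]]
    case "float"  => ma2Array.copyTo1DJavaArray().asInstanceOf[Array[Float]].map(_.toDouble)
    case "int"    => ma2Array.copyTo1DJavaArray().asInstanceOf[Array[Int]].map(_.toDouble)
    case other    => throw new IllegalArgumentException(s"Unsupported element type: $other")
  }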

Future work :
Being able to write the SciDataset to HDFS rather than just the local file system.

rahulpalamuttam (Member, Author) commented on Aug 14, 2016:

Added extra functionality to write an entire RDD of SciDatasets to HDFS.
This is a quick fix: each SciDataset is written locally to the tmp directory,
and the files are then copied from the tmp directory to the specified HDFS directory
(they can also be copied to local file system directories).

We need to find a better way to write to HDFS without first writing locally and then transferring the files.

To do this I created a class called SRDDFunctions, which provides functions you can call on an RDD itself rather than within a map task.

How To Use:

import org.dia.core.SRDDFunctions._
....

val sRDD: RDD[SciDataset] = sc.NetcdfDFSFile("hdfs://hostname:9000/path/to/files/")
sRDD.writeSRDD("hdfs://hostname:9000/new/path/to/files/different/")

The import implicitly adds these functions on top of the RDD.
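
For context, a rough sketch of the implicit-enrichment pattern such a class can use; the actual SRDDFunctions body in this PR (local tmp writes plus the copy to HDFS) is more involved, and the signatures below are assumptions:

import org.apache.spark.rdd.RDD
import org.dia.core.SciDataset

class SRDDFunctions(self: RDD[SciDataset]) extends Serializable {
  // Writes every SciDataset in the RDD under `path` (sketch only; the real
  // implementation writes to the tmp directory and then copies the files out).
  def writeSRDD(path: String): Unit =
    self.foreach(sciD => sciD.write(sciD.datasetName, path))
}

object SRDDFunctions {
  // Brought into scope by `import org.dia.core.SRDDFunctions._`,
  // making writeSRDD callable directly on an RDD[SciDataset].
  implicit def fromRDD(rdd: RDD[SciDataset]): SRDDFunctions = new SRDDFunctions(rdd)
}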

This also addresses issue #50

rahulpalamuttam (Member, Author) commented:

rebased

coveralls commented:

Coverage Status: Changes unknown when pulling 5f9db19 on rahulpalamuttam:writeSciDatasetToNetcdf into SciSpark:master.

kwhitehall (Member) commented:

Thanks @rahulpalamuttam. Tested. This PR is still a little dirty in that the rebased commits could have been squashed into one. Nonetheless, in the interest of moving on, I'm merging as is.

kwhitehall merged commit 1131c63 into SciSpark:master on Aug 16, 2016