KeplerML

Introduction

This repository provides tools for outlier detection on light curves from Kepler data. It includes methods to calculate features based on light curve variability, methods to identify outliers based on cluster membership, methods to score how strongly each point stands out from the rest of a dataset, and some user interface tools to explore the data. Clustering and scoring can be performed independently, though they are intrinsically related to one another. Both require features to be calculated for the light curves first. The user interface tools have various prerequisites.

These methods are intended for very large sets of data; each Kepler quarter contains roughly 160k light curves. Light curve features can be calculated for any number of light curves, and most methods for clustering and scoring will work with any number of files, but a minimum of 1k is recommended. The "Cluster Outlier Object", developed as the analysis container for this work, cannot be created with fewer than 1k objects.

Feature Calculation:

Recommended prerequisite: Filelist with lightcurve filenames.

Recommendation (example given for long cadence lightcurves from a Kepler quarter):

Collect lightcurves into a single directory.
Generate a filelist of the contents of that directory (the lightcurves):

for f in *llc.fits; do echo $f >> filelist; done

NOTE: For unknown reasons, the code has trouble processing a whole quarter at once. Splitting the files into at least 2 groups is recommended. An easy option is to split them into files whose names start with kplr00 and files whose names start with kplr01.

(alternate)

for f in kplr00*llc.fits; do echo $f >> Q??_00filelist; done

for f in kplr01*llc.fits; do echo $f >> Q??_01filelist; done

Where ?? is replaced by the quarter number.
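
The same split can be sketched in Python for cases where a shell loop is inconvenient. This is only an illustration: the function name `write_filelists` and its parameters are hypothetical and not part of KeplerML. Unlike the `>>` loops above, it overwrites each filelist, so rerunning it does not duplicate entries.

```python
from pathlib import Path


def write_filelists(fits_dir, out_dir, quarter, prefixes=("kplr00", "kplr01")):
    """Write one filelist per filename prefix and return the paths written.

    Hypothetical helper, not part of KeplerML. `quarter` is the quarter
    number as a string (the ?? placeholder above).
    """
    fits_dir, out_dir = Path(fits_dir), Path(out_dir)
    written = {}
    for prefix in prefixes:
        # Match long cadence lightcurves only, as the shell loops do.
        names = sorted(p.name for p in fits_dir.glob(f"{prefix}*llc.fits"))
        # Name the output like the shell example: Q??_00filelist, Q??_01filelist.
        out_path = out_dir / f"Q{quarter}_{prefix[-2:]}filelist"
        # Overwrite rather than append, one filename per line.
        out_path.write_text("".join(name + "\n" for name in names))
        written[prefix] = out_path
    return written


# Example call (paths are placeholders):
# write_filelists("path/to/fitsfiles", "path/to/filelists", "04")
```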

Ways to use:

Run keplerml.py to calculate the lightcurve features. This outputs a NumPy array with the calculated features for each lightcurve.
```
 python keplerml.py path/to/filelist path/to/fitsfiles path/to/outputfile
```
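
When a quarter has been split into several filelists, the command above has to be run once per filelist. The sketch below shows one way to do that from Python, assuming the three positional arguments shown above are all the script needs. The helper names and the example output filenames are hypothetical.

```python
import subprocess
import sys


def build_command(filelist, fits_dir, output_file, script="keplerml.py"):
    """Return the argument list for one keplerml.py run (hypothetical helper)."""
    # Mirrors: python keplerml.py path/to/filelist path/to/fitsfiles path/to/outputfile
    return [sys.executable, script, str(filelist), str(fits_dir), str(output_file)]


def run_batches(batches, fits_dir, script="keplerml.py"):
    """Run keplerml.py once per (filelist, output_file) pair, one after another."""
    for filelist, output_file in batches:
        # check=True stops at the first failed batch instead of continuing silently.
        subprocess.run(build_command(filelist, fits_dir, output_file, script), check=True)


# Example call (all paths are placeholders):
# run_batches([("Q04_00filelist", "Q04_00_output"),
#              ("Q04_01filelist", "Q04_01_output")],
#             "path/to/fitsfiles")
```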
Open Feature Calculation Example.ipynb in Jupyter Notebook to see examples of how to run feature calculation in a notebook.

Note: On a Linux computer with 48 cores at 2.70 GHz (using 47 of the cores), processing 114,948 files took 54m:48s, which translates to about 1.344 seconds to process a single file on one core. If you have fewer cores (most computers have 1-8 cores), estimate the processing time by multiplying the number of files by the time to process a single file and dividing by the number of cores in the computer.
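
The estimate described in the note can be written out as a short calculation. This is a rough sketch: it assumes your cores perform similarly to those in the benchmark and that the work scales evenly across cores. The function names are hypothetical.

```python
def estimate_runtime_seconds(n_files, n_cores, seconds_per_file=1.344):
    """Estimate processing time as files * seconds-per-file / cores.

    The default of 1.344 s per file per core comes from the benchmark in
    the note above; other hardware will differ.
    """
    if n_files < 0 or n_cores < 1:
        raise ValueError("n_files must be >= 0 and n_cores must be >= 1")
    return n_files * seconds_per_file / n_cores


def format_duration(seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(round(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# A full quarter (roughly 160k files) on a 4-core machine:
print(format_duration(estimate_runtime_seconds(160_000, 4)))  # 14h 56m 00s
```

Plugging in the benchmark itself (114,948 files on 47 cores) gives about 3287 seconds, which matches the reported 54m:48s to within a second.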

Clustering:

Prerequisite: Calculated features saved as a pandas DataFrame or NumPy array. Recommendation: Use the feature data in a Cluster Outlier Object.

See clustering example.

Scoring:

Prerequisite: Calculated features saved as a pandas DataFrame or NumPy array. Recommendation: Use the feature data in a Cluster Outlier Object.

See the scoring example (Scoring Example.ipynb).

d-giles/KeplerML

Folders and files

Latest commit

History

Repository files navigation

KeplerML

Introduction

Feature Calculation:

Clustering:

Scoring:

About

Resources

License

Stars

Watchers

Forks

Languages