
A collection of functions using data.table to efficiently clean large tables using a simplified syntax


DTrsiv

DTrsiv is an R package providing a collection of data.table-based functions to clean your data quickly and easily.
Contributions from anyone are welcome!

Author: PAGEAUD Y.¹
Contributors: anyone who wishes to contribute is welcome!
¹ DKFZ - Division of Applied Bioinformatics, Germany.

Prerequisites

Install the devtools and data.table packages:

install.packages(pkgs = c("devtools", "data.table"))

Install

devtools::install_github("YoannPa/DTrsiv")

Content

The dt_fun.R script contains functions for formatting R data.table objects:

  • dt.sub() performs pattern matching and substitution column-wise on a data.table object. It first identifies the columns containing any occurrence of the pattern, then applies the substitution only to those columns, which shortens execution time on data.tables with many columns. It supports columns of type list.
  • dt.ls2c() converts data.table columns of type list to columns of type vector.
  • dt.rm.dup() removes duplicated columns based on their content (not on their names).
  • dt.rm.allNA() removes columns exclusively containing NAs from a data.table.
  • dt.int64tochar() converts columns of 'double.integer64' type into 'character' type.
  • dt.combine() combines values of partially duplicated columns from a data.table into new columns.
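
To make the kind of cleaning described above concrete, here is a minimal sketch written with plain data.table only. It does not call any DTrsiv function and is not DTrsiv's implementation; the toy table, the column names and the pattern are assumptions made up for illustration. It shows dropping all-NA columns, dropping columns duplicated by content, and substituting a pattern only in the columns where it matches.

```r
library(data.table)

# Toy table (hypothetical): 'b' duplicates 'a' by content, 'c' holds only NAs.
dt <- data.table(
  a = c("x_1", "y_2", NA),
  b = c("x_1", "y_2", NA),
  c = c(NA, NA, NA),
  d = c(1L, 2L, 3L)
)

# Drop columns that contain only NAs.
all_na <- names(dt)[vapply(dt, function(col) all(is.na(col)), logical(1))]
if (length(all_na) > 0) dt[, (all_na) := NULL]

# Drop columns whose content duplicates an earlier column (names are ignored).
dup <- names(dt)[duplicated(as.list(dt))]
if (length(dup) > 0) dt[, (dup) := NULL]

# Find the character columns where the pattern matches at least once,
# then apply the substitution to those columns only.
pattern <- "_"
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]
hit <- chr_cols[vapply(chr_cols, function(nm) any(grepl(pattern, dt[[nm]])), logical(1))]
if (length(hit) > 0) {
  dt[, (hit) := lapply(.SD, gsub, pattern = pattern, replacement = "-"), .SDcols = hit]
}

print(dt)
```

The DTrsiv functions listed above are meant to wrap steps like these behind a shorter syntax; check each function's help page for its actual arguments.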

The dt_chk.R script contains functions for checking the content of an R data.table:

  • allNA.col() checks whether any column contains exclusively NAs and, if so, returns the names of those columns with a warning.
  • best.merged.dt() looks for the best merging operation(s) between two data.tables trying a set of columns from the second one.
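
The idea of trying several columns of a second table as merge keys can be sketched with plain data.table as follows. This is only an illustration under stated assumptions: the tables, the candidate columns and the criterion used to rank a merge (the number of keys from the first table found in the candidate column) are hypothetical, and may differ from what best.merged.dt() actually does.

```r
library(data.table)

# Hypothetical tables: dt1 is keyed by 'id'; dt2 has two candidate key columns.
dt1 <- data.table(id = c("s1", "s2", "s3"), value = c(10, 20, 30))
dt2 <- data.table(
  sample = c("s1", "s2", "s9"),
  alias  = c("s1", "s2", "s3"),
  group  = c("A", "B", "B")
)

# Assumed criterion: count how many dt1 keys appear in each candidate column.
candidates <- c("sample", "alias")
n_matched <- vapply(candidates, function(col) sum(dt1$id %in% dt2[[col]]), integer(1))

# Keep the candidate(s) with the highest number of matched keys.
best <- names(n_matched)[n_matched == max(n_matched)]

# Merge on the first best candidate, keeping every row of dt1.
merged <- merge(dt1, dt2, by.x = "id", by.y = best[1], all.x = TRUE)
print(merged)
```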

Problems? / I need help!

For any questions not related to bugs or development, you can write to me at y.pageaud@dkfz.de.

Technical questions / Development / Feature request

If you encounter issues, or if a feature you would expect is missing from the available DTrsiv functions, please go to the DTrsiv GitHub repository, click on the Issues tab and create an issue.

References

  1. Introduction to data.table: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
  2. Official R data.table Github repository: https://github.com/Rdatatable/data.table
  3. By-Group Processing, the R data.table and the Power of Open Source (22.02.2011) - Steve Miller
