ADAM Changelog

Version 1.0.1

Closed issues:

  • Update Avro dependency version to 1.11.1 #2379
  • BAM/BED to parquet #2376
  • Update Spark dependency version to 3.3.0 #2375

Merged and closed pull requests:

Version 1.0.0

Closed issues:

  • Update bdg-formats dependency version to 1.0 #2371
  • Update bdg-utils dependency version to 1.0 #2370
  • Add maximum length parameter to countSliceKmers #2365
  • Add references parameter to transformSequences/transformSequences/countKmers etc. #2364
  • Add sort output parameter to countKmers/countSliceKmers #2363
  • Kmers overcounted in Slice countKmers #2362
  • Features without strand NPE in printFeatureAttributes #2357
  • Update Spark dependency version to 3.2.1 #2353
  • Add CITATION.cff for citation(s) #2329
  • Benchmark spark.kryo.unsafe config performance #2303
  • Clean up packaging of conversion methods #1170
  • Remove reliance on MD tags #622
  • Clean up Rich records #577
  • Create ADAM Benchmarking suite #120

Merged and closed pull requests:

  • Update bdg-formats dependency version to 1.0 #2373 (heuermh)
  • Update bdg-utils dependency version to 1.0 #2372 (heuermh)
  • [ADAM-2295] Add workaround for BAM files hosted on SRA s3 archive #2369 (heuermh)
  • Add count kmers methods to SequenceDataset #2368 (heuermh)
  • Add sort and maximum length arguments #2367 (heuermh)
  • [ADAM-2362] Account for duplicate kmers on left flank #2366 (heuermh)
  • Add single file argument to count kmers #2361 (heuermh)
  • Add IUPAC amino acid and nucleotide base alphabet with ambiguity #2360 (heuermh)
  • Update scala-guice dependency version to 5.0.2, guava to 31.1-jre #2359 (heuermh)
  • Feature strand may be null #2358 (heuermh)
  • Allow space in IntervalList species header value #2356 (heuermh)
  • Update spark dependency version to 3.2.1 #2354 (heuermh)

Version 0.37.0

Closed issues:

  • error when saveAsPairedFastq files #2350
  • Note Bioconda and Homebrew in readme and installation docs #2348
  • Migrate Github Actions JDK from 'adopt' to 'temurin' #2346
  • Update pom.xml url to Github repository #2342
  • Update bdg-formats to 0.17.0 release #2341
  • Note Spark version 3.2.0 or later is now required #2339
  • Update Parquet to 1.12.x, Avro to 1.10.x, remove build workarounds #2336
  • Extract multi region in one shot #2334
  • Add proteinId field to Feature #2333
  • Update Spark dependency version to 3.2.0 #2331
  • Missing SAM header when using pipe API #2322
  • saveAsSam format #2321
  • Remove Github Pages environment and branch #2313
  • ArrayIndexOutOfBoundsException when operating on VCF DataFrame doesn't indicate bad row #2275
  • InMemoryFileIndex$SerializableFileStatus is not registered #2187
  • Investigate registrator performance re classForName #2185
  • Keep dataset transformations on datasets where appropriate #2166
  • Code generate conversion functions from templates #2163
  • Replace dependency on Hadoop-BAM with Disq #2111
  • Explore alternatives for code generating Scala products and projections from Avro schema #2110
  • Support for Elastic Search #2071
  • loadIndexedBam pulls entire bam file on http files, even when predicate is specified #2057
  • Add API docs where unmapped reads are discarded #2022
  • Indel realignment discards all unmapped reads #2018
  • NullPointerException at htsjdk CramNormalizer.getByteOrDefault #1993
  • Add support for left semi, left anti region join #1990
  • Alternate allele depth is incorrect for gVCF reference models #1987
  • Add examples to all functions supported across languages #1962
  • Log validation errors back to driver #1950
  • Parallelize loading metadata #1902
  • Release cycle on pypi #1888
  • suffix arrays implementation #1872
  • ArrayIndexOutOfBoundsException in AbstractVCFCodec.oneAllele #1868
  • Allow users to configure partitioning for pipe APIs #1863
  • optimize SmithWaterman alignment #1850
  • Support pushing down range queries to CRAM index #1833
  • Exporting non-reference genotype likelihoods to VCF appears to fail for triploid sites #1776
  • Reconstructing variant contexts will be incorrect if a variant has different qualities across samples #1773
  • GenomicRDD.transform should allow preserving partitioning #1761
  • Add ability to normalize INDEL variants #1710
  • Inferred serializer for fast concatenation engine can stack overflow #1708
  • Write out BAI when saving sorted BAMs to disk #1654
  • Create bash completion file for adam-submit #1652
  • Make it easier to modify/interrogate metadata from Python/R #1604
  • Support ShuffleRegionJoins when an rdd is sorted by contig number #1519
  • Clean up replicated join code in INDEL realigner and BQSR #1429
  • Improve unit test coverage of Feature parser/encoder helper #1418
  • Write unit tests for CLI #1416
  • Write unit tests for ParquetFileTraversable #1415
  • Add regression test suite #1407
  • Address classes with lower than ideal coverage #1405
  • Genotype filtersPassed field not set correctly #1269
  • Need clear error messages when schema has backward-incompatible change #345
  • Annotate classes with their production worthiness #123

Merged and closed pull requests:

  • Add conda, homebrew, biocontainers installation docs #2352 (heuermh)
  • Migrate Github Actions JDK from 'adopt' to 'temurin' #2347 (heuermh)
  • [ADAM-2341] Update bdg-formats dependency version to 0.17.0 #2345 (heuermh)
  • [ADAM-2185] Reduce classForName calls to improve registrator performance #2344 (heuermh)
  • Update pom.xml url to Github repository #2343 (heuermh)
  • Update Scala dependency version to 2.12.15 #2340 (heuermh)
  • [ADAM-2333] Add proteinId field to Feature #2338 (heuermh)
  • Update github-changes-maven-plugin version to 1.2 #2337 (heuermh)
  • Bump adam-python development version to 0.37.0a0 #2332 (heuermh)
  • trim sam format readname #2325 (hxdhan)
  • Update Spark to 3.2.0, Parquet to 1.12.1, Avro to 1.10.2 #2289 (heuermh)
  • Refactor Genotype and GenotypeAnnotation #2117 (heuermh)
  • Keep information for <NON_REF> alleles #2116 (karenfeng)

Version 0.36.0

Closed issues:

  • Update bdg-formats dependency version to 0.16.0 #2328
  • Add sample to Sequence, Slice, and Read #2319
  • Update Spark dependency version to 3.1.2 #2317
  • Update Spark dependency version to 3.1.1 #2302
  • Consider merging ADAM2Fastq into TransformAlignments #2173
  • Benchmark DNA sequence encodings vs Parquet string column compression #2164

Merged and closed pull requests:

  • Update bdg-formats dependency version to 0.16.0 #2330 (heuermh)
  • Update commons-io dependency version to 2.11.0, scala-guice to 5.0.1, mockito-core to 3.11.2 #2327 (heuermh)
  • [ADAM-2319] Add sample to Sequence, Slice, and Read #2320 (heuermh)
  • [ADAM-2317] Update Spark dependency version to 3.1.2 #2318 (heuermh)
  • Add cache to deploy action #2316 (heuermh)
  • [ADAM-2302] Update Spark dependency version to 3.1.1 #2315 (heuermh)
  • Replace link to Freenode IRC with Libera.Chat #2314 (heuermh)

Version 0.35.0

Closed issues:

  • Migrate CI off Jenkins #2307
  • Provide access to header to model conversions #2304

Merged and closed pull requests:

Version 0.34.0

Closed issues:

  • Update Spark dependency version to 3.0.2 #2300
  • printInfoFields does not display value of Type=Flag VCF INFO fields #2299
  • Add CRAM reference argument to CLI for loading alignments and fragments #2293
  • Duplicate version 0.33.0 release notes in changelog #2290
  • adam-submit: Not able to transform fasta file to adam format. #2288
  • Bump Scala 2.12 dependency version to 2.12.10 #2287
  • Default build to Hadoop 3.x #2285
  • Remove Scala 2.11.x from build and release matrix #2284
  • Remove Spark 2.x from build and release matrix #2283
  • All of my samples fail due to adam erroneously thinking there are different numbers of reads in r1 and r2. #2247
  • Rename sequences SequenceDictionary field to references #2171
  • Replace jvmRdd with jvmDataset in adam-python #2134
  • Refactor org.bdgenomics.adam.rdd package to org.bdgenomics.adam.ds #2112
  • FastaConverter causes mango to crash due to missing contigNames #2038
  • Verify LICENSE and NOTICE per upstream changes #2005
  • Add convenience method to filter by contig #1877
  • Confirm all I/O resources created by ADAM are closed properly #1719
  • Remove Smith-Waterman consensus mode #1414

Merged and closed pull requests:

  • [ADAM-2300] Update Spark dependency version to 3.0.2 #2301 (heuermh)
  • [ADAM-2171] Rename sequences SequenceDictionary field to references #2298 (heuermh)
  • Bump adam-python version to 0.34.0a0 #2297 (heuermh)
  • [ADAM-2112] Refactor org.bdgenomics.adam.rdd package to org.bdgenomics.adam.ds #2296 (heuermh)
  • [ADAM-2293] Add CRAM reference argument to CLI for loading alignments and fragments #2294 (heuermh)
  • [ADAM-1877] Add convenience method to filter to reference name #2292 (heuermh)
  • [ADAM-2290] Remove duplicate version 0.33.0 release notes in changelog #2291 (heuermh)
  • Update build and release matrix #2286 (heuermh)
  • [ADAM-2112] Refactor org.bdgenomics.adam.rdd package to org.bdgenomics.adam.ds #2137 (heuermh)

Version 0.33.0

Closed issues:

  • Update Spark dependency to version 3.0.1 #2273
  • Add cite DOI link to command line --version output #2270
  • Support Hadoop 3.2.x in build #2267
  • Default build to Spark 3/Scala 2.12 #2266

Merged and closed pull requests:

  • Performance improvements to SAM reading and processing #2280 (benraha)
  • [ADAM-2150] Use interval start as position for Ensembl VEP ANN attributes #2278 (heuermh)
  • Update Spark 2.x dependency version to 2.4.7 #2277 (heuermh)
  • Use explicit provided scope for scala-library dependency #2276 (heuermh)
  • Update Spark dependency version to 3.0.1 #2274 (heuermh)
  • [ADAM-2270] Add cite DOI link to command line --version output #2271 (heuermh)
  • Default doc links to Spark 3/Scala 2.12 #2269 (heuermh)
  • [ADAM-2266] Default build to Spark 3/Scala 2.12 #2268 (heuermh)

Version 0.32.0

Closed issues:

  • Jenkins build not producing Spark 3/Scala 2.12 snapshot artifacts #2265
  • Use SCALA_VERSION instead of SCALAVER in Jenkins config #2264
  • Update Spark dependency version to 2.4.6 #2262
  • Spark Issue + Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter #2261
  • Bump slf4j dependency to 1.7.30+ #2259
  • Load Multiple Files into the ADAM Variant schema #2258
  • Consider ParseMode to replace ValidationStringency #2251
  • NoClassDefError org.apache.spark.AccumulableParam on Spark 3.0.0 preview #2250
  • Create a build configuration for Spark 3 preview release(s) #2237
  • Drop python 2 support for pyadam #2221
  • NoSuchMethodError: shaded.parquet.org.apache.thrift.EncodingUtils.setBit(BIZ)B #2157
  • Push to CRAN #1851

Merged and closed pull requests:

Version 0.31.0

Closed issues:

  • Add deprecated annotations for code to be removed to support Spark 3 #2254
  • Update bdg-utils dependency version to 0.2.16 #2252
  • Bump Apache Spark dependency version to 2.4.5 #2248
  • FastqRecordConvert incompatible with single tube long fragment read headers #2246
  • Bam files with no unmapped reads fails to sort #2242
  • Unit test failure when building from release tarball #2241
  • Adam without HDFS #2238
  • Jenkins build status icon link is broken #2228
  • Write block-gzipped (bgzf) feature formats #2191
  • adam-submit is not exiting until I hit ctrl+C #2040
  • WARN VariantContextConverter:924 - Ran into Array Out of Bounds when accessing indices 0,1,2 of genotype . #2024
  • Add doc for running on HPC with PBS #2002
  • loadFastq with paired gzipped FASTQ files fails via s3a URLs #1855
  • Where to put lift over function #1811
  • Add transform to fix chromosome prefixes to genomic RDDs and CLIs #1757
  • Support using Spark-BAM to load BAM files #1683
  • Handling Validation Stringency without repeated code #1572
  • New model PartitionMap for Array[Option[(ReferenceRegion, ReferenceRegion)]] #1558
  • Revisit double-negative command line options (e.g. -disable_fast_concat) #1503
  • Improve test coverage for SAMRecord<->AlignmentRecord #1284
  • Allow alphabets to canonicalize strings #797
  • Update MdTag.getReference for CIGAR N #742
  • Replace contig length maps with sequence dictionary #572
  • Use tool like Scala Refactoring to enforce import guidelines #445

Merged and closed pull requests:

  • [ADAM-2254] Add deprecated annotations for code to be removed to support Spark 3 #2256 (heuermh)
  • [ADAM-2252] Update bdg-utils dependency version to 0.2.16 #2253 (heuermh)
  • [ADAM-2248] Bump Apache Spark dependency version to 2.4.5 #2249 (heuermh)
  • [ADAM-2241] Commit template substitution may not be available if building from tarball #2243 (heuermh)
  • [ADAM-2228] Remove Jenkins build status badge #2240 (heuermh)
  • remove 2.7 support checks #2222 (akmorrow13)
  • [ADAM-2023] Implemented Duplicate Marking algorithm in Spark SQL #2045 (jonpdeaton)
  • use readlink to properly source source dir #2036 (mtdeguzis)
  • Don't discard unmapped reads in indel realignment #2019 (pauldwolfe)
  • Refactor/mark buckets #2015 (jondeaton)
  • Adding a BamLoader class to have only 1 header parse for multiple ind… #1966 (ffinfo)
  • Added additional arguments to GenomicRDD.pipe() #1758 (gunjanbaid)
  • Migrate bdg-formats to new adam-formats module. #1689 (heuermh)
  • [ADAM-1683] Pull in Spark-BAM as a secondary loading path. #1686 (fnothaft)
  • Add SortedGenomicRDD trait, refactor shuffle joins and pipe #1590 (fnothaft)
  • [ADAM-1513] Strandedness for FeatureRDDs #1555 (devin-petersohn)

Version 0.30.0

Closed issues:

  • Github changes plugin used in release script does not use two-factor authentication #2235
  • Update bdg-formats dependency version to 0.15.0 #2233
  • 7 tests failing on HEAD #2231
  • BUILD FAILURE - Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:java #2227
  • GenomicDataset saveAsParquet incorrectly named parameter compressCodec #2224
  • Add printAttributes methods for Reads, Sequences, Slices #2219
  • Add default Set.empty to printAttributes key method parameter #2218
  • Add Avro-friendly ctrs in rdd.variant package #2215
  • Cannot resolve adam-shade-spark2_2.11 dependency #2211

Merged and closed pull requests:

  • [ADAM-2235] Update github-changes-maven-plugin dependency version to 1.1 #2236 (heuermh)
  • [ADAM-2233] Update bdg-formats dependency version to 0.15.0. #2234 (heuermh)
  • Update maven plugin dependency versions. #2230 (heuermh)
  • [ADAM-2224] Complete refactoring of compressionCodec for named parameter. #2229 (heuermh)
  • [ADAM-2224] Use compressionCodec for named parameter. #2226 (heuermh)
  • [ADAM-2219] Add printAttributes methods for Reads, Sequences, Slices #2223 (heuermh)
  • [ADAM-2218] Add default Set.empty to printAttributes key method parameter. #2220 (heuermh)
  • Rename AlignmentRecord to Alignment. #2217 (heuermh)
  • [ADAM-2215] Add Avro-friendly ctrs to rdd.variant package #2216 (heuermh)

Version 0.29.0

Closed issues:

  • Bump bdg-formats dependency version to 0.14.0 #2208
  • Bump Apache Spark dependency version to 2.4.4 #2202
  • Add missing loadVariantContexts(String, ValidationStringency) method #2197
  • Jenkins builds failing due to Coveralls API submission #2194
  • Confirm block-gzipped (bgzf) interleaved FASTQ is supported #2193
  • TransformGenotype/Variant do not support compressed VCF #2190
  • Add htsjdk conversion methods to VariantContextDataset #2189
  • TransformVariants is missing partition arguments #2188
  • StackOverflowError when saving to BAM in adam-shell #2186
  • loadFastaDna usage not obvious due to default method parameter #2183
  • loadFastaDna does not seem to work #2182
  • kryo buffer overflow when converting fastas from CLI to adam #1660

Merged and closed pull requests:

  • [ADAM-2208] Bump bdg-formats dependency version to 0.14.0 #2209 (heuermh)
  • Add FASTA in formatter for sequence datasets #2207 (heuermh)
  • Remove Avro 1.8.x download step from Jenkins Scala 2.12 installation. #2206 (heuermh)
  • Use qualityScores for base quality scores #2205 (heuermh)
  • [ADAM-2189] Add htsjdk conversion methods to VariantContextDataset #2204 (heuermh)
  • [ADAM-2202] Bump Apache Spark dependency version to 2.4.4. #2203 (heuermh)
  • [ADAM-2183] Drop default value for maximumLength #2201 (heuermh)
  • [ADAM-2197] Add missing loadVariantContexts(String, ValidationStringency) method #2200 (heuermh)
  • [ADAM-2194] Disable coveralls reporting from Jenkins test script #2196 (heuermh)
  • [ADAM-2188] Add partition cli args to TransformVariants,Features. #2192 (heuermh)
  • Bump htsjdk dependency version to 2.19.0 #2184 (heuermh)
  • Update required Maven version in docs #2181 (heuermh)

Version 0.28.0

Closed issues:

  • Bump bdg-formats dependency version to 0.13.0 #2177
  • Rename reads to alignments in methods where appropriate #2172
  • Add command line option re: creating references from FASTA sources #2168
  • Add command line support for loading references in TransformFeatures #2167
  • Add load methods for data frames #2159
  • Transform VCF to adam file not found exception. #2076
  • NoClassDefFoundError: javax/tools/ToolProvider on openjdk 10.0.2 #2030
  • NotSerializableException: com.netflix.servo.monitor.LongGauge #1952
  • Should NucleotideContigFragmentRDD create sequence dictionary on load? #1894
  • converting fasta to adam eats a huge amount of time and memory #1891
  • Support minPartitions parameter across load calls #1792
  • make reading fasta less memory hungry #1458
  • Improve unit test coverage for NucleotideContigFragmentRDD #1413
  • Support for INSDC Sequence records (i.e., Genbank/EMBL format)? #1219

Merged and closed pull requests:

  • [ADAM-2177] Bump bdg-formats dependency version to 0.13.0 #2178 (heuermh)
  • [ADAM-2172] Rename reads to alignments in methods where appropriate #2176 (heuermh)
  • [ADAM-1891] Reimplement FASTA sequence and slice converters for performance #2175 (heuermh)
  • [ADAM-2168] Add command line option re: creating references from FASTA sources #2170 (heuermh)
  • [ADAM-2167] Add command line support for loading references in TransformFeatures #2169 (heuermh)
  • bump adam-python version #2165 (akmorrow13)
  • Convert fragment dataset to alignment dataset directly #2162 (heuermh)
  • [ADAM-2159] Add load methods for data frames #2158 (heuermh)
  • Post 0.27.0 release cleanup and doc fixes. #2155 (heuermh)
  • Add direct conversion from DatasetBoundFragmentRDD to DatasetBoundAli… #2016 (henrydavidge)
  • Add ADAMContext APIs to create genomic RDDs from dataframes #2000 (henrydavidge)
  • Adding ReadRDD, SequenceRDD, and SliceRDD. #1895 (heuermh)

Version 0.27.0

Closed issues:

  • Add Scala 2.12 artifacts to release script #2153
  • Tried to access method org.apache.avro.specific.SpecificData.<init>()V from class ProcessingStep #2151
  • Update maven-jar-plugin dependency version to 3.1.2 #2147
  • Homebrew and Bioconda packages fail against Spark 2.4.2 #2146
  • Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2145
  • Can encounter empty reduce when BAM header fails validation #2143
  • Build failing in jenkins from Spark 2.2.3 #2139
  • Make SamRecordConverter public #2138
  • python API does not match API #2127
  • Error when running mvn install #2123
  • Always use Spark SQL in GenomicDataset read path #2114
  • Update bdg-utils dependency version to 0.2.14 #2106
  • NoSuchMethodError: org.apache.parquet.column.ParquetProperties.getAllocator()Lorg/apache/parquet/bytes/ByteBufferAllocator #2098
  • ClassNotFoundException: org.apache.avro.message.BinaryMessageEncoder #2091
  • Release script needs to touch Version in R DESCRIPTION file #2089
  • org.apache.avro.SchemaParseException: Can't redefine: list #2058
  • Support Spark 2.4 and Scala 2.12 #2044
  • Fail early when output directory already exists #2034
  • NoClassDefFoundError o.a.parquet.hadoop.metadata.CompressionCodecName #1742
  • Log with parameterized messages consistently for performance #1712

Merged and closed pull requests:

  • [ADAM-2153] Add Scala 2.12 artifacts to release script #2154 (heuermh)
  • [ADAM-2089] Bump Version in R DESCRIPTION file #2152 (heuermh)
  • [ADAM-2145] Add Spark 2.4.3 and Scala 2.12 to Jenkins build #2149 (heuermh)
  • [ADAM-2147] Update maven-jar-plugin dependency version to 3.1.2. #2148 (heuermh)
  • [ADAM-2143] Use fold instead of reduce when loading SAM/BAM/CRAM headers #2144 (fnothaft)
  • Remove parquet-scala dependency from dependencyManagement. #2142 (heuermh)
  • [ADAM-2139] Update Spark version to 2.3.3 for Jenkins test #2141 (heuermh)
  • [ADAM-1712] Replace utils.Logger with grizzled.slf4j.Logger #2136 (heuermh)
  • [ADAM-2034] Check output path is writeable before running transformations #2135 (heuermh)
  • jenkins scripts deletes conda envs #2133 (akmorrow13)
  • Update htsjdk dependency version to 2.18.2 #2132 (heuermh)
  • [ADAM-2127] Update python doc per GenomicRdd --> GenomicDataset change #2128 (heuermh)
  • Update python and R versions. #2126 (heuermh)
  • use parquet-scala_2.11 fork #2108 (ryan-williams)
  • [ADAM-2106] Update bdg-utils dependency version to 0.2.14 #2107 (heuermh)
  • [ADAM-2044] Update Spark version to 2.4.3, add move to Scala 2.12 script #2056 (heuermh)

Version 0.26.0

Closed issues:

  • Bump Spark dependency to version 2.3.3 #2120
  • Update Spark version on Jenkins to 2.2.3 #2115
  • Inverted duplicates are not found in mark duplicates #2102
  • Py4JError: org.bdgenomics.adam.algorithms.consensus.ConsensusGenerator.fromKnowns does not exist in the JVM #2099
  • Update Bioconda recipe for ADAM 0.25.0 #2088
  • Update Homebrew formula for ADAM 0.25.0 #2087
  • Error: Dependency package(s) 'SparkR' not available #2086
  • Java-friendly indel realignment method doesn't allow passing reference #2013
  • Use consistent (Scala-specific) (Java-specific) qualifiers in method scaladoc #1986
  • Clarify GenomicRDD vs. GenomicDataset name #1954
  • Support validation stringency in out formatters #1949
  • Compute coverage by sample #1498

Merged and closed pull requests:

  • Bump bdg-formats dependency to version 0.12.0. #2124 (heuermh)
  • [ADAM-2120] Bump Spark dependency to version 2.3.3. #2121 (heuermh)
  • Filter supplemental reads from scoring #2119 (pauldwolfe)
  • [ADAM-2115] Update Spark version on Jenkins to 2.2.3. #2118 (heuermh)
  • Refactor AlignmentRecord, RecordGroup, and ProcessingStep #2113 (heuermh)
  • removed anaconda requirement for venv during jenkins test #2109 (akmorrow13)
  • Propagate read negative flag to SAM records for unmapped reads #2105 (henrydavidge)
  • Add consensus targets to realignment targets #2104 (pauldwolfe)
  • [ADAM-2099] Add python realignIndelsFromKnownIndels method #2103 (heuermh)
  • [ADAM-2102] Inverted duplicates are not found in mark duplicates #2101 (pauldwolfe)
  • Rename contig to reference #2100 (heuermh)
  • [ADAM-1986] Add java-specific methods where missing. #2097 (heuermh)
  • [ADAM-2013] Add java-friendly indel realignment method that accepts reference. #2095 (heuermh)
  • Use build-helper-maven-plugin for build timestamp #2093 (heuermh)
  • bump adam-python version to 0.25.0a0 #2092 (akmorrow13)
  • [ADAM-2085] Update R installation docs re: libgit2 and SparkR. #2090 (heuermh)
  • [ADAM-1954] Complete refactoring GenomicRDD to GenomicDataset. #1981 (heuermh)
  • [ADAM-1949] Support validation stringency in out formatters. #1969 (heuermh)

Version 0.25.0

Closed issues:

  • Expand illumina metadata regex to include "N" character #2079
  • Remove support for Hadoop 2.6 #2073
  • NumberFormatException: For input string: "nan" in VCF #2068
  • Support Spark 2.3.2 #2062
  • Arrays should be passed to HTSJDK in the JVM primitive type #2059
  • toCoverage() function for alignments does not distinguish samples #2049
  • Building from adam-core module directory fails to generate Scala code for sql package #2047
  • Data Sets #2043
  • saveAsBed writes missing score values as '.' instead of '0' #2039
  • Fix GFF3 parser to handle trailing FASTA #2037
  • Add StorageLevel as an optional parameter to loadPairedFastq #2032
  • Error: File name too long when building on encrypted file system #2031
  • Fail to transform a VCF file containing multiple genome data (Multiple sample) #2029
  • Dataset and RDD constructors are missing from CoverageRDD #2027
  • How to create a single RDD[Genotype] object out of multiple VCF files? #2025
  • ReadTheDocs github banner is broken #2020
  • -realign_indels throws serialization error with instrumentation enabled #2007
  • Support 0 length FASTQ reads #2006
  • Speed of Reading into ADAM RDDs from S3 #2003
  • Support Python 3 #1999
  • Unordered list of region join types in doc is missing nested levels #1997
  • Add VariantContextRDD.saveAsPartitionedParquet, ADAMContext.loadPartitionedParquetVariantContexts #1996
  • VCF annotation question #1994
  • Fastq reader clips long reads at 10,000 bp #1992
  • adam-submit Error: Number of executors must be a positive number on EMR 5.13.0/Spark 2.3.0 #1991
  • Test against Spark 2.3.1, Parquet 1.8.3 #1989
  • END does not get set when writing a gVCF #1988
  • Support saving single files to filesystems that don't implement getScheme #1984
  • Add additional filter by convenience methods #1978
  • Limiting FragmentRDD pipe parallelism #1977
  • Consider javadoc.io for API documentation linking #1976
  • FASTQ Reader leaks connections #1974
  • Update bioconda recipe for version 0.24.0 #1971
  • Update homebrew formula at brewsci/homebrew-bio for version 0.24.0 #1970
  • loadPartitionedParquetAlignments fails with Reference.all #1967
  • Caused by: java.lang.VerifyError: class com.fasterxml.jackson.module.scala.ser.ScalaIteratorSerializer overrides final method withResolved #1953
  • FASTQ input format needs to support index sequences #1697
  • Changelog must be edited and committed manually during release process #936

Merged and closed pull requests:

  • added pyspark mock modules for API documentation #2084 (akmorrow13)
  • Added mock python modules for API python documentation #2082 (akmorrow13)
  • [ADAM-2079] Expand illumina metadata regex to include "N" character #2081 (pauldwolfe)
  • ADAM-2079 Added "N" to regexs for illumina metadata #2080 (pauldwolfe)
  • Update docs with new template and documentation #2078 (akmorrow13)
  • [ADAM-1992] Make maximum FASTQ read length configurable. #2077 (heuermh)
  • [ADAM-2059] Properly pass back primitive typed arrays to HTSJDK. #2075 (heuermh)
  • Update dependency versions, including htsjdk to 2.16.1 and guava to 27.0-jre #2072 (heuermh)
  • [ADAM-1999] Support Python 3 #2070 (akmorrow13)
  • [ADAM-2068] Prevent NumberFormatException for nan vs NaN in VCF files. #2069 (heuermh)
  • Update python MAKE file #2067 (Georgehe4)
  • Update python MAKE file #2066 (Georgehe4)
  • Update jenkins script to test python 3.6 #2060 (Georgehe4)
  • [ADAM-2062] Update Spark version to 2.3.2 #2055 (heuermh)
  • Clean up fields and doc in fragment. #2054 (heuermh)
  • [ADAM-2037] Support GFF3 files containing FASTA formatted sequences. #2053 (heuermh)
  • modified CoverageRDD and FeatureRDD to extend MultisampleGenomicDataset #2051 (akmorrow13)
  • Multi-sample coverage #2050 (akmorrow13)
  • [ADAM-2047] Use source directory relative to project.basedir for adam codegen. #2048 (heuermh)
  • [ADAM-2039] Adding support for writing BED format per UCSC definition #2042 (heuermh)
  • Update Jenkins Spark version to 2.2.2 #2035 (akmorrow13)
  • [ADAM-2032] Add StorageLevel as an optional parameter to loadPairedFastq #2033 (heuermh)
  • [ADAM-2027] Add RDD and Dataset constructors to CoverageRDD. #2028 (heuermh)
  • Allow for export of query name sorted SAM files #2026 (karenfeng)
  • [ADAM-2020] Fix ReadTheDocs Github banner. #2021 (fnothaft)
  • [ADAM-1988] Add copyVariantEndToAttribute method to support gVCF END attribute … #2017 (heuermh)
  • [ADAM-936] Use github-changes-maven-plugin to update CHANGES.md. #2014 (heuermh)
  • [ADAM-1992] Make maximum FASTQ read length configurable. #2011 (fnothaft)
  • [ADAM-1697] Expand Illumina metadata regex to cover interleaved index sequences. #2010 (heuermh)
  • [ADAM-2007] Make IndelRealignmentTarget implement Serializable. #2009 (fnothaft)
  • [ADAM-2006] Support loading 0-length reads as FASTQ. #2008 (fnothaft)
  • [ADAM-1697] Expand Illumina metadata regex to cover index sequences #2004 (pauldwolfe)
  • [ADAM-1996] Load and save VariantContexts as partitioned Parquet. #2001 (heuermh)
  • [ADAM-1997] Nest list of region join types in joins doc. #1998 (heuermh)
  • [ADAM-1877] Add filterToReferenceName(s) to SequenceDictionary. #1995 (heuermh)
  • [ADAM-1984] Support file systems that don't set the scheme. #1985 (fnothaft)
  • [ADAM-1978] Add additional filter by convenience methods. #1983 (heuermh)
  • Adding printAttribute methods for alignment records, features, and samples. #1982 (heuermh)
  • Fix partitioning code to use Long instead of Int #1980 (fnothaft)
  • [ADAM-1976] Adding core API documentation link and badge. #1979 (heuermh)
  • [ADAM-1974] Close unclosed stream in FastqInputFormat. #1975 (fnothaft)
  • Set defaults to schemas #1972 (ffinfo)
  • Add loadPairedFastqAsFragments method. #1866 (heuermh)
  • Adding loadPairedFastqAsFragments method #1828 (ffinfo)

Version 0.24.0

Closed issues:

  • Phred values from 156–254 do not round trip properly between log space #1964
  • Support VCF lines with positions at 0 #1959
  • Don't initialize non-ref values to Int.MinValue #1957
  • Support downsampling in recalibration #1955
  • Cannot waive validation stringency for INFO Number=.,Type=Flag fields #1939
  • Clip phred scores below Int.MaxValue #1934
  • ADAMContext.getFsAndFilesWithFilter should throw exception if paths null or empty #1932
  • Bump to Spark 2.3.0 #1931
  • util.FileExtensions should be public for use downstream in Cannoli #1927
  • Reduce logging level for ADAMKryoRegistrator #1925
  • Revisit performance implications of commit 1eed8e8 #1923
  • add akmorrow13 to PyPI for bdgenomics.adam #1919
  • Read the Docs build failing with TypeError: super() argument 1 must be type, not None #1917
  • Bump Hadoop-BAM dependency to 7.9.2. #1915
  • cannot run pyadam from adam distribution 0.23.0 #1914
  • adam2fasta/q are missing asSingleFile, disableFastConcat #1912
  • Pipe API doesn't properly handle multiple arguments and spaces #1909
  • Bump to HTSJDK 2.13.2 #1907
  • S3A error: HTTP request: Timeout waiting for connection from pool #1906
  • InputStream passed to VCFHeaderReader does not get closed #1900
  • Support INFO fields set to missing #1898
  • CLI to transfer between cloud storage and HDFS #1896
  • Jenkins does not run python or R tests #1889
  • pyadam throws application option error #1886
  • ReferenceRegion in python does not exist #1884
  • Caching GenomicRDD in pyspark #1883
  • adam-submit aborts if ADAM_HOME is set #1882
  • Allow piped commands to timeout #1875
  • loadVcf does not dedupe sample ID #1874
  • Add coverage command for reporting read coverage #1873
  • Only python 2? #1871
  • Support VariantContextRDD from SQL #1867
  • Cannot find find-adam-assembly.sh in bioconda build #1862
  • _jvm.java.lang.Class.forName does not work for certain configurations #1858
  • Formatting error in CHANGES.md #1857
  • Various improvements to readthedocs documentation #1853
  • add filterByOverlappingRegion(query: ReferenceRegion) to R and python APIs #1852
  • Support adding VCF header lines from Python #1840
  • Support loadIndexedBam from Python #1836
  • Add link to awesome list of applications that extend ADAM #1832
  • loadIndexed bam lazily throws Exception if index does not exist #1830
  • OAuth credentials for Github in Coveralls configuration are no longer valid #1829
  • base counts per position #1825
  • Issues loading BAM files in Google FS #1816
  • Error when writing a vcf file to Parquet #1810
  • transformAlignments cannot repartition files #1808
  • GenotypeRDD should support toVariants method #1806
  • Add support for python and R in Homebrew formula #1796
  • Add transformVariantContexts or similar to cli #1793
  • Issue while using Sorting option #1791
  • Issue with adam2vcf #1787
  • Remove explicit <compile> scopes from submodule POMs #1786
  • java.nio.file.ProviderNotFoundException (Provider "s3" not found) #1732
  • Accessing GenomicRDD join functions in python #1728
  • ArrayIndexOutOfBoundsException in PhredUtils$.phredToSuccessProbability #1714
  • Add ability to specify region bounds to pipe command #1707
  • Unable to run pyadam, SQLException: Failed to start database 'metastore_db' #1666
  • SAMFormatException: Unrecognized tag type: ^@ #1657
  • IndexOutOfBoundsException in BAMInputFormat.getSplits #1656
  • overlaps considers that Strand.FORWARD cannot overlap with Strand.INDEPENDENT #1650
  • migration converters #1629
  • RFC: Removing Spark 1.x, Scala 2.10 support in 0.24.0 release #1597
  • Eliminate unused ConcreteADAMRDDFunctions class #1580
  • Add set theory/statistics packages to ADAM #1533
  • Evaluate Apache Carbondata INDEXED column store file format for genomics #1527
  • Stranded vs unstranded in getReferenceRegions() for features #1513
  • Question: How to transform a line of SAM to AlignmentRecord? #1425
  • Excessive compilation warnings about multiple scala libraries #695
  • Support Hive-style partitioning #651

Merged and closed pull requests:

  • [ADAM-1964] Lower point where phred conversions are done using log code. #1965 (fnothaft)
  • Add utility methods for adam-shell. #1958 (heuermh)
  • [ADAM-1955] Add support for downsampling during recalibration table generation #1963 (fnothaft)
  • [ADAM-1957] Don't initialize missing likelihoods to MinValue. #1961 (fnothaft)
  • [ADAM-1959] Support VCF rows at position 0. #1960 (fnothaft)
  • [ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1948 (fnothaft)
  • [ADAM-1914] Python profile needs to be specified for egg to be in distribution. #1946 (fnothaft)
  • [ADAM-1917] Delete dependency on fulltoc. #1944 (fnothaft)
  • [ADAM-1917] Try 3: fix Sphinx fulltoc. #1943 (fnothaft)
  • [ADAM-1917] Set Sphinx version in requirements.txt. #1942 (fnothaft)
  • [ADAM-1917] Set minimal Sphinx version for Readthedocs build. #1941 (fnothaft)
  • [ADAM-1939] Allow validation stringency to waive off FLAG arrays. #1940 (fnothaft)
  • [ADAM-1915] Bump to Hadoop-BAM 7.9.2. #1938 (fnothaft)
  • [ADAM-1934] Clip phred values to 3233, instead of Int.MaxValue. #1936 (fnothaft)
  • Ignore VCF INFO fields with number=G when stringency=LENIENT #1935 (jpdna)
  • [ADAM-1931] Bump to Spark 2.3.0. #1933 (fnothaft)
  • [ADAM-1840] Support adding VCF header lines from Python. #1930 (fnothaft)
  • [ADAM-1927] Increase visibility for util.FileExtensions for use downstream. #1929 (heuermh)
  • [ADAM-1925] Reduce logging level for ADAMKryoRegistrator. #1928 (heuermh)
  • [ADAM-1923] Revert 1eed8e8 #1926 (fnothaft)
  • Use SparkFiles.getRootDirectory in local mode. #1924 (heuermh)
  • [ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets #1922 (jpdna)
  • Make Spark SQL APIs supported across all types #1921 (fnothaft)
  • [ADAM-1909] Refactor pipe cmd parameter from String to Seq[String]. #1920 (heuermh)
  • Add Google Cloud documentation #1918 (Georgehe4)
  • [ADAM-1917] Load sphinxcontrib.fulltoc with imp.load_sources. #1916 (akmorrow13)
  • [ADAM-1912] Add asSingleFile, disableFastConcat to adam2fasta/q. #1913 (heuermh)
  • [ADAM-651] Hive-style partitioning of parquet files by genomic position #1911 (jpdna)
  • Minor unit test/style fixes. #1910 (heuermh)
  • [ADAM-1907] Bump to HTSJDK 2.13.2. #1908 (fnothaft)
  • [ADAM-1882] Don't abort adam-submit if ADAM_HOME is set. #1905 (fnothaft)
  • [ADAM-1806] Add toVariants conversion from GenotypeRDD. #1904 (fnothaft)
  • [ADAM-1882] Return true if ADAM_HOME is set, not exit 0. #1903 (heuermh)
  • [ADAM-1900] Close stream after reading VCF header. #1901 (fnothaft)
  • [ADAM-1898] Support converting INFO fields set to empty ('.'). #1899 (fnothaft)
  • Add Kryo registration for two classes required for Spark 2.3.0. #1897 (jpdna)
  • [ADAM-1853] Various improvements to readthedocs documentation. #1893 (heuermh)
  • [ADAM-1889][ADAM-1884] updated ReferenceRegion in python #1892 (akmorrow13)
  • [ADAM-1889] Run R/Python tests. #1890 (fnothaft)
  • [ADAM-1886] fix for pyadam to recognize >1 egg file #1887 (akmorrow13)
  • [ADAM-1883] Python and R caching #1885 (akmorrow13)
  • [ADAM-1875] Add ability to timeout a piped command. #1881 (fnothaft)
  • [ADAM-1871] Fix print call that broke python 3 support. #1880 (fnothaft)
  • [ADAM-1832] Use awesome list style and link to bigdatagenomics/awesome-adam. #1879 (heuermh)
  • [ADAM-651] Hive-style partitioning of parquet files by genomic position #1878 (jpdna)
  • [ADAM-1874] Dedupe samples when loading VCFs. #1876 (fnothaft)
  • Fixes Coverage python API and adds tests #1870 (akmorrow13)
  • added filterByOverlappingRegion for python #1869 (akmorrow13)
  • Add command line option for populating nested variant.annotation field in Genotype records. #1865 (heuermh)
  • Hive partitioned(v4) rebased #1864 (jpdna)
  • [ADAM-1597] Move to Scala 2.11 and Spark 2.x. #1861 (heuermh)
  • [ADAM-1857] Fix formatting error due to forward slashes. #1860 (heuermh)
  • [ADAM-1858] Use getattr instead of Class.forName from python API. #1859 (fnothaft)
  • [ADAM-1836] Adds loadIndexedBam API to Python and Java. #1837 (fnothaft)
  • Added check for bam index files in loadIndexedBam #1831 (akmorrow13)
  • [ADAM-1793] Adding vcf2adam and adam2vcf that handle separate variant and genotype data. #1794 (heuermh)
  • added adam notebook #1778 (akmorrow13)
  • [ADAM-1666] SQLContext creation fix for Spark 2.x #1777 (akmorrow13)
  • Add optional accumulator for VCF header lines to VCFOutFormatter. #1727 (heuermh)
  • add hive style partitioning for contigName #1620 (jpdna)
  • Add loadReadsFromSamString function into ADAMContext #1434 (xubo245)

Version 0.23.0

Closed issues:

  • Readthedocs build error #1854
  • Add pip release to release scripts #1847
  • Publish scaladoc script still attempts to build markdown docs #1845
  • Allow variant annotations to be loaded into genotypes #1838
  • Specify correct extensions for SAM/BAM output #1834
  • Fix link anchors and other issues in readthedocs #1822
  • Sphinx fulltoc is not included #1821
  • Readme link to bigdatagenomics/lime 404s #1819
  • Bump to Hadoop-BAM 7.9.1 #1817
  • LoadVariants Header Format #1815
  • Right and Left Outer Shuffle Region Join don't match #1813
  • Pipe command can fail with empty partitions #1807
  • adam files with outdated formats throw FileNotFoundException #1804
  • Move GenomicRDD.writeTextRDD outside of GenomicRDD #1803
  • find-adam-assembly fails to recognize more than 1 jar #1801
  • tests/testthat.R failed on git head #1799
  • Run python and R tests conditionally in build #1795
  • scala-lang should be a provided dependency #1789
  • loadIndexedBam does an unnecessary union #1784
  • Release bdgenomics.adam R package on CRAN #1783
  • Issue with transformVariant // Adam to vcf #1782
  • Add code of conduct #1779
  • Reinstantiation of SQLContext in pyadam ADAMContext #1774
  • Genotypes should only contain the core variant fields #1770
  • Add SingleFASTQInFormatter #1768
  • INDEL realigner can emit negative partition IDs #1763
  • Request for a new release #1762
  • INDEL realigner generates targets for reads with more than 1 INDEL #1753
  • Fragment Issue #1752
  • Variant Caller!!! #1751
  • Spark Version!! #1750
  • ReferenceRegion.subtract eliminating valid regions #1747
  • New Shuffle Join Implementation - Left Outer + Group By Left #1745
  • command failure after build success #1744
  • Recalibrate_base_Qualities #1743
  • Standardize regionFn for ShuffleJoin returned objects #1740
  • Shuffle, Broadcast Joins with threshold #1739
  • Adam on Spark 2.1 #1738
  • Opening up permission on GenericGenomicRDD constructor #1735
  • Consistency on ShuffleRegionJoin returns #1734
  • vcf2adam support #1731
  • Cloud-scale BWA MEM #1730
  • Aligned Human Genome couldn't convert to Adam #1729
  • Mark Duplicates #1726
  • Genomics Pipeline #1724
  • .fastq Alignment #1723
  • Is it correct Adam file #1720
  • .fastQ to .adam #1718
  • Unable to create .adam from .sam #1717
  • Add adam- prefix to distribution module name #1716
  • Python load methods don't have ability to specify validation stringency #1715
  • NPE when trying to map loadVariants over RDD #1713
  • Add left normalization of INDELs as an RDD level primitive #1709
  • Allow validation stringency to be set in AnySAMOutFormatter #1703
  • InterleavedFastqInFormatter should sort by readInFragment #1702
  • Allow silencing the # of reads in fragment warning in InterleavedFastqInFormatter #1701
  • GenomicRDD.toXxx method names should be consistent #1699
  • Exception thrown in VariantContextConverter.formatAllelicDepth despite SILENT validation stringency #1695
  • Make GenomicRDD.toString more adam-shell friendly #1694
  • Add adam-shell friendly VariantContextRDD.saveAsVcf method #1693
  • change bdgenomics.adam package name for adam-python to bdg-adam #1691
  • Conflict in bdg-formats dependency version due to org.hammerlab:genomic-loci #1688
  • Convert and store variant quality field. #1682
  • Region join shows non-determinism #1680
  • Shuffle region join throws multimapped exception for unmapped reads #1679
  • Push validation checks down to INFO/FORMAT fields #1676
  • IndexOutOfBounds thrown when saving gVCF with no likelihoods #1673
  • Generate docs from R API for distribution #1672
  • Support loading a subset of VCF fields #1670
  • Error with metadata: Multivalued flags are not supported for INFO lines #1669
  • Include bdg.adam-0.23.0.tar.gz in distribution tarballs #1668
  • Include bdgenomics.adam-0.23.0_SNAPSHOT-py2.7.egg in distribution tarball #1667
  • Add SUPPORT.md file to complement CONTRIBUTING.md #1664
  • Can't merge BAM files containing the same sample #1663
  • Incorrect README.md kmer.scala loadAliments method parameter name #1662
  • Add performance benchmarks similar to Samtools CRAM benchmarking page #1661
  • Transient bad GZIP header bug when loading BGZF FASTQ #1658
  • bdgenomics.adam vs bdg.adam for R/Python APIs #1655
  • Need adamR script #1649
  • incorrect grep for assembly jars in bin/pyadam #1647
  • VariantRDD union creates multiple records for the same SNP ID #1644
  • S3 access documentation #1643
  • Algorithms docs formatting #1639
  • Building downstream apps docs reformatting #1638
  • FastqInputFormat.FILE_SPLITTABLE in conf not getting passed properly #1635
  • Add benchmarks to documentation #1634
  • Intro docs contain outdated/incompatible code #1633
  • Intro docs missing a number of active projects #1632
  • Installation instructions for Homebrew missing from documentation #1631
  • Architecture section is missing from docs #1630
  • Seq vs. Seq with javac #1625
  • ProcessingStep missing from adam-codegen #1623
  • Add ADAM recipe to bioconda #1618
  • adam-submit cannot find assembly jar if installed as symlink #1616
  • Expose transform/transmute in Java/Python/R #1615
  • Expose VariantContextRDD in R/Python #1614
  • Expose pipe API from Python/R #1611
  • Serialization issue with TwoBitFile #1610
  • Snapshot Distribution Does not include jar files #1607
  • ManualRegionPartitioner is broken for ParallelFileMerger codepath #1602
  • VariantRDD doesn't save partition map #1601
  • Scala copy method not supported in abstract classes such as AlignmentRecordRDD #1599
  • Interleaved FASTQ recognizes only /1 suffix pattern #1589
  • Use empty sequence dictionary when loading features #1588
  • New Illumina FASTQ spec adds metadata to read name line #1585
  • first run of ADAM #1582
  • Add unit test coverage for BED12 parser and writer #1579
  • Spark 1.x Scala 2.10 snapshot artifacts missing since 31 March 2017 #1578
  • Unable to save GenomicRDDs after a join. #1576
  • Add filterBySequenceDictionary to GenomicRDD #1575
  • Unaligned Trait does nothing #1573
  • Bump to bdg-formats 0.11.1 #1570
  • PhredUtils conversion to log probabilities has insufficient resolution for PLs #1569
  • Reference model import code is borked #1568
  • SequenceDictionary vs Feature[RDD] of reference length features #1567
  • giab-NA12878 truth_small_variants.vcf.gz header issues #1566
  • VCF header read from stream ignored in VCFOutFormatter #1564
  • VCF genotype Number=A attribute throws ArrayIndexOutOfBoundsException #1562
  • Save compressed single file VCF via HadoopBAM #1554
  • bucketing strategy #1553
  • Is parquet using delta encoding for positions? #1552
  • Export to VCF does not include symbolic non-ref if site has a called alt #1551
  • Refactor filterByOverlappingRegions not to require a List #1549
  • Move docs to Sphinx/pure Markdown #1548
  • java.lang.IncompatibleClassChangeError: Implementing class #1544
  • Support locus predicate in TransformAlignments #1539
  • Visibility from Java, jrdd has private access in AvroGenomicRDD #1538
  • Rename o.b.adam.apis.java package to o.b.adam.api.java #1537
  • VCF header genotype reserved key FT cardinality clobbered by htsjdk #1535
  • Compute a SequenceDictionary from a *.genome file #1534
  • Queryname sorted check should check for queryname grouped as well #1530
  • Bump to bdg-formats 0.11.0 #1520
  • Move to Spark 2.2, Parquet 1.8.2 #1517
  • Minor refactor for TreeRegionJoin for consistency #1514
  • Allow +Inf and -Inf Float values when reading VCF #1512
  • SparkFiles temp directory path should be accessible as a variable #1510
  • SparkFiles.get expects just the filename #1509
  • Split apart #1324 #1507
  • Where can I find "Phred-scaled quality score" (QUAL)? #1506
  • Alignment Record sort is not consistent with samtools #1504
  • Sequence dictionary records in TwoBitFile are not stable #1502
  • Move coverage counter over to Dataset API #1501
  • Allow users to set the minimum partition count across all load methods #1500
  • Enable reuse of broadcast object across broadcast region joins #1499
  • Take union across genomic RDDs #1497
  • Adam files created by vcf2adam is not recognizable #1496
  • Scalatest log output disappears with Maven 3.5.0 #1495
  • ArrayOutOfBoundsException in vcf2adam (spark2_2.11-0.22.0) on UK10K VCFs (VCFv4.1) #1494
  • ReferenceRegion overlaps and covers returns false if overlap is 1 #1492
  • Provide asSingleFile parameter for saveAsFastq and related #1490
  • Min Phred score gets bumped by 33 twice in BQSR #1488
  • Should throw error when BAM header load fails #1486
  • Default value for reads.toCoverage(collapse) should be false #1483
  • Refactor ADAMContext loadXxx methods for consistency #1481
  • loadGenotypes three times #1480
  • Fall back to sequential concat when HDFS concat fails #1478
  • VCF line with . ALT gets dropped #1476
  • ADAM works on Cloudera but does NOT work on MAPR #1475
  • Clean up ReferenceRegion.scala #1474
  • Allow joins on regions that are within a threshold (instead of requiring overlap) #1473
  • FeatureRDD.toCoverage throws NullPointerException when there is no coverage information #1471
  • Add quality score binner #1462
  • Splittable compression and FASTQ #1457
  • Don't convert .{different-type}.adam in loadAlignments and loadFragments #1456
  • New primitives for adam-core #1454
  • Port over code for populating SequenceDictionaries from .dict files #1449
  • Ignore failed push to Coveralls during CI builds #1444
  • No asSingleFile parameter for saveAsFasta in NucleotideContigFragmentRDD #1438
  • shufflejoin and ArrayIndexOutOfBoundsException #1436
  • Document using ADAM snapshot #1432
  • Improve metrics coverage across ADAMContext load methods #1428
  • loadReferenceFile missing from Java API #1421
  • loadCoverage missing from Java API #1420
  • Question: How to get paired-end alignmentRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? #1419
  • Clean up possibly unused methods in Projection #1417
  • Problem loading SNPeff annotated VCF #1390
  • RecordGroupDictionary should support isEmpty #1380
  • Get rid of mutable collection transformations in ShuffleRegionJoin #1379
  • Add tab5/6 as native output format for AlignmentRecordRDD #1377
  • ValidationStringency in MDTagging should apply to reads on unknown references #1365
  • Assembly final name doesn't include spark2 for Spark 2.x builds #1361
  • Merge reads2fragments and fragments2reads into a single CLI #1359
  • Investigate failures to load ExAC.0.3.GRCh38.vcf variants #1351
  • adam-shell does not allow additional jars via Spark jars argument #1349
  • Loading GZipped VCF returns an empty RDD #1333
  • Bump Spark 2 build to Spark 2.1.0 #1330
  • Rename Transform command TransformAlignments or similar #1328
  • Replace ADAM2Vcf and Vcf2ADAM commands with TransformGenotypes and TransformVariants #1327
  • FeatureRDD instantiation tries to cache the RDD #1321
  • Repository for Pipe API wrappers for bioinformatics tools #1314
  • Trying to get Spark pipeline working with slightly out of date code. #1313
  • Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) #1312
  • Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311
  • Don't include log4j.properties in published JAR #1300
  • Removing ProgramRecords info when saving data to sam/bam? #1257
  • ADAM on Slurm/LSF #1229
  • Maintaining sorted/partitioned knowledge #1216
  • Evaluate bdg-convert external conversion library proposal #1197
  • Port AMPCamp Tutorial over #1174
  • Top level WrappedRDD or similar abstraction #1173
  • GFF3 formatted features written as single file must include gff-version pragma #1169
  • Can probably eliminate sort in RealignIndels #1137
  • Load SV type info field - need for allele uniquness #1134
  • BroadcastRegionJoin is not a broadcast join #1110
  • AlignmentRecordRDD does not extend GenomicRDD per javac #1092
  • Add generic ReferenceRegion pushdown for parquet files #1047
  • Use of dataset api in ADAM #1018
  • Difference running markdups with and without projection #1014
  • ADAM to BAM conversion fails using relative path #1012
  • Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
  • NoSuchMethodError due to kryo minor-version mismatch #955
  • Autogen field names in projection package #941
  • Future of schemas in bdg-formats #925
  • genotypeType for genotypes with multiple OtherAlt alleles? #897
  • How to filter genotype RDD with FeatureRDD #890
  • How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
  • R language package for Adam #882
  • How to count genotypes with a 10 node Spark/Adam cluster faster than with BCFTools on a single machine? #879
  • Ensure Java API is up-to-date with Scala API #855
  • BroadcastRegionJoin fails with unmapped reads #821
  • Resolve Fragment vs. SingleReadBucket #789
  • Updating/Publishing the docs/ directory #774
  • Next on empty iterator in BroadcastRegionJoin #661
  • Cleanup code smell in sort work balancing code #635
  • Provide low-impact alternative to transform -repartition for reducing partition size #594
  • Create an ADAM Python API #538
  • Migrate serialization libraries out of ADAM core #448
  • Create standardized, interpretable exceptions for error reporting #420
  • Build info/version info inside ADAM-generated files #188

Merged and closed pull requests:

  • [ADAM-1854] Add requirements.txt file for RTD. #1856 (fnothaft)
  • [ADAM-1783] Resolve check issues that block pushing to CRAN. #1849 (fnothaft)
  • [ADAM-1847] Update ADAM scripts to support self-contained pip install. #1848 (fnothaft)
  • [ADAM-1845] Only build and publish scaladocs in publish-scaladoc.sh. #1846 (heuermh)
  • [ADAM-1843] Install sources before calling scala:doc in publish scaladoc #1844 (fnothaft)
  • Remove python and R profiles from release script #1842 (heuermh)
  • [ADAM-1817] Bump to Hadoop-BAM 7.9.1. #1841 (fnothaft)
  • [ADAM-1838] Make populating variant.annotation field in Genotype configurable #1839 (fnothaft)
  • [ADAM-1834] Add proper extensions for SAM/BAM/CRAM output formats. #1835 (fnothaft)
  • [ADAM-1822] Misc docs cleanup #1827 (fnothaft)
  • Added missing init.py for fulltoc. #1824 (fnothaft)
  • [ADAM-1821] Add missing fulltoc for Sphinx documentation. #1823 (fnothaft)
  • Fix link to documentation #1820 (nzachow)
  • [ADAM-1634] Add algorithm benchmarks to documentation. #1818 (fnothaft)
  • [ADAM-1813] Delegate right outer shuffle region join to left OSRJ implementation. #1814 (fnothaft)
  • [ADAM-1807] Check for empty partition when running a piped command. #1812 (fnothaft)
  • [ADAM-1803] Refactor GenomicRDD.writeTextRdd to util.TextRddWriter. #1809 (heuermh)
  • Added Filter error when file loaded does not match schema #1805 (akmorrow13)
  • changed num_jars count #1802 (akmorrow13)
  • [ADAM-1795] Map -DskipTests=true to exec.skip for Python and R tests. #1800 (heuermh)
  • [ADAM-1672] Use working directory for R devtools::document(). #1798 (heuermh)
  • [ADAM-1789] Move scala-lang to provided scope. #1790 (fnothaft)
  • [ADAM-1784] loadIndexedBam should pass the raw globbed path to Hadoop-BAM #1785 (fnothaft)
  • [ADAM-1664] Add SUPPORT.md file to complement CONTRIBUTING.md. #1781 (heuermh)
  • [ADAM-1779] Adding code of conduct adapted from the Contributor Covenant, version 1.4. #1780 (heuermh)
  • [ADAM-1661] Add file storage benchmarks. #1772 (fnothaft)
  • [ADAM-1770] Genotype should only store core variant fields. #1771 (fnothaft)
  • [ADAM-1768] Add InFormatter for unpaired FASTQ. #1769 (fnothaft)
  • [ADAM-1643] Add S3 access documentation. #1767 (fnothaft)
  • [ADAM-1763] Apply absolute value to destination partition in ModPartitioner #1766 (fnothaft)
  • Add R and Python into distribution artifacts #1765 (fnothaft)
  • [ADAM-1655] Move R package to bdgenomics.adam. #1764 (fnothaft)
  • [ADAM-1753] Only emit realignment targets for reads containing a single INDEL #1756 (fnothaft)
  • [ADAM-1715] Support validation stringency in Python/R. #1755 (fnothaft)
  • [ADAM-1680] Eliminate non-determinism in the ShuffleRegionJoin. #1754 (fnothaft)
  • update to _replaceRdd with tests #1749 (akmorrow13)
  • [ADAM-1747] Fixed subtract bug and tests #1748 (devin-petersohn)
  • [ADAM-1745] Adding LeftOuterShuffleRegionJoinAndGroupByLeft and tests #1746 (devin-petersohn)
  • Enabled thresholding for joins and standardized regionFn #1741 (devin-petersohn)
  • Making join return types consistent #1737 (devin-petersohn)
  • Opening up permissions on GenericGenomicRDD #1736 (devin-petersohn)
  • [ADAM-1716] Add adam- prefix to distribution module name. #1733 (heuermh)
  • [ADAM-1695] Check for illegal genotype index after splitting multi-allelic variants. #1725 (heuermh)
  • [ADAM-1517] Bump Parquet version in a manner compatible with Spark 2.2.x #1722 (fnothaft)
  • [ADAM-1512] Support VCFs with +Inf/-Inf float values. #1721 (fnothaft)
  • [ADAM-1709] Add ability to left normalize reads containing INDELs. #1711 (fnothaft)
  • [ADAM-1691] Move bdgenomics.adam to use a namespace package. #1706 (fnothaft)
  • moved bdgenomics.adam package to bdgenomics-adam #1705 (akmorrow13)
  • Misc cleanup needed for bigdatagenomics/cannoli#65 #1704 (fnothaft)
  • [ADAM-1699] Make GenomicRDD.toXxx method names consistent. #1700 (heuermh)
  • [ADAM-1694] Add short readable descriptions for toString in subclasses of GenomicRDD. #1698 (heuermh)
  • [ADAM-1693] Add adam-shell friendly VariantContextRDD.saveAsVcf method. #1696 (heuermh)
  • [ADAM-1688] Add bdg-formats exclusion to org.hammerlab:genomic-loci dependency. #1690 (heuermh)
  • [ADAM-1679] Unmapped items should not get caught in requirement when sorting #1687 (fnothaft)
  • [ADAM-1566] Merge VCF header lines with VCFHeaderLineCount.INTEGER correctly. #1685 (heuermh)
  • [ADAM-1682] Add variant quality field. #1684 (fnothaft)
  • Remove adam- prefix from module directory names. #1681 (heuermh)
  • Update to hadoop-bam 7.9.0 and htsjdk 2.11.0. #1678 (heuermh)
  • [ADAM-1676] Add more finely grained validation for INFO/FORMAT fields. #1677 (fnothaft)
  • Python API fixes for AlignmentRecordRDD #1675 (akmorrow13)
  • [ADAM-1673] Don't set PL to empty when no PL is attached to a gVCF record #1674 (fnothaft)
  • [ADAM-1670] Add ability to selectively project VCF fields. #1671 (fnothaft)
  • [ADAM-1663] Enable read groups with repeated names when unioning. #1665 (fnothaft)
  • Maint 2.11 0.18.0 #1659 (Douglas-H)
  • [ADAM-1630] Overhauled docs introduction and added architecture section. #1653 (fnothaft)
  • Add adamR script #1651 (fnothaft)
  • [ADAM-1647] Fix bad JAR discovery grep in bin/pyadam. #1648 (fnothaft)
  • [ADAM-1548] Generate reStructuredText from pandoc markdown. #1646 (fnothaft)
  • Algorithms docs formatting #1645 (gunjanbaid)
  • Cleaned up docs. #1642 (gunjanbaid)
  • Making example code compatible with current ADAM build #1641 (devin-petersohn)
  • Cleaning up formatting and spacing of docs. #1640 (devin-petersohn)
  • added ExtractRegions #1637 (antonkulaga)
  • [ADAM-1635] Eliminate passing FASTQ splittable status via config. #1636 (fnothaft)
  • [ADAM-1614] Add VariantContextRDD to R and Python APIs. #1628 (fnothaft)
  • [ADAM-1615] Add transform and transmute APIs to Java, R, and Python #1627 (fnothaft)
  • [ADAM-1625] Use explicit types for header lines #1626 (heuermh)
  • [ADAM-1623] Add ProcessingStep to adam-codegen. #1624 (heuermh)
  • [ADAM-1607] Update distribution assembly task to attach assembly überjar #1622 (fnothaft)
  • [ADAM-1490] Add asSingleFile to saveAsFastq and related. #1621 (heuermh)
  • Update load method docs in Python and R. #1619 (heuermh)
  • [ADAM-1616] Resolve installation directory if scripts are symlinks. #1617 (heuermh)
  • [ADAM-1611] Extend pipe APIs to Java, Python, and R. #1613 (fnothaft)
  • [ADAM-1610] Mark non-serializable field in TwoBitFile as transient. #1612 (fnothaft)
  • [ADAM-1554] Support saving BGZF VCF output. #1608 (fnothaft)
  • Adding examples of how to use joins in the real world #1605 (devin-petersohn)
  • [ADAM-1599] Add explicit functions for updating GenomicRDD metadata. #1600 (fnothaft)
  • [ADAM-1576] Allow translation between two different GenomicRDD types. #1598 (fnothaft)
  • [ADAM-1444] Ignore failed push to Coveralls. #1595 (fnothaft)
  • Testing, testing, 1... 2... 3... #1592 (fnothaft)
  • [ADAM-1417] Removed unused Projection.apply method, add test for Filter. #1591 (fnothaft)
  • [ADAM-1579] Add unit test coverage for BED12 format. #1587 (fnothaft)
  • [ADAM-1585] Support additional Illumina FASTQ metadata. #1586 (fnothaft)
  • [ADAM-1438] Add ability to save FASTA back as a single file. #1581 (fnothaft)
  • Bump bdg-formats correctly to 0.11.1, not SNAPSHOT. #1577 (fnothaft)
  • [ADAM-1573] Remove unused Unaligned trait. #1574 (fnothaft)
  • Slurm deployment readme #1571 (jpdna)
  • [ADAM-1564] Read VCF header from stream in VCFOutFormatter. #1565 (heuermh)
  • [ADAM-1562] Index off by one for VCF genotype Number=A attributes. #1563 (heuermh)
  • [ADAM-1533] Set Theory #1561 (devin-petersohn)
  • Freebayes FORMAT=<ID=AO,Number=A attribute throws ArrayIndexOutOfBoundsException #1560 (heuermh)
  • [ADAM-1551] Emit non-reference model genotype at called sites. #1559 (fnothaft)
  • [ADAM-1449] Add loadSequenceDictionary to ADAM context. #1557 (heuermh)
  • [ADAM-1537] Rename o.b.adam.apis.java package to o.b.adam.api.java #1556 (heuermh)
  • [ADAM-1549] Make regions provided to filterByOverlappingRegions an Iterable. #1550 (fnothaft)
  • [ADAM-941] Automatically generate projection enums. #1547 (fnothaft)
  • [ADAM-1361] Fix misnamed ADAM überjar. #1546 (fnothaft)
  • [ADAM-1257] Add program record support for alignment/fragment files. #1545 (fnothaft)
  • [ADAM-1359] Merge reads2fragments and fragments2reads into transformFragments #1543 (fnothaft)
  • Fix minor format mistakes (and typo) in docs #1542 (kkaneda)
  • Add a simple unit test to SingleFastqInputFormat #1541 (kkaneda)
  • Support locus predicate in Transform #1540 (fnothaft)
  • [ADAM-1421] Add java API for loadReferenceFile. #1536 (fnothaft)
  • Refactor Vcf2ADAM and ADAM2Vcf into TransformGenotypes and TransformVariants #1532 (heuermh)
  • [ADAM-1530] Support loading GO:query (S/CR/B)AMs as fragments. #1531 (fnothaft)
  • [ADAM-1169] Write GFF header line pragma in single file mode. #1529 (fnothaft)
  • [ADAM-1501] Compute coverage using Dataset API. #1528 (fnothaft)
  • [ADAM-1497] Add union to GenomicRDD. #1526 (fnothaft)
  • [ADAM-1486] Respect validation stringency if BAM header load fails. #1525 (fnothaft)
  • [ADAM-1499] Enable reuse of broadcasted objects in region join. #1524 (fnothaft)
  • [ADAM-1520] Bump to bdg-formats 0.11.0. #1523 (fnothaft)
  • Adding fragment InFormatter for Bowtie tab5 format #1522 (heuermh)
  • [ADAM-1328] Rename Transform to TransformAlignments. #1521 (fnothaft)
  • [ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0 #1518 (fnothaft)
  • Fixed minor typos in README. #1516 (gunjanbaid)
  • Making TreeRegionJoin consistent with ShuffleRegionJoin #1515 (devin-petersohn)
  • Resolve #1508, #1509 for Pipe API #1511 (fnothaft)
  • [ADAM-1502] Preserve contig ordering in TwoBitFile sequence dictionary. #1508 (fnothaft)
  • [ADAM-1483] Remove collapse parameter from AlignmentRecordRDD.toCoverage #1493 (fnothaft)
  • [ADAM-1377] Adding fragment InFormatter for Bowtie tab6 format #1491 (heuermh)
  • [ADAM-1488] Only increment BQSR min quality by 33 once. #1489 (fnothaft)
  • [ADAM-1481] Refactor ADAMContext loadXxx methods for consistency #1487 (heuermh)
  • Add quality score binner #1485 (fnothaft)
  • Clean up ReferenceRegion.scala and add thresholded overlap and covers #1484 (devin-petersohn)
  • [ADAM-1456] Remove .{type}.adam file extension conversions in type-guessing methods. #1482 (heuermh)
  • [ADAM-1480] Add switch to disable the fast concat method. #1479 (fnothaft)
  • [ADAM-1476] Treat . ALT allele as symbolic non-ref. #1477 (fnothaft)
  • Adding require for Coverage Conversion and related tests #1472 (devin-petersohn)
  • Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
  • [ADAM-882] R API #1397 (fnothaft)
  • [ADAM-1018] Add support for Spark SQL Datasets. #1391 (fnothaft)
  • WIP Python API #1387 (fnothaft)
  • [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
  • Update dependency and plugin versions #1360 (heuermh)
  • [ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
  • Efficient Joins and (re)Partitioning #1324 (devin-petersohn)

Version 0.22.0

Closed issues:

  • Realign all reads at target site, not just reads with no mismatches #1469
  • Parallel file merger fails if the output file is smaller than the HDFS block size #1467
  • Add new realigner arguments to docs #1465
  • Recalibrate method misspelled as recalibateBaseQualities #1463
  • FASTQ may try to split GZIPed files #1459
  • Update to Hadoop-BAM 7.8.0 #1455
  • Publish Markdown and Scaladoc to the interwebs #1453
  • Make VariantContextConverter public #1451
  • Apply method in FragmentRDD is package private #1445
  • Thread pool will block inside of pipe command for streams too large to buffer #1442
  • FeatureRDD.apply() does not allow addition of other parameters with defaults in the case class #1439
  • Question: Why is the number of paired sequences in adam-0.21.0 less than in adam-0.19.0? #1424
  • loadCoverage missing from Java API #1420
  • Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1410
  • loadIntervalList FeatureRDD has empty SequenceDictionary #1409
  • problem using transform command #1406
  • Add coveralls #1403
  • INDEL realigner binary search conditional is flipped #1402
  • Delete adam-scripts/R #1398
  • Data missing when transforming FASTQ to Adam #1393
  • java.io.FileNotFoundException when file exists #1385
  • Off-by-1 error in FASTQ InputFormat start positioning code #1383
  • Set the wrong value for end for symbolic alts #1381
  • RecordGroupDictionary should support isEmpty #1380
  • Add pipe API in and out formatters for Features #1374
  • Increase visibility for SupportedHeaderLines.allHeaderLines #1372
  • Bits of VariantContextConverter don't get ValidationStringencied #1371
  • Add Markdown docs for Pipe API #1368
  • Array[Consensus] not registered #1367
  • ValidationStringency in MDTagging should apply to reads on unknown references #1365
  • When doing a release, the SNAPSHOT should bump by 0.1.0, not 0.0.1 #1364
  • FromKnowns consensus generator fails if no reads overlap a consensus #1362
  • Performance tune-up in BQSR #1358
  • Increase visibility for ADAMContext.sc and/or getFs... methods #1356
  • Pipe API formatters need to be public #1354
  • Version 0.21.0: VariantContextConverter fails for 1000G VCF data #1353
  • ConsensusModel's can't really be instantiated #1352
  • Runtime conflicts in transitive versions of Guava dependency #1350
  • Transcript Effects ignored if more than 1 #1347
  • Remove "fork" tag from releases #1344
  • Refactor isSorted boolean parameters to sorted #1341
  • Loading GZipped VCF returns an empty RDD #1333
  • Follow up on error messages in build scripts #1331
  • Bump Spark 2 build to Spark 2.1.0 #1330
  • FeatureRDD instantiation tries to cache the RDD #1321
  • Load queryname sorted BAMs as Fragments #1303
  • Run Duplicate Marking on Fragments #1302
  • GenomicRDD.pipe may hang on failure error codes #1282
  • IllegalArgumentException Wrong FS for vcf_head files on HDFS #1272
  • java.io.NotSerializableException: org.bdgenomics.formats.avro.AlignmentRecord #1240
  • Investigate sorted join in dataset api #1223
  • Support looser validation stringency for loading some VCF Integer fields #1213
  • Add new feature-overlap command to demonstrate new region joins #1194
  • What should our API at the command line look like? #1178
  • Split apart partition and join in ShuffleRegionJoin #1175
  • Merging files should be multithreaded #1164
  • File _rgdict.avro does not exist #1150
  • how to collect the .adam files from Spark cluster multiple nodes and some questions about avocado #1140
  • JFYI: tiny forked adam-core "0.20.0" release #1139
  • Samtools (htslib) integration testing #1120
  • AlignmentRecordRDD does not extend GenomicRDD per javac #1092
  • Release ADAM version 0.21.0 #1088
  • Difference running markdups with and without projection #1014
  • ADAM to BAM conversion fails using relative path #1012
  • Refactor SequenceDictionary to use Contig instead of SequenceRecord #997
  • Customize adam-main cli from configuration file #918
  • genotypeType for genotypes with multiple OtherAlt alleles? #897
  • How to convert genotype DataFrame to VariantContext DataFrame / RDD #886
  • Ensure Java API is up-to-date with Scala API #855
  • Improve parallelism during FASTA output #842
  • Explicitly validate user args passed to transform enhancement #841
  • BroadcastRegionJoin fails with unmapped reads #821
  • Resolve Fragment vs. SingleReadBucket #789
  • Add profile for skipping test compilation/resolution #713
  • Next on empty iterator in BroadcastRegionJoin #661
  • Cleanup code smell in sort work balancing code #635
  • Remove reliance on MD tags #622
  • Provide low-impact alternative to transform -repartition for reducing partition size #594
  • Clean up Rich records #577
  • Create standardized, interpretable exceptions for error reporting #420
  • Create ADAM Benchmarking suite #120

Merged and closed pull requests:

  • [ADAM-1469] Don't filter on whether reads have mismatches during realignment #1470 (fnothaft)
  • [ADAM-1467] Skip concat call if there is only one shard. #1468 (fnothaft)
  • [ADAM-1465] Updating realigner CLI docs. #1466 (fnothaft)
  • [ADAM-1463] Rename recalibateBaseQualities method as recalibrateBaseQualities #1464 (heuermh)
  • [ADAM-1453] Add hooks to publish ADAM docs from CI flow. #1461 (fnothaft)
  • [ADAM-1459] Don't split FASTQ when compressed. #1459 (fnothaft)
  • [ADAM-1451] Make VariantContextConverter class and convert methods public #1452 (fnothaft)
  • Moving API overview from building apps doc to new source file. #1450 (heuermh)
  • [ADAM-1424] Adding test for reads dropped in 0.21.0. #1448 (heuermh)
  • [ADAM-1439] Add inferSequenceDictionary ctr to FeatureRDD. #1447 (heuermh)
  • [ADAM-1445] Make apply method for FragmentRDD public. #1446 (fnothaft)
  • [ADAM-1442] Fix thread pool deadlock in GenomicRDD.pipe #1443 (fnothaft)
  • [ADAM-1164] Add parallel file merger. #1441 (fnothaft)
  • Dependency version bump + BroadcastRegionJoin fix #1440 (fnothaft)
  • added JavaApi for loadCoverage #1437 (akmorrow13)
  • Update versions, etc. in build docs #1435 (heuermh)
  • Add test sample(verify number of reads in loadAlignments function) and ADAM SNAPSHOT document #1433 (xubo245)
  • Add cache argument to loadFeatures, additional Feature timers #1427 (heuermh)
  • feat: speed up 2bit file extract #1426 (Blaok)
  • BQSR refactor for perf improvements #1423 (fnothaft)
  • Add ADAMContext/GenomicRDD/pipe docs #1422 (fnothaft)
  • INDEL realigner cleanup #1412 (fnothaft)
  • Estimate contig lengths in SequenceDictionary for BED, GFF3, GTF, and NarrowPeak feature formats #1411 (heuermh)
  • Add coveralls badge to README.md. #1408 (fnothaft)
  • [ADAM-1403] Push coverage reports to Coveralls. #1404 (fnothaft)
  • Added instrumentation timers around joins. #1401 (fnothaft)
  • Add Apache Spark version to --version text #1400 (heuermh)
  • [ADAM-1398] Delete adam-scripts/R. #1399 (fnothaft)
  • [ADAM-1383] Use gt instead of gteq in FASTQ input format line size checks #1396 (fnothaft)
  • Maint spark2 2.11 0.21.0 #1395 (A-Tsai)
  • [ADAM-1393] fix missing reads when transforming fastq to adam #1394 (A-Tsai)
  • [ADAM-1380] Adds isEmpty method to RecordGroupDictionary. #1392 (fnothaft)
  • [ADAM-1381] Fix Variant end position. #1389 (fnothaft)
  • Make javac see that AlignmentRecordRDD extends GenomicRDD #1386 (fnothaft)
  • Added ShuffleRegionJoin usage docs #1384 (devin-petersohn)
  • Misc. INDEL realigner bugfixes #1382 (fnothaft)
  • Add pipe API in and out formatters for Features #1378 (heuermh)
  • [ADAM-1356] Make ADAMContext.getFsAndFiles and related protected visibility #1376 (heuermh)
  • [ADAM-1372] Increase visibility for DefaultHeaderLines.allHeaderLines #1375 (heuermh)
  • [ADAM-1371] Wrap ADAM->htsjdk VariantContext conversion with validation stringency. #1373 (fnothaft)
  • [ADAM-1367] Register Consensus array for serialization. #1369 (fnothaft)
  • [ADAM-1365] Apply validation stringency to reads on missing contigs when MD tagging #1366 (fnothaft)
  • [ADAM-1362] Fixing issue where FromKnowns consensus model fails if no reads hit a target. #1363 (fnothaft)
  • [ADAM-1352] Clean up consensus model usage. #1357 (fnothaft)
  • Increase visibility for InFormatter case classes from package private to public #1355 (heuermh)
  • Use htsjdk getAttributeAsList for VCF INFO ANN key #1348 (heuermh)
  • Fixes parsing variant annotations for multi-allelic rows #1346 (majkiw)
  • Sort pull requests by id #1345 (heuermh)
  • HBase genotypes backend -revised #1335 (jpdna)
  • [ADAM-1330] Move to Spark 2.1.0. #1332 (fnothaft)
  • Support deduping fragments #1309 (fnothaft)
  • [ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
  • Added test to try and repro #1282. #1292 (fnothaft)

Version 0.21.0

Closed issues:

  • Update Markdown docs with ValidationStringency in VCF<->ADAM CLI #1342
  • Variant VCFHeaderLine metadata does not handle wildcards properly #1339
  • Close called multiple times on VCF header stream #1337
  • BroadcastRegionJoin has serialization failures #1334
  • adam-cli uses git-commit-id-plugin which breaks release? #1322
  • move_to_xyz scripts should have interlocks... #1317
  • Lineage for partitionAndJoin in ShuffleRegionJoin causes StackOverflow Errors #1308
  • Add move_to_spark_1.sh script and update README to mention #1307
  • adam-submit transform fails with Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class #1306
  • private ADAMContext constructor? #1296
  • AlignmentRecord.mateAlignmentEnd never set #1290
  • how to submit my own driver class via adam-submit? #1289
  • ReferenceRegion on Genotype seems busted? #1286
  • Clarify strandedness in ReferenceRegion apply methods #1285
  • Parquet and CRAM debug logging during unit tests #1280
  • Add more ANN field parsing unit tests #1273
  • loadVariantAnnotations returns empty RDD #1271
  • Implement joinVariantAnnotations with region join #1259
  • Count how many chromosome in the range of the kmer #1249
  • ADAM minor release to support htsjdk 2.7.0? #1248
  • how to config kryo.registrator programmatically #1245
  • Does the nested record Flattener drop Maps/Arrays? #1244
  • Dead-ish code cleanup in org.bdgenomics.adam.utils #1242
  • java.io.FileNotFoundException for old adam file after upgrade to adam0.20 #1240
  • please add maven-source-plugin into the pom file #1239
  • Assembly jar doesn't get rebuilt on CLI changes #1238
  • how to compare with the last the column for the same chromosome name? #1237
  • Need a way for users to add VCF header lines #1233
  • Enhancements to VCF save #1232
  • Must we split multi-allelic sites in our Genotype model? #1231
  • Can't override default -collapse in reads2coverage #1228
  • Reads2coverage NPEs on unmapped reads #1227
  • Strand bias doesn't get exported #1226
  • Move ADAMFunSuite helper functions upstream to SparkFunSuite #1225
  • broadcast join using interval tree #1224
  • Instrumentation is lost in ShuffleRegionJoin #1222
  • Bump Spark, Scala, Hadoop dependency versions #1221
  • GenomicRDD shuffle region join passes partition count to partition size #1220
  • Scala compile errors downstream of Spark 2 Scala 2.11 artifacts #1218
  • Javac error: incompatible types: SparkContext cannot be converted to ADAMContext #1217
  • Release 0.20.0 artifacts failed Sonatype Nexus validation #1212
  • Release script failed for 0.20.0 release #1211
  • gVCF - can't load multi-allelic sites #1202
  • Allow open-ended intervals in loadIndexedBam #1196
  • Interval tree join in ADAM #1171
  • spark-submit throw exception in spark-standalone using .adam which transformed from .vcf #1121
  • BroadcastRegionJoin is not a broadcast join #1110
  • Improve test coverage of VariantContextConverter #1107
  • Variant dbsnp rs id tracking in vcf2adam and ADAM2Vcf #1103
  • Document core ADAM transform methods #1085
  • Document deploying ADAM on Toil #1084
  • Clean up packages #1083
  • VariantCallingAnnotations is getting populated with INFO fields #1063
  • How to load DatabaseVariantAnnotation information ? #1049
  • Release ADAM version 0.20.0 #1048
  • Support VCF annotation ANN field in vcf2adam and adam2vcf #1044
  • How to create a rich(er) VariantContext RDD? Reconstruct VCF INFO fields. #878
  • Add biologist targeted section to the README #497
  • Update usage docs running for EC2 and CDH #493
  • Add docs about building downstream apps on top of ADAM #291
  • Variant filter representation #194

Merged and closed pull requests:

Version 0.20.0

Closed issues:

  • Sorting by reference index seems doesn't work or sorted by DESC order? #1204
  • master won't compile #1200
  • VCF format tag SB field parse error in loading #1199
  • Publish sources JAR with snapshots #1195
  • Type SparkFunSuite in package org.bdgenomics.utils.misc is not available #1193
  • MDTagging fails on GRCh38 #1192
  • Fix stack overflow in IndelRealigner serialization #1190
  • Delete ./scripts/commit-pr.sh #1188
  • Hadoop globStatus returns null if no glob matches #1186
  • Swapping out IntervalRDD under GenomicRDDs #1184
  • How to get "SO coordinate" instead of "SO unsorted"? #1182
  • How to read glob of multiple parquet Genotype #1179
  • Update command line doc and examples in README.md #1176
  • FastqRecordConverter needs cleanup and tests #1172
  • TransformFormats write to .gff3 file path incorrectly writes as parquet #1168
  • Should be able to merge shards across two different file systems #1165
  • RG ID gets written as the index, not the record group name #1162
  • Users should be able to save files as -single without merging them #1161
  • Users should be able to set size of buffer used for merging files #1160
  • Bump Hadoop-BAM to 7.7.0 #1158
  • adam-shell prints command trace to stdout #1154
  • Map IntervalList format column four to feature name or attributes? #1152
  • Parquet storage of VariantContext #1151
  • vcf2adam unparsable vcf record #1149
  • Reorder kryo.register statements in ADAMKryoRegistrator #1146
  • Make region joins public again #1143
  • Support CRAM input/output #1141
  • Transform should run with spark.kryo.requireRegistration=true #1136
  • adam-shell not handling bash args correctly #1132
  • Remove Gene and related models and parsing code #1129
  • Generate Scoverage reports when running CI #1124
  • Remove PairingRDD #1122
  • SAMRecordConverter.convert takes unused arguments #1113
  • Add Pipe API #1112
  • Improve coverage in Feature unit tests #1106
  • K-mer.scala code #1105
  • add -single file output option to ADAM2Vcf #1102
  • adam2vcf Fails with Sample not serializable #1100
  • ReferenceRegion.apply(AlignmentRecord) should not NPE on unmapped reads #1099
  • Add outer region join implementations #1098
  • VariantContextConverter never returns DatabaseVariantAnnotation #1097
  • loadvcf: conflicting require statement #1094
  • ADAM version 0.19.0 will not run on Spark version 2.0.0 #1093
  • Be more rigorous with FileSystem.get #1087
  • Remove network-connected and default test-related Maven profiles #1073
  • Releases should get pushed to Spark Packages #1067
  • Invalid POM for cli on 0.19.0 #1066
  • scala.MatchError RegExp does not catch colons in value part properly #1061
  • Support writing IntervalList header for features #1059
  • Add -single support when writing features in native formats #1058
  • Remove workaround for gzip/BGZF compressed VCF headers #1057
  • Clean up if clauses in Transform #1053
  • Adam-0.18.2 can not load Adam-0.14.0 adamSave function data (sam) #1050
  • filterByOverlappingRegion Incorrect for Genotypes #1042
  • Move Interval trait to utils, added in #75 #1041
  • Remove implicit GenomicRDD to RDD conversion #1040
  • VCF sample metadata - proposal for a GenotypedSampleMetadata object #1039
  • [build system] ADAM test builds pollute /tmp, leaving lots of cruft... #1038
  • adamMarkDuplicates function in AlignmentRecordRDDFunctions class can not mark the same read? #1037
  • test MarkDuplicatesSuite with two similar read in ref and start position and different avgPhredScore, error! #1035
  • Explore protocol buffers vs Avro #1031
  • Increase Avro dependency version to 1.8.0 #1029
  • ADAM specific logging #1024
  • Reenable Travis CI for pull request builds #1023
  • Bump Apache Spark version to 1.6.1 in Jenkins #1022
  • ADAM compatibility with Spark 2.0 #1021
  • ADAM to BAM conversion failing on 1000G file #1013
  • Factor out *RDDFunctions classes #1011
  • Port single file BAM and header code to VCF #1009
  • Roll Jenkins JDK 8 changes into ./scripts/jenkins-test #1008
  • Support GFF3 format #1007
  • Separate fat jar build from adam-cli to new maven module #1006
  • adam-cli POM invalid: maven.build.timestamp #1004
  • Sub-partitioning of Parquet file for ADAM #1003
  • Flattening the Genotype schema #1002
  • install adam 0.19 error! #1001
  • How to solve it please? #1000
  • Has the project realized alignment reads to reference genome algorithm? #996
  • All file-based input methods should support running on directories, compressed files, and wildcards #993
  • Contig to ContigName Change not reflected in AlignmentRecordField #991
  • Add homebrew guidelines to release checklist or automate PR generation #987
  • fix deprecation warnings #985
  • rename fragments package #984
  • Explore if SeqDict data can be factored out more aggressively #983
  • Make "Adam" all caps in filename Adam2Fastq.scala #981
  • Adam2Fastq should output reverse complement when 0x10 flag is set for read #980
  • Allow lowercase letters in jar/version names #974
  • Add stringency parameter to flagstat #973
  • Arg-array parsing problem in adam-submit #971
  • Pass recordGroup parameter to loadPairedFastq #969
  • Send a number of partitions to sc.textFile calls #968
  • adamGetReferenceString doesn't reduce pairs correctly #967
  • Update ADAM formula in homebrew-science to version 0.19.0 #963
  • BAM output in ADAM appears to be corrupt #962
  • Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #959
  • Issue with version 18.0.2 #957
  • Expose sorting by reference index #952
  • .rgdict and .seqdict files are not placed in the adam directory #945
  • Why does count_kmers not return k-mers that are split between two records? #930
  • Load legacy file formats to Spark SQL Dataframes #912
  • Clean up RDD method names #910
  • Load/store sequence dictionaries alongside Genotype RDDs #909
  • vcf2adam -print_metrics throws IllegalStateException on Spark 1.5.2 or later #902
  • error: no reads in first split: bad BAM file or tiny split size? #896
  • FastaConverter.FastaDescriptionLine not kryo-registered #893
  • Work With ADAM fasta2adam in a distributed mode #881
  • vcf2adam -> Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; #871
  • Code coverage profile is broken #849
  • Building Adam on OS X 10.10.5 with Java 1.8 #835
  • Normalize AlignmentRecord.recordGroup* fields onto a separate record type #828
  • Gracefully handle missing Spark- and Hadoop-versions in jenkins-test; document how to set them. #827
  • Use Adam File with Hive #820
  • How do we handle reads that don't have original quality scores when converting to FASTQ with original qualities? #818
  • SAMFileHeader "sort order" attribute being un-set during file-save job #800
  • Use same sort order as Samtools #796
  • RNAME and RNEXT fields jumbled on transform BAM->ADAM->BAM #795
  • Support loading multiple indexed read files #787
  • Duplicate OUTPUT command line argument metaVar in adam2fastq #776
  • Allow Variant to ReferenceRegion conversion #768
  • Spark Errors References Deprecated SPARK_CLASSPATH #767
  • Spark Errors References Deprecated SPARK_CLASSPATH #766
  • adam2vcf fails with -coalesce #735
  • Writing to a BAM file with adamSAMSave consistently fails #721
  • BQSR on C835.HCC1143_BL.4 uses excessive amount of driver memory #714
  • Support writing RDD[Feature] to various file formats #710
  • adamParquetSave has a menacing false error message about *.adam extension #681
  • BAMHeader not set when running on a cluster #676
  • spark 1.3.1 upgrade to hortonworks HDP 2.2.4.2-2? #675
  • Symbol case class is nucleotide-centric #672
  • xAssembler cannot be built using mvn #658
  • adam-submit VerifyError #642
  • vcf2adam : Unsupported type ENUM #638
  • Update CDH documentation #615
  • Remove and generalize plugin code #602
  • Fix record oriented shuffle #599
  • Migrate preprocessing stages out of ADAM #598
  • Publish/socialize a roadmap #591
  • Eliminate format detection and extension checks for loading data #587
  • Improve error message when we can't find a ReferenceRegion for a contig #582
  • Do reference partitioners restrict a partition to contain keys from a single contig? #573
  • Connection refused errors when transforming BAM file with BQSR #516
  • ReferenceRegion shouldn't extend Ordered #511
  • Documentation for common usecases #491
  • Improve handling of "*" sequences during BQSR #484
  • Original qualities are parsed out, but left in attribute fields #483
  • Need a FileLocator that mirrors the use of Path in HDFS #477
  • FileLocator should support finding "child" locators. #476
  • Add S3 based Parquet directory loader #463
  • Should FASTQ output use reads' "original qualities"? #436
  • VcfStringUtils unused? #428
  • We should be able to filter genotypes that overlap a region #422
  • Create a simplified vocabulary for naming projections. #419
  • Update documentation #406
  • Bake off different region join implementations #395
  • Handle no-ops more intelligently when creating MD tags #392
  • Remove all the commands in the "CONVERSION OPERATIONS" CommandGroup #373
  • Fail to Write RDD into HDFS with Parquet Format #344
  • Refactor ReferencePositionWithOrientation #317
  • Add docs about SPARK_LOCAL_IP #305
  • PartitionAndJoin should throw an exception if it sees an unmapped read #297
  • Add insert size calculation #296
  • Newbie questions - learning resources? Reading a range of records from Adam? #281
  • Add variant effect ontology #261
  • Don't flatten optional SAM tags into a string #240
  • Characterize impact of partition size on pileup creation #163
  • Need to support BCF output format #153
  • Allow list of commands to be injected into adam-cli AdamMain #132
  • Parse out common annotations stored in VCF format #118
  • Update normalization code to enable normalization of sequences with more than two indels #64
  • Add clipping heuristic to indel realigner #63
  • BQSR should support recalibration across multiple ADAM files #58

Merged and closed pull requests:

  • fix SB tag parsing #1209 (fnothaft)
  • Fastq record converter #1208 (fnothaft)
  • Doc suggested partitionSize in ShuffleRegionJoin #1207 (jpdna)
  • Test demonstrating region join failure #1206 (jpdna)
  • fix SB tag parsing #1203 (jpdna)
  • fix build #1201 (ryan-williams)
  • [ADAM-1192] Correctly handle other whitespace in FASTA description. #1198 (fnothaft)
  • [ADAM-1190] Manually (un)pack IndelRealignmentTarget set. #1191 (fnothaft)
  • [ADAM-1188] Delete scripts/commit-pr.sh #1189 (fnothaft)
  • [ADAM-1186] Mask null from fs.globStatus. #1187 (fnothaft)
  • Fastq record converter #1185 (zyxue)
  • [ADAM-1182] isSorted=true should write SO:coordinate in SAM/BAM/CRAM header. #1183 (fnothaft)
  • Add scoverage aggregator and fail on low coverage. #1181 (fnothaft)
  • [ADAM-1179] Improve error message when globbing a parquet file fails. #1180 (fnothaft)
  • [ADAM-1176] Update command line doc and examples in README.md #1177 (heuermh)
  • Refactor CLIs for merging sharded files #1167 (fnothaft)
  • Update Hadoop-BAM to version 7.7.0 #1166 (heuermh)
  • [ADAM-1162] Write record group string name. #1163 (fnothaft)
  • Map IntervalList format column four to feature name #1159 (heuermh)
  • Make AlignmentRecordConverter public so that it can be used from other projects #1157 (tomwhite)
  • added predicate option to loadCoverage #1156 (akmorrow13)
  • [ADAM-1154] Change set -x to set -e in ./bin/adam-shell. #1155 (fnothaft)
  • Remove Gene and related models and parsing code #1153 (heuermh)
  • Reorder kryo.register statements in ADAMKryoRegistrator #1148 (heuermh)
  • Updated GenomicPartitioners to accept additional key. #1147 (akmorrow13)
  • [ADAM-1141] Add support for saving/loading AlignmentRecords to/from CRAM. #1145 (fnothaft)
  • misc pom/test/resource improvements #1142 (ryan-williams)
  • [ADAM-1136] Transform runs successfully with kryo registration required #1138 (fnothaft)
  • [ADAM-1132] Fix improper quoting of bash args in adam-shell. #1133 (fnothaft)
  • Remove StructuralVariant and StructuralVariantType, add names field to Variant #1131 (heuermh)
  • Remove StructuralVariant and StructuralVariantType, add names field to Variant #1130 (heuermh)
  • PR #1108 with issue #1122 #1128 (fnothaft)
  • [ADAM-1038] Eliminate writing to /tmp during CI builds. #1127 (fnothaft)
  • Update for bdg-formats code style changes #1126 (heuermh)
  • [ADAM-1124] Add Scoverage and generate coverage reports in Jenkins. #1125 (fnothaft)
  • [ADAM-1093] Move to support Spark 2.0.0. #1123 (fnothaft)
  • remove duplicated dependency #1119 (ryan-williams)
  • Clean up ADAMContext #1118 (fnothaft)
  • [ADAM-993] Support loading files using globs and from directory paths. #1117 (fnothaft)
  • [ADAM-1087] Migrate away from FileSystem.get #1116 (fnothaft)
  • [ADAM-1099] Make reference region not throw NPE. #1115 (fnothaft)
  • Add pipes API #1114 (fnothaft)
  • [ADAM-1105] Use assembly jar in adam-shell. #1111 (fnothaft)
  • Add outer joins #1109 (fnothaft)
  • Modified CalculateDepth to calculate coverage from alignment files #1108 (akmorrow13)
  • Resolves various single file save/header issues #1104 (fnothaft)
  • [ADAM-1100] Resolve Sample Not Serializable exception #1101 (fnothaft)
  • added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions #1096 (akmorrow13)
  • Added support for Indexed VCF files #1095 (akmorrow13)
  • [ADAM-582] Eliminate .get on option in FragmentConverter. #1091 (fnothaft)
  • [ADAM-776] Rename duplicate OUTPUT metaVar in ADAM2Fastq. #1090 (fnothaft)
  • refactored ReferenceFile to require SequenceDictionary #1086 (akmorrow13)
  • [ADAM-1073] Remove network-connected and default test-related Maven profiles #1082 (heuermh)
  • [ADAM-1053] Clean up Transform #1081 (fnothaft)
  • [ADAM-1061] Clean up attributes regex and denormalized fields #1080 (fnothaft)
  • Extended TwoBitFile and NucleotideContigFragmentRDDFunctions to behave more similar #1079 (akmorrow13)
  • Refactor variant and genotype annotations #1078 (heuermh)
  • [ADAM-1039] Add basic support for Sample record. #1077 (fnothaft)
  • Remove code workarounds necessary for Spark 1.2.1/Hadoop 1.0.x support #1076 (heuermh)
  • [ADAM-194] Use separate filtersFailed and filtersPassed arrays for variant quality filters #1075 (heuermh)
  • Whitespace code style fixes #1074 (heuermh)
  • [ADAM-1006] Split überjar out to adam-assembly submodule. #1072 (fnothaft)
  • Remove code coverage profile #1071 (heuermh)
  • [ADAM-768] ReferenceRegion from variant/genotypes #1070 (fnothaft)
  • [ADAM-1044] Support VCF annotation ANN field #1069 (heuermh)
  • [ADAM-1067] Add release documentation and scripting for Spark Packages. #1068 (fnothaft)
  • [ADAM-602] Remove plugin code. #1065 (fnothaft)
  • Refactoring org.bdgenomics.adam.io package. #1064 (fnothaft)
  • Cleanup in org.bdgenomics.adam.converters package. #1062 (fnothaft)
  • [ADAM-1057] Remove workaround for gzip/BGZF compressed VCF headers #1057 (heuermh)
  • Cleanup on org.bdgenomics.adam.algorithms.smithwaterman package. #1056 (fnothaft)
  • Documentation cleanup and minor refactor on the consensus package. #1055 (fnothaft)
  • Add KEYS with public code signing keys #1054 (heuermh)
  • Adding GA4GH 0.5.1 converter for reads. #1052 (fnothaft)
  • [ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051 (fnothaft)
  • removed interval trait and redirected to interval in utils-intervalrdd #1046 (akmorrow13)
  • [ADAM-952] Expose sorting by reference index. #1045 (fnothaft)
  • overlap query reflects new formats #1043 (erictu)
  • Changed loadIndexedBam to use hadoop-bam InputFormat #1036 (fnothaft)
  • Increase Avro dependency version to 1.8.0 #1034 (heuermh)
  • Improved README fix using feedback from other approach review. #1034 (InvisibleTech)
  • Error in the README.md for kmer.scala example, need to get rdd first. #1032 (InvisibleTech)
  • Add fragmentEndPosition to NucleotideContigFragment #1030 (heuermh)
  • Logging to be done by ADAM utils code rather than Spark #1028 (jpdna)
  • add maxScore #1027 (xubo245)
  • [ADAM-1008] Modify jenkins-test script to support Java 8 build. #1026 (fnothaft)
  • whitespace change, do not merge #1025 (shaneknapp)
  • require kryo registration in tests #1020 (ryan-williams)
  • print full stack traces on test failures #1019 (ryan-williams)
  • bump commons-io version #1017 (ryan-williams)
  • exclude javadoc jar in adam-shell #1016 (ryan-williams)
  • [ADAM-909] Refactoring variation RDDs. #1015 (fnothaft)
  • Modified CalculateDepth to get coverage on whole alignment adam files #1010 (akmorrow13)
  • [ADAM-1004] Remove recursive maven.build.timestamp declaration #1005 (heuermh)
  • Maint 2.11 0.19.0 #999 (tushu1232)
  • [ADAM-710] Add saveAs methods for feature formats GTF, BED, IntervalList, and NarrowPeak #998 (heuermh)
  • Moving Adam2Fastq to ADAM2Fastq #995 (heuermh)
  • Update release doc for CHANGES.md and homebrew #994 (heuermh)
  • Update to AlignmentRecordField and its usages as contig changed to co… #992 (jpdna)
  • [ADAM-974] Short term fix for multiple ADAM cli assembly jars check #990 (heuermh)
  • Update hadoop-bam dependency version to 7.5.0 #989 (heuermh)
  • Replaced Contig with ContigName in AlignmentRecord and related changes #988 (jpdna)
  • fix some deprecation/style things and rename a pkg #986 (ryan-williams)
  • Fix Adam2fastq in case of read with both reverse and unmapped flags #982 (jpdna)
  • [ADAM-510] Refactoring RDD function names #979 (heuermh)
  • Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978 (heuermh)
  • Remove unused file VcfHeaderUtils.scala #977 (heuermh)
  • add validation stringency to bam parsing, flagstat #976 (ryan-williams)
  • more permissible jar regex in adam-submit #975 (ryan-williams)
  • fix bash arg array processing in adam-submit #972 (ryan-williams)
  • adamGetReferenceString reduces pairs correctly, fixes #967 #970 (erictu)
  • A few improvements #966 (ryan-williams)
  • improve SW performance by replacing functional reductions with imperative ones #965 (noamBarkai)
  • [ADAM-962] Fix corrupt single-file BAM output. #964 (fnothaft)
  • [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
  • [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
  • Use hadoop-bam BAMInputFormat to do loadIndexedBam #953 (andrewmchen)
  • Add -print_metrics option to Jenkins build #947 (heuermh)
  • adam2vcf doesn't have info fields #939 (andrewmchen)
  • [ADAM-893] Register missing serializers. #933 (fnothaft)

Version 0.19.0

Closed issues:

  • Update bdg-utils dependency version to 0.2.4 #960
  • Drop support for Spark version 1.2.1, Hadoop version 1.0.x #958
  • Exception occurs when running tests on master #956
  • Flagstat results still don't match samtools flagstat #946
  • readInFragment value is not properly read from parquet file into RDD[AlignmentRecord] #942
  • adam2vcf -sort_on_save flag broken #940
  • Transform -limit_projection requires .sam.seqdict file #937
  • MarkDuplicates fails if library name is not set #934
  • fastqtobam or sam #928
  • Vcf2Adam uses SB field instead of FS field for fisher exact test for strand bias #923
  • Add back limit_projection on Transform #920
  • BAM header is not getting set on partition 0 with headerless BAM output format #916
  • Add numParts apply method to GenomicRegionPartitioner #914
  • Add Spark version 1.6.x to Jenkins build matrix #913
  • Target Spark 1.5.2 as default Spark version #911
  • Move to bdg-formats 0.7.0 #905
  • secondOfPair and firstOfPair flag is missing in the newest 0.18 adam transformed results from BAM #903
  • Future pull request #900
  • error in vcf2adam #899
  • Importing directory of VCFs seems to fail #898
  • How to filter genotypeRDD on sample names? org.apache.spark.SparkException: Task not serializable? #891
  • Add Spark version 1.5.x to Jenkins build matrix #889
  • Transform DAG causes stages to recompute #883
  • adam-submit buildinfo is confused #880
  • move_to_scala_2.11 and maven-javadoc-plugin #863
  • NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable #837
  • Fix record oriented shuffle #599
  • Avro.GenericData error with ADAM 0.12.0 on reading from ADAM file #290

Merged and closed pull requests:

  • [ADAM-960] Updating bdg-utils dependency version to 0.2.4 #961 (heuermh)
  • [ADAM-946] Fixes to FlagStat for Samtools concordance issue #954 (jpdna)
  • Fix for travis build, replace reads2ref with reads2fragments #950 (heuermh)
  • [ADAM-940] Fix adam2vcf -sort_on_save flag #949 (massie)
  • Remove BuildInformation and extraneous git-commit-id-plugin configuration #948 (heuermh)
  • Update readme for spark 1.5.2 and hadoop 2.6.0 #944 (heuermh)
  • [ADAM-942] Replace first/secondInRead with readInFragment #943 (heuermh)
  • [ADAM-937] Adding check for aligned read predicate or limit projection flags and non-parquet input path #938 (heuermh)
  • [ADAM-934] Properly handle unset library name during duplicate marking #935 (fnothaft)
  • [ADAM-911] Move to Spark 1.5.2 and Hadoop 2.6.0 as default versions. #932 (fnothaft)
  • added start and end values to Interval Trait. Used for IntervalRDD #931 (akmorrow13)
  • Removing buildinfo command #929 (heuermh)
  • Removing symbolic test resource links, read from test classpath instead #927 (heuermh)
  • Changed fisher strand bias field for VCF2Adam from SB to FS #924 (andrewmchen)
  • [ADAM-920] Limit tag/orig qual flags in Transform. #921 (fnothaft)
  • Change the README to use adam-shell -i instead of pasting #919 (andrewmchen)
  • [ADAM-916] New strategy for writing header. #917 (fnothaft)
  • [ADAM-914] Create a GenomicRegionPartitioner given a partition count. #915 (fnothaft)
  • Squashed #907 and ran format-sources #908 (fnothaft)
  • Various small fixes #907 (huitseeker)
  • ADAM-599, 905: Move to bdg-formats:0.7.0 and migrate metadata #906 (fnothaft)
  • Rewrote the getType method to handle all ploidy levels #904 (NeillGibson)
  • Single file save from #733, rebased #901 (fnothaft)
  • Added is* genotype methods from HTS-JDK Genotype to RichGenotype #895 (NeillGibson)
  • [ADAM-891] Mark SparkContext as @transient. #894 (fnothaft)
  • Update README URLs based on HTTP redirects #892 (ReadmeCritic)
  • adding --version command line option #888 (heuermh)
  • Add exception in move_to_scala_2.11.sh for maven-javadoc-plugin #887 (heuermh)
  • Fix tightlist bug in Pandoc #885 (massie)
  • [ADAM-883] Add caching to Transform pipeline. #884 (fnothaft)

Version 0.18.2

  • ISSUE 877: Minor fix to commit script to support https.
  • ISSUE 876: Separate command line argument words by underscores
  • ISSUE 875: P Operator parsing for MDTag
  • ISSUE 873: [ADAM-872] Modify regex to capture release and SNAPSHOT jars but not javadoc or sources jars
  • ISSUE 866: [ADAM-864] Don't force shuffle if reducing partition count.
  • ISSUE 856: export valid fastq
  • ISSUE 847: Updating build dependency versions to latest minor versions

Version 0.18.1

  • ISSUE 870: [ADAM-867] add pull requests missing from 0.18.0 release to CHANGES.md
  • ISSUE 869: [ADAM-868] make release branch and tag names consistent
  • ISSUE 862: [ADAM-861] use -d to check for repo assembly dir

Version 0.18.0

  • ISSUE 860: New release and pr-commit scripts
  • ISSUE 859: [ADAM-857] Corrected handling of env vars in bin scripts
  • ISSUE 854: [ADAM-853] allow main class in adam-submit to be specified
  • ISSUE 852: [ADAM-851] Silenced Parquet logging.
  • ISSUE 850: [ADAM-848] TwoBitFile now supports nBlocks and maskBlocks
  • ISSUE 846: Updating maven build plugin dependency versions
  • ISSUE 845: [ADAM-780] Make DecadentRead package private.
  • ISSUE 844: [ADAM-843] Aggressively project out metadata fields.
  • ISSUE 840: fix flagstat output file encoding
  • ISSUE 839: let flagstat write to file
  • ISSUE 831: Support loading paired fastqs
  • ISSUE 830: better validation when saving paired fastqs
  • ISSUE 829: fix Long != null warnings
  • ISSUE 819: Implement custom ReferenceRegion hashcode
  • ISSUE 816: [ADAM-793] adding command to convert ADAM nucleotide contig fragments to FASTA files
  • ISSUE 815: Upgrade to bdg-formats:0.6.0, add Fragment datatype converters
  • ISSUE 814: [ADAM-812] fix for javadoc errors on JDK8
  • ISSUE 813: [ADAM-808] build an assembly cli jar with maven shade plugin
  • ISSUE 810: [ADAM-807] workaround for git-commit-id/git-commit-id-maven-plugin#61
  • ISSUE 809: [ADAM-785] Add support for all numeric array (TYPE=B) tags
  • ISSUE 806: [ADAM-755] updating utils dependency version to 0.2.3
  • ISSUE 805: Better transform error when file doesn't exist
  • ISSUE 803: fix unmapped-read sorting
  • ISSUE 802: stop writing contig names as md5 sums
  • ISSUE 798: fix SAM-attr conversion bug; int[]'s not byte[]'s
  • ISSUE 790: optionally add MDTags to reads with transform
  • ISSUE 782: Fix SAM Attribute parser for numeric array tags
  • ISSUE 773: [ADAM-772] fix some bash var quoting
  • ISSUE 765: [ADAM-752] Build for many combos of Spark/Hadoop versions.
  • ISSUE 764: More involved README restructuring
  • ISSUE 762: [ADAM-132] allowing list of commands to be injected into adam-cli ADAMMain

Version 0.17.1

  • ISSUE 784: [ADAM-783] Write @SQ header lines in sorted order.
  • ISSUE 792: [ADAM-791] Add repartition parameter to Fasta2ADAM.
  • ISSUE 781: [ADAM-777] Add validation stringency flag for BQSR.
  • ISSUE 757: We should print a warning message if the user has ADAM_OPTS set.
  • ISSUE 770: [ADAM-769] Fix serialization issue in known indel consensus model.
  • ISSUE 763: Clean up README links, other nits
  • ISSUE 749: Remove adam-cli jar from classpath during adam-submit
  • ISSUE 754: Bump ADAM to Spark 1.4
  • ISSUE 753: Bump Spark to 1.4
  • ISSUE 748: Fix for mdtag issues with insertions
  • ISSUE 746: Upgrade to Parquet 1.8.1.
  • ISSUE 744: [ADAM-743] exclude conflicting jackson dependencies
  • ISSUE 737: Reverse complement negative strand reads in fastq output
  • ISSUE 731: Fixed bug preventing use of TLEN attribute
  • ISSUE 730: [ADAM-729] Stuff TLEN into attributes.
  • ISSUE 728: [ADAM-709] Remove FeatureHierarchy and FeatureHierarchySuite
  • ISSUE 719: [ADAM-718] Use filesystem path to get underlying file system.
  • ISSUE 712: unify header-setting between BAM/SAM and VCF
  • ISSUE 696: include SequenceRecords from second-in-pair reads
  • ISSUE 698: class-ify ShuffleRegionJoin, force setting seqdict
  • ISSUE 706: restore clause guarding pruneCache check
  • ISSUE 705: GeneFeatureRDDFunctions → FeatureRDDFunctions

Version 0.17.0

  • ISSUE 691: fix BAM/SAM header setting when writing on cluster
  • ISSUE 688: make adamLoad public
  • ISSUE 694: Fix parent reference in distribution module
  • ISSUE 684: a few region-join nits
  • ISSUE 682: [ADAM-681] Remove menacing error message about reqd .adam extension
  • ISSUE 680: [ADAM-674] Delete Bam2ADAM.
  • ISSUE 678: upgrade to bdg utils 0.2.1
  • ISSUE 668: [ADAM-597] Move correction out of ADAM and into a downstream project.
  • ISSUE 671: Bug fix in ReferenceUtils.unionReferenceSet
  • ISSUE 667: [ADAM-666] Clean up key not found error in partitioner code.
  • ISSUE 656: Update Vcf2ADAM.scala
  • ISSUE 652: added filterByOverlappingRegion in GeneFeatureRDDFunctions
  • ISSUE 650: [ADAM-649] Support transform of all BAM/SAM files in a directory.
  • ISSUE 647: [ADAM-646] Special case reads with '*' quality during BQSR.
  • ISSUE 645: [ADAM-634] Create a local ParquetLister for testing purposes.
  • ISSUE 633: [Adam] Tests for SAMRecordConverter.scala
  • ISSUE 641: [ADAM-640] Fix incorrect exclusion for org.seqdoop.htsjdk.
  • ISSUE 632: [ADAM-631] Allow VCF conversion to sort on output after coalescing.
  • ISSUE 628: [ADAM-627] Makes ReferenceFile trait extend Serializable.
  • ISSUE 637: check for mac brew alternate spark install structure
  • ISSUE 624: Conceptual fix for duplicate marking and sorting stragglers
  • ISSUE 629: [ADAM-604] Remove normalization code.
  • ISSUE 630: Add flatten command.
  • ISSUE 619: [ADAM-540] Move to new HTSJDK release; should support Java 8.
  • ISSUE 626: [ADAM-625] Enable globbing for BAM.
  • ISSUE 621: Removes the predicates package.
  • ISSUE 620: [ADAM-600] Adding RegionJoin trait.
  • ISSUE 616: [ADAM-565] Upgrade to Parquet filter2 API.
  • ISSUE 613: [ADAM-612] Point to proper k-mer counters.
  • ISSUE 588: [ADAM-587] Clean up loading checks.
  • ISSUE 592: [ADAM-513] Remove ReferenceMappable trait.
  • ISSUE 606: [ADAM-605] Remove visualization code.
  • ISSUE 596: [ADAM-595] Delete the 'comparisons' code.
  • ISSUE 590: [ADAM-589] Removed pileup code.
  • ISSUE 586: [ADAM-452] Fixes SM attribute on ADAM to BAM conversion.
  • ISSUE 584: [ADAM-583] Add k-mer counting functionality for nucleotide contig fragments

Version 0.16.0

  • ISSUE 570: A few small conversion fixes
  • ISSUE 579: [ADAM-578] Update end of read when trimming.
  • ISSUE 564: [ADAM-563] Add warning message when saving Parquet files with incorrect extension
  • ISSUE 576: Changed hashCode implementations to improve performance of BQSR
  • ISSUE 569: Typo in the narrowPeak parser
  • ISSUE 568: Moved the Timers object from bdg-utils back to ADAM
  • ISSUE 478: Move non-genomics code
  • ISSUE 550: [ADAM-549] Added documentation for testing and CI for ADAM.
  • ISSUE 555: Makes maybeLoadVCF private.
  • ISSUE 558: Makes Features2ADAMSuite use SparkFunSuite
  • ISSUE 557: Randomize ports and turn off Spark UI to reduce bind exceptions in tests
  • ISSUE 552: Create test suite for FlagStat
  • ISSUE 554: privatize ADAMContext.maybeLoad{Bam,Fastq}
  • ISSUE 551: [ADAM-386] Multiline FASTQ input
  • ISSUE 542: Variants Visualization
  • ISSUE 545: [ADAM-543][ADAM-544] Fix issues with ADAM scripts and classpath
  • ISSUE 535: [ADAM-441] put a check in for Nothing. Throws an IAE if no return type is provided
  • ISSUE 546: [ADAM-532] Fix wigFix intermittent test failure
  • ISSUE 534: [ADAM-528][ADAM-533] Adds new RegionJoin impl that is shuffle-based
  • ISSUE 531: [ADAM-529] Attaching scaladoc to released distribution.
  • ISSUE 413: [ADAM-409][ADAM-520] Added local wigfix2bed tool
  • ISSUE 527: [ADAM-526] VcfAnnotation2ADAM only counts once
  • ISSUE 523: don't open non-.adam-extension files as ADAM files
  • ISSUE 521: quieting wget output
  • ISSUE 482: [ADAM-462] Coverage region calculation
  • ISSUE 515: [ADAM-510] fix for bash syntax error; add ADDL_JARS check to adam-submit

Version 0.15.0

  • ISSUE 509: Add a 'distribution' module to create assemblies
  • ISSUE 508: Upgrade from Parquet 1.4.3 to 1.6.0rc4
  • ISSUE 498: [ADAM-496] Changes VCF to flat ADAM command name and usage
  • ISSUE 500: [ADAM-495] Require SPARK_HOME for adam-submit
  • ISSUE 501: [ADAM-499] Add -onlyvariants option to vcf2adam
  • ISSUE 507: [ADAM-505] Removed adam-local from docs
  • ISSUE 504: [ADAM-502] Add missing Long implicit to ColumnReaderInput
  • ISSUE 503: [ADAM-473] Make RecordCondition and FieldCondition public
  • ISSUE 494: Fix foreach block for vcf ingest
  • ISSUE 492: Documentation cleanup and style improvements
  • ISSUE 481: [ADAM-480] Switch assembly to single goal.
  • ISSUE 487: [ADAM-486] Add port option to viz command.
  • ISSUE 469: [ADAM-461] Fix ReferenceRegion and ReferencePosition impl
  • ISSUE 440: [ADAM-439] Fix ADAM to account for BDG-FORMATS-35: Avro uses Strings
  • ISSUE 470: added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions
  • ISSUE 468: refactor RDD loading; explicitly load alignments
  • ISSUE 474: Consolidate documentation into a single location in source.
  • ISSUE 471: Fixed typo on MAVEN_OPTS quotation mark
  • ISSUE 467: [ADAM-436] Optionally output original qualities to fastq
  • ISSUE 451: add adam view command, analogous to samtools view
  • ISSUE 466: working examples on .sam included in repo
  • ISSUE 458: Remove unused val from Reads2Ref
  • ISSUE 438: Add ability to save paired-FASTQ files
  • ISSUE 457: A few random Predicate-related cleanups
  • ISSUE 459: a few tweaks to scripts/jenkins-test
  • ISSUE 460: Project only the sequence when kmer/qmer counting
  • ISSUE 450: Refactor some file writing and reading logic
  • ISSUE 455: [ADAM-454] Add serializers for Avro objects which don't have serializers
  • ISSUE 447: Update the contribution guidelines
  • ISSUE 453: Better null handling for isSameContig utility
  • ISSUE 417: Stores original position and original cigar during realignment.
  • ISSUE 449: read “OQ” attr from structured SAMRecord field
  • ISSUE 446: Revert "[ADAM-237] Migrate to Chill serialization libraries."
  • ISSUE 437: random nits
  • ISSUE 434: Few transform tweaks
  • ISSUE 435: [ADAM-403] Remove seqDict from RegionJoin
  • ISSUE 431: A few tweaks, typo corrections, and random cleanups
  • ISSUE 430: [ADAM-429] adam-submit now handles args correctly.
  • ISSUE 427: Fixes for indel realigner issues
  • ISSUE 418: [ADAM-416] Removing 'ADAM' prefix
  • ISSUE 404: [ADAM-327] Adding gene, transcript, and exon models.
  • ISSUE 414: Fix error in adam-local alias
  • ISSUE 415: Update README.md to reflect Spark 1.1
  • ISSUE 412: [ADAM-411] Updated usage aliases in README. Fixes #411.
  • ISSUE 408: [ADAM-405] Add FASTQ output.
  • ISSUE 385: [ADAM-384] Adds import from FASTQ.
  • ISSUE 400: [ADAM-399] Fix link to schemas.
  • ISSUE 396: [ADAM-388] Sets Kryo serialization with --conf args
  • ISSUE 394: [ADAM-393] Adds knobs to SparkContext creation in SparkFunSuite
  • ISSUE 391: [ADAM-237] Migrate to Chill serialization libraries.
  • ISSUE 380: Rewrite of MarkDuplicates which seems to improve performance
  • ISSUE 387: fix some deprecation warnings

Version 0.14.0

  • ISSUE 376: [ADAM-375] Upgrade to Hadoop-BAM 7.0.0.
  • ISSUE 378: [ADAM-360] Upgrade to Spark 1.1.0.
  • ISSUE 379: Fix the position of the jar path in the submit.
  • ISSUE 383: Make Mdtags handle '=' and 'X' cigar operators
  • ISSUE 369: [ADAM-369] Improve debug output for indel realigner
  • ISSUE 377: [ADAM-377] Update to Jenkins scripts and README.
  • ISSUE 374: [ADAM-372][ADAM-371][ADAM-365] Refactoring CLI to simplify and integrate with Spark model better
  • ISSUE 370: [ADAM-367] Updated alias in README.md
  • ISSUE 368: erasure, nonexhaustive-match, deprecation warnings
  • ISSUE 354: [ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed
  • ISSUE 357: [ADAM-357] Added Java Plugin hook for ADAM.
  • ISSUE 352: Fix failing MD tag
  • ISSUE 363: Adding maven assembly plugin configuration to create tarballs
  • ISSUE 364: [ADAM-364] Fixing remaining cs.berkeley.edu URLs.
  • ISSUE 362: Remove mention of uberjar from README

Version 0.13.0

  • ISSUE 343: Allow retrying on failure for HTTPRangedByteAccess
  • ISSUE 349: Fix for a NullPointerException when hostname is null in Task Metrics
  • ISSUE 347: Bug fix for genome browser
  • ISSUE 346: Genome visualization
  • ISSUE 342: [ADAM-309] Update to bdg-formats 0.2.0
  • ISSUE 333: [ADAM-332] Upgrades ADAM to Spark 1.0.1.
  • ISSUE 341: [ADAM-340] Adding the TrackedLayout trait and implementation.
  • ISSUE 337: [ADAM-335] Updated README.md to reflect migration to appassembler.
  • ISSUE 311: Adding several simple normalizations.
  • ISSUE 330: Make mismatch and deletes positions accessible
  • ISSUE 334: Moving code coverage into a profile
  • ISSUE 329: Add count of mismatches to mdtag
  • ISSUE 328: [ADAM-326] Adding a 5-second retry on the HttpRangedByteAccess test.
  • ISSUE 325: Adding documentation for commit/issue nomenclature and rebasing

Version 0.12.1

  • ISSUE 308: Fixing the 'index 0' bug in features2adam
  • ISSUE 306: Adding code for lifting over between sequences and the reference genome.
  • ISSUE 320: Remove extraneous implicit methods in ReferenceMappingContext
  • ISSUE 314: Updates to indel realigner to improve performance and accuracy.
  • ISSUE 319: Adding scripts for publishing scaladoc.
  • ISSUE 315: Added table of (wall-clock) stage durations when print_metrics is used
  • ISSUE 312: Fixing sources jar
  • ISSUE 313: Making the CredentialsProperties file optional
  • ISSUE 267: Parquet and indexed Parquet RDD implementations, and indices.
  • ISSUE 301: Add Beacon's AlleleCount
  • ISSUE 293: Add aggregation and display of metrics obtained from Spark
  • ISSUE 295: Fix broken link to ADAM specification for storing reads.
  • ISSUE 292: Cleaning up scaladoc generation warnings.
  • ISSUE 289: Modifying interleaved fastq format to be hadoop version independent.
  • ISSUE 288: Add ADAMFeature to Kryo registrator
  • ISSUE 286: Removing some debug printout that was left in.
  • ISSUE 287: Cleaning hadoop dependencies
  • ISSUE 285: Refactoring read groups to increase the amount of data stored.
  • ISSUE 284: Cleaning up build warnings.
  • ISSUE 280: Move to bdg-formats
  • ISSUE 283: Fix reference name comment
  • ISSUE 282: Minor cleanup on interleaved FASTQ input format.
  • ISSUE 277: Implemented HTTPRangedByteAccess.
  • ISSUE 274: Added clarifying note to ADAMVariantContext
  • ISSUE 279: Simplify format-source
  • ISSUE 278: Use maven license plugin to ensure source has correct license
  • ISSUE 268: Adding fixed depth prefix trie implementation
  • ISSUE 273: Fixes issue in reference models where strings are not sanitized on collection from avro.
  • ISSUE 272: Created command categories
  • ISSUE 269: Adding k-mer and q-mer counting.
  • ISSUE 271: Consolidate Parquet logging configuration

Version 0.12.0

  • ISSUE 264: Parquet-related Utility Classes
  • ISSUE 259: ADAMFlatGenotype is a smaller, flat version of a genotype schema
  • ISSUE 266: Removed extra command 'BuildInformation'
  • ISSUE 263: Added AdamContext.referenceLengthFromCigar
  • ISSUE 260: Modifying conversion code to resolve #112.
  • ISSUE 258: Adding an 'args' parameter to the plugin framework.
  • ISSUE 262: Adding reference assembly name to ADAMContig.
  • ISSUE 256: Upgrading to Spark 1.0
  • ISSUE 257: Adds toString method for sequence dictionary.
  • ISSUE 255: Add equals, canEqual, and hashCode methods to MdTag class

Version 0.11.0

  • ISSUE 254: Cleanup import statements
  • ISSUE 250: Adding ADAM to SAM conversion.
  • ISSUE 248: Adding utilities for read trimming.
  • ISSUE 252: Added a note about rebasing-off-master to CONTRIBUTING.md
  • ISSUE 249: Cosmetic changes to FastaConverter and FastaConverterSuite.
  • ISSUE 251: CHANGES.md is updated at release instead of per pull request
  • ISSUE 247: For #244, Fragments were in incorrect order and incomplete
  • ISSUE 246: Making sample ID field in genotype nullable.
  • ISSUE 245: Adding ADAMContig back to ADAMVariant.
  • ISSUE 243: Rebase PR#238 onto master

Version 0.10.0

  • ISSUE 242: Upgrade to Parquet 1.4.3
  • ISSUE 241: Fixes to FASTA code to properly handle indices.
  • ISSUE 239: Make ADAMVCFOutputFormat public
  • ISSUE 233: Build up reference information during cigar processing
  • ISSUE 234: Predicate to filter conversion
  • ISSUE 235: Remove unused contiglength field
  • ISSUE 232: Add -pretty and -o to the print command
  • ISSUE 230: Remove duplicate mdtag field
  • ISSUE 231: Helper scripts to run an ADAM Console.
  • ISSUE 226: Fix ReferenceRegion from ADAMRecord
  • ISSUE 225: Change Some to Option to check for unmapped reads
  • ISSUE 223: Use SparkConf object to configure SparkContext
  • ISSUE 217: Stop using reference IDs and use reference names instead
  • ISSUE 220: Update SAM to ADAM conversion
  • ISSUE 213: BQSR updates

Version 0.9.0

  • ISSUE 214: Upgrade to Spark 0.9.1
  • ISSUE 211: FastaConverter Refactor
  • ISSUE 212: Cleanup build warnings
  • ISSUE 210: Remove Scalariform from process-sources phase
  • ISSUE 209: Fix Scalariform issues and Maven warnings
  • ISSUE 207: Change from deprecated manifest erasure to runtimeClass
  • ISSUE 206: Add Scalariform settings to pom
  • ISSUE 204: Update Avro code gen to not mark fields as deprecated.

Version 0.8.0

  • ISSUE 203: Move package from edu.berkeley.cs.amplab to org.bdgenomics
  • ISSUE 199: Updating pileup conversion code to convert sequences that use the X and = (EQ) CIGAR operators
  • ISSUE 191: Add repartition parameter
  • ISSUE 183: Fixing Job.getInstance call that breaks hadoop 1 compatibility.
  • ISSUE 192: Add docs and scripts for creating a release
  • ISSUE 193: Issue #137, clarify role of CHANGES.{md,txt}

Version 0.7.2

  • ISSUE 187: Add summarize_genotypes command
  • ISSUE 178: Upgraded to Hadoop-BAM 0.6.2/Picard 1.107.
  • ISSUE 173: Parse annotations out of vcf files
  • ISSUE 162: Refactored SequenceDictionary
  • ISSUE 180: BQSR using vcf loader
  • ISSUE 179: Update maven-surefire-plugin dependency version to 2.17, also create an ...
  • ISSUE 175: VariantContext converter refactor
  • ISSUE 169: Cleaning up mpileup command
  • ISSUE 170: Adding variant field enumerations

Version 0.7.1

Version 0.7.3

Version 0.7.2

  • ISSUE 166: Pair-wise genotype concordance of genotype RDDs, with CLI tool

Version 0.7.0

  • ISSUE 171: Add back in allele dosage for genotypes.
  • ISSUE 167: Fix for Hadoop 1.0.x support
  • ISSUE 165: call PluginExecutor in apply method, fixes issue 164
  • ISSUE 160: Refactoring FASTA work to break contig sizes.
  • ISSUE 78: Upgrade to Spark 0.9 and Scala 2.10
  • ISSUE 138: Display Git commit info on command line
  • ISSUE 161: Added switches to spark context creation code
  • ISSUE 117: Add a "range join" method.
  • ISSUE 151: Vcf work concordance and genotype
  • ISSUE 150: Remaining variant changes for adam2vcf, unit tests, and CLI modifications
  • ISSUE 147: Resurrect VCF conversion code
  • ISSUE 148: Moving createSparkContext into core
  • ISSUE 142: Enforce Maven and Java versions
  • ISSUE 144: Merge of last few days of work on master into this branch
  • ISSUE 124: Vcf work rdd master merge
  • ISSUE 143: Changing package declaration to match test file location and removing un...
  • ISSUE 140: Update README.md
  • ISSUE 139: Update README.md
  • ISSUE 129: Modified pileup transforms to improve performance + to add options
  • ISSUE 116: add fastq interleaver script
  • ISSUE 125: Add design doc to CONTRIBUTING document
  • ISSUE 114: Changes to RDD utility files for new variant schema
  • ISSUE 122: Add IRC Channel to readme
  • ISSUE 100: CLI component changes for new variant schema
  • ISSUE 108: Adding new PluginExecutor command
  • ISSUE 98: Vcf work remove old variant
  • ISSUE 104: Added the port erasure to SparkFunSuite's cleanup.
  • ISSUE 107: Cleaning up change documentation.
  • ISSUE 99: Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
  • ISSUE 105: Add initial documentation on contributing
  • ISSUE 97: New schema, variant context converter changes, and removal of old genoty...
  • ISSUE 79: Adding ability to convert reference FASTA files for nucleotide sequences
  • ISSUE 91: Minor change, increase adam-cli usage width to 150 characters
  • ISSUE 86: Fixes to pileup code
  • ISSUE 88: Added function for building variant context from genotypes.
  • ISSUE 81: Update README and cleanup top-level cli help text
  • ISSUE 76: Changing hadoop fs call to be compatible with Hadoop 1.
  • ISSUE 74: Updated CHANGES.txt to include note about the recursive-load branch.
  • ISSUE 73: Support for loading/combining multiple ADAM files into a single RDD.
  • ISSUE 72: Added ability to create regions from reads, and to merge adjacent regions
  • ISSUE 71: Change RecalTable to use optimized phred calculations
  • ISSUE 68: sonatype-nexus-snapshots repository is already in parent oss-parent-7 pom
  • ISSUE 67: fix for wildcard exclusion maven warnings
  • ISSUE 65: Create a cache for phred -> double values instead of recalculating
  • ISSUE 60: Bugfix for BQSR: Offset into qualityScore list was wrong
  • ISSUE 66: add pluginDependency section and remove versions in plugin sections
  • ISSUE 61: Filter utility for inverse of Projection
  • ISSUE 48: Fix read groups mapping and add Y as base type
  • ISSUE 36: Adding reads to rods transformation.
  • ISSUE 56: Adding Yy as base in MdTag

Version 0.6.0

  • ISSUE 53: Fix Hadoop 2.2.0 support, upgrade to Spark 0.8.1
  • ISSUE 52: Attributes: Use '\t' instead of ',', as , is a valid character
  • ISSUE 47: Adding containsRefName to SequenceDictionary
  • ISSUE 46: Reduce logging for the actual adamSave job
  • ISSUE 45: Make MdTag immutable
  • ISSUE 38: Small bugfixes and cleanups to BQSR
  • ISSUE 40: Fixing reference position from offset implementation
  • ISSUE 31: Fixing a few issues in the ADAM2VCF2ADAM pipeline.
  • ISSUE 30: Suppress parquet logging in FieldEnumerationSuite
  • ISSUE 28: Fix build warnings
  • ISSUE 24: Add unit tests for marking duplicates
  • ISSUE 26: Fix unmapped reads in sequence dictionary
  • ISSUE 23: Generalizing the Projection class
  • ISSUE 25: Adding support for before, after clauses to SparkFunSuite.
  • ISSUE 22: Add a unit test for sorting reads
  • ISSUE 21: Adding rod functionality: a specialized grouping of pileup data.
  • ISSUE 13: Cleaning up VCF<->ADAM pipeline
  • ISSUE 20: Added Apache License 2.0 boilerplate to tops of all the GB-(c) files
  • ISSUE 19: Allow the Hadoop version to be specified
  • ISSUE 17: Fix transform -sort_reads partitioning. Add -coalesce option to transform.
  • ISSUE 16: Fixing an issue in pileup generation and in the MdTag util.
  • ISSUE 15: Tweaks 1
  • ISSUE 12: Subclass testing bug in AdamContext.adamLoad
  • ISSUE 11: Missing brackets in VcfConverter.getType
  • ISSUE 10: Moved record field name enum over to the projections package.
  • ISSUE 8: Fixes to sorting in ReferencePosition
  • ISSUE 4: New SparkFunSuite test support class, logging util and new BQSR test.
  • ISSUE 1: Fix scalatest configuration and fix unit tests
  • ISSUE 14: Converting some of the Option() calls to Some()
  • ISSUE 9: Adding support for a Sequence Dictionary from BAM files
  • ISSUE 7: ADAM variant and genotype formats; and a VCF->ADAM converter
  • ISSUE 3: Adding in implicit conversion functions for going between Java and Scala...
  • ISSUE 2: Update from Spark 0.7.3 to 0.8.0-incubating