All notable changes to this project will be documented in this file.
- New XML writer `MARC::UnsafeXMLWriter`, which is 15-20 times faster than the default (REXML-based) writer. It mirrors code from the old `MARC::FastXMLWriter` gem in a way that integrates better with the existing writer framework. It can be used like any other writer, e.g., `writer = MARC::UnsafeXMLWriter.new(filename)`. Note that while it is "unsafe" in that it doesn't check that the outgoing XML is valid (its speed comes from the fact that it's just concatenating strings together), the `FastXMLWriter` gem has been used "in the wild" for years and doesn't seem to cause anyone any problems.
- Added a new method, `MARC::Record.to_xml_string`, which produces a valid `<record>...</record>` XML snippet. It takes an optional keyword argument to include namespace attributes on the `<record>` tag, and another to use the new unsafe generator, as in `record.to_xml_string(fast_but_unsafe: true)`.
- Added first-class support for `.jsonl` (aka "newline-delimited JSON") files via `MARC::JSONLReader` and `MARC::JSONLWriter`, which read and write marc-in-json. `ruby-marc` has supported `#to_hash` and `#from_hash` to deal with this format at the individual record level for a long time; this just provides the reader/writer scaffolding.
- Also added `MARC::Record.to_json_string` to get a marc-in-json string representation (parallel to the new `#to_xml_string`).
- New option to XML readers to ignore any namespaces via `reader = MARC::XMLReader.new(filename, ignore_namespace: true)`. While the REXML MARC-XML reader can't handle XML namespaces (and thus has always ignored them), the Nokogiri-based version will enforce namespaces if present. Useful only when you have poorly generated files where the XML namespace attributes are wonky.
- All writers will now self-close if used with a block (e.g., `MARC::Writer.new(filename) {|w| w.write(record)}`), parallel to the way `File.open` works in regular Ruby.
- XML writers will now take an optional keyword argument, `include_namespace`, on both `#new` and `.encode`.
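The `.jsonl` and self-closing-writer entries above can be pictured with a small stdlib-only sketch. `SketchJSONLWriter`, `SAMPLE_RECORD`, and `jsonl_round_trip` are hypothetical names used for illustration; this is not the gem's `MARC::JSONLWriter`. The sketch assumes one marc-in-json object per line and works on plain hashes rather than `MARC::Record` objects.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for a JSONL writer; NOT ruby-marc's implementation.
class SketchJSONLWriter
  def initialize(path)
    @io = File.open(path, "w")
    return unless block_given?

    begin
      yield self
    ensure
      close # block form: always close, like File.open with a block
    end
  end

  # Write one marc-in-json style hash as a single line of JSON.
  def write(record_hash)
    @io.puts(JSON.generate(record_hash))
  end

  def close
    @io.close unless @io.closed?
  end

  def closed?
    @io.closed?
  end
end

# A minimal marc-in-json style hash: a leader plus an array of fields.
SAMPLE_RECORD = {
  "leader" => "00000nam a2200000 a 4500",
  "fields" => [
    {"001" => "sketch-1"},
    {"245" => {"ind1" => "1", "ind2" => "0", "subfields" => [{"a" => "A title"}]}}
  ]
}.freeze

# Write two records in block form, then read them back line by line.
def jsonl_round_trip
  Dir.mktmpdir do |dir|
    path = File.join(dir, "records.jsonl")
    writer = SketchJSONLWriter.new(path) do |w|
      w.write(SAMPLE_RECORD)
      w.write(SAMPLE_RECORD)
    end
    records = File.foreach(path).map { |line| JSON.parse(line) }
    [writer.closed?, records]
  end
end
```

Because the block form closes the file in an `ensure` clause, the output is flushed and closed even if writing a record raises.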
- Remove the `JREXML` parser, which apparently hasn't worked for years yet also wasn't running in CI because the tests are run under bundler, which didn't load `jrexml`. It is now set to emit a warning recommending Nokogiri and to fall back to REXML.
- 10-15% speed improvement when parsing MARC-XML with Nokogiri (PR #97, billdueber)
- Added deprecation warnings when using the `libxml`, `jstax`, or `jrexml` XML parsers. When they were introduced, Nokogiri under JRuby was iffy. It's now stable on both MRI and JRuby, faster than any of the other included options, and should be preferred. (PR #98, billdueber)
- MARC fields are now validated in their own post-creation stage (PR #66, cbeer)
- Reduce the noise when running tests (billdueber)
- Reformatted this CHANGELOG.md file and added examples/structure to README.md.
- MARC-XML has requirements on the leader that are applied when writing out MARC-XML by `MARC::XMLWriter.encode`. Previous versions would actually mutate the record being written, resulting in a silent modification to a record just because you were writing it out. Changed to use a duplicate (PR #73, cbeer)
- Guard against multiple character calls when parsing XML (PR #74, cbeer)
- Minor Dublin Core code fixes (PRs #83 and #84, fjorba)
- `JRubyStaxReader` now supports Java 9+ / JRuby 9.3+ (PR #87, dmolesUC)
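The leader fix described above (encode from a duplicate instead of mutating the record) follows a common pattern, sketched here with the standard library only. `SketchRecord`, `normalized_leader`, and `encode_leader` are hypothetical names, and the leader rule shown (pad to 24 characters, force the last four to `4500`) is a simplified assumption for illustration, not the gem's actual rule set.

```ruby
# Hypothetical record type for illustration; not a ruby-marc class.
SketchRecord = Struct.new(:leader)

# Simplified, assumed leader rule for this sketch only:
# pad to 24 characters and force the last four to "4500".
def normalized_leader(leader)
  fixed = leader.ljust(24) # ljust returns a new string, so the input is untouched
  fixed[20..23] = "4500"
  fixed
end

# Encode from a copy so the caller's record is never modified.
def encode_leader(record)
  copy = record.dup
  copy.leader = normalized_leader(record.leader)
  "<leader>#{copy.leader}</leader>"
end

record = SketchRecord.new("00000nam a22")
xml = encode_leader(record)
# record.leader is still the original 12-character string here;
# only the copy used for output was normalized.
```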
- Fix a regression when normalizing indicator values when serializing marcxml
- Add support for additional valid subfield codes in marcxml
- Now (correctly) throw an error if datafield string is the empty string (thanks to @bibliotechy)
- Non-user-facing change in implementation of FieldMap strictly for performance
- Mostly changes that deal with encoding, plus the plunge to a 1.0 release
- Extensive rewrite of MARC::Reader (ISO 2709 binary reader) to provide a
fairly complete and consistent handling of char encoding issues in ruby 1.9.
- This code is well covered by automated tests but ended up complex; there may be bugs, so please report them.
- May not work properly under jruby with non-unicode source encodings.
- Still can't handle Marc8 encoding.
- Behavior with regard to char encodings under ruby 1.9.x may not be entirely backwards compatible with previous 0.4.x versions. Test your code. In particular, previous versions may have automatically transcoded non-unicode encodings to UTF-8 for you. This version will not do so unless you ask it to with correct arguments.
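The difference between tagging bytes with an encoding and transcoding them to UTF-8 is the crux of the note above. The sketch below uses only core Ruby string methods; `tag_and_transcode` is a hypothetical helper, and the reader arguments that request transcoding in this gem are not shown because this file does not name them.

```ruby
# Hypothetical helper showing the two separate steps.
def tag_and_transcode(raw_bytes, source_encoding)
  # Tagging: declare what the bytes already are. No bytes change.
  tagged = raw_bytes.dup.force_encoding(source_encoding)
  # Transcoding: actually convert the bytes to UTF-8.
  utf8 = tagged.encode(Encoding::UTF_8)
  [tagged, utf8]
end

# "café" as it might be stored in a Windows-1252 file (0xE9 is "é" there).
raw = "caf\xE9".b
tagged, utf8 = tag_and_transcode(raw, "Windows-1252")
# tagged has the same 4 bytes as raw; utf8 has 5 bytes, since "é" takes two in UTF-8.
```

Code that relied on the old automatic behavior would see the tagged form unless it asks for the transcoded one.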
- Fixed performance regression: strict reader will parse about 5x faster now
- Updated CHANGES file for first time in a long time :-)
- Nokogiri and jrexml parser integration added as well as Ruby 1.9 support
- DataField tags that are all numeric are now padded with leading zeros
- can now process records that have field tags that are non-numeric (thanks Ross Singer)
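The two tag-handling entries above (zero-padding all-numeric tags, accepting non-numeric ones) can be illustrated with a one-method sketch. `normalize_tag` is a hypothetical helper written for this example, not the gem's code, and it assumes a three-character tag width.

```ruby
# Hypothetical helper: pad all-numeric tags to three digits,
# leave anything else (e.g. alphabetic local tags) unchanged.
def normalize_tag(tag)
  text = tag.to_s
  text.match?(/\A\d+\z/) ? text.rjust(3, "0") : text
end

normalize_tag("45")  # => "045"
normalize_tag(8)     # => "008"
normalize_tag("LOC") # => "LOC"
```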
- added newline to output generated by REXML::Formatters::Default to make it a bit more friendly. REXML::Formatters::Pretty and Transitive just don't do what I want (whitespace in weird places).
- small docfix change in XMLReader
- use REXML::Formatters::Default instead of deprecated REXML::Element.write
- added examples directory
- fixed problem with leading whitespace and the leader in xml reader (thanks Morgan Cundiff)
- updated Record.to_marc documentation to be a bit more precise
- removed doc references to MARC::Field which is no longer around
- changed from Artistic to MIT License
- fixed bad record length test
- removed MARC::XMLWriter convert_to_utf8 which wasn't really working and shouldn't be there if it isn't good
- added unescaping of entities to MARC::XMLReader
- docfix in MARC::DataField (thanks Jason Ronallo)
- multiple docfixes (thanks Jonathan Rochkind)
- fixed bug in MARC::XMLWriter that was outputting all control field tags as 00z (thanks Ross Singer)
- added :include_namespace option to MARC::XMLWriter::encode to include the marcxml namespace, which allows MARC::Record::to_xml to emit the namespace for a single record.
- added ability to map a MARC record to the Dublin Core fields. Calling to_dublin_core on a MARC::Record returns a hash that has Dublin Core fields as the hash keys.
- fixed MARC::Record::to_xml so that it actually is tested and works (thanks Ross Singer)
- added ability to pass File like objects to the constructor for MARC::XMLReader like MARC::Reader (thanks Jake Glenn)
- fixed pretty xml when stylesheet is used
- added value() to MARC::DataField
- added Rakefile for testing/building
- changed XMLWriter.write to output pretty-printed XML
- normalized Text in XML output
- added XMLWriter checks and replacements for bad subfield codes and indicator values
- added XMLWriter check and replacement for invalid control codes in xml data values
- added XMLWriter checks for values in the leader that are invalid MARCXML
- added bin/marc2xml
- collapsed tc_xmlreader.rb and tc_xmlwriter.rb into tc_xml.rb for full write/read test.
- added :stylesheet argument to XMLWriter.new
- removed control tests out of tc_field.rb into tc_control.rb
- fixed some formatting
- changed control/field to controlfield/datafield
- added == check for controlfield
- removed namespace declarations on record elements in favor of default namespace on collection element
- added spaces around subfield code and delimiter in to_s
- fixed up relevant tests that were expecting old formatting
- fixed xmlreader strip_ns which was returning nil when no namespace was found on an element (exposed by namespace changes).
- MARC::XMLWriter added
- moved encode/decode methods from MARC::MARC21 into MARC::Writer and MARC::Reader respectively. This required pushing MARC21 specific constants out into MARC::Constants which is required as necessary.
- moved encode from MARC::MARXML into MARC::XMLWriter and added constants to MARC::Constants
- added MARC::XMLReader for reading MARC as XML
- added xml reading tests
- fixed indentation to be two spaces
- MARC::MARC21::decode throws an exception when a directory can't be found. Exception is caught and ignored in MARC::ForgivingReader
- when unspecified, field indicators are forced to blanks
- checking for when a field appears to not have indicators and subfields, in which case the field is skipped entirely
- fixed off by one error when reading in leader, previous versions were reading an extra character
- added ForgivingReader class and support for reading records without using possibly faulty offsets when the user needs them.
- updated version string to see if it'll fix some gem oddness
- initial release