This repository has been archived by the owner on Sep 25, 2022. It is now read-only.

Add support for d:content properties #5

Open
pmonks opened this issue Dec 12, 2013 · 1 comment

Comments

@pmonks
Owner

pmonks commented Dec 12, 2013

Migrated from https://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=62

Need to add support for metadata properties of type d:content.

Note to self: this isn't as trivial as it sounds, at least if "reads during write transactions" are to be avoided. The problem is in figuring out which metadata properties are of type d:content - this requires a call to the DictionaryService, which in turn reads (i.e. SELECTs from) the database (or may, depending on a whole slew of factors).

Further note to self: we're already hitting DictionaryService to determine whether a property is multi-valued or not. Given that we're already taking the hit we might as well go ahead and implement this too.
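One way to keep those lookups out of the write transaction is to resolve property types up front and cache them. The following is a minimal sketch only, not the tool's actual code: `PropertyTypeCache` and its `lookup` function are hypothetical stand-ins, and the real DictionaryService call is deliberately not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: caches the data type of each metadata property so that
// the (possibly database-backed) dictionary lookup happens once, before the
// write transaction starts.
public class PropertyTypeCache {
    private final Map<String, String> typeByProperty = new ConcurrentHashMap<>();

    // Assumption: stands in for whatever call resolves a property name to its
    // data type name (e.g. "d:content"); supplied by the caller.
    private final Function<String, String> lookup;

    public PropertyTypeCache(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Resolve every property name once; names already cached are not looked up again.
    public void warm(Iterable<String> propertyNames) {
        for (String name : propertyNames) {
            typeByProperty.computeIfAbsent(name, lookup);
        }
    }

    // Answers from the cache only, so it is safe to call inside a write transaction.
    // Properties that were never warmed are reported as non-content.
    public boolean isContentProperty(String propertyName) {
        return "d:content".equals(typeByProperty.get(propertyName));
    }
}
```

The same cache could also hold the multi-valued flag, since that lookup is already being paid for.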

Additional note to self: there may be some complexity in figuring out how to stream large content into a d:content field. I don't think we'd want it inline in the XML properties file, for example...

@pmonks pmonks removed this from the Version 2.0 milestone Jun 5, 2015
@pmonks
Owner Author

pmonks commented Aug 23, 2018

The key question here is how to represent the data of d:content properties on disk, in a general-purpose, performant manner. There are two basic approaches, neither of which is an obvious "good" solution:

  1. Put the data inline in the XML, in which case you have to figure out how to handle binary data. BASE64 (the usual choice in XML land) is a poor fit for this purpose: it inflates the data by roughly a third (four output characters for every three input bytes), and the value has to be parsed along with the rest of the XML, which causes a cascade of negative performance impacts (e.g. due to IO and memory bloat).
  2. Store the data as separate files on disk, with a "pointer" to each file in the XML. This quickly becomes thorny when you consider that the tool already has a complicated job determining which files on disk represent net-new content vs metadata vs content versions vs metadata versions. Having d:content files floating around as well only compounds this complexity. Note that this cost is paid in code complexity (i.e. at development time), not in runtime performance.
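To put a number on the size cost of the inline option, here is a small sketch using the JDK's standard `java.util.Base64` encoder. The class name `Base64Overhead` and the 3 MB zero-filled buffer are illustrative assumptions, not anything taken from the tool.

```java
import java.util.Base64;

public class Base64Overhead {
    // Length of standard (padded) Base64 output for a given number of input
    // bytes: every group of up to 3 bytes becomes 4 characters.
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Illustrative stand-in for a binary d:content value.
        byte[] raw = new byte[3_000_000];
        byte[] encoded = Base64.getEncoder().encode(raw);

        System.out.println("raw bytes:      " + raw.length);
        System.out.println("encoded bytes:  " + encoded.length);
        System.out.println("predicted size: " + encodedLength(raw.length));
    }
}
```

The on-disk growth is only part of the cost: with an inline value, both the encoded text and the decoded bytes tend to be held in memory at once unless the XML is processed in a streaming fashion.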
