[Bug]: MarkdownElementNodeProcessor does not pass 'exclude_llm_metadata_keys' and 'exclude_embed_metadata_keys' from document to nodes #13468
Comments
To address the issue you've encountered, you can modify the _build_node_from_split method:

    def _build_node_from_split(
        self,
        text_split: str,
        node: BaseNode,
        metadata: dict,
    ) -> TextNode:
        """Build node from single text split."""
        node = build_nodes_from_splits([text_split], node, id_func=self.id_func)[0]
        if self.include_metadata:
            # Filter out specific metadata keys
            excluded = ['source_id', 'source_date']
            filtered_metadata = {key: value for key, value in metadata.items() if key not in excluded}
            # Merge the remaining metadata into the node's own metadata
            node.metadata = {**node.metadata, **filtered_metadata}
        return node

In this example, the keys 'source_id' and 'source_date' are dropped before the remaining metadata is merged into the node.

Please note that this solution requires modifying the LlamaIndex codebase, which might not be ideal for all situations, and it would need to be reapplied after any update to the library.
@logan-markewich can you implement a fix that does not require modifying the source code? Ideally, all node parsers / transformations would inherit these settings from the base document.
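Until a fix lands in the library, one possible workaround that leaves the source untouched is to copy the exclusion lists from the document onto the parsed nodes afterwards. The sketch below is only an illustration: propagate_excluded_keys is a hypothetical helper, not a LlamaIndex function, and it assumes the attributes are named excluded_llm_metadata_keys and excluded_embed_metadata_keys (the issue text writes them as 'exclude_...'). Plain stand-in objects are used so the snippet runs without the library.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List

# Assumed attribute names; adjust if your library version differs.
EXCLUSION_ATTRS = ("excluded_llm_metadata_keys", "excluded_embed_metadata_keys")


def propagate_excluded_keys(document: Any, nodes: Iterable[Any]) -> List[Any]:
    """Copy the document's metadata-exclusion lists onto each child node.

    Hypothetical helper: works on any objects exposing the two list
    attributes named in EXCLUSION_ATTRS.
    """
    updated = []
    for node in nodes:
        for attr in EXCLUSION_ATTRS:
            doc_keys = list(getattr(document, attr, None) or [])
            node_keys = list(getattr(node, attr, None) or [])
            # Keep what the node already excludes, then append the
            # document's keys without introducing duplicates.
            merged = node_keys + [k for k in doc_keys if k not in node_keys]
            setattr(node, attr, merged)
        updated.append(node)
    return updated


# Stand-ins for a document and one node produced by a parser.
doc = SimpleNamespace(
    excluded_llm_metadata_keys=["source_id"],
    excluded_embed_metadata_keys=["source_date"],
)
node = SimpleNamespace(
    excluded_llm_metadata_keys=[],
    excluded_embed_metadata_keys=["page"],
)

nodes = propagate_excluded_keys(doc, [node])
print(nodes[0].excluded_llm_metadata_keys)
print(nodes[0].excluded_embed_metadata_keys)
```

With real documents, the helper would presumably be called on the list returned by the node parser, once per source document. When nodes from several documents are mixed together, each node would first need to be matched to its own document.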
Comes down to these two methods from what I can tell: build_nodes_from_splits in node utils:
versus:
…embed_metadata_keys' from Nodes to MarkdownElementNodeProcessor and UnstructuredElementNodeParser
Bug Description
When setting 'exclude_llm_metadata_keys' and 'exclude_embed_metadata_keys' of documents, they usually get passed to child nodes.
However, MarkdownElementNodeParser, which inherits from BaseElementNodeParser, does not exhibit this behaviour: it does not pass these two parameters to child nodes.
Version
0.10.30
Steps to Reproduce
Relevant Logs/Tracebacks
No response