[Feature] Build a transform to remove headers from code files #63

Open
1 of 2 tasks
Bytes-Explorer opened this issue May 3, 2024 · 0 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@Bytes-Explorer
Collaborator

Search before asking

  • I searched the issues and found no similar issues.

Component

Other

Feature

Code files often have headers. These headers do not contain information relevant to LLMs and may also contain PII. We want to build a new transform that removes this header information from code files. The transform should be built so that it works across 300+ programming languages. One possible approach is for the transform to take as input a configuration file that lists programming language names and the characters used for commenting in each language. It would then identify the header information in files written in the languages specified in the configuration file and edit those files to remove it.
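To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not an existing implementation: the `COMMENT_SYNTAX` mapping stands in for the configuration file, and its layout, the function name `strip_header`, and the rule that a "header" is the leading run of comments and blank lines are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the configuration file: language name mapped to
# its line-comment marker and (open, close) block-comment markers.
COMMENT_SYNTAX: Dict[str, dict] = {
    "python": {"line": "#", "block": None},
    "c": {"line": "//", "block": ("/*", "*/")},
    "java": {"line": "//", "block": ("/*", "*/")},
    "html": {"line": None, "block": ("<!--", "-->")},
}


def strip_header(source: str, language: str,
                 config: Optional[Dict[str, dict]] = None) -> str:
    """Drop the leading run of comment and blank lines from `source`.

    Assumption: the header is whatever comment text precedes the first
    line of code. Files in languages missing from the config, and files
    with no leading comment, are returned unchanged.
    """
    syntax = (config or COMMENT_SYNTAX).get(language.lower())
    if syntax is None:
        return source

    lines = source.splitlines(keepends=True)
    kept = []
    i = 0
    # A shebang is executable metadata, not a header, so keep it.
    if lines and lines[0].startswith("#!"):
        kept.append(lines[0])
        i = 1

    line_marker = syntax.get("line")
    block = syntax.get("block")
    in_block = False
    saw_comment = False
    while i < len(lines):
        stripped = lines[i].strip()
        if in_block:
            # Sketch limitation: code after the closer on this line is dropped too.
            if block[1] in stripped:
                in_block = False
        elif stripped == "":
            pass  # blank lines between header comments are skipped
        elif line_marker and stripped.startswith(line_marker):
            saw_comment = True
        elif block and stripped.startswith(block[0]):
            saw_comment = True
            rest = stripped[len(block[0]):]
            if block[1] not in rest:
                in_block = True
            elif not rest.endswith(block[1]):
                break  # code follows the comment on the same line; stop here
        else:
            break  # first line of real code ends the header
        i += 1

    if not saw_comment:
        return source
    return "".join(kept + lines[i:])


if __name__ == "__main__":
    sample = "/*\n * Copyright Jane Doe\n */\n\nint main(void) { return 0; }\n"
    print(strip_header(sample, "c"))
```

A real transform would likely need more care than this, for example to keep comments that document the first function rather than the file, or to handle languages with nested block comments. Those decisions are left open here.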

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
Bytes-Explorer added the enhancement and good first issue labels on May 3, 2024