ZILiAT-NASK/README.md

Linguistic Engineering and Text Analysis Department

Welcome to the official GitHub repository of the Linguistic Engineering and Text Analysis Department at NASK (National Research Institute)! 🌐 This repository houses our projects, research papers, and tools related to linguistic engineering, natural language processing, and text analysis. We aim to advance the field of linguistics and language technology through our collaborative efforts. 🚀

About Us

The Linguistic Engineering and Text Analysis Department is dedicated to exploring and harnessing the power of language in various applications. Our team of linguists, data scientists, and software engineers works together to develop solutions for text analysis, information extraction, summarization, text classification, and related tasks. 📚🧠💻

Projects

  1. BAN-PL: Polish Dataset of Banned Harmful and Offensive Content from the Wykop.pl web service.

  2. StyloMetrix is a tool that creates text representations in the form of StyloMetrix vectors. Each metric in the vector quantifies a specific linguistic feature, allowing for a detailed analysis of the text's style through numeric values. Because the metrics can be customized, StyloMetrix suits tasks such as stylometric analysis, training machine learning classifiers, statistical analyses, and linguistic reference. Available for Polish, English, and Ukrainian.

  3. Summarizer is a tool for generating concise and informative summaries of text documents. Using natural language processing techniques, Summarizer distills the key points and main ideas of lengthy texts into coherent summaries.

  4. PrivMasker is a tool for anonymizing personal and sensitive data in documents. Depending on the text type and user preferences, users can select which components to mask, including names, contact details (phone numbers, email addresses), physical addresses, dates, identification numbers, and monetary amounts.
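
To illustrate what "each metric in the vector quantifies a specific linguistic feature" means, here is a minimal sketch in plain Python. It does not use the StyloMetrix API: the function `style_vector` and its three surface-level metrics are hypothetical stand-ins, much simpler than the grammar-based metrics the tool itself provides.

```python
import re


def style_vector(text: str) -> dict[str, float]:
    """Toy stylometric vector: one number per (hypothetical) metric.

    Assumption: these metrics are illustrative only and are not part of
    StyloMetrix.
    """
    # Split into word tokens and single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    words = [t for t in tokens if re.fullmatch(r"\w+", t)]
    # Avoid division by zero on empty input.
    n_tokens = len(tokens) or 1
    n_words = len(words) or 1
    return {
        # Share of tokens that are punctuation marks.
        "punctuation_ratio": (len(tokens) - len(words)) / n_tokens,
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / n_words,
        # Lexical variety: distinct lowercased words over all words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }


if __name__ == "__main__":
    for name, value in style_vector("The cat sat. The cat slept!").items():
        print(f"{name}: {value:.3f}")
```

Vectors of this shape can be stacked row by row into a feature matrix and passed to a classifier or a statistical test, which is the kind of downstream use the StyloMetrix entry describes.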

Research Papers

Our department actively contributes to the scientific community through research papers published in top-tier conferences and journals. Some of our recent papers include:

  • "Styles with Benefits. The StyloMetrix Vectors for Stylistic and Semantic Text Classification of Small-Scale Datasets and Different Sample Length" - Published at PPRAI 2022. You can find the paper here. 📝🔬🌐
  • "The Grammar and Syntax Based Corpus Analysis Tool For The Ukrainian Language". Access the paper here. 📝🔎🔄
  • "Team Up! Cohesive Text Summarization Scoring Sentence Coalitions" - Published at ICAISC 2020. You can find the paper here. 📝🔬🌐
  • "BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service". 2023. arXiv:2308.1059. Access the paper here. 📝
  • "Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis". 2023. arXiv:2310.14325. Access the paper here. 📝
  • "StyloMetrix: An Open-Source Multilingual Tool for Representing Stylometric Vectors". 2023. arXiv:2309.12810. Access the paper here. 📝

Contribution Guidelines

We welcome contributions from the open-source community to enhance our projects and advance the field of linguistic engineering. If you are interested in contributing, please follow our guidelines outlined in the CONTRIBUTING.md file of each project repository. 🙌🔧📝

Contact Us

For any inquiries, collaborations, or questions, feel free to reach out to us:

Email: ziliat@nask.pl ✉️

Website: https://www.science.nask.pl
