Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 24, 2024 - Python
A WikiText table parser written in Rust.
Sentiment Analysis using Machine Learning
Integration of a trained sentiment classification model into a Flask web app for real-time inference on product reviews from the Flipkart store.
Predicting job salary level from full job description
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
This project aims to perform sentiment analysis on a Twitter dataset using Convolutional Neural Networks (CNNs). The goal is to classify tweets into positive, negative, or neutral sentiments.
Panda is a Pandoc Lua filter that works on Pandoc's internal AST. Panda is heavily inspired by [abp](http://cdelord.fr/abp), reimplemented as a Pandoc Lua filter.
Results of a data analytics project at TH Wildau. Created with the Orange data analytics tool. Data source: https://www.kaggle.com/datasets/PromptCloudHQ/us-jobs-on-monstercom
A Seq2Seq model with attention that performs English-to-Spanish translation with an accuracy of almost 97%.
A fine-tuned DistilBERT transformer model that predicts the overall sentiment of a piece of financial news with an accuracy of nearly 81.5%.
A fine-tuned BERT transformer model that classifies symptoms to their corresponding diseases with an accuracy of up to 89%.
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
Joint inter-institutional research collaboration (Aug-Dec 2023) with the Korea Basic Science Institute (한국기초과학지원연구원)
Document classification using the KNN algorithm, a graph-based approach, along with scraped data
NLP
Article title, authors, date and body extraction dataset.
This repository serves as a comprehensive resource for learning and implementing Natural Language Processing (NLP) techniques. The content is organized to provide an understanding of NLP challenges, real-world applications, and various approaches used to solve NLP use cases.