Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 24, 2024 - Python
Remove extra whitespace from text.
A Python package with ready-to-use models for various NLP tasks and text preprocessing utilities. The implementation allows fine-tuning.
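Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods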
Sentiment Analysis For Restaurant Reviews
Language-Detection
ValX is an open-source Python package for text cleaning tasks, including profanity detection and removal. It now also includes detection and removal of sensitive information.
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Article title, authors, date and body extraction dataset.
NLP
NLP pre-/post-processing tools.
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
A recommender system that suggests the right candidates to recruiters for a job application. The content is the candidates' personal information and their job preferences. It implements a recommender system using filtering techniques and natural language processing to recommend top jobs based on similarity.
Semantic Enrichment, Data Augmentation and Deep Learning for Boosting Invoice Text Classification Performance: A Novel Natural Language Processing Strategy
In this project, I utilized the TripAdvisor Hotel Review dataset from Kaggle to perform sentiment analysis on hotel reviews. The main objective was to build a predictive model using LSTM (Long Short-Term Memory) neural networks to classify hotel reviews as positive or negative based on their textual content.
Repo with basic start on Recurrent Neural Networks, Word2Vec, Doc2Vec, TFIDF vectors and NLP basics
👀 Everything Everyway All At Once Text Preprocessing for Natural Language Processing.
JS / Python3 / PHP library for working with UTF-8 polytonic Greek and Latin
A Python package to get useful information from documents using TopicRank Algorithm.
🖹 Offline Text Cleaner and Formatter
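Several entries above describe the same basic steps: pull text out of an HTML page, collapse extra whitespace, tokenize, drop stop words, and keep the unique words. A minimal sketch of those steps, using only the Python standard library, might look like the following. It is not taken from any of the listed repositories; the function names and the tiny `STOP_WORDS` set are hypothetical and chosen only for illustration, and the tokenizer assumes plain ASCII English text.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return normalize_whitespace(" ".join(parser.chunks))


def unique_words(text):
    """Lowercase, tokenize on ASCII letters, drop stop words, deduplicate."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted(set(tokens) - STOP_WORDS)
```

For example, `unique_words(html_to_text(page))` would give the sorted vocabulary of a page held in the string `page`. Real projects usually replace the regular-expression tokenizer and the hand-written stop-word list with a dedicated NLP library, and add lemmatization or stemming before deduplicating.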