GitHub - Darveivoldavara/clustering_and_naming_categories: Summarization, clastering and characterization of text categories using LLM

Clustering and defining text categories

The presented examples demonstrate how LLM can be utilized for:

Extracting the brief essence from texts
Clustering texts into categories based on their content
Forming descriptions and characteristics of categories

Objective

The results obtained can be leveraged by businesses, for instance, to understand the most common inquiries made to customer service centers or technical support by clients and company employees.

Used tools

GPT 3.5 and GPT 4 were used depending on the volume of texts and the complexity of the task, as well as the final processing cost.

Additionally, on large datasets, KMeans was employed for clustering and RuBERT tiny 2 was used for generating text embeddings.

Receiving Q&A file based on Telegram messages

OpenAI API key setup

To get image descriptions from your chat, first, you need to set your OpenAI API key environment variable on your OS. Just run the following script in your command line and specify your API key:

bash setup_openai_key.sh

Telegram message history export

To retrieve your chat history in Telegram, go to the chat interface, click on the three dots for options at the top right corner, and select "Export chat history". Next, make sure to select "Format": JSON and other necessary parameters as needed. Specify the save path as "Path" to the root of this project, and you will have a similar folder named source with chat data.

Retrieving Q&A file

Then, you can run qa_extract.py:

python3 qa_extract.py

and the resulting qa.json file will appear in the data folder.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
notebooks		notebooks
source		source
tables		tables
tools		tools
.gitignore		.gitignore
README.md		README.md
qa_extract.py		qa_extract.py
setup_openai_key.sh		setup_openai_key.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

notebooks

notebooks

source

source

tables

tables

tools

tools

.gitignore

.gitignore

README.md

README.md

qa_extract.py

qa_extract.py

setup_openai_key.sh

setup_openai_key.sh

Repository files navigation

Clustering and defining text categories

Objective

Used tools

Receiving Q&A file based on Telegram messages

OpenAI API key setup

Telegram message history export

Retrieving Q&A file

About

Languages

Darveivoldavara/clustering_and_naming_categories

Folders and files

Latest commit

History

Repository files navigation

Clustering and defining text categories

Objective

Used tools

Receiving Q&A file based on Telegram messages

OpenAI API key setup

Telegram message history export

Retrieving Q&A file

About

Topics

Resources

Stars

Watchers

Forks

Languages