TASE (Telegram Audio Search Engine)

A lightning fast audio full-text search engine on top of Telegram

It allows users to quickly and easily find information that is of genuine interest or value, without the need to wade through numerous irrelevant channels. It provides users with search results that lead to relevant information on high-quality audio files.

What makes TASE special?

TASE is a growing open-source full-text audio search engine platform that serves high-volume requests from users. It is built on Python and Telegram. The latest major update introduces many new features, among them a highly abstracted, modular design powered by Elasticsearch and ArangoDB, with support for parallel clusters on servers located in different parts of the world.

TASE at a glance

Advanced full-text search engine for audio files
Extremely fast audio file indexer (benchmark: minimum 4 million songs per day per client)
Support for multiple parallel clients as indexer
Support for distributed parallel clusters on multiple servers, covering both searching and indexing (all audio files, graph and document models)
Graph of users and items
Dynamic URLs
Asynchronous
Rich admin tools
Multilingual
Audio file caching
Easy configuration and customization
Friendly look and feel

TASE is free and always will be. Help us out… If you love free stuff and great software, give us a star! ⭐🌟

How to install and run

* Note: please read the Configuration and customization section before you run the project, and configure the tase.json and .env files first.

There are two different ways to use TASE

1. Clone the repository
2. Set up the services in one of two ways:
  1. Manually install the dependencies
    1. Install Elasticsearch (v8.3) (see its official installation instructions)
    2. Install ArangoDB (v3.9.1) (see its official installation instructions)
    3. Install RabbitMQ (see its official installation instructions)
    4. Install Redis (see its official installation instructions)
  2. Run using docker compose (the easier, recommended method)
```
docker compose up -d
```
    * Install docker compose if you haven't already
3. Install the Python dependencies
```
poetry install
```
  * Install poetry if you haven't already
4. Run the tase_client.py file located in the tase package

Configuration and customization

Before you run the project, customize the tase.json file in the root directory, which TASE uses as its config file. The repository root includes sample_tase.json and .env_sample files.

To run the project, you have to provide the basic information the bot works with. For instance, you must provide a Telegram bot token and your Telegram client authentication information to run your own clients.
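
As a rough illustration of the step above, the following sketch checks that a config file exists and parses as JSON before anything else starts. The helper name `load_config` and its error messages are assumptions made for this example; TASE's own loader and the schema of tase.json are not shown here.

```python
import json
from pathlib import Path


def load_config(path: str = "tase.json") -> dict:
    """Read a JSON config file and fail early with a readable message.

    Hypothetical helper for illustration; not part of TASE.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; create it (sample_tase.json can serve as a template)"
        )
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # The top level is assumed to be a JSON object so keys can be looked up.
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} must contain a JSON object at the top level")
    return config
```

Calling `load_config()` from the repository root would then either return the parsed settings or stop with an error that names the missing or malformed file.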

Features

Features for developers

Add new languages in locales (we recommend using Poedit)
Easily add new buttons and functionalities (query and inline) by implementing the abstract methods in the base button class
Realtime visualizations for graph models and audio files (Kibana, ArangoDB)
Abstraction and facade design pattern
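
The "implement the abstract methods in the base button class" workflow mentioned above can be pictured roughly as follows. Every name in this sketch (`BaseButton`, `on_inline_query`, `on_callback_query`, `HelpButton`, `BUTTONS`) is hypothetical; the real base class in the tase package may use different names and signatures.

```python
from abc import ABC, abstractmethod


class BaseButton(ABC):
    """Hypothetical base class; TASE's real class and method names may differ."""

    name: str = ""

    @abstractmethod
    def on_inline_query(self, query: str) -> list[str]:
        """Return the result titles to show for an inline query."""

    @abstractmethod
    def on_callback_query(self, data: str) -> str:
        """Return the text to reply with when the button is pressed."""


class HelpButton(BaseButton):
    """A new button only has to fill in the two abstract methods."""

    name = "help"

    def on_inline_query(self, query: str) -> list[str]:
        return [f"Help topic: {query}"] if query else ["Help"]

    def on_callback_query(self, data: str) -> str:
        return "Send a song title to start searching."


# A registry keyed by button name lets a dispatcher look up the handler.
BUTTONS: dict[str, BaseButton] = {button.name: button for button in (HelpButton(),)}
```

Because `BaseButton` is abstract, forgetting to implement one of the methods fails at instantiation time rather than when a user presses the button.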

Wide Range of Features 💡

Search engine

Search audio files through the direct bot search
Search audio files from groups and private chats using @bot_name mention and send them directly to the chat
Real-time search using @bot_name mention, by showing an inline list of results
Real-time search directly in the private and group chats
Search based on file-name, performer name, and audio-name
Shows the top 10 most relevant results in a message; the full, unlimited set is available under "more results", returned as an inline list
Play the songs in the inline lists before downloading them
Caches searched audio files to avoid unnecessary redundant DB requests
Dynamic URL for the results
Allows the owner to trace the downloaded audio files
High accuracy and relevance
Search in a wide variety of languages
Show the source-channel name and the link to the file
Sort results in reverse order (so that the more relevant results appear at the bottom)
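
Two of the behaviours listed above, the top-10 message with a separate "more results" list and the optional reverse ordering, can be sketched like this. The function name `split_results` and the constant `TOP_RESULTS` are made up for the example, and the input is assumed to be already ranked best-first by the search backend.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

TOP_RESULTS = 10  # the message shows the top 10 hits, per the list above


def split_results(ranked: Sequence[T], reverse: bool = False) -> tuple[list[T], list[T]]:
    """Split hits ranked best-first into (message hits, "more results" hits).

    With reverse=True the message hits are flipped so the best hit comes last,
    i.e. closest to the bottom of the message.
    """
    top = list(ranked[:TOP_RESULTS])
    rest = list(ranked[TOP_RESULTS:])
    if reverse:
        top.reverse()
    return top, rest
```

In a chat, the bottom of a message sits nearest to the input field, which is presumably why the reverse mode places the best match there.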

Indexing features

Automatically finds new channels in an optimistic way (first assumes it is a valid channel and validates it later before starting to index)
1. Extract from texts and captions
2. Extract from "forwarded mention"
3. Extract from links
Automatically indexes new channels
Iterates through previous channels and resumes indexing from the previous checkpoint
Extremely fast indexing (minimum 4 million songs per day per client)
Analyzes channels and calculates a score (0-5) based on their:
1. Density of audio files (ratio of audio files)
2. Activity of the channel (how frequently it shares new files)
3. Number of members
Avoids getting banned by the Telegram servers
Support for parallel indexing using multiple Telegram clients
Hashes the file IDs in a specific way that avoids conflicts to a high degree and still keeps them as short as eight characters
Users and channel owners can send a request to index a specific channel using "/index channel_name"
Constructs a graph for users and audio files in real time which can be used for recommendation systems and link prediction tasks
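
Three of the indexing ideas above can be illustrated with a short sketch: collecting candidate channel usernames from text and links, deriving an eight-character key from a file ID, and combining the three channel signals into a 0-5 score. None of this is TASE's actual code. The regular expressions assume usernames of 5 to 32 letters, digits, and underscores; the BLAKE2-based key is a stand-in for whatever hashing scheme TASE uses; and the scoring weights and caps are invented purely for illustration. Extraction from "forwarded mention" needs Telegram message objects and is not covered.

```python
import base64
import hashlib
import math
import re

# Assumed username shape: a letter followed by 4-31 letters, digits or underscores.
_USERNAME = r"[A-Za-z][A-Za-z0-9_]{4,31}"
MENTION_RE = re.compile(rf"(?<![\w@])@({_USERNAME})\b")
LINK_RE = re.compile(rf"\bt\.me/({_USERNAME})\b")


def extract_channel_candidates(text: str) -> list[str]:
    """Return unique lower-cased usernames found in mentions, then in links.

    Optimistic on purpose: candidates are not checked here and are meant
    to be validated later, before indexing starts.
    """
    found = MENTION_RE.findall(text) + LINK_RE.findall(text)
    return list(dict.fromkeys(name.lower() for name in found))


def short_file_key(file_id: str) -> str:
    """Map a file ID to an 8-character URL-safe key (6 hash bytes = 48 bits)."""
    digest = hashlib.blake2b(file_id.encode("utf-8"), digest_size=6).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def channel_score(audio_ratio: float, posts_per_day: float, members: int) -> float:
    """Hypothetical 0-5 score: the mean of three signals, each capped at 1."""
    density = min(max(audio_ratio, 0.0), 1.0)
    activity = min(max(posts_per_day, 0.0) / 10.0, 1.0)  # 10+ posts/day counts as full
    size = min(math.log10(max(members, 0) + 1) / 6.0, 1.0)  # ~1M members counts as full
    return round(5.0 * (density + activity + size) / 3.0, 2)
```

A 48-bit key keeps collisions rare for collections in the millions, but it does not rule them out, so a real system would still need to handle the occasional clash.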

User limiting/controlling features

Handle user membership in your channel(s) in near real-time
Set limitations for users based on their membership status
Limits non-member users to 5 free audio file searches; after that, they have to wait one minute before receiving the audio files they searched for
Non-members are also limited in direct in-chat searches
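
The non-member limit described above could be modelled along these lines. This is a minimal sketch under stated assumptions: the class name `SearchLimiter` and its method are invented, counts live in memory only, and the sketch never resets a user's count because the text above does not say when (or whether) the free quota renews.

```python
FREE_SEARCHES = 5     # free searches for non-members, from the list above
WAIT_SECONDS = 60.0   # the one-minute wait, from the list above


class SearchLimiter:
    """Counts searches per user and reports how long a result should be delayed."""

    def __init__(self) -> None:
        self._counts = {}  # user_id -> number of searches seen so far

    def delay_for(self, user_id: int, is_member: bool) -> float:
        """Return the delay in seconds to apply before delivering a result."""
        if is_member:
            return 0.0  # members are never delayed
        self._counts[user_id] = self._counts.get(user_id, 0) + 1
        return 0.0 if self._counts[user_id] <= FREE_SEARCHES else WAIT_SECONDS
```

A caller would check membership first, ask the limiter for the delay, and schedule delivery of the audio file accordingly.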

User interface

User guide
Multiple menus (home, help, playlist etc.)
A keyboard for each part to ease the process for users
Multilingual bot - currently supported:
- 🇺🇸 English
- 🇪🇸 Spanish
- 🇷🇺 Russian
- 🇦🇪 Arabic
- 🇧🇷 Portuguese
- 🇮🇳 Hindi
- 🇩🇪 German
- 🇹🇯 Kurdish (Sorani)
- 🇹🇯 Kurdish (Kurmanji)
- 🇳🇱 Dutch
- 🇮🇹 Italian
- 🇮🇷 Persian
Sends greeting messages to users who haven't been active for more than a week or more than two weeks
Shows the search history for each user in a scrollable inline list, opened by pressing the history button on the home keyboard
Beautiful and vibrant user interface (messages and emojis)
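
For the multilingual support listed above, a translation lookup with Python's standard `gettext` module might look like the sketch below. The domain name "tase" and the helper name `get_translator` are assumptions; only the existence of a locales directory comes from the repository itself.

```python
import gettext


def get_translator(language: str, localedir: str = "locales", domain: str = "tase"):
    """Return a gettext-style lookup function for `language`.

    The domain name "tase" is an assumption. With fallback=True, a missing
    catalog yields NullTranslations, which returns each message unchanged.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext
```

With `_ = get_translator("es")`, a call such as `_("Home")` would return the Spanish text when a compiled catalog for that language exists, and the original string otherwise.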

Playlists
1. Users can have unlimited playlists and save unlimited audio files in each
2. Users can edit playlist meta-data
3. Users can edit saved audio files

Admin features

Real-time graph visualization (supports ArangoDB dashboard)
Real-time indexed audio file visualization (supports Kibana dashboard)
* Kibana is a data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.

Other

Extremely fast
Documentation is provided in the code (docstrings)
Handles database related exceptions
Multi-threaded search (searches multiple requests asynchronously)
Handles RTL texts perfectly
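
Serving several search requests at once, as mentioned above, can be sketched with `asyncio`. The source does not state whether TASE relies on threads, asyncio tasks, or both, so this shows only one possible approach; `search_one` is a stand-in that sleeps instead of querying a real backend.

```python
import asyncio


async def search_one(query: str) -> list[str]:
    """Stand-in for a real search call; sleeps briefly to simulate I/O."""
    await asyncio.sleep(0.01)
    return [f"{query} (result)"]


async def search_many(queries: list[str]) -> dict[str, list[str]]:
    """Run all searches concurrently and map each query to its hits."""
    results = await asyncio.gather(*(search_one(q) for q in queries))
    return dict(zip(queries, results))
```

Running `asyncio.run(search_many(["first", "second"]))` waits for both searches together rather than one after the other.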

Technology stack

The main tools and technologies used in developing TASE are as follows:

Elasticsearch
ArangoDB
Pyrogram
Python gettext
Celery
RabbitMQ
Redis
Pydantic
Jinja

Call for Contributions

We welcome your expertise and enthusiasm!

Ways to contribute to Telegram audio search engine:

Writing code
Reviewing pull requests
Developing tutorials, presentations, documentation, and other educational materials
Translating documentation and readme contents

We love your contributions and do our best to provide you with mentorship and support. If you are looking for an issue to tackle, take a look at issues.

Issues

If you encounter any issue in the code, please report it on the issue tracker. Better still, fork the repository on GitHub and/or create a pull request.

Future work

Voice search
Add artist support

If you found it helpful, please give us a ⭐

License

TASE is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Copyright © 2020-2022

Soran Ghaderi (soran.gdr.cs@gmail.com)
- Personal website: soran-ghaderi.github.io
- Linkedin: Soran-Ghaderi
- Twitter: SoranGhadri
Taleb Zarhesh (taleb.zarhesh@gmail.com)
- Linkedin: Taleb Zarhesh
- Twitter: Taleb Zarhesh
