README

Motivation

I am faszinated by radioWissen podcasts from BR2. They provide a wide range of topics: History, literature, music, mythology, religions, philosophy, psychology, nature, biology, economy and politics. Currently (2019-10-14) there are 2068 podcasts online. Therefore it is sometimes hard to find podcasts that I want to listen to based on quite specific query. Since I like the idea of embeddings to represent content, I have written a function to inspire me what I can listen to next.

The embeddings are based on feed-forward Neural-Net Language Models with 128-dimensional embedding vectors from tfhub.dev.

Usage of radioWissen search function (radioWissen usage)

The idea is to get an inspiration which podcasts -- with similar content -- are available.

Find top_n closest podcasts to search_string.

$ python quick_search_radiowissen.py --search_string="Klassizismus" --top_n=15
Searching for best matches...
                                                    Klassizismus
Gottfried Semper - Architekt und Revolutionär           0.731763
Hieronymus Bosch - Im Lustgarten der Dämonen            0.691967
Humanismus, Renaissance, Reformation - Aufbruch...      0.679882
Béla Bartók- Meister der Moderne                        0.666955
Henri de Toulouse-Lautrec - Genie der Plakatkunst       0.666286
Igor Strawinsky - Der Schock der Moderne                0.662641
Architekt und Bauhausgründer - Walter Gropius           0.661809
Jugendstil - Natur als Kunst, Schönheit als Rev...      0.657266
Ludwig II. - Der Mondkönig                              0.654857
Merkantilismus - Zölle für den Adelsprunk               0.649388
Versailles - Musik am Hof des Sonnenkönigs              0.649136
Renaissance-Gärten - Kunstwerk aus Himmel, Erde...      0.648772
Herrlichkeit und Katzenjammer - Die Epoche der ...      0.648647
Der Englische Garten - Münchens grünes Herz             0.646806
Das frühe Byzanz - Ein christliches Imperium im...      0.645438

Most of the results are related to architecture or at least art.

Demo (general usage)

Use the code for your own data.

Find clostest match from list of sentences (or list of single words) to a search query.

1. Imports

#Suppress tensorflow infos, deprecation warnings
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
tf.logging.set_verbosity(1000)
tf.__version__ # Tested with '1.13.1', does not work with 2.x
# 
import embeddings as es

2. Get Embeddings

Embed sentences or a list of words.

e = es.TfHubEmbedder()
data = ["Auto", "Bus", "Caesar"]
e.embed(data)
print("Embeddings' shape: {}".format(e.embeddings.shape))
e.save("data/demo_emb.npy", "data/demo_titles.npy")
e.load("data/demo_emb.npy", "data/demo_titles.npy")
> Embeddings' shape: (3, 128)

3. Find ClosestEmbedding

Compare embeddings from (2) with your search_string.

ce = es.ClosestEmbedding(e.embeddings, e.titles)
sim_df = ce.find_closest(search_string="VW")
ce.plot_result()
print(sim_df)
>               VW
> Auto    0.720517
> Bus     0.602985
> Caesar  0.468566

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
README.md		README.md
embeddings.py		embeddings.py
quick_search_radiowissen.py		quick_search_radiowissen.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

README.md

README.md

embeddings.py

embeddings.py

quick_search_radiowissen.py

quick_search_radiowissen.py

requirements.txt

requirements.txt

Repository files navigation

README

Usage of radioWissen search function (radioWissen usage)

Demo (general usage)

1. Imports

2. Get Embeddings

3. Find ClosestEmbedding

About

Releases

Packages

Languages

phiyodr/embedding_search

Folders and files

Latest commit

History

Repository files navigation

README

Usage of radioWissen search function (radioWissen usage)

Demo (general usage)

1. Imports

2. Get Embeddings

3. Find ClosestEmbedding

About

Topics

Resources

Stars

Watchers

Forks

Languages