Semascope 👁️ - Text Visualization Tool

Semascope 👁️ is a JavaScript-based text visualization tool that utilizes Plotly JS to graphically represent relationships between words in a collection of texts. This tool provides an interactive 3D scatter plot where each point represents a word, and the axes correspond to different properties of the words.

Demo

Explore the tool in action at textvisualization.app.

Dependencies

Plotly JS: Ensure you load Plotly JS for the script to work correctly.
Corpus_utils: This tool relies on a text mining dataset in tab-separated format prepared by corpus_utils. Make sure to check out corpus_utils for a new statistical method of extracting keywords from a vast collection of texts.

Usage

Clone the repository.
Load Plotly JS by including the Plotly JS library.
Ensure you have the necessary data in tab-separated format, prepared using Corpus_utils.
Open the index.html file in a web browser.

Global Variables

maxClusterValue: A global variable to store the maximum cluster value.
parsedData: A variable to hold the parsed tab-separated dataset.

Functions

updatePlot(): Fetches data from a specified URL, applies user-defined filters, and updates the 3D scatter plot.
removeFilters(): Resets filter values to their defaults and updates the plot.
heros(), clue(): Sets predefined filter values for specific scenarios and updates the plot.
setMaxClusterValue(): Sets filter values to display the maximum cluster value and updates the plot.
setreduceDFvalue(): Sets filter values to reduce the document frequency and updates the plot.
searchAndUpdatePlot(): Updates the plot based on a user-inputted search word.

Filter Controls

The user can filter data based on parameters such as 'k', 'p', 'DF', 'fr', and 'cluster'.
Buttons like 'heros()', 'clue()', 'setMaxClusterValue()', 'setreduceDFvalue()' provide convenient ways to apply specific filters.

Plot Settings

The plot is a 3D scatter plot with adjustable width and height.
Various axes represent different properties like 'k', 'p', and 'Doc Frequency (DF)'.
Color and size of markers represent the cluster and frequency values, respectively.
The plot allows interactive exploration with hover labels displaying relevant information.

Customization

Feel free to customize the tool according to your dataset and requirements by modifying the filter controls, plot settings, and layout parameters in the JavaScript code.

Dataset

For the JS viewer to work, you need to input a dataset, which is a tab separated text file in such format, prepared with Corpus_utils.

The header parameters are explained on textvisualization.app. Here is the formatted table:

word	k	p	sqrt_kp	sum	DF	DFtoWF	cluster	pq
the	1.28857792385278	0.00566397049317553	1.28859037184145	4115	18	0.00437424058323208	2	0.00569623379330303
emperor	0.0276737248260626	0.0123267537320039	0.0302949484785517	40	2	0.05	7	0.0124805990023336
new	0.27368851562244	0.0948757865655114	0.289666736889203	47	8	0.170212765957447	5	0.104820736377724
clothes	0.608283594968966	0.274077127474438	0.667178539607658	29	10	0.344827586206897	4	0.377556814708007
many	0.993549684902053	0.226712673376116	1.01908763736899	61	14	0.229508196721311	3	0.293180381431992
years	0.333012627916599	0.222055107299219	0.400257268553098	21	7	0.333333333333333	4	0.285438093858181
ago	0.572215818866319	0.631898852497187	0.852482904900804	6	4	0.666666666666667	10	1.71664461462283
...

Explanation of the columns:

word: The word in the dataset.
k: Negative Binomial k-value related to the word.
p: Negative Binomial p-value related to the word.
sqrt_kp: Square root of the product of k and p.
sum: Sum of values related to the word.
DF: Document Frequency.
DFtoWF: Document Frequency to Word Frequency ratio.
cluster: Cluster value.
pq: Product of p and q.

index.html Overview

The index.html file serves as the entry point for the Semascope 👁️ Online Text Visualization App. This HTML file integrates the necessary styles, scripts, and components to create an interactive and visually engaging environment for exploring textual data.

Key Features:

Navigation: The file includes a navigation bar for easy access to different sections of the app.
Data Loading: It incorporates buttons to filter and load data from an external dataset (dataset/NBD1597.tab).
Visualization: Utilizes Plotly, a JavaScript graphing library, through the included plot.js script to create interactive visualizations of the loaded textual data.
User Interaction: Provides user-friendly features such as search functionality, data filtering, and progress indicators.
Help Section: Includes a descriptive section that guides users on how to interact with the visualization, interpret the data, and download the dataset.

index.html file is provided here for illustration purposes and may need customization based on your specific use case or integration requirements.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
dataset		dataset
README.md		README.md
index.html		index.html
package.json		package.json
plot.js		plot.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset

dataset

README.md

README.md

index.html

index.html

package.json

package.json

plot.js

plot.js

Repository files navigation

Semascope 👁️ - Text Visualization Tool

Demo

Dependencies

Usage

Global Variables

Functions

Filter Controls

Plot Settings

Customization

Dataset

index.html Overview

About

Languages

roverbird/plotly_semascope

Folders and files

Latest commit

History

Repository files navigation

Semascope 👁️ - Text Visualization Tool

Demo

Dependencies

Usage

Global Variables

Functions

Filter Controls

Plot Settings

Customization

Dataset

index.html Overview

About

Topics

Resources

Stars

Watchers

Forks

Languages