maitree7/Tales_from_the_Cryptos_NLP

Tales from the Crypto - Natural Language Processing

Stock Sentiment

Background

With all the recent hype in the news about cryptocurrency, we want to take stock, so to speak, of the latest news headlines regarding Bitcoin and Ethereum to get a better feel for the current public sentiment around each coin.

This project uses fundamental NLP techniques to gauge the sentiment of the latest news articles featuring Bitcoin and Ethereum. It also looks at other factors that may relate to coin prices, such as common words and phrases and the organizations and entities mentioned in the articles.

Packages Used:


1. Sentiment Analysis

  • VADER sentiment analysis (from NLTK)

        # The VADER lexicon must be downloaded once: nltk.download('vader_lexicon')
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()


2. Natural Language Processing


3. Named Entity Recognition


Files

Starter Notebook


btc_eth_analysis

Sentiment Analysis


1. Use newsapi to pull the latest news articles for Bitcoin and Ethereum

        btc_articles = newsapi.get_everything(q='bitcoin', language='en', sort_by='relevancy')


2. Creation of Dataframe of Sentiment Scores for each coin

Bitcoin Ethereum


3. Descriptive statistics

Bitcoin Ethereum
  • Which coin had the highest mean positive score?

       Bitcoin: 0.07
  • Which coin had the highest negative score?

       Ethereum: 0.025
  • Which coin had the highest positive score?

       Ethereum: 0.9198
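
The sentiment steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `sentiment_rows` and `summarize` are hypothetical helper names, the analyzer is passed in (any object with a VADER-style `polarity_scores` method), and each article is assumed to be a dict with a `content` key, as in the `articles` list returned by `get_everything`.

```python
from statistics import mean


def sentiment_rows(articles, analyzer):
    """Score each article's text and return one dict of scores per article."""
    rows = []
    for article in articles:
        text = article.get("content") or ""
        if not text:
            continue  # skip articles that have no body text
        scores = analyzer.polarity_scores(text)
        rows.append({
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "neutral": scores["neu"],
            "negative": scores["neg"],
        })
    return rows


def summarize(rows, column):
    """Return the mean and the maximum of one score column."""
    values = [row[column] for row in rows]
    return {"mean": mean(values), "max": max(values)}
```

With the real analyzer this would be called as `sentiment_rows(btc_articles["articles"], analyzer)`. Passing the resulting list to `pandas.DataFrame` and calling `.describe()` should give the same kind of descriptive statistics in one step.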

Natural Language Processing


1. Import the following libraries (NLTK, plus `string` and `re` from the standard library):

```python
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re
```


2. Use NLTK and Python to tokenize the text for each coin

  • Remove punctuation
        regex = re.compile("[^a-zA-Z0-9 ]")
        re_clean = regex.sub('', text)
  • Lowercase each word
        words = word_tokenize(re_clean.lower())
  • Remove stop words
        sw = set(stopwords.words('english'))
        words = [word for word in words if word not in sw]
  • Lemmatize words into root words
        lemmatizer = WordNetLemmatizer()
        lem = [lemmatizer.lemmatize(word) for word in words]
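
The four steps above can be combined into a single function. The sketch below is a dependency-free approximation rather than the notebook's code: it splits on whitespace instead of calling `word_tokenize`, uses a tiny illustrative stop-word set (`DEMO_STOP_WORDS`, an assumption) in place of `stopwords.words('english')`, and takes the lemmatizer as an optional callable so that `WordNetLemmatizer().lemmatize` can be passed in.

```python
import re

# Tiny illustrative stop-word set (an assumption for this sketch); the steps
# above use set(stopwords.words('english')) from NLTK instead.
DEMO_STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def tokenizer(text, stop_words=DEMO_STOP_WORDS, lemmatize=None):
    """Clean, lowercase, split, drop stop words, then optionally lemmatize."""
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", text)  # remove punctuation
    words = cleaned.lower().split()               # lowercase, split on whitespace
    words = [w for w in words if w not in stop_words]
    if lemmatize is not None:
        words = [lemmatize(w) for w in words]
    return words
```

Applying a function like this to each article's text is one way to produce the per-article `tokens` column that the n-gram step reads.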


3. Look at the ngrams and word frequency for each coin

  • Use NLTK to produce the ngrams for N = 2

        from collections import Counter
        from nltk.util import ngrams

        def get_token(df):
            """Flatten the per-article token lists into a single list."""
            tokens = []
            for i in df['tokens']:
                tokens.extend(i)
            return tokens

        btc_tokens = get_token(btc_sentiment_df)
        eth_tokens = get_token(eth_sentiment_df)

        # Generate the Bitcoin N-grams where N=2
        def bigram_counter(tokens, N):
            words_count = dict(Counter(ngrams(tokens, n=N)))
            return words_count

        bigram_btc = bigram_counter(btc_tokens, 2)
  • List the top 10 words for each coin

        # Use the token_count function to generate the top 10 words from each coin
        def token_count(tokens, N=10):
            """Returns the top N tokens from the frequency count"""
            return Counter(tokens).most_common(N)
Bitcoin Ethereum
  • Generate word clouds for each coin to summarize the news for each coin.

        from wordcloud import WordCloud
        import matplotlib.pyplot as plt
        import matplotlib as mpl
        # On matplotlib 3.6+ this style is named 'seaborn-v0_8-whitegrid'
        plt.style.use('seaborn-whitegrid')
        mpl.rcParams['figure.figsize'] = [20.0, 10.0]

btc-word-cloud.png

eth-word-cloud.png
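
As a quick sanity check on the bigram and top-word counts in this section, the same logic can be run on a toy token list using only the standard library. The `bigrams` helper and the sample tokens are made up for illustration. Zipping a list with itself shifted by one yields the same pairs that `ngrams(tokens, n=2)` produces.

```python
from collections import Counter


def bigrams(tokens):
    """Pair each token with its successor, as ngrams(tokens, n=2) would."""
    return list(zip(tokens, tokens[1:]))


# Made-up tokens standing in for the flattened per-coin token list.
toy_tokens = ["bitcoin", "price", "rises", "as", "bitcoin", "price", "recovers"]

bigram_counts = Counter(bigrams(toy_tokens))     # frequency of each word pair
top_words = Counter(toy_tokens).most_common(2)   # two most frequent tokens

print(bigram_counts[("bitcoin", "price")])  # 2
print(top_words)                            # [('bitcoin', 2), ('price', 2)]
```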

Named Entity Recognition


1. Import spaCy and displacy

        import spacy
        from spacy import displacy

        # Load the spaCy model
        nlp = spacy.load('en_core_web_sm')

2. Build a named entity recognition model for both coins

        # Run the NER processor on all of the text
        doc = nlp(btc_content)

        # Add a title to the document
        doc.user_data["title"] = "BITCOIN NER"

3. Visualize the tags using spaCy

        displacy.render(doc, style='ent')

btc-ner.png

eth-ner.png


4. List all Entities

        for ent in doc.ents:
            print('{} {}'.format(ent.text, ent.label_))

Bitcoin Ethereum
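
Printing every entity can produce a long list, so a grouped summary may be easier to compare across the two coins. The helper below is a hypothetical addition, not part of the notebook: it accepts any iterable of objects with `text` and `label_` attributes (such as `doc.ents`) and tallies them by label.

```python
from collections import Counter, defaultdict


def summarize_entities(ents):
    """Count entities per label and collect the distinct texts under each label."""
    ents = list(ents)  # allow generators as well as tuples such as doc.ents
    label_counts = Counter(ent.label_ for ent in ents)
    texts_by_label = defaultdict(set)
    for ent in ents:
        texts_by_label[ent.label_].add(ent.text)
    return label_counts, dict(texts_by_label)
```

Calling `summarize_entities(doc.ents)` on the Bitcoin and Ethereum documents in turn would show, for example, how many `ORG` or `MONEY` entities each set of articles mentions.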