Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker revamp #19

Open
wants to merge 58 commits into
base: master
Choose a base branch
from
Open

Docker revamp #19

wants to merge 58 commits into from

Conversation

heyqule
Copy link

@heyqule heyqule commented Sep 24, 2019

Hey Chris

Here are the list of major changes. Let me know if there are things that needed to be change.

major changes

  • Dockerized the system. CLI scripts are retired.
    • All settings are fetched from src/config.yml.
  • Replaced ElasticSearch 5.6 with ElasticSearch 7.3.
  • Added Redis for caching.
  • Automated requirements installation and kibana dashboard setup.
  • Converted original scripts to modules and classes to simplify the process of building new extensions
  • Data mapping have changed.
    • Each Symbol has it's own set of index. One for sentiment and one for price.
    • See src/Stocksight/EsMap for mapping details
  • Sentiment and price crawlers are spawned concurrently based on your specified stock symbols.
  • Improved memory footprint by spawning python instances when it's needed.

added

  • Added SeekingAlpha crawler
  • Added integration test cases
  • Added support for generating random proxy and random user-agent.
    • may not be useful for sophisticated blockers.

current issues:

  • SeekingAlpha blocks frequent accesses with 403. Follow_link is disabled for it.
  • It doesn't have data upgrade migration.


while True:

if self.isNotLive(eastern_timezone):
#logger.info("Stock market is not live. Current time: %s" % datetime.datetime.now(timezone).strftime("%Y-%m-%d %H:%M"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentTime = datetime.datetime.now(timezone).strftime("%Y-%m-%d %H:%M") logger.info(f"Stock market is not live. Current time: {currentTime}")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's from an outdated commit.

logger.info("Grabbing stock data for symbol %s..." % symbol)

try:

# add stock symbol to url
url = re.sub("SYMBOL", symbol, url)
url = regex.sub("SYMBOL", symbol, url)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice touch, improves readability.

@shaggy63
Copy link

shaggy63 commented Oct 8, 2019

Line 61 of src/StockSight/StockPriceListener.py
You have D = []
which produces "list indices must be integers or slices, not str"
Shouldn't line 61 be
D = {}

@shaggy63
Copy link

Tweets arn't showing in Kibana.
Shouldn't line 145 in src/StockSight/TweetListener.py
"_id": redis_id,
be
"msg_id": redis_id,

Copy link

@caseyjkey caseyjkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review considering the amount of commits for the single pull request. Consider creating feature branches more frequently so Chris can approve them faster. Thank you for dockerizing.

parser.add_argument("-f", "--frequency", metavar="FREQUENCY", default=600, type=int,
help="How often in seconds to retrieve stock data (default: 120 sec)")
parser.add_argument("-f", "--frequency", metavar="FREQUENCY", default=price_frequency, type=int,
help="How often in seconds to retrieve stock data (default: %d sec)" % price_frequency)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

help=f"How often in seconds to retrieve stock data (default: {price_frequency} sec)"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is no longer applicable in the latest commit.

src/sentiment.py Outdated
parser.add_argument("--frequency", metavar="FREQUENCY", default=3600, type=int,
help="How often in seconds to retrieve news headlines (default: 3600 sec)")
parser.add_argument("--frequency", metavar="FREQUENCY", default=sentiment_frequency, type=int,
help="How often in seconds to retrieve news headlines (default: %d sec)" % sentiment_frequency)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using f string instead of old school variable replacement, like so: f"How often in seconds to retrieve news headlines (default: {sentiment_frequency} sec)"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is no longer applicable in the latest commit.

src/sentiment.py Outdated

self.headlines = new_headline

if len(self.followedlinks) > self.max_cache:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the loop be replaced with a array slice like: self.headlines = self.headlines[self.max_cache / 2:]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not applicable in the latest commit.

today = datetime.datetime.now(eastern_timezone)
logger.info("Stock market is not live. Current time: %s" % today.strftime('%H'))
logger.info("Stock market is not live. Current time: %s" % today.strftime("%Y-%m-%d %H:%M"))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.info(f"Stock market is not live. Current time { today.strftime('%Y-%m-%d %H:%M') }")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about this lol.

@@ -1,4 +1,4 @@
FROM python:3
FROM python:3-alpine

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why alpine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small image foodprint. Should I use the regular container? Regular one is probably easier to add new features.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to you. I'm reviewing to learn more about the project and also about Docker. I haven't dockerized anything before.

src/delindex.py Outdated
from Initializer.LoggerInit import *
from Initializer.ElasticSearchInit import es
from Sentiment.Initializer.ElasticSearchInit import es
from Sentiment.Initializer.LoggerInit import *

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for alphabetizing. Good clean code principles.

@@ -11,21 +11,14 @@
"""

import argparse
import json

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these removed because they are imported indirectly through NewsHeadlineListener? Looks a lot cleaner without all the imports.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya.. They imported through NewsHeadlineListener. It's suggested to remove by PyCharm.

- cluster.name=docker-cluster
- cluster.name=elasticsearch
- node.name=stockdata
- cluster.initial_master_nodes=stockdata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Author

@heyqule heyqule Oct 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It set the cluster name and node name. ElasticSearch wanted me to specify it.



if self.symbol in nltk_tokens_required:
nltk_tokens = nltk_tokens_required[self.symbol]
else:
nltk_tokens = nltk_tokens_required['default']

# check required tokens from config
tokenspass = False

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement in readability.

@@ -122,6 +119,7 @@

# create instance of NewsHeadlineListener
newslistener = NewsHeadlineListener(symbol, url)
time.sleep(random.randrange(2,5))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the addition of a random sleep interval?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want it to have a consistent request pattern.

@heyqule
Copy link
Author

heyqule commented Oct 17, 2019

Partial review considering the amount of commits for the single pull request. Consider creating feature branches more frequently so Chris can approve them faster. Thank you for dockerizing.

@caseykey

Thanks for the review. However, it seems you are reviewing the older commits. There were a lot of changes and experiments on earlier commits since this is my first Python project.

This pull request was submitted after I have done a lot of changes and I asked Chris' whether he's interested on dockerizing it. I initially did this as a Python practice. Submitting the pull request wasn't planned till the later stage.

#stocksight is released under the Apache 2.0 license. See
#LICENSE for the full license text.
FROM python:3-alpine

Copy link

@aminya aminya Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required for regex

Suggested change
RUN apk add build-base

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants