Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop #982

Open
wants to merge 68 commits into
base: master
Choose a base branch
from
Open

Develop #982

wants to merge 68 commits into from

Conversation

lusi1990
Copy link

对 3.7,3.8, 3.9 进行适配
升级一些依赖库, 比如 flask
删除一些适配 python2.7 的代码
优化部分代码
调整了代码格式

Dockerfile Outdated
@@ -26,6 +26,8 @@ RUN pip install -r /opt/pyspider/requirements.txt

# add all repo
ADD ./ /opt/pyspider
# install seadaka requ
RUN pip install -r /opt/pyspider/requirements-seadaka.txt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't your package in base docker image, you could create another Dockerfile that base on this image while install your additional dependency.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that, I remove it.

Copy link
Owner

@binux binux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch. Overall looks good, just have some question regarding the behavior change in Elasticsearch. And how did you test it?

@@ -161,4 +171,4 @@ def dbcur(self):
db._update(db.__tablename__, "id = 1", age=16)
assert db._select(db.__tablename__, "name, age").next() == (None, 16)
db._delete(db.__tablename__, "id = 1")
assert [row for row in db._select(db.__tablename__)] == []
assert [row for row in db._select(db.__tablename__)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior changed from assert the table to be empty to not empty. Why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because empty list is False in python.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert 是 true 断言。原来的测试是说delete之后列表为空,改过之后变成列表非空了。

@@ -19,7 +20,7 @@ def __init__(self, hosts, index='pyspider'):
self.index = index
self.es = Elasticsearch(hosts=hosts)

self.es.indices.create(index=self.index, ignore=400)
self.es.indices.create(index=self.index)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did API change? I guess it's trying to ignore the IndexAlreadyExistsException error.
https://elasticsearch-py.readthedocs.io/en/7.x/api.html#ignore

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ElasticSearch 是根据 pylint 提示优化的, 这部分做的不好, 没有测试. 我想先停止这次合并请求. 我漏掉了很多测试.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我对 ES 的使用 不是很熟练. 哪怕是 ElasticSearch client 版本 7.10.0 和 7.17.0 之间也有很多区别. 我觉得直接升级到 最新的版本,并做好测试更好

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, upgrade to the latest version is good choice, just make sure it wouldn't raise error when the index has already created.

obj.update(kwargs)
obj['updatetime'] = time.time()
return self.es.update(index=self.index, doc_type=self.__type__,
body={'doc': obj}, id=name, refresh=True, ignore=404)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, refresh is to make sure the change to project are visible immediately. The updates on projectdb is expected to be low, that while used as database, it's exepcted for user to see the change immediately.

@@ -57,8 +59,7 @@ def get_all(self, fields=None):
yield record['_source']

def get(self, name, fields=None):
ret = self.es.get(index=self.index, doc_type=self.__type__, id=name,
_source_include=fields or [], ignore=404)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember, but doesn't the default behavior for ProjectDB is to return None when a project doesn't exists?


self.es.indices.create(index=self.index, ignore=400)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this cause IndexAlreadyExistsException error when the resultdb are already exists?

@@ -36,7 +36,7 @@ def projects(self):
ret = self.es.search(index=self.index, doc_type=self.__type__,
body={"aggs": {"projects": {
"terms": {"field": "project"}
}}}, _source=False)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the API changed to source, that this API is meant to return just project names, that the source doesn't matter.
https://elasticsearch-py.readthedocs.io/en/latest/api.html#elasticsearch.Elasticsearch.search

@@ -62,7 +62,7 @@ def select(self, project, fields=None, offset=0, limit=0):
else:
for record in self.es.search(index=self.index, doc_type=self.__type__,
body={'query': {'term': {'project': project}}},
_source_include=fields or [], from_=offset, size=limit,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@lusi1990 lusi1990 May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this. My bad.
It's still their in 7.10.0

setup.py Outdated
@@ -67,6 +67,8 @@
'Programming Language :: Python :: 3.5',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you drop support for 3.6 and lower?

放弃 3.6 以下版本

Copy link
Author

@lusi1990 lusi1990 May 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为 python2 已经停止维护, 3.4, .3.5 比较低的版本如果也维护会消耗大量的精力,而且这些版本使用的人应该比较少了.
很多依赖库也会慢慢放弃低版本的支持, 所以我想放弃低版本的维护.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我的意思是去掉setup设置的 3.5 支持。

@xyb
Copy link
Contributor

xyb commented Apr 16, 2023

@lusi1990 great work! lets roll this out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants