article.date_modify returns 'None' despite the article having a modified date #178

Open
3 of 6 tasks
Anacoder1 opened this issue Sep 26, 2020 · 3 comments

Comments

@Anacoder1

Mandatory

  • I read the documentation (readme and wiki).
  • I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.

Related issues:

  • add them here

Describe the bug
I have been trying to use the article.date_modify attribute to extract the modified date and time from different newspaper websites.
It returns None even when the site shows a modified date. This happened for every article URL I tried.

To Reproduce

!pip3 install news-please  # run on Google Colab
from newsplease import NewsPlease

url1 = 'https://www.thequint.com/news/law/supreme-court-article-370-jammu-and-kashmir-reorganisation-cases-hearing-govt-affidavit-rejoinder'
article = NewsPlease.from_url(url1)
print(article.date_modify)

# prints None

Expected behavior
I expected the code to return the datetime at which the article was modified, in this case 2019-11-14 19:40:00.

Log
Nothing to add here. I just tried the code as shown in the To Reproduce section.

Versions (please complete the following information):

  • Google Colab
  • Python Version 3.6.9
  • news-please Version 1.5.3

Intent (optional; we'll use this info to prioritize upcoming tasks to work on)

  • personal

  • academic

  • business

  • other

  • Some information on your project: Extracting modified date from newspaper articles

@fhamborg
Owner

Can you confirm date extraction works for you on the following URL? https://www.rt.com/news/203203-ukraine-russia-troops-border (also refer to https://github.com/fhamborg/news-please/blob/master/newsplease/examples/sample.json)

@IqbalLx

IqbalLx commented Dec 24, 2020

Can you confirm date extraction works for you on the following URL? https://www.rt.com/news/203203-ukraine-russia-troops-border (also refer to https://github.com/fhamborg/news-please/blob/master/newsplease/examples/sample.json)

But the sample.json does not contain date_modified either?

@IqbalLx

IqbalLx commented Dec 29, 2020

Hi! I got confused while exploring the main/core code, so my solution to this problem was to create a new pipeline dedicated to overriding the default date_modify. It uses the same concept as DateExtractor, but looks for dateModified in the application/ld+json tag.
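
For readers who want to try the same idea, here is a minimal sketch of the lookup described above. It is not the commenter's actual pipeline and not part of the news-please API: the names `extract_date_modified`, `_LdJsonCollector`, `_find_key`, and the sample HTML are all hypothetical. It assumes Python 3.7+ (for `datetime.fromisoformat`) and that the page exposes an ISO 8601 `dateModified` value inside a `<script type="application/ld+json">` tag, which not every site does.

```python
import json
from datetime import datetime
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def _find_key(node, key):
    """Depth-first search for the first non-null value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_key(child, key)
        if found is not None:
            return found
    return None


def extract_date_modified(html):
    """Return dateModified from JSON-LD as a datetime, or None if not found."""
    collector = _LdJsonCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        value = _find_key(data, "dateModified")
        if isinstance(value, str):
            # Older fromisoformat versions reject a trailing "Z", so normalize it.
            if value.endswith("Z"):
                value = value[:-1] + "+00:00"
            try:
                return datetime.fromisoformat(value)
            except ValueError:
                continue  # not an ISO 8601 string this parser understands
    return None


# Illustrative, made-up markup; real pages will differ.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "dateModified": "2019-11-14T19:40:00"}
</script>
</head><body></body></html>
"""

print(extract_date_modified(SAMPLE_HTML))  # prints 2019-11-14 19:40:00
```

The function only needs the raw page HTML, so it could be run on the downloaded markup and its result used in place of a None date_modify. How to wire it into a news-please pipeline is left out here, since that depends on the project's internals.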
