
Connection doesn't get rolled back when using the PostgresqlStorage pipeline #218

Open
4 of 7 tasks
flatplate opened this issue Sep 5, 2021 · 1 comment


flatplate commented Sep 5, 2021

Mandatory

  • I read the documentation (readme and wiki).
  • I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.
  • I confirm that this bug report does not report on a specific news site where news-please does not work. Please keep in mind that news-please is a generic crawler so it is expected that it will not work for all sites well or even at all.

Related issues:

Describe the bug

After an error occurs while inserting into or reading from the PostgreSQL database, psycopg2 requires the caller to call connection.rollback(). Otherwise every subsequent cursor call fails with the error current transaction is aborted, commands ignored until end of transaction block. In my case, the insert fails because the scraped article has no maintext, which the database schema defines as NOT NULL.
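The behaviour described above can be pictured as a tiny state machine: once a statement fails inside an open transaction, PostgreSQL rejects every later statement until the transaction is ended. The sketch below is a pure-Python toy model of that rule, so the effect of a missing rollback() can be seen without a database server. The classes ToyPgSession and TransactionAborted and the helper run() are hypothetical and not part of psycopg2; with a real psycopg2 (2.8+) connection the two failures should surface as psycopg2.errors.NotNullViolation and psycopg2.errors.InFailedSqlTransaction.

```python
class TransactionAborted(Exception):
    """Hypothetical stand-in for PostgreSQL's 'current transaction is aborted' error."""


class ToyPgSession:
    """Hypothetical toy model (not psycopg2) of PostgreSQL's rule that a
    failed statement aborts the whole open transaction."""

    def __init__(self):
        self.aborted = False

    def execute(self, maintext):
        if self.aborted:
            # Every statement is rejected until the transaction is ended.
            raise TransactionAborted(
                "current transaction is aborted, commands ignored "
                "until end of transaction block")
        if maintext is None:
            # The NOT NULL violation fails this statement and aborts the transaction.
            self.aborted = True
            raise ValueError('null value in column "maintext" violates not-null constraint')
        return "INSERT 0 1"

    def rollback(self):
        # Ending the transaction clears the aborted state.
        self.aborted = False


def run(session, texts, rollback_on_error):
    """Insert each text, recording the outcome; optionally roll back after a failure."""
    outcomes = []
    for text in texts:
        try:
            outcomes.append(session.execute(text))
        except (ValueError, TransactionAborted) as error:
            outcomes.append(type(error).__name__)
            if rollback_on_error:
                session.rollback()
    return outcomes


print(run(ToyPgSession(), [None, "article body"], rollback_on_error=False))
# ['ValueError', 'TransactionAborted']
print(run(ToyPgSession(), [None, "article body"], rollback_on_error=True))
# ['ValueError', 'INSERT 0 1']
```

Without the rollback, the one bad article poisons every later statement on the same connection, which matches the cascade in the log below.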

[newsplease.pipeline.pipelines:463|ERROR] Something went wrong in commit: null value in column "maintext" of relation "currentversions" violates not-null constraint

This also later causes cursor.fetchone() to throw a ProgrammingError.

[newsplease.pipeline.pipelines:420|ERROR] Something went wrong in query: current transaction is aborted, commands ignored until end of transaction block

[scrapy.core.scraper:249|ERROR] Error processing {'abs_local_path': 'C:\\Users\\ural_\\Projects\\news_comparison\\newsplease\\news-please-repo\\data\\2021\\09\\05\\theblaze.com\\news_poll-wyoming-gop-voters-liz-cheney_1630856850.html',
 'article_author': ['Chris Pandolfo'],
 'article_description': 'As House Republicans seem increasingly likely to '
                        'force conference chairwoman Rep. Liz Cheney (R-Wyo.) '
                        'out of leadership, a new poll of Wyoming Republicans '
                        'indicates primary voters are ready to toss Cheney out '
                        'of Congress. A WPA Intelligence poll commissioned by '
                        'the Club for Growth PAC, a grassroots ...',
 'article_image': 'https://assets.rebelmouse.io/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8yNjE0MzQ4MC9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTY2MDUwODkwNX0.9BuqT6dJD_6zJKNRjHc2JBWdrkyHAcOhKi7attLwAHY/img.jpg?width=1200&coordinates=0%2C82%2C0%2C89&height=600',
 'article_language': 'en',
 'article_publish_date': '2021-05-06 17:32:51',
 'article_text': 'As House Republicans seem increasingly likely to force '
                 'conference chairwoman Rep. Liz Cheney (R-Wyo.) out of '
                 'leadership, a new poll of Wyoming Republicans indicates '
                 'primary voters are ready to toss Cheney out of Congress.\n'
                 'A WPA Intelligence poll commissioned by the Club for Growth '
                 'PAC, a grassroots organization that supports candidates who '
                 'believe in limited government and economic freedom, found '
                 'that 52% of Republican voters in Wyoming will vote for '
                 'anyone but Cheney in the 2022 primary.\n'
                 "The beleaguered congresswoman's favorability is 36 points "
                 'under water, with just 29% of GOP voters having a favorable '
                 'view of Cheney and a whopping 65% viewing her unfavorably. '
                 'Only 14% of voters say they will vote to re-elect Cheney '
                 'regardless of who runs against her. Another 31% say they '
                 'will consider another candidate before making up their '
                 'mind.\n'
                 'These numbers paint a clear picture: Unless something '
                 'drastic and unforeseen happens, Liz Cheney will not be '
                 're-elected in 2022.\n'
                 "It's an astonishing fall for the daughter of former Vice "
                 'President Dick Cheney, who himself served for ten years in '
                 'the House of Representatives and once held the very '
                 'leadership position she seems likely to lose.\n'
                 'First elected to Congress in 2016, Liz Cheney was well '
                 'respected by the Republican establishment and seen as a '
                 'rising star in the party. Rush Limbaugh once called her '
                 '"Republican Party royalty" and praised her as a solid '
                 'conservative. After winning re-election in 2018, she was '
                 'elected to a leadership position as conference chair, the '
                 'No. 3 position for Republicans in the House and a role that '
                 "is largely responsible for the conference's messaging to "
                 'voters.\n'
                 'Cheney once may have had a bright future in the GOP. But she '
                 'is out of step with Republican voters on the key issue of '
                 'former President Donald Trump.\n'
                 'As she herself explained in a recent op-ed for the '
                 'Washington Post, Cheney holds Trump responsible for '
                 'provoking the violence at the U.S. Capitol on Jan. 6. She '
                 'thinks Trump is a liar who is undermining "confidence in the '
                 'result of elections and the rule of law" by continuing to '
                 'assert that the 2020 presidential election was fraudulent '
                 "and that Biden's win was illegitimate.\n"
                 'Her convictions led her to vote to impeach the former '
                 'president, which led Wyoming Republicans to officially '
                 'censure her and call for her resignation in response. '
                 "Nevertheless, Cheney has continued to be one of Trump's most "
                 'outspoken Republican critics.\n'
                 'But opposition to Trump has consequences in the modern '
                 'Republican Party. House Republicans recognize this, and it '
                 'is for this reason that Republican Minority Leader Kevin '
                 'McCarthy said this week he has "lost confidence" in her '
                 'ability to carry the GOP message in leadership.\n'
                 'The congresswoman most likely to succeed Cheney as '
                 'conference chair is Rep. Elise Stefanik (R-N.Y.) who was '
                 'endorsed for the position by Trump and is well liked in the '
                 'GOP conference. Stefanik is vocally supportive of Trump and '
                 'was one of several House Republicans to vote against '
                 'certifying the Electoral College results for several states '
                 'that President Joe Biden won.\n'
                 'Interestingly, the Club for Growth — which commissioned the '
                 "poll on Cheney's favorability — opposes Stefanik for "
                 "Republican leadership even though she has Trump's support.\n"
                 "According to the club's scorecard of members of Congress, "
                 'Stefanik is one of the most liberal Republicans in the GOP '
                 'conference. Though she supports Trump rhetorically, her '
                 'record in Congress was to vote against major pieces of the '
                 "president's agenda.\n"
                 'Stefanik voted for amnesty with citizenship for illegal '
                 'immigrants; voted against the 2017 Trump tax cuts; voted to '
                 "terminate Trump's emergency declaration at the border; and "
                 'joined 11 other Republicans to override funding for the '
                 'border wall. She supported the first version of the '
                 '"Equality Act" before voting against it after Biden became '
                 'president. Stefanik also voted with Democrats to force Trump '
                 'to stay in the Paris climate accord.',
 'article_title': 'Poll: More than half of Wyoming GOP primary voters will '
                  'vote for anyone but Liz Cheney',
 'download_date': '2021-09-05 15:47:30',
 'filename': 'news_poll-wyoming-gop-voters-liz-cheney_1630856850.html',
 'html_title': b'Poll: More than half of Wyoming GOP primary voters will vote'
               b' for anyone but Liz Cheney - TheBlaze',
 'local_path': 'C:\\Users\\ural_\\Projects\\news_comparison\\newsplease/news-please-repo//data/2021/09/05/theblaze.com/news_poll-wyoming-gop-voters-liz-cheney_1630856850.html',
 'modified_date': '2021-09-05 15:47:30',
 'rss_title': 'NULL',
 'source_domain': b'theblaze.com',
 'spider_response': <200 https://www.theblaze.com/news/poll-wyoming-gop-voters-liz-cheney>,
 'url': 'https://www.theblaze.com/news/poll-wyoming-gop-voters-liz-cheney'}
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\twisted\internet\defer.py", line 858, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "C:\Python38\lib\site-packages\scrapy\utils\defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "C:\Users\ural_\AppData\Roaming\Python\Python38\site-packages\newsplease\pipeline\pipelines.py", line 424, in process_item
    old_version = self.cursor.fetchone()
psycopg2.ProgrammingError: no results to fetch
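The final ProgrammingError is a follow-on failure: the lookup query at pipelines.py line 420 failed and was only logged, yet process_item still went on to call cursor.fetchone(), and a failed execute() leaves no result set to fetch. A minimal sketch of a guarded lookup follows. It uses the standard-library sqlite3 module as a stand-in, since it shares the DB-API 2.0 execute/fetchone/rollback interface with psycopg2; the function name fetch_old_version, the table columns and the error_cls parameter are hypothetical. With psycopg2 one would pass error_cls=psycopg2.DatabaseError and use %s placeholders instead of ?.

```python
import logging
import sqlite3

log = logging.getLogger("storage-sketch")


def fetch_old_version(conn, url, error_cls=sqlite3.DatabaseError):
    """Hypothetical lookup of an existing row by URL.

    Returns the row, or None if there is no match or the query failed.
    On failure the transaction is rolled back and fetchone() is skipped.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT id, url FROM currentversions WHERE url = ?", (url,))
        return cur.fetchone()
    except error_cls as error:
        log.error("Something went wrong in query: %s", error)
        conn.rollback()  # end the failed transaction so later statements can run
        return None
    finally:
        cur.close()


demo = sqlite3.connect(":memory:")
print(fetch_old_version(demo, "https://example.com/a"))  # table missing: logged, None
demo.execute("CREATE TABLE currentversions (id INTEGER PRIMARY KEY, url TEXT)")
demo.execute("INSERT INTO currentversions (url) VALUES (?)", ("https://example.com/a",))
demo.commit()
print(fetch_old_version(demo, "https://example.com/a"))  # (1, 'https://example.com/a')
```

Returning None lets the caller treat a failed lookup the same way as "no previous version" or skip the item, instead of crashing on fetchone().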

To Reproduce

  • Use PostgresqlStorage as part of the pipeline.
  • Scrape an article with an empty maintext.
  • All subsequent database queries made by the same crawler will fail.

Expected behavior
The failed transaction should be rolled back, so that subsequent database calls are not affected by the error.

Log
errorlog.txt

Versions (please complete the following information):

  • OS: Windows 10
  • Python version: 3.8.3
  • news-please version: 1.5.21

Intent (optional; we'll use this info to prioritize upcoming tasks to work on)

  • personal
  • academic
  • business
  • other
  • Some information on your project:
@flatplate (author)

I found that adding rollback() calls to the except blocks in PostgresqlStorage.process_item solves the problem. Example:

try:
    self.cursor.execute(self.insert_current, current_version_list)
    self.conn.commit()
    self.log.info("Article inserted into the database.")
except psycopg2.DatabaseError as error:
    self.log.error("Something went wrong in commit: %s", error)
    self.conn.rollback()  # New: end the aborted transaction so later queries work
    return  # Shouldn't continue since this call failed
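The same rollback-and-stop pattern could be factored into one helper so every write path in process_item handles failures identically. A minimal sketch, again using sqlite3 as a stand-in for psycopg2 (with a real PostgreSQL connection, pass error_cls=psycopg2.DatabaseError and use %s placeholders); the helper name execute_and_commit and its True/False return convention are hypothetical. Note that SQLite does not put the transaction into an aborted state the way PostgreSQL does, so the sketch demonstrates the control flow only.

```python
import logging
import sqlite3

log = logging.getLogger("storage-sketch")


def execute_and_commit(conn, sql, params, error_cls=sqlite3.DatabaseError):
    """Hypothetical helper: run one write statement in its own transaction.

    Returns True if the statement was committed, or False if it failed and
    the transaction was rolled back (the caller should then stop processing).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        conn.commit()
        log.info("Article inserted into the database.")
        return True
    except error_cls as error:
        log.error("Something went wrong in commit: %s", error)
        conn.rollback()  # end the failed transaction so later statements succeed
        return False
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE currentversions (url TEXT, maintext TEXT NOT NULL)")
insert = "INSERT INTO currentversions (url, maintext) VALUES (?, ?)"
print(execute_and_commit(conn, insert, ("https://example.com/a", None)))    # False
print(execute_and_commit(conn, insert, ("https://example.com/b", "body")))  # True
```

With psycopg2 specifically, using the connection as a `with conn:` block gives similar semantics for the statements inside it: the transaction is committed if the block succeeds and rolled back if an exception escapes, while the connection itself stays open.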
