Bill 1900145 Parsing Error Due to Crawling Error #34

Open
hunkim opened this issue Dec 1, 2015 · 0 comments
hunkim commented Dec 1, 2015

I hit an error while running html2json, so I looked into what was going on:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent/greenlet.py", line 327, in run
    result = self._run(*self.args, **self.kwargs)
  File "/home/ubuntu/crawlers/bills/specific/html2json.py", line 242, in parse_page
    d = extract_specifics(assembly_id, bill_id, meta)
  File "/home/ubuntu/crawlers/bills/specific/html2json.py", line 166, in extract_specifics
    table       = utils.get_elems(page, X['spec_table'])[1]
IndexError: list index out of range
<Greenlet at 0x7f27e79417d0: parse_page(19, '1900145',        bill_id  status                            , u'./json/19')> failed with IndexError

It looks like an error occurred when sources/specifics/19/1900145.html was downloaded. The saved file contains this instead of the bill page:

^M
^M
^M
<SCRIPT LANGUAGE="javascript">^M
<!--^M
        function onLoad() {^M
                alert(document.all["MSG"].innerText);^M
        }^M
-->^M
</SCRIPT>^M
^M
^M
^M
<HTML>^M
<BODY ONLOAD="javascript:onLoad()">^M
        <TEXTAREA ID="MSG" STYLE="display:none">[SQLException] Code[24757] Msg[ORA-24757: Æ®·£Àè¼Ç ½Äº°ÀÚ°¡ Áߺ¹µÇ¾ú½À´Ï´Ù
ORA-02063: line°¡ ¼±ÇàµÊ (NALAW_LINK·Î ºÎÅÍ)
][µ¥ÀÌÅͺ£À̽º ¿À·ù]</TEXTAREA> ^M
</BODY>^M
</HTML>
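
The garbled message in the TEXTAREA looks like Korean text whose bytes were displayed with the wrong character set. A minimal sketch for reading it, assuming the server sent EUC-KR and the bytes were shown as Latin-1 (both codecs are guesses, and `fix_mojibake` is a hypothetical helper, not part of the crawler):

```python
def fix_mojibake(text, wrong='latin-1', right='euc-kr'):
    """Undo a wrong decode: turn the text back into bytes, then decode again.

    Both codec names are assumptions about how this page was garbled.
    """
    return text.encode(wrong).decode(right)


# First word of the ORA-24757 message, copied from the dump above.
garbled = 'Æ®·£Àè¼Ç'
fixed = fix_mojibake(garbled)
```

In Oracle's English message catalog, ORA-24757 is "duplicate transaction identifier" and ORA-02063 is "preceding line from" the named database link (here NALAW_LINK), so the failure appears to be on the server's database side rather than in the parser.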

What should we do in a case like this? The server returned an SQL exception, so I think the crawler needs a way to download the page again when this happens.
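
One possible shape for such a retry, as a rough sketch rather than the crawler's actual code: check the downloaded HTML for the error page before saving it, and fetch again a few times if it matches. The names `is_error_page`, `fetch_with_retry` and the `fetch` callable are hypothetical, and the markers are taken only from the single error page shown above, so other failure pages may look different.

```python
import time

# Assumption: a failed download carries the hidden MSG textarea with an
# "[SQLException]" / "ORA-" message, as in the 1900145.html dump.
ERROR_MARKERS = ('[SQLException]', 'ORA-')


def is_error_page(html):
    """Return True if the HTML looks like the server's error page."""
    return 'ID="MSG"' in html and any(m in html for m in ERROR_MARKERS)


def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-error page or retries run out.

    `fetch` is a hypothetical callable returning the page body as text.
    """
    for attempt in range(1, retries + 1):
        html = fetch(url)
        if not is_error_page(html):
            return html
        if attempt < retries:
            sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError(
        'still an error page after %d attempts: %s' % (retries, url))
```

Running the same `is_error_page` check in html2json before `extract_specifics` would also let the parser skip or report a bad file instead of failing with the IndexError above.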


e9t added the bug label Dec 3, 2015