Skip to content

FigureHook/figure_parser

Repository files navigation

Pypi Pypi CI Coverage

Figure Parser

A web parser focus on parsing product information on Japan ACG figure sites.

Supporting site list

Install

pip install figure_parser

Usage

from pprint import pprint

import requests as rq
from bs4 import BeautifulSoup

from figure_parser.exceptions import FigureParserException
from figure_parser.factories import GeneralBs4ProductFactory


factory = GeneralBs4ProductFactory.create_factory()

url = "https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html"
resp = rq.get(url)
try:
    product = factory.create_product(resp.url, BeautifulSoup(resp.content, 'lxml'))
    pprint(product.dict())
except FigureParserException as e:
    print(e)
{'adult': False,
 'category': 'フィギュア',
 'copyright': '© SUNBORN Network Technology Co., Ltd. © SUNBORN Japan Co., '
              'Ltd.',
 'distributer': 'グッドスマイルカンパニー',
 'images': ['https://images.goodsmile.info/cgm/images/product/20210521/11246/85011/large/346a0402da0a835b6969105e77c7bf7f.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85012/large/e1fb5ad64d58498477611082c7219759.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85013/large/cad59d379e0ac60b8d386eee93253502.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85014/large/4e4957b4783cc9b8cc6e6101aaf346b3.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85015/large/9bf879603be71259f2d673a84d1b3b2a.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85016/large/f464915a47d744441a0574e97016e8d0.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85017/large/f8ae4c2ebfb05d3b3c2c9a427d9dd9af.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85018/large/d03ebd90e1fd832e5909deba3c78432c.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85019/large/1a4421435c14c53857d5125c0f3da4aa.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85020/large/eede36da01b9ab86ba35a3e5f30a8394.jpg',
            'https://images.goodsmile.info/cgm/images/product/20210521/11246/85021/large/8afccdd56243497830857ec612374266.jpg'],
 'jan': None,
 'manufacturer': 'Phat!',
 'name': 'PA-15 高校胸キュン物語',
 'og_image': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
 'order_period': {'end': datetime.datetime(2021, 7, 7, 21, 0),
                  'start': datetime.datetime(2021, 5, 27, 12, 0)},
 'paintworks': ['緋色 (scarlet)'],
 'releaser': 'ファット・カンパニー',
 'releases': [{'announced_at': None,
               'price': 19800,
               'release_date': datetime.date(2022, 12, 1),
               'tax_including': True}],
 'rerelease': False,
 'scale': 7,
 'sculptors': ['Phat!'],
 'series': 'ドールズフロントライン',
 'size': 280,
 'thumbnail': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
 'url': 'https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html'}

Development

This project is using poetry as package manager.

Install dependencies

poetry install

Use virtualenv

poetry shell

Install pre-commit

pre-commit install

Generate new parser (the name should be in snake case)

python cli.py generate new_site

After generating the new site, the test data can be found here.

Run the test and coverage

tox
coverage combine
coverage report -m

Type check

mypy

Lint the code

isort -e .
black .

If you add or update dependencies

poetry export --without-hashes --dev -f requirements.txt --output requirements.txt

If you use Makefile, it provides several useful command.

clean-test-cache     Clean cache of test.
cov-report           Show the coverage of tests.
format               Format the code.
freeze               Export the requirements.txt file.
help                 Show this help message.
install              Install requirements of project.
lint                 Lint the code.
test                 Run the tests.
type-check           Type check with mypy.