Any documentation about using as REST service #299

Closed
mhf-ir opened this issue Jul 20, 2019 · 17 comments · Fixed by #647
Labels
enhancement Improvement on existing feature

Comments

@mhf-ir

mhf-ir commented Jul 20, 2019

How can doccano be used as a REST API service? Can I send raw text and have doccano respond with the parsed data?

icoxfog417 added the "enhancement" (Improvement on existing feature) and "good first issue" (Good for newcomers) labels on Jul 24, 2019
@icoxfog417
Contributor

Documentation of the API is now in progress. Please wait a little!

https://github.com/chakki-works/doccano/projects/3#card-22665439

@luispsantos

Any update for the next release with this? I would also like this feature for an annotation project I'm currently working on.

@rbagd

rbagd commented Oct 2, 2019

@luispsantos, @Hironsan gave a sample class in #6 to illustrate interaction with the doccano API. In my experience, it's quite easy to use and understand.

I noticed that due to recent changes authentication works through the API token now, so you only need to adapt the authentication scheme in that code and pass the authentication headers each time with your request.

import requests


class Client(object):
    def __init__(self, entrypoint, username=None, password=None):
        self.entrypoint = entrypoint
        self.client = requests.Session()

        api_token = self.get_api_token(username, password)
        self.auth_headers = {"Authorization": "Token {}".format(api_token)}

    def get_api_token(self, username, password):
        url = f"{self.entrypoint}/v1/auth-token"
        login = {"username": username, "password": password}
        response = self.client.post(url, json=login)
        return response.json()["token"]

    def fetch_projects(self):
        url = f"{self.entrypoint}/v1/projects"
        response = self.client.get(url, headers=self.auth_headers)
        return response

@luispsantos

@rbagd Thanks for the example class, it should be useful for others who stumble upon this issue on how to interact with Doccano at the API level (especially with the added token authentication on this example). I agree that the example given by @Hironsan is fairly intuitive, and in the meantime I managed to adapt it to my use case using token authentication 👍

@icoxfog417
Contributor

The following question was raised in #410.

Question 1: /v1/projects/{project_id}/docs endpoint

I managed to create the project and upload documents using this endpoint
https://github.com/chakki-works/doccano/blob/master/app/api/urls.py#L27-L28
but the metadata is not uploaded.
From what I see here https://github.com/chakki-works/doccano/blob/master/app/api/views.py#L125 the meta field is not supported by the API even though it's in the Serializer, which would explain why the metadata is not uploaded. Is my understanding correct?

Question 2: /v1/projects/{project_id}/docs/upload endpoint

I tried to upload documents using this endpoint
https://github.com/chakki-works/doccano/blob/master/app/api/urls.py#L37-L38

What is the type of the file argument here https://github.com/chakki-works/doccano/blob/master/app/api/views.py#L207?
I tried passing a JSON Lines file as a string, and I get {"detail":"Unsupported media type \"application/json\" in request."}

@louisguitton
Contributor

To follow up on my question above, here is some reproducible code.

The client I use

import io
import os

import requests


class Client(object):
    """
    Client code was inspired by: https://github.com/chakki-works/doccano/issues/6#issuecomment-489924577
    Endpoints can be found here: https://github.com/chakki-works/doccano/blob/master/app/api/urls.py
    """
    def __init__(self, entrypoint, username=None, password=None):
        self.entrypoint = entrypoint
        self.client = requests.Session()
        self._login(username, password)

    def _login(self, username, password):
        url = f"{self.entrypoint}/v1/auth-token"
        login = {"username": username, "password": password}
        response = self.client.post(url, json=login)
        api_token = response.json()["token"]
        self.client.headers.update({"Authorization": f"Token {api_token}"})

    def add_document(self, project_id, data):
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs'
        response = self.client.post(url, data=data)
        return response.json()

    def upload_data(self, project_id, file, file_format='csv'):
        """
        file is (?) of type file object (BytesIO or StringIO ...)
        """
        data = {
            'file': file,
            'format': file_format
        }
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs/upload'
        response = self.client.post(url, data=data)
        return response

Question 1

>>> c = Client(
    'https://labelling-onefootball.herokuapp.com', 
    os.environ.get('DOCANNO_ADMIN'), 
    os.environ.get('DOCANNO_PWD')
)
>>> doc = {'text': 'Mercato / PSG\xa0: ce qu’a dit Tuchel à Neymar va vous surprendre',
 'meta': {'article_id': 27551814,
  'published_at_str': '2019-10-08 08:23:35',
  'provider_name': 'InfoMercato',
  'article_link': 'https://consumer-web.onefootball.com/cms/fr/27551814'}}
>>> c.add_document(project_id, doc)
# meta fields are not uploaded
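
One client-side factor worth ruling out here is how requests encodes the payload: with data=, a dict is form-encoded and a nested dict collapses to its keys, whereas json= sends the whole structure as a JSON body. The sketch below only prepares the two requests offline to compare them. The URL is a placeholder, and whether the doccano endpoint stores meta when it receives JSON is not confirmed anywhere in this thread.

```python
import json

import requests

# Placeholder URL for illustration only; no request is actually sent.
URL = "http://localhost:8000/v1/projects/1/docs"

doc = {
    "text": "Mercato / PSG",
    "meta": {"article_id": 27551814, "provider_name": "InfoMercato"},
}

# data= form-encodes the payload; iterating the nested dict yields only its keys,
# so the meta values never reach the request body.
form_request = requests.Request("POST", URL, data=doc).prepare()

# json= serializes the whole structure and sets Content-Type: application/json.
json_request = requests.Request("POST", URL, json=doc).prepare()

print(form_request.headers["Content-Type"])
print(form_request.body)
print(json_request.headers["Content-Type"])
print(json.loads(json_request.body) == doc)
```

If the server accepts JSON for this endpoint, switching add_document from data=data to json=data would be the corresponding change on the client side.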

Question 2

>>> jsonline_dataset = '{"text":"Mercato \\/ PSG\\u00a0: ce qu\\u2019a dit Tuchel \\u00e0 Neymar va vous surprendre","meta":{"article_id":27551814,"published_at_str":"2019-10-08 08:23:35","provider_name":"InfoMercato","article_link":"https:\\/\\/consumer-web.onefootball.com\\/cms\\/fr\\/27551814"}}\n{"text":"Bingo Challenge : Le num\\u00e9ro \\u00ab\\u00a0Tres\\u00a0\\u00bb","meta":{"article_id":27577681,"published_at_str":"2019-10-10 10:30:01","provider_name":"Furia Liga","article_link":"https:\\/\\/consumer-web.onefootball.com\\/cms\\/fr\\/27577681"}}'
>>> file_obj = io.StringIO(jsonline_dataset)
>>> r = c.upload_data(4, file_obj, 'json')
# I get <iframe src="//www.herokucdn.com/error-pages/application-error.html"></iframe>
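
For reference, a JSON Lines payload like the one above can be built from Python dicts with the standard library rather than written by hand. This is a minimal sketch: the helper name to_jsonl_bytes is hypothetical, and wrapping the result in BytesIO is an assumption about what an upload call would need, not something the doccano API documents.

```python
import io
import json


def to_jsonl_bytes(records):
    """Serialize an iterable of dicts to UTF-8 JSON Lines (one object per line)."""
    lines = (json.dumps(record, ensure_ascii=False) for record in records)
    return "\n".join(lines).encode("utf-8")


records = [
    {"text": "Mercato / PSG", "meta": {"article_id": 27551814}},
    {"text": "Bingo Challenge", "meta": {"article_id": 27577681}},
]

# A binary file-like object holding the serialized dataset.
file_obj = io.BytesIO(to_jsonl_bytes(records))
print(file_obj.getvalue().decode("utf-8"))
```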

@afparsons

afparsons commented Nov 19, 2019

@louisguitton Were you ever able to find a solution? I'm using very similar code and unfortunately receiving the following response with /upload:

{'detail': 'Unsupported media type "application/x-www-form-urlencoded" in request.'}

Simple get requests for endpoints like /me work just fine.

@icoxfog417 You referenced documentation back in June; can I find this documentation somewhere?

@louisguitton
Contributor

@louisguitton Were you ever able to find a solution?

Nope, waiting for the documentation too. But they were quite busy dealing with frontend refactoring for v1.0.0 so I don't mind waiting.

@afparsons

afparsons commented Nov 19, 2019

@louisguitton

Small update: I think the request might need Content-Type multipart/form-data. I've tried with that and received:

{'detail': 'Empty content'}

So that is at least different. We'll see if that constitutes progress or not.


Update 2: I got a <Response [201]>:

requests.post(
        'http://<URL>/v1/projects/1/docs/upload',
        files={'file': ('d1.json', open('d1.json', 'rb'))},
        data={'file': ('d1.json', open('d1.json', 'rb')), 'format': 'json'},
        headers = {
            'Authorization': 'Token {token}'.format(token=token)
        }
    )

Although the response is positive, the document does not appear to have been added to the project.


Update 3: Confirmed working! This also works with a preauthorized requests.Session(), as shown below.

# this is part of the Client class

    def upload(
        self,
        project_id: str,
        file_format: str,
        file_name: str,
        file_path: str = './',
    ) -> requests.models.Response:
        """Upload a file to a project through the /docs/upload endpoint."""
        url = '{}/v1/projects/{}/docs/upload'.format(self.entrypoint, project_id)
        files = {'file': (file_name, open(os.path.join(file_path, file_name), 'rb'))}
        data = {'file': (file_name, open(os.path.join(file_path, file_name), 'rb')), 'format': file_format}
        return self.client.post(url, files=files, data=data)
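
What distinguishes this call from the earlier failed attempts is that files= makes requests send multipart/form-data. Below is a minimal sketch of the same request, prepared offline so the encoding can be inspected. The function name build_upload_request and the placeholder URL are hypothetical, and dropping the duplicate 'file' entry from data is an assumption that has not been checked against the doccano server.

```python
import io

import requests


def build_upload_request(url, file_name, file_obj, file_format):
    """Prepare (but do not send) a multipart upload request."""
    files = {"file": (file_name, file_obj)}  # sent as a file part
    data = {"format": file_format}           # sent as a plain form field
    return requests.Request("POST", url, files=files, data=data).prepare()


# In-memory stand-in for open('d1.json', 'rb').
payload = io.BytesIO(b'{"text": "hello"}\n')
prepared = build_upload_request(
    "http://localhost:8000/v1/projects/1/docs/upload", "d1.json", payload, "json"
)
print(prepared.headers["Content-Type"])
```

With a real file, opening it in a with block around the post call keeps the handle from being left open, which the version above does not do.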

@Hironsan
Member

Hironsan commented Nov 21, 2019

Plan

  • Use drf-yasg to generate API documentation.
  • Create doccano-client package to call APIs.

@afparsons

afparsons commented Nov 22, 2019

@louisguitton @Hironsan

I made a simple API wrapper for temporary use while the team works on building an official client.
https://github.com/afparsons/doccano_api_client

@luispsantos - I forgot to tag you but you were also interested at some point I think.

Hironsan added this to To do in v1.1.0 on Nov 22, 2019
Hironsan added this to To do in v1.0.3 on Dec 2, 2019
@jonhilgart22

I've noticed that the get_doc_download endpoint doesn't format the data using JSON formatting.

Any suggestions on implementing the JSONPainter class to return the same data structure from this endpoint?

@jbkoh

jbkoh commented Nov 5, 2020

Hi @Hironsan, where is the API doc available? Just an OpenAPI spec would be appreciated too. Thanks!

@meltedhead

I have deployed doccano to AWS and am now trying to use the client to upload data from an Airflow pipeline, but I can't seem to connect with doccano-client. I am following the instructions but I keep getting an error:

import os

from doccano_api_client import DoccanoClient

doccano_client = DoccanoClient(
    'https://doccano-dev.test.com/', 
    os.environ.get('DOCANNO_ADMIN'), 
    os.environ.get('DOCANNO_PWD')
)

I then get the error below

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-100-60f5cf7a0cc5> in <module>
      4     'https://doccano-dev.test.com/',
      5     os.environ.get('DOCANNO_ADMIN'),
----> 6     os.environ.get('DOCANNO_PWD')
      7 )

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in __init__(self, baseurl, username, password)
    107         self.baseurl = baseurl if baseurl[-1] == '/' else baseurl+'/'
    108         self.session = requests.Session()
--> 109         self._login(username, password)
    110 
    111     def _login(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in _login(self, username, password)
    125         url = 'v1/auth-token'
    126         auth = {'username': username, 'password': password}
--> 127         response = self.post(url, auth)
    128         token = response['token']
    129         self.session.headers.update(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/doccano_api_client/__init__.py in post(self, endpoint, data, json, files)
     62         request_url = urljoin(self.baseurl, endpoint)
     63         return self.session.post(
---> 64                 request_url, data=data, files=files, json=json).json()
     65 
     66     def delete(

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    523             parse_constant is None and object_pairs_hook is None
    524             and not use_decimal and not kw):
--> 525         return _default_decoder.decode(s)
    526     if cls is None:
    527         cls = JSONDecoder

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
    368         if _PY3 and isinstance(s, bytes):
    369             s = str(s, self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()
    372         if end != len(s):

~/.pyenv/versions/3.6.10/lib/python3.6/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
    398             elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 2 column 1 (char 1)
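
A JSONDecodeError raised from response.json() generally means the body was not JSON at all (for example an HTML error or login page); the thread does not establish what the server returned here. A small guard like the following, a sketch built around the hypothetical helper parse_json_body, would surface the status code and the start of the body instead of a bare decode error.

```python
import json


def parse_json_body(status_code, content_type, text):
    """Parse a response body as JSON, or raise an error showing what was returned."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        snippet = text.strip()[:80]
        raise ValueError(
            f"Expected JSON but got status {status_code}, "
            f"Content-Type {content_type!r}, body starting with {snippet!r}"
        ) from exc


# Simulated responses; with requests these values would come from
# response.status_code, response.headers.get("Content-Type") and response.text.
print(parse_json_body(200, "application/json", '{"token": "abc"}'))
try:
    parse_json_body(404, "text/html", "\n<html><body>Not Found</body></html>")
except ValueError as err:
    print(err)
```

Printing the status code and body of the /v1/auth-token response in this way would show whether the base URL, a proxy, or the credentials are the problem.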

@technologic27

Is anyone else having issues with the get_doc_download method? The data doesn't seem to be downloading.

@LighthouseInTheSea

Is anyone else having issues with the get_doc_download method? The data doesn't seem to be downloading.

Did you solve the problem?

@technologic27

technologic27 commented Feb 24, 2022 via email
