Add result_type support to csv geocoding #790

Open
tgrandje opened this issue Jul 20, 2023 · 2 comments


@tgrandje

Hello

The mass geocoding API (CSV) of the BAN doesn't accept filtering by result_type (which the individual API does).

I'm currently trying to enrich the city fields of datasets (see this Python project); as of today, the CSV geocoding API returns results such as these:

Submitted    | result_label                              | result_type
59 BOUCHAIN  | 59 Route de Bouchain 59490 Somain         | housenumber
59 LOOS      | 59 Rue de Loos 59000 Lille                | housenumber
62 ISBERGUES | 62 Rue d'Isbergues 62120 Aire-sur-la-Lys  | housenumber

As things stand, I'll have to fall back to one individual query per line, setting the parameter type=municipality. (For background, I'm trying to clean up malformed datasets that can easily run to thousands of lines.)

Would it be possible to support this parameter in the CSV API (as is already the case for the official and postal codes, citycode and postcode)?

@jdesboeufs
Member

Hello,

type is an accepted filter depending on the deployment. You have to set the filter the same way as citycode and postcode.
On api-adresse.data.gouv.fr it should work.
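
As a rough sketch of what "the same way as citycode and postcode" could look like: on the CSV endpoint those filters are given as the name of a CSV column holding the value for each row, not as a literal value. The helper below only builds the multipart fields. The function name, the `wanted_type` column and the file name are made up for illustration, and whether `type` is really honoured this way on a given deployment is an assumption, not something confirmed in this thread.

```python
import csv
import io


def build_csv_geocoding_payload(rows, fieldnames, query_columns, filters=None):
    """Build multipart fields for a POST to a /search/csv/ endpoint.

    rows: list of dicts, one per CSV line.
    fieldnames: CSV header, in order.
    query_columns: columns whose content should be geocoded.
    filters: maps a filter name ("citycode", "postcode", and by assumption
        "type") to the NAME of the CSV column holding its per-row value.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    csv_bytes = buffer.getvalue().encode("utf-8")

    # The CSV itself goes in the "data" part; everything else is a plain field.
    fields = [("data", ("input.csv", csv_bytes, "text/csv"))]
    fields += [("columns", (None, col)) for col in query_columns]
    for filter_name, column in (filters or {}).items():
        if column not in fieldnames:
            raise ValueError(f"unknown column for filter {filter_name!r}: {column}")
        fields.append((filter_name, (None, column)))
    return fields


samples = ["59 BOUCHAIN", "59 LOOS", "62 ISBERGUES"]
# Hypothetical extra column carrying the wanted result type for each row.
rows = [{"Sample": s, "wanted_type": "municipality"} for s in samples]
payload = build_csv_geocoding_payload(
    rows,
    fieldnames=["Sample", "wanted_type"],
    query_columns=["Sample"],
    filters={"type": "wanted_type"},
)
```

The resulting list can be passed as `files=payload` to `requests.post` against the CSV endpoint; the point of the sketch is only that the filter value travels inside the CSV and the form field names the column.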

@tgrandje
Author

tgrandje commented Oct 11, 2023

@jdesboeufs thanks for the reply.

It's been some time since I tested this, so I'm not certain that the API's behaviour is still the same (in fact, I can't reproduce the sample above, so I'm guessing something changed). That will teach me not to paste a sample without the related code...

In any case, the type filter on CSV geocoding does not seem to work the way you describe (either that, or I'm missing something). You can run the following Python code and see that, whatever the type, you get the same results.

import io

import pandas as pd
import requests

# Three ambiguous inputs: a department number followed by a city name.
df = pd.DataFrame(
    [["59 BOUCHAIN"], ["59 LOOS"], ["62 ISBERGUES"]], columns=["Sample"]
)
target_type = "street"

# Post the CSV as a multipart form; "type" is sent as a plain form field.
r = requests.post(
    "https://api-adresse.data.gouv.fr/search/csv/",
    files=[
        ("data", df.to_csv(index=False)),
        ("type", (None, target_type)),
    ],
)

# Read the returned CSV, keeping the city code as text.
ret = pd.read_csv(
    io.BytesIO(r.content),
    dtype={"result_citycode": str},
)
print(ret)

You'll get:

         Sample   latitude  longitude result_label  result_score  \
0   59 BOUCHAIN  50.280328   3.322325     Bouchain      0.669002   
1       59 LOOS  50.612320   3.023190         Loos      0.495183   
2  62 ISBERGUES  50.615551   2.451655    Isbergues      0.697388   

   result_score_next   result_type  result_id  result_housenumber result_name  \
0           0.531873  municipality      59092                 NaN    Bouchain   
1           0.331023  municipality      59360                 NaN        Loos   
2           0.643760  municipality      62473                 NaN   Isbergues   

   result_street  result_postcode result_city  \
0            NaN            59111    Bouchain   
1            NaN            59120        Loos   
2            NaN            62330   Isbergues   

                       result_context result_citycode  result_oldcitycode  \
0           59, Nord, Hauts-de-France           59092                 NaN   
1           59, Nord, Hauts-de-France           59360                 NaN   
2  62, Pas-de-Calais, Hauts-de-France           62473                 NaN   

   result_oldcity  result_district result_status  
0             NaN              NaN            ok  
1             NaN              NaN            ok  
2             NaN              NaN            ok  
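
A quick way to check whether the filter had any effect is to tally the result_type column of the returned CSV. This is only a sketch: the helper name is made up, and the sample text is a trimmed-down copy of the output above (three of its columns), not a fresh API response.

```python
import csv
import io
from collections import Counter


def count_result_types(csv_text, column="result_type"):
    """Count how often each value appears in the given column of a CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Trimmed-down copy of the output shown above.
sample_response = """Sample,result_label,result_type
59 BOUCHAIN,Bouchain,municipality
59 LOOS,Loos,municipality
62 ISBERGUES,Isbergues,municipality
"""

requested_type = "street"
counts = count_result_types(sample_response)
# The filter was applied only if no other type shows up in the results.
filter_applied = set(counts) <= {requested_type}
print(counts, filter_applied)
```

With a live response, `r.text` can be passed in place of `sample_response`.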

If I use individual requests instead (with geopandas this time, to process the geometries), I do:

import geopandas as gpd

# Reuses requests, df and target_type from the previous snippet.
ret2 = []
for x in df.values.flatten():
    # One GET per input line, with the type filter as a query parameter.
    r = requests.get(
        "https://api-adresse.data.gouv.fr/search/",
        params={
            "q": x,
            "type": target_type,
            "autocomplete": 0,
            "limit": 1,
        },
    ).json()
    features = r["features"]
    query = r["query"]
    for dict_ in features:
        # Keep the submitted query next to each returned feature.
        dict_["properties"].update({"full": query})
        ret2.append(dict_)
ret2 = gpd.GeoDataFrame.from_features(ret2)
print(ret2)

This time, the results are indeed filtered by street, and I get:

                   geometry                                     label  \
0  POINT (2.86821 50.41884)         Rue de Bouchain 62430 Sallaumines   
1  POINT (1.67268 50.72004)        Le Lot 62280 Saint-Martin-Boulogne   
2  POINT (2.55562 50.48585)  Rue d'Isbergues 62700 Bruay-la-Buissière   

      score            id             name postcode citycode          x  \
0  0.507115    62771_0240  Rue de Bouchain    62430    62771  690620.04   
1  0.249381  62758_5xb6c0           Le Lot    62280    62758  606100.36   
2  0.570345    62178_1450  Rue d'Isbergues    62700    62178  668414.78   

            y                   city                             context  \
0  7035706.55            Sallaumines  62, Pas-de-Calais, Hauts-de-France   
1  7070046.78  Saint-Martin-Boulogne  62, Pas-de-Calais, Hauts-de-France   
2  7043251.99     Bruay-la-Buissière  62, Pas-de-Calais, Hauts-de-France   

     type  importance           street          full oldcitycode  \
0  street     0.57826  Rue de Bouchain   59 BOUCHAIN         NaN   
1  street     0.52097           Le Lot       59 LOOS         NaN   
2  street     0.64879  Rue d'Isbergues  62 ISBERGUES       62178   

           oldcity  
0              NaN  
1              NaN  
2  Bruay-en-Artois  

It might very well be that I made a mistake in the code: the behaviour of type is not well documented (in fact, it is not even mentioned in the Swagger at the bottom of the page).

Can you reproduce this behaviour and confirm that this is a bug?
