Why do the number and x columns of '201105.shp' also become 0 in the output of this code? #261

Open
1jiangxd opened this issue Jan 12, 2024 · 1 comment


1jiangxd commented Jan 12, 2024

Can someone explain why the number and x columns of '201105.shp' also become 0 in the output of this code?

(Two shp files have been uploaded to my GitHub repository)
https://github.com/1jiangxd/daskgeopandasproblems

The code I used is below. When I check the processed '201105.shp', only the first 2 million rows were processed; the original values in the remaining rows were changed to 0.
Where is the problem in this code? I would greatly appreciate any help.

import geopandas as gpd
import time

import dask_geopandas

def process_row():
    outwen = r'201105.shp'
    bianjie = r'2023xian.shp'
    jiabianjie = r'E:\201105out'
    
    start_time3 = time.time()
    
    # Read input and clipped boundary shapefiles
    target_gdf = gpd.read_file(outwen)
    join_gdf = gpd.read_file(bianjie)
    
    # Switch to dask approach
    target_gdfnew = dask_geopandas.from_geopandas(target_gdf, npartitions=4)
       
    # Reproject the boundary participating in the join to match the CRS of the target geometry
    join_gdf = join_gdf.to_crs(target_gdf.crs)
    
    # Switch to dask approach
    join_gdfnew = dask_geopandas.from_geopandas(join_gdf, npartitions=4)
    
    # Use spatial join to find intersecting parts
    joined = gpd.sjoin(target_gdfnew, join_gdfnew, how='inner', predicate='intersects')
    
    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())
    
    # Save the result to the output boundary
    result.to_file(jiabianjie, encoding='utf-8-sig')  # Ensure the correct encoding is used
    
    end_time3 = time.time()
    execution_time3 = end_time3 - start_time3
    
    print(f"'{jiabianjie}' has added boundaries. Start time: {start_time3:.2f}, End time: {end_time3:.2f}, Execution time: {execution_time3:.2f} seconds")

process_row()

print('Finish')

@1jiangxd changed the title from "Can someone answer why the number and x columns of '1. shp' in the output of this code also become 0?" to "Can someone answer why the number and x columns of '201105. shp' in the output of this code also become 0?" on Jan 13, 2024
@jorisvandenbossche
Member

@1jiangxd apologies for the slow reply, but looking at your code, the following lines

    # Add attributes from 'bianjie' to 'outwen'
    joined = joined.drop(columns='index_right')  # Remove redundant index column
    result = target_gdfnew.merge(joined, how='left', on=target_gdfnew.columns.to_list())

are typically not needed. The result of the spatial join, joined, already has the columns of the original target_gdf, so this additional merge does nothing except bring back the original rows of target_gdf that had no match in the spatial join. To achieve the same result, you can do a left join directly (by specifying how='left' in the sjoin call).
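A minimal sketch of the left-join approach, using plain geopandas on small made-up data. The frames `points` and `counties` and their values are hypothetical stand-ins for '201105.shp' and '2023xian.shp'; only the column names number and x come from the issue.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for 201105.shp: three points with attribute columns.
points = gpd.GeoDataFrame(
    {"number": [1, 2, 3], "x": [0.5, 1.5, 5.0]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for 2023xian.shp: two boundary polygons.
counties = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# how="left" keeps every row of `points`. Rows with no intersecting polygon
# get missing values in the columns that come from `counties`.
joined = gpd.sjoin(points, counties, how="left", predicate="intersects")

# Drop the helper column that records the matching index of `counties`.
joined = joined.drop(columns="index_right")

print(joined[["number", "x", "name"]])
```

Here the third point lies outside both polygons, so it is kept with a missing `name`, and no follow-up merge is needed to recover it.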

Also, I assume that the gpd.sjoin in your code above should be dask_geopandas.sjoin?
