Issue reading in Zip-file #16

Open
BeishuizenTimKPMG opened this issue Mar 4, 2019 · 5 comments

@BeishuizenTimKPMG

Not all zip files can be used to create the sqlite file.

The Dutch public transport feed (data from www.openOV.nl) cannot be imported directly from the zip file, only after unzipping it.

This is not a major issue, but it is a minor annoyance for users. Maybe a good case study for bug testing?

@rmkujala
Member

rmkujala commented Mar 5, 2019

Hi @BeishuizenTimKPMG,

Can you provide a sample of your code together with the error message, so that we can understand the problem in more detail?

@BeishuizenTimKPMG
Author

The code is taken directly from the example in "gtfspy/examples/example_temporal_distance_profile.py". The error occurs in the following call:

import_gtfs.import_gtfs([imported_data_path],processed_data_path)

With imported_data_path pointing directly at the previously mentioned zip file from www.openOV.nl, this call fails to load the data.

The following is printed:

Beginning AgencyLoader
Importing agency.txt into agencies for
Indexing agencies
Post-import agency.txt into agencies
Beginning RouteLoader
Importing routes.txt into routes for
Indexing routes
Beginning MetadataLoader
Indexing metadata
Beginning CalendarLoader
calendar.txt missing in {'zipfile': '../data/raw/gtfs-nl.zip', 'zip_commonprefix': ''}
Indexing calendar
Beginning CalendarDatesLoader
Importing calendar_dates.txt into calendar_dates for
Beginning ShapeLoader
Importing shapes.txt into shapes for
Indexing shapes
Post-import shapes.txt into shapes
Beginning FeedInfoLoader
Importing feed_info.txt into feed_info for
Beginning StopLoader
Importing stops.txt into stops for
Indexing stops
Post-import stops.txt into stops
Beginning TransfersLoader
Not importing transfers.txt into transfers for
Beginning StopDistancesLoader
Post-import None into stop_distances
Calculating straight-line transfer distances
Copying information from transfers to stop_distances.
Beginning TripLoader
Importing trips.txt into trips for
Indexing trips
Beginning StopTimesLoader
Importing stop_times.txt into stop_times for

And the following error occurs:


AttributeError Traceback (most recent call last)
in
1 # Not needed to rerun, is for accessing data
----> 2 import_gtfs.import_gtfs([imported_data_path],processed_data_path)

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_gtfs.py in import_gtfs(gtfs_sources, output, preserve_connection, print_progress, location_name, **kwargs)
102
103 for loader in loaders:
--> 104 loader.import_(conn)
105
106 # Do any operations that require all tables present.

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in import_(self, conn)
355 # This does insertions
356 if self.mode in ('all', 'import') and self.fname and self.exists() and self.table not in ignore_tables:
--> 357 self.insert_data(conn)
358 # This makes indexes in the DB.
359 if self.mode in ('all', 'index') and hasattr(self, 'index'):

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in insert_data(self, conn)
295 from itertools import chain
296 rows = chain([row], self.gen_rows([csv_reader], [prefix]))
--> 297 cur.executemany(stmt, rows)
298 conn.commit()
299

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/stop_times_loader.py in gen_rows(self, readers, prefixes)
23 def gen_rows(self, readers, prefixes):
24 for reader, prefix in zip(readers, prefixes):
---> 25 for row in reader:
26 #print row
27 assert row['arrival_time'] != "", "Some stop_times entries is missing arrival time information."

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in (.0)
217 csv_reader_stripped = (dict((k, (v.strip() if v is not None else None)) # v is not always a string
218 for k, v in row.items())
--> 219 for row in csv_reader)
220 csv_reader_generators.append(csv_reader_stripped)
221 except TypeError as e:

~/Projects/GemeenteAmsterdam/gtfspy-master/gtfspy/import_loaders/table_loader.py in (.0)
216 # The following results in a generator, the complicated
217 csv_reader_stripped = (dict((k, (v.strip() if v is not None else None)) # v is not always a string
--> 218 for k, v in row.items())
219 for row in csv_reader)
220 csv_reader_generators.append(csv_reader_stripped)

AttributeError: 'list' object has no attribute 'strip'

As you can see, the error is a type mismatch: a list turns up where a string is expected. As I said before, importing works after unpacking the zip, but importing the zip directly does not.
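The workaround reported above (unzip first, then import) can be sketched as follows. This is a minimal sketch, not part of gtfspy: `extract_feed` and `import_unzipped` are hypothetical helper names, and it assumes that `import_gtfs.import_gtfs` accepts a directory path in its source list, as the report suggests.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_feed(zip_path, target_dir=None):
    """Extract a GTFS zip and return the directory holding the extracted files."""
    # Fall back to a fresh temporary directory when no target is given.
    target = Path(target_dir) if target_dir else Path(tempfile.mkdtemp(prefix="gtfs_"))
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def import_unzipped(zip_path, sqlite_path):
    """Hypothetical wrapper: unzip the feed, then hand the directory to gtfspy."""
    # Assumption: gtfspy is installed and import_gtfs accepts a directory path.
    from gtfspy import import_gtfs

    feed_dir = extract_feed(zip_path)
    import_gtfs.import_gtfs([str(feed_dir)], sqlite_path)
```

Calling `import_unzipped("../data/raw/gtfs-nl.zip", "gtfs-nl.sqlite")` would mirror the failing call, with the output file name chosen here only for illustration. It sidesteps the zip code path rather than fixing it.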

@evelyn9191
Contributor

I have the same problem when reading files for Prague transport:

/home/miska/PycharmProjects/prague_public_transport_app/venv/bin/python /home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py
Beginning AgencyLoader
Importing agency.txt into agencies for 
Indexing agencies
Post-import agency.txt into agencies
Beginning RouteLoader
Importing routes.txt into routes for 
Indexing routes
Beginning MetadataLoader
Indexing metadata
Beginning CalendarLoader
Importing calendar.txt into calendar for 
Indexing calendar
Beginning CalendarDatesLoader
Importing calendar_dates.txt into calendar_dates for 
Beginning ShapeLoader
Importing shapes.txt into shapes for 
Indexing shapes
Post-import shapes.txt into shapes
Beginning FeedInfoLoader
Importing feed_info.txt into feed_info for 
Beginning StopLoader
Importing stops.txt into stops for 
Traceback (most recent call last):
  File "/home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py", line 52, in <module>
    load_or_import_example_gtfs(verbose=True)
  File "/home/miska/PycharmProjects/prague_public_transport_app/search_stops/import_gtfs_data.py", line 20, in load_or_import_example_gtfs
    location_name="Prague")
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_gtfs.py", line 104, in import_gtfs
    loader.import_(conn)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 357, in import_
    self.insert_data(conn)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 297, in insert_data
    cur.executemany(stmt, rows)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/stop_loader.py", line 14, in gen_rows
    for row in reader:
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 219, in <genexpr>
    for row in csv_reader)
  File "/home/miska/PycharmProjects/prague_public_transport_app/venv/lib/python3.6/site-packages/gtfspy/import_loaders/table_loader.py", line 218, in <genexpr>
    for k, v in row.items())
AttributeError: 'list' object has no attribute 'strip'

Is there some workaround to this issue?

@rkdarst
Member

rkdarst commented Mar 2, 2020

Looks like one of the values in a CSV row is being turned into a list instead of a string. I guess the CSV reader has done something clever.
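One way a list can end up among the row values, shown here only as a possible explanation and not as a confirmed cause: Python's `csv.DictReader` collects any fields beyond the header columns into a list stored under the key `None`. The sample data below is made up for illustration.

```python
import csv
import io

# Hypothetical two-line file: the header has two columns, the data row has three.
sample = io.StringIO("stop_id,stop_name\n1,Central,unexpected\n")

row = next(csv.DictReader(sample))

# Fields beyond the header are gathered in a list under the default restkey, None.
extra = row[None]
print(extra)

message = None
try:
    # Same stripping pattern as in the traceback: None is skipped, but a list is not.
    cleaned = {k: (v.strip() if v is not None else None) for k, v in row.items()}
except AttributeError as error:
    message = str(error)
    print(message)
```

If this is what happens, the offending line has more delimiters than the header, which could also result from a quoting or newline problem rather than from genuinely extra columns.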

Can you look into stops.txt and see if anything looks odd there? Alternatively, use a debugger to find the bad line and value. That would help to understand what is going on...
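Without a debugger, the suspicious lines could be located with a small standalone script like the one below. It is a sketch under assumptions: `find_list_values` is a hypothetical helper, it reads the zip member with the standard library rather than through gtfspy's own loader (so it may not see exactly what gtfspy sees), and the `utf-8-sig` encoding is a guess.

```python
import csv
import io
import zipfile


def find_list_values(zip_path, member="stops.txt", encoding="utf-8-sig"):
    """Return (line_number, key, value) for each parsed value that is neither str nor None."""
    suspects = []
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # newline="" lets the csv module handle line endings itself.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            reader = csv.DictReader(text)
            for row in reader:
                for key, value in row.items():
                    if value is not None and not isinstance(value, str):
                        # line_num is the number of source lines read so far.
                        suspects.append((reader.line_num, key, value))
    return suspects
```

Running `find_list_values("../data/raw/gtfs-nl.zip", member="stop_times.txt")` would check the file that fails in the first traceback. An empty result would suggest the problem lies in how gtfspy opens the zip rather than in the data.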

Otherwise, can you give a link to the exact file you are using and the exact command line used?

@evelyn9191
Contributor

I can no longer reproduce this issue, as it was fixed by #24. At the time, I suspected that special characters were involved: a string containing them appeared to be split into several pieces (i.e. producing something like ["/ax instead of š).

I am using a GTFS zip file that can be downloaded here, and I ran the script with:
import_gtfs.import_gtfs(["..\\data\\traffic_source.zip"], "some.db", print_progress=verbose, location_name="Prague")
