Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column filing_id not found despite its manifest existence #90

Open
MichaelTiemannOSC opened this issue Jul 1, 2022 · 1 comment
Open

Comments

@MichaelTiemannOSC
Copy link

I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return pd.read_csv(
Loaded 18 dataframes
Traceback (most recent call last):
  File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
    df.to_sql(df.table_name, conn, if_exists="append", index=False)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
    return sql.to_sql(
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
    return pandas_sql.to_sql(
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
    return table.insert(chunksize, method)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
    conn.executemany(self.insert_statement(num_rows=1), data_list)
sqlite3.OperationalError: table ./company_filings has no column named filing_id

real	1m30.052s
user	1m12.763s
sys	0m16.640s

So the complaint is about company_filings.csv not having a filing_id. Yet:

michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv
filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url
47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm
104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm
121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm
152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL
174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL
186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt
213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm
214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm
462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm


Why is this column not being found?
@MichaelTiemannOSC
Copy link
Author

Manually setting low_memory to False in utils.py solves the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant