Performance improvement #20

vtoupet · 2021-07-22T15:56:02Z

I am using your library with pandas. Performance is not that good (it takes 1-2 seconds to process a full year).
The reasons for this are:

Operations are performed sequentially, although they could be partially vectorised.
Everything is decoded, even when only some of the fields are needed.

The way I see things:

Use pandas.read_fwf for the mandatory sections.
Use the apply method for the remaining part of the string (additional fields + remarks).
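As a rough sketch of that split, the snippet below reads the fixed-width part with `pandas.read_fwf` and falls back to `apply` only for the variable-length tail. The record layout, the sample records, and the `first_section` helper are all made up for illustration; the real offsets would have to come from the format specification.

```python
import io

import pandas as pd

# Synthetic fixed-width records. The layout is hypothetical:
# station (6 chars), date (8), time (4), signed temperature in tenths (5),
# then a variable-length tail.
RAW = (
    "725053202107221200+0215ADDAA1010000\n"
    "725053202107221300-0031ADDGF1990000\n"
)

# Half-open (start, end) offsets; None means "to the end of the line".
COLSPECS = [(0, 6), (6, 14), (14, 18), (18, 23), (23, None)]
NAMES = ["station", "date", "time", "temp_raw", "tail"]

# Read every column as text so leading zeros and signs survive.
df = pd.read_fwf(io.StringIO(RAW), colspecs=COLSPECS, names=NAMES,
                 header=None, dtype=str)

# Vectorised conversions on whole columns.
df["timestamp"] = pd.to_datetime(df["date"] + df["time"], format="%Y%m%d%H%M")
df["temp_c"] = df["temp_raw"].astype(int) / 10


def first_section(tail: str) -> str:
    """Hypothetical tail parser: return the 3-character code after 'ADD'."""
    return tail[3:6] if tail.startswith("ADD") else ""


# Row-wise apply only for the variable-length part.
df["first_section"] = df["tail"].apply(first_section)
```

Only the tail goes through a Python-level function per row; everything with a fixed position is handled column by column.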

Usually, you know what information you are trying to get (and probably not every field that is present).
The idea would be to provide a list of desired fields. Based on that list, we could perform only the necessary decoding and return a pandas DataFrame (or a list of records).
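One possible shape for such an interface, purely as a sketch: a registry maps each field name to its position and converter, and the decoder touches only the fields that were requested. `FIELD_SPECS`, `decode`, and the offsets are hypothetical names and values, not part of the existing library.

```python
import pandas as pd

# Hypothetical field registry: name -> (slice into the record, converter).
# The offsets are made up for illustration and are not the real record layout.
FIELD_SPECS = {
    "station": (slice(0, 6), str),
    "date": (slice(6, 14), str),
    "time": (slice(14, 18), str),
    "temp_c": (slice(18, 23), lambda raw: int(raw) / 10),
}


def decode(lines, fields):
    """Decode only the requested fields and return them as a DataFrame."""
    unknown = [name for name in fields if name not in FIELD_SPECS]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    specs = [(name, *FIELD_SPECS[name]) for name in fields]
    records = [
        {name: convert(line[span]) for name, span, convert in specs}
        for line in lines
    ]
    return pd.DataFrame.from_records(records, columns=list(fields))


lines = [
    "725053202107221200+0215ADDAA1010000",
    "725053202107221300-0031ADDGF1990000",
]
frame = decode(lines, ["station", "temp_c"])
```

Fields that are not in the list are never sliced or converted, which is where the saving would come from.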

That would increase speed a lot.

Would you be interested in this kind of change to your library?

Thanks,
Vincent

haydenth · 2021-07-22T17:24:49Z

I am 100% interested in this :)

haydenth · 2021-07-22T17:35:56Z

OK, I sat and thought about this for a few minutes.

Does vectorized == parallelized or threaded?
I love the idea of requesting only specific fields; that would speed it up dramatically.

vtoupet · 2021-07-23T08:49:56Z

Vectorized means not scalar. Instead of applying a function to one scalar at a time while iterating over a list of scalars, we apply the same function to a whole vector (of dimension 1 x n) at once. This is the main principle behind NumPy and pandas, and it is much quicker.
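A small illustration of the difference, using made-up values (temperatures stored as tenths of a degree):

```python
import numpy as np

raw = np.array([215, -31, 180, 94])  # temperatures in tenths of a degree


# Scalar style: call the function once per element in a Python loop.
def to_celsius(value):
    return value / 10


scalar_result = [to_celsius(v) for v in raw]

# Vectorised style: one expression applied to the whole array at once.
vector_result = raw / 10
```

Both give the same numbers. In the vectorised form the loop still exists, but it runs in compiled code rather than in the Python interpreter; it does not by itself imply threads or parallelism.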

I'll try to get something started by September.

haydenth · 2021-07-23T14:01:41Z

Oh, I see what you are saying. I don't think it would be crazy hard to make a layer above ish_report that vectorizes the individual ish_report objects so they can be used with a library like that.
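Such a layer might look roughly like the sketch below. `StubReport` is a stand-in for a parsed report object and its attribute names are placeholders, not the confirmed ish_report API; `reports_to_frame` is likewise a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime

import pandas as pd


@dataclass
class StubReport:
    """Stand-in for a parsed report; the attribute names are hypothetical."""
    when: datetime
    air_temperature: float
    wind_speed: float


def reports_to_frame(reports, fields):
    """Collect the chosen attributes of each report into DataFrame columns."""
    columns = {name: [getattr(r, name) for r in reports] for name in fields}
    return pd.DataFrame(columns)


reports = [
    StubReport(datetime(2021, 7, 22, 12), 21.5, 3.1),
    StubReport(datetime(2021, 7, 22, 13), 22.0, 4.6),
]
frame = reports_to_frame(reports, ["when", "air_temperature"])
mean_temp = frame["air_temperature"].mean()
```

Note that this only makes the downstream analysis column-based. Each report is still parsed one at a time, so it would not remove the per-record decoding cost described at the top of the thread.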

amotl mentioned this issue Feb 27, 2023

Feat: Add integrated surface database earthobservations/wetterdienst#871

Open
