iteration performance on rows with zero Columns #1818

Open
wPatrick opened this issue Feb 21, 2024 · 2 comments
Labels
needs more info This issue can't reproduce, need more info

Comments

@wPatrick

The following scenario:

I have an Excel file with 3 worksheets.

Worksheet 1 contains approx. 435 rows
Worksheet 2 contains approx. 222 rows
Worksheet 3 contains approx. 17 rows

The worksheets are structurally identical, but their stored dimensions do not match the actual content:

Worksheet 1 Dimensions: A1:AMH435
Worksheet 2 Dimensions: A1:ALZ882
Worksheet 3 Dimensions: A1:AME471
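
To make the size of these dimensions concrete, here is a minimal sketch that converts the bottom-right cell of each reference into a column and row count. It uses only the standard library; `columnNumber` and `dimensionSize` are hypothetical helper names for illustration, not part of excelize. All three references span more than a thousand columns, while the data rows shown further down hold 9 columns each.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// columnNumber converts a column name such as "AMH" to its 1-based index.
// It returns 0 for an empty name or a name containing non-letters.
func columnNumber(name string) int {
	n := 0
	for _, r := range strings.ToUpper(name) {
		if r < 'A' || r > 'Z' {
			return 0
		}
		n = n*26 + int(r-'A') + 1
	}
	return n
}

// dimensionSize takes a reference like "A1:AMH435" and returns the column
// and row number of its bottom-right cell. Malformed input yields zeros.
func dimensionSize(ref string) (cols, rows int) {
	parts := strings.Split(ref, ":")
	last := parts[len(parts)-1]
	i := strings.IndexAny(last, "0123456789")
	if i < 0 {
		return 0, 0
	}
	rows, err := strconv.Atoi(last[i:])
	if err != nil {
		return 0, 0
	}
	return columnNumber(last[:i]), rows
}

func main() {
	for _, ref := range []string{"A1:AMH435", "A1:ALZ882", "A1:AME471"} {
		cols, rows := dimensionSize(ref)
		fmt.Printf("%s: %d columns x %d rows\n", ref, cols, rows)
	}
}
```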

Note: Everything is fine so far; rows were evidently deleted when the worksheets were created. There is no error at this point.

I now iterate through each row and measure the time it takes to read the columns of a row:

start := time.Now()
row, err := rows.Columns()
elapsed := time.Since(start)
fmt.Printf("%d,%s,%d\n", rowIndex+1, elapsed, len(row))

I now have the following output for worksheet 3, for example:

...
5,68.82µs,9
6,43.795µs,9
7,40.803µs,9
...
16,40.584µs,9
17,40.682µs,9
18,2.853217ms,0
19,2.855862ms,0
...
65,2.922637ms,0
66,2.859574ms,0
...
471,2.872934ms,0
472,2.880524ms,0
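
To compare how much time goes into rows with and without columns, output in this `row,elapsed,columns` format can be summed up with a small helper. This is only a sketch using the standard library; `summary` and `summarize` are hypothetical names, and the sample lines in `main` are copied from the output above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// summary accumulates row counts and total time, split by whether
// the row returned any columns.
type summary struct {
	filledRows, emptyRows int
	filledTime, emptyTime time.Duration
}

// summarize parses lines in the "row,elapsed,columns" format.
// Lines that do not match (for example "...") are skipped.
func summarize(lines []string) summary {
	var s summary
	for _, line := range lines {
		fields := strings.Split(strings.TrimSpace(line), ",")
		if len(fields) != 3 {
			continue
		}
		elapsed, err := time.ParseDuration(fields[1])
		if err != nil {
			continue
		}
		cols, err := strconv.Atoi(fields[2])
		if err != nil {
			continue
		}
		if cols == 0 {
			s.emptyRows++
			s.emptyTime += elapsed
		} else {
			s.filledRows++
			s.filledTime += elapsed
		}
	}
	return s
}

func main() {
	s := summarize([]string{
		"16,40.584µs,9",
		"17,40.682µs,9",
		"...",
		"18,2.853217ms,0",
		"19,2.855862ms,0",
	})
	fmt.Printf("filled: %d rows, %s total\n", s.filledRows, s.filledTime)
	fmt.Printf("empty:  %d rows, %s total\n", s.emptyRows, s.emptyTime)
}
```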

You can see that there are 9 columns per row up to row 17. For these rows the execution time is about 40µs. From row 18 to the end of the worksheet dimensions, however, the execution time is 2-3ms per row, even though there are no columns. To me, this feels like a bug, as it is not clear to me why the first 17 rows are processed faster than the subsequent rows without content.

I noticed this because the total execution time for worksheet 1 is in the ms range, while worksheets 2 and 3 take 2-3 seconds each. Of course I could work around this by ending the iteration as soon as I hit empty rows, but that is not possible in every situation.
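
For reference, the workaround I have in mind looks roughly like the sketch below: stop once a run of consecutive empty rows is seen. `readUntilEmptyRun` and its `maxEmpty` parameter are hypothetical names of mine, and `next` and `columns` are stand-ins for the iterator's `Next` and `Columns` methods, so that the example runs without a workbook. Shorter gaps are kept as `nil` rows so that row positions stay intact.

```go
package main

import "fmt"

// readUntilEmptyRun collects rows until maxEmpty consecutive empty rows
// are seen or the iterator is exhausted. Trailing empty rows are dropped;
// shorter gaps between filled rows are preserved as nil entries.
// maxEmpty is expected to be at least 1.
func readUntilEmptyRun(next func() bool, columns func() ([]string, error), maxEmpty int) ([][]string, error) {
	var out [][]string
	empty := 0
	for next() {
		row, err := columns()
		if err != nil {
			return out, err
		}
		if len(row) == 0 {
			empty++
			if empty >= maxEmpty {
				break
			}
			continue
		}
		// Re-insert the short gap that preceded this filled row.
		for ; empty > 0; empty-- {
			out = append(out, nil)
		}
		out = append(out, row)
	}
	return out, nil
}

func main() {
	// Fake iterator standing in for a worksheet with trailing empty rows.
	data := [][]string{{"a", "b"}, {}, {"c"}, {}, {}, {}, {"never reached"}}
	i := -1
	next := func() bool { i++; return i < len(data) }
	columns := func() ([]string, error) { return data[i], nil }

	rows, err := readUntilEmptyRun(next, columns, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("kept %d rows, stopped at input row %d\n", len(rows), i+1)
}
```

With excelize this could presumably be called as `readUntilEmptyRun(rows.Next, func() ([]string, error) { return rows.Columns() }, 5)`, where `rows` comes from `f.Rows(sheet)`. The drawback is that any sheet with a legitimate gap of `maxEmpty` or more rows gets truncated, which is why this does not work in every situation.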

I am using v2.8.0

@xuri
Member

xuri commented Mar 9, 2024

Thanks for your issue. Could you show us a complete, standalone example program or reproducible demo? If you open an existing workbook, please provide the file attachment without confidential info.

@xuri xuri added the needs more info This issue can't reproduce, need more info label Mar 9, 2024
@wPatrick
Author

In the coming weeks, I will provide sample code along with the corresponding Excel files. Currently, I have a new addition to my family that requires a lot of my time. Please keep the issue open, as I am still interested in understanding the problem. I debugged some of the XML parsing and managed to isolate the issue to a certain extent. Unfortunately, my understanding isn't deep enough to fully follow what the code is doing.
