Deserialization error for some querysets #34

Open
Peder2911 opened this issue Dec 16, 2021 · 1 comment

@Peder2911

@jimdale and I found an issue where viewser raises a deserialization error
even though the response clearly contains at least partial Parquet data:

DeserializationError: DeserializationError:

  Description:
                Could not deserialize as parquet:             "b'PAR1\x15\x04\
                x15\xe0D\x15\xf8?L\x15\xcc\x08\x15\x04\x12\x00\x00\x1f\x8b\x08
                \x00\x00\x00\x00\x00\x00\x03-Wi8Vk\x1b5e\x1e\xdei\x8f\xafY*\x9
                1\xc21'..."

This only seems to happen with certain querysets. The queryset that led to this error was:

from viewser import Queryset, Column

# thetacrit_tree is defined elsewhere in the session; its value is not given in this issue.
queryset = (
    Queryset("jim_fatalities_conflict_history_lag_tdecay", "priogrid_month")

    # target variable
    .with_column(
        Column("ln_ged_sb", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
    )

    # spatial-tree-lagged d^-2 target variable
    .with_column(
        Column("ln_ged_sb_treelag_2_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 2)
    )

    # 1 tlagged spatial-tree-lagged d^-2 target variable
    .with_column(
        Column("ln_ged_tlag_1_sb_treelag_2_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 2)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )

    # spatial-tree-lagged d^-1 target variable
    .with_column(
        Column("ln_ged_sb_treelag_1_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 1)
    )

    # 1 tlagged spatial-tree-lagged d^-1 target variable
    .with_column(
        Column("ln_ged_tlag_1_sb_treelag_1_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 1)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )

    # spatial-tree-lagged ln(1+d) target variable
    .with_column(
        Column("ln_ged_sb_treelag_0_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 0)
    )

    # 1 tlagged spatial-tree-lagged ln(1+d) target variable
    .with_column(
        Column("ln_ged_tlag_1_sb_treelag_0_th1_0", from_table="ged2_pgm", from_column="ged_sb_best_sum_nokgi")
        .transform.missing.fill()
        .transform.ops.ln()
        .transform.spatial.treelag(thetacrit_tree, 0)
        .transform.temporal.tlag(1)
        .transform.missing.fill()
    )
)

To begin diagnosing this, we need tooling that dumps the raw response data
when deserialization fails, so we can see exactly what is being returned.
That should tell us whether the problem originates upstream or in the
deserialization step itself.
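
As a rough starting point, such tooling could look like the sketch below. It is not part of viewser: the names `inspect_parquet_bytes` and `dump_response` and the `viewser_dumps` directory are made up for illustration. It relies only on the Parquet file layout, in which a complete file starts with the magic bytes `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. A response that has the leading magic but not the trailing one was most likely cut short.

```python
import hashlib
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def inspect_parquet_bytes(data: bytes) -> dict:
    """Report cheap structural facts about bytes that should be a Parquet file."""
    has_header = data[:4] == PARQUET_MAGIC
    # A complete file holds at least: header magic, footer length, footer magic.
    has_footer = len(data) >= 12 and data[-4:] == PARQUET_MAGIC
    footer_length = None
    if has_footer:
        # The 4 bytes before the trailing magic store the footer metadata length.
        footer_length = int.from_bytes(data[-8:-4], "little")
    return {
        "size": len(data),
        "has_header": has_header,
        "has_footer": has_footer,
        "footer_length": footer_length,
    }


def dump_response(data: bytes, directory: str = "viewser_dumps") -> Path:
    """Write the raw bytes to disk (hypothetical location) for later inspection."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the file by content hash so repeated identical failures do not pile up.
    digest = hashlib.sha256(data).hexdigest()[:16]
    path = out_dir / f"response_{digest}.bin"
    path.write_bytes(data)
    return path
```

Calling both helpers from wherever the deserialization error is caught would leave a file on disk plus a quick verdict: `has_header=True` with `has_footer=False` points to a truncated transfer rather than malformed data.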

One clue is that no exception is raised upstream, which means the data is
written to Parquet and sent without error. This points towards a problem in
viewser.

@Peder2911 added the bug (Something isn't working) label on Dec 16, 2021
@Peder2911

This seems to be related to our current network topology, since we have found no problem in viewser itself. We inspected data dumped when the error occurs: there is no upstream exception and no deserialization malfunction in viewser. The data simply arrives incomplete, which suggests that something interrupts the connection mid-transfer.

It will be interesting to see if this issue persists with our new servers.
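
One way to confirm an interrupted transfer on the client side is to compare the declared `Content-Length` with the number of bytes actually received. The sketch below is only an illustration under assumptions not confirmed in this issue: that the server sends a `Content-Length` header, and that the body is compared before any transparent decompression. `transfer_shortfall` is a hypothetical helper, not a viewser function.

```python
from typing import Mapping, Optional


def transfer_shortfall(headers: Mapping[str, str], body: bytes) -> Optional[int]:
    """Return how many bytes are missing relative to Content-Length.

    Returns None when the header is absent or not an integer (for example
    with chunked transfer encoding), since nothing can be concluded then.
    """
    declared = None
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            declared = value
            break
    if declared is None:
        return None
    try:
        expected = int(declared)
    except ValueError:
        return None
    # Positive result: the body is shorter than the server announced.
    return expected - len(body)
```

A positive return value alongside a dump that lacks the trailing `PAR1` magic would be consistent with the connection being cut somewhere between the server and the client.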
