pandas.eval() for massive datasets? #20

Open
bbartling opened this issue Feb 3, 2024 · 0 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@bbartling
Owner

Current apply method, using standard pandas operations:

def apply(self, df: pd.DataFrame) -> pd.DataFrame:
    # Existing checks
    df['static_check_'] = (
        df[self.duct_static_col] < df[self.duct_static_setpoint_col] - self.duct_static_inches_err_thres)
    df['fan_check_'] = (
        df[self.supply_vfd_speed_col] >= self.vfd_speed_percent_max - self.vfd_speed_percent_err_thres)

    # Combined condition check
    df["combined_check"] = df['static_check_'] & df['fan_check_']

    # Rolling sum to count consecutive trues
    rolling_sum = df["combined_check"].rolling(window=5).sum()
    # Set flag to 1 if rolling sum equals the window size (5)
    df["fc1_flag"] = (rolling_sum == 5).astype(int)

    return df
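
To illustrate how the rolling sum above flags five consecutive true readings, here is a minimal standalone sketch. The boolean series is made-up sample data, not output from the real checks.

```python
import pandas as pd

# Hypothetical combined-check readings: six consecutive True values,
# then a False that breaks the streak.
combined = pd.Series([True, True, True, True, True, True, False, True])

# The rolling sum is NaN until a full window of 5 rows is available,
# so the first four rows can never be flagged.
rolling_sum = combined.rolling(window=5).sum()
flag = (rolling_sum == 5).astype(int)

print(flag.tolist())  # [0, 0, 0, 0, 1, 1, 0, 0]
```

The flag is raised only on rows where the current reading and the four before it are all true.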

Would DataFrame.eval() be a better fit for massive datasets?

def apply_with_eval(self, df: pd.DataFrame) -> pd.DataFrame:
    # eval resolves "@name" against local variables, so bind the thresholds first
    static_err = self.duct_static_inches_err_thres
    vfd_max = self.vfd_speed_percent_max
    vfd_err = self.vfd_speed_percent_err_thres

    # Interpolate the column names (backtick-quoted) so eval reads the columns;
    # "@self.duct_static_col" would yield the name string, not the column data
    df.eval(f'static_check_ = `{self.duct_static_col}` < (`{self.duct_static_setpoint_col}` - @static_err)', inplace=True)
    df.eval(f'fan_check_ = `{self.supply_vfd_speed_col}` >= (@vfd_max - @vfd_err)', inplace=True)

    # Combined condition check (bitwise AND)
    df["combined_check"] = df['static_check_'] & df['fan_check_']

    # Rolling sum to count consecutive trues (This part remains the same)
    rolling_sum = df["combined_check"].rolling(window=5).sum()
    # Set flag to 1 if rolling sum equals the window size (5)
    df["fc1_flag"] = (rolling_sum == 5).astype(int)

    return df

Any insights are appreciated. It's sort of interesting to see what ChatGPT states in a conversation about this:

  • Continue using your current approach with standard pandas operations, especially for the more complex parts like the rolling window operation.
  • If performance becomes an issue, consider using eval() for the simpler comparison operations, but benchmark to ensure it's actually faster for your specific case.
  • Always balance between readability/maintainability and performance, choosing the one that best fits your project's requirements.
  • Remember, while eval() can offer performance improvements in certain cases, it's always good to benchmark with your specific dataset to ensure it's actually faster and doesn't compromise readability or maintainability.
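
The benchmarking advice above could be tried with a rough sketch like the following. The column names, thresholds, row count, and synthetic data are all assumptions made up for illustration; real timings depend on the actual dataset and on whether numexpr is installed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: column names and value ranges are placeholders.
N_ROWS = 200_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duct_static": rng.uniform(0.0, 2.0, N_ROWS),
    "duct_static_setpoint": np.full(N_ROWS, 1.0),
    "supply_vfd_speed": rng.uniform(0.0, 1.0, N_ROWS),
})


def with_pandas(frame, static_err=0.1, vfd_max=0.99, vfd_err=0.05):
    # Plain vectorized comparisons, as in the current approach
    static_check = frame["duct_static"] < frame["duct_static_setpoint"] - static_err
    fan_check = frame["supply_vfd_speed"] >= vfd_max - vfd_err
    return static_check & fan_check


def with_eval(frame, static_err=0.1, vfd_max=0.99, vfd_err=0.05):
    # Same logic as one eval expression; "@" refers to the local arguments
    return frame.eval(
        "(duct_static < duct_static_setpoint - @static_err)"
        " & (supply_vfd_speed >= @vfd_max - @vfd_err)"
    )


# Confirm both versions agree before comparing their speed
same = np.array_equal(with_pandas(df).to_numpy(), with_eval(df).to_numpy())

t_pandas = timeit.timeit(lambda: with_pandas(df), number=5)
t_eval = timeit.timeit(lambda: with_eval(df), number=5)
print(f"results match: {same}")
print(f"pandas: {t_pandas:.4f}s  eval: {t_eval:.4f}s (5 runs each)")
```

Only the comparison step is timed here, since the rolling window step is identical in both versions.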
@bbartling bbartling added enhancement New feature or request help wanted Extra attention is needed labels Feb 3, 2024