chi_merge_vector has redundant calculation. #9

cheesebear · 2022-04-24T10:20:28Z

The code below runs on every iteration of the while loop and takes too much time.

        intervals, unique_intervals = assign_interval_unique(x, unique_intervals[:, 1])
        pt_value, pt_column, pt_index = pivot_table_np(intervals[:, 1], y)

In my case, the original code took about 10 minutes to process one feature. After optimization, it takes about 10 seconds.
In the first iteration, define df:

    df = pd.DataFrame(pt_value, columns=pt_column)
    df['pt_index'] = pt_index
    df['chi2'] = np.append(chi2_array, [np.nan] * (m - 1))  # np.NaN was removed in NumPy 2.0

In later iterations, update df directly and refresh the intermediate variables from it:

    # Use the fast path to avoid redundant computation

    merge_index_start = index_adjacent_to_merge[0]
    # Collapse the m adjacent rows being merged into one summed row.
    df = pd.concat(
        [
            df.loc[:merge_index_start - 1, :],
            df.loc[merge_index_start:merge_index_start + m - 1, :].sum(axis=0).to_frame().T,
            df.loc[merge_index_start + m:, :],
        ],
        ignore_index=True,
    )
    # Label the merged row with the boundary of the new interval.
    df.loc[merge_index_start, 'pt_index'] = new_interval[0][1]

    # Refresh the intermediate variables from df instead of recomputing them from x and y.
    pt_value = df[pt_column].to_numpy()
    pt_index = df['pt_index'].to_numpy()
    # np.unique already returns sorted values, so no separate sort is needed.
    boundaries_tmp = np.unique(
        np.concatenate((np.array([-float('inf')]),
                        pt_index, np.array([float('inf')])),
                       axis=0))
    unique_intervals = np.column_stack((boundaries_tmp[:-1], boundaries_tmp[1:]))
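
As a sanity check on the idea, here is a minimal sketch showing that summing adjacent rows of the count table gives the same result as relabeling the raw data and rebuilding the table. `pivot_counts` and `merge_rows` are hypothetical helpers written for this illustration, not functions from the repository; `pd.crosstab` stands in for `pivot_table_np`, and the data is random.

```python
import numpy as np
import pandas as pd


def pivot_counts(bin_ids, y):
    """Count samples per (bin, class). Hypothetical stand-in for pivot_table_np."""
    return pd.crosstab(bin_ids, y).to_numpy()


def merge_rows(counts, start, m):
    """Replace rows start..start+m-1 with their sum; other rows are untouched."""
    merged = counts[start:start + m].sum(axis=0, keepdims=True)
    return np.vstack([counts[:start], merged, counts[start + m:]])


rng = np.random.default_rng(0)
bin_ids = rng.integers(0, 5, size=1000)  # 5 initial bins, labeled 0..4
y = rng.integers(0, 2, size=1000)        # binary target

counts = pivot_counts(bin_ids, y)

# Fast path: merge bins 1 and 2 directly in the count table.
fast = merge_rows(counts, start=1, m=2)

# Slow path: relabel the raw data and rebuild the table from scratch.
relabeled = np.where(bin_ids == 2, 1, bin_ids)
slow = pivot_counts(relabeled, y)

assert np.array_equal(fast, slow)
```

The fast path only touches the rows being merged, so its cost depends on the number of bins rather than on the number of samples, which is consistent with the speedup reported above.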

Mensyne · 2022-10-11T07:26:23Z

Received, thanks.
