Why does "force_parallel(enable=True)" not work? #206

kongbo96 · 2022-12-30T10:31:03Z

With this code, dask works:

def has_inter(x_cat_set, now_set):
    inter = x_cat_set.intersection(now_set)
    return len(inter) == 0 

def get_negs2(now_set,si_doc, num, df3):
    negs_set = set(df3[df3.loc[:,'s_cat'].swifter.progress_bar(False).apply(has_inter, args=(now_set, ))].s_id)
    negs = list(negs_set)
    return negs

neg_dict = df2.loc[:, 's_cat'].swifter.force_parallel(enable=True).apply(get_negs2, args=(si_doc, n_neg, df3,))

This is the result: [image not preserved]

With this code, dask doesn't work:


def get_negs(line, si_doc, num, df3):
    now_set = line['s_cat']
    negs_set = set(df3[df3.loc[:,'s_cat'].swifter.progress_bar(False).apply(has_inter, args=(now_set, ))].s_id)
    negs = list(negs_set)
    return negs

neg_dict = df2.swifter.force_parallel(enable=True).allow_dask_on_strings(enable=True).apply(get_negs, args=(si_doc, n_neg, df3,), axis=1)

This is the result: [image not preserved]

Why do the two versions behave differently? I want to use the second approach, because in other cases I need to use two columns of data.


jmcarpenter2 · 2023-03-24T17:25:23Z

Hmmm, this is strange behavior. It must be trying to use dask and failing to validate the apply on the sample dataset. Is there any chance you could provide an example (or fake) dataset for me to run this code and try to debug the core of the issue?

kongbo96 · 2023-03-26T07:56:05Z

Sorry, it's been too long and I can no longer find the data. The only difference between the two pieces of code is that in the second one now_set differs from row to row, while the first one has only one now_set.
The goal is to find the rows of df3 whose s_cat does not intersect with the s_cat of the current row of df2.
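
Since the original data is gone, the intent described above can be sketched without pandas or swifter. This is only an illustration: the rows in df2_rows and df3_rows and the helper name non_intersecting_ids are hypothetical and not taken from the original code. It may be useful as a reference result to compare against whatever either apply call returns on a small fake dataset.

```python
# Hypothetical stand-ins for df2 and df3: each row has an s_id and a set of categories.
df2_rows = [
    {"s_id": "a", "s_cat": {"x", "y"}},
    {"s_id": "b", "s_cat": {"z"}},
]
df3_rows = [
    {"s_id": 1, "s_cat": {"x"}},
    {"s_id": 2, "s_cat": {"z", "w"}},
    {"s_id": 3, "s_cat": {"q"}},
]


def non_intersecting_ids(now_set, candidates):
    """Return the sorted s_id values whose s_cat shares no element with now_set."""
    # set.isdisjoint is True exactly when len(a.intersection(b)) == 0,
    # which is the condition has_inter checks in the code above.
    return sorted(
        {row["s_id"] for row in candidates if row["s_cat"].isdisjoint(now_set)}
    )


# One list of "negative" ids per row of the df2 stand-in, keyed by that row's s_id.
neg_dict = {
    row["s_id"]: non_intersecting_ids(row["s_cat"], df3_rows) for row in df2_rows
}
print(neg_dict)  # {'a': [2, 3], 'b': [1, 3]}
```

Because every row carries its own now_set here, this matches the row-wise (axis=1) variant, where each row of df2 is compared against all of df3.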

jmcarpenter2 added the "bug" label · Mar 24, 2023
