How to apply function .preprocess and others to Pandas df? #8

fatihbozdag · 2023-01-14T20:19:06Z

Greetings all,

I have a large corpus in a Pandas dataframe, and I'd like to iterate over the text column and record the results of the individual feature functions in separate columns. As far as I can tell, the extractor only accepts str. I am trying to merge the scores with the metadata included in the dataframe.

For instance, my dataframe looks as follows.

df.head()
  docid_field  ...                                         text_field
0    BGSU1001  ...   <ICLE-BG-SUN-0001.1> \nIt is time, that our s...
1    BGSU1002  ...   <ICLE-BG-SUN-0002.1> \nNowadays there is a gr...
2    BGSU1003  ...   <ICLE-BG-SUN-0003.1> \nOnce upon a time there...
3    BGSU1004  ...   <ICLE-BG-SUN-0004.1> \nOur educational system...
4    BGSU1005  ...   <ICLE-BG-SUN-0005.1> \nScience, technology an...

Is there a way to apply a LingFeat function to df['text_field'] and record the scores (let's say LingFeat.EnDF_()) as tuples in another column?
I tried

df['LingFeat'] = df['text_field'].apply(lambda x: extractor.pass_text(x))

and the result is

0      <lingfeat.extractor.pass_text object at 0x0000...
1      <lingfeat.extractor.pass_text object at 0x0000...
2      <lingfeat.extractor.pass_text object at 0x0000...
3      <lingfeat.extractor.pass_text object at 0x0000...
4      <lingfeat.extractor.pass_text object at 0x0000...
                       
923    <lingfeat.extractor.pass_text object at 0x0000...
924    <lingfeat.extractor.pass_text object at 0x0000...
925    <lingfeat.extractor.pass_text object at 0x0000...
926    <lingfeat.extractor.pass_text object at 0x0000...
927    <lingfeat.extractor.pass_text object at 0x0000...
Name: LingFeat, Length: 928, dtype: object

I couldn't go on any further. How should I do it, if it is possible?
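
The column above holds unevaluated pass_text objects because the lambda only constructs the extractor; nothing calls preprocess() or a feature method on it. A minimal sketch of a helper that does all three steps per text follows. StubExtractor is a hypothetical stand-in (its feature key is invented) so the snippet runs without LingFeat installed; the helper only assumes an object with preprocess() and EnDF_() methods, as used elsewhere in this thread.

```python
from typing import Callable, Dict


def endf_scores(text: str, make_extractor: Callable) -> Dict[str, float]:
    """Build an extractor for one text, preprocess it, return its EnDF_ dict."""
    feat = make_extractor(text)   # with LingFeat this would be extractor.pass_text
    feat.preprocess()             # must run before any feature method
    return feat.EnDF_()           # assumed to return a dict of scores


class StubExtractor:
    """Hypothetical stand-in that mimics the pass_text interface for this demo."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens = []

    def preprocess(self) -> None:
        self.tokens = self.text.split()

    def EnDF_(self) -> Dict[str, float]:
        # Invented key, NOT a real LingFeat feature name.
        return {"stub_token_count": float(len(self.tokens))}


scores = endf_scores("It is time to act", StubExtractor)
as_tuples = tuple(scores.items())  # tuple form, if tuples are preferred in a cell
```

With the real library this would presumably be used as df['LingFeat'] = df['text_field'].apply(lambda x: endf_scores(x, extractor.pass_text)), which stores one dict of scores per row instead of an object repr.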

fatihbozdag · 2023-01-14T20:22:33Z

Another, related question:

Is it possible to add LingFeat to spaCy's nlp.pipe?

brucewlee · 2023-03-02T00:01:51Z

Actually, this is a very interesting idea. I'll try to implement this in the next version of this project: LFTK.

fatihbozdag · 2023-03-02T18:30:35Z

I did something like this for those who may want to apply something similar.

import pandas as pd
from lingfeat import extractor

a1 = "DocID"

a2 = "I won't say that committing suicide is good or bad, what I want to emphasize here is I think none should accuse such people of something that is only and only up to the person himself. It's his choice and the end is his own end not ours, everyone should be responsible for his rights and wrongs. First we should consider why a person intends to give an end to his life and how he finds the enough courage to kill himself. If somebody is to ready to do such a terrible thing, there should be incredibly huge reasons to force him to this"

df = pd.DataFrame({"DocID": a1, "text_field": a2}, index = [0])

# Advanced Semantic (AdSem) Features
WoKF = [] # Wikipedia Knowledge Features
WBKF = [] # WeeBit Corpus Knowledge Features
OSKF = [] # OneStopEng Corpus Knowledge Features

# Discourse (Disco) Features
EnDF = [] # Entity Density Features
EnGF = [] # Entity Grid Features

# Syntactic (Synta) Features
PhrF = [] # Noun/Verb/Adj/Adv/... Phrasal Features
TrSF = [] # (Parse) Tree Structural Features
POSF = [] # Noun/Verb/Adj/Adv/... Part-of-Speech Features

# Lexico Semantic (LxSem) Features
TTRF = [] # Type Token Ratio Features
VarF = [] # Noun/Verb/Adj/Adv Variation Features
PsyF = [] # Psycholinguistic Difficulty of Words (AoA Kuperman)
WoLF = [] # Word Familiarity from Frequency Count (SubtlexUS)

# Shallow Traditional (ShTra) Features
ShaF = [] # Shallow Features (e.g. avg number of tokens)
TraF = [] # Traditional Formulas

for x in df["text_field"]:
    # One extractor object per text; preprocess once, then collect every feature dict
    a = extractor.pass_text(x)
    a.preprocess()
    WoKF.append(a.WoKF_())
    WBKF.append(a.WBKF_())
    OSKF.append(a.OSKF_())
    EnDF.append(a.EnDF_())
    EnGF.append(a.EnGF_())
    PhrF.append(a.PhrF_())
    TrSF.append(a.TrSF_())
    POSF.append(a.POSF_())
    TTRF.append(a.TTRF_())
    VarF.append(a.VarF_())
    PsyF.append(a.PsyF_())
    WoLF.append(a.WorF_())
    ShaF.append(a.ShaF_())
    TraF.append(a.TraF_())

# Advanced Semantic Scores

WoKF_score = pd.DataFrame.from_dict(WoKF, orient="columns")
WBKF_score = pd.DataFrame.from_dict(WBKF, orient="columns")
OSKF_score = pd.DataFrame.from_dict(OSKF, orient="columns")

Adsem_Scores = pd.concat([WoKF_score, WBKF_score, OSKF_score], axis=1)
Adsem_Scores.insert(0, "DocID", df["DocID"])

# Discourse Scores

EnDF_score = pd.DataFrame.from_dict(EnDF, orient="columns")
EnGF_score = pd.DataFrame.from_dict(EnGF, orient="columns")

Disco_Scores = pd.concat([EnDF_score, EnGF_score], axis=1)
Disco_Scores.insert(0, "DocID", df["DocID"])

# Syntactic Scores

PhrF_score = pd.DataFrame.from_dict(PhrF, orient="columns")
TrSF_score = pd.DataFrame.from_dict(TrSF, orient="columns")
POSF_score = pd.DataFrame.from_dict(POSF, orient="columns")

Syntactic_Scores = pd.concat([PhrF_score, TrSF_score, POSF_score], axis=1)
Syntactic_Scores.insert(0, "DocID", df["DocID"])

# Lexico-Semantic Scores

TTRF_score = pd.DataFrame.from_dict(TTRF, orient="columns")
VarF_score = pd.DataFrame.from_dict(VarF, orient="columns")
PsyF_score = pd.DataFrame.from_dict(PsyF, orient="columns")
WoLF_score = pd.DataFrame.from_dict(WoLF, orient="columns")

LexicoSemantic_Scores = pd.concat([TTRF_score, VarF_score, PsyF_score, WoLF_score], axis=1)
LexicoSemantic_Scores.insert(0, "DocID", df["DocID"])

# Shallow Traditional Scores

ShaF_score = pd.DataFrame.from_dict(ShaF, orient="columns")
TraF_score = pd.DataFrame.from_dict(TraF, orient="columns")

ShTra_Scores = pd.concat([ShaF_score, TraF_score], axis=1)
ShTra_Scores.insert(0, "DocID", df["DocID"])
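
For anyone who wants a single wide table rather than five separate ones, the same idea can be sketched more compactly by merging every feature dict into one row per document. This is only a sketch under assumptions: score_frame and extract_all are hypothetical helper names, and StubExtractor is a stand-in with invented feature keys so the snippet runs without LingFeat. The only thing assumed about the real extractor is what the code above already relies on: preprocess() plus feature methods that each return a dict.

```python
import pandas as pd


def extract_all(text, make_extractor, methods):
    """Run preprocess() once, then merge the dicts of all named feature methods."""
    feat = make_extractor(text)
    feat.preprocess()
    row = {}
    for name in methods:
        row.update(getattr(feat, name)())  # each method is assumed to return a dict
    return row


def score_frame(df, make_extractor, methods, text_col="text_field", id_col="DocID"):
    """Return one wide DataFrame: the id column followed by every feature column."""
    rows = [extract_all(text, make_extractor, methods) for text in df[text_col]]
    scores = pd.DataFrame(rows, index=df.index)  # keep the original row index
    scores.insert(0, id_col, df[id_col])
    return scores


class StubExtractor:
    """Hypothetical stand-in for extractor.pass_text; feature keys are invented."""

    def __init__(self, text):
        self.text = text
        self.tokens = []

    def preprocess(self):
        self.tokens = self.text.split()

    def ShaF_(self):
        return {"stub_n_tokens": len(self.tokens)}

    def TTRF_(self):
        return {"stub_ttr": len(set(self.tokens)) / len(self.tokens)}


demo = pd.DataFrame({"DocID": ["d1", "d2"], "text_field": ["a b a", "c d"]})
result = score_frame(demo, StubExtractor, ["ShaF_", "TTRF_"])
```

With LingFeat itself, the call would presumably be score_frame(df, extractor.pass_text, [...]) with the method names from the loop above ("WoKF_", "WBKF_", ..., "TraF_"). Looking methods up by name keeps the feature list in one place, at the cost of failing only at run time if a name is misspelled.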
