Dataset Preprocessing #10

hamza13-12 · 2024-03-25T19:17:15Z

Hello. As far as I understand, you are storing the data in a pandas DataFrame with one column corresponding to EEG signals and the other to text, and then converting EEG signals to text, correct? Could you elaborate on how you achieved this dataset format so that others can organize the dataset the same way?

MikeWangWZHL · 2024-03-25T21:39:26Z

Hi! Sorry, I am not sure what you mean by pandas. The data preprocessing scripts can be found in scripts/prepare_dataset.sh;
for example, util/construct_dataset_mat_to_pickle_v1.py converts the ZuCo v1.0 .mat files into a .pickle file, whose contents are like a Python dictionary.
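
As a rough sketch of how such a .pickle file can be opened and inspected: the file name and the dictionary layout below are hypothetical stand-ins for illustration, not the actual output of the script.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_dataset(path):
    """Load a pickled Python object (for example, a dictionary) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(obj):
    """Map each top-level key of a dict to the type name of its value."""
    return {key: type(value).__name__ for key, value in obj.items()}


# Hypothetical stand-in for the real file: a tiny dictionary written to a
# temporary location so the example runs without the ZuCo data.
example = {"subject_A": [{"content": "a sentence", "word": []}]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example-dataset.pickle"
    with open(path, "wb") as f:
        pickle.dump(example, f)
    loaded = load_pickled_dataset(path)

print(summarize(loaded))
```

Only unpickle files you produced yourself or otherwise trust, since pickle.load can execute arbitrary code.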

hamza13-12 · 2024-03-25T23:44:56Z

Pandas is a data analysis library in Python used to build DataFrames. I was actually asking for instructions on how to build the dataset in a format where one column corresponds to EEG signals and another to text, so that I can create seq2seq models that take EEG as input and generate text.

hamza13-12 · 2024-03-30T22:09:25Z

Actually, I figured it out! After creating train_set and dev_set, I just used this snippet of code:

import pandas as pd

def dataset_to_dataframe(dataset):
    # Initialize lists to hold data
    input_embeddings_list = []
    seq_len_list = []
    input_attn_mask_list = []
    input_attn_mask_invert_list = []
    target_strings_list = []
    sent_level_EEG_list = []
    
    # Iterate through the dataset
    for i in range(len(dataset)):
        input_embeddings, seq_len, input_attn_mask, input_attn_mask_invert, target_string, sent_level_EEG = dataset[i]
        
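        # Convert tensors to NumPy arrays (detach and move to CPU first so
        # this also works for tensors on a GPU or with gradients attached)
        input_embeddings_np = input_embeddings.detach().cpu().numpy()
        input_attn_mask_np = input_attn_mask.detach().cpu().numpy()
        input_attn_mask_invert_np = input_attn_mask_invert.detach().cpu().numpy()
        sent_level_EEG_np = sent_level_EEG.detach().cpu().numpy()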
        
        # Append to lists
        input_embeddings_list.append(input_embeddings_np)
        seq_len_list.append(seq_len)
        input_attn_mask_list.append(input_attn_mask_np)
        input_attn_mask_invert_list.append(input_attn_mask_invert_np)
        target_strings_list.append(target_string)
        sent_level_EEG_list.append(sent_level_EEG_np)
    
    # Create DataFrame
    df = pd.DataFrame({
        'Input Embeddings': input_embeddings_list,
        'Sequence Length': seq_len_list,
        'Input Attention Mask': input_attn_mask_list,
        'Input Attention Mask Invert': input_attn_mask_invert_list,
        'Target String': target_strings_list,
        'Sentence Level EEG': sent_level_EEG_list
    })
    
    return df

# Convert datasets to dataframes
train_df = dataset_to_dataframe(train_set)
dev_df = dataset_to_dataframe(dev_set)
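
If the resulting DataFrames need to be reused later, one option is to pickle them, since each cell holds a NumPy array and writing to CSV would turn those arrays into strings. The following is a minimal sketch under that assumption; demo_df, its values, and the file name are hypothetical stand-ins for train_df, not data from the thread.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df: two rows, each with an array-valued
# cell and a target string. The shapes are arbitrary illustration values.
demo_df = pd.DataFrame({
    "Input Embeddings": [
        np.zeros((3, 4), dtype=np.float32),
        np.ones((5, 4), dtype=np.float32),
    ],
    "Target String": ["first sentence", "second sentence"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_df.pkl")
    demo_df.to_pickle(path)          # keeps array cells as NumPy arrays
    restored = pd.read_pickle(path)  # load the DataFrame back

# The array cell survives the round trip with its type and shape intact.
first = restored.loc[0, "Input Embeddings"]
print(type(first).__name__, first.shape)
```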
