tsalo/convert-eprime

convert-eprime

Python code to convert E-Prime files to csv-format spreadsheets for analysis. With the exception of text_to_csv, most of the functionality of this package can be implemented very easily with pandas.

I am not currently working on this package, but if you see a bug or have an idea for an improvement, please open an issue and I will address it as soon as I can. Pull Requests are also welcome.

Code

  • index_eprime_files.py: Looks through a folder and finds pairs of edat/txt files (successful runs) as well as all other combinations of the two filetypes. It flags those other files as requiring individual attention. Eventually it will call the functions in convert.py to automatically convert the text files from the paired runs. Does not currently work properly!
  • convert.py: A set of three functions to convert E-Prime files into a more manageable csv format. The functions:
    1. etext_to_rcsv: Converts exported "E-Prime text" file to reduced csv based on desired column headers. Make sure, when exporting the edat file as "E-Prime text", that Unicode is turned off.
    2. text_to_csv: Converts text file produced by successful completion of E-Prime experiment to csv. Output from text_to_csv can be used to deduce information necessary for text_to_rcsv (e.g. columns to merge, columns to rename, etc.). These variables would then be saved in a task-specific json file. Alternatively, if you plan to work directly from the text files, then you probably won't need to rename columns.
    3. text_to_rcsv: Converts text file produced by successful completion of E-Prime experiment to reduced csv, using information from the variables contained in the task-specific json file. If used properly, the output should be indistinguishable from the output of etext_to_rcsv, only without the tedious step of exporting the "E-Prime text" file by hand.
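
As a rough illustration of the kind of reduction described for etext_to_rcsv (keep the desired column headers, optionally drop rows with NaNs), here is a minimal pandas sketch. It is not the package's implementation. The function name `reduce_export`, the sample column names, and the assumption that the export is tab-delimited with the header on the first line are all hypothetical; a real "E-Prime text" export may need extra handling (for example, preamble lines before the header).

```python
import io

import pandas as pd

# Hypothetical stand-in for an "E-Prime text" export.
# Assumption: tab-delimited, with the column headers on the first line.
SAMPLE_EXPORT = (
    "Subject\tTrial\tStim.ACC\tStim.RT\tExtraColumn\n"
    "1\t1\t1\t532\tfoo\n"
    "1\t2\t\t\tbar\n"
    "1\t3\t0\t611\tbaz\n"
)


def reduce_export(text, headers, rem_nulls=False):
    """Keep only the requested columns; optionally drop rows containing NaNs."""
    df = pd.read_csv(io.StringIO(text), sep="\t")
    reduced = df[headers]
    if rem_nulls:
        # Drop every row that has a missing value in any kept column.
        reduced = reduced.dropna()
    return reduced.reset_index(drop=True)


reduced = reduce_export(
    SAMPLE_EXPORT,
    headers=["Subject", "Trial", "Stim.ACC", "Stim.RT"],
    rem_nulls=True,
)
csv_text = reduced.to_csv(index=False)  # csv-format text, without the row index
print(csv_text)
```

Leaving `rem_nulls=False` keeps the empty trial row, which is the safer default if you are unsure whether those rows matter for your task.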

Procedure

  • If you plan to export your edat files to "E-Prime text" format by hand, then you can simply use etext_to_rcsv to convert the resulting files. You will need to know which columns in the file are necessary for vector creation and result summarization. Also, you will need to know if you want to remove rows with NaNs. If you're unsure, set rem_nulls for your task to False.
  • If you plan to only use the text files outputted by E-Prime, you can use text_to_csv to convert them. Depending on how you coded the experiment, the outputted csv files may be very different from the ones created by etext_to_rcsv. Sometimes columns will be split in the text file but merged in the edat, and column headers will often differ between the two. You can, of course, write your vector creation and result summarization code to work with these csv files instead of the reduced ones created by the other functions.
  • If you're used to converting your edat files by hand and all of your code is geared toward that, but you hate it and want to stop without rewriting all of your code, you can use text_to_rcsv. You will need to run text_to_csv at least once per paradigm to determine which columns need to be merged, which ones need to be renamed, etc. to match the format of the "E-Prime text" files you normally deal with. However, once that is done and the dictionaries in the task-specific json file are set for your task, it should run smoothly in the future. If this interests you, here is (I hope) a fairly generic step-by-step procedure for figuring out those dictionaries:
    1. Using whatever scripts you normally use to create your vectors or summary statistics, identify necessary columns in the edat file.
    2. Create a json file (see config_files/ for examples) for your task with headers (the relevant column names from the edat), merge_cols (empty dictionary), null_cols (empty list), rem_nulls (True), and replace_dict (empty dictionary).
    3. Run text_to_csv. It will generate a csv file that will be based on the text file automatically generated by E-Prime.
    4. Look through the csv file and compare with the edat.
      1. There will be some columns with different names that contain the same information (replace_dict).
        • You might also come across a column in the text file that has the same name as an important column in the edat, but with different information. You can rename that column to something else using replace_dict, so that you're essentially switching out a bad header with a good one (assuming you've found the differently-named but corresponding text file header for that edat header).
      2. There will also be some columns that have NULLs right where you need them (on the rows you want to remove), so use one of those for null_cols. You can use more than one if no single column covers everything.
      3. There may also be columns in the edat that correspond to multiple columns in the csv file created by text_to_csv. For example, one column in the text file might correspond to block 1, with another corresponding to block 2, while in the edat they're just one column. You can add those columns as entries in merge_cols, where the key is the name of the merged column to create and the value is a list of columns to merge.
    5. Test it out. This procedure is tedious, but now text_to_rcsv should work with your task.
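
To make the roles of the five config keys from the procedure above more concrete, here is a small pandas sketch that applies a hypothetical config to a hypothetical csv. It is only an illustration under stated assumptions, not the code in convert.py: the column names, the function `apply_config`, the direction of the replace_dict mapping (text-file header to edat header), and the order of the steps are all assumptions. See config_files/ for real examples of the json files.

```python
import json

import pandas as pd

# Hypothetical task config using the five keys from step 2.
# Assumption: replace_dict maps a text-file header to the edat header it should become.
config = json.loads("""
{
  "headers": ["Subject", "Stim.RT", "Stim.ACC"],
  "replace_dict": {"StimSlide.RT": "Stim.RT"},
  "merge_cols": {"Stim.ACC": ["Block1.ACC", "Block2.ACC"]},
  "null_cols": ["Stim.RT"],
  "rem_nulls": true
}
""")

# Hypothetical frame standing in for the csv produced by text_to_csv.
raw = pd.DataFrame({
    "Subject": [1, 1, 1, 1],
    "StimSlide.RT": [500.0, None, 620.0, 480.0],
    "Block1.ACC": [1.0, None, None, None],
    "Block2.ACC": [None, None, 0.0, 1.0],
})


def apply_config(df, cfg):
    """Rename, merge, drop null rows, then keep only the configured headers."""
    out = df.rename(columns=cfg["replace_dict"])
    for merged_name, parts in cfg["merge_cols"].items():
        # Take the first non-null value across the listed columns, row by row.
        out[merged_name] = out[parts].bfill(axis=1).iloc[:, 0]
    if cfg["rem_nulls"]:
        # Remove rows that are null in any of the null_cols.
        out = out.dropna(subset=cfg["null_cols"])
    return out[cfg["headers"]].reset_index(drop=True)


result = apply_config(raw, config)
print(result)
```

In this toy data the second row has no response time, so it is the row removed via null_cols, while the two per-block accuracy columns collapse into a single Stim.ACC column.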