emuhelpeR: Convenience functions for working with emuR

This package collects convenience functions for working with emuR and EMU-SDMS. For more information on those tools, see this useful manual.

For now, functions are available to bulk extract and pre-process SSFF tracks, with specific functions available for processing fundamental frequency, formants, and measures which depend directly on these. This can all be done in a single step with the function import_ssfftracks(), but the different steps can also be carried out independently. Some of the independent functions may well be useful for raw data that isn’t generated in emuR. These functions are adapted from the data processing used in Kirby et al. 2023.

The package also provides functions for adding SSFF tracks to existing EMU databases. One such function is praatsauce2ssff(), which allows users to add the output from PraatSauce to an EMU database. The function was written on the basis of PraatSauce output and should in principle also work for output from VoiceSauce, although this has not been tested. There are also two functions, moments2ssff() and dct2ssff(), for calculating spectral moments and DCT coefficients, respectively, of spectra generated at equidistant time steps for all the sound files of an EMU database. See the help files for more.
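
As a rough illustration of what spectral moments and DCT coefficients summarize, here is a base R sketch on a made-up five-point spectrum. It is not the code used by moments2ssff() or dct2ssff(); the helper names spectral_moments() and dct_coefs() are hypothetical, and the package may use different scaling conventions or operate on a different amplitude scale.

```r
# Hypothetical helper: first four spectral moments of a power spectrum,
# treating the normalized spectrum as a distribution over frequency
spectral_moments <- function(freq, power) {
  p <- power / sum(power)                       # weights summing to 1
  cog <- sum(freq * p)                          # 1st moment: centre of gravity
  variance <- sum((freq - cog)^2 * p)           # 2nd central moment
  skewness <- sum((freq - cog)^3 * p) / variance^1.5
  kurtosis <- sum((freq - cog)^4 * p) / variance^2 - 3  # excess kurtosis
  c(cog = cog, variance = variance, skewness = skewness, kurtosis = kurtosis)
}

# Hypothetical helper: unnormalized DCT-II coefficients 0..k_max
dct_coefs <- function(x, k_max = 3) {
  n <- length(x)
  pos <- (seq_len(n) - 0.5) / n                 # sample midpoints in (0, 1)
  sapply(0:k_max, function(k) sum(x * cos(pi * k * pos)))
}

# Made-up symmetric spectrum
freq <- seq(1000, 5000, by = 1000)
power <- c(1, 2, 3, 2, 1)

m <- spectral_moments(freq, power)
k <- dct_coefs(10 * log10(power))               # DCT of the dB spectrum
```

Because the toy spectrum is symmetric around 3000 Hz, its skewness and its odd-numbered DCT coefficients come out as (numerically) zero; an asymmetric spectrum would shift both.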

Importing and processing SSFF tracks

The function import_ssfftracks() assumes that you have raw data stored in an EMU database which has already been loaded into R, and that you have generated a segment list with relevant portions of the data using the querying system in emuR. Let’s load some example data from Kirby et al. (2023) into R.

datapath <- system.file('extdata/db', package='emuhelpeR')
raw <- emuR::load_emuDB(datapath)
#> INFO: Checking if cache needs update for 2 sessions and 10 bundles ...
#> INFO: Performing precheck and calculating checksums (== MD5 sums) for _annot.json files ...
#> INFO: Nothing to update!

This data can be inspected in EMU-SDMS by typing emuR::serve(raw) in the R console. Let’s have a look at a segment list I prepared:

library(emuhelpeR)
seg_list
#> # A tibble: 10 × 16
#>    labels start   end db_uuid       session bundle start…¹ end_i…² level attri…³
#>    <chr>  <dbl> <dbl> <chr>         <chr>   <chr>    <int>   <int> <chr> <chr>  
#>  1 op      715.  929. 33157d9f-4a3… f1      F1-00…       5       5 ORL   ORL    
#>  2 op      302.  611. 33157d9f-4a3… f1      F1-00…       5       5 ORL   ORL    
#>  3 op      482.  733. 33157d9f-4a3… f1      F1-00…       5       5 ORL   ORL    
#>  4 op     1461. 1665. 33157d9f-4a3… f1      F1-00…       5       5 ORL   ORL    
#>  5 op      897. 1165. 33157d9f-4a3… f1      F1-00…       7       7 ORL   ORL    
#>  6 op     1026. 1261. 33157d9f-4a3… m1      M1-00…       4       4 ORL   ORL    
#>  7 op      775. 1194. 33157d9f-4a3… m1      M1-00…       5       5 ORL   ORL    
#>  8 op     1289. 1547. 33157d9f-4a3… m1      M1-00…       7       7 ORL   ORL    
#>  9 op     1226. 1544. 33157d9f-4a3… m1      M1-00…       6       6 ORL   ORL    
#> 10 op      983. 1134. 33157d9f-4a3… m1      M1-00…       7       7 ORL   ORL    
#> # … with 6 more variables: start_item_seq_idx <int>, end_item_seq_idx <int>,
#> #   type <chr>, sample_start <int>, sample_end <int>, sample_rate <int>, and
#> #   abbreviated variable names ¹​start_item_id, ²​end_item_id, ³​attribute
dplyr::glimpse(seg_list)
#> Rows: 10
#> Columns: 16
#> $ labels             <chr> "op", "op", "op", "op", "op", "op", "op", "op", "op…
#> $ start              <dbl> 714.5465, 301.8254, 481.9161, 1460.7596, 897.0181, …
#> $ end                <dbl> 928.9229, 611.3265, 733.1633, 1665.3628, 1165.0907,…
#> $ db_uuid            <chr> "33157d9f-4a3a-468a-882b-60d3b10ea771", "33157d9f-4…
#> $ session            <chr> "f1", "f1", "f1", "f1", "f1", "m1", "m1", "m1", "m1…
#> $ bundle             <chr> "F1-0002-car-rep1-Naam-37", "F1-0002-car-rep1-baa-8…
#> $ start_item_id      <int> 5, 5, 5, 5, 7, 4, 5, 7, 6, 7
#> $ end_item_id        <int> 5, 5, 5, 5, 7, 4, 5, 7, 6, 7
#> $ level              <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ attribute          <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ start_item_seq_idx <int> 4, 4, 4, 4, 6, 3, 4, 6, 5, 6
#> $ end_item_seq_idx   <int> 4, 4, 4, 4, 6, 3, 4, 6, 5, 6
#> $ type               <chr> "SEGMENT", "SEGMENT", "SEGMENT", "SEGMENT", "SEGMEN…
#> $ sample_start       <int> 31512, 13311, 21253, 64420, 39559, 45237, 34182, 56…
#> $ sample_end         <int> 40965, 26959, 32332, 73442, 51380, 55596, 52655, 68…
#> $ sample_rate        <int> 44100, 44100, 44100, 44100, 44100, 44100, 44100, 44…

A number of SSFF tracks are defined for this database, as the following call shows:

emuR::list_ssffTrackDefinitions(raw)
#>       name columnName fileExtension
#> 1  praatF0        pF0           pF0
#> 2    eggF0       pdF0          pdF0
#> 3    H1H2c      H1H2c         H1H2c
#> 4    H1A1c      H1A1c         H1A1c
#> 5    H1A3c      H1A3c         H1A3c
#> 6      CPP        CPP           CPP
#> 7    CQ_PH      CQ_PH         CQ_PH
#> 8    CQ_PD      CQ_PD         CQ_PD
#> 9  praatF1        pF1           pF1
#> 10 praatF2        pF2           pF2
#> 11 praatF3        pF3           pF3

Using import_ssfftracks() we can bulk extract all these measures from the segments in seg_list into a single data frame. I set proc=FALSE, because right now we just want to extract the raw measures. verbose is set to FALSE to avoid printing progress bars that look ugly on GitHub.

x <- import_ssfftracks(db_handle=raw, seg_list=seg_list, proc=FALSE, verbose=FALSE)
x
#> # A tibble: 2,632 × 31
#>    sl_rowIdx labels start   end db_uuid     session bundle start…¹ end_i…² level
#>        <int> <chr>  <dbl> <dbl> <chr>       <chr>   <chr>    <int>   <int> <chr>
#>  1         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  2         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  3         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  4         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  5         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  6         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  7         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  8         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  9         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#> 10         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#> # … with 2,622 more rows, 21 more variables: attribute <chr>,
#> #   start_item_seq_idx <int>, end_item_seq_idx <int>, type <chr>,
#> #   sample_start <int>, sample_end <int>, sample_rate <int>, times_orig <dbl>,
#> #   times_rel <dbl>, times_norm <dbl>, praatF0 <dbl>, eggF0 <dbl>, H1H2c <dbl>,
#> #   H1A1c <dbl>, H1A3c <dbl>, CPP <dbl>, CQ_PH <dbl>, CQ_PD <dbl>,
#> #   praatF1 <dbl>, praatF2 <dbl>, praatF3 <dbl>, and abbreviated variable names
#> #   ¹​start_item_id, ²​end_item_id
dplyr::glimpse(x)
#> Rows: 2,632
#> Columns: 31
#> $ sl_rowIdx          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ labels             <chr> "op", "op", "op", "op", "op", "op", "op", "op", "op…
#> $ start              <dbl> 714.5465, 714.5465, 714.5465, 714.5465, 714.5465, 7…
#> $ end                <dbl> 928.9229, 928.9229, 928.9229, 928.9229, 928.9229, 9…
#> $ db_uuid            <chr> "33157d9f-4a3a-468a-882b-60d3b10ea771", "33157d9f-4…
#> $ session            <chr> "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1…
#> $ bundle             <chr> "F1-0002-car-rep1-Naam-37", "F1-0002-car-rep1-Naam-…
#> $ start_item_id      <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ end_item_id        <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ level              <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ attribute          <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ start_item_seq_idx <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
#> $ end_item_seq_idx   <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
#> $ type               <chr> "SEGMENT", "SEGMENT", "SEGMENT", "SEGMENT", "SEGMEN…
#> $ sample_start       <int> 31512, 31512, 31512, 31512, 31512, 31512, 31512, 31…
#> $ sample_end         <int> 40965, 40965, 40965, 40965, 40965, 40965, 40965, 40…
#> $ sample_rate        <int> 44100, 44100, 44100, 44100, 44100, 44100, 44100, 44…
#> $ times_orig         <dbl> 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 7…
#> $ times_rel          <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
#> $ times_norm         <dbl> 0.000000000, 0.004694836, 0.009389671, 0.014084507,…
#> $ praatF0            <dbl> 216.523, 216.423, 216.719, 217.147, 217.576, 218.00…
#> $ eggF0              <dbl> 215.0127, 215.0127, 215.0127, 218.6272, 218.6272, 2…
#> $ H1H2c              <dbl> 15.851, 16.093, 16.287, 16.513, 16.633, 16.495, 16.…
#> $ H1A1c              <dbl> 19.376, 19.110, 18.868, 18.655, 18.433, 18.122, 17.…
#> $ H1A3c              <dbl> 22.017, 21.680, 21.623, 21.852, 21.368, 20.545, 20.…
#> $ CPP                <dbl> 19.499, 20.861, 24.094, 23.843, 25.368, 24.960, 25.…
#> $ CQ_PH              <dbl> 0.7078310, 0.7078310, 0.7078310, 0.7105345, 0.71053…
#> $ CQ_PD              <dbl> 0.7461344, 0.7461344, 0.7461344, 0.7482461, 0.74824…
#> $ praatF1            <dbl> 788.254, 788.753, 789.251, 789.750, 790.601, 792.51…
#> $ praatF2            <dbl> 1678.267, 1670.355, 1662.443, 1654.532, 1648.756, 1…
#> $ praatF3            <dbl> 3585.059, 3585.030, 3585.001, 3584.972, 3582.238, 3…

Neat! But if we skip proc=FALSE and set some more parameters, we can also do a bunch of preprocessing in the same step, such as by-speaker normalization and automated removal of outliers that fall outside three standard deviations from the mean within the same group. When the function is called, it will also print a message telling us how many outliers were removed from each track.

  • f0col='praatF0' specifies that F0 values are stored in the SSFF track praatF0. In this track, values of 0 should be recoded as NA, and outliers should be automatically removed after.
  • f0dep='H1H2c' specifies that the track H1H2c (the difference between the first two harmonics) is directly dependent on the F0 measurements, so for each F0 measure coded as NA, the corresponding H1H2c should also be coded as NA.
  • fncol=c('praatF1', 'praatF2', 'praatF3') specifies that the available formant measures F1-F3 are stored in the SSFF tracks praatF1, praatF2, and praatF3. Outliers are automatically removed.
  • fndep=list(c('H1A1c', 'F1'), c('H1A3c', 'F3')) specifies that, respectively, H1A1c is a spectral measure that directly depends on F1 (and F0), and H1A3c is a spectral measure that directly depends on F3 (and F0). H1A1c values will be coded as NA if the corresponding F1 or F0 measure is NA, etc.
  • speaker='session' specifies that speaker information is stored in the column session of the seg_list data frame. In the example data, the session column serves as the speaker label. This is used for by-speaker normalization.
  • group_var='session' specifies the column(s) in seg_list used for grouping when outliers are removed automatically. Several columns can be passed: with e.g. group_var=c('speaker', 'vowel'), only measurements that are more than three standard deviations from the mean within speaker and within vowel would be removed.
  • timing_rm=list('cl', 250) specifies that F0 measurements that are more than 250 ms removed from a cl label in the data should be removed.
  • outlier_rm='eggF0' specifies that, in addition to the automated outlier procedures that have already been applied, outliers should also be automatically removed from the SSFF track eggF0.
y <- import_ssfftracks(db_handle=raw, seg_list=seg_list,
                       f0col='praatF0', f0dep='H1H2c', 
                       fncol=c('praatF1', 'praatF2', 'praatF3'),
                       fndep=list(c('H1A1c', 'F1'), c('H1A3c', 'F3')),
                       speaker='session', group_var='session',
                       timing_rm=list('cl', 250), outlier_rm='eggF0',
                       verbose=FALSE)
#> [1] "Initial number of NAs in F0 track: 114"
#> [1] "Number of NAs removed from F0 track during automated outlier removal: 0"
#> [1] "Number of NAs removed from H1H2c track during automated outlier removal: 114"
#> [1] "Number of NAs removed from F1 track during automated outlier removal: 0"
#> [1] "Number of NAs removed from F2 track during automated outlier removal: 1"
#> [1] "Number of NAs removed from F3 track during automated outlier removal: 6"
#> [1] "Number of NAs removed from H1A1c track during automated outlier removal: 114"
#> [1] "Number of NAs removed from H1A3c track during automated outlier removal: 114"
#> [1] "Number of NAs removed from eggF0 track during automated outlier removal: 26"
y
#> # A tibble: 2,632 × 61
#>    sl_rowIdx labels start   end db_uuid     session bundle start…¹ end_i…² level
#>        <int> <chr>  <dbl> <dbl> <chr>       <chr>   <chr>    <int>   <int> <chr>
#>  1         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  2         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  3         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  4         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  5         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  6         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  7         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  8         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#>  9         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#> 10         1 op      715.  929. 33157d9f-4… f1      F1-00…       5       5 ORL  
#> # … with 2,622 more rows, 51 more variables: attribute <chr>,
#> #   start_item_seq_idx <int>, end_item_seq_idx <int>, type <chr>,
#> #   sample_start <int>, sample_end <int>, sample_rate <int>, times_orig <dbl>,
#> #   times_rel <dbl>, times_norm <dbl>, eggF0 <dbl>, H1H2c <dbl>, H1A1c <dbl>,
#> #   H1A3c <dbl>, CPP <dbl>, CQ_PH <dbl>, CQ_PD <dbl>, F0 <dbl>, uppF0 <dbl>,
#> #   lowF0 <dbl>, zF0 <dbl>, normF0 <dbl>, zH1H2c <dbl>, normH1H2c <dbl>,
#> #   F1 <dbl>, uppF1 <dbl>, lowF1 <dbl>, zF1 <dbl>, normF1 <dbl>, F2 <dbl>, …
dplyr::glimpse(y)
#> Rows: 2,632
#> Columns: 61
#> $ sl_rowIdx          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ labels             <chr> "op", "op", "op", "op", "op", "op", "op", "op", "op…
#> $ start              <dbl> 714.5465, 714.5465, 714.5465, 714.5465, 714.5465, 7…
#> $ end                <dbl> 928.9229, 928.9229, 928.9229, 928.9229, 928.9229, 9…
#> $ db_uuid            <chr> "33157d9f-4a3a-468a-882b-60d3b10ea771", "33157d9f-4…
#> $ session            <chr> "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1…
#> $ bundle             <chr> "F1-0002-car-rep1-Naam-37", "F1-0002-car-rep1-Naam-…
#> $ start_item_id      <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ end_item_id        <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ level              <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ attribute          <chr> "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "ORL", "O…
#> $ start_item_seq_idx <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
#> $ end_item_seq_idx   <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
#> $ type               <chr> "SEGMENT", "SEGMENT", "SEGMENT", "SEGMENT", "SEGMEN…
#> $ sample_start       <int> 31512, 31512, 31512, 31512, 31512, 31512, 31512, 31…
#> $ sample_end         <int> 40965, 40965, 40965, 40965, 40965, 40965, 40965, 40…
#> $ sample_rate        <int> 44100, 44100, 44100, 44100, 44100, 44100, 44100, 44…
#> $ times_orig         <dbl> 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 7…
#> $ times_rel          <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
#> $ times_norm         <dbl> 0.000000000, 0.004694836, 0.009389671, 0.014084507,…
#> $ eggF0              <dbl> 215.0127, 215.0127, 215.0127, 218.6272, 218.6272, 2…
#> $ H1H2c              <dbl> 15.851, 16.093, 16.287, 16.513, 16.633, 16.495, 16.…
#> $ H1A1c              <dbl> 19.376, 19.110, 18.868, 18.655, 18.433, 18.122, 17.…
#> $ H1A3c              <dbl> 22.017, 21.680, 21.623, 21.852, 21.368, 20.545, 20.…
#> $ CPP                <dbl> 19.499, 20.861, 24.094, 23.843, 25.368, 24.960, 25.…
#> $ CQ_PH              <dbl> 0.7078310, 0.7078310, 0.7078310, 0.7105345, 0.71053…
#> $ CQ_PD              <dbl> 0.7461344, 0.7461344, 0.7461344, 0.7482461, 0.74824…
#> $ F0                 <dbl> 216.523, 216.423, 216.719, 217.147, 217.576, 218.00…
#> $ uppF0              <dbl> 294.5784, 294.5784, 294.5784, 294.5784, 294.5784, 2…
#> $ lowF0              <dbl> 123.5667, 123.5667, 123.5667, 123.5667, 123.5667, 1…
#> $ zF0                <dbl> 0.2614007, 0.2578924, 0.2682773, 0.2832942, 0.29834…
#> $ normF0             <dbl> 177.3753, 177.2133, 177.6928, 178.3861, 179.0810, 1…
#> $ zH1H2c             <dbl> 1.738322, 1.796663, 1.843433, 1.897917, 1.926846, 1…
#> $ normH1H2c          <dbl> 13.95344, 14.31125, 14.59808, 14.93223, 15.10965, 1…
#> $ F1                 <dbl> 788.254, 788.753, 789.251, 789.750, 790.601, 792.51…
#> $ uppF1              <dbl> 1298.965, 1298.965, 1298.965, 1298.965, 1298.965, 1…
#> $ lowF1              <dbl> 351.1913, 351.1913, 351.1913, 351.1913, 351.1913, 3…
#> $ zF1                <dbl> -0.233120763, -0.229962020, -0.226809459, -0.223650…
#> $ normF1             <dbl> 745.3951, 745.9944, 746.5925, 747.1918, 748.2139, 7…
#> $ F2                 <dbl> 1678.267, 1670.355, 1662.443, 1654.532, 1648.756, 1…
#> $ uppF2              <dbl> 2479.633, 2479.633, 2479.633, 2479.633, 2479.633, 2…
#> $ lowF2              <dbl> 683.3898, 683.3898, 683.3898, 683.3898, 683.3898, 6…
#> $ zF2                <dbl> 0.3231926, 0.2967641, 0.2703357, 0.2439105, 0.22461…
#> $ normF2             <dbl> 1515.805, 1506.660, 1497.515, 1488.371, 1481.694, 1…
#> $ F3                 <dbl> 3585.059, 3585.030, 3585.001, 3584.972, 3582.238, 3…
#> $ uppF3              <dbl> 3842.108, 3842.108, 3842.108, 3842.108, 3842.108, 3…
#> $ lowF3              <dbl> 2429.872, 2429.872, 2429.872, 2429.872, 2429.872, 2…
#> $ zF3                <dbl> 1.9843558, 1.9842259, 1.9840961, 1.9839662, 1.97174…
#> $ normF3             <dbl> 3868.009, 3867.929, 3867.848, 3867.767, 3860.190, 3…
#> $ zH1A1c             <dbl> 0.5591552574, 0.4962194196, 0.4389616163, 0.3885655…
#> $ normH1A1c          <dbl> 16.50144, 16.07416, 15.68542, 15.34326, 14.98665, 1…
#> $ zH1A3c             <dbl> 0.68503828, 0.63057184, 0.62135923, 0.65837058, 0.5…
#> $ normH1A3c          <dbl> 18.53489, 17.99424, 17.90279, 18.27017, 17.49369, 1…
#> $ uppeggF0           <dbl> 348.6988, 348.6988, 348.6988, 348.6988, 348.6988, 3…
#> $ loweggF0           <dbl> 43.39191, 43.39191, 43.39191, 43.39191, 43.39191, 4…
#> $ zCPP               <dbl> -0.228769020, 0.007061036, 0.566854573, 0.523394048…
#> $ normCPP            <dbl> 20.37302, 21.71946, 24.91550, 24.66737, 26.17494, 2…
#> $ zCQ_PH             <dbl> 0.4689086, 0.4689086, 0.4689086, 0.4873716, 0.48737…
#> $ normCQ_PH          <dbl> 0.6190917, 0.6190917, 0.6190917, 0.6217139, 0.62171…
#> $ zCQ_PD             <dbl> 0.5354962, 0.5354962, 0.5354962, 0.5508538, 0.55085…
#> $ normCQ_PD          <dbl> 0.6558349, 0.6558349, 0.6558349, 0.6579906, 0.65799…

Notice that the praatF0, praatF1 etc. columns have been renamed to F0, F1 etc. Notice also that each SSFF track has a corresponding column with z-score normalized values (e.g. zF1) and a corresponding column where these normalized values have been rescaled based on the overall mean and standard deviation of the data (e.g. normF1).
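
The relationship between the z-scored and rescaled columns can be sketched in base R. This is a minimal illustration on made-up data, assuming that rescaling means multiplying the by-speaker z-scores by the overall standard deviation and adding the overall mean; it is not the package's own code, and the data frame d is hypothetical.

```r
# Toy data (made up): two speakers with different F0 ranges
d <- data.frame(
  speaker = rep(c("f1", "m1"), each = 5),
  F0 = c(200, 210, 220, 230, 240, 100, 105, 110, 115, 120)
)

# z-score within speaker: subtract the speaker's mean, divide by the speaker's SD
d$zF0 <- ave(d$F0, d$speaker,
             FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))

# Rescale the z-scores using the mean and SD of the pooled data,
# which puts the values back on a Hz-like scale
d$normF0 <- d$zF0 * sd(d$F0, na.rm = TRUE) + mean(d$F0, na.rm = TRUE)

d
```

In this toy example both speakers have the same contour shape, so they end up with identical normF0 values even though their raw F0 ranges differ.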

import_ssfftracks() is very dependent on emuR and EMU-SDMS, but it incorporates several independent functions which can in principle be used on raw data generated with other software: f0_proc() for processing F0 and dependencies, fn_proc() for processing formants and dependencies, outlier_rm() for automated removal of outliers, and normz() for z-score normalizing and rescaling by speaker. The syntax of these functions is similar to import_ssfftracks().
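
For data that does not live in an EMU database, the logic of the F0 and outlier steps described earlier can be sketched in base R. This is only an illustration on made-up data, not a call to the package's functions (whose exact arguments are documented in their help files). The order of the steps and the data frame d are assumptions.

```r
# Toy data (made up): two speakers, 20 frames each
d <- data.frame(
  speaker = rep(c("f1", "m1"), each = 20),
  F0 = c(0, rep(c(198, 200, 202), 6), 400,    # one zero and one spike
         110, 0, rep(c(108, 110, 112), 6)),   # one zero
  H1H2c = seq(5, by = 0.1, length.out = 40)
)

# 1. F0 values of 0 are recoded as NA
n_zero <- sum(d$F0 == 0)
d$F0[d$F0 == 0] <- NA

# 2. Per-speaker bounds at three standard deviations from the speaker mean
grp_mean <- ave(d$F0, d$speaker, FUN = function(v) mean(v, na.rm = TRUE))
grp_sd <- ave(d$F0, d$speaker, FUN = function(v) sd(v, na.rm = TRUE))
d$uppF0 <- grp_mean + 3 * grp_sd
d$lowF0 <- grp_mean - 3 * grp_sd

# 3. Values outside the bounds are treated as outliers and set to NA
is_out <- !is.na(d$F0) & (d$F0 > d$uppF0 | d$F0 < d$lowF0)
d$F0[is_out] <- NA

# 4. A track that depends on F0 is set to NA wherever F0 is NA
d$H1H2c[is.na(d$F0)] <- NA

message("Zeros recoded: ", n_zero, "; outliers removed: ", sum(is_out))
```

Rows are kept and only the affected values are set to NA, so the time series stays aligned across tracks.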

Adding PraatSauce output to EMU database

If the output of PraatSauce is loaded into R, it will look roughly like this:

dplyr::glimpse(ps)
#> Rows: 3,546
#> Columns: 14
#> $ Filename  <chr> "F1-0002-car-rep1-Naam-37", "F1-0002-car-rep1-Naam-37", "F1-…
#> $ session   <chr> "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1", "f1", …
#> $ seg_Start <dbl> 0.5919388, 0.5919388, 0.5919388, 0.5919388, 0.5919388, 0.591…
#> $ seg_End   <dbl> 0.7145465, 0.7145465, 0.7145465, 0.7145465, 0.7145465, 0.714…
#> $ t         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
#> $ t_ms      <dbl> 0.5929388, 0.5939388, 0.5949388, 0.5959388, 0.5969388, 0.597…
#> $ f0        <dbl> 184.974, 184.847, 184.720, 184.593, 184.466, 184.444, 184.45…
#> $ F1        <dbl> 658.327, 650.616, 640.280, 622.049, 603.819, 585.588, 567.35…
#> $ F2        <dbl> 1127.051, 1114.348, 1103.061, 1096.036, 1089.011, 1081.986, …
#> $ F3        <dbl> 3382.894, 3396.817, 3392.458, 3333.087, 3273.716, 3214.345, …
#> $ CPP       <dbl> 18.038, 19.296, 19.707, 19.974, 20.051, 21.084, 20.277, 19.0…
#> $ H1H2c     <dbl> 8.541, 8.694, 8.880, 9.044, 9.029, 8.866, 8.593, 8.272, 7.90…
#> $ H1A1c     <dbl> 16.492, 16.159, 15.834, 15.492, 15.206, 15.038, 14.897, 14.8…
#> $ H1A3c     <dbl> 7.896, 7.166, 5.750, 3.910, 2.540, 1.576, 4.947, 5.527, 8.22…

In order to add this to an existing EMU database with no SSFF tracks, you can use praatsauce2ssff() like so:

datapath_ps <- system.file('extdata/ps', package='emuhelpeR')
ps_db <- emuR::load_emuDB(datapath_ps)
praatsauce2ssff(ps_output=ps, db_handle=ps_db, session_col='session')

Note that the session_col argument is only necessary if there are multiple sessions in the database.

Subsequently, you can have a look at the SSFF tracks, such as the F0 track, in EMU by running e.g. the following:

sco <- emuR::get_signalCanvasesOrder(ps_db, 'default')
emuR::set_signalCanvasesOrder(ps_db, 'default', c(sco, 'f0'))
emuR::serve(ps_db)

Installation

You can install the development version of emuhelpeR from GitHub with:

#install.packages("devtools")
devtools::install_github("rpuggaardrode/emuhelpeR")

References

Kirby, James, Marc Brunelle & Pittayawat Pittayaporn (2023) Transphonologization of onset voicing: Revisiting Northern and Eastern Kmhmu. Phonetica. DOI: 10.1515/phon-2022-0029.
