Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ports/esp32: Add PDM RX feature to I2S. #14176

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Lana-chan
Copy link

hello! i hope this PR meets the code conventions guidelines. i've tried running the code auto-formatter but it tried to modify some 50 unrelated files instead. please let me know what i can change if this PR is unacceptable.

this PR adds the capability to grab samples from PDM devices (such as certain MEMS microphones) to the ESP32 port using the new ESP-IDF PDM driver. i have tested this on an M5 Cardputer which has a SPM1423 microphone built-in. i have tested that other I2S functions are unaffected (the cardputer also has a NS4168 speaker i have been using). i have also confirmed that other ports which use machine.I2S (rp2, mimxrt, stm32) have compilation unaffected, however i do not have those boards to test physically.

Copy link

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.39%. Comparing base (51d05c4) to head (e2d9925).
Report is 46 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14176   +/-   ##
=======================================
  Coverage   98.39%   98.39%           
=======================================
  Files         161      161           
  Lines       21200    21200           
=======================================
  Hits        20860    20860           
  Misses        340      340           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Signed-off-by: maple "mavica" syrup <maple@maple.pet>
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
@miketeachman
Copy link
Sponsor Contributor

miketeachman commented Mar 26, 2024

This is some great work to get PDM microphones working on the ESP32. There are a number of products that will benefit from PDM protocol support, for example, the T-Watch 2020 V3.

One suggestion is to revisit the MicroPython API for supporting the PDM protocol. At the moment, this PR has the PDM protocol being added to the machine.I2S class. But, I2S and PDM are very different audio protocols. Adding the PDM protocol to the existing I2S class maybe not the best from an architecture and user comprehension viewpoint. To me, it would be better to support the PDM protocol with a separate machine.PDM class. That would allow a PDM interface in Micropython to exactly match the PDM protocol. This would involve adding an extmod/machine_pdm.c implementation to define the machine.PDM class API. You can still use and modify the functions in machine_i2s.c. Some refactoring might be needed and I can imagine a header file or two is needed. This is extra work, but it will improve the overall experience in Micropython wrt to these two protocols.

CircuitPython takes this suggested approach. audiobusio.PDMIn() for PDM microphone support and audiobusio.I2SOut() for I2S amplifier support.

@Lana-chan
Copy link
Author

after reading the extmod/machine_i2s file and everywhere it's called, i fear making a whole new extmod might be a bit of a daunting task for me. i don't wholly disagree that PDM could have a better calling convention than i implemented, but if a PDM class is going to call on machine_i2s methods anyway, would that be the best course of action code-wise?

and if deemed really necessary, are there pointers on how i would even begin with that? i'm not familiar with the code structure, i was able to make my PR because the code changes weren't too massive.

i'd really like to see PDM support as i have a device i currently can't fully use without hacking up micropython, and i'd like to do it in a way that i don't have to distribute my own firmware to others who want to run my micropython code

@jonnor
Copy link
Contributor

jonnor commented Mar 27, 2024

This feature looks great @Lana-chan !
I am very interested to test it on one of my ESP32 devices which have PDM microphones. I have at least a T-Watch 2020 V3, M5Stack AtomS3U and TTGO T-Camera Mic.
Is the feature enabled by default (on this branch), if I build a generic ESP32 (or ESP32 S3) image for these boards? Or do I need to do anything in particular to enable it?

@Lana-chan
Copy link
Author

Yes, it should automatically be included in any device ESP-IDF defines SOC_I2S_SUPPORTS_PDM_RX for. S3 is one of them.

The I2S mic recording examples work with only the modification of the mode to PDM_RX and setting the correct pins.

@jonnor
Copy link
Contributor

jonnor commented Apr 28, 2024

I got around to testing this now. Using a M5Stack AtomS3U. Has SPM1432 microphone, should be same as the M5 Cardputer.
I used the nonblocking record example as a starting point. From https://github.com/miketeachman/micropython-i2s-examples/blob/master/examples/record_mic_to_sdcard_blocking.py
And updated the pinouts for this device (note: official M5Stack docs have the pinout wrapped, schematic is correct). But as there is no SD card, so I recorded to the internal FLASH.

However, with the standard parameters, my voice was definitely higher pitch than normally. Checking with a 440 Hz sinewave it was indeed showing up as 880 Hz in the recorded .wav file. To get normally pitched sounds, I had to pass rate as 2x the specified samplerate. That does not seem entirely correct? Perhaps the PDM clock rate setup is off by a factor 2?

audio_in = I2S(
 ...
    mode=I2S.PDM_RX,
    rate=SAMPLE_RATE_IN_HZ*2,
...
)

@jonnor
Copy link
Contributor

jonnor commented Apr 28, 2024

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking. Unfortunately, I do not have any I2S audio inputs to compare with.

@Lana-chan
Copy link
Author

Lana-chan commented Apr 28, 2024

However, with the standard parameters, my voice was definitely higher pitch than normally. Checking with a 440 Hz sinewave it was indeed showing up as 880 Hz in the recorded .wav file. To get normally pitched sounds, I had to pass rate as 2x the specified samplerate. That does not seem entirely correct? Perhaps the PDM clock rate setup is off by a factor 2?

yes i noticed this too, and i'm not entirely sure my understanding of PDM is correct but i think this has to do with the downsampling necessary for the conversion to PCM: https://github.com/micropython/micropython/pull/14176/files#diff-e507a794c1ae2e2e69bf262f93220fd9d17d2f8f5245e778d53dc941a06011e9R433

with ESP-IDF by itself others have noticed that you have to use 2x the desired sample rate, so this is understood to me as a fact of the firmware itself: espressif/esp-idf#8660 (comment)

at any rate my implementation simply calls on ESP-IDF's implementation. personally i don't think it should be up to MicroPython to automatically double the rate behind the scenes

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking.

i can confirm this as well having tried to use it non-blocking, however i don't think there's much that can be done if the program cycle simply cannot read the buffers fast enough for continuous sound capture. i'm not sure a non-PDM input device would function any differently but i also don't have others on hand to test

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

yes i noticed this too, and i'm not entirely sure my understanding of PDM is correct but i think this has to do with the downsampling necessary for the conversion to PCM: https://github.com/micropython/micropython/pull/14176/files#diff-e507a794c1ae2e2e69bf262f93220fd9d17d2f8f5245e778d53dc941a06011e9R433

I believe this variable pdm_cfg.clk_cfg.dn_sample_mode is highly relevant to the frequency-doubling issue. This sets the PDM oversampling factor (which determines the PDM clock given to the mic, with pdm_clock=pcm_samplerate*oversampling ). According to the esp-idf I2S documentation, can take two values, for either 64x or 128x oversampling. These are the two most common values (by far) for PDM microphones, and both are commonly used.

with ESP-IDF by itself others have noticed that you have to use 2x the desired sample rate, so this is understood to me as a fact of the firmware itself: espressif/esp-idf#8660 (comment)

In the linked issue, they say that this issue has been fixed in a commit in 4.4.x series. This fix should also be in esp-idf 5.x, which MicroPython is currently using?

@Lana-chan
Copy link
Author

Lana-chan commented May 1, 2024

i thought the issue mentioned being fixed was the amplitude issue, the frequency was an unrelated comment to the issue... this is also complicated in that I2S was completely overhauled and incompatible between 4.4 and 5.x, so i can't say for certain the bug isn't present in 5.x. this is the relevant commit: espressif/esp-idf@53a5d51

if you look at the latest 5.x branch, the HAL files for I2S are completely different in this regard

a stereo/mono disparity in recording would explain an exact 2x sample rate. the downsampling modes have multipliers different than 2x (enum names are "8" and "16", effect described in docs is 64 and 128 respectively), i don't believe they're what's causing the issue. the part you are mentioning in the documentation seems to indicate those are the multipliers for the sample rate to the WS clock (WS clock will be downsample factor times desired sample rate) and the downsampling brings it back to the sample rate. to micropython, this should be all invisible.

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking.

i can confirm this as well having tried to use it non-blocking, however i don't think there's much that can be done if the program cycle simply cannot read the buffers fast enough for continuous sound capture. i'm not sure a non-PDM input device would function any differently but i also don't have others on hand to test

Thank you for confirming this issue. I have ordered some I2S microphones now, hopefully will be able to test that next week.
The application is able to process the buffers fast enough (on average) - to my knowledge. I do not see any indications the buffers overflowed (though I am not entirely sure how to check)? But maybe @miketeachman or others with more experience can give some input on how to debug.

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I now modified the machine_i2s.c to support an oversample parameter of 64/128. It makes no difference to the frequency doubling issue. So esp-idf correctly does the appropriate combination of clock configuration and downsampling.
Furthermore, I note that when I have the correct samplerate configured (that is without doing *2), I only get half as many samples as I should... Which might be a hint to what is actually going wrong here?

I think it would be a good idea to expose the PDM oversampling. The PDM clock controls the power modes of the microphone (which also influences acoustical performance). Especially, it is critical to avoid a clock that is out-of-spec or borderline with the power states.
For example 48 kHz * 128x = 6.144 Mhz is out-of-spec, so should use 64x instead.
The SPM1423 has a transition into sleep at 1 Mhz. So 16 kHz * 64x = 1024 khz oversampling should not be used (instead 128x oversampling) to avoid risk of shutting down in case of slight clock drifts. I have spent months debugging such issues...
At 11.025 kHz * 64x that mic won't turn on, must use 128x. On another mic which support a low power state, one would prefer to use 64x for lower power consumption.

EDIT: for reference, here is the code I added for PDM oversample parameter, jonnor@4c1f034

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

Regarding the intermittent noises or apparent discontinuities. I did a new test now where I recorded into RAM, and then wrote the WAV file afterward. This gave a very clean signal, no apparent periodic noises.

# Based on https://github.com/miketeachman/micropython-i2s-examples/blob/master/examples/record_mic_to_sdcard_blocking.py
# The MIT License (MIT)
# Copyright (c) 2022 Mike Teachman
# https://opensource.org/licenses/MIT

import os
from machine import Pin
from machine import I2S

SCK_PIN = 39
#WS_PIN = 25
SD_PIN = 38
I2S_ID = 0
BUFFER_LENGTH_IN_BYTES = 40000

# ======= AUDIO CONFIGURATION =======
WAV_FILE = "mic.wav"
RECORD_TIME_IN_SECONDS = 4
WAV_SAMPLE_SIZE_IN_BITS = 16
FORMAT = I2S.MONO
SAMPLE_RATE_IN_HZ = 16000
# ======= AUDIO CONFIGURATION =======

format_to_channels = {I2S.MONO: 1, I2S.STEREO: 2}
NUM_CHANNELS = format_to_channels[FORMAT]
WAV_SAMPLE_SIZE_IN_BYTES = WAV_SAMPLE_SIZE_IN_BITS // 8
RECORDING_SIZE_IN_BYTES = (
    RECORD_TIME_IN_SECONDS * SAMPLE_RATE_IN_HZ * WAV_SAMPLE_SIZE_IN_BYTES * NUM_CHANNELS
)


def create_wav_header(sampleRate, bitsPerSample, num_channels, num_samples):
    datasize = num_samples * num_channels * bitsPerSample // 8
    o = bytes("RIFF", "ascii")  # (4byte) Marks file as RIFF
    o += (datasize + 36).to_bytes(
        4, "little"
    )  # (4byte) File size in bytes excluding this and RIFF marker
    o += bytes("WAVE", "ascii")  # (4byte) File type
    o += bytes("fmt ", "ascii")  # (4byte) Format Chunk Marker
    o += (16).to_bytes(4, "little")  # (4byte) Length of above format data
    o += (1).to_bytes(2, "little")  # (2byte) Format type (1 - PCM)
    o += (num_channels).to_bytes(2, "little")  # (2byte)
    o += (sampleRate).to_bytes(4, "little")  # (4byte)
    o += (sampleRate * num_channels * bitsPerSample // 8).to_bytes(4, "little")  # (4byte)
    o += (num_channels * bitsPerSample // 8).to_bytes(2, "little")  # (2byte)
    o += (bitsPerSample).to_bytes(2, "little")  # (2byte)
    o += bytes("data", "ascii")  # (4byte) Data Chunk Marker
    o += (datasize).to_bytes(4, "little")  # (4byte) Data size in bytes
    return o


audio_in = I2S(
    I2S_ID,
    sck=Pin(SCK_PIN),
    #ws=Pin(WS_PIN),
    sd=Pin(SD_PIN),
    mode=I2S.PDM_RX,
    bits=WAV_SAMPLE_SIZE_IN_BITS,
    format=FORMAT,
    rate=SAMPLE_RATE_IN_HZ*2,
    ibuf=BUFFER_LENGTH_IN_BYTES,
)

# allocate sample arrays
# memoryview used to reduce heap allocation in while loop
mic_samples = bytearray(10000)
mic_samples_mv = memoryview(mic_samples)


recording_buffer = bytearray(RECORDING_SIZE_IN_BYTES)
bytes_received = 0

print("Recording size: {} bytes".format(RECORDING_SIZE_IN_BYTES))
print("==========  START RECORDING ==========")
try:
    while bytes_received < RECORDING_SIZE_IN_BYTES:
        # read a block of samples from the I2S microphone
        bytes_read = audio_in.readinto(mic_samples_mv)
        if bytes_read > 0:
            bytes_to_write = min(
                bytes_read, RECORDING_SIZE_IN_BYTES - bytes_received
            )
            recording_buffer[bytes_received:bytes_received+bytes_to_write] = mic_samples_mv[0:bytes_to_write]
            print('FILL', bytes_received, bytes_to_write)
            bytes_received += bytes_read

    print("==========  DONE RECORDING ==========")
except (KeyboardInterrupt, Exception) as e:
    print("caught exception {} {}".format(type(e).__name__, e))


# Write to WAV
wav = open(WAV_FILE, "wb")

# create header for WAV file and write to SD card
wav_header = create_wav_header(
    SAMPLE_RATE_IN_HZ,
    WAV_SAMPLE_SIZE_IN_BITS,
    NUM_CHANNELS,
    SAMPLE_RATE_IN_HZ * RECORD_TIME_IN_SECONDS,
)
wav.write(wav_header)

# write samples to WAV file
wav.write(recording_buffer)

# cleanup
wav.close()
print("Wrote ", WAV_FILE)
audio_in.deinit()

@jonnor
Copy link
Contributor

jonnor commented May 1, 2024

I also did a test where non-blocking read was used, and then doing a 20 ms computation every 100 ms. The resulting audio was clean - great! Code: https://github.com/jonnor/embeddedml/blob/master/handson/micropython-esp32-pdm/record_nonblocking_memory.py

So the noise and discontinuities does not seem to be a software issue (at least not a general one around buffering). Instead, it might be that writing to FLASH causes electromagnetic interference. Or that there are actually tiny sounds being created in the PCB near the microphone when the writing to FLASH (capacitors especially can exhibit slight expansion when exposed to a variable electric field - so-called microphonic effect). I have experienced both previously, on other boards and microcontrollers - it is not as far-fetched as it may sound.

So from my point of view, this code is already at a very useful state, and I would love to see it in mainline MicroPython. Ideally. the oddity about double frequency would be resolved. I would be in favor of doing a *2 multiplication inside the machine_i2s for ESP32, so that it correct at least on the API surface.

@miketeachman
Copy link
Sponsor Contributor

miketeachman commented May 1, 2024

It's great to see the demonstration of the PDM protocol working on the ESP32 port of MicroPython. This protocol would be an excellent addition to MicroPython.

At this time, the PR demonstrates a PDM protocol proof-of-concept, by making a quick-fix adaptation to the machine.I2S class. This is an excellent first step to showing that it works. However, I strongly believe that more effort is needed before the PR is accepted into the mainline. To support the PDM protocol I recommend adding a separate machine.PDM class to MicroPython. This approach will give MicroPython users two easy-to-use and consistent APIs,machine.I2S and machine.PDM.

There is also the consideration of the other ports. machine.I2S is supported on 3 other ports: stm32, rp2, and mimxrt. Does the API proposed in this PR make sense for the PDM implementations offered by these ports?

Would the proposal to have separate machine classes add more work to the PR? Most definitely. The ESP32 I2S code will likely need to be refactored, to create a common base that is used by both machine.I2S and machine.PDM. This will involve a deep dive into the code to understand how it works and how to cleanly make a common base that supports both I2S and PDM.

I've found that getting a prototype to work is often only 10% of the overall effort. The other 90% involves getting into the software engineering to design a great interface and a solid implementation. MicroPython shows this principle. It can be a lot work, but it yields a better user experience and a code base that can be supported in the long term.

@Lana-chan
Copy link
Author

regarding the doubled sample rate, i found out i was doing something wrong in setting the mono slot config on the ESP-IDF mask, MicroPython interestingly expects you to record stereo no matter which mode you actually want, and in mono it discards half the stream internally before exposing it to the Python side. however, despite it being supposedly a valid mode in the ESP-IDF docs, i can't seem to get a I2S_SLOT_MODE_STEREO slot config to work, it simply triggers a ESP_ERR_INVALID_ARG. i'll try to troubleshoot this further on my own fork at another time, doubling the sample rate in MicroPython is not the correct approach.

regarding exposing the PDM downsampling mode, and for that matter, Mike's comment above, this PR is simply to expose the ESP32-only (and for that matter, only in select ESP32 boards) PDM to PCM hardware. I am not implementing PDM to PCM. if that is not the proper implementation, then we will simply have hardware in the ESP32 board that cannot be used for MicroPython. i don't have the capacity to make the "proper" interface for MicroPython at this time and i hope that someone better than me can step up to finally implement PDM in MicroPython one day.

@jonnor
Copy link
Contributor

jonnor commented May 6, 2024

@Lana-chan those are good findings. If you are able to debug a bit more and figure out a working approach, then I should be able to help out in refactoring into a PDM module. At least for ESP32, which I am currently using a lot. And I can do a verification of API design wrt STM32 DFSDM (which I have worked with before) and RP2 (where someone has made a merge request earlier, #8016).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants