New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ports/esp32: Add PDM RX feature to I2S. #14176
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #14176 +/- ##
=======================================
Coverage 98.39% 98.39%
=======================================
Files 161 161
Lines 21200 21200
=======================================
Hits 20860 20860
Misses 340 340 ☔ View full report in Codecov by Sentry. |
Code size report:
|
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
c4bd1a5
to
ad6e376
Compare
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
Signed-off-by: maple "mavica" syrup <maple@maple.pet>
ad6e376
to
e2d9925
Compare
This is some great work to get PDM microphones working on the ESP32. There are a number of products that will benefit from PDM protocol support, for example, the T-Watch 2020 V3. One suggestion is to revisit the MicroPython API for supporting the PDM protocol. At the moment, this PR has the PDM protocol being added to the machine.I2S class. But, I2S and PDM are very different audio protocols. Adding the PDM protocol to the existing I2S class maybe not the best from an architecture and user comprehension viewpoint. To me, it would be better to support the PDM protocol with a separate machine.PDM class. That would allow a PDM interface in Micropython to exactly match the PDM protocol. This would involve adding an CircuitPython takes this suggested approach. |
after reading the extmod/machine_i2s file and everywhere it's called, i fear making a whole new extmod might be a bit of a daunting task for me. i don't wholly disagree that PDM could have a better calling convention than i implemented, but if a PDM class is going to call on machine_i2s methods anyway, would that be the best course of action code-wise? and if deemed really necessary, are there pointers on how i would even begin with that? i'm not familiar with the code structure, i was able to make my PR because the code changes weren't too massive. i'd really like to see PDM support as i have a device i currently can't fully use without hacking up micropython, and i'd like to do it in a way that i don't have to distribute my own firmware to others who want to run my micropython code |
This feature looks great @Lana-chan ! |
Yes, it should automatically be included in any device ESP-IDF defines The I2S mic recording examples work with only the modification of the mode to |
I got around to testing this now. Using a M5Stack AtomS3U. Has SPM1432 microphone, should be same as the M5 Cardputer. However, with the standard parameters, my voice was definitely higher pitch than normally. Checking with a 440 Hz sinewave it was indeed showing up as 880 Hz in the recorded .wav file. To get normally pitched sounds, I had to pass rate as 2x the specified samplerate. That does not seem entirely correct? Perhaps the PDM clock rate setup is off by a factor 2? audio_in = I2S(
...
mode=I2S.PDM_RX,
rate=SAMPLE_RATE_IN_HZ*2,
...
) |
I am also observing some noise or discontinuities. They seem to be related to the size of buffers being read, However this might be due to the internal flash writing, or something else that is not PDM specific. The non-blocking seems a bit worse than the blocking. Unfortunately, I do not have any I2S audio inputs to compare with. |
yes i noticed this too, and i'm not entirely sure my understanding of PDM is correct but i think this has to do with the downsampling necessary for the conversion to PCM: https://github.com/micropython/micropython/pull/14176/files#diff-e507a794c1ae2e2e69bf262f93220fd9d17d2f8f5245e778d53dc941a06011e9R433 with ESP-IDF by itself others have noticed that you have to use 2x the desired sample rate, so this is understood to me as a fact of the firmware itself: espressif/esp-idf#8660 (comment) at any rate my implementation simply calls on ESP-IDF's implementation. personally i don't think it should be up to MicroPython to automatically double the rate behind the scenes
i can confirm this as well having tried to use it non-blocking, however i don't think there's much that can be done if the program cycle simply cannot read the buffers fast enough for continuous sound capture. i'm not sure a non-PDM input device would function any differently but i also don't have others on hand to test |
I believe this variable
In the linked issue, they say that this issue has been fixed in a commit in 4.4.x series. This fix should also be in esp-idf 5.x, which MicroPython is currently using? |
i thought the issue mentioned being fixed was the amplitude issue, the frequency was an unrelated comment to the issue... this is also complicated in that I2S was completely overhauled and incompatible between 4.4 and 5.x, so i can't say for certain the bug isn't present in 5.x. this is the relevant commit: espressif/esp-idf@53a5d51 if you look at the latest 5.x branch, the HAL files for I2S are completely different in this regard a stereo/mono disparity in recording would explain an exact 2x sample rate. the downsampling modes have multipliers different than 2x (enum names are "8" and "16", effect described in docs is 64 and 128 respectively), i don't believe they're what's causing the issue. the part you are mentioning in the documentation seems to indicate those are the multipliers for the sample rate to the WS clock (WS clock will be downsample factor times desired sample rate) and the downsampling brings it back to the sample rate. to micropython, this should be all invisible. |
Thank you for confirming this issue. I have ordered some I2S microphones now, hopefully will be able to test that next week. |
I now modified the machine_i2s.c to support an oversample parameter of 64/128. It makes no difference to the frequency doubling issue. So esp-idf correctly does the appropriate combination of clock configuration and downsampling. I think it would be a good idea to expose the PDM oversampling. The PDM clock controls the power modes of the microphone (which also influences acoustical performance). Especially, it is critical to avoid a clock that is out-of-spec or borderline with the power states. EDIT: for reference, here is the code I added for PDM oversample parameter, jonnor@4c1f034 |
Regarding the intermittent noises or apparent discontinuities. I did a new test now where I recorded into RAM, and then wrote the WAV file afterward. This gave a very clean signal, no apparent periodic noises. # Based on https://github.com/miketeachman/micropython-i2s-examples/blob/master/examples/record_mic_to_sdcard_blocking.py
# The MIT License (MIT)
# Copyright (c) 2022 Mike Teachman
# https://opensource.org/licenses/MIT
import os
from machine import Pin
from machine import I2S
SCK_PIN = 39
#WS_PIN = 25
SD_PIN = 38
I2S_ID = 0
BUFFER_LENGTH_IN_BYTES = 40000
# ======= AUDIO CONFIGURATION =======
WAV_FILE = "mic.wav"
RECORD_TIME_IN_SECONDS = 4
WAV_SAMPLE_SIZE_IN_BITS = 16
FORMAT = I2S.MONO
SAMPLE_RATE_IN_HZ = 16000
# ======= AUDIO CONFIGURATION =======
format_to_channels = {I2S.MONO: 1, I2S.STEREO: 2}
NUM_CHANNELS = format_to_channels[FORMAT]
WAV_SAMPLE_SIZE_IN_BYTES = WAV_SAMPLE_SIZE_IN_BITS // 8
RECORDING_SIZE_IN_BYTES = (
RECORD_TIME_IN_SECONDS * SAMPLE_RATE_IN_HZ * WAV_SAMPLE_SIZE_IN_BYTES * NUM_CHANNELS
)
def create_wav_header(sampleRate, bitsPerSample, num_channels, num_samples):
datasize = num_samples * num_channels * bitsPerSample // 8
o = bytes("RIFF", "ascii") # (4byte) Marks file as RIFF
o += (datasize + 36).to_bytes(
4, "little"
) # (4byte) File size in bytes excluding this and RIFF marker
o += bytes("WAVE", "ascii") # (4byte) File type
o += bytes("fmt ", "ascii") # (4byte) Format Chunk Marker
o += (16).to_bytes(4, "little") # (4byte) Length of above format data
o += (1).to_bytes(2, "little") # (2byte) Format type (1 - PCM)
o += (num_channels).to_bytes(2, "little") # (2byte)
o += (sampleRate).to_bytes(4, "little") # (4byte)
o += (sampleRate * num_channels * bitsPerSample // 8).to_bytes(4, "little") # (4byte)
o += (num_channels * bitsPerSample // 8).to_bytes(2, "little") # (2byte)
o += (bitsPerSample).to_bytes(2, "little") # (2byte)
o += bytes("data", "ascii") # (4byte) Data Chunk Marker
o += (datasize).to_bytes(4, "little") # (4byte) Data size in bytes
return o
audio_in = I2S(
I2S_ID,
sck=Pin(SCK_PIN),
#ws=Pin(WS_PIN),
sd=Pin(SD_PIN),
mode=I2S.PDM_RX,
bits=WAV_SAMPLE_SIZE_IN_BITS,
format=FORMAT,
rate=SAMPLE_RATE_IN_HZ*2,
ibuf=BUFFER_LENGTH_IN_BYTES,
)
# allocate sample arrays
# memoryview used to reduce heap allocation in while loop
mic_samples = bytearray(10000)
mic_samples_mv = memoryview(mic_samples)
recording_buffer = bytearray(RECORDING_SIZE_IN_BYTES)
bytes_received = 0
print("Recording size: {} bytes".format(RECORDING_SIZE_IN_BYTES))
print("========== START RECORDING ==========")
try:
while bytes_received < RECORDING_SIZE_IN_BYTES:
# read a block of samples from the I2S microphone
bytes_read = audio_in.readinto(mic_samples_mv)
if bytes_read > 0:
bytes_to_write = min(
bytes_read, RECORDING_SIZE_IN_BYTES - bytes_received
)
recording_buffer[bytes_received:bytes_received+bytes_to_write] = mic_samples_mv[0:bytes_to_write]
print('FILL', bytes_received, bytes_to_write)
bytes_received += bytes_read
print("========== DONE RECORDING ==========")
except (KeyboardInterrupt, Exception) as e:
print("caught exception {} {}".format(type(e).__name__, e))
# Write to WAV
wav = open(WAV_FILE, "wb")
# create header for WAV file and write to SD card
wav_header = create_wav_header(
SAMPLE_RATE_IN_HZ,
WAV_SAMPLE_SIZE_IN_BITS,
NUM_CHANNELS,
SAMPLE_RATE_IN_HZ * RECORD_TIME_IN_SECONDS,
)
wav.write(wav_header)
# write samples to WAV file
wav.write(recording_buffer)
# cleanup
wav.close()
print("Wrote ", WAV_FILE)
audio_in.deinit() |
I also did a test where non-blocking read was used, and then doing a 20 ms computation every 100 ms. The resulting audio was clean - great! Code: https://github.com/jonnor/embeddedml/blob/master/handson/micropython-esp32-pdm/record_nonblocking_memory.py So the noise and discontinuities does not seem to be a software issue (at least not a general one around buffering). Instead, it might be that writing to FLASH causes electromagnetic interference. Or that there are actually tiny sounds being created in the PCB near the microphone when the writing to FLASH (capacitors especially can exhibit slight expansion when exposed to a variable electric field - so-called microphonic effect). I have experienced both previously, on other boards and microcontrollers - it is not as far-fetched as it may sound. So from my point of view, this code is already at a very useful state, and I would love to see it in mainline MicroPython. Ideally. the oddity about double frequency would be resolved. I would be in favor of doing a *2 multiplication inside the machine_i2s for ESP32, so that it correct at least on the API surface. |
It's great to see the demonstration of the PDM protocol working on the ESP32 port of MicroPython. This protocol would be an excellent addition to MicroPython. At this time, the PR demonstrates a PDM protocol proof-of-concept, by making a quick-fix adaptation to the There is also the consideration of the other ports. Would the proposal to have separate machine classes add more work to the PR? Most definitely. The ESP32 I2S code will likely need to be refactored, to create a common base that is used by both I've found that getting a prototype to work is often only 10% of the overall effort. The other 90% involves getting into the software engineering to design a great interface and a solid implementation. MicroPython shows this principle. It can be a lot work, but it yields a better user experience and a code base that can be supported in the long term. |
regarding the doubled sample rate, i found out i was doing something wrong in setting the mono slot config on the ESP-IDF mask, MicroPython interestingly expects you to record stereo no matter which mode you actually want, and in mono it discards half the stream internally before exposing it to the Python side. however, despite it being supposedly a valid mode in the ESP-IDF docs, i can't seem to get a regarding exposing the PDM downsampling mode, and for that matter, Mike's comment above, this PR is simply to expose the ESP32-only (and for that matter, only in select ESP32 boards) PDM to PCM hardware. I am not implementing PDM to PCM. if that is not the proper implementation, then we will simply have hardware in the ESP32 board that cannot be used for MicroPython. i don't have the capacity to make the "proper" interface for MicroPython at this time and i hope that someone better than me can step up to finally implement PDM in MicroPython one day. |
@Lana-chan those are good findings. If you are able to debug a bit more and figure out a working approach, then I should be able to help out in refactoring into a PDM module. At least for ESP32, which I am currently using a lot. And I can do a verification of API design wrt STM32 DFSDM (which I have worked with before) and RP2 (where someone has made a merge request earlier, #8016). |
hello! i hope this PR meets the code conventions guidelines. i've tried running the code auto-formatter but it tried to modify some 50 unrelated files instead. please let me know what i can change if this PR is unacceptable.
this PR adds the capability to grab samples from PDM devices (such as certain MEMS microphones) to the ESP32 port using the new ESP-IDF PDM driver. i have tested this on an M5 Cardputer which has a SPM1423 microphone built-in. i have tested that other I2S functions are unaffected (the cardputer also has a NS4168 speaker i have been using). i have also confirmed that other ports which use machine.I2S (rp2, mimxrt, stm32) have compilation unaffected, however i do not have those boards to test physically.