
Audio buffer fix #47

Open
wants to merge 9 commits into
base: development


mainvolume

Audio buffer and format fix.

@BrunoBerisso BrunoBerisso changed the base branch from master to development June 7, 2018 08:44
@BrunoBerisso BrunoBerisso changed the base branch from development to master June 7, 2018 08:44
@BrunoBerisso
Contributor

Hey! @mainvolume Thank you very much for this PR :)

Could you please change the base so your changes are merged into the development branch? I tried to do it myself, but it ended up adding some old commits not related to your changes.

Also, it would be great if you could explain a little why this is needed. It's fairly clear from looking at the code, but for those not so familiar with it a short explanation would be really helpful.

Could it be the case that these changes fix #44 ?

Thanks again!

@mainvolume
Author

Sure thing!

@mainvolume mainvolume changed the base branch from master to development June 7, 2018 08:51
@mainvolume
Author

Regarding #44:

It could be, as the model sample rate has to be the same as the device's for actual decoding 😄 when streaming. I have not tested with a Bluetooth device, but I'm guessing that handling the device's audio settings becomes easier when the sample rate is not set to a static frequency and instead adapts to the input bus sample rate of the device.

This way, it also becomes possible to use the same decoder functionality on macOS.
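To illustrate the sample-rate point above: if the captured audio and the model end up at different rates, something has to bridge the gap. The PR does this by adapting to the input bus rate. The sketch below shows the other common option, resampling, purely for comparison. The function name `resampleLinear` is hypothetical and not part of TLSphinx, and plain linear interpolation applies no anti-aliasing filter, so treat it as a toy rather than production code.

```swift
import Foundation

/// Hypothetical helper (not part of TLSphinx): resamples mono samples from
/// `sourceRate` to `targetRate` using linear interpolation.
func resampleLinear(_ input: [Float], from sourceRate: Double, to targetRate: Double) -> [Float] {
    precondition(sourceRate > 0 && targetRate > 0, "sample rates must be positive")
    guard !input.isEmpty, sourceRate != targetRate else { return input }

    // How many input samples one output sample spans.
    let ratio = sourceRate / targetRate
    let outputCount = Int((Double(input.count) / ratio).rounded(.down))

    var output = [Float]()
    output.reserveCapacity(outputCount)
    for i in 0..<outputCount {
        let position = Double(i) * ratio
        // Clamp both neighbours so the last sample is held at the end.
        let lower = min(Int(position), input.count - 1)
        let upper = min(lower + 1, input.count - 1)
        let fraction = Float(position - Double(lower))
        output.append(input[lower] + (input[upper] - input[lower]) * fraction)
    }
    return output
}
```

Halving the rate keeps every second sample, and doubling it inserts midpoints between neighbours.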

@mainvolume
Author

Hi Bruno.

I also added a decode buffer function for already obtained buffers and other streams of audio, along with start and end utterance convenience functions.

🙂


import Foundation
import AVFoundation
import Sphinx

public let bufferSize = 16384
Contributor

Where does this value come from? Maybe add a comment about it?
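As a small worked example for the question above: the sketch below does not explain why 16384 was chosen, it only shows how much audio that many frames covers at a few common sample rates. The helper name `bufferDuration` is hypothetical. Note that 16384 is 2^14, and at 16 kHz it corresponds to 1.024 seconds of audio.

```swift
import Foundation

/// Hypothetical helper: seconds of audio covered by `frameCount` frames
/// at `sampleRate` Hz.
func bufferDuration(frameCount: Int, sampleRate: Double) -> Double {
    precondition(sampleRate > 0, "sample rate must be positive")
    return Double(frameCount) / sampleRate
}

// Buffer size taken from the diff under review.
let reviewedBufferSize = 16384

// Print the duration in milliseconds for a few common device rates.
for rate in [16_000.0, 44_100.0, 48_000.0] {
    let milliseconds = bufferDuration(frameCount: reviewedBufferSize, sampleRate: rate) * 1000
    print(String(format: "%.0f Hz -> %.1f ms", rate, milliseconds))
}
```

Because the duration depends on the rate, a fixed frame count means shorter chunks of audio on devices that run at higher sample rates.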

do {
try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryRecord)
try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayAndRecord, with: [.mixWithOthers, .allowBluetoothA2DP])
Contributor

Maybe we should add these as parameters to startDecodingSpeech? 🤔 Maybe there isn't a fixed set of settings for the audio session that works for everybody...

Author

Updated to this:

public func startDecodingSpeech(_ audioSessionCategoryOptions: AVAudioSessionCategoryOptions = [.mixWithOthers, .allowBluetoothA2DP], utteranceComplete: @escaping (Hypothesis?) -> ()) throws {

    do {
        try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayAndRecord, with: audioSessionCategoryOptions)
    } catch let error as NSError {
        print("Error setting the shared AVAudioSession: \(error)")
        throw DecodeErrors.CantSetAudioSession(error)
    }

@@ -248,7 +251,33 @@ public final class Decoder {
engine.stop()
engine = nil
}


public func startUtterence() {
Contributor

I think this and endUtterence shouldn't be public. My understanding is that we need them public because you should call startUtterence() before startDecodingBuffer, right?

Author

That is accurate. Do you mean we should make endUtterence private?

Updated audio session setting in decode speech function call
self.start_utt()
}

public func startDecodingBuffer(buffer: AVAudioPCMBuffer!, time: AVAudioTime!, utteranceComplete: @escaping (Hypothesis?)-> ()) throws {
Contributor

👏🏻👏🏻👏🏻 nice!
These will be really useful. How are you testing this?

Contributor

Also, was something wrong with the tabs? Haha

Author

The tabs... I'm editing on GitHub, as the codebase is at home and I'm at work right now. 😂

I haven't written any tests, but to bypass microphone usage for the thinking machine implementation, a synthesized continuous buffer is passed to the function, which works quite sweetly.

The function is based on the streaming function, but with the option of creating the buffer before passing it to the function instead of using the tap inside the function.
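A rough sketch of what "a synthesized continuous buffer" could look like, assuming 16-bit mono PCM and a plain sine tone. The names `synthesizeTone` and `chunked` are hypothetical, and wrapping each chunk in an `AVAudioPCMBuffer` before calling `startDecodingBuffer` is deliberately left out here.

```swift
import Foundation

/// Hypothetical helper: generates `frameCount` samples of a sine tone as
/// 16-bit mono PCM. `amplitude` is a fraction of full scale (0...1).
func synthesizeTone(frequency: Double, sampleRate: Double,
                    frameCount: Int, amplitude: Double = 0.5) -> [Int16] {
    precondition(sampleRate > 0, "sample rate must be positive")
    precondition((0.0...1.0).contains(amplitude), "amplitude must be in 0...1")
    return (0..<frameCount).map { (n: Int) -> Int16 in
        let phase = 2.0 * Double.pi * frequency * Double(n) / sampleRate
        return Int16((sin(phase) * amplitude * Double(Int16.max)).rounded())
    }
}

/// Hypothetical helper: splits samples into consecutive chunks of at most
/// `chunkSize` frames, mimicking buffers arriving one after another.
func chunked(_ samples: [Int16], chunkSize: Int) -> [[Int16]] {
    precondition(chunkSize > 0, "chunk size must be positive")
    return stride(from: 0, to: samples.count, by: chunkSize).map {
        Array(samples[$0..<min($0 + chunkSize, samples.count)])
    }
}

// 2.5 seconds of a 440 Hz tone at 16 kHz, cut into 16384-frame chunks.
let tone = synthesizeTone(frequency: 440, sampleRate: 16_000, frameCount: 40_000)
let toneChunks = chunked(tone, chunkSize: 16384)
print(toneChunks.map { $0.count })
```

A tone is not speech, so this only exercises the buffer plumbing without a microphone; recognition itself would need recorded or generated speech samples.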

Author

Hold on, fixing the tabs.

Updated tabs
@mainvolume
Author

There, it should be tabbed cleaner now.
🙂

Also, a sample project using TLSphinx (without buffer)
https://github.com/mainvolume/SpeechDetector

@mainvolume
Author

🤔
Regarding endUtterence...
The reason it's public is to be able to end the utterance when the buffer is completed, or similar.

If you wish, we can make it private, but then the utterance would still be running when the buffer ends, considering the start call after reading the utterance.

🙂
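The lifecycle being discussed can be sketched with a small stand-in type. `UtteranceSession`, `process`, and `decodeAll` are hypothetical and only mirror the shape of the calls, not the real `Decoder`. The point is that whoever feeds the buffers is also the one who knows when the stream is finished, so that caller needs some way to end the utterance.

```swift
import Foundation

/// Hypothetical stand-in for the decoder; it only tracks utterance state.
final class UtteranceSession {
    enum State { case idle, inUtterance }

    private(set) var state = State.idle
    private(set) var processedFrames = 0

    func startUtterence() {
        precondition(state == .idle, "utterance already running")
        state = .inUtterance
    }

    func process(_ chunk: [Int16]) {
        precondition(state == .inUtterance, "call startUtterence() first")
        processedFrames += chunk.count
    }

    func endUtterence() {
        precondition(state == .inUtterance, "no utterance to end")
        state = .idle
    }
}

/// Feeds every chunk inside one utterance and always closes it afterwards.
func decodeAll(_ chunks: [[Int16]], with session: UtteranceSession) {
    session.startUtterence()
    defer { session.endUtterence() } // runs even if the loop exits early
    for chunk in chunks {
        session.process(chunk)
    }
}

let session = UtteranceSession()
decodeAll([[1, 2, 3], [4, 5]], with: session)
print(session.state == .idle, session.processedFrames)
```

One design option this suggests is keeping the start and end calls internal and exposing a single wrapper like `decodeAll`. Whether that fits TLSphinx depends on how callers stream their buffers.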

3 participants