Audio buffer fix #47

Open
wants to merge 9 commits into
base: development
8 changes: 8 additions & 0 deletions README.md
@@ -147,6 +147,12 @@ Then run `carthage update`, this should fetch and build the last version of `TLS

Download the project from this repository and drag the _TLSphinx_ project into your Xcode project. If you encounter any errors about missing headers and/or libraries for _Sphinx_, add the `Sphinx/include` directory to your header search path and `Sphinx/lib` to the library search path, and mark it as `recursive`.

## Community

[![Slack][slack-image]][slack-url]

Join us on Slack!

## Author

BrunoBerisso, bruno@tryolabs.com
@@ -162,3 +168,5 @@ TLSphinx is available under the MIT license. See the LICENSE file for more info.
[Sphinx docs]: http://cmusphinx.sourceforge.net/wiki/
[Tryolabs Blog]: http://blog.tryolabs.com/
[Carthage]: https://github.com/Carthage/Carthage
[slack-image]: http://s13.postimg.org/ybwy92ktf/Slack.png
[slack-url]: https://slackin-telsphinx.herokuapp.com/
51 changes: 35 additions & 16 deletions TLSphinx.xcodeproj/project.pbxproj 100644 → 100755
@@ -11,11 +11,12 @@
9B5187BB1B18E1E80000E78D /* TLSphinx.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 9B5187AF1B18E1E80000E78D /* TLSphinx.framework */; };
9B5187C21B18E1E80000E78D /* Basic.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B5187C11B18E1E80000E78D /* Basic.swift */; };
9B5187CC1B18E79B0000E78D /* Decoder.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B5187CB1B18E79B0000E78D /* Decoder.swift */; };
9B5187E51B18EE280000E78D /* en-us in Resources */ = {isa = PBXBuildFile; fileRef = 9B5187E41B18EE280000E78D /* en-us */; };
9B5187E71B18F0D40000E78D /* Config.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B5187E61B18F0D40000E78D /* Config.swift */; };
9B5187E91B18F1A50000E78D /* Globals.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B5187E81B18F1A50000E78D /* Globals.swift */; };
9B5187EB1B19011B0000E78D /* LiveDecode.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B5187EA1B19011B0000E78D /* LiveDecode.swift */; };
9B943A291B1CB30A00CD4F11 /* Hypotesis.swift in Sources */ = {isa = PBXBuildFile; fileRef = 9B943A281B1CB30A00CD4F11 /* Hypotesis.swift */; };
AEAF925F20C87E1E0039F4E0 /* en-us in Resources */ = {isa = PBXBuildFile; fileRef = AEAF925E20C87E1E0039F4E0 /* en-us */; };
AEAF926020C87E1E0039F4E0 /* en-us in Resources */ = {isa = PBXBuildFile; fileRef = AEAF925E20C87E1E0039F4E0 /* en-us */; };
/* End PBXBuildFile section */

/* Begin PBXContainerItemProxy section */
@@ -36,11 +37,11 @@
9B5187C01B18E1E80000E78D /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; };
9B5187C11B18E1E80000E78D /* Basic.swift */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.swift; path = Basic.swift; sourceTree = "<group>"; };
9B5187CB1B18E79B0000E78D /* Decoder.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = Decoder.swift; sourceTree = "<group>"; };
9B5187E41B18EE280000E78D /* en-us */ = {isa = PBXFileReference; lastKnownFileType = folder; path = "en-us"; sourceTree = "<group>"; };
9B5187E61B18F0D40000E78D /* Config.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = Config.swift; sourceTree = "<group>"; };
9B5187E81B18F1A50000E78D /* Globals.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = Globals.swift; sourceTree = "<group>"; };
9B5187EA1B19011B0000E78D /* LiveDecode.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = LiveDecode.swift; sourceTree = "<group>"; };
9B943A281B1CB30A00CD4F11 /* Hypotesis.swift */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.swift; path = Hypotesis.swift; sourceTree = "<group>"; };
AEAF925E20C87E1E0039F4E0 /* en-us */ = {isa = PBXFileReference; lastKnownFileType = folder; name = "en-us"; path = "Sphinx/share/pocketsphinx/model/en-us"; sourceTree = SOURCE_ROOT; };
/* End PBXFileReference section */

/* Begin PBXFrameworksBuildPhase section */
@@ -95,6 +96,7 @@
9B5187B21B18E1E80000E78D /* Supporting Files */ = {
isa = PBXGroup;
children = (
AEAF922820C86AF70039F4E0 /* model */,
9B5187B41B18E1E80000E78D /* TLSphinx.h */,
9B5187B31B18E1E80000E78D /* Info.plist */,
);
@@ -114,20 +116,18 @@
9B5187BF1B18E1E80000E78D /* Supporting Files */ = {
isa = PBXGroup;
children = (
9B5187CD1B18EDF40000E78D /* model */,
9B5187C01B18E1E80000E78D /* Info.plist */,
);
name = "Supporting Files";
sourceTree = "<group>";
};
9B5187CD1B18EDF40000E78D /* model */ = {
AEAF922820C86AF70039F4E0 /* model */ = {
isa = PBXGroup;
children = (
9B5187E41B18EE280000E78D /* en-us */,
AEAF925E20C87E1E0039F4E0 /* en-us */,
);
name = model;
path = Sphinx/share/pocketsphinx/model;
sourceTree = SOURCE_ROOT;
sourceTree = "<group>";
};
/* End PBXGroup section */

@@ -187,7 +187,7 @@
attributes = {
LastSwiftMigration = 0720;
LastSwiftUpdateCheck = 0720;
LastUpgradeCheck = 0800;
LastUpgradeCheck = 0940;
ORGANIZATIONNAME = "Bruno Berisso";
TargetAttributes = {
9B5187AE1B18E1E80000E78D = {
@@ -196,7 +196,7 @@
};
9B5187B91B18E1E80000E78D = {
CreatedOnToolsVersion = 6.3.2;
LastSwiftMigration = 0800;
LastSwiftMigration = 0940;
};
};
};
@@ -223,14 +223,15 @@
isa = PBXResourcesBuildPhase;
buildActionMask = 2147483647;
files = (
AEAF925F20C87E1E0039F4E0 /* en-us in Resources */,
);
runOnlyForDeploymentPostprocessing = 0;
};
9B5187B81B18E1E80000E78D /* Resources */ = {
isa = PBXResourcesBuildPhase;
buildActionMask = 2147483647;
files = (
9B5187E51B18EE280000E78D /* en-us in Resources */,
AEAF926020C87E1E0039F4E0 /* en-us in Resources */,
);
runOnlyForDeploymentPostprocessing = 0;
};
@@ -276,14 +277,22 @@
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
@@ -308,7 +317,7 @@
GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE;
GCC_WARN_UNUSED_FUNCTION = YES;
GCC_WARN_UNUSED_VARIABLE = YES;
IPHONEOS_DEPLOYMENT_TARGET = 9.0;
IPHONEOS_DEPLOYMENT_TARGET = 10.0;
MTL_ENABLE_DEBUG_INFO = YES;
ONLY_ACTIVE_ARCH = YES;
SDKROOT = iphoneos;
@@ -328,14 +337,22 @@
CLANG_CXX_LIBRARY = "libc++";
CLANG_ENABLE_MODULES = YES;
CLANG_ENABLE_OBJC_ARC = YES;
CLANG_WARN_BLOCK_CAPTURE_AUTORELEASING = YES;
CLANG_WARN_BOOL_CONVERSION = YES;
CLANG_WARN_COMMA = YES;
CLANG_WARN_CONSTANT_CONVERSION = YES;
CLANG_WARN_DEPRECATED_OBJC_IMPLEMENTATIONS = YES;
CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR;
CLANG_WARN_EMPTY_BODY = YES;
CLANG_WARN_ENUM_CONVERSION = YES;
CLANG_WARN_INFINITE_RECURSION = YES;
CLANG_WARN_INT_CONVERSION = YES;
CLANG_WARN_NON_LITERAL_NULL_CONVERSION = YES;
CLANG_WARN_OBJC_IMPLICIT_RETAIN_SELF = YES;
CLANG_WARN_OBJC_LITERAL_CONVERSION = YES;
CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR;
CLANG_WARN_RANGE_LOOP_ANALYSIS = YES;
CLANG_WARN_STRICT_PROTOTYPES = YES;
CLANG_WARN_SUSPICIOUS_MOVE = YES;
CLANG_WARN_UNREACHABLE_CODE = YES;
CLANG_WARN__DUPLICATE_METHOD_MATCH = YES;
@@ -353,7 +370,7 @@
GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE;
GCC_WARN_UNUSED_FUNCTION = YES;
GCC_WARN_UNUSED_VARIABLE = YES;
IPHONEOS_DEPLOYMENT_TARGET = 9.0;
IPHONEOS_DEPLOYMENT_TARGET = 10.0;
MTL_ENABLE_DEBUG_INFO = NO;
SDKROOT = iphoneos;
SWIFT_OPTIMIZATION_LEVEL = "-Owholemodule";
@@ -384,7 +401,7 @@
);
INFOPLIST_FILE = TLSphinx/Info.plist;
INSTALL_PATH = "$(LOCAL_LIBRARY_DIR)/Frameworks";
IPHONEOS_DEPLOYMENT_TARGET = 9.0;
IPHONEOS_DEPLOYMENT_TARGET = 10.0;
LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/Frameworks @loader_path/Frameworks";
LIBRARY_SEARCH_PATHS = (
"$(PROJECT_DIR)/Sphinx/lib/pocketsphinx",
@@ -418,7 +435,7 @@
);
INFOPLIST_FILE = TLSphinx/Info.plist;
INSTALL_PATH = "$(LOCAL_LIBRARY_DIR)/Frameworks";
IPHONEOS_DEPLOYMENT_TARGET = 9.0;
IPHONEOS_DEPLOYMENT_TARGET = 10.0;
LD_RUNPATH_SEARCH_PATHS = "$(inherited) @executable_path/Frameworks @loader_path/Frameworks";
LIBRARY_SEARCH_PATHS = (
"$(PROJECT_DIR)/Sphinx/lib/pocketsphinx",
@@ -454,7 +471,8 @@
);
PRODUCT_BUNDLE_IDENTIFIER = "com.tryolabs.$(PRODUCT_NAME:rfc1034identifier)";
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 3.0;
SWIFT_SWIFT3_OBJC_INFERENCE = On;
SWIFT_VERSION = 4.0;
};
name = Debug;
};
@@ -476,7 +494,8 @@
);
PRODUCT_BUNDLE_IDENTIFIER = "com.tryolabs.$(PRODUCT_NAME:rfc1034identifier)";
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 3.0;
SWIFT_SWIFT3_OBJC_INFERENCE = On;
SWIFT_VERSION = 4.0;
};
name = Release;
};
Empty file modified TLSphinx.xcodeproj/project.xcworkspace/contents.xcworkspacedata 100644 → 100755
Empty file.
2 changes: 1 addition & 1 deletion TLSphinx.xcodeproj/xcshareddata/xcschemes/TLSphinx.xcscheme 100644 → 100755
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<Scheme
LastUpgradeVersion = "0800"
LastUpgradeVersion = "0940"
version = "1.3">
<BuildAction
parallelizeBuildables = "YES"
Empty file modified TLSphinx/Config.swift 100644 → 100755
Empty file.
71 changes: 50 additions & 21 deletions TLSphinx/Decoder.swift 100644 → 100755
@@ -5,11 +5,14 @@
// Created by Bruno Berisso on 5/29/15.
// Copyright (c) 2015 Bruno Berisso. All rights reserved.
//
// Updated by mainvolume Copyright © 2018. All rights reserved.

import Foundation
import AVFoundation
import Sphinx

// A global buffer size for the decoder, to be used when initializing the model's -nfft parameter
public let bufferSize = 16384

fileprivate enum SpeechStateEnum : CustomStringConvertible {
case silence
@@ -123,7 +126,7 @@ public final class Decoder {

fileprivate func hypotesisForSpeech (inFile fileHandle: FileHandle) -> Hypothesis? {

start_utt()
startUtterence()

let hypothesis = fileHandle.reduceChunks(2048, initial: nil, reducer: {
(data: Data, partialHyp: Hypothesis?) -> Hypothesis? in
@@ -133,15 +136,15 @@ public final class Decoder {
var resultantHyp = partialHyp
if speechState == .utterance {

end_utt()
endUtterence()
resultantHyp = partialHyp + get_hyp()
start_utt()
startUtterence()
}

return resultantHyp
})

end_utt()
endUtterence()

//Process any pending speech
if speechState == .speech {
@@ -165,33 +168,32 @@ public final class Decoder {
}
}
}
public func startDecodingSpeech (_ utteranceComplete: @escaping (Hypothesis?) -> ()) throws {

public func startDecodingSpeech (_ audioSessionCategoryOptions:AVAudioSessionCategoryOptions = [.mixWithOthers, .allowBluetoothA2DP], utteranceComplete: @escaping (Hypothesis?) -> ()) throws {
do {
try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryRecord)
try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayAndRecord, with: audioSessionCategoryOptions)
} catch let error as NSError {
print("Error setting the shared AVAudioSession: \(error)")
throw DecodeErrors.CantSetAudioSession(error)
}

engine = AVAudioEngine()

let input = engine.inputNode
let mixer = AVAudioMixerNode()
engine.attach(mixer)
engine.connect(input, to: mixer, format: input.outputFormat(forBus: 0))

// We force-unwrap this because the docs for AVAudioFormat specify that this constructor returns nil when the channels
// are greater than 2.
let formatIn = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: 16000, channels: 1, interleaved: false)!
let formatOut = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 16000, channels: 1, interleaved: false)!

let formatIn = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: input.outputFormat(forBus: 0).sampleRate, channels: 1, interleaved: false)!
let formatOut = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: input.outputFormat(forBus: 0).sampleRate, channels: 1, interleaved: false)!

guard let bufferMapper = AVAudioConverter(from: formatIn, to: formatOut) else {
// Returns nil if the format conversion is not possible.
throw DecodeErrors.CantConvertAudioFormat
}

mixer.installTap(onBus: 0, bufferSize: 2048, format: formatIn, block: {
mixer.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: formatIn, block: {
[unowned self] (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) in

guard let sphinxBuffer = AVAudioPCMBuffer(pcmFormat: formatOut, frameCapacity: buffer.frameCapacity) else {
@@ -218,27 +220,28 @@ public final class Decoder {
let audioData = sphinxBuffer.toData()
self.process_raw(audioData)

print("Process: \(buffer.frameLength) frames - \(audioData.count) bytes - sample time: \(time.sampleTime)")
// Uncomment for frame monitoring
//print("Process: \(buffer.frameLength) frames - \(audioData.count) bytes - sample time: \(time.sampleTime)")

if self.speechState == .utterance {

self.end_utt()
self.endUtterence()
let hypothesis = self.get_hyp()

DispatchQueue.main.async {
utteranceComplete(hypothesis)
}

self.start_utt()
self.startUtterence()
}
})

start_utt()
startUtterence()

do {
try engine.start()
} catch let error as NSError {
end_utt()
endUtterence()
print("Can't start AVAudioEngine: \(error)")
throw DecodeErrors.CantStartAudioEngine(error)
}
@@ -248,7 +251,33 @@ public final class Decoder {
engine.stop()
engine = nil
}


public func startUtterence() {
Contributor

I think this and endUtterence shouldn't be public. Is my understanding correct that they needed to be public because you have to call startUtterence() before startDecodingBuffer?

Author

That is accurate. Do you mean we should make endUtterence private?

self.start_utt()
}

public func startDecodingBuffer(buffer: AVAudioPCMBuffer!, time: AVAudioTime!, utteranceComplete: @escaping (Hypothesis?)-> ()) throws {
Contributor

👏🏻👏🏻👏🏻 nice!
These will be really useful. How are you testing this?

Contributor

Also, is something wrong with the tabs? Haha.

Author

The tabs... I'm editing in GitHub, since the codebase is at home and I'm at work right now. 😂

I haven't written any tests, but to bypass the microphone for the thinking machine implementation, a synthesized continuous buffer is passed to the function, which works quite nicely.

The function is based on the streaming function, but the caller creates the buffer and passes it in, instead of the function installing its own tap.
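
For readers following the thread, here is a minimal sketch of how a synthesized buffer might be built before handing it to `startDecodingBuffer`. The helper names `sineSamples` and `makePCMBuffer` are hypothetical and not part of this PR. The 16 kHz mono Int16 format is also an assumption, based on `buffer.toData()` being called directly without the conversion step that `startDecodingSpeech` performs.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

// Hypothetical helper: generates `count` signed 16-bit mono sine samples.
// `amplitude` is clamped to 0...1 so the Int16 conversion cannot overflow.
func sineSamples(frequency: Double, sampleRate: Double, count: Int,
                 amplitude: Double = 0.5) -> [Int16] {
    guard count > 0 else { return [] }
    let gain = min(max(amplitude, 0.0), 1.0) * Double(Int16.max)
    return (0..<count).map { index in
        let phase = 2.0 * Double.pi * frequency * Double(index) / sampleRate
        return Int16(gain * sin(phase))
    }
}

#if canImport(AVFoundation)
// Hypothetical helper: wraps the samples in a mono, non-interleaved Int16 buffer.
// Returns nil if the format or the buffer cannot be created.
func makePCMBuffer(from samples: [Int16], sampleRate: Double) -> AVAudioPCMBuffer? {
    guard !samples.isEmpty,
          let format = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                     sampleRate: sampleRate,
                                     channels: 1,
                                     interleaved: false),
          let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: AVAudioFrameCount(samples.count)),
          let channel = buffer.int16ChannelData?[0] else {
        return nil
    }
    buffer.frameLength = AVAudioFrameCount(samples.count)
    for (index, sample) in samples.enumerated() {
        channel[index] = sample
    }
    return buffer
}
#endif

// Intended call order, as discussed in the review above (assumes an already
// configured `decoder`; its setup is outside the scope of this sketch):
//
//   decoder.startUtterence()
//   try decoder.startDecodingBuffer(buffer: buffer, time: nil) { hypothesis in
//       print(String(describing: hypothesis))
//   }
//   decoder.endUtterence()
```

A pure tone will not produce a meaningful hypothesis. It only exercises the buffer path without a microphone, so synthesized or pre-recorded speech would be needed for a real test.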

Author

Hold on, fixing the tabs.


let audioData = buffer.toData()
self.process_raw(audioData)

if self.speechState == .utterance {

self.endUtterence()
let hypothesis = self.get_hyp()

DispatchQueue.main.async {
utteranceComplete(hypothesis)
}

self.startUtterence()
}
}

public func endUtterence() {
self.end_utt()
}

public func add(words:Array<(word: String, phones: String)>) throws {

guard engine == nil || !engine.isRunning else {
Empty file modified TLSphinx/Globals.swift 100644 → 100755
Empty file.
Empty file modified TLSphinx/Hypotesis.swift 100644 → 100755
Empty file.
Empty file modified TLSphinx/Info.plist 100644 → 100755
Empty file.
Empty file modified TLSphinx/TLSphinx.h 100644 → 100755
Empty file.
Empty file modified TLSphinxTests/Basic.swift 100644 → 100755
Empty file.
Empty file modified TLSphinxTests/Info.plist 100644 → 100755
Empty file.
Empty file modified TLSphinxTests/LiveDecode.swift 100644 → 100755
Empty file.