Got the repo kinda working #26

Closed
SDCalvo opened this issue Apr 4, 2024 · 12 comments · Fixed by #37

Comments

@SDCalvo

SDCalvo commented Apr 4, 2024

Right now these are my logs from a recent call:

Server running on port 3000
Twilio -> Starting Media Stream for MZ1cbaeb64da6297ccb08a8cf316d2fe6c
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (6): a85266e3-9158-4d27-8b46-6d72faa1416e
UtteranceEnd received before speechFinal, emit the text collected so far:  Hi there. Can you hear me?
Interaction 0 – STT -> GPT:  Hi there. Can you hear me?
Interaction 0: GPT -> TTS: Yes, I can hear you loud and clear •
Interaction 0: TTS -> TWILIO: Yes, I can hear you loud and clear •
Twilio -> Audio completed mark (272): dc2fe939-fbc0-4f0f-93c0-595cd7613ed9
Interaction 0: GPT -> TTS:  How may I assist you today with your AirPods purchase?
GPT -> user context length: 5
Interaction 0: TTS -> TWILIO:  How may I assist you today with your AirPods purchase?
Twilio -> Audio completed mark (273): 3149e0b8-52d4-420a-a6b2-e9c7d1c80fcc
STT -> Deepgram connection closed
Twilio -> Media stream MZ1cbaeb64da6297ccb08a8cf316d2fe6c ended.
[nodemon] restarting due to changes...
[nodemon] starting `ts-node src/app.ts`
Server running on port 3000
Twilio -> Starting Media Stream for MZ60a84bb1246cfd7160de45c4fbce614d
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (10): 5f5b7ca0-e6fa-4d0d-bb6c-3507eaea5928
STT -> Deepgram connection closed
Twilio -> Media stream MZ60a84bb1246cfd7160de45c4fbce614d ended.

I get no audio on the call, none whatsoever, but I do see some transcripts in the console. I honestly can't figure out what the issue might be, since there are no errors. For one thing, Deepgram seems to close the connection out of the blue, and TTS never sends the audio to the actual call. Any ideas?

@SDCalvo
Author

SDCalvo commented Apr 4, 2024

I thought the issue might have been my attempt to add TypeScript to the entire repo, but I've just cloned it fresh, started the server, and then run the outbound script in another console. These are the logs:

Server running on port 3000
Twilio -> Starting Media Stream for MZ6d0e55f33c98d84cfd9d85816b56e03e
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (2): 619229c6-2fa7-4e04-abf3-71fb2adf2caa
STT -> Deepgram connection closed
Twilio -> Media stream MZ6d0e55f33c98d84cfd9d85816b56e03e ended.

On the phone I hear nothing, and there does not appear to be any transcription going on whatsoever. I'm not sure what the issue might be; in the logs I can only see "Deepgram connection closed". Any ideas?

@SDCalvo
Author

SDCalvo commented Apr 4, 2024

OK, it turns out I forgot to add the ElevenLabs API key to the .env. I still have an issue with Deepgram always closing the connection and can't figure out why yet, but at the very least I got the first message as audio in the phone call (I could hear "Rachel's" voice on the phone). After I respond, though, the next log is always "Deepgram connection closed".
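Since the missing key produced no error at all, a fail-fast check at startup would have surfaced it right away. The sketch below is a hypothetical helper, not part of the repo: `DEEPGRAM_API_KEY` is the variable the transcription code reads, while `ELEVENLABS_API_KEY` is an assumed name, so substitute whatever your TTS service actually reads from `.env`.

```typescript
// Hypothetical startup guard: fail fast when a required API key is missing,
// instead of letting a service silently produce no audio.
// The variable names are assumptions; match them to your .env.
const REQUIRED_ENV_VARS = [
  "DEEPGRAM_API_KEY",
  "ELEVENLABS_API_KEY", // assumed name; use whatever your TTS service reads
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws a single error that lists every missing variable.
function assertEnv(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS,
): void {
  const missing = findMissingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(", ")}`,
    );
  }
}
```

Calling `assertEnv(process.env)` right after dotenv loads (for example near the top of `src/app.ts`) would stop the server with a clear message instead of starting a call that plays no audio.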

@SDCalvo
Author

SDCalvo commented Apr 4, 2024

If anyone sees this: the issue was that the Deepgram SDK had changed. I managed to get everything to work, or kind of; every service is now working after I modified the transcription service to comply with the new SDK:

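import { EventEmitter } from "events";
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

class TranscriptionService extends EventEmitter {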
  private connection;
  private deepgramApiKey: string | undefined;

  constructor() {
    super();
    this.deepgramApiKey = process.env.DEEPGRAM_API_KEY;

    const deepgram = createClient(this.deepgramApiKey as string);
    this.connection = deepgram.listen.live({
      encoding: "mulaw",
      sample_rate: 8000,
      model: "nova-2",
      punctuate: true,
      interim_results: true,
      endpointing: 200,
      utterance_end_ms: 1000,
    });

    // Setup event listeners
    this.setupListeners();
  }

  setupListeners() {
    this.connection.on(LiveTranscriptionEvents.Open, () => {
      console.log("Connection opened.");
    });

    this.connection.on(LiveTranscriptionEvents.Transcript, (data: any) => {
      // Handle transcription data
      const transcription = data;
      const alternatives = transcription.channel?.alternatives;
      let text = "";
      if (alternatives) {
        text = alternatives[0]?.transcript;
        console.log(`Logger - Text: ${text}`);
      }

      this.emit("transcription", text);
    });

    this.connection.on(LiveTranscriptionEvents.Metadata, (data) => {
      console.log("Received metadata:", data);
    });

    this.connection.on(LiveTranscriptionEvents.Close, (data: any) => {
      // Log the close event object itself; a template string prints "[object Object]"
      console.log("Connection closed:", data);
      this.emit("closed");
    });

    this.connection.on(LiveTranscriptionEvents.Error, (error) => {
      console.error("Error:", error);
      this.emit("error", error);
    });
  }

  send(payload: string) {
    // Convert payload to Buffer and send immediately
    const audioData = Buffer.from(payload, "base64");
    if (this.connection.getReadyState() === 1) {
      // Ensure the connection is open
      this.connection.send(audioData);
    }
  }

  // The flushBuffer method is no longer needed as we are sending data immediately
}

export { TranscriptionService };
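For context on where `send(payload)` gets its input: Twilio Media Streams deliver JSON frames over the WebSocket, and `media` frames carry base64-encoded 8 kHz mu-law audio in `media.payload`, which is why the options above use `encoding: "mulaw"` and `sample_rate: 8000`. A minimal routing sketch, typing only the `event`, `media.payload`, and `mark.name` fields; `routeTwilioMessage` and `StreamHandlers` are hypothetical names, not the repo's code:

```typescript
// Only the fields used below are typed; real Twilio frames carry more.
interface TwilioMessage {
  event: string;
  streamSid?: string;
  media?: { payload: string };
  mark?: { name: string };
}

interface StreamHandlers {
  onAudio: (base64Payload: string) => void; // e.g. transcriptionService.send
  onMark?: (name: string) => void;
  onStop?: () => void;
}

// Parses one WebSocket frame from Twilio and routes it to the matching
// handler. Returns the event name so callers can log or ignore it.
function routeTwilioMessage(raw: string, handlers: StreamHandlers): string {
  const msg = JSON.parse(raw) as TwilioMessage;
  switch (msg.event) {
    case "media":
      if (msg.media?.payload) handlers.onAudio(msg.media.payload);
      break;
    case "mark":
      if (msg.mark) handlers.onMark?.(msg.mark.name);
      break;
    case "stop":
      handlers.onStop?.();
      break;
    default:
      // "connected", "start" and any other events are not handled here.
      break;
  }
  return msg.event;
}
```

Wired into an `express-ws` route, this would look roughly like `ws.on("message", (data) => routeTwilioMessage(data.toString(), { onAudio: (p) => transcriptionService.send(p) }))`.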

The issue I'm facing now is that the conversation seems to be out of sync: I answer, GPT generates an answer, and that seems to happen a couple of times in a row. Then, while I wait on the call for the agent to speak (it speaks three messages in a row without me talking), the Deepgram timeout hits and the connection closes.
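On the timeout: Deepgram's streaming documentation describes closing a connection that receives no audio and no messages for roughly 10 seconds, and a `{"type": "KeepAlive"}` text message that can be sent to hold it open. If the app stops forwarding audio at any point (for example, while the agent is speaking), the connection can hit that limit. The class below is a hypothetical, SDK-agnostic sketch of the timing logic only; the 4-second interval used in the test is an assumption, not a documented value.

```typescript
// The KeepAlive text message described in Deepgram's streaming docs.
const KEEPALIVE_MESSAGE = JSON.stringify({ type: "KeepAlive" });

// Decides when a KeepAlive is due; actually sending it is left to the caller.
class KeepAliveScheduler {
  private readonly intervalMs: number;
  private lastActivityAt: number;

  // intervalMs: how long the socket may sit idle before a KeepAlive is sent
  // (assumption: a few seconds, comfortably under the ~10 s timeout).
  constructor(intervalMs: number, nowMs: number) {
    this.intervalMs = intervalMs;
    this.lastActivityAt = nowMs;
  }

  // Call whenever audio (or anything else) is sent on the socket.
  recordActivity(nowMs: number): void {
    this.lastActivityAt = nowMs;
  }

  // Returns the KeepAlive payload if the socket has been idle for at least
  // intervalMs, otherwise null. Returning a KeepAlive counts as activity.
  poll(nowMs: number): string | null {
    if (nowMs - this.lastActivityAt >= this.intervalMs) {
      this.lastActivityAt = nowMs;
      return KEEPALIVE_MESSAGE;
    }
    return null;
  }
}
```

One way to wire it, assuming the SDK's `send` accepts a string in your version: call `scheduler.recordActivity(Date.now())` inside `send(payload)`, and from a `setInterval` running about once a second, send whatever `scheduler.poll(Date.now())` returns whenever it is not null.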

Logs if anyone is interested.

[nodemon] restarting due to changes...
[nodemon] starting `ts-node src/app.ts`
Server running on port 3000
Twilio -> Starting Media Stream for MZ9d8fa630d3bc09699408043a58c73916
Connection opened.
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Sending audio
Twilio -> Audio completed mark (15): 28a1c047-8332-4fdd-82fc-4bbfd2815ef6
Logger - Text: Order. Yes. I
Interaction 0 - STT -> GPT: Order. Yes. I
Logger - Text: Order. Yes. I would like to buy a pair of
Interaction 1 - STT -> GPT: Order. Yes. I would like to buy a pair of
Interaction 1: GPT -> TTS: Awesome! Are you leaning towards the in-ear style like the AirPods or AirPods Pro, or would you prefer the over-ear design of the AirPods Max?
GPT -> user context length: 6
Interaction 0: GPT -> TTS: Fantastic! Let's find the perfect fit for you. •
Interaction 0: GPT -> TTS:  Do you prefer headphones that go in your ear, •
Interaction 0: TTS -> TWILIO: Fantastic! Let's find the perfect fit for you. •
Interaction 0: GPT -> TTS:  or do you like the over-the-ear style?
GPT -> user context length: 7
Interaction 1: TTS -> TWILIO: Awesome! Are you leaning towards the in-ear style like the AirPods or AirPods Pro, or would you prefer the over-ear design of the AirPods Max?
Sending audio
Sending audio
Interaction 0: TTS -> TWILIO:  Do you prefer headphones that go in your ear, •
Sending audio
Interaction 0: TTS -> TWILIO:  or do you like the over-the-ear style?
Sending audio
Twilio -> Audio completed mark (134): ad819ca4-7865-4082-ae14-aad935a0f26e
Received metadata: {
  type: 'Metadata',
  transaction_key: 'deprecated',
  request_id: '8fec6fc6-62c4-4006-a8bf-4e772dd25b2a',
  sha256: 'incomplete',
  created: '2024-04-04T23:23:18.781Z',
  duration: 2.0199375,
  channels: 1,
  models: [ '1dbdfb4d-85b2-4659-9831-16b3c76229aa' ],
  model_info: {
    '1dbdfb4d-85b2-4659-9831-16b3c76229aa': {
      name: '2-general-nova',
      version: '2024-01-11.36317',
      arch: 'nova-2'
    }
  }
}
Connection closed: [object Object]
Twilio -> Audio completed mark (135): 51f9b784-d495-418d-93bd-e4f4d518bcb9
Twilio -> Audio completed mark (136): 81a9ec03-e909-42b2-9b1f-1539dd3327dd
Twilio -> Audio completed mark (137): 8771f6c6-e65c-4633-8f90-d9ec29c51137
Twilio -> Media stream MZ9d8fa630d3bc09699408043a58c73916 ended.

Given how the interactions are numbered, I think there might be an issue in the interaction handling. I'll have to keep debugging to see, but I hope this helps anyone else who wants to give this fantastic repo a try!
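One likely cause of the numbering: with `interim_results: true`, Deepgram emits a Transcript event for each interim hypothesis as well as for finalized text, and the listener above forwards every one of them via `emit("transcription", text)`. That matches the logs, where "Order. Yes. I" and "Order. Yes. I would like to buy a pair of" each start their own interaction. Below is a sketch that buffers only finalized segments (`is_final`) and releases them on `speech_final` or on an UtteranceEnd event, roughly what the "UtteranceEnd received before speechFinal" log line in the first comment suggests the original code did. `UtteranceAccumulator` is a hypothetical class; only the result field names come from Deepgram's live results.

```typescript
// Subset of a Deepgram live "Results" message used here.
interface LiveResult {
  is_final?: boolean;
  speech_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Collects finalized transcript segments and releases one complete utterance
// at a time, so the LLM is called once per user turn, not once per interim.
class UtteranceAccumulator {
  private pending = "";

  // Feed every Transcript event; returns a complete utterance or null.
  handleResult(result: LiveResult): string | null {
    if (!result.is_final) {
      return null; // interim hypothesis: may still change, do not forward
    }
    const text = result.channel?.alternatives?.[0]?.transcript ?? "";
    if (text.trim() !== "") {
      this.pending = this.pending ? `${this.pending} ${text}` : text;
    }
    return result.speech_final ? this.flush() : null;
  }

  // Call on UtteranceEnd: emits whatever finalized text is still buffered.
  handleUtteranceEnd(): string | null {
    return this.flush();
  }

  private flush(): string | null {
    const utterance = this.pending.trim();
    this.pending = "";
    return utterance === "" ? null : utterance;
  }
}
```

In the Transcript listener this would become `const u = acc.handleResult(data); if (u) this.emit("transcription", u);`, plus a listener for `LiveTranscriptionEvents.UtteranceEnd` (assuming your SDK version exposes it) that does the same with `acc.handleUtteranceEnd()`. Interim results could still be emitted under a separate event if the app uses them to detect the caller interrupting.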

@cweems
Collaborator

cweems commented Apr 10, 2024

@SDCalvo Hey, sorry for the late reply here! Do you know which version of the Deepgram SDK caused the change? My guess would be 3.x.x, but what I'm wondering is how you got that version of the SDK, since this project specifies ^2.4.0. Did you intentionally upgrade to the latest version?

I'll take a look at supporting the new DG SDK.

@SDCalvo
Author

SDCalvo commented Apr 10, 2024

Honestly, I don't remember; I think I might have upgraded by accident? Not entirely sure. Also, thanks for the reply! Let me know if I can help you upgrade and/or add TypeScript support; the work you've done here is fantastic!

@SDCalvo
Author

SDCalvo commented Apr 10, 2024

My package.json right now

{
  "name": "genai-phone",
  "version": "1.1.0",
  "description": "",
  "main": "dist/app.js",
  "scripts": {
    "inbound": "node ./dist/scripts/inbound-call.js",
    "outbound": "node ./dist/scripts/outbound-call.js",
    "test": "jest",
    "build": "tsc",
    "start": "node dist/app.js",
    "dev": "nodemon --exec ts-node src/app.ts"
  },
  "keywords": [],
  "author": "Santiago Calvo",
  "license": "MIT",
  "dependencies": {
    "@deepgram/sdk": "^3.2.0",
    "@types/express-ws": "^3.0.4",
    "colors": "^1.4.0",
    "cross-fetch": "^4.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "express-ws": "^5.0.2",
    "node-fetch": "^2.7.0",
    "openai": "^4.20.1",
    "twilio": "^4.19.3",
    "uuid": "^9.0.1",
    "wavefile": "^11.0.0"
  },
  "devDependencies": {
    "@flydotio/dockerfile": "^0.4.11",
    "@types/express": "^4.17.21",
    "@types/node": "^20.12.3",
    "@types/uuid": "^9.0.8",
    "eslint": "^8.57.0",
    "jest": "^29.7.0",
    "nodemon": "^3.0.2",
    "ts-node": "^10.9.2",
    "typescript": "^5.4.3"
  }
}

I probably updated the SDK version at some point without noticing.
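Because the project declares `^2.4.0` while the rewritten TranscriptionService earlier in the thread targets the 3.x API (`createClient`, `listen.live`), a major-version bump like this can slip in silently. A hypothetical startup check that compares the declared major version with the one the code expects; note that it reads the range declared in package.json, not what is actually installed (`npm ls @deepgram/sdk` shows that).

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
}

// Extracts the major version from a simple semver range such as "^3.2.0",
// "~2.4.0" or ">=2". Returns null for anything it cannot parse.
function majorOf(range: string): number | null {
  const match = /^[\^~>=v]*\s*(\d+)/.exec(range.trim());
  return match ? Number(match[1]) : null;
}

// Returns a warning if the declared SDK major version differs from the one
// the transcription code was written against, otherwise null.
function checkSdkMajor(
  pkg: PackageJson,
  expectedMajor: number,
  name = "@deepgram/sdk",
): string | null {
  const range = pkg.dependencies?.[name];
  if (range === undefined) return `${name} is not listed in dependencies`;
  const major = majorOf(range);
  if (major === null) return `Cannot parse version range "${range}" for ${name}`;
  return major === expectedMajor
    ? null
    : `${name} is declared as ${range} (major ${major}), but this code expects major ${expectedMajor}`;
}
```

For example, `checkSdkMajor(JSON.parse(fs.readFileSync("package.json", "utf8")), 3)` would return null for the package.json above and a warning string for a project still on `^2.4.0`.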

@mercuryyy

Any update on this? It would be great to be able to use Deepgram for TTS; it is much better than 11labs.

@SDCalvo
Copy link
Author

SDCalvo commented May 21, 2024

Not really. I ended up building a POC with only OpenAI, using OpenAI's TTS and STT directly along with the new GPT-4o model. It's pretty fast, I got it to use tools, and overall it works great, tbh.

@mercuryyy

mercuryyy commented May 21, 2024

I also updated the SDK because I was trying to code a Deepgram class for TTS; now I see what you meant about it messing up the STT :(

@cweems, any chance of supporting the new SDK?

@mercuryyy

So I found a workaround: I installed both versions of the SDK (2.4 and 3.3), one of them under an alias, and I use 3.3 for TTS. It works great, but it's probably best to update the transcription (STT) service to work with the new SDK.

@SDCalvo
Author

SDCalvo commented May 21, 2024

oh that's smart!!

@ketan9712735468

ketan9712735468 commented Jun 3, 2024

@SDCalvo, you need to get a subscription at https://elevenlabs.io/ and use that API key; that might fix it, as it did for me.
I had the same issues before, but after subscribing to an ElevenLabs plan I got voice on the call.

4 participants