Got the repo kinda working #26

SDCalvo · 2024-04-04T02:05:59Z

Right now these are my logs from a recent call

Server running on port 3000
Twilio -> Starting Media Stream for MZ1cbaeb64da6297ccb08a8cf316d2fe6c
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (6): a85266e3-9158-4d27-8b46-6d72faa1416e
UtteranceEnd received before speechFinal, emit the text collected so far:  Hi there. Can you hear me?
Interaction 0 – STT -> GPT:  Hi there. Can you hear me?
Interaction 0: GPT -> TTS: Yes, I can hear you loud and clear •
Interaction 0: TTS -> TWILIO: Yes, I can hear you loud and clear •
Twilio -> Audio completed mark (272): dc2fe939-fbc0-4f0f-93c0-595cd7613ed9
Interaction 0: GPT -> TTS:  How may I assist you today with your AirPods purchase?
GPT -> user context length: 5
Interaction 0: TTS -> TWILIO:  How may I assist you today with your AirPods purchase?
Twilio -> Audio completed mark (273): 3149e0b8-52d4-420a-a6b2-e9c7d1c80fcc
STT -> Deepgram connection closed
Twilio -> Media stream MZ1cbaeb64da6297ccb08a8cf316d2fe6c ended.
[nodemon] restarting due to changes...
[nodemon] starting `ts-node src/app.ts`
Server running on port 3000
Twilio -> Starting Media Stream for MZ60a84bb1246cfd7160de45c4fbce614d
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (10): 5f5b7ca0-e6fa-4d0d-bb6c-3507eaea5928
STT -> Deepgram connection closed
Twilio -> Media stream MZ60a84bb1246cfd7160de45c4fbce614d ended.

I get no audio on the call, none whatsoever, but I do see some transcripts in the console. I honestly can't work out what the issue might be, as there are no errors. For one thing, Deepgram seems to close the connection out of the blue, and the TTS never sends the audio to the actual call. Any ideas?


SDCalvo · 2024-04-04T02:28:33Z

I thought the issue might have been me trying to add TypeScript to the entire repo, but I've just cloned it fresh and started the server, then ran the outbound script in another console, and these are the logs:

Server running on port 3000
Twilio -> Starting Media Stream for MZ6d0e55f33c98d84cfd9d85816b56e03e
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Twilio -> Audio completed mark (2): 619229c6-2fa7-4e04-abf3-71fb2adf2caa
STT -> Deepgram connection closed
Twilio -> Media stream MZ6d0e55f33c98d84cfd9d85816b56e03e ended.

On the phone I hear nothing, and there does not appear to be any transcription going on at all. I'm not sure what the issue is, but the only thing I can see in the logs is "Deepgram connection closed". Any ideas?

SDCalvo · 2024-04-04T02:47:00Z

OK, turns out I forgot to add the ElevenLabs API key to the .env. I still get an issue with Deepgram where it always closes the connection, and I can't figure out why yet. At the very least I got the first message as audio in the phone call (I could hear "Rachel's" voice on the phone), but after I respond the next log is always "Deepgram connection closed".
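
A missing key like this fails silently, so a startup check can save some debugging. Below is a minimal sketch, not code from the repo: `missingEnvVars` is a hypothetical helper, and of the variable names only `DEEPGRAM_API_KEY` appears in this thread; the other two are assumptions.

```typescript
// Returns the names of required environment variables that are unset or blank.
function missingEnvVars(
  required: readonly string[],
  env: Record<string, string | undefined>
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// DEEPGRAM_API_KEY appears in this thread; the other two names are assumptions.
const REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "OPENAI_API_KEY"];

// Example with a literal object; at startup you would pass process.env instead.
const missing = missingEnvVars(REQUIRED_KEYS, {
  DEEPGRAM_API_KEY: "dg-example",
  OPENAI_API_KEY: "",
});

if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

Running the check right after loading the .env file, and exiting if anything is missing, would turn a silent call into an immediate, readable error.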

SDCalvo · 2024-04-04T23:29:30Z

If anyone sees this, the issue was that the Deepgram SDK changed. I managed to get everything to work, or kind of: every service is now working after modifying the transcription service to comply with the new SDK.

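import { EventEmitter } from "events";
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

class TranscriptionService extends EventEmitter {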
  private connection;
  private deepgramApiKey: string | undefined;

  constructor() {
    super();
    this.deepgramApiKey = process.env.DEEPGRAM_API_KEY;

    const deepgram = createClient(this.deepgramApiKey as string);
    this.connection = deepgram.listen.live({
      encoding: "mulaw",
      sample_rate: 8000,
      model: "nova-2",
      punctuate: true,
      interim_results: true,
      endpointing: 200,
      utterance_end_ms: 1000,
    });

    // Setup event listeners
    this.setupListeners();
  }

  setupListeners() {
    this.connection.on(LiveTranscriptionEvents.Open, () => {
      console.log("Connection opened.");
    });

    this.connection.on(LiveTranscriptionEvents.Transcript, (data: any) => {
      // Handle transcription data
      const transcription = data;
      const alternatives = transcription.channel?.alternatives;
      let text = "";
      if (alternatives) {
        // Fall back to an empty string so listeners never receive undefined
        text = alternatives[0]?.transcript ?? "";
        console.log(`Logger - Text: ${text}`);
      }

      this.emit("transcription", text);
    });

    this.connection.on(LiveTranscriptionEvents.Metadata, (data) => {
      console.log("Received metadata:", data);
    });

    this.connection.on(LiveTranscriptionEvents.Close, (data: any) => {
      console.log(`Connection closed: ${data}`);
      this.emit("closed");
    });

    this.connection.on(LiveTranscriptionEvents.Error, (error) => {
      console.error("Error:", error);
      this.emit("error", error);
    });
  }

  send(payload: string) {
    // Convert payload to Buffer and send immediately
    const audioData = Buffer.from(payload, "base64");
    if (this.connection.getReadyState() === 1) {
      // Ensure the connection is open
      this.connection.send(audioData);
    }
  }

  // The flushBuffer method is no longer needed as we are sending data immediately
}

export { TranscriptionService };

The issue I'm facing now is that the conversation seems to be out of sync: I answer, then GPT generates an answer, and that seems to happen a couple of times in a row. While I wait on the call for the agent to speak (it speaks three messages in a row without me talking), the Deepgram timeout hits and the connection closes.

Logs if anyone is interested.

[nodemon] restarting due to changes...
[nodemon] starting `ts-node src/app.ts`
Server running on port 3000
Twilio -> Starting Media Stream for MZ9d8fa630d3bc09699408043a58c73916
Connection opened.
Interaction 1: TTS -> TWILIO: Hello! I understand you're looking for a pair of AirPods, is that correct?
Sending audio
Twilio -> Audio completed mark (15): 28a1c047-8332-4fdd-82fc-4bbfd2815ef6
Logger - Text: Order. Yes. I
Interaction 0 - STT -> GPT: Order. Yes. I
Logger - Text: Order. Yes. I would like to buy a pair of
Interaction 1 - STT -> GPT: Order. Yes. I would like to buy a pair of
Interaction 1: GPT -> TTS: Awesome! Are you leaning towards the in-ear style like the AirPods or AirPods Pro, or would you prefer the over-ear design of the AirPods Max?
GPT -> user context length: 6
Interaction 0: GPT -> TTS: Fantastic! Let's find the perfect fit for you. •
Interaction 0: GPT -> TTS:  Do you prefer headphones that go in your ear, •
Interaction 0: TTS -> TWILIO: Fantastic! Let's find the perfect fit for you. •
Interaction 0: GPT -> TTS:  or do you like the over-the-ear style?
GPT -> user context length: 7
Interaction 1: TTS -> TWILIO: Awesome! Are you leaning towards the in-ear style like the AirPods or AirPods Pro, or would you prefer the over-ear design of the AirPods Max?
Sending audio
Sending audio
Interaction 0: TTS -> TWILIO:  Do you prefer headphones that go in your ear, •
Sending audio
Interaction 0: TTS -> TWILIO:  or do you like the over-the-ear style?
Sending audio
Twilio -> Audio completed mark (134): ad819ca4-7865-4082-ae14-aad935a0f26e
Received metadata: {
  type: 'Metadata',
  transaction_key: 'deprecated',
  request_id: '8fec6fc6-62c4-4006-a8bf-4e772dd25b2a',
  sha256: 'incomplete',
  created: '2024-04-04T23:23:18.781Z',
  duration: 2.0199375,
  channels: 1,
  models: [ '1dbdfb4d-85b2-4659-9831-16b3c76229aa' ],
  model_info: {
    '1dbdfb4d-85b2-4659-9831-16b3c76229aa': {
      name: '2-general-nova',
      version: '2024-01-11.36317',
      arch: 'nova-2'
    }
  }
}
Connection closed: [object Object]
Twilio -> Audio completed mark (135): 51f9b784-d495-418d-93bd-e4f4d518bcb9
Twilio -> Audio completed mark (136): 81a9ec03-e909-42b2-9b1f-1539dd3327dd
Twilio -> Audio completed mark (137): 8771f6c6-e65c-4633-8f90-d9ec29c51137
Twilio -> Media stream MZ9d8fa630d3bc09699408043a58c73916 ended.

Given how the interactions are numbered, I think there might be an issue in the interaction handling. I'll have to keep debugging to see, but I hope this helps someone else who might want to give this fantastic repo a try!
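
One possible cause, offered as a guess rather than a confirmed diagnosis: with `interim_results: true`, the handler above emits "transcription" for every partial result, and the logs show two overlapping partials ("Order. Yes. I" and "Order. Yes. I would like to buy a pair of") each triggering a GPT reply. A minimal sketch of buffering until the speaker has finished follows. `UtteranceBuffer` is a hypothetical helper, not part of the repo or the SDK; it relies only on the `is_final` and `speech_final` fields of Deepgram's live transcript messages, and the sample text is made up.

```typescript
// Minimal shape of a Deepgram live transcript message; only the fields used here.
interface TranscriptMessage {
  is_final?: boolean;
  speech_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Hypothetical helper: collects finalized segments and releases one complete utterance.
class UtteranceBuffer {
  private parts: string[] = [];

  // Returns the full utterance once speech_final arrives, otherwise null.
  push(message: TranscriptMessage): string | null {
    const text = message.channel?.alternatives?.[0]?.transcript?.trim() ?? "";
    // Interim results get revised by later messages, so they are ignored.
    if (!message.is_final) return null;
    if (text) this.parts.push(text);
    return message.speech_final ? this.flush() : null;
  }

  // Can also be called on UtteranceEnd, which may arrive before speech_final.
  flush(): string | null {
    if (this.parts.length === 0) return null;
    const utterance = this.parts.join(" ");
    this.parts = [];
    return utterance;
  }
}

// Made-up sample messages: one interim result, then two finalized segments.
const buffer = new UtteranceBuffer();
const results = [
  buffer.push({ is_final: false, channel: { alternatives: [{ transcript: "Yes. I" }] } }),
  buffer.push({ is_final: true, channel: { alternatives: [{ transcript: "Yes. I would like to buy" }] } }),
  buffer.push({
    is_final: true,
    speech_final: true,
    channel: { alternatives: [{ transcript: "a pair of AirPods." }] },
  }),
];
console.log(results);
```

In the Transcript handler this would replace the unconditional emit: call `buffer.push(data)` and emit "transcription" only when it returns a string, so GPT sees one message per utterance.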

cweems · 2024-04-10T16:42:51Z

@SDCalvo Hey sorry for the late reply here! Do you know which version of the Deepgram SDK caused the change? My guess would be 3.x.x, but what I'm wondering is how you got that version of the SDK since this project specifies ^2.4.0. Did you intentionally upgrade to the latest version?

I'll take a look at supporting the new DG SDK.

SDCalvo · 2024-04-10T16:45:41Z

Honestly I don't remember, I think I might've upgraded by accident? Not entirely sure. Also, thanks for the reply! And let me know if I can help you upgrade and/or add TypeScript support, the work you've done here is fantastic!

SDCalvo · 2024-04-10T16:49:24Z

My package.json right now

{
  "name": "genai-phone",
  "version": "1.1.0",
  "description": "",
  "main": "dist/app.js",
  "scripts": {
    "inbound": "node ./dist/scripts/inbound-call.js",
    "outbound": "node ./dist/scripts/outbound-call.js",
    "test": "jest",
    "build": "tsc",
    "start": "node dist/app.js",
    "dev": "nodemon --exec ts-node src/app.ts"
  },
  "keywords": [],
  "author": "Santiago Calvo",
  "license": "MIT",
  "dependencies": {
    "@deepgram/sdk": "^3.2.0",
    "@types/express-ws": "^3.0.4",
    "colors": "^1.4.0",
    "cross-fetch": "^4.0.0",
    "dotenv": "^16.3.1",
    "express": "^4.18.2",
    "express-ws": "^5.0.2",
    "node-fetch": "^2.7.0",
    "openai": "^4.20.1",
    "twilio": "^4.19.3",
    "uuid": "^9.0.1",
    "wavefile": "^11.0.0"
  },
  "devDependencies": {
    "@flydotio/dockerfile": "^0.4.11",
    "@types/express": "^4.17.21",
    "@types/node": "^20.12.3",
    "@types/uuid": "^9.0.8",
    "eslint": "^8.57.0",
    "jest": "^29.7.0",
    "nodemon": "^3.0.2",
    "ts-node": "^10.9.2",
    "typescript": "^5.4.3"
  }
}

I probably updated the SDK version without noticing at some point.

mercuryyy · 2024-05-21T02:41:54Z

Any update on this? It would be great to be able to use Deepgram for the TTS, it is much better than 11labs.

SDCalvo · 2024-05-21T02:44:05Z

Not really. I ended up using only OpenAI to make a POC, with TTS and STT from OpenAI directly and the new GPT-4o model. It's pretty fast, I got it to use tools, and overall it works great tbh.

mercuryyy · 2024-05-21T04:03:36Z

I also updated the SDK because I was trying to code a class for Deepgram to work with TTS. Now I see what you meant about it messing up the STT :(

@cweems any chance of supporting the new SDK?

mercuryyy · 2024-05-21T04:30:02Z

So I found a workaround: I just installed both versions of the SDK, 2.4 and 3.3, with an alias, and I use 3.3 for the TTS. It works great, but it's probably best to update the transcription (STT) service to work with the new SDK.

SDCalvo · 2024-05-21T05:36:21Z

oh that's smart!!

ketan9712735468 · 2024-06-03T18:02:07Z

@SDCalvo, you need to take out a subscription at https://elevenlabs.io/ and use that API key; it might work, as it did for me.
Before, I had the same issues, but after getting an ElevenLabs subscription plan I got a voice on the call.

cweems mentioned this issue Jun 8, 2024

Upgrade to Deepgram v3 Node SDK #37

Merged

cweems closed this as completed in #37 Jun 8, 2024
