Audio stream recording file extension is not detected #310

Open
F0rzend opened this issue Jul 16, 2022 · 6 comments
@F0rzend

F0rzend commented Jul 16, 2022

I have the following bytes:

header := []byte{112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56, 128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130, 94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99, 220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167, 238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255, 247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3, 24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144, 252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156, 64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105, 74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73, 249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198}

It is a recording of the radio-t.com stream.
I expected to get an mp3 file extension, but the detected MIME type was application/octet-stream.

Version of the library you are using
v1.4.1

Output of go version
go version go1.18.1 linux/amd64

Additional context
I wrote a test for this.

package test

import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)

func TestFileDetection(t *testing.T) {
	t.Parallel()

	// Radio-T stream header
	header := []byte{
		112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
		128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
		94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
		220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
		238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
		247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
		24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
		252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
		64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
		74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
		249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
	}

	mime := mimetype.Detect(header)
	fileExtension := mime.Extension()

	t.Log(mime)

	if fileExtension == "" {
		t.Errorf("File extension not detected")
	}
}

I also have a recording of another segment of this stream. The type of this file is not detected either:
https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing

@gabriel-vasile gabriel-vasile self-assigned this Jul 21, 2022
@gabriel-vasile
Owner

I'm looking into this.
The Linux file utility (which is, I'd say, the best file format detection tool) also fails to detect the samples.

What program/library was used to create these samples?

@gabriel-vasile
Owner

Mp3 files are made up of frames.
The problem seems to be that the test recordings start with an incomplete frame (maybe because they have been streamed?).
The first complete mp3 frame starts at index 126 in the test case you provided:

package test

import (
	"testing"

	"github.com/gabriel-vasile/mimetype"
)

func TestFileDetection(t *testing.T) {
	t.Parallel()

	// Radio-T stream header
	header := []byte{
		112, 44, 209, 245, 107, 234, 157, 24, 68, 65, 203, 234, 91, 217, 92, 73, 96, 16, 165, 1, 242, 4, 238, 116, 56,
		128, 52, 53, 117, 94, 102, 36, 253, 83, 85, 108, 139, 149, 47, 151, 242, 21, 53, 27, 182, 95, 85, 145, 197, 130,
		94, 11, 37, 107, 43, 248, 53, 209, 117, 91, 174, 48, 44, 49, 35, 126, 230, 33, 171, 150, 173, 81, 214, 149, 99,
		220, 174, 89, 239, 211, 127, 243, 91, 252, 165, 55, 41, 36, 88, 82, 99, 123, 41, 172, 169, 106, 210, 218, 167,
		238, 234, 95, 175, 91, 120, 229, 188, 113, 199, 120, 210, 217, 164, 169, 115, 243, 207, 154, 255, 254, 111, 255,
		247, 87, 255, 255, 255, 227, 7, 0, 24, 118, 82, 29, 123, 108, 221, 142, 183, 44, 159, 75, 88, 210, 185, 137, 3,
		24, 180, 233, 151, 146, 152, 130, 193, 143, 132, 32, 4, 160, 228, 48, 236, 120, 48, 207, 65, 21, 145, 25, 144,
		252, 2, 22, 168, 204, 4, 1, 166, 182, 54, 148, 185, 223, 116, 209, 12, 235, 119, 95, 140, 61, 92, 42, 55, 156,
		64, 183, 165, 56, 159, 73, 167, 190, 14, 95, 206, 66, 211, 90, 237, 134, 71, 46, 92, 174, 10, 87, 202, 53, 105,
		74, 212, 13, 131, 206, 207, 38, 139, 42, 198, 146, 142, 159, 249, 255, 251, 144, 196, 205, 128, 38, 37, 169, 73,
		249, 205, 0, 67, 51, 44, 234, 119, 55, 128, 0, 190, 78, 119, 255, 185, 198,
	}

	mime := mimetype.Detect(header[126:])
	fileExtension := mime.Extension()

	t.Log(mime)

	if fileExtension == "" {
		t.Errorf("File extension not detected")
	}
}

That being said, I'm not sure if mimetype should search for the first frame in the input.
Looking at what other projects are doing, file/file and apache/tika don't search for the frame header either.

On the other hand, the mp3 specification says that decoders should search for the beginning of a frame if they don't find one at index 0 of the input (that's why the recording plays fine, even if it is truncated).
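The kind of frame search a decoder performs could be sketched roughly as follows. This is only an illustration under stated assumptions: `findMP3FrameStart` is a hypothetical helper, not part of mimetype, file, or Tika, and it checks only a few header fields (sync bits, version, layer, bitrate index, sampling-rate index), so false positives on arbitrary binary data remain possible.

```go
package main

import "fmt"

// findMP3FrameStart is a hypothetical helper (not part of any library).
// It returns the index of the first 4-byte window that looks like an
// MPEG audio Layer III frame header, or -1 if there is none.
func findMP3FrameStart(data []byte) int {
	for i := 0; i+3 < len(data); i++ {
		// Frame sync: 11 set bits (0xFF, then the top 3 bits of the next byte).
		if data[i] != 0xFF || data[i+1]&0xE0 != 0xE0 {
			continue
		}
		version := (data[i+1] >> 3) & 0x03    // 01 is reserved
		layer := (data[i+1] >> 1) & 0x03      // 01 means Layer III
		bitrate := data[i+2] >> 4             // 1111 is invalid
		sampleRate := (data[i+2] >> 2) & 0x03 // 11 is reserved
		if version == 0x01 || layer != 0x01 || bitrate == 0x0F || sampleRate == 0x03 {
			continue
		}
		return i
	}
	return -1
}

func main() {
	// Excerpt of the bytes from the issue (indices 118 through 129).
	sample := []byte{255, 254, 111, 255, 247, 87, 255, 255, 255, 227, 7, 0}
	fmt.Println(findMP3FrameStart(sample)) // prints 8, i.e. index 126 in the full input
}
```

A caller could then pass `data[idx:]` to the detector when `idx` is not -1. Whether a general-purpose detector should do this by default is a separate question, since scanning makes false matches on non-audio input more likely.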

@F0rzend
Author

F0rzend commented Jul 27, 2022

> What program/library was used to create these samples?

It is a recording of the radio-t stream, created using io.Copy:
https://github.com/F0rzend/radiot_dumper/blob/master/copier/stream_copier.go#L82

@gabriel-vasile
Owner

Is this file: https://drive.google.com/file/d/1sL18cF-zwN7txDfm30QnZoLlozcG4-5g/view?usp=sharing the original mp3 from radio-t.com, or was it saved through radiot_dumper? I think there are some problems with the way StreamCopier saves files.

@gabriel-vasile
Owner

> Unfortunately, the problem has not been resolved. Apparently the point is that this is a stream, and not just a recording

I'm not sure about that.
I saved some mp3 segments from these radio stations and they are all detected correctly.
ex:
https://stream.rcast.net/200399.mp3
https://stream.rcast.net/200292.mp3
https://stream.rcast.net/200167.mp3

Can you provide the URL to the radio stream that reproduces the issue?

@F0rzend
Author

F0rzend commented Jul 31, 2022

> > Unfortunately, the problem has not been resolved. Apparently the point is that this is a stream, and not just a recording
>
> I'm not sure about that.
> I saved some mp3 segments from these radio stations and they are all detected correctly.
> ex:
> https://stream.rcast.net/200399.mp3
> https://stream.rcast.net/200292.mp3
> https://stream.rcast.net/200167.mp3
>
> Can you provide the URL to the radio stream that reproduces the issue?

I record from https://stream.radio-t.com/, but the stream only starts once a week, on Saturday at 20:00 UTC.
