Demographic data is sometimes doubled per client ID #117

Open
ftyers opened this issue Oct 14, 2021 · 6 comments

@ftyers

ftyers commented Oct 14, 2021

In some cases a given client_id might have more than one demographic datapoint (e.g. gender or age) linked to it. Often this is blank vs. male/female or blank vs. some age.
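A minimal sketch of how such cases could be listed, assuming each clip is available as a dict with a `client_id` key and demographic keys such as `gender` or `age` (for example, rows read with `csv.DictReader` from a dataset TSV). The function name `find_multi_valued_clients` and the sample data are hypothetical, for illustration only.

```python
from collections import defaultdict


def find_multi_valued_clients(rows, field):
    """Return {client_id: set_of_values} for every client_id that has
    more than one distinct value in `field`. A blank counts as a value."""
    values = defaultdict(set)
    for row in rows:
        # Normalize missing or whitespace-only values to the empty string.
        values[row["client_id"]].add((row.get(field) or "").strip())
    return {cid: vals for cid, vals in values.items() if len(vals) > 1}


# Hypothetical sample data: client "a" has blank vs. "male".
sample_rows = [
    {"client_id": "a", "gender": ""},
    {"client_id": "a", "gender": "male"},
    {"client_id": "b", "gender": "female"},
    {"client_id": "b", "gender": "female"},
]

print(find_multi_valued_clients(sample_rows, "gender"))
```

Only client `a` is reported here, because client `b` has a single consistent value.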

This is probably because people recorded some clips and then made a profile, or because they were logged out.

In any case it would be good (and probably safe) to replace blank in the field with the more specific datapoint if and only if there are no other datapoints associated with the client_id.
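The rule proposed above could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: rows are assumed to be dicts with a `client_id` key plus demographic keys, and the function name `fill_blank_demographics` and the sample values are hypothetical.

```python
from collections import defaultdict


def fill_blank_demographics(rows, field):
    """Return copies of `rows` where a blank `field` is replaced by the
    client's only non-blank value. Clients with zero or with several
    distinct non-blank values are left unchanged."""
    non_blank = defaultdict(set)
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            non_blank[row["client_id"]].add(value)

    filled = []
    for row in rows:
        new_row = dict(row)  # do not mutate the input
        value = (row.get(field) or "").strip()
        candidates = non_blank.get(row["client_id"], set())
        # Fill only when exactly one specific value exists for this client.
        if not value and len(candidates) == 1:
            new_row[field] = next(iter(candidates))
        filled.append(new_row)
    return filled


# Hypothetical sample data.
example_rows = [
    {"client_id": "a", "age": ""},
    {"client_id": "a", "age": "thirties"},
    {"client_id": "b", "age": ""},
    {"client_id": "b", "age": "twenties"},
    {"client_id": "b", "age": "thirties"},
    {"client_id": "c", "age": ""},
]

for row in fill_blank_demographics(example_rows, "age"):
    print(row)
```

In this sample, the blank for client `a` is filled, while client `b` (two conflicting values) and client `c` (no specific value) keep their blanks.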

Some examples from Turkish, with thanks to @HarikalarKutusu!

[image: examples from the Turkish dataset]

@HarikalarKutusu
Contributor

I'm not sure of the reason though... In my experience, you can do about 100 recordings per hour if done right (read silently / record / listen / re-record if necessary). If not done right, it may increase to 150-200 recs/hour...

As the id is calculated from the session id, that would mean (e.g. line 4) someone made 374 recordings (2-4 hours) and then decided to register. This seems a bit odd. There are 26 such anomalies in the Turkish dataset.

@HarikalarKutusu
Contributor

OK, I can see how this is possible. During and after the server upgrades many of us got kicked out of the system and had to re-login multiple times a day. I saw some people in our community complain about validating their own sentences, which made me aware of this issue.

If a user starts by registering and logging in with demographic info filled in, is later kicked out, and continues without logging in, this might happen.

> In any case it would be good (and probably safe) to replace blank in the field with the more specific datapoint if and only if there are no other datapoints associated with the client_id.

I think this would be a very logical solution.

@danielinux7

danielinux7 commented Jan 31, 2022

> In some cases a given client_id might have more than one demographic datapoint

I often create accounts on my phone or notebook so that other people can record and validate, either because their phone is not supported or because they can't do it themselves. Elderly people need a bigger screen to read, so I use a notebook. If the client_id is associated with the device, you will find one client_id with many demographic data points.

@ftyers
Author

ftyers commented Jan 31, 2022

The client_id should be associated with the browser session. Using a single session or single account with multiple people is advised against iirc.

@danielinux7

It seems the session is not terminated when the tab or the Chrome browser is closed on Android. It's possible that multiple accounts I create on the same device end up with the same client_id.

> is advised against iirc.

I'm not sure what iirc is?
