MySQL support #1385

Merged
merged 7 commits into from Sep 7, 2021

dehydr8 (Contributor) commented Sep 1, 2021

MySQL is a popular DBMS and people have requested support (#787) for using it in label-studio.

Adding a config entry to core/base.py works, but a migration fails:

DJANGO_DB_MYSQL: {
    'ENGINE': 'django.db.backends.mysql',
    'USER': get_env('MYSQL_USER', 'root'),
    'PASSWORD': get_env('MYSQL_PASSWORD', ''),
    'NAME': get_env('MYSQL_NAME', 'labelstudio'),
    'HOST': get_env('MYSQL_HOST', 'localhost'),
    'PORT': int(get_env('MYSQL_PORT', '3306')),
},

app_1    |   Applying tasks.0005_auto_20210309_1239...Traceback (most recent call last):
app_1    |   File "/usr/local/lib/python3.8/dist-packages/django/db/backends/utils.py", line 84, in _execute
app_1    |     return self.cursor.execute(sql, params)
app_1    |   File "/usr/local/lib/python3.8/dist-packages/django/db/backends/mysql/base.py", line 73, in execute
app_1    |     return self.cursor.execute(query, args)
app_1    |   File "/usr/local/lib/python3.8/dist-packages/MySQLdb/cursors.py", line 206, in execute
app_1    |     res = self._query(query)
app_1    |   File "/usr/local/lib/python3.8/dist-packages/MySQLdb/cursors.py", line 319, in _query
app_1    |     db.query(q)
app_1    |   File "/usr/local/lib/python3.8/dist-packages/MySQLdb/connections.py", line 259, in query
app_1    |     _mysql.connection.query(self, query)
app_1    | MySQLdb._exceptions.OperationalError: (1553, "Cannot drop index 'task_comple_task_id_07c6ca_idx': needed in a foreign key constraint")

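It turns out that Django automatically creates an index for every models.ForeignKey unless db_index is explicitly set to False. MySQL (InnoDB) also requires every foreign key to be backed by an index whose leading columns are the foreign key columns, so it refuses to drop an index that is still needed by a foreign key constraint (error 1553 above).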
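The InnoDB rule behind error 1553 can be illustrated with a small pure-Python check. This is a hypothetical pre-flight helper, not part of Django or label-studio: it models indexes and foreign keys as column lists and reports whether dropping a named index would leave some foreign key without an index whose leading columns match it. The second column of `task_comple_task_id_07c6ca_idx` is not shown in the thread, so `other_col` below is a placeholder.

```python
from typing import Dict, List, Sequence


def covers(index_cols: Sequence[str], fk_cols: Sequence[str]) -> bool:
    """An index can back a foreign key if the FK columns are its leading columns, in order."""
    return list(index_cols[: len(fk_cols)]) == list(fk_cols)


def can_drop_index(
    indexes: Dict[str, List[str]], foreign_keys: List[List[str]], name: str
) -> bool:
    """Hypothetical check: is every foreign key still covered after dropping `name`?"""
    remaining = [cols for idx, cols in indexes.items() if idx != name]
    return all(any(covers(cols, fk) for cols in remaining) for fk in foreign_keys)


# Hypothetical layout: the composite index is the only one that starts with task_id.
indexes = {"task_comple_task_id_07c6ca_idx": ["task_id", "other_col"]}
foreign_keys = [["task_id"]]
print(can_drop_index(indexes, foreign_keys, "task_comple_task_id_07c6ca_idx"))  # False
```

In this model, adding a plain index on `task_id` first makes the drop safe, which mirrors the general MySQL workaround of ensuring another suitable index exists before removing one that a foreign key depends on.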

A separate docker-compose-mysql.yml has been added to start label-studio with MySQL.

dehydr8 (Contributor, Author) commented Sep 1, 2021

One key thing to note is that MySQL versions before 8 do not support SKIP LOCKED, which Django needs for select_for_update(skip_locked=True). With MySQL 5.x, next_task (Label All Tasks) won't work because it relies on skip_locked, but everything else runs fine.
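A minimal sketch of how code could avoid Django's NotSupportedError on older servers by deciding whether to request `skip_locked`. It assumes SKIP LOCKED arrived in MySQL 8.0.1, and `select_for_update_kwargs` is a hypothetical helper. In a real Django app, `connection.features.has_select_for_update_skip_locked` already answers this question; the pure-Python version only keeps the sketch runnable without a database.

```python
from typing import Dict, Tuple

# Assumption: MySQL introduced SELECT ... FOR UPDATE SKIP LOCKED in 8.0.1.
MYSQL_SKIP_LOCKED_MIN: Tuple[int, int, int] = (8, 0, 1)


def mysql_supports_skip_locked(version: Tuple[int, int, int]) -> bool:
    """Return True if a MySQL server of this version accepts SKIP LOCKED."""
    return version >= MYSQL_SKIP_LOCKED_MIN


def select_for_update_kwargs(version: Tuple[int, int, int]) -> Dict[str, bool]:
    """Hypothetical helper: keyword arguments for QuerySet.select_for_update().

    On servers without SKIP LOCKED, fall back to a plain (blocking) row lock
    instead of passing skip_locked=True and getting NotSupportedError.
    """
    if mysql_supports_skip_locked(version):
        return {"skip_locked": True}
    return {}


print(select_for_update_kwargs((8, 0, 26)))  # {'skip_locked': True}
print(select_for_update_kwargs((5, 7, 35)))  # {}
```

Usage would look like `Task.objects.select_for_update(**select_for_update_kwargs(version))` (hypothetical). The trade-off is that with a blocking lock, concurrent next_task calls wait on each other rather than skipping rows another annotator already holds.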

makseq (Member) commented Sep 1, 2021

> One key thing to note is that MySQL <8 does not support skip_locked=True. If we run it with MySQL 5.x, next_task (Label All Tasks) won't function because it relies on skip_locked, but everything else runs fine.

Maybe we should limit support to MySQL versions >= 8?
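If support were limited to MySQL 8+, one option is a startup guard on the string returned by `SELECT VERSION()`. The sketch below is hypothetical (the names `parse_mysql_version` and `check_mysql_version` are not from label-studio); Django's MySQL backend also exposes a parsed tuple as `connection.mysql_version`. MariaDB reports its own numbering (10.x), so a real check would need to detect it separately.

```python
import re
from typing import Tuple

MIN_MYSQL: Tuple[int, int] = (8, 0)  # minimum major.minor proposed in this thread


def parse_mysql_version(raw: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '8.0.26' or '5.7.35-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        raise ValueError(f"unrecognised MySQL version string: {raw!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_mysql_version(raw: str) -> None:
    """Hypothetical startup guard: refuse to run against MySQL older than 8.0."""
    if parse_mysql_version(raw)[:2] < MIN_MYSQL:
        raise RuntimeError(f"MySQL {raw} is not supported; 8.0 or newer is required")


check_mysql_version("8.0.26")  # passes silently
try:
    check_mysql_version("5.7.35-log")
except RuntimeError as err:
    print(err)
```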

farioas (Member) left a comment

Please move docker-compose.mysql.yml to the deploy folder.

farioas (Member) left a comment

Please ignore my previous request. From my side everything is OK; waiting for a conclusion from the dev team.

makseq merged commit 7f935c9 into HumanSignal:master on Sep 7, 2021

loveychen commented Sep 22, 2021

@dehydr8

I am trying to use MySQL as my default DB. I set all the environment variables listed below:

export DJANGO_DB="mysql"
export MYSQL_USER="root"
export MYSQL_PASSWORD="xxxx"
export MYSQL_NAME="labelstudio"
export MYSQL_HOST="127.0.0.1"
export MYSQL_PORT=3306
export POSTGRE_HOST=
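To show how variables like these could feed the config entry from the PR description, here is a minimal sketch in which `DJANGO_DB` selects which entry becomes `DATABASES['default']`. Assumptions: the constant values (`"mysql"`, `"sqlite"`), the SQLite fallback and its file name are hypothetical, and `get_env` is a simple stand-in for label-studio's helper, whose real lookup rules may differ. This sketch does not read `POSTGRE_HOST`.

```python
import os
from typing import Any, Dict

# Hypothetical constant values; the thread only shows DJANGO_DB="mysql" being used.
DJANGO_DB_MYSQL = "mysql"
DJANGO_DB_SQLITE = "sqlite"


def get_env(name: str, default: Any = None) -> Any:
    """Minimal stand-in for label-studio's get_env(): read an environment variable."""
    return os.environ.get(name, default)


def build_databases() -> Dict[str, Dict[str, Any]]:
    """Pick the 'default' Django database from the DJANGO_DB environment variable."""
    databases_all = {
        DJANGO_DB_MYSQL: {
            'ENGINE': 'django.db.backends.mysql',
            'USER': get_env('MYSQL_USER', 'root'),
            'PASSWORD': get_env('MYSQL_PASSWORD', ''),
            'NAME': get_env('MYSQL_NAME', 'labelstudio'),
            'HOST': get_env('MYSQL_HOST', 'localhost'),
            'PORT': int(get_env('MYSQL_PORT', '3306')),
        },
        DJANGO_DB_SQLITE: {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': 'label_studio.sqlite3',  # hypothetical file name
        },
    }
    choice = get_env('DJANGO_DB', DJANGO_DB_SQLITE)
    if choice not in databases_all:
        raise ValueError(f"unknown DJANGO_DB value: {choice!r}")
    return {'default': databases_all[choice]}
```

With the exports above in the environment, `build_databases()['default']['ENGINE']` would be `'django.db.backends.mysql'`, and `PORT` is converted to the integer 3306.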

But when I load all the tasks for a project, the following exception is raised:

Traceback (most recent call last):
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DISTINCT `task_completion`.`completed_by_id` ) AS `annotators`, `task`.`id` AS `' at line 1")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/data/chendiao/label-studio/label_studio/data_manager/api.py", line 167, in tasks
    page = self.paginate_queryset(queryset)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/rest_framework/generics.py", line 171, in paginate_queryset
    return self.paginator.paginate_queryset(queryset, self.request, view=self)
  File "/data/chendiao/label-studio/label_studio/data_manager/api.py", line 52, in paginate_queryset
    aggregated = queryset.aggregate(
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/models/query.py", line 398, in aggregate
    return query.get_aggregation(self.db, kwargs)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/models/sql/query.py", line 502, in get_aggregation
    result = compiler.execute_sql(SINGLE)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1156, in execute_sql
    cursor.execute(sql, params)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/sentry_sdk/integrations/django/__init__.py", line 500, in execute
    return real_execute(self, sql, params)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/data/anaconda3/envs/labelstudio/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
django.db.utils.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DISTINCT `task_completion`.`completed_by_id` ) AS `annotators`, `task`.`id` AS `' at line 1")

I found that the generated SQL uses ARRAY_AGG, a PostgreSQL-specific aggregate function that MySQL does not support. Here is the full generated SQL:

SELECT COUNT(`__col1`), SUM(`total_annotations`)
	, SUM(`total_predictions`)
FROM (
	SELECT COUNT(DISTINCT CASE 
			WHEN NOT `task_completion`.`was_cancelled` THEN `task_completion`.`id`
			ELSE NULL
		END) AS `total_annotations`
		, COUNT(DISTINCT CASE 
			WHEN `task_completion`.`was_cancelled` THEN `task_completion`.`id`
			ELSE NULL
		END) AS `cancelled_annotations`
		, COUNT(DISTINCT `prediction`.`id`) AS `total_predictions`
		, (
			SELECT DISTINCT U0.`created_at`
			FROM `task_completion` U0
				INNER JOIN `task` U1 ON U0.`task_id` = U1.`id`
			WHERE U0.`task_id` = `task`.`id`
				AND U1.`is_labeled`
			ORDER BY U0.`created_at` DESC
			LIMIT 1
		) AS `completed_at`, ARRAY_AGG(DISTINCT `task_completion`.`completed_by_id`) AS `annotators`, `task`.`id` AS `__col1`
	FROM `task`
		LEFT JOIN `task_completion` ON `task`.`id` = `task_completion`.`task_id`
		LEFT JOIN `prediction` ON `task`.`id` = `prediction`.`task_id`
	WHERE `task`.`project_id` = 1
	GROUP BY `task`.`id`
	ORDER BY NULL
) subquery

I don't know whether I have missed some other setting. Following docker-compose.mysql.yml, I have set POSTGRE_HOST to empty.
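A common MySQL substitute for `ARRAY_AGG(DISTINCT ...)` is `GROUP_CONCAT(DISTINCT ...)`, which returns one comma-separated string per group, so the application has to split it back into a list. The sketch below uses the standard-library `sqlite3` module purely so it runs without a MySQL server; SQLite accepts the same `GROUP_CONCAT(DISTINCT col)` form. The tables are simplified, hypothetical versions of `task` and `task_completion`, and this is not necessarily what #1501 does.

```python
import sqlite3

# In-memory stand-in for simplified, hypothetical task / task_completion tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY);
    CREATE TABLE task_completion (
        id INTEGER PRIMARY KEY,
        task_id INTEGER REFERENCES task(id),
        completed_by_id INTEGER
    );
    INSERT INTO task (id) VALUES (1), (2), (3);
    INSERT INTO task_completion (task_id, completed_by_id) VALUES
        (1, 10), (1, 10), (1, 11), (2, 12);
""")

# GROUP_CONCAT(DISTINCT col) replaces ARRAY_AGG(DISTINCT col): one string per task.
rows = conn.execute("""
    SELECT task.id, GROUP_CONCAT(DISTINCT task_completion.completed_by_id)
    FROM task
    LEFT JOIN task_completion ON task.id = task_completion.task_id
    GROUP BY task.id
    ORDER BY task.id
""").fetchall()


def split_ids(concatenated):
    """Turn '10,11' into [10, 11]; NULL (task with no annotators) becomes []."""
    if concatenated is None:
        return []
    return sorted(int(part) for part in concatenated.split(","))


annotators = {task_id: split_ids(value) for task_id, value in rows}
print(annotators)  # {1: [10, 11], 2: [12], 3: []}
```

Two caveats: in MySQL the `GROUP_CONCAT` result is truncated at `group_concat_max_len` (1024 by default), and in Django such a per-vendor difference would typically live in a custom `Aggregate` with an `as_mysql()` override rather than in raw SQL.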

makseq (Member) commented Sep 22, 2021

@loveychen #1501 --- check this PR

loveychen commented

> @loveychen #1501 --- check this PR

OK, I will check whether it solves the problem.

4 participants