Participating in the Open Web with the Expectation You Won't Be Part of Data Harvests #109

Open
cgrobb opened this issue Feb 17, 2024 · 3 comments

cgrobb commented Feb 17, 2024

The web is the ultimate digital venue for sharing professional works. Professional artists especially use the open web to share samples of their works as a mechanism for a) getting more work and b) being discovered by other artists who want to collaborate. The professional "work" here is often a raster or vector web format that has its own https URI(s).

Over the past few years, data harvesters have been collecting these professional works at an unprecedented scale as digital raw materials for generative AI models whose outputs are commercial. The quality and completeness of those outputs depend primarily on the harvested inputs.

The issue is well-framed by the top resources in this Google query:
https://www.google.com/search?q=robots.txt+abuse+generative+AI

Given the scale of the economic harms, I was surprised that this doc has no discussion of intellectual property, licensing, or robots.txt abuse, and that its discussion of data rights doesn't address professional creative works as data.

I'd like to see the doc incorporate professional data rights and the right to opt out of data harvesting as fundamental ethical web principles and reference any related standards work.

cgrobb changed the title from "Participating in the Open Web with the Expectation You Won't Be Part of Data Harvest" to "Participating in the Open Web with the Expectation You Won't Be Part of Data Harvests" on Feb 17, 2024
Contributor

rhiaro commented Feb 19, 2024

Thanks for raising this. We discussed this in our breakout today, and we agree that what you describe is a harmful practice, both for end users and for the integrity of the web platform as a whole. At the moment we think that a discussion of IP/copyright specifically is too low-level for the EWP, but it connects to broader discussions we've been having recently about the web being used for harmful exploitative/extractive practices in general. We anticipate that we will at some point write something on this topic (like a Finding) that goes into more detail, which we can then link to from an existing EWP (for example, "does not cause harm" or "enhances individual control and power").

Author

cgrobb commented Feb 19, 2024

Thanks for the quick reply. Good to know of the broader discussion.

I'm (just now) seeing the existing W3C work on policy expression (https://en.wikipedia.org/wiki/ODRL).

Here's a resource that frames data rights as human rights and conversely: https://www.regulations.gov/comment/COLC-2023-0006-10317
"This would lead to the closing up of the web as organizations protect themselves in other ways, the disappearance of revenue streams for many worthwhile jobs (like Artist or Journalist), and the loss of all human rights included within data rights."


csarven commented Feb 19, 2024

There is also the W3C CG work on the Data Privacy Vocabulary:

[..] enables expressing machine-readable metadata about the use and processing of personal data based on legislative requirements [..]

torgo added this to the 2024-03-04-week milestone on Mar 3, 2024