Liquid Communism
Mar 9, 2004

коммунизм хранится в яичках

GlyphGryph posted:

They don't reference things, though? Once the AI is created and trained it doesn't have any of the original artwork to reference, and conceptually it doesn't make any sense anyhow. That's not how these things work.

The AI's entire 'memory' consists of its training set. Hence why you cannot remove something from said training set without retraining the AI, or it will continue to use what has been indexed.

It is incapable of creativity. It is simply pulling elements from training data that is tagged similarly to the prompt given.

This is a large part of why the EU is looking at it sideways, as present designs cannot comply with the GDPR, either in proving they do not contain PII or in obeying the right to be forgotten.


Tree Reformat
Apr 2, 2022

by Fluffdaddy

Pvt. Parts posted:

So I was pretty firmly in the "AI models synthesize their training data into new images/outputs in much the same way a human artist pulls inspiration from various sources in a manner that we are already cool with" mindset but have been reckoning more with this "creative merit added" aspect recently. A helpful framework for me has been likening it to music, where we have a pretty established trend of remixes and mashups with clear (and sometimes not so clear) lineages. How are remixes and (perhaps the more applicable analogy) mashups of songs treated by copyright law? Very cursory Googling and a bit of common sense would reveal discussions of the "artistic merit" of a remix or mashup apart from its source material which allows it to be legal under fair use.

What is the artistic merit of what an AI model does to its training data, which in most cases at the moment is basically a random diffusion process whereby noise is iteratively "improved" to the point of resembling something the model has seen before?

I have noticed a bit of a backlash against remix culture in general by the more ardent copyright defenders during all this.

https://twitter.com/neilturkewitz/status/1655631363704233984

GlyphGryph
Jun 23, 2013

Down came the glitches and burned us in ditches and we slept after eating our dead.
... these AIs do not retain their training sets after they have been trained. Nothing is "indexed". If it did, and it was simply "pulling elements from its training data", then you wouldn't have to retrain the AI - you have to retrain the AI specifically because it doesn't retain any of the training set, and so there's no way to tell it what to remove: it hasn't the slightest ability to reference anything it was trained on, because that stuff no longer exists.
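A toy version of the point, for anyone who wants it concrete (made-up numbers, nothing to do with any real model): fit a least-squares line to three points, and the only thing that survives "training" is two floats. The points themselves are gone, and "forgetting" one of them means refitting from scratch.

```python
# Toy illustration: training leaves behind parameters, not the data.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]   # made-up "training set"

def fit(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx                    # the whole "model": two numbers

a, b = fit(xs, ys)
# (a, b) is all that ships; the three points can't be read back out of it.
# To honor a deletion request for the point (3.0, 6.2), you don't delete
# anything from the model -- you re-run fit() on what's left:
a2, b2 = fit(xs[:2], ys[:2])
```

Same deal at scale: a diffusion model's weights are the (a, b) here, just billions of them.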

There's been an entire thread now discussing this technology, you've been here since page one, and you still don't have the slightest idea what you are talking about - and yet you confidently propose "solutions" that would have little to no impact on the AI (except perhaps by unintentionally banning it completely) and would far more likely hurt actual artists.

You're basically the perfect supporting argument for most of the posts I've made in this thread, keep doing what you're doing.

GlyphGryph fucked around with this message at 14:35 on May 22, 2023

Bwee
Jul 1, 2005

Liquid Communism posted:

The AI's entire 'memory' consists of its training set. Hence why you cannot remove something from said training set without retraining the AI, or it will continue to use what has been indexed.

It is incapable of creativity. It is simply pulling elements from training data that is tagged similarly to the prompt given.

This is a large part of why the EU is looking at it sideways, as present designs cannot comply with the GDPR, either in proving they do not contain PII or in obeying the right to be forgotten.

AI does not work like this

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

Liquid Communism posted:

The AI's entire 'memory' consists of its training set. Hence why you cannot remove something from said training set without retraining the AI, or it will continue to use what has been indexed.

It is incapable of creativity. It is simply pulling elements from training data that is tagged similarly to the prompt given.

This is a large part of why the EU is looking at it sideways, as present designs cannot comply with the GDPR, either in proving they do not contain PII or in obeying the right to be forgotten.
Let's just stop and think how much space ChatGPT or StableDiffusion would take up if they retained their entire training set...
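For scale, here's the back-of-the-envelope version. Every number below is a round, publicly cited ballpark (assumption, not an exact figure): roughly 860M parameters for the Stable Diffusion v1 UNet, on the order of two billion images for a LAION-scale training set.

```python
# Back-of-the-envelope only; all round-number assumptions.
params = 860_000_000            # ~Stable Diffusion v1 UNet parameter count
bytes_per_param = 2             # fp16 weights
model_bytes = params * bytes_per_param          # ~1.7 GB checkpoint

images = 2_300_000_000          # order of a LAION-2B-scale training set
avg_image_bytes = 100_000       # ~100 KB per image, conservative
dataset_bytes = images * avg_image_bytes        # ~230 TB of training data

# The shipped checkpoint works out to less than ONE byte of weights
# per training image -- nowhere near enough to store copies.
print(model_bytes / images)
```

Under a byte per image. Whatever the weights encode, it isn't the pictures.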

BrainDance
May 8, 2007

Disco all night long!

Liquid Communism posted:

The AI's entire 'memory' consists of its training set. Hence why you cannot remove something from said training set without retraining the AI, or it will continue to use what has been indexed.

It is incapable of creativity. It is simply pulling elements from training data that is tagged similarly to the prompt given.

This is a large part of why the EU is looking at it sideways, as present designs cannot comply with the GDPR, either in proving they do not contain PII or in obeying the right to be forgotten.

Literally every single detail about this is wrong.

I'm not trying to be a jerk, it just is. As others have mentioned, you don't seem to understand how AIs actually work, but you also don't seem to understand how human creativity works either. Human creativity is pulling things from our senses and memory to manipulate them into something new.

I don't paint, but I write and have had a good amount of creative works published. All of that comes entirely from things outside of me. Things I've read before, things I've seen, things I've heard, etc. I made something new out of them, but it was that, out of them not without them. Someone is just as capable of doing that with an AI, and we've seen that already. We've seen people have ideas for something new that they then create with the AI.

Like all my projects. Take the AI language models I've trained: I trained a model on two kinds of philosophy without differentiating them, to create something that fuses them. That is creativity, and I used the AI to accomplish it. The foundation for that creativity comes from the two philosophies themselves and, I dunno, life?

This should be very obvious because the alternative would be either a completely random process or would be just magic, stuff coming from nowhere.

SubG
Aug 19, 2004

It's a hard world for little things.

Clarste posted:

Then force it to keep track of that? Rewrite the programs so they have to. If they can't, then it's illegal. Boom, solved.
One of the first practical (for some definition of "practical") machine learning algorithms was implemented, in the early '60s, entirely using matchboxes. It was called MENACE, the Matchbox Educable Noughts and Crosses Engine, and it plays what in the US is usually called tic-tac-toe. It always plays the "O" player, who goes second.

Without handling all the fiddly implementation details, it works something like: take a shitload of matchboxes and draw every possible state of a tic-tac-toe game on one of each. Now get a shitload of coloured beads, nine different colours. Assign a colour to each of the positions on the tic-tac-toe board. Now into each of your matchboxes put a set number of coloured beads for each possible next move given the current state drawn on the matchbox, where the number of beads of each colour is the same as the number of possible moves. So for example each "first AI choice" box will have 8 beads each of 8 different colours in it (for a total of 64 beads), each "second AI choice" box will have 6 beads each of 6 different colours (for a total of 36), and so on.

Then you train the system by playing it. The human player picks a move, you get the box corresponding to the new board state, shake it up, and randomly pick a bead out of it. This selects the AI's move. After each AI choice, set aside the matchbox and bead selected for that move. At the end of the game, if the AI ends up winning, return each selected bead to the box it came from along with one additional bead of the same colour. If the AI loses, remove the selected bead.

You can fiddle around with the training rate by varying the starting number of beads and the rate at which you add or remove beads. But ignoring pathological cases, the system will eventually find the Nash equilibrium.

Okay, now say you've played a couple hundred or so games on the system and it's reached the point where it will always play to a draw.

Point to where, in the current setup, it is "plagiarizing", or whatever you want to call it, the strategy of the human player in, say, the third training game. Like the kind of attribution or "linking" or whatever that you're suggesting be required of AIs, how would you implement it for MENACE?
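For anyone who'd rather see the rule than the matchboxes: here's a minimal Python sketch of the same reinforcement scheme. Not SubG's exact bead counts, and one simplification flagged in the comments (counts are floored at 1 so a box can never empty, which the physical MENACE didn't guarantee).

```python
import random

EMPTY = " "
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != EMPTY and board[i] == board[j] == board[k]:
            return board[i]
    return None

class Menace:
    """One 'matchbox' (bead counts over legal moves) per board state seen."""
    def __init__(self, init_beads=4):
        self.boxes = {}      # state string -> {move index: bead count}
        self.init_beads = init_beads
        self.history = []    # (state, move) pairs picked this game

    def move(self, board):
        state = "".join(board)
        box = self.boxes.setdefault(
            state,
            {i: self.init_beads for i, c in enumerate(board) if c == EMPTY})
        moves = list(box)
        # Shake the box: draw a bead at random, weighted by bead count.
        m = random.choices(moves, weights=[box[i] for i in moves])[0]
        self.history.append((state, m))
        return m

    def learn(self, reward):
        # +1 on a win (add a bead to each box used), -1 on a loss (remove one).
        # Floored at 1 bead so a box can't go empty -- a simplification.
        for state, m in self.history:
            self.boxes[state][m] = max(1, self.boxes[state][m] + reward)
        self.history.clear()
```

After a few hundred games against a random "X" player the bead counts drift toward the drawing strategy, and the point of the question above is visible in the data structure itself: no box anywhere records which human game taught it what.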

Serotoning
Sep 14, 2010

D&D: HASBARA SQUAD
HANG 'EM HIGH


We're fighting human animals and we act accordingly

SubG posted:

One of the first practical (for some definition of "practical") machine learning algorithms was implemented, in the early '60s, entirely using matchboxes. It was called MENACE, the Matchbox Educable Noughts and Crosses Engine, and it plays what in the US is usually called tic-tac-toe. It always plays the "O" player, who goes second.

Without handling all the fiddly implementation details, it works something like: take a shitload of matchboxes and draw every possible state of a tic-tac-toe game on one of each. Now get a shitload of coloured beads, nine different colours. Assign a colour to each of the positions on the tic-tac-toe board. Now into each of your matchboxes put a set number of coloured beads for each possible next move given the current state drawn on the matchbox, where the number of beads of each colour is the same as the number of possible moves. So for example each "first AI choice" box will have 8 beads each of 8 different colours in it (for a total of 64 beads), each "second AI choice" box will have 6 beads each of 6 different colours (for a total of 36), and so on.

Then you train the system by playing it. The human player picks a move, you get the box corresponding to the new board state, shake it up, and randomly pick a bead out of it. This selects the AI's move. After each AI choice, set aside the matchbox and bead selected for that move. At the end of the game, if the AI ends up winning, return each selected bead to the box it came from along with one additional bead of the same colour. If the AI loses, remove the selected bead.

You can fiddle around with the training rate by varying the starting number of beads and the rate at which you add or remove beads. But ignoring pathological cases, the system will eventually find the Nash equilibrium.

Okay, now say you've played a couple hundred or so games on the system and it's reached the point where it will always play to a draw.

Point to where, in the current setup, it is "plagiarizing", or whatever you want to call it, the strategy of the human player in, say, the third training game. Like the kind of attribution or "linking" or whatever that you're suggesting be required of AIs, how would you implement it for MENACE?

It's a fun comparison, but one that I think falls short when you consider the objectivity of game performance vs. something like art production, which has no tangible meta-goal. In other words, there is no ultimate creative effort, and art is for all practical purposes infinitely iterable in a way that a Noughts and Crosses playing machine is not. For the purposes of this discussion, this basically translates into MENACE having only a claim on a particular way of playing Tic Tac Toe, but not its rules, which by definition must be freely available to all (public domain). The lapse in analogy is important I think because MENACE is being judged by an already-known answer, whereas creative AI have only their inputs to be judged against. It's sorta like the difference between reasoning by induction vs. deduction.

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


Just because I know someone will bring it up - yes you can use a model to figure out related tokens in an image; however consider the following thought experiment:

You have a model which, among other things, is trained to copy some specific author's style (say Greg Rutkowski). Now you take an entirely different model which does not have anything from Greg in its dataset or tokens, but it's still a good model, and you feed it tokens such that it creates an image approximating the style of Greg Rutkowski. Then you take the output and give it to the first model to find related tokens - and the model correctly identifies the image as Greg-style. Have you committed copyright infringement? After all you have an AI-generated image clearly in the style of Greg Rutkowski.

KillHour
Oct 28, 2007


Pvt. Parts posted:

It's a fun comparison, but one that I think falls short when you consider the objectivity of game performance vs. something like art production, which has no tangible meta-goal. In other words, there is no ultimate creative effort, and art is for all practical purposes infinitely iterable in a way that a Noughts and Crosses playing machine is not. For the purposes of this discussion, this basically translates into MENACE having only a claim on a particular way of playing Tic Tac Toe, but not its rules, which by definition must be freely available to all (public domain). The lapse in analogy is important I think because MENACE is being judged by an already-known answer, whereas creative AI have only their inputs to be judged against. It's sorta like the difference between reasoning by induction vs. deduction.

I believe SubG's point was more about the concrete details of the training. If you pretend that the goal of Tic Tac Toe is to make pretty pictures via painting characters on a grid, you can ignore the difference in application. It's more about "if the algorithm weights itself to be more likely to fill in the same square that you filled in during the same situation, is that plagiarism?" If you think it is, then learning anything in any way has to be plagiarism.

SubG
Aug 19, 2004

It's a hard world for little things.

Pvt. Parts posted:

It's a fun comparison, but one that I think falls short when you consider the objectivity of game performance vs. something like art production, which has no tangible meta-goal.
There's absolutely nothing intangible (in the sense you seem to mean it) about the output of e.g. Stable Diffusion. Push the button, get an image. It's an algorithm, just like MENACE.

Put in slightly different terms, if you think the difference is in some intangible [something] involving "Art", then the locus of that difference is strictly in your (generic "your") head, not in the AI.

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.

GlyphGryph posted:

If your goal is to ban the tech, just ban the tech, don't wrap it up in a dressing that could potentially have serious knock-on effects and hurt real artists.

We do not require humans to cite their stylistic sources.

You specifically said there is no copyright element involved in this proposal - that was how it would avoid hurting the careers of existing artists. That this would be an additional, AI-only regulation unrelated to it, and that it would serve as a de facto ban because it's custom-tailored to be impossible. This is why I'm sure any such proposal would hurt artists - if you can't prevent it from becoming a copyright issue within the span of a few minutes, how could we trust legislators to?

The goal would be to leave it open for a hypothetical AI art program that would play nice with copyright. Frankly, "banning the tech" is also way more complicated because then you have to define the tech, legally speaking, and lol legislators have no loving idea where to start with that. They'd probably end up banning photoshop by accident. It's actually far easier to just outline what you don't want it doing and saying "don't do that." They do have at least some idea of what "copyrighted material" is. And I never said it wasn't a copyright issue, it is blatantly a copyright issue on the face of it. What I said was that we'd just have to arbitrarily claim that computers are not considered "creators" for the purposes of fair use laws or whatever, and all existing copyright laws will hit them square in the face without any other changes.

Like, to be perfectly honest I don't care about how the technology works, all I'm looking at is what it does (ie: what is the input, what is the output), and whether or not we, as a society, want something that does that. Whether feeding it copyrighted images comes before or after "training" seems entirely irrelevant.

Clarste fucked around with this message at 16:39 on May 22, 2023

Serotoning
Sep 14, 2010

D&D: HASBARA SQUAD
HANG 'EM HIGH


We're fighting human animals and we act accordingly

Private Speech posted:

Just because I know someone will bring it up - yes you can use a model to figure out related tokens in an image; however consider the following thought experiment:

You have a model which, among other things, is trained to copy some specific author's style (say Greg Rutkowski). Now you take an entirely different model which does not having anything from Greg in it's dataset or tokens, but it's still a good model, and you feed it tokens such that it creates an image approximating the style of Greg Rutkowski. Then you take the output and give it to the first model to find related tokens - and the model correctly identifies the image as Greg-style. Have you committed copyright infringement? After all you have an AI-generated image clearly in the style of Greg Rutkowski.

To me, this extra middle-man step of "use one model to feed another model tokens" has no bearing on the ultimate effort. In this case, the two models can be treated as acting as one whole. You don't get to transfer responsibility by abstracting away individual steps of the causal process. We say "trespassing" for both the act of entering a protected space and the lingering therein; nothing useful is to be gained legally by considering each act separately if in the first place the act was implied to be committed by one party. You have committed a transformation of data from one form to a highly related other. Whether or not you have committed plagiarism is a function of how similar the end product is to its zygote, not the process by which it came to be.

GlyphGryph
Jun 23, 2013

Down came the glitches and burned us in ditches and we slept after eating our dead.
Have you forgotten the proposal we're actually discussing? There are no existing copyright rules anything like that. Copyright explicitly and intentionally doesn't cover what is being discussed.

In order for it to be a copyright issue, copyright would have to be extended to grant that coverage, at which point we are back to my original argument

quote:

If you decide to cover artistic styles with copyright, you are loving over an absolutely massive number of artists forevermore.

To which you responded that it wouldn't cover humans... which would by definition require it not to expand the copyright system.

Clarste posted:

Frankly, "banning the tech" is also way more complicated because then you have to define the tech, legally speaking, and lol legislators have no loving idea where to start with that.

You have to do that for what you're arguing as well - you can't just, say, exempt machines from fair use and say if you use a machine to create a copy of copyrightable work you're liable. That would make browsing these forums or watching Netflix or basically anything online illegal.

GlyphGryph fucked around with this message at 16:45 on May 22, 2023

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.
The idea is to define the program as something that cannot "learn" and can only "copy" so therefore anything in its training set is copying by definition. Like tracing. A computer cannot have a style, it can only trace things.

GlyphGryph posted:

You have to do that for what you're arguing as well - you can't just, say, exempt machines from fair use and say if you use a machine to create a copy of copyrightable work you're liable. That would make browsing these forums or watching Netflix or basically anything online illegal.

It is literally already illegal to make copies with a machine! Netflix is giving you permission to watch it on your computer, but not to spread it any further than that! It's in the Terms of Service you skipped!

Serotoning
Sep 14, 2010

D&D: HASBARA SQUAD
HANG 'EM HIGH


We're fighting human animals and we act accordingly

SubG posted:

There's absolutely nothing intangible (in the sense you seem to mean it) about the output of e.g. Stable Diffusion. Push the button, get an image. It's an algorithm, just like MENACE.

Put in slightly different terms, if you think the difference is in some intangible [something] involving "Art", then the locus of that difference is strictly in your (generic "your") head, not in the AI.

The intangibility is in the fact that there is no overarching ruleset by which to defend the choices "made" in creating art. I can defend myself objectively that I was "copying" the Tic Tac Toe AI only because I, on my own, discovered the optimal strategy for the game. The sense that there is something to "take" or plagiarize here stems from the fact that there is an objectively right answer attainable in the first place.

SubG
Aug 19, 2004

It's a hard world for little things.

Pvt. Parts posted:

To me, this extra middle-man step of "use one model to feed another model tokens" has no bearing on the ultimate effort. In this case, the two models can be treated as acting as one whole. You don't get to transfer responsibility by abstracting away individual steps of the causal process. We say "trespassing" for both the act of entering a protected space and the lingering therein; nothing useful is to be gained legally by considering each act separately if in the first place the act was implied to be committed by one party. You have committed a transformation of data from one form to a highly related other. Whether or not you have committed plagiarism is a function of how similar the end product is to its zygote, not the process by which it came to be.
He's talking about the difference between having a model trained specifically on Andy Warhol's paintings of Campbell's soup cans and having a general model that you ask to produce a painting of a Campbell's soup can using clean line art and flat colouring. Even if the second model hasn't been trained on Warhol's paintings, you'll probably end up with something very similar. But if the model wasn't trained on Warhol's paintings, how could it be the thing infringing on Warhol's work, as opposed to the person behind the keyboard?

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


Pvt. Parts posted:

To me, this extra middle-man step of "use one model to feed another model tokens" has no bearing on the ultimate effort. In this case, the two models can be treated as acting as one whole. You don't get to transfer responsibility by abstracting away individual steps of the causal process. We say "trespassing" for both the act of entering a protected space and the lingering therein; nothing useful is to be gained legally by considering each act separately if in the first place the act was implied to be committed by one party. You have committed a transformation of data from one form to a highly related other. Whether or not you have committed plagiarism is a function of how similar the end product is to its zygote, not the process by which it came to be.

I think you misunderstood my point, to put it in simpler terms:

You want to create an image in the style of Greg Rutkowski. You look at his images and identify the features of his style, yourself, in your head. Then you feed those features into an AI which does not know about Greg or his art and have it create an image in that style.

Now you have an image in the style of Greg Rutkowski made by an AI, but without the AI being trained on it (because you the human made the necessary connections yourself).

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


Pvt. Parts posted:

To me, this extra middle-man step of "use one model to feed another model tokens" has no bearing on the ultimate effort. In this case, the two models can be treated as acting as one whole. You don't get to transfer responsibility by abstracting away individual steps of the causal process. We say "trespassing" for both the act of entering a protected space and the lingering therein; nothing useful is to be gained legally by considering each act separately if in the first place the act was implied to be committed by one party. You have committed a transformation of data from one form to a highly related other. Whether or not you have committed plagiarism is a function of how similar the end product is to its zygote, not the process by which it came to be.

It's like a crypto tumbler - it exists to obfuscate the origin of what went in; basically the same, but for IP.

Frankly, both tagging and an ability to determine data set contents are inevitable, because of (a) the need to efficiently find misinfo, and (b) any other outcome basically destroying the ability of the IP system to function. Granted, again, as crypto to banks, this to copyright.

Also, lmao at that open source BS; it's not gonna be good for that scene if it's used as an end run around the law, and CC specifically needs a rewrite to prevent LAION style chicanery. And the copyright infringement isn't the output, it's the model itself.

StratGoatCom fucked around with this message at 16:55 on May 22, 2023

SubG
Aug 19, 2004

It's a hard world for little things.

Pvt. Parts posted:

The intangibility is in the fact that there is no overarching ruleset by which to defend the choices "made" in creating art. I can defend myself objectively that I was "copying" the Tic Tac Toe AI only because I, on my own, discovered the optimal strategy for the game. The sense that there is something to "take" or plagiarize here stems from the fact that there is an objectively right answer attainable in the first place.
Again, whatever distinction you're trying to make here is taking place purely in your head, not in the AI. A pile of matchboxes doesn't "know" that it's "only" playing tic-tac-toe, and SD doesn't "know" that it's making (or trying to make, or plagiarizing, or whatever you want to call it) art. The only difference is that the math involved in MENACE is a lot simpler.

Bar Ran Dun
Jan 22, 2006




Fake Pentagon bombing headlines on Twitter this morning, based on AI-generated images. Most of the tweets about it have already been pulled.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

Clarste posted:

The idea is to define the program as something that cannot "learn" and can only "copy" so therefore anything in its training set is copying by definition. Like tracing. A computer cannot have a style, it can only trace things.

It is literally already illegal to make copies with a machine! Netflix is giving you permission to watch it on your computer, but not to spread it any further than that! It's in the Terms of Service you skipped!
This would be a good point if these models copy images, but they don’t.

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


cat botherer posted:

This would be a good point if these models copy images, but they don’t.

They do fairly regularly, and it is quite possible to figure out what it trained on.

Private Speech
Mar 30, 2011

I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.


StratGoatCom posted:

They do fairly regularly, and it is quite possible to figure out what it trained on.

See my post above about that, the more complicated one.

Or the Campbell soup example, that works too.

Basically you can figure out similarities in the training dataset to a given image, but it doesn't mean that image was created with that dataset.

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


Private Speech posted:

See my post above about that, the more complicated one.

Or the Campbell soup example, that works too.

Basically you can figure out similarities in the training dataset to a given image, but it doesn't mean that image was created with that dataset.

Then you had better be able to produce a searchable dataset to prove whether it was or not, and if you can't, you need to train your net harder to avoid infringing outputs.

Serotoning
Sep 14, 2010

D&D: HASBARA SQUAD
HANG 'EM HIGH


We're fighting human animals and we act accordingly

Private Speech posted:

I think you misunderstood my point, to put it in simpler terms:

You want to create an image in the style of Greg Rutkowski. You look at his images and identify the features of his style, yourself, in your head. Then you feed those features into an AI which does not know about Greg or his art and have it create an image in that style.

Now you have an image in the style of Greg Rutkowski made by an AI, but without the AI being trained on it (because you the human made the necessary connections yourself).

Yeah, and? What I'm saying is that you are not meaningfully changing the question. Things cannot infringe, only people can. You (acting alongside the machine) have created an image. Whether or not it is infringing should be a matter of the alleged "copy" and its original, not of the internal process used to get there. If the painting is indeed in Rutkowski's style, it is because the style is generically describable enough so as to be convincingly reproduced. "Form" is not copyrightable, in other words, only individual works are. It's like, if I sat in GIMP hitting the randomize noise filter enough times I might eventually get an, I dunno, Interchange by de Kooning, but it is still my responsibility to not release it as my own.

e: it's like getting caught next to a printing press illegally whirling away reproducing copies of a book you don't own the rights to and pointing at the press saying "but officer, it is not me who is plagiarizing the work, it's the machine" and expecting anything but laughter.

Serotoning fucked around with this message at 17:51 on May 22, 2023

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.
Yeah, you cannot copyright a style. That's never been possible, and doing it on a computer does not change that.

https://www.thelegalartist.com/blog/you-cant-copyright-style

Liquid Communism
Mar 9, 2004

коммунизм хранится в яичках

Clarste posted:

The idea is to define the program as something that cannot "learn" and can only "copy" so therefore anything in its training set is copying by definition. Like tracing. A computer cannot have a style, it can only trace things.

Yep. Even the 'draw a thing in Bob Ross' style' prompt is a dodge, because the algorithm has no idea what Bob Ross' style is. It knows there were files in its training set that were human-tagged as being produced by or similar to Bob Ross, and will now iterate on parts of them to generate an image that the human user will then decide is or is not what they wanted.

The only notion of "style" here exists in the mind of the prompt giver and of the humans who created the metadata in the training set.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

Liquid Communism posted:

Yep. Even the 'draw a thing in Bob Ross' style' prompt is a dodge, because the algorithm has no idea what Bob Ross' style is. It knows there were files in its training set that were human-tagged as being produced by or similar to Bob Ross, and will now iterate on parts of them to generate an image that the human user will then decide is or is not what they wanted.
It does not do this. Are you reading any of the posts where people have repeatedly explained that the models do not contain their training sets? It cannot "iterate" on the tagged set to generate a new image because that set of images does not exist at prediction time. You haven't the faintest idea of what you are talking about here.
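The point that the training set does not exist at prediction time can be illustrated with a toy model. This is a deliberately simplified stand-in (a least-squares line fit, with a made-up function and synthetic data), not how a diffusion model actually trains, but the structural point carries over: the learned parameters stay the same size no matter how much data produced them.

```python
import numpy as np

def train(n_samples: int) -> np.ndarray:
    """Fit y ~ 2x + 1 from noisy samples; return only the learned parameters."""
    rng = np.random.default_rng(42)
    x = rng.uniform(-1, 1, n_samples)
    y = 2 * x + 1 + rng.normal(scale=0.01, size=n_samples)
    params = np.polyfit(x, y, 1)  # the entire "model": [slope, intercept]
    return params  # x and y are discarded here -- the training data is gone

small = train(100)
large = train(100_000)

# Two floats either way: the model's size is independent of the dataset's,
# so it cannot be storing the training set inside itself.
print(small.shape, large.shape)
```

A few billion parameters against a few billion multi-kilobyte images is the same mismatch at scale, which is why "it iterates on the tagged files" does not describe what happens at generation time.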

KillHour
Oct 28, 2007


Clarste posted:

The idea is to define the program as something that cannot "learn" and can only "copy" so therefore anything in its training set is copying by definition. Like tracing. A computer cannot have a style, it can only trace things.

This is begging the question. You are stating that "learning" is, by definition, a thing computers can't do. And other people are saying that's not true. You are assuming there is something special about being human that means you fall into a different category of rules and the people you're arguing against are saying that humans and computers should be judged by the same rules. Until both sides of the argument agree to one of those things, you can't possibly convince each other.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.
This thread would be a lot easier if people argued based on the ML models and copyright laws that actually exist. It seems that people think these models are some kind of database.

Count Roland
Oct 6, 2013

As an aside, I'm mostly lurking this thread and am finding it quite interesting. Keep it up, and please keep it a civil debate.

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.

KillHour posted:

This is begging the question. You are stating that "learning" is, by definition, a thing computers can't do. And other people are saying that's not true. You are assuming there is something special about being human that means you fall into a different category of rules and the people you're arguing against are saying that humans and computers should be judged by the same rules. Until both sides of the argument agree to one of those things, you can't possibly convince each other.

I am saying the law can declare it so regardless of what you or anyone thinks, and a lot of people with a lot of money have a vested interest in strong copyright laws. This isn't a philosophical discussion, the law is a tool that you use to get what you want.

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.

cat botherer posted:

This thread would be a lot easier if people argued based on the ML models and copyright laws that actually exist. It seems that people think these models are some kind of database.

I super do not see how this actually matters. You input copyrighted material into the machine. Whether it happened before or after "training" is 100% irrelevant to the issue of whether we want that to be a thing and how we might stop it.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

Clarste posted:

I am saying the law can declare it so regardless of what you or anyone thinks, and a lot of people with a lot of money have a vested interest in strong copyright laws. This isn't a philosophical discussion, the law is a tool that you use to get what you want.
Case law can go anywhere, always. That's a specious argument in itself.

Clarste posted:

I super do not see how this actually matters. You input copyrighted material into the machine. Whether it happened before or after "training" is 100% irrelevant to the issue of whether we want that to be a thing and how we might stop it.
It absolutely matters, because it factors into whether it is fair use or not. It has also been a thing for years now. ML models are nothing new.

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.
Case law can go wherever it wants, but if people with money don't like where it went they can buy a senator or 50. All I have ever been saying is that the law can stop it if it wants to, and all these arguments about the internal workings of the machine or the nature of art are pretty irrelevant to that.

GlyphGryph
Jun 23, 2013

Down came the glitches and burned us in ditches and we slept after eating our dead.
The problem is that you don't seem to understand how the machines work, so your declarations based on how they work are spurious.
You also don't seem to understand how existing copyright law works, so your appeals to just let them get hit by existing laws don't work.
You don't seem to be able to wrap your head around the idea that there are things that are difficult to legislate away without doing a lot of collateral damage, even though it happens all the damned time, and you keep just handwaving away every issue.
And to top it all off, because of those things you're making suggestions that wouldn't even work to actually make things better for literally anyone and acting as if you've "solved" things, bing bong so simple.

Literally no one benefits from the schemes being proposed right now (except maybe the biggest media megacorps), but a hell of a lot of folks would be hurt.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

Clarste posted:

Case law can go wherever it wants, but if people with money don't like where it went they can buy a senator or 50. All I have ever been saying is that the law can stop it if it wants to, and all these arguments about the internal workings of the machine or the nature of art are pretty irrelevant to that.
Places like Getty want it restricted (and generally fight any fair use whenever possible). However, there's even bigger money (the tech industry) that wants to maintain the status quo. Machine learning on fair-use data has been a thing for many years now.

cat botherer fucked around with this message at 18:36 on May 22, 2023

Count Roland
Oct 6, 2013

Clarste posted:

I am saying the law can declare it so regardless of what you or anyone thinks, and a lot of people with a lot of money have a vested interest in strong copyright laws. This isn't a philosophical discussion, the law is a tool that you use to get what you want.

Who is intended to benefit from such laws?


Tree Reformat
Apr 2, 2022

by Fluffdaddy

Clarste posted:

Case law can go wherever it wants, but if people with money don't like where it went they can buy a senator or 50. All I have ever been saying is that the law can stop it if it wants to, and all these arguments about the internal workings of the machine or the nature of art are pretty irrelevant to that.

How exactly these generative systems work is quite relevant, because we're discussing whether the very act of searching for and copying a computer file (or five billion of them), then having a computer program use the data from those files to do some math and change the data of a different file (the model, to keep things clear), constitutes copyright infringement, learning, or neither. The landmark cases working their way through the courts right now, which have the interest and backing of large corporations on both sides, will in fact turn on the answers to these questions!
