|
Also, the thread needs to be open to shunt AI art slapfights into, otherwise they are going to pop up all over the forum.
|
# ? Dec 19, 2023 22:15 |
|
|
edit: eh, wasn't really an original or interesting thought
Verviticus fucked around with this message at 22:47 on Dec 19, 2023 |
# ? Dec 19, 2023 22:41 |
|
Lemming posted:No, it's not. This goes back to the original point that using the word "hallucination" has made everyone discuss this situation in a really dumb way, but a great example is the fact that most people don't hallucinate (because hallucinations are a manifestation of some kind of mental illness, where your brain isn't working the way it's supposed to), and ALL LLMs "hallucinate" necessarily as a function of how they work (because they're text predictors and don't have any "understanding" of the underlying truth of a situation) Hallucinations can happen easily with visual illusions, even on demand, for people without a mental illness. The probable cause of visual illusions is optimizations in our vision system that get caught off guard when the optimization fires at the wrong time. Anyway, they are hallucinations: we think we see something, but it is not there; it is hallucinated. Another common hallucination for humans and artificial vision systems is pareidolia. Both AI systems and humans suffer from pareidolia hallucinations. And not crazy people. Tei fucked around with this message at 23:08 on Dec 19, 2023 |
# ? Dec 19, 2023 23:04 |
|
I set up a makeshift version of chatgpt using the gemini api, and gemini pro seems kind of lovely. It's basically free though, so that's nice I guess. I haven't really experimented with the multimodal capabilities, but I haven't been particularly impressed by chatgpt's ability to "see" images so I'm curious to compare at least.
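For anyone curious what a makeshift client like that amounts to: it's mostly one REST call. A minimal sketch, assuming the public v1beta `generateContent` endpoint; `API_KEY` and the prompt are placeholders, and nothing is sent over the network unless you actually call `urlopen`:

```python
import json
import urllib.request

# Sketch of a bare-bones Gemini Pro request (assumes the public v1beta
# REST endpoint; API_KEY is a placeholder for your own key).
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/"
            "models/gemini-pro:generateContent")

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    # The API expects a "contents" list of messages, each made of text "parts".
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Summarize this thread in one sentence.", "API_KEY")
# Sending it would be urllib.request.urlopen(req); the reply text sits at
# candidates[0].content.parts[0].text in the response JSON.
```

As far as I can tell, the multimodal side goes through the gemini-pro-vision model instead, with an extra inline image part in the same overall request shape.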
Lucid Dream fucked around with this message at 23:21 on Dec 19, 2023 |
# ? Dec 19, 2023 23:19 |
|
Tei posted:Both AI systems and humans suffer from pareidolia hallucinations. And not crazy people. Wanna be that mug of coffee's friend.
|
# ? Dec 19, 2023 23:20 |
|
Nervous posted:Wanna be that mug of coffee's friend. He’s got a lot on right now, between visiting his speaker friend with the nasogastric tube in the hospital, evidently loving the absolute *hell* out of that washing machine and doing it all under the judgmental eyes of those power sockets.
|
# ? Dec 20, 2023 02:59 |
|
whoops, turns out the base dataset that most of the big image generation models are trained on contains at least 3000 instances of child sexual abuse material https://twitter.com/jason_koebler/status/1737460292299190371
|
# ? Dec 20, 2023 23:34 |
|
https://x.com/jason_koebler/status/1737469369154773301?s=20 Hey they are exactly as careless as everyone assumes people who scrape the internet are. "We cannot check every image manually" You mean you don't want to. "Since we are not distributing or deriving other images from originals, I do not think the image licensing apply" Setting up a nice space where maybe they aren't technically doing anything wrong but the law hasn't caught up to there being multiple digital middlemen in the process.
|
# ? Dec 21, 2023 00:26 |
|
Space Skeleton posted:https://x.com/jason_koebler/status/1737469369154773301?s=20 "There's no way we can review every image in this dataset...that would require us to do our research at a more measured pace" "There's no way we can be responsible for what we publish on our social media site...I mean without paying a bunch of people to moderate it all properly" "We can't be expected to account for all the impacts of AI on the wider community, that would cost us money and time" "It's not my fault your house is underwater. Congress passed an act that says I can tear down any municipal dam without liability provided I use one of the new Techdozers. This will stimulate investment in the exciting new Techdozer sector"
|
# ? Dec 21, 2023 08:44 |
|
I do not read that as "our dataset contains child porn" and more like "who knows, lol, I don't think so, and we took actions to actively filter horrible stuff, but that can happen". Also, it contains a genuine, naive and direct answer of: "Do we train our bot on copyrighted data, ignoring the license of such data? Yes." If there's an image online with a license saying "any use except being fed to AI bot datasets", that image is going to be fed to AI bot datasets.
|
# ? Dec 21, 2023 14:17 |
|
Tei posted:If there's an image online with a license saying "any use except being fed to AI bot datasets", that image is going to be fed to AI bot datasets. A copyright license dictates how you can copy (share) an image, not how you can use it.* If I put a license on an image saying "any use except jacking off to it" and you jack off to it, I can't sue you for doing that. The legal question at issue is if and when the image is being copied as part of running an AI system trained on that image. *Yes, I know there are clickwrap agreements on software that try to dictate the manner in which it is used, but everyone pretty much ignores them anyways, and the legal figleaf is that installing software is technically making a copy, which is stupid, but the law can be stupid sometimes. Also, technically there are some restrictions on use, like breaking DRM, even if you don't share it, but that's explicitly illegal based on the DMCA (in the US), not based on the license. KillHour fucked around with this message at 15:07 on Dec 21, 2023 |
# ? Dec 21, 2023 15:04 |
|
Tei posted:I do not read that as "our dataset contains child porn" and more like "who knows, lol, I don't think so, and we took actions to actively filter horrible stuff, but that can happen". Oh, they're well aware it does, and have been for years. Stanford just published a study where they went through the Canadian Centre for Child Protection and matched known CSAM to existing parts of the LAION-5B dataset. In any reasonable society, this is a moment where regulators slam the loving brakes and ask how exactly this happened, and why known safeguarding measures weren't applied.
|
# ? Dec 22, 2023 07:05 |
|
KillHour posted:A copyright license dictates how you can copy (share) an image, not how you can use it.* If I put a license on an image saying "any use except jacking off to it" and you jack off to it, I can't sue you for doing that. I am not a lawyer, so I know I am on quicksand, but USE licenses also exist. That makes me think... what are the conditions required for a land grab?
- The previous owners don't have the strength to stop it (indigenous people in America, all the lands that had their treasures stolen by the British to put in a museum in England, the artists in 2023)
- The enforcement arm of the land is okay with it
- Racism / hate towards the minority group that owns the property that is going to be stolen by the larger, more powerful group
Art does not really have value under capitalism. But you can charge a monthly fee for access to an AI bot trained on that art. A forest has no value under capitalism. But if you burn it and sell the burned trees as cheap coal, somebody can make money from that.
|
# ? Dec 22, 2023 10:18 |
|
Tei posted:I am not a lawyer, so I know I am on quicksand, but USE licenses also exist. In most cases, if I buy a thing, you can't tell me what to do or not do with that thing. An exception would be if you were leasing or renting it instead of selling it. Another exception is real estate, because of old laws from 16th century Saxony or whatever. But if I sell you a book or a painting, I have no legal way to tell you that you can't wipe your rear end with the paper it's made of. That doesn't change if I give it to you for free either. If I created that work myself, what I can do is tell you not to make copies of it and sell them for a dollar or whatever. In theory, digital art should be the same, but it turns out that laws that make sense in the context of a physical item don't necessarily make logical sense for a bunch of abstract bits of information that get "copied" every time you use them. KillHour fucked around with this message at 15:54 on Dec 22, 2023 |
# ? Dec 22, 2023 15:47 |
|
.
|
# ? Dec 22, 2023 15:52 |
|
Back when ACDSee was one of the popular shareware image viewers for Windows, I recall its license agreement specified that you couldn't use it for viewing porn. It was always funny imagining a world where that was in any way enforced. I mean, leaving out those who didn't just crack it anyway.
|
# ? Dec 22, 2023 16:08 |
|
KillHour posted:In most cases, if I buy a thing, you can't tell me what to do or not do with that thing. An exception would be if you were leasing or renting it instead of selling it. Another exception is real estate because of old laws from 16th century Saxony or whatever. But if I sell you a book or a painting, I have no legal way to tell you that you can't wipe your rear end with the paper it's made of. That doesn't change if I give it to you for free either. If I created that work myself, what I can do is tell you not to make copies of it and sell them for a dollar or whatever. While this is a fun simplification, it completely ignores the concept of how derivative works interact with copyright, which is the real thing in question here. The primary issues:
1. Per the U.S. Copyright Office's present ruling, AI-generated works cannot be copyrighted because they lack human authorship, although human-generated works that use AI-generated work may be.
2. The question of whether using others' works without license as training data for generative AI is infringement is still being worked out through the courts, although the primary claim from OpenAI et al seems to be 'we can't grow our business fast enough if we're expected to vet and license the data we use'. See my prior post above as to the moral hazard involved in not vetting datasets.
There's a pretty solid summary prepared by the Congressional Research Service. My personal reading is that we're going to see the ruling come down that it is in fact infringement, as there have been multiple demonstrations of ways to make generative AI (ChatGPT in particular) regurgitate training data in whole or in part, which OpenAI's argument in the lawsuit hinges on not being possible.
|
# ? Dec 22, 2023 17:23 |
|
Liquid Communism posted:While this is a fun simplification, it completely ignores the concept of how derivative works interact with copyright, which is the real thing in question here. It ignores it because derivative works aren't relevant to the original question of doing the model training in the first place. It's running the trained model that is alleged to create potentially derivative works. I'm saying the training is unrelated to copyright, which is why you can't make a license that says not to.
|
# ? Dec 22, 2023 18:01 |
|
Liquid Communism posted:1. Per the U.S. Copyright Office's present ruling, AI-generated works cannot be copyrighted because they lack human authorship, although human-generated works that use AI-generated work may be. quote:There's a pretty solid summary prepared by the Congressional Research Service. My personal reading is that we're going to see the ruling come down that it is in fact infringement, as there have been multiple demonstrations of ways to make generative AI (ChatGPT in particular) regurgitate training data in whole or in part, which OpenAI's argument in the lawsuit hinges on not being possible. SCheeseman fucked around with this message at 18:27 on Dec 22, 2023 |
# ? Dec 22, 2023 18:22 |
|
e: oops, quoting from several pages and days ago. Tei posted:Maybe part of the reason the human brain is so slow is because it is mechanical. Biological cells must actually build new connections, and chemistry changes (molecules) actually have to move. isn't the "computer" in this example a massive server farm? like sure it can do things fast, but it's probably taking more energy/resources and space (for now??) than an unpaid intern PhazonLink fucked around with this message at 20:07 on Dec 22, 2023 |
# ? Dec 22, 2023 20:00 |
|
Tei posted:Art does not really have value under capitalism. But you can charge a monthly fee for access to an AI bot trained on that art. What the gently caress are you talking about? What's with everyone thinking they are so smart by lumping every phenomenon of human valuation in with capitalism? Art obviously has value, regardless of the economic "system" (I consider true "laissez-faire" capitalism to be the virtual lack of coherent economic system, basically being just freedom under robust property rights and a rule of law) one lives under; that's why we have been making it for as long as we have records of ourselves as a species. You are talking about human beings being reflected in markets/capitalism, not the markets/capitalism themselves. The freer the markets, the more they will reflect the actual desires of people. A forest "doesn't have value" (which is untrue) and coal does because forests are everywhere and coal is not. You are deeply confused, or stupid, or both.
|
# ? Dec 22, 2023 21:44 |
|
Serotoning posted:Art obviously has value, regardless of the economic "system" (I consider true "laissez-faire" capitalism to be the virtual lack of coherent economic system, basically being just freedom under robust property rights and a rule of law) one lives under, that's why we have been making it for as long as we have records of ourselves as a species.
|
# ? Dec 22, 2023 22:11 |
|
I'm preparing a longer write-up addressing several major AI topics. One significant development I can briefly bring up is the next pending legal case that could set important future precedent. This will go before a jury to decide. Thomson Reuters v. Ross Intelligence https://copyrightlately.com/why-a-little-known-copyright-case-may-shape-the-future-of-ai/ At the heart of the matter is that simple lists of facts are not copyrightable, only the creative aspects, layout, or inclusion or exclusion of those facts. The case law organized by Thomson Reuters Westlaw is not copyrightable, only their organizing and summaries of it. Ross Intelligence training their AI model on the case law itself is not an issue, as that is public domain data. I don't know the extent of further training Ross Intelligence did on their model, as I'm still learning about this case, but it seems to be what the entire case is about and why it's important and worth following. One thing I am working on writing about is what is actually inside the AI model, now that I've had some time to do a deeper look after attempting to make my own. A key issue behind AI training: the actual saved, frozen AI weights are only a list of relations of facts from the training data. This is an important distinction, because even if copyrighted material was in a dataset used for training, the actual result of training is only a list of facts about the trained material and not the actual copyrighted material itself. They are not a mishmash of stored images or compressed files like was suggested in the first headline-grabbing lawsuit that ended up dismissed. There is obviously a lot more to be said about this, but it's enough to start with. I will say that copyright seems a poor method to handle such a fundamentally transformative, multi-faceted, society-changing issue.
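If it helps make the "weights are just numbers" point concrete, here's a toy pure-Python sketch (everything illustrative, not any real model): fit a one-weight line by gradient descent, then look at what the "checkpoint" actually contains.

```python
# Toy illustration: learn y = w*x + b by gradient descent on four points,
# then inspect the "frozen" model. All names here are made up for the example.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # points on y = 2x + 1

w, b = 0.0, 0.0
lr = 0.02
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y          # prediction error on this point
        grad_w += 2 * err * x / len(data)
        grad_b += 2 * err / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

checkpoint = {"w": w, "b": b}  # this is all that gets "saved"
print(checkpoint)  # two floats near w=2, b=1; the data points themselves aren't in it
```

Whether those numbers can nonetheless encode a memorized training example once a big model is over-fitted is, of course, exactly the contested part.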
|
# ? Dec 22, 2023 23:32 |
|
KwegiboHB posted:One thing I am working on writing about is what is actually inside the AI Model now that I've had some time to do a deeper look after attempting to make my own. well they can be if the model is over-fitted, as demonstrated by the new midjourney V6 model which seems to be especially prone to regurgitating near-perfect replicas of images from the training set for some reason in that last example the name and creator of the original piece aren't even included in the prompt, and MJv6 still zeroed in on replicating that piece in particular repiv fucked around with this message at 00:03 on Dec 23, 2023 |
# ? Dec 22, 2023 23:43 |
|
repiv posted:well they can be if the model is over-fitted, as demonstrated by the new midjourney update which seems to be especially prone to regurgitating near-perfect replicas of images from the training set for some reason This is why I need a much longer write-up because those are not pixel perfect recreations and the explanation of why that matters is important. Yes, I know that distinction is not going to matter for most people when it's 99.999999% the same. Pixel-perfect recreations are actually mathematically possible regardless of what a model was trained on and that is a big deal. I need time and I'm going to take what time I need to write this up proper.
|
# ? Dec 22, 2023 23:54 |
|
repiv posted:well they can be if the model is over-fitted, as demonstrated by the new midjourney V6 model which seems to be especially prone to regurgitating near-perfect replicas of images from the training set for some reason This is a weird one to me. Like, if you tell the AI to draw a picture of Mona Lisa and it draws a good Mona Lisa that isn't a bug, except in that it might be legally problematic. If you had a perfect AGI and asked it to output "Mona Lisa" it would be a bug if it *wasn't* a perfect representation.
|
# ? Dec 23, 2023 00:04 |
|
Lucid Dream posted:This is a weird one to me. Like, if you tell the AI to draw a picture of Mona Lisa and it draws a good Mona Lisa that isn't a bug, except in that it might be legally problematic. If you had a perfect AGI and asked it to output "Mona Lisa" it would be a bug if it *wasn't* a perfect representation. in the case of the mona lisa i'd agree, since that's a specific still image you would expect a sufficiently large model to reproduce it exactly if asked for the mona lisa. it's only notable as an example that models can store and reproduce the images they were trained on, contrary to some claims that the training data is always atomized beyond recognition so it doesn't count as reproducing it. the other two examples though - the joker prompt doesn't ask for a specific known image but the model decided to regurgitate a specific frame from the movie rather than interpolating over the broad space of "joker movie images", and in the last example the piece being copied isn't referenced in the prompt whatsoever. that to me indicates over-fitting, biasing the model towards being less creative and more plagiarizey in an effort to improve quality. repiv fucked around with this message at 00:54 on Dec 23, 2023 |
# ? Dec 23, 2023 00:11 |
|
repiv posted:in the case of the mona lisa i'd agree, since that's a specific still image you would expect a sufficiently large model to reproduce it exactly if asked for the mona lisa. it's only notable as an example that models can store and reproduce the images they were trained on, contrary to some claims that the training data is always atomized beyond recognition so it doesn't count as reproducing it. Hmm, I still think the Joker one is similar to the Mona Lisa example in that it was asking for a screenshot from the film and we don't actually know how many attempts it took. The third one is pretty damning though I suppose.
|
# ? Dec 23, 2023 02:18 |
|
Lucid Dream posted:This is a weird one to me. Like, if you tell the AI to draw a picture of Mona Lisa and it draws a good Mona Lisa that isn't a bug, except in that it might be legally problematic. If you had a perfect AGI and asked it to output "Mona Lisa" it would be a bug if it *wasn't* a perfect representation. I think creating copies of movies, songs, cars, etc. won't be a problem if it is for personal use. But if you start selling Ferrari models you 3D printed on your computer, that can count as a counterfeit product.
|
# ? Dec 23, 2023 03:23 |
|
I think the potential issue here, to me at least, is that having a piece of software that shows you the Mona Lisa when you type in "mona lisa" isn't exactly a novel issue - Google does that already. It's also not enough to say that a tool can produce an exact or near-exact replica of something copyrighted when you give it specific instructions - you've always been able to do that in Photoshop or whatever, even from a blank canvas, if you happen to know what instructions to give it. I'm not really sure there is any way to make a standard about AI in the general case - it basically has to be a case-by-case basis each time. In a lot of ways this is the worst of all worlds, because people will absolutely use AI in place of human artists, but I think you have to prove they used it instead of a PARTICULAR artist (the claimant) to really be able to win. On the other hand, companies can't trust that AI outputs will be safe, unless you are Disney or whoever and have enough of your own IP to train a model by yourself. Everyone else will have to live in constant fear that some panel or another is directly lifted from a comic book or whatever, and it turns out that people with good lawyers have a slam-dunk case against you. I don't see any way that you can ban AI in general on this basis either, to be clear. As KwegiboHB said, it's basically just stored statistics about the overall dataset, like "in 80% of images associated with the word 'steeple', there was a pair of sharp borders forming a 20 degree angle". Even that is assigning more intent/comprehension to it than actually exists. The issue with the three images listed there is that exact duplicates, or images that are basically the same but with different aspect ratios, probably exist hundreds of times in the dataset. On top of THAT, the Mona Lisa in particular is going to have a bunch of poo poo like "this is what an AI generated when I told it 'mysterious smile', look how close it got!" and "here is what researchers think the Mona Lisa would have looked like at the time it was painted, accounting for the age of the paint" and "look, I photoshopped my face into the Mona Lisa with this filter". I'm pretty sure the Dorothea Lange photo has been in every high school history textbook ever published in the US, which means it also is on every educational website about the era. It's obviously less ubiquitous than the Mona Lisa internationally, but assuming they are matching photos found on the Internet with text found on the same webpage, it makes perfect sense this would happen - I probably had that photo in at least 4 different textbooks (photography, journalism, US history, economics). The Joker one is really the most problematic example. In principle it probably amounts to something similar, with a lot of posts about people becoming "jokerfied" about things, where they might have posted the screen cap with different aspect ratios, jpegs that are just "top text, joker face", etc. I think there was an issue/running joke about early image generators thinking that photos of cats are supposed to have Impact font text on the top and bottom, so you'd get garbled not-quite-letters if you didn't plan around it. In the same sense, it's not surprising to me that "2019 joker movie" is going to give you the most-memed picture. BougieBitch fucked around with this message at 07:24 on Dec 23, 2023 |
# ? Dec 23, 2023 07:21 |
|
BougieBitch posted:you've always been able to do that in Photoshop or whatever even from a blank canvas if you happen to know what instructions to give it. Photoshop didn't have copyrighted material fed into it, though, so I don't think the comparison works here. Photoshop is worlds closer to traditional art creation than AI image generators are. I think the law will end up in a place where the large model companies have to do everything they can to restrict copyright regurgitation, but with everyone understanding it can't be totally prevented. Similar to how social media companies aren't held liable for hate speech on their platforms as long as they show they are actively trying to stop it. Mega Comrade fucked around with this message at 10:21 on Dec 23, 2023 |
# ? Dec 23, 2023 10:18 |
|
repiv posted:well they can be if the model is over-fitted, as demonstrated by the new midjourney V6 model which seems to be especially prone to regurgitating near-perfect replicas of images from the training set for some reason Related, we had a guy try to sell us a generative model to produce synthetic data from our locked-down private data. Turns out the optimal strategy to produce synthetic data that looks real is to output data identical to the original!
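A cheap first-pass audit for that failure mode is just measuring how much of the "synthetic" output collides with real rows. A sketch with made-up data (a real audit would also look for near-duplicates, not just exact copies):

```python
def memorization_rate(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are exact copies of real rows."""
    real = {tuple(r) for r in real_rows}  # set for O(1) membership checks
    hits = sum(1 for r in synthetic_rows if tuple(r) in real)
    return hits / len(synthetic_rows)

# Made-up example: a "generator" that copied 3 of its 4 outputs verbatim.
real = [(41, "a"), (17, "b"), (99, "c")]
synthetic = [(41, "a"), (17, "b"), (99, "c"), (12, "d")]
print(memorization_rate(real, synthetic))  # 0.75
```

If that number is anywhere near 1.0, the model is doing exactly what the sales guy's model did: reproducing the locked-down data it was supposed to be protecting.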
|
# ? Dec 23, 2023 11:23 |
|
Mega Comrade posted:I think the law will end up in place where the large model companies have to do everything they can to restrict copyright regurgitation but with everyone understanding it can't be totally prevented. Similar to how social media companies aren't held liable for hate speech on their platforms as long as they show they are actively trying to stop it. yeah i could see the AI vendors implementing a "you can copy my homework but don't make it too obvious" filter if they're pressured to. they could build a database of image fingerprints from the training set (similar to how GIS/tineye works), then check the generated output against that and re-roll with a different seed if it's too close to a training image, within some threshold. that would be an additional compute burden though, and inevitably make the model's quality worse as it's forced to throw away good (stolen) images, so they'd rather not if they don't have to
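As a sketch of what that filter could look like: toy average-hash fingerprints over made-up 2x2 "images" (a real system would use something like pHash or embedding similarity, but the re-roll logic is the same):

```python
def average_hash(pixels):
    """Tiny perceptual hash: 1 bit per pixel, set if above the image's mean.
    `pixels` is a flat list of grayscale values (e.g. a downscaled thumbnail)."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    # number of differing bits between two fingerprints
    return sum(x != y for x, y in zip(a, b))

def should_reroll(candidate, training_hashes, threshold=3):
    """True if the generated image's fingerprint is within `threshold` bits
    of any training-set fingerprint, i.e. too close to a training image."""
    h = average_hash(candidate)
    return any(hamming(h, t) <= threshold for t in training_hashes)

# Made-up fingerprints: one near-copy of a training image, one unrelated.
training = [average_hash([10, 200, 30, 220])]
print(should_reroll([12, 198, 28, 225], training))  # True  (near-duplicate)
print(should_reroll([200, 10, 220, 30], training))  # False (different layout)
```

The threshold is the knob repiv's trade-off lives in: set it tight and obvious copies slip through, set it loose and you burn compute re-rolling legitimate outputs.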
|
# ? Dec 23, 2023 15:53 |
|
Mega Comrade posted:Photoshop didnt have copyrightable material fed into it though so I don't think the comparison works here. Photoshop is worlds closer to traditional art creation than AI image generators are. Alternatively, a regulatory capture situation could result where regulation on AI will gradually and continually grow more expensive so that only the largest AI companies will be able to effectively comply, effectively banning any new competitors.
|
# ? Dec 23, 2023 16:09 |
|
esquilax posted:Alternatively, a regulatory capture situation could result where regulation on AI will gradually and continually grow more expensive so that only the largest AI companies will be able to effectively comply, effectively banning any new competitors. Yeah, it's pretty easy to see a path where generative AI exists and becomes a standard tool of creative professions but it's impossible to use without a large corporate middleman holding their hand out.
|
# ? Dec 23, 2023 16:51 |
|
BougieBitch posted:I think the potential issue here, to me at least, is that having a piece of software that shows you the mona lisa when you type in "mona lisa" isn't exactly a novel issue - Google does that already. It's also not enough to say that a tool can produce an exact or near exact replica of something copyrighted when you give it specific instructions - you've always been able to do that in Photoshop or whatever even from a blank canvas if you happen to know what instructions to give it. Something similar happens with Gartic Phone https://garticphone.com/ where if you draw something that vaguely resembles Mario, it will become Mario, or something blue with, like, lines on its back becomes Sonic. It's like these images have a magnetic power to attract drawings toward themselves. Memetic power.
|
# ? Dec 23, 2023 19:17 |
|
After watching some YouTube videos about robots and how there are ChatGPT-powered ones being developed, a question sprang up in my mind: when will we see the first murderbots? I mean, eventually, tech-savvy people will be able to buy a robot and dunk an open source, unrestricted AI into it. Will we see a robot war à la "I, Robot" soon?
|
# ? Dec 24, 2023 14:51 |
|
We already have human-controlled drone warfare. The various militaries around the world have been experimenting with machine-learning AIs controlling them for a while.
|
# ? Dec 24, 2023 15:00 |
|
If you're asking whether or not a self-aware robot army is about to rise up against humanity anytime soon, then the answer is no.
|
# ? Dec 24, 2023 19:00 |
|
|
Quixzlizx posted:If you're asking whether or not a self-aware robot army is about to rise up against humanity anytime soon, then the answer is no. What about a non self aware murder swarm with faulty/breached IFF coding that's running amok on a civilian population center?
|
# ? Dec 24, 2023 21:45 |