Sivart13
May 18, 2003
I have neglected to come up with a clever title

Bar Ran Dun posted:

So Krugman has written this on AI, with a point that boils down to: well, we will see its impacts a decade from now.

https://www.nytimes.com/2023/03/31/opinion/ai-chatgpt-jobs-economy.html?smid=nytcore-ios-share&referringSource=articleShare
AI has already changed the world in how quickly "Webster's dictionary defines X as..." has been replaced by "ChatGPT wrote this paragraph about X" as a hacky opening paragraph

More seriously, I've gone full AI doomer in my psyche but it's lightly tempered by walking around the physical world seeing how many dang people and buildings are in it. It's gonna take a long time for our GPT9 overlords to recycle all those buildings into holocubes or whatever.


Char
Jan 5, 2013
In a more-or-less unprecedented move, the Italian Privacy Guarantor declared that OpenAI's ChatGPT must be made inaccessible until it is clarified how it actually offers GDPR compliance with regard to:

How the training data was acquired
Why there is no age filtering system
Why there is no explicit warning during registration to let the user know OpenAI might use their data

Obviously, on places like Reddit and such this measure is getting laughed at, but these seem like reasonable concerns to have addressed sooner rather than later, in my opinion.

Charlz Guybon
Nov 16, 2010

Insanite posted:


Machine translation already annihilated the technical translation labor market, and what's left there seems like it'll disappear almost completely.
Has it really? I teach English as a second language in Asia, and translation apps still seem like poo poo for anything beyond giving a definition or a translation of one simple sentence.

Insanite
Aug 30, 2005

Charlz Guybon posted:

Has it really? I teach English as a second language in Asia, and translation apps still seem like poo poo for anything beyond giving a definition or a translation of one simple sentence.

The norm AFAIK in the tech world now is to give machine translation the first pass on everything, and then contract out final checking + editing to humans.

Takes way fewer people than doing it all with human translators, and it's cheaper for the corp.
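Roughly, that pipeline looks like: machine output for everything, with only the low-confidence segments queued for human post-editing. A toy sketch, not any real MT vendor's API — `machine_translate`, its canned outputs, and the 0.9 threshold are all invented for illustration:

```python
def machine_translate(segment):
    # Placeholder MT backend: returns (translation, confidence in 0..1).
    # A real system would call an MT model or service here.
    canned = {
        "Hello": ("Ciao", 0.97),
        "The server room is on fire": ("La sala server è in fiamme", 0.55),
    }
    return canned.get(segment, (segment, 0.0))

def first_pass(segments, review_threshold=0.9):
    """Machine-translate everything; queue low-confidence segments
    for human checking and editing."""
    translated, needs_review = [], []
    for seg in segments:
        text, conf = machine_translate(seg)
        translated.append(text)
        if conf < review_threshold:
            needs_review.append(seg)
    return translated, needs_review

out, review = first_pass(["Hello", "The server room is on fire"])
print(out)     # ['Ciao', 'La sala server è in fiamme']
print(review)  # ['The server room is on fire']
```

The human translators only ever see what lands in the review queue, which is where the headcount savings come from.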

Leon Trotsky 2012
Aug 27, 2009

YOU CAN TRUST ME!*


*Israeli Government-affiliated poster
Italy became the first Western country to order ChatGPT to stop processing Italian users' data while regulators investigate.

quote:

Italy became the first Western country to ban ChatGPT. Here’s what other countries are doing

Italy has become the first country in the West to ban ChatGPT, the popular artificial intelligence chatbot from U.S. startup OpenAI.

Last week, the Italian Data Protection Watchdog ordered OpenAI to temporarily cease processing Italian users’ data amid a probe into a suspected breach of Europe’s strict privacy regulations.

The regulator, which is also known as Garante, cited a data breach at OpenAI which allowed users to view the titles of conversations other users were having with the chatbot.

There “appears to be no legal basis underpinning the massive collection and processing of personal data in order to ‘train’ the algorithms on which the platform relies,” Garante said in a statement Friday.

Garante also flagged worries over a lack of age restrictions on ChatGPT, and how the chatbot can serve factually incorrect information in its responses.

OpenAI, which is backed by Microsoft, risks facing a fine of 20 million euros ($21.8 million), or 4% of its global annual revenue, if it doesn’t come up with remedies to the situation in 20 days.

Italy isn’t the only country reckoning with the rapid pace of AI progression and its implications for society. Other governments are coming up with their own rules for AI, which, whether or not they mention generative AI, will undoubtedly touch on it. Generative AI refers to a set of AI technologies that generate new content based on prompts from users. It is more advanced than previous iterations of AI, thanks in no small part to new large language models, which are trained on vast quantities of data.

There have long been calls for AI to face regulation. But the pace at which the technology has progressed is such that it is proving difficult for governments to keep up. Computers can now create realistic art, write entire essays, or even generate lines of code, in a matter of seconds.

“We have got to be very careful that we don’t create a world where humans are somehow subservient to a greater machine future,” Sophie Hackford, a futurist and global technology innovation advisor for American farming equipment maker John Deere, told CNBC’s “Squawk Box Europe” Monday.

“Technology is here to serve us. It's there to make our cancer diagnosis quicker, or to make humans not have to do jobs that we don't want to do.”

“We need to be thinking about it very carefully now, and we need to be acting on that now, from a regulation perspective,” she added.

Various regulators are concerned by the challenges AI poses for job security, data privacy, and equality. There are also worries about advanced AI manipulating political discourse through generation of false information.

Many governments are also starting to think about how to deal with general purpose systems such as ChatGPT, with some even considering joining Italy in banning the technology.

Britain
Last week, the U.K. announced plans for regulating AI. Rather than establish new regulations, the government asked regulators in different sectors to apply existing regulations to AI.

The U.K. proposals, which don’t mention ChatGPT by name, outline some key principles for companies to follow when using AI in their products, including safety, transparency, fairness, accountability, and contestability.

Britain is not at this stage proposing restrictions on ChatGPT, or any kind of AI for that matter. Instead, it wants to ensure companies are developing and using AI tools responsibly and giving users enough information about how and why certain decisions are taken.

In a speech to Parliament last Wednesday, Digital Minister Michelle Donelan said the sudden popularity of generative AI showed that risks and opportunities surrounding the technology are “emerging at an extraordinary pace.”

By taking a non-statutory approach, the government will be able to “respond quickly to advances in AI and to intervene further if necessary,” she added.

Dan Holmes, a fraud prevention leader at Feedzai, which uses AI to combat financial crime, said the main priority of the U.K.’s approach was addressing “what good AI usage looks like.”

“It’s more, if you’re using AI, these are the principles you should be thinking about,” Holmes told CNBC. “And it often boils down to two things, which is transparency and fairness.”

The EU
The rest of Europe is expected to take a far more restrictive stance on AI than its British counterparts, which have been increasingly diverging from EU digital laws following the U.K.’s withdrawal from the bloc.

The European Union, which is often at the forefront when it comes to tech regulation, has proposed a groundbreaking piece of legislation on AI.

Known as the European AI Act, the rules will heavily restrict the use of AI in critical infrastructure, education, law enforcement, and the judicial system.

It will work in conjunction with the EU’s General Data Protection Regulation. These rules regulate how companies can process and store personal data.

When the AI act was first dreamed up, officials hadn’t accounted for the breakneck progress of AI systems capable of generating impressive art, stories, jokes, poems and songs.

According to Reuters, the EU’s draft rules consider ChatGPT to be a form of general purpose AI used in high-risk applications. High-risk AI systems are defined by the commission as those that could affect people’s fundamental rights or safety.

They would face measures including tough risk assessments and a requirement to stamp out discrimination arising from the datasets feeding algorithms.

“The EU has a great, deep pocket of expertise in AI. They’ve got access to some of the top notch talent in the world, and it’s not a new conversation for them,” Max Heinemeyer, chief product officer of Darktrace, told CNBC.

“It’s worthwhile trusting them to have the best of the member states at heart and fully aware of the potential competitive advantages that these technologies could bring versus the risks.”

But while Brussels hashes out laws for AI, some EU countries are already looking at Italy’s actions on ChatGPT and debating whether to follow suit.

“In principle, a similar procedure is also possible in Germany,” Ulrich Kelber, Germany’s Federal Commissioner for Data Protection, told the Handelsblatt newspaper.

The French and Irish privacy regulators have contacted their counterparts in Italy to learn more about its findings, Reuters reported. Sweden’s data protection authority ruled out a ban. Italy is able to move ahead with such action as OpenAI doesn’t have a single office in the EU.

Ireland is typically the most active regulator when it comes to data privacy, since most U.S. tech giants like Meta and Google have their offices there.

U.S.
The U.S. hasn’t yet proposed any formal rules to bring oversight to AI technology.

The country’s National Institute of Standards and Technology put out a national framework that gives companies using, designing or deploying AI systems guidance on managing risks and potential harms.

But it runs on a voluntary basis, meaning firms would face no consequences for not meeting the rules.

So far, there’s been no word of any action being taken to limit ChatGPT in the U.S.

Last month, the Federal Trade Commission received a complaint from a nonprofit research group alleging GPT-4, OpenAI’s latest large language model, is “biased, deceptive, and a risk to privacy and public safety” and violates the agency’s AI guidelines.

The complaint could lead to an investigation into OpenAI and suspension of commercial deployment of its large language models. The FTC declined to comment.

China

ChatGPT isn’t available in China, nor in various countries with heavy internet censorship, like North Korea, Iran and Russia. It is not officially blocked in China, but OpenAI doesn’t allow users in the country to sign up.

Several large tech companies in China are developing alternatives. Baidu, Alibaba and JD.com, some of China’s biggest tech firms, have announced plans for ChatGPT rivals.

China has been keen to ensure its technology giants are developing products in line with its strict regulations.

Last month, Beijing introduced first-of-its-kind regulation on so-called deepfakes, synthetically generated or altered images, videos or text made using AI.

Chinese regulators previously introduced rules governing the way companies operate recommendation algorithms. One of the requirements is that companies must file details of their algorithms with the cyberspace regulator.

Such regulations could in theory apply to any kind of ChatGPT-style of technology.

https://www.cnbc.com/2023/04/04/italy-has-banned-chatgpt-heres-what-other-countries-are-doing.html
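For what it's worth, the fine structure cited there follows GDPR's usual upper tier: the applicable cap is whichever is higher of the fixed amount or the turnover percentage. A quick sketch of that math — the €1 billion turnover below is purely illustrative, not OpenAI's actual revenue:

```python
def max_gdpr_fine(annual_turnover_eur, fixed_cap_eur=20_000_000, pct=0.04):
    # GDPR upper-tier ceiling: the higher of the fixed cap
    # or a percentage of global annual turnover.
    return max(fixed_cap_eur, pct * annual_turnover_eur)

# For a hypothetical €1 billion turnover, the 4% prong dominates:
print(max_gdpr_fine(1_000_000_000))  # 40000000.0
```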

Leon Trotsky 2012 fucked around with this message at 15:32 on Apr 4, 2023

Summit
Mar 6, 2004

David wanted you to have this.
Preaching to the choir but wow that headline is bad. No such thing happened.

Leon Trotsky 2012
Aug 27, 2009

YOU CAN TRUST ME!*


*Israeli Government-affiliated poster

Summit posted:

Preaching to the choir but wow that headline is bad. No such thing happened.

Yeah, I added a sentence up top clarifying what the "ban" is.

gurragadon
Jul 28, 2006

I don't know what is going to happen to Europeans if the countries just keep suing AI companies for using their data. I would assume that OpenAI is just going to stop operating in Europe so they won't be under the laws of the European Union. I don't know if that is really a sustainable position for Europe, because in the United States it doesn't seem like regulation is going to keep up, so we're going to see AI advance beyond regulation either way.

Not really disagreeing with Italy and the EU trying to slow this down, though. If they are able to use their privacy laws to give us a slight pause on AI development and deployment, it could help with AI safety.

quote:

The French and Irish privacy regulators have contacted their counterparts in Italy to learn more about its findings, Reuters reported. Sweden’s data protection authority ruled out a ban. Italy is able to move ahead with such action as OpenAI doesn’t have a single office in the EU.

So, is this all just the EU saying stay out then or just making a statement? I know OpenAI would probably like to operate worldwide, but maybe they just won't operate in Europe?

Owling Howl
Jul 17, 2019

gurragadon posted:

I don't know what is going to happen to Europeans if the countries just keep suing AI companies for using their data. I would assume that OpenAI is just going to stop operating in Europe so they won't be under the laws of the European Union. I don't know if that is really a sustainable position for Europe, because in the United States it doesn't seem like regulation is going to keep up, so we're going to see AI advance beyond regulation either way.

Not really disagreeing with Italy and the EU trying to slow this down, though. If they are able to use their privacy laws to give us a slight pause on AI development and deployment, it could help with AI safety.

So, is this all just the EU saying stay out then or just making a statement? I know OpenAI would probably like to operate worldwide, but maybe they just won't operate in Europe?

If laws are broad and vague enough, you could probably make it more trouble than it is worth. As others have mentioned, it is already a useful tool, so Europeans wouldn't benefit from that, but depending on the assessed risk that may be a fair trade-off. It isn't clear that Europeans wouldn't still face a lot of the problems, though. Other countries are absolutely going to deploy it and use it to attack democratic institutions, social networks, and public trust, like they already do now with other tools.

The arguments seem to focus mostly on copyright or bias. I think the copyright issue is a little forced, in that the data isn't stored or copied, but if using copyrighted works for training were forbidden, we'd still end up with these systems. Less capable, perhaps, and it would take longer to deploy them, but we'd end up in the same place. It would at best be a delay.

Bias is an issue but it's obviously also a problem for humans. It would be a problem if people assumed AI is unbiased, neutral and objective and it would be the exact same problem if people assumed other people were unbiased, neutral and objective. Human unreliability is why we regulate everything from hiring practices to public tenders and have a constant flow of discrimination lawsuits. It's not really clear why we shouldn't just regulate AI the same way we regulate regular decision making processes.

Char
Jan 5, 2013

Summit posted:

Preaching to the choir but wow that headline is bad. No such thing happened.

The title is completely misleading: the Privacy Guarantor (which is an entity independent of the Government) asked OpenAI to prove they're GDPR compliant (edit: citing specific issues, as I said in my previous post), and to release a technical document within 20 days clarifying how they comply.
OpenAI has not yet responded to regulators; meanwhile, they took ChatGPT offline in Italy on Friday. "Offline" means web access has been made impossible for Italian IPs.
It can still be reached through APIs, though.
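A front-end split like that — block the web UI by IP geolocation, leave the API routes alone — can be sketched as below. Purely illustrative: nothing here reflects OpenAI's actual setup, and the path prefix and country list are invented for the example.

```python
API_PREFIX = "/v1/"          # hypothetical API path prefix
BLOCKED_COUNTRIES = {"IT"}   # countries the web UI refuses

def is_blocked(path: str, country: str) -> bool:
    """Return True if this request should get the block page."""
    if path.startswith(API_PREFIX):
        return False         # API traffic is not geoblocked
    return country in BLOCKED_COUNTRIES

print(is_blocked("/chat", "IT"))            # True
print(is_blocked("/v1/completions", "IT"))  # False
```

Which is why the "ban" is porous in practice: anyone hitting the API path, or browsing from a non-Italian IP, sails straight through.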

Char fucked around with this message at 17:28 on Apr 4, 2023

gurragadon
Jul 28, 2006

Owling Howl posted:

If laws are broad and vague enough, you could probably make it more trouble than it is worth. As others have mentioned, it is already a useful tool, so Europeans wouldn't benefit from that, but depending on the assessed risk that may be a fair trade-off. It isn't clear that Europeans wouldn't still face a lot of the problems, though. Other countries are absolutely going to deploy it and use it to attack democratic institutions, social networks, and public trust, like they already do now with other tools.

The arguments seem to focus mostly on copyright or bias. I think the copyright issue is a little forced, in that the data isn't stored or copied, but if using copyrighted works for training were forbidden, we'd still end up with these systems. Less capable, perhaps, and it would take longer to deploy them, but we'd end up in the same place. It would at best be a delay.

Bias is an issue but it's obviously also a problem for humans. It would be a problem if people assumed AI is unbiased, neutral and objective and it would be the exact same problem if people assumed other people were unbiased, neutral and objective. Human unreliability is why we regulate everything from hiring practices to public tenders and have a constant flow of discrimination lawsuits. It's not really clear why we shouldn't just regulate AI the same way we regulate regular decision making processes.

I'm a little confused as to why OpenAI can't train ChatGPT on copyrighted works as long as they aren't just replicating the work wholesale. Anybody who is training to be a writer will train themselves on copyrighted works, and every work is derivative of some experience the writer had. No writer emerges from the ether to release fully new ideas into the world; they would need their own language to do it.

Obviously, when ChatGPT reproduces a Mickey Mouse script or Midjourney copies a picture of Mickey Mouse, you have a claim. But to claim that something couldn't train on copyrighted materials at all would basically change how art and writing come to be. I mean, if I could only train my writing on non-copyrighted material, that would be like open source stuff and anything before 1924, I guess?

Edit: I know ChatGPT doesn't have personhood, so it's not a direct 1-to-1 comparison. But I don't know how else you could really train AI to be useful without copyrighted material. It would be so far behind, like an AI that somehow only had information from a time before computers.

gurragadon fucked around with this message at 18:11 on Apr 4, 2023

Main Paineframe
Oct 27, 2010

gurragadon posted:

I'm a little confused as to why OpenAI can't train ChatGPT on copyrighted works as long as they aren't just replicating the work wholesale. Anybody who is training to be a writer will train themselves on copyrighted works, and every work is derivative of some experience the writer had. No writer emerges from the ether to release fully new ideas into the world; they would need their own language to do it.

Obviously, when ChatGPT reproduces a Mickey Mouse script or Midjourney copies a picture of Mickey Mouse, you have a claim. But to claim that something couldn't train on copyrighted materials at all would basically change how art and writing come to be. I mean, if I could only train my writing on non-copyrighted material, that would be like open source stuff and anything before 1924, I guess?

Edit: I know ChatGPT doesn't have personhood, so it's not a direct 1-to-1 comparison. But I don't know how else you could really train AI to be useful without copyrighted material. It would be so far behind, like an AI that somehow only had information from a time before computers.

If a product can't be made usable without breaking the law and trampling all over people's rights, then I don't see how that's a problem for anyone besides the company that made the product. I know we've all gotten very used to tech startups building business models around breaking the law and betting their lawyers can delay the consequences long enough for them to build a lobbying organization to change the laws, but let's not pretend that's a good thing.

But you're making a very big omission in your statement here. It's not that you can't train AIs on copyrighted works, it's that you have to get the permission of the copyright holders to train on their copyrighted works. That might be expensive or difficult, sure, but that's the cost of building something that's entirely dependent on other people's content. If you don't like the cost or the annoyance of buying or negotiating license rights, go make your own content like Netflix eventually ended up doing.

Hell, that even applies to human writers. They're paying for much of the copyrighted media they consume, or otherwise complying with the licensing conditions of those works. How much money have you spent on media (or had spent on your behalf, by parents or teachers or libraries) over your entire life? Even someone really dedicated to piracy has still paid for more books (or movie tickets, or Netflix subscriptions, or whatever) than OpenAI Inc has.

SCheeseman
Apr 23, 2003

The argument that the threat of AI art is because of mass copyright infringement is a red herring, with the inevitable result of those advocating for enforcement of copyright being already entrenched IP holders building an even tighter legal and technological grip on the creation and distribution of artistic works.

The idea that the technology could be stopped using copyright is delusional. Maybe a long time ago copyright had the primary purpose of protecting artists rights, but that hasn't been the case for over a century. What will happen instead is the most powerful variations of the tech will be in the hands of those most willing to exploit it and copyright will be used as a cudgel to destroy anything attempting to compete with those holding that technology. Artists aren't going to be able to make a living getting residuals from feeding training data into a machine.

Disney wants the same outcome as you do, that Stable Diffusion lawsuit is backed by lawyers affiliated with them. Stop playing their game.

gurragadon
Jul 28, 2006

Main Paineframe posted:

If a product can't be made usable without breaking the law and trampling all over people's rights, then I don't see how that's a problem for anyone besides the company that made the product. I know we've all gotten very used to tech startups building business models around breaking the law and betting their lawyers can delay the consequences long enough for them to build a lobbying organization to change the laws, but let's not pretend that's a good thing.

But you're making a very big omission in your statement here. It's not that you can't train AIs on copyrighted works, it's that you have to get the permission of the copyright holders to train on their copyrighted works. That might be expensive or difficult, sure, but that's the cost of building something that's entirely dependent on other people's content. If you don't like the cost or the annoyance of buying or negotiating license rights, go make your own content like Netflix eventually ended up doing.

Hell, that even applies to human writers. They're paying for much of the copyrighted media they consume, or otherwise complying with the licensing conditions of those works. How much money have you spent on media (or had spent on your behalf, by parents or teachers or libraries) over your entire life? Even someone really dedicated to piracy has still paid for more books (or movie tickets, or Netflix subscriptions, or whatever) than OpenAI Inc has.

Personally, I think copyright law is written in a way that tramples over people's rights, and using information that is available in the world isn't trampling on people's rights. That's not just from a tech-startup point of view, either; excessive copyright laws stifle creativity and innovation, in my opinion, which is where my stance is coming from. Everything is built on something else; you can't build something entirely independent.

Your point about people paying or not paying for the information we consume is a good one. Is there a way to quantify the amount of money an average person pays out for media of all kinds through their life? I get a ton of copyrighted material for free just by virtue of existing in society. I see images that are copyrighted, read copyrighted things, and hear copyrighted music. But I DO pay for some of it, that is extremely true.

Would that be a fair amount to pay for a "copy" of ChatGPT? ChatGPT is a written format specifically, so I think it would be cheaper than other forms. I guess OpenAI could get a library card, and that would open up a ton of writing. A lot of stuff that isn't available would be left out, though, unless OpenAI made specific agreements with universities and other institutions.

Mega Comrade
Apr 22, 2004

Listen buddy, we all got problems!

gurragadon posted:

Anybody who is training to be a writer will train themselves on copyrighted works...

I've seen this comparison before and ones like it and it seems to ignore that we have a lot of existing things that people can do, but companies cannot. Especially when it comes to copyright and licensing.

gurragadon
Jul 28, 2006

Mega Comrade posted:

I've seen this comparison before and ones like it and it seems to ignore that we have a lot of existing things that people can do, but companies cannot. Especially when it comes to copyright and licensing.

I was comparing how ChatGPT the program trains and the way that human writers train. They have the similarities that they read text and write text to get better at writing. Other technologies don't train in the same way, so it's a new concept in my mind.

Mega Comrade
Apr 22, 2004

Listen buddy, we all got problems!
New concept or not, it's an individual vs. a profit-motivated, multi-billion-dollar-backed company. I find it weird to compare them.

If you use a picture of Mickey Mouse to help train yourself, it's allowed. If a company puts Mickey Mouse in their training material for employees, it's not allowed; they have to pay for a license.

And yes, much of copyright is dumb and has been hijacked by huge corporations to protect work long after the original creator has died. But it's also used to protect every living creator and their livelihoods.

Mega Comrade fucked around with this message at 19:56 on Apr 4, 2023

Main Paineframe
Oct 27, 2010

gurragadon posted:

Personally, I think copyright law is written in a way that tramples over people's rights, and using information that is available in the world isn't trampling on people's rights. That's not just from a tech-startup point of view, either; excessive copyright laws stifle creativity and innovation, in my opinion, which is where my stance is coming from. Everything is built on something else; you can't build something entirely independent.

Your point about people paying or not paying for the information we consume is a good one. Is there a way to quantify the amount of money an average person pays out for media of all kinds through their life? I get a ton of copyrighted material for free just by virtue of existing in society. I see images that are copyrighted, read copyrighted things, and hear copyrighted music. But I DO pay for some of it, that is extremely true.

Would that be a fair amount to pay for a "copy" of ChatGPT? ChatGPT is a written format specifically, so I think it would be cheaper than other forms. I guess OpenAI could get a library card, and that would open up a ton of writing. A lot of stuff that isn't available would be left out, though, unless OpenAI made specific agreements with universities and other institutions.

What's excessive about this particular application of copyright law? I think it's totally reasonable to use copyright law to impede a for-profit company which wants to use other people's works for free without a license in its for-profit product, especially when the only argument I've seen in favor of waiving copyright for AI companies is "it's inconvenient and expensive to pay people for the use of their work".

In your day-to-day life, you experience copyrighted media that you didn't personally directly pay for, but that doesn't mean no one paid for it. You don't have to put money directly into the TV at the sports bar to watch the big game, but the sports bar is paying for a cable package, and that cost is part of the expenses that are passed on to customers as food prices. Even for stuff you've seen for free, sometimes people make it available for free for some formats or usages but charge for others.

That said, trying to seriously nail down how much money you've spent on media throughout your entire life is beside the point. After all, the actual question at hand is "should ChatGPT be paying for the media it's trained on?" It's a yes/no question. The actual amount is none of our business. If the answer is "yes", it's up to the media's owners to decide how much they're going to charge, just as it's up to OpenAI to decide how much to charge for usage of ChatGPT.

Another reason not to get hung up on nailing down your exact media spending is that it's unlikely that OpenAI would pay the same price you do. Regardless of whether AI training is similar to human learning or not, ChatGPT is not a human. It's a product. A monetized, for-profit product that charges people money to use it, and even its limited free usage is for the purpose of driving public interest and support to the for-profit company that owns it. It's fairly common for media creators to charge a higher price for works intended to be used in for-profit endeavors than they do for pieces for simple non-profit personal entertainment.

SCheeseman
Apr 23, 2003

You're continuing to miss the point that the biggest problems generative AI creates won't actually be solved by obstructing only a subset of the businesses wanting to develop and exploit this stuff. I'm all for legislation that actually helps people who had their lives ripped apart by automation (the actual problem that capitalism has been causing for a long time!), but all you're advocating for is maintaining an already hosed status quo.

Char
Jan 5, 2013
Main Paineframe expressed most of my opinions, so I'd rather try to reverse the question now: why should a generative model be treated closer to how a human being is treated, rather than how a factory is treated? It "speaks like a person," but it is still a factory of content.

Part of the value of human "training datasets" is priced according to human capabilities. Would you really let a generative model add a book to its dataset for $16 when it can write 10-100 stories per day?

Probably, not letting anything model-generated be eligible for copyright would be a good start, even if it's pretty much impossible technically; even then, there would be exactly zero money to be made in the art business, so society would need to find a way to incentivize the production of art, or be condemned to being fed the iterative regurgitation of previously created art.

The whole point is that this stuff should not be seen as human. No "if a person is allowed to do this, then" - it's completely off base.

SCheeseman
Apr 23, 2003

Char posted:

Probably, not letting anything model-generated be eligible for copyright would be a good start, even if pretty much impossible technically; even then, there would be exactly zero money to be made in the art business, therefore society would need to find a way to incentivize the producion of arts, or be condemned into being fed the iterative regurgitation of previously created art.

Generative AI output being treated as uncopyrightable would make raw output ineligible, but not derivative works based on it - not dissimilar to anything based on public domain works. Changing this so derivative works are not granted copyright would be practically unenforceable, given generators can already run on consumer hardware. There would be no way to tell.

The comparisons between AI and humans are more a thought experiment than anything that should be enshrined in law, at least not yet.

SCheeseman fucked around with this message at 20:30 on Apr 4, 2023

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


Char posted:

Main Paineframe expressed most of my opinions; I'd rather try to reverse the question now - why should a generative model be treated closer to how a human being is treated, rather than how a factory is treated? It "speaks like a person" but is still a factory of content.

Part of the evaluation is that human "training datasets" are priced according to human capabilities - would you really let a generative model add a book into its dataset for $16 when it can write 10-100 stories per day?

Probably, not letting anything model-generated be eligible for copyright would be a good start, even if pretty much impossible technically; even then, there would be exactly zero money to be made in the art business, therefore society would need to find a way to incentivize the production of art, or be condemned to being fed the iterative regurgitation of previously created art.

The whole point is that this stuff should not be seen as human. No "if a person is allowed to do this, then" - it's completely off base.

There is also another strong reason not to shift from where we stand now: it would allow giving the Clarkesworld treatment to the copyright system, and it would happen fairly quickly; it takes copyright trolling from problematic to an existential threat to the function of the system. Hell, there is a drat good argument for the least charitable copyright interpretations vis a vis AI specifically to avert this risk.

StratGoatCom fucked around with this message at 21:33 on Apr 4, 2023

gurragadon
Jul 28, 2006

Main Paineframe posted:

What's excessive about this particular application of copyright law? I think it's totally reasonable to use copyright law to impede a for-profit company which wants to use other people's works for free without a license in its for-profit product, especially when the only argument I've seen in favor of waiving copyright for AI companies is "it's inconvenient and expensive to pay people for the use of their work".

In your day-to-day life, you experience copyrighted media that you didn't personally directly pay for, but that doesn't mean no one paid for it. You don't have to put money directly into the TV at the sports bar to watch the big game, but the sports bar is paying for a cable package, and that cost is part of the expenses that are passed on to customers as food prices. Even for stuff you've seen for free, sometimes people make it available for free for some formats or usages but charge for others.

That said, trying to seriously nail down how much money you've spent on media throughout your entire life is beside the point. After all, the actual question at hand is "should ChatGPT be paying for the media it's trained on?". It's a yes/no question. The actual amount is none of our business. If the answer is "yes", it's up to the media's owners to decide how much they're going to charge, just as it's up to OpenAI to decide how much to charge for usage of ChatGPT.

Another reason not to get hung up on nailing down your exact media spending is that it's unlikely that OpenAI would pay the same price you do. Regardless of whether AI training is similar to human learning or not, ChatGPT is not a human. It's a product. A monetized, for-profit product that charges people money to use it, and even its limited free usage is for the purpose of driving public interest and support to the for-profit company that owns it. It's fairly common for media creators to charge a higher price for works intended to be used in for-profit endeavors than they do for pieces meant for simple non-profit personal entertainment.

For this application I think it is inappropriate to apply copyright law at all, really. ChatGPT was trained on 45 terabytes of text, which is just an insanely large amount of data. I don't think any one copyright owner can claim any kind of influence on the program itself. An individual text is just a tiny bit of data that doesn't exert any influence by itself; the program needs a huge amount of text to make patterns.

Maybe it would be different if your copyrighted material were tokenized in ChatGPT like some of the glitch tokens posted in the previous ChatGPT thread. Certain Reddit usernames from a counting subreddit were seen so often that ChatGPT made tokens out of them, and there were other things like that.
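That tokenization quirk follows from how byte-pair encoding builds its vocabulary: at every step the most frequent adjacent pair of symbols is merged, so a string that repeats often enough in the corpus collapses into a single vocabulary token. A toy sketch of that mechanism (the corpus and the username "SolidGoldFish" are invented for illustration; this is not OpenAI's actual tokenizer):

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merges from a word list; each word starts as single characters."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair fused into one symbol.
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges

def bpe_encode(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# A "counting subreddit" corpus: one username appears over and over.
corpus = ["SolidGoldFish"] * 500 + ["some", "other", "words", "here"] * 5
merges = bpe_train(corpus, num_merges=20)
print(bpe_encode("SolidGoldFish", merges))  # -> ['SolidGoldFish']
```

Because the username's character pairs dominate the frequency counts, the merge loop spends all of its early merges inside that one string until it is a single token, which is essentially how those Reddit usernames ended up as dedicated tokens.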

But if you made OpenAI get permission from every content creator to create ChatGPT, it would be prohibitively expensive in time and money. I mean, you're right, that does read as me complaining it would be too inconvenient to pay people for their work. But ChatGPT needs so much text, and it needs to be diverse if it's going to be useful, that I think the complaint is valid. It just seems fundamentally unworkable, with the way AI technology currently works, for one company to pay copyright fees on every piece of writing ChatGPT saw.

If you don't think we should use AI programs or have them at all, that's fine and could be correct, but I don't know if copyright law is the way to go about it, because it's kind of an indirect way of banning AI. ChatGPT and the like devaluing art and writing is definitely a big problem.

Do you not have a problem with ChatGPT because it is free? It probably won't be forever, but would a theoretical free AI program trained this way be ok to use copyrighted data?


Char posted:

Main Paineframe expressed most of my opinions, I'd rather try to reverse the question now - why should a generative model be treated closer to how a human being is treated, rather than how a factory is treated? It "speaks like a person" but is still a factory of content.

Part of the evaluation of human "training datasets" is priced according to human capabilities, would you really let a generative model add a book into its dataset for 16$ when it can write 10-100 stories per day?

Probably, not letting anything model-generated be eligible for copyright would be a good start, even if pretty much impossible technically; even then, there would be exactly zero money to be made in the art business, therefore society would need to find a way to incentivize the producion of arts, or be condemned into being fed the iterative regurgitation of previously created art.

The whole point is that this stuff should not be seen as human. No "if a person is allowed to do this, then" - it's completely off base.

I think these models should be treated differently because they are different in the way they are constructed. The model needs the writing to become useful. I'm not really saying it should be treated like a human exactly, but the way it works is just so similar to how humans learn that I don't think we can have AI without giving it a lot of data.

The "If a person thing" is really valid and it is very off base for this current generation of AI programs. But the problem I have with it is that we don't know when it won't be so off base and the improvements are pretty quick, which makes it hard to avoid veering into for me.

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


gurragadon posted:


But if you made OpenAI get permission from every content creator to create ChatGPT, it would be prohibitively expensive in time and money. I mean, you're right, that does read as me complaining it would be too inconvenient to pay people for their work. But ChatGPT needs so much text, and it needs to be diverse if it's going to be useful, that I think the complaint is valid. It just seems fundamentally unworkable, with the way AI technology currently works, for one company to pay copyright fees on every piece of writing ChatGPT saw.


Good.

If it can't operate within data law and basic ethics, then it shouldn't at all.

gurragadon
Jul 28, 2006

StratGoatCom posted:

Good.

If it can't operate within data law and basic ethics, then it shouldn't at all.

It's not, at least in Italy, where they think it is in violation of Italian law. I don't know what you mean by basic ethics, though.

Did you read what I said next? I can agree there are a ton of downsides to AI that we've seen just since November, when ChatGPT came out, and we could use a good conversation about new laws surrounding AI specifically. Not that we are going to go all Skynet, but we are going to put a ton of people out of work really fast.

Mega Comrade
Apr 22, 2004

Listen buddy, we all got problems!
There have been lots of technologies that didn't take off because they would be prohibitively expensive to run.

Why should society single out LLMs as an exception to this common occurrence?

Bar Ran Dun
Jan 22, 2006




There is an international competition between countries in AI. They aren’t going to let it be too expensive to use in the States even if that happens in Europe.

justcola
May 22, 2004

La-Li-Lu-Le-Lo

I really want to be pro-AI and automation, as I see it leading to the automation of a lot of jobs, and for the system to continue existing the way it is, some sort of UBI must be in place to allow people to continue buying and selling poo poo. I like the idea of certain trades becoming more valuable due to humans valuing the time it takes to make something, as we've become so far separated from production through globalisation and the exploitation of people around the world. To have labour valued in the same way in this hemisphere would, I'd hope, lead to an overall appreciation for labour internationally, to the potential abundance of wealth being spread across all nations, and to people being free from work - with what work remains being a collaboration between machines and humans towards the betterment of all life on Earth.

But - I don't seem to get along with a lot of people who are pro-AI. I find a lot of their arguments to be solid and stuff I generally agree with, but they can conduct themselves in a bit of a condescending manner online, because machine supremacy seems so inevitable and obvious I guess? But they come across a bit snotty too, I dunno. It should be this thing that has no inherent value, but I feel the community developing around pro-AI stuff comes off in a similar way to other communities I don't get on with too much. Not saying this about anyone in this thread really, just thinking of poo poo on discord, youtube, whatever. I want to be with these AI dudes but they aren't chill. You know?

Main Paineframe
Oct 27, 2010

SCheeseman posted:

You're continuing to miss the point that the biggest problems generative AI creates won't actually be solved by obstructing only a subset of the businesses wanting to develop and exploit this stuff. I'm all for legislation that actually helps people who had their lives ripped apart by automation (the actual problem that capitalism has been causing for a long time!), but all you're advocating for is maintaining an already hosed status quo.

I'm not talking about obstructing business at all. I'm talking about big business having to pay for the stuff they use in their for-profit products, just like everyone else.

If that makes generative AI uneconomical, then so be it. But when I say AI companies should respect the rights of media creators and owners, I'm not saying that as some secret backdoor strategy to kill generative AI. I'm just saying that giving AI companies an exception to the rules everyone else has to follow is bullshit. Business shouldn't get an exception from rules and regulations simply because those regulations are inconvenient to their business model. And that's doubly true when it's big business complaining that the rules are getting in the way of their attempts to gently caress over the little guy.

gurragadon posted:

For this application I think it is inappropriate to apply copyright law at all, really. ChatGPT was trained on 45 terabytes of text, which is just an insanely large amount of data. I don't think any one copyright owner can claim any kind of influence on the program itself. An individual text is just a tiny bit of data that doesn't exert any influence by itself; the program needs a huge amount of text to make patterns.

Maybe it would be different if your copyrighted material were tokenized in ChatGPT like some of the glitch tokens posted in the previous ChatGPT thread. Certain Reddit usernames from a counting subreddit were seen so often that ChatGPT made tokens out of them, and there were other things like that.

But if you made OpenAI get permission from every content creator to create ChatGPT, it would be prohibitively expensive in time and money. I mean, you're right, that does read as me complaining it would be too inconvenient to pay people for their work. But ChatGPT needs so much text, and it needs to be diverse if it's going to be useful, that I think the complaint is valid. It just seems fundamentally unworkable, with the way AI technology currently works, for one company to pay copyright fees on every piece of writing ChatGPT saw.

If you don't think we should use AI programs or have them at all, that's fine and could be correct, but I don't know if copyright law is the way to go about it, because it's kind of an indirect way of banning AI. ChatGPT and the like devaluing art and writing is definitely a big problem.

Do you not have a problem with ChatGPT because it is free? It probably won't be forever, but would a theoretical free AI program trained this way be ok to use copyrighted data?

I think these models should be treated differently because they are different in the way they are constructed. The model needs the writing to become useful. I'm not really saying it should be treated like a human exactly, but the way it works is just so similar to how humans learn that I don't think we can have AI without giving it a lot of data.

The "If a person thing" is really valid and it is very off base for this current generation of AI programs. But the problem I have with it is that we don't know when it won't be so off base and the improvements are pretty quick, which makes it hard to avoid veering into for me.

It's silly to say that any individual piece of text has no "influence" on ChatGPT. Clearly the text has some impact, or else OpenAI wouldn't have needed it in the first place. And if OpenAI needs that text, then they should have to pay for it. Trying to divine exactly how much of a part any individual item plays in the final dataset is a distraction. It played enough of a part that OpenAI put it in the training data. If OpenAI is using (in any way) content they don't have the rights to, they should have to pay for it (or at least ask permission to use it).

Laws, rules, and regulations make plenty of products impossible to market, and they make plenty of business models essentially unworkable. If AI trainers can't figure out a way to train AIs in a manner that complies with existing law, that's their problem, not ours. I don't see any reason that we should give them an exception. My ears are deaf to the cries of corporate executives complaining that regulations are getting in the way of their profit margins. And anyway, you're thinking of it exactly the wrong way around. It's definitely possible to make generative AIs with copyright-safe datasets - again, this isn't some sort of backdoor ban. However, it's only practical to do so if the law is enforced against the companies that don't use copyright-safe datasets! Otherwise, the generative AIs trained on bigger and cheaper pirated datasets will inevitably be more profitable than the generative AIs trained on copyright-safe datasets.

The sticker price of ChatGPT is irrelevant. It's a for-profit service run by a for-profit corporation that's currently valued at about 20 billion US dollars. Stability AI is fundraising at a $4 billion valuation. Midjourney is valued at around a billion bucks. There's no plucky underdog in the AI industry, no poor helpless do-gooders being crushed by the weight of government regulation. It's just Uber all over again - ignore the law completely and hope that they'll be able to hold off the lawyers long enough to convince a big customer base to lobby for the laws to be changed.

IShallRiseAgain
Sep 12, 2008

Well ain't that precious?

Main Paineframe posted:

I'm not talking about obstructing business at all. I'm talking about big business having to pay for the stuff they use in their for-profit products, just like everyone else.

If that makes generative AI uneconomical, then so be it. But when I say AI companies should respect the rights of media creators and owners, I'm not saying that as some secret backdoor strategy to kill generative AI. I'm just saying that giving AI companies an exception to the rules everyone else has to follow is bullshit. Business shouldn't get an exception from rules and regulations simply because those regulations are inconvenient to their business model. And that's doubly true when it's big business complaining that the rules are getting in the way of their attempts to gently caress over the little guy.

It's silly to say that any individual piece of text has no "influence" on ChatGPT. Clearly the text has some impact, or else OpenAI wouldn't have needed it in the first place. And if OpenAI needs that text, then they should have to pay for it. Trying to divine exactly how much of a part any individual item plays in the final dataset is a distraction. It played enough of a part that OpenAI put it in the training data. If OpenAI is using (in any way) content they don't have the rights to, they should have to pay for it (or at least ask permission to use it).

Laws, rules, and regulations make plenty of products impossible to market, and they make plenty of business models essentially unworkable. If AI trainers can't figure out a way to train AIs in a manner that complies with existing law, that's their problem, not ours. I don't see any reason that we should give them an exception. My ears are deaf to the cries of corporate executives complaining that regulations are getting in the way of their profit margins. And anyway, you're thinking of it exactly the wrong way around. It's definitely possible to make generative AIs with copyright-safe datasets - again, this isn't some sort of backdoor ban. However, it's only practical to do so if the law is enforced against the companies that don't use copyright-safe datasets! Otherwise, the generative AIs trained on bigger and cheaper pirated datasets will inevitably be more profitable than the generative AIs trained on copyright-safe datasets.

The sticker price of ChatGPT is irrelevant. It's a for-profit service run by a for-profit corporation that's currently valued at about 20 billion US dollars. Stability AI is fundraising at a $4 billion valuation. Midjourney is valued at around a billion bucks. There's no plucky underdog in the AI industry, no poor helpless do-gooders being crushed by the weight of government regulation. It's just Uber all over again - ignore the law completely and hope that they'll be able to hold off the lawyers long enough to convince a big customer base to lobby for the laws to be changed.

I think AI falls under the fair use category; it's pretty hard to argue that what it is doing isn't transformative. There are super rare instances when AI can produce near copies, but this isn't something desirable, and it stems from an image being over-represented in the training data.

Also, the requirement to have a "copyright-safe" dataset just means media companies like Disney or governments will have control over the technology and nobody else will be able to compete. They have the rights to more than enough copyrighted content that they could build a pretty good model without paying a cent to artists. The only people that will be screwed over are the general public.

Although hopefully since the models are out there, no government will be able to contain AIs even if they crack down hard on them.

IShallRiseAgain fucked around with this message at 23:23 on Apr 4, 2023

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


IShallRiseAgain posted:

I think AI falls under the fair use category; it's pretty hard to argue that what it is doing isn't transformative. There are super rare instances when AI can produce near copies, but this isn't something desirable, and it stems from an image being over-represented in the training data.

Also, the requirement to have a "copyright-safe" dataset just means media companies like Disney or governments will have control over the technology and nobody else will be able to compete. They have the rights to more than enough copyrighted content that they could build a pretty good model without paying a cent to artists. The only people that will be screwed over are the general public.

Although hopefully since the models are out there, no government will be able to contain AIs even if they crack down hard on them.

AI content CANNOT be copyrighted, do you not understand? Nonhuman processes cannot create copyrightable images under the current framework. You can copyright human-amended AI output, but as the base output is for all intents and purposes public domain, if someone strips out the human additions, it's free. The inputs do not change this. Useless for anyone with content to defend. And it would be a singularly stupid idea to change that, because it would in essence allow a DDoS against the copyright system, bringing it to a halt under a wave of copyright trolling.

Stop listening to the AI hype dingdongs, they're as bad as the buttcoiners.

SCheeseman
Apr 23, 2003

Main Paineframe posted:

I'm not talking about obstructing business at all. I'm talking about big business having to pay for the stuff they use in their for-profit products, just like everyone else.

If that makes generative AI uneconomical, then so be it. But when I say AI companies should respect the rights of media creators and owners, I'm not saying that as some secret backdoor strategy to kill generative AI. I'm just saying that giving AI companies an exception to the rules everyone else has to follow is bullshit. Business shouldn't get an exception from rules and regulations simply because those regulations are inconvenient to their business model. And that's doubly true when it's big business complaining that the rules are getting in the way of their attempts to gently caress over the little guy.

Impede, obstruct, whatever. In any case what you want isn't going to make AI art generators uneconomical, it'll make the 'legal' ones economical only for the entrenched IP hoarders. What you want changes nothing about how people will in actuality be exploited and may even serve to make it worse!

SCheeseman fucked around with this message at 00:03 on Apr 5, 2023

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

SCheeseman posted:

Impede, obstruct, whatever. In any case what you want isn't going to make AI art generators uneconomical, it'll make the 'legal' ones economical only for the entrenched IP hoarders. What you want changes nothing about how people will in actuality be exploited and may even serve to make it worse!
:yeah:

Whatever your problem is, the answer is not "expand copyright protections."

IShallRiseAgain
Sep 12, 2008

Well ain't that precious?

StratGoatCom posted:

AI content CANNOT be copyrighted, do you not understand? Nonhuman processes cannot create copyrightable images under the current framework. You can copyright human-amended AI output, but as the base output is for all intents and purposes public domain, if someone strips out the human additions, it's free. The inputs do not change this. Useless for anyone with content to defend. And it would be a singularly stupid idea to change that, because it would in essence allow a DDoS against the copyright system, bringing it to a halt under a wave of copyright trolling.

Stop listening to the AI hype dingdongs, they're as bad as the buttcoiners.

I'm not saying anything about it being copyrighted? I'm saying that AI-generated content probably doesn't violate copyright because it's transformative. The only reason it wouldn't is that fair use is so poorly defined.

I agree that AI images generated from nothing but a prompt should not be copyrightable, because of the potential issues with that. It only takes a little bit of effort to generate AI content that actually is copyrightable.

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


IShallRiseAgain posted:

I'm not saying anything about it being copyrighted? I'm saying that AI-generated content probably doesn't violate copyright because it's transformative. The only reason it wouldn't is that fair use is so poorly defined.


Machines should not get the considerations people do, especially ones backed by billion-dollar companies. And don't bring 'fair use' into this; it's an Anglo weirdness, not found elsewhere. If you can't pay to honestly compete - though that's not really meaningful, considering the lack of copyrightability consigns this tech to bubble status - then you can get hosed, it is really that simple.

StratGoatCom fucked around with this message at 00:11 on Apr 5, 2023

SCheeseman
Apr 23, 2003

StratGoatCom posted:

And don't bring 'fair use' into this; it's an Anglo weirdness, not found elsewhere.

Weirdness or not, it exists, and everyone in the US sphere of influence (unfortunately) marches to the drum of US copyright law. It's going to be a factor, and trying to discredit discussion of it is disingenuous.

IShallRiseAgain
Sep 12, 2008

Well ain't that precious?

StratGoatCom posted:

Machines should not get the considerations people do, especially ones backed by billion-dollar companies. And don't bring 'fair use' into this; it's an Anglo weirdness, not found elsewhere.

It's people using AI to produce content, though. Like I said before, it's not an issue for large corporations or governments. They already have access to the images to make their own dataset without having to pay a cent to artists. It's the general public that will suffer, not the corporations.

Also, fair use is essential when copyright exists; otherwise companies or individuals can suppress any criticism of the works they produce. Japan is a great example of this, and tons of companies there regularly exploit the fact that fair use isn't a thing.

SCheeseman
Apr 23, 2003

Reverse engineering is a fair use argument too. The computing industry is pretty hosed, but it would be considerably more hosed without reverse engineering being a protected technique.

It's kind of incredible to see someone actually against it.

StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


IShallRiseAgain posted:

It's people using AI to produce content, though. Like I said before, it's not an issue for large corporations or governments. They already have access to the images to make their own dataset without having to pay a cent to artists. It's the general public that will suffer, not the corporations.


Art is a luxury, one whose manufacture often keeps some very marginalized folks in house and home. I don't give a poo poo, especially as the companies have no real use for it, and indeed may go to lengths to ensure they're not exposed to it.


Owling Howl
Jul 17, 2019

SCheeseman posted:

Impede, obstruct, whatever. In any case what you want isn't going to make AI art generators uneconomical, it'll make the 'legal' ones economical only for the entrenched IP hoarders. What you want changes nothing about how people will in actuality be exploited and may even serve to make it worse!

Moreover, others will make these systems available no matter what. It doesn't really help an artist that Microsoft's system isn't trained on his art while a Russian or Chinese system is flooding the internet with an infinity of "in the style of" images.

Meanwhile there is a lot of stuff that isn't copyrighted: everything on Wikipedia, Pixabay, tonnes of stuff on Flickr, etc. Every day there's more of it, and there will be people who want these systems to exist and will contribute content for them. The tech giants can probably even afford to build up their own portfolios over time. Strip out all the copyrighted stuff and you'll still end up with these systems.

  • Reply