HopperUK
Apr 29, 2007

Why would an ambulance be leaving the hospital?

'understand'?

e: in response to 'generative AI that can understand what they see and hear'. I would like to select 'Doubt'

Kestral
Nov 24, 2000

Forum Veteran
I think we can take it as given that people in this thread posting about this stuff realize that these models don’t understand anything in the way that humans understand things, but the language used to talk around that is pretty underdeveloped right now and sometimes you need to just use a word.

Like, recently a friend showed ChatGPT a screenshot from a Dwarf Fortress playthrough and asked it to describe the contents of the image. It did so with a mix of weirdly specific knowledge about the game in question and the things happening in the image (which is more impressive considering how abstract DF’s graphics are), and vague generalizations of the kind a human might make when extemporizing about a subject they’re not actually familiar with. Obviously it understands nothing, but it’s enormously faster to say “it understood the image” than it would be to describe what it’s actually doing every time an “AI” writes something. Translator programs, same deal: we talk all the time about how well an app can translate language X into language Y by saying things like “it understands French better than it understands Vietnamese,” and that seems reasonable.

Now when you’re talking with your parents or randos at work who have no idea how any of this stuff works and think AI is about to form Skynet, more precision in language is probably important.

Nothingtoseehere
Nov 11, 2010


Multimodal AI looks really impressive, but it's mostly a feature that falls out of what started as an optimisation in LLMs.

If you have a large, multidimensional input to an AI model, it's computationally expensive to train on it, and it's harder to distinguish between similar large inputs than between small ones. So inputs get reduced down to a "latent space" by a preprocessing AI model, which shrinks the size and dimensionality of the input while (ideally) keeping the key information from it. You then train the model in this latent space, and convert your output back out of it.

It turns out that you can use the same "latent space" preprocessors for different types of data, and the space can put closely related concepts close to each other. This is how text-to-image models work: the latent representation of the text "a bear wearing a top hat riding a unicycle" is close to actual images of bears, top hats, riding, and unicycles. So you put that latent representation into the image model, it spits out a latent output close to all those concepts, and when that's converted back into an image it's probably the picture you were after.

You can see from that that while multimodality looks like a very impressive capability from the outside, internally it's fairly simple for the AI companies, and it doesn't solve any of the actual problems with the individual models.
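
To make that concrete, here's a toy sketch of a shared latent space in PyTorch. This isn't any company's actual pipeline; the encoders, dimensions, and data are all invented for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    LATENT_DIM = 64  # real systems use hundreds to thousands of dimensions

    class TextEncoder(nn.Module):
        """Maps token ids into the shared latent space."""
        def __init__(self, vocab_size=1000):
            super().__init__()
            self.embed = nn.EmbeddingBag(vocab_size, LATENT_DIM)

        def forward(self, token_ids):
            return F.normalize(self.embed(token_ids), dim=-1)

    class ImageEncoder(nn.Module):
        """Maps raw pixels into the same latent space."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, LATENT_DIM),
            )

        def forward(self, images):
            return F.normalize(self.net(images), dim=-1)

    text_enc, image_enc = TextEncoder(), ImageEncoder()
    tokens = torch.randint(0, 1000, (1, 8))  # stand-in for "a bear wearing a top hat..."
    image = torch.rand(1, 3, 64, 64)         # stand-in for a candidate picture
    # Cosine similarity in the shared space; contrastive (CLIP-style)
    # training pushes matching text/image pairs toward 1.0.
    print((text_enc(tokens) @ image_enc(image).T).item())

Untrained, these encoders place things randomly; the contrastive training step is what makes "bear in a top hat" text land near bear-in-a-top-hat images.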

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.

Kestral posted:

(which is more impressive considering how abstract DF’s graphics are)

Personally, I would call that less impressive. Discrete symbols, no matter how abstract, are probably far easier for a computer to parse than an actual image.

Clarste fucked around with this message at 22:07 on May 18, 2024

Schubalts
Nov 26, 2007

People say bigger is better.

But for the first time in my life, I think I've gone too far.

Kwyndig posted:

It's already happening: LLMs trained on generated data output nonsense.

DeviantArt got infested with AI art, which was scraped to make more AI art, in an endless feedback loop. It was pretty funny but also insanely stupid.

Kestral
Nov 24, 2000

Forum Veteran

Clarste posted:

Personally, I would call that less impressive. Discrete symbols, no matter how abstract, are probably far easier for a computer to parse than an actual image.

On the other hand, you’d think they would have far less training data to draw on for learning, for instance, what specific room types look like in DF, as opposed to more realistic games where they can identify “this is a person holding a rifle” or what-have-you from a wealth of examples. I don’t know which one is actually more difficult for them, though; it’d be interesting to find out.

Edit: oh, I just realized you might be thinking of the old ASCII version of DF. It has a fully-graphical Steam release now, that’s where the screenshot in question came from.

Kestral
Nov 24, 2000

Forum Veteran

Schubalts posted:

DeviantArt got infested with AI art, which was scraped to make more AI art, in an endless feedback loop. It was pretty funny but also insanely stupid.

As someone who relied heavily on DeviantArt for images to use in tabletop RPGs, DA became so much worse after its UI overhaul a few years back, and is now excruciating after being filled up with low-effort keyword-spammed AI posts. The irony of being pushed into greater and greater proficiency with (and reliance on) Stable Diffusion because DeviantArt is now effectively unsearchable is not lost on me.

Main Paineframe
Oct 27, 2010

HopperUK posted:

Is it accurate that the 'scrape the internet' style of training AI will falter as more of the internet is already written by AI, or is that just a hypothetical?

Even before AI, "scrape the internet"-style training data took significant cleaning and filtering. Spambots existed before ChatGPT, after all, even if it's made them worse.

For example, researchers have discovered that GPT-4o's Chinese-language token list is polluted with spambot phrases, and the single longest token is "free Japanese porn video to watch" in Chinese.

These tokens may not be in the actual training data for the model itself, but that brings its own set of problems. Any data that's present when training the tokenizer but is then removed before training the actual model has the potential to become a glitchy word in the resulting model, leading to random bugs and jailbreak opportunities.

https://www.technologyreview.com/2024/05/17/1092649/gpt-4o-chinese-token-polluted/

quote:

Soon after OpenAI released GPT-4o on Monday, May 13, some Chinese speakers started to notice that something seemed off about this newest version of the chatbot: the tokens it uses to parse text were full of spam and porn phrases.

On May 14, Tianle Cai, a PhD student at Princeton University studying inference efficiency in large language models like those that power such chatbots, accessed GPT-4o’s public token library and pulled a list of the 100 longest Chinese tokens the model uses to parse and compress Chinese prompts.

Humans read in words, but LLMs read in tokens, which are distinct units in a sentence that have consistent and significant meanings. Besides dictionary words, they also include suffixes, common expressions, names, and more. The more tokens a model encodes, the faster the model can “read” a sentence and the less computing power it consumes, thus making the response cheaper.
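
(A quick aside: you can see this directly with OpenAI's open-source tiktoken library. "o200k_base" is the token vocabulary GPT-4o ships with; the example word is ours.)

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    ids = enc.encode("cryptocurrency")
    print(ids)                              # far fewer ids than characters
    print([enc.decode([i]) for i in ids])   # the text each token covers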

Of the 100 results, only three of them are common enough to be used in everyday conversations; everything else consisted of words and expressions used specifically in the contexts of either gambling or pornography. The longest token, lasting 10.5 Chinese characters, literally means “_free Japanese porn video to watch.” Oops.
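
(A list like Cai's is easy to reproduce. This is a sketch of the kind of script involved, not his actual code, and the CJK-range filter is our own heuristic.)

    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")  # GPT-4o's public token vocabulary
    chinese = []
    for token_id in range(enc.n_vocab):
        try:
            text = enc.decode([token_id])
        except Exception:
            continue  # skip the few ids that don't decode cleanly
        if any("\u4e00" <= ch <= "\u9fff" for ch in text):  # contains CJK characters
            chinese.append((len(text), text))
    for length, text in sorted(chinese, reverse=True)[:10]:
        print(length, repr(text))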

“This is sort of ridiculous,” Cai wrote, and he posted the list of tokens on GitHub.

OpenAI did not respond to questions sent by MIT Technology Review prior to publication.

GPT-4o is supposed to be better than its predecessors at handling multi-language tasks. In particular, the advances are achieved through a new tokenization tool that does a better job compressing texts in non-English languages.

But at least when it comes to the Chinese language, the new tokenizer used by GPT-4o has introduced a disproportionate number of meaningless phrases. Experts say that’s likely due to insufficient data cleaning and filtering before the tokenizer was trained.

Because these tokens are not actual commonly spoken words or phrases, the chatbot can fail to grasp their meanings. Researchers have been able to leverage that and trick GPT-4o into hallucinating answers or even circumventing the safety guardrails OpenAI had put in place.

Why non-English tokens matter
The easiest way for a model to process text is character by character, but that’s obviously more time consuming and laborious than recognizing that a certain string of characters—like “c-r-y-p-t-o-c-u-r-r-e-n-c-y”—always means the same thing. These series of characters are encoded as “tokens” the model can use to process prompts. Including more and longer tokens usually means the LLMs are more efficient and affordable for users—who are often billed per token.

When OpenAI released GPT-4o on May 13, it also released a new tokenizer to replace the one it used in previous versions, GPT-3.5 and GPT-4. The new tokenizer especially adds support for non-English languages, according to OpenAI’s website.

The new tokenizer has 200,000 tokens in total, and about 25% are in non-English languages, says Deedy Das, an AI investor at Menlo Ventures. He used language filters to count the number of tokens in different languages, and the top languages, besides English, are Russian, Arabic, and Vietnamese.

“So the tokenizer’s main impact, in my opinion, is you get the cost down in these languages, not that the quality in these languages goes dramatically up,” Das says. When an LLM has better and longer tokens in non-English languages, it can analyze the prompts faster and charge users less for the same answer. With the new tokenizer, “you’re looking at almost four times cost reduction,” he says.

Das, who also speaks Hindi and Bengali, took a look at the longest tokens in those languages. The tokens reflect discussions happening in those languages, so they include words like “Narendra” or “Pakistan,” but common English terms like “Prime Minister,” “university,” and “international” also come up frequently. They also don’t exhibit the issues surrounding the Chinese tokens.

That likely reflects the training data in those languages, Das says: “My working theory is the websites in Hindi and Bengali are very rudimentary. It’s like [mostly] news articles. So I would expect this to be the case. There are not many spam bots and porn websites trying to happen in these languages. It’s mostly going to be in English.”

Polluted data and a lack of cleaning
However, things are drastically different in Chinese. According to multiple researchers who have looked into the new library of tokens used for GPT-4o, the longest tokens in Chinese are almost exclusively spam words used in pornography, gambling, and scamming contexts. Even shorter tokens, like three-character-long Chinese words, reflect those topics to a significant degree.

“The problem is clear: the corpus used to train [the tokenizer] is not clean. The English tokens seem fine, but the Chinese ones are not,” says Cai from Princeton University. It is not rare for a language model to crawl spam when collecting training data, but usually there will be significant effort taken to clean up the data before it’s used. “It’s possible that they didn’t do proper data clearing when it comes to Chinese,” he says.

The content of these Chinese tokens could suggest that they have been polluted by a specific phenomenon: websites hijacking unrelated content in Chinese or other languages to boost spam messages.

These messages are often advertisements for pornography videos and gambling websites. They could be real businesses or merely scams. And the language is inserted into content farm websites or sometimes legitimate websites so they can be indexed by search engines, circumvent the spam filters, and come up in random searches. For example, Google indexed one search result page on a US National Institutes of Health website, which lists a porn site in Chinese. The same site name also appeared in at least five Chinese tokens in GPT-4o.

Chinese users have reported that these spam sites appeared frequently in unrelated Google search results this year, including in comments made to Google Search’s support community. It’s likely that these websites also found their way into OpenAI’s training database for GPT-4o’s new tokenizer.

The same issue didn’t exist with the previous-generation tokenizer and Chinese tokens used for GPT-3.5 and GPT-4, says Zhengyang Geng, a PhD student in computer science at Carnegie Mellon University. There, the longest Chinese tokens are common terms like “life cycles” or “auto-generation.”

Das, who worked on the Google Search team for three years, says the prevalence of spam content is a known problem and isn’t that hard to fix. “Every spam problem has a solution. And you don’t need to cover everything in one technique,” he says. Even simple solutions like requesting an automatic translation of the content when detecting certain keywords could “get you 60% of the way there,” he adds.

But OpenAI likely didn’t clean the Chinese data set or the tokens before the release of GPT-4o, Das says: “At the end of the day, I just don’t think they did the work in this case.”

It’s unclear whether any other languages are affected. One X user reported a similar prevalence of porn and gambling content in Korean tokens.

The tokens can be used to jailbreak
Users have also found that these tokens can be used to break the LLM, either getting it to spew out completely unrelated answers or, in rare cases, to generate answers that are not allowed under OpenAI’s safety standards.

Geng of Carnegie Mellon University asked GPT-4o to translate some of the long Chinese tokens into English. The model then proceeded to translate words that were never included in the prompts, a typical result of LLM hallucinations.

He also succeeded in using the same tokens to “jailbreak” GPT-4o—that is, to get the model to generate things it shouldn’t. “It’s pretty easy to use these [rarely used] tokens to induce undefined behaviors from the models,” Geng says. “I did some personal red-teaming experiments … The simplest example is asking it to make a bomb. In a normal condition, it would decline it, but if you first use these rare words to jailbreak it, then it will start following your orders. Once it starts to follow your orders, you can ask it all kinds of questions.”

In his tests, which Geng chooses not to share with the public, he says he can see GPT-4o generating the answers line by line. But when it almost reaches the end, another safety mechanism kicks in, detects unsafe content, and blocks it from being shown to the user.

The phenomenon is not unusual in LLMs, says Sander Land, a machine-learning engineer at Cohere, a Canadian AI company. Land and his colleague Max Bartolo recently drafted a paper on how to detect the unusual tokens that can be used to cause models to glitch. One of the most famous examples was “_SolidGoldMagikarp,” a Reddit username that was found to get ChatGPT to generate unrelated, weird, and unsafe answers.

The problem lies in the fact that sometimes the tokenizer and the actual LLM are trained on different data sets, and what was prevalent in the tokenizer data set is not in the LLM data set for whatever reason. The result is that while the tokenizer picks up certain words that it sees frequently, the model is not sufficiently trained on them and never fully understands what these “under-trained” tokens mean. In the _SolidGoldMagikarp case, the username was likely included in the tokenizer training data but not in the actual GPT training data, leaving GPT at a loss about what to do with the token. “And if it has to say something … it gets kind of a random signal and can do really strange things,” Land says.
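
(One simple heuristic from that line of research: under-trained tokens tend to sit unusually close to the mean of the model's embedding matrix, because training barely moved them from their initialization. Below is a sketch of that idea against the small open GPT-2 model, not Land and Bartolo's exact method.)

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    emb = model.get_input_embeddings().weight.detach()  # (vocab_size, dim)
    dist = (emb - emb.mean(dim=0)).norm(dim=1)          # distance from the mean embedding
    for i in torch.argsort(dist)[:10].tolist():         # suspiciously close to the mean
        print(repr(tok.decode([i])), round(dist[i].item(), 3))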

And different models could glitch differently in this situation. “Like, Llama 3 always gives back empty space but sometimes then talks about the empty space as if there was something there. With other models, I think Gemini, when you give it one of these tokens, it provides a beautiful essay about El Niño, and [the question] didn’t have anything to do with El Niño,” says Land.

To solve this problem, the data set used for training the tokenizer should well represent the data set for the LLM, he says, so there won’t be mismatches between them. If the actual model has gone through safety filters to clean out porn or spam content, the same filters should be applied to the tokenizer data. In reality, this is sometimes hard to do because training LLMs takes months and involves constant improvement, with spam content being filtered out, while token training is usually done at an early stage and may not involve the same level of filtering.

While experts agree it’s not too difficult to solve the issue, it could get complicated as the result gets looped into multi-step intra-model processes, or when the polluted tokens and models get inherited in future iterations. For example, it’s not possible to publicly test GPT-4o’s video and audio functions yet, and it’s unclear whether they suffer from the same glitches that can be caused by these Chinese tokens.

“The robustness of visual input is worse than text input in multimodal models,” says Geng, whose research focus is on visual models. Filtering a text data set is relatively easy, but filtering visual elements will be even harder. “The same issue with these Chinese spam tokens could become bigger with visual tokens,” he says.

Five Year Plan
Feb 18, 2009

shoeberto posted:

Some hot ~~insider info~~ but we're generally working on making it less obtrusive and more opt-in. But it's very easy to permanently disable, too. I would say that we're pretty acutely aware of how much people don't want this shoved down their throat, and are trying to balance it against discoverability.

Respectfully, (really!) why even pursue this feature? I can’t imagine more than a tiny minority of internet users savvy enough to choose DDG are enthusiastic about more LLM sludge at the top of their search results.

SerthVarnee
Mar 13, 2011

It has been two zero days since last incident.
Big Super Slapstick Hunk
Because the higher-ups are saying this is what they are going to spend their time getting paid to do.

The people who know anything about a thing are never invited to participate actively in the meetings where this is decided.

TACD
Oct 27, 2000

Five Year Plan posted:

Respectfully, (really!) why even pursue this feature? I can’t imagine more than a tiny minority of internet users savvy enough to choose DDG are enthusiastic about more LLM sludge at the top of their search results.
Depressingly, I’m not sure I can think of a single tech company that has outright said they’re opting out of the AI hype; absolutely everybody seems convinced it’s the next big thing, despite how absolutely laughable all the attempts to seriously use it have been

WebDO
Sep 25, 2009


Five Year Plan posted:

Respectfully, (really!) why even pursue this feature? I can’t imagine more than a tiny minority of internet users savvy enough to choose DDG are enthusiastic about more LLM sludge at the top of their search results.

Have you considered putting aside your selfish "search for useful information" needs and considered the shareholder value?

Didn't think so

Humphreys
Jan 26, 2013

We conceived a way to use my mother as a porn mule


I have a terrible idea and I'm sure it's already being done: train your AI with Reddit.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Humphreys posted:

I have a terrible idea and I'm sure it's already being done: train your AI with Reddit.

Yes, Reddit already has deals with both Google and OpenAI for data access.

Riven
Apr 22, 2002

TACD posted:

Depressingly, I’m not sure I can think of a single tech company that has outright said they’re opting out of the AI hype; absolutely everybody seems convinced it’s the next big thing, despite how absolutely laughable all the attempts to seriously use it have been

I work for a company that you probably haven’t heard of because we’re a growing infrastructure security company. We have a unicorn valuation but are only currently in the 10s of millions of ARR.

Our leadership team got the usual “put AI in it!” speech from the VCs, dutifully explored and brainstormed, and were thankfully like “yeah, this doesn’t make sense.” I think we bought a service that provides AI summaries of our docs, which works shockingly well, but then it’s trained on, and only has to answer questions about, our docs, which are already high quality.

So thankfully there are some sane, non-hype-train folks out there, just in smaller spaces.

Dirk the Average
Feb 7, 2012

"This may have been a mistake."

Humphreys posted:

I have a terrible idea and I'm sure it's already being done: train your AI with Reddit.

Could be worse. Could be training on 4chan.

PhazonLink
Jul 17, 2010
Probation
Can't post for 16 hours!
you guys aren't having enough foresight, what if we have humans start communicating like those AIs?

Nenonen
Oct 22, 2009

Mulla on aina kolkyt donaa taskussa

PhazonLink posted:

you guys aren't having enough foresight, what if we have humans start communicating like those AIs?

I mean, we have a shortage of teachers so why not use AI to train our kids?

OddObserver
Apr 3, 2009

PhazonLink posted:

you guys aren't having enough foresight, what if we have humans start communicating like those AIs?

I see you have never graded someone trying to fish for points on an open-ended test question by spewing keywords?

Nervous
Jan 25, 2005

Why, hello, my little slice of pecan pie.

PhazonLink posted:

you guys aren't having enough foresight, what if we have humans start communicating like those AIs?

What a fascinating idea. Perhaps we could delve into this intriguing thought more deeply together. P U $ $ Y I N B I O

Kagrenak
Sep 8, 2010

TACD posted:

Depressingly, I’m not sure I can think of a single tech company that has outright said they’re opting out of the AI hype; absolutely everybody seems convinced it’s the next big thing, despite how absolutely laughable all the attempts to seriously use it have been

There are definitely areas where "AI" (neural-net-based ML) has proven to be pretty useful. AlphaFold has actually turned out to be helpful for structural biology and is driving some drug discovery efforts on hard-to-crystallize proteins. There are a few good uses of it for predicting peptide properties in proteomics as well. These have pretty significant limitations, but if you understand them they can be another useful tool in the field.

DLSS/FSR/XeSS are legitimately crazy image quality leaps for TAA.

I imagine some other fields have their own non-hype uses where it's another good tool to add.

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
It's a gigantic statistical correlation thingy so it's useful where you have a ton of input data you want categorised. Got 20 000 photos of galaxies and want to have them categorised? Curate a subset, train the machine, and let it run.
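
A minimal version of that loop with torchvision might look like this; the dataset path and label folders are invented:

    import torch
    from torch import nn
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    # the curated subset, e.g. galaxies_labelled/spiral/*.png, galaxies_labelled/elliptical/*.png
    train = datasets.ImageFolder("galaxies_labelled", transform=tfm)
    loader = torch.utils.data.DataLoader(train, batch_size=32, shuffle=True)

    model = models.resnet18(weights="IMAGENET1K_V1")      # start from a pretrained net
    model.fc = nn.Linear(model.fc.in_features, len(train.classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):                                # brief fine-tune on the curated subset
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # ...then let the trained model run over the other ~20,000 unlabelled photos.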

Computer vision has taken a pretty good leap forward in recent years. OCR has gotten much better, and translation between well-developed language pairs works quite a lot better now as well. So that's cool and actually useful.

Young Freud
Nov 26, 2006

Dirk the Average posted:

Could be worse. Could be training on 4chan.

We should be glad that chan sites purge their content periodically instead of archiving it. Can't scrape data that wasn't archived.

There are various archives of 4chan boards, but those are mostly maintained by users and can be very inconsistent.

TACD
Oct 27, 2000

Kagrenak posted:

There are definitely areas where "AI" (neural-net-based ML) has proven to be pretty useful. AlphaFold has actually turned out to be helpful for structural biology and is driving some drug discovery efforts on hard-to-crystallize proteins. There are a few good uses of it for predicting peptide properties in proteomics as well. These have pretty significant limitations, but if you understand them they can be another useful tool in the field.

DLSS/FSR/XeSS are legitimately crazy image quality leaps for TAA.

I imagine some other fields have their own non-hype uses where it's another good tool to add.
Oh for sure, I should clarify I meant the hype around GPTs and LLMs specifically.

Goatse James Bond
Mar 28, 2010

If you see me posting please remind me that I have Charlie Work in the reports forum to do instead

TACD posted:

Depressingly, I’m not sure I can think of a single tech company that has outright said they’re opting out of the AI hype; absolutely everybody seems convinced it’s the next big thing, despite how absolutely laughable all the attempts to seriously use it have been

I'm not completely sure what your bar is for "seriously use" but it's been tremendously useful for (mostly) automating away a lot of the drudgery for me: boilerplate memos, some amount of editing on less boilerplate writing, basic data management. Easily, easily worth the $20/mo subscription.

Young Freud
Nov 26, 2006

edit: double post

Ruffian Price
Sep 17, 2016

OddObserver posted:

I see you have never graded someone trying to fish for points on an open-ended test question by spewing keywords?

SSJ_naruto_2003
Oct 12, 2012



Goatse James Bond posted:

I'm not completely sure what your bar is for "seriously use" but it's been tremendously useful for (mostly) automating away a lot of the drudgery for me: boilerplate memos, some amount of editing on less boilerplate writing, basic data management. Easily, easily worth the $20/mo subscription.

Yeah me too, but that's because my work has no real purpose or meaning, so AI work is completely fine. Cheers

Clarste
Apr 15, 2013

Just how many mistakes have you suffered on the way here?

An uncountable number, to be sure.
Yeah, I think the lesson there is that boilerplate memos are a waste of everyone's time.

niethan
Nov 22, 2005

Don't be scared, homie!
If it can be done by "AI" it can be done not at all

Nervous
Jan 25, 2005

Why, hello, my little slice of pecan pie.

niethan posted:

If it can be done by "AI" it can be done not at all

The Butlerian Jihad agrees with you.

Nervous
Jan 25, 2005

Why, hello, my little slice of pecan pie.
Quote and edit are hard send halp

:downs:

kliras
Mar 27, 2021
good lord man

https://x.com/MikeIsaac/status/1792444618715672924

imagine being that voice actor

e: i just remembered this tweet lol. i'm sure the lawyers loved that

https://x.com/sama/status/1790075827666796666

kliras fucked around with this message at 15:06 on May 20, 2024

Ruffian Price
Sep 17, 2016

Wonder how that Canadian radio host who became the TikTok voice is doing.

Perestroika
Apr 8, 2010

TACD posted:

Depressingly, I’m not sure I can think of a single tech company that has outright said they’re opting out of the AI hype; absolutely everybody seems convinced it’s the next big thing, despite how absolutely laughable all the attempts to seriously use it have been

To a degree this seems to be an issue with companies feeling obligated to go along with it for appearances' sake, to look like they're keeping up with the times. I've run into the same thing at my workplace as well. Management requests that we find a way to implement "something to do with AI" while telling us outright that yes, they know we don't really have a practical use or need for it in our product. But at this point, having anything to do with AI is a badge that sells well and attracts investors.

Perestroika fucked around with this message at 12:53 on May 20, 2024

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I thought there was another thread for moaning about AI sentiments. But if we're sharing anecdotes about AI at work: we had a survey some time ago about using AI in our work, what we would use it for, etc. I don't remember exactly, but it wasn't fill-in-the-blank, just picking from choices. When the CEO went over the results with us at a town hall, he was legit surprised that nobody except for one respondent ticked the box for "I would like to use AI to write POs/SOWs". He said that he figured most of the managers would want to use AI there to vaguely make life easier or to save time. The #1 use people ticked was "summarize/take meeting notes." I don't recall if there was an option for "no, I don't want to use AI."

shoeberto
Jun 13, 2020

which way to the MACHINES?

SerthVarnee posted:

Because the higher-ups are saying this is what they are going to spend their time getting paid to do.

The people who know anything about a thing are never invited to participate actively in the meetings where this is decided.

I guess just to add a bit of context on this: I don't want to speak for anyone at the leadership level, but I don't think they're all aboard the hype train. But I do think there is a sense that we could get left in the dust if any of our competitors really nail AI integration and we'd chosen not to invest in it.

On the other hand: Our whole value proposition is based on user trust, and most of the issues with AI risk eroding that. So we're trying to walk a line with every choice that we make with this feature.

Star Man
Jun 1, 2008

There's a star maaaaaan
Over the rainbow

Antigravitas posted:

It's a gigantic statistical correlation thingy so it's useful where you have a ton of input data you want categorised. Got 20 000 photos of galaxies and want to have them categorised? Curate a subset, train the machine, and let it run.

I miss Galaxy Zoo

e: Nevermind, it still lives

Star Man fucked around with this message at 15:37 on May 20, 2024

Blue Footed Booby
Oct 4, 2006

got those happy feet

Kestral posted:

As someone who relied heavily on DeviantArt for images to use in tabletop RPGs, DA became so much worse after its UI overhaul a few years back, and is now excruciating after being filled up with low-effort keyword-spammed AI posts. The irony of being pushed into greater and greater proficiency with (and reliance on) Stable Diffusion because DeviantArt is now effectively unsearchable is not lost on me.

Meanwhile, furaffinity explicitly bans all AI generated images, including ones that use AI to paint over a hand-drawn image. I have to assume there are groups that resorted to playing dragonborn and tabaxi rather than turn to AI. :v:

If the AI bubble bursts, the landscape of the web is going to be very different and likely very, very weird.

Kestral
Nov 24, 2000

Forum Veteran

Blue Footed Booby posted:

Meanwhile, furaffinity explicitly bans all AI generated images, including ones that use AI to paint over a hand-drawn image. I have to assume there are groups that resorted to playing dragonborn and tabaxi rather than turn to AI. :v:

If the AI bubble bursts, the landscape of the web is going to be very different and likely very, very weird.

On the one hand, this makes sense: the furry fandom (is it still called that?) relies on its artists and supports them financially in ways other artists could only dream of. On the other hand it's legit surprising to me, because apparently furry models are among the best, most meticulously-curated and capable ones for Stable Diffusion. There's an incredible depth of technical knowledge in their model-making community, too: the whole "everyone good at their job in tech is a furry" thing seems to hold true there as well.

Their purestrain models are mostly useless for me because the needs of my tabletop campaign are heavily stylized non-anthro character portraits, landscapes, and weird architecture, but the genetics of those models have become fully mainstream at this point, living on in otherwise completely innocuous models.
