NoiseAnnoys
May 17, 2010

Boris Galerkin posted:

Ancient languages may not work like that but maybe modern ones do? This is 100% anecdote, but I have Russian speaking friends from Russia, Kazakhstan, Ukraine, etc. from back in grad school, from different social groups/people that didn’t socialize with each other. I swear that every single Russian speaking person I have ever asked about their language in terms of dialects or pronunciation and whatnot, every single person has said that they all sound the same. That there is no dialect and that they all say and spell things the exact same way.

(Contrast this to asking a random American if they think they have an accent or not, and some might say no because they think they sound “neutral” or whatever. These Russian speaking friends straight up said to some extent that if they, a Russian speaking Kazakh were to be dropped off to some backwater place in Russia they would still sound “local.”)

I took 1 semester of Russian 101 just to learn how to read the alphabet in grad school cause I wanted to know how to pronounce the names of these scientists and engineers whose equations I use so often etc. (Don’t worry this was just for me. I never butted into a convo with AKSHULLY the stress falls on the second syllable so you’re saying his name wrong.) From what I did learn, Russian is hyper rules based. Like, there is a simple and easy explanation for everything in a sentence, why this word ends in this instead of that ending, etc. More rules based than German but less than Latin. I just assumed this meant Latin was very structured without variation as well.

E: I guess I should also clarify that my Russian speaking friends I’m thinking of were all met in grad school so they were all highly educated and possibly above average wealthy as they all had the money, time, and ability to gently caress off to west Europe for school. So my anecdote here is going to be biased towards that end of the spectrum.

lol the whole canard of "russian has no dialects" is pretty much a disproven lie that haunts slavic departments. russians just don't like to talk about how, for example, a lot of people in various former republics don't speak hyper-correct moscow russian. but there is less dialectal spread in russia proper thanks to the authoritarian government's control of media and a centralized body limiting "accepted" innovations in vocabulary or pronunciation. but i know several native russian-speaking people here in the czech republic who definitely do not speak "proper" russian, and they're almost all from areas with significant contact with non-slavic populations.

there is nothing special about russian in regards to how language functions and the bolded just shows the blind spot which teachers of russian and nationalist russians love to repeat.

gurragadon posted:

I don't know anything about languages so this is all layman, but it seems incredible to me that a written language could have rules that are so strict that there would be no dialects between different areas. Maybe because of modern media then dialects would flatten over time as people speak more like the dominant dialect that is most common.

But Russian isn't influenced in local areas by the Kazakh language at all? I wonder if they mean the same thing about dialects when they're speaking to you. I can understand somebody speaking with a British, Southern, or Philadelphian accent. But I've spoken to people in Scotland who had to speak more like a standard British accent for me to understand them. I could see Russian not having extreme variations like that, but no variation seems really hard to believe.

I'm not a linguist though, so I could be wrong, and if there are any linguists reading who could speak to the Russian language, it would be interesting to learn more.

there is definitely a large area of contact and language transfer between certain non-slavic russian speaking populations and the russian language, but often it gets waved away as just "improper" russian, or russian "sociolects" instead of actual dialects. but prison or work camp slang is not pushkin, and as much as everyone's awful russian 101 teacher wants to pretend it doesn't exist, it does and did. then there's the naked racism of not acknowledging that places like tajikistan developed their own dialect of russian because of reasons.

anyway, the best use of ai would be eliminating all horrible russian instructors in a skynet-esque preventative strike, but sadly, ai is too lovely to do that.

(also, the easiest way to see how russian has dialects is to listen to how russian speakers from different areas pronounce various vowel sounds; the north generally has less to no vowel reduction. second easiest are things like the use of айда or дон)

NoiseAnnoys fucked around with this message at 16:28 on May 24, 2023


KillHour
Oct 28, 2007


GlyphGryph posted:

There are quite a few standards, the dot test is one, but also contingency checking and unusual self-directed behaviour that doesn't occur when others of their species are put on the other side of a glass wall.

The relevant study here is “Are Ants (Hymenoptera, Formicidae) Capable of Self Recognition?” (Cammaerts, Marie-Claire, and Cammaerts)

It did all three mirror tests and found positive results for each of them, including the dot painting test.

There have been criticisms alleging that the study is fraudulent, but I'm not aware of any attempts to replicate it yet.

This is super interesting, thanks. I think I should point out this hedge in the paper though:

quote:

Even if our results suggest a certain degree of self recognition in ants, they do not explain how ants take and use such information, how then functions the underlying cognitive processes, and if ants detain some self awareness. For many animals, such an assumption is not unanimous [39, 17]; for ants, we are conscious that it might even be less plausible. Here, we only showed that the assumption of some self recognition by ants, in front of their reflection, is not unrealistic.

Still pretty crazy.

Reveilled
Apr 19, 2007

Take up your rifles

Boris Galerkin posted:

I see. It makes sense for early writing to be simple, lists, who owes who, etc. I didn’t realize that that was kinda all we had on Linear A. I don’t know much about it, I just pulled it out of my rear end as a classic example of an undeciphered script that may or may not have been related to languages we know today.

What about for stuff like the Voynich manuscript? IIRC it’s an entire book or two of undeciphered writing that may or may not have been the bored doodles of some random guy.

I would absolutely put money on someone feeding the Voynich manuscript into an AI to attempt to decipher it, in fact I'd be honestly surprised if there's not someone out there trying to do it right now. Guaranteed headlines in the media, that one.

I'm not really sure what it could do, though; it seems to me incredibly unlikely that the manuscript is in a language that existed in medieval times that we somehow have zero record of outside this book, which to me leaves only the ciphertext and gibberish hypotheses. If it's ciphertext, I've not heard of AI being able to do anything our own conventional codebreaking techniques couldn't muster, and if it's gibberish, nothing can decipher it.

Re: Russian, I have very little knowledge of Russian but I'd consider it very, very unlikely that there'd be no differences locally between Russian speakers. Possibly you might find that sort of phenomenon in the sense that, say, many Siberian towns and cities are populated by people who moved from European Russia during and post-WWII so they have fairly standardised speech and a Muscovite wouldn't sound out of place, but I'd be flabbergasted if there were no variations within European Russia except some very convenient bright line distinctions between Russian, Belarusian and Ukrainian. If nothing else, I've never met a person who lived in even a moderately sized city who couldn't tell the difference in accents between people from the rich part of town and the poor part.

NoiseAnnoys
May 17, 2010

i think a bigger part of the discussion that we're kind of overlooking is that no language adheres to the rules it claims to have, because most of the rules are just desperate attempts to systematize something that is evolving.

native speakers of english violate the rules all the time, we just shrug it off as "stylistics". take the famous quote from shakespeare: "this was the most unkindest cut of all." what's wrong with it according to our modern evolution of the language?

again, we can understand something that is grammatically incorrect because we can infer or interpret meanings, but ai cannot as of yet and this is a fundamental part of reconstructing a language, as well as translating.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.
Languages are indeed fuzzy, which is probably why computational linguistics hasn't made nearly as much progress as simpler statistical models on things like machine translation.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
By rules I'm referring to grammatical rules like declension and conjugation of verbs. English is not really one of these languages, but there are languages that are very highly structured and by how a certain word ends you can 100% infer if that word was an adjective or not and if so what age/sociogroup it's referring to and how many, just by looking at one word.

NoiseAnnoys
May 17, 2010

Boris Galerkin posted:

By rules I'm referring to grammatical rules like declension and conjugation of verbs. English is not really one of these languages, but there are languages that are very highly structured and by how a certain word ends you can 100% infer if that word was an adjective or not and if so what age/sociogroup it's referring to and how many, just by looking at one word.

yeah in principle but that never happens 100% in practice. This is what I’m trying to tell you.

Russian dialects collapse certain cases in some areas. all Slavic languages do. hell, russian like most slavic languages outside of two exceptions (that i'm aware of, my structure of east and west slavic courses only covered so much) collapses the accusative and genitive for masculine animate singular nouns, and that's the codified "official" form of the language.

NoiseAnnoys fucked around with this message at 17:43 on May 24, 2023

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.
English isn’t really “less structured” than Russian. What Russian conveys in conjugation and declension, English conveys with word order and sometimes more words, like auxiliary verbs. In linguistic terms, English is analytic in that it breaks things down, with a small ratio of morphemes (word parts) to words. Russian is the opposite in that it is a synthetic language. Speech in both languages can exist on a wide continuum of ambiguous to exact.

NoiseAnnoys
May 17, 2010

cat botherer posted:

English isn’t really “less structured” than Russian. What Russian conveys in conjugation and declension, English conveys with word order and sometimes more words, like auxiliary verbs. In linguistic terms, English is analytic in that it breaks things down, with a small ratio of morphemes (word parts) to words. Russian is the opposite in that it is a synthetic language. Speech in both languages can exist on a wide continuum of ambiguous to exact.

exactly, thank you.

again, don't get me wrong, i'd love for ai to help us crack some of these long undeciphered scripts and manuscripts, but considering the problems these translation apps have with living languages with absolutely huuuuuuge corpuses to draw from, the ais involved need to be way more powerful, or we need to find waaaaaaay more text/data to feed into them.

NoiseAnnoys fucked around with this message at 17:50 on May 24, 2023

gurragadon
Jul 28, 2006

NoiseAnnoys posted:

there is definitely a large area of contact and language transfer between certain non-slavic russian speaking populations and the russian language, but often it gets waved away as just "improper" russian, or russian "sociolects" instead of actual dialects. but prison or work camp slang is not pushkin, and as much as everyone's awful russian 101 teacher wants to pretend it doesn't exist, it does and did. then there's the naked racism of not acknowledging that places like tajikistan developed their own dialect of russian because of reasons.

anyway, the best use of ai would be eliminating all horrible russian instructors in a skynet-esque preventative strike, but sadly, ai is too lovely to do that.

(also, the easiest way to see how russian has dialects is to listen to how russian speakers from different areas pronounce various vowel sounds; the north generally has less to no vowel reduction. second easiest are things like the use of айда or дон)

So, it's basically just elitism and snobbery from the people in Moscow? Like how French people think (used to think?) that regional accents weren't really French.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

NoiseAnnoys posted:

exactly, thank you.

again, don't get me wrong, i'd love for ai to help us crack some of these long undeciphered scripts and manuscripts, but considering the problems these translation apps have with living languages with absolutely huuuuuuge corpuses to draw from, the ais involved need to be way more powerful, or we need to find waaaaaaay more text/data to feed into them.
Given that we have no idea what language family Linear A belongs to (it's thought to be non-Indo-European), I would be surprised if it isn't information-theoretically impossible to decipher with current evidence. Same deal with the Indus script :(.

gurragadon posted:

So, it's basically just elitism and snobbery from the people in Moscow? Like how French people think (used to think?) that regional accents weren't really French.
Definitely a current thing. They're still assholes to French Canadians about it, even though French Canadians speak a French that is much closer to the standard French of a couple hundred years ago. Not that English speakers should talk, given how people view dialects like AAVE.

e: ambiguous typo

cat botherer fucked around with this message at 18:09 on May 24, 2023

NoiseAnnoys
May 17, 2010

gurragadon posted:

So, it's basically just elitism and snobbery from the people in Moscow? Like how French people think (used to think?) that regional accents weren't really French.

partially, but take this with a caveat that while i am a slavicist (among other things), i am not a specialist in east slavic, nor am i a practicing linguist anymore, though i attend the conferences.

part of the debate is that more and more linguists are pushing back against the prescriptivist idea of there being only one correct form of the language and instead arguing that there is a sort of language continuum where a certain amount of variation is acceptable in communication even between native speakers of a language.

the other is the growing realization that hierarchical and prescriptivist language pedagogy is largely politically and socially driven, not really because it's the best way to teach a language, etc. this isn't to say that you teach everyone that whatever they want is okay, but it's re-orienting some instruction in the fields i know to focus foremost on phonetics and pronunciation for spoken communication, and to gradually teach the rules of written language, not focusing on charts and rules from the get go.

also, don't forget the numerous politically-minded language reforms that russian underwent in order to mold it in the form we have it today, whereas other slavic languages often developed quite differently. (czech for example consciously made its case system more robust in the 19th century, bulgarian dropped almost all inflection over centuries, and god only knows what the sorbs or rusyns are doing.)

NoiseAnnoys fucked around with this message at 20:33 on May 24, 2023

NoiseAnnoys
May 17, 2010

cat botherer posted:

Given that we have no idea what language family Linear A belongs to (it's thought to be non-Indo-European), I would be surprised if it isn't information-theoretically impossible to decipher with current evidence. Same deal with the Indus script :(.

Definitely a current thing. They're still assholes to French Canadians about it, even though French Canadians speak a French that is much closer to the standard French of a couple hundred years ago. Not that English speakers should talk, given how people view dialects like AAVE.

yeah, i haven't really kept up with that saga, but last i heard linear a had some tentative phonetics and a few people arguing for a vso structure, but nothing more concrete. i'd love for someone to make a breakthrough there, but odds of that happening in my lifetime are slim. same with the IVC "texts" if they are such.

what is wild to me is that the ivc inventory of signs is comparatively massive. last i heard it was almost 500-ish distinct signs, no?

Iamgoofball
Jul 1, 2015

Clarste posted:

I super do not see how this actually matters. You input copyrighted material into the machine. Whether it happened before or after "training" is 100% irrelevant to the issue of whether we want that to be a thing and how we might stop it.

should we ban photocopiers because i can put a printed photograph of a nintendo game in a photocopier and then press a button and it gives me a near perfect copy of that nintendo game photograph?

SubG
Aug 19, 2004

It's a hard world for little things.

Reveilled posted:

I would absolutely put money on someone feeding the Voynich manuscript into an AI to attempt to decipher it, in fact I'd be honestly surprised if there's not someone out there trying to do it right now.
This has already happened multiple times. In fact "throw an algorithm at it" is an approach to the VM older than digital computers.

I'm not aware of any prior efforts involving specifically a ChatGPT-like LLM, but various machine learning techniques have been tried, usually resulting in confident proclamations that don't hold up under scrutiny by experts in the relevant subject matter (as opposed to machine learning experts). E.g., there was one a couple years back that used a machine learning approach and concluded that the VM was Hebrew written in an idiosyncratic script. This got reported breathlessly in the corners that breathlessly report on such things, until linguists who know Hebrew took a look at the results and started pointing out the multitude of problems with the analysis (model trained on modern Hebrew instead of 15th Century Hebrew, proposed Hebrew text reading like jumbled garbage, model assumes encryption via anagramming...which allows you to pick more or less any translation you want in an abjad, and so on).

The quarterly cryptographic journal Cryptologia has a "solution" to the Voynich Manuscript about every other issue, and in the past several years it's usually some kind of machine learning thing.

Jaxyon
Mar 7, 2016
I’m just saying I would like to see a man beat a woman in a cage. Just to be sure.
I'm not sure why people think that an LLM would be able to do that, given that it's not trained on Voynich or whatever.

As stated, it would make a guess that would likely be super wrong as soon as a human checked its work.

KillHour
Oct 28, 2007


On the undeciphered language thing and whether it's a practical use case for LLMs, I would lean towards that being unlikely. The first reason has been covered already - they need lots and lots and lots of examples. They are going to be categorically bad at something we have very few examples of. Even for things they are "good" at, they aren't better than an expert at any individual thing, instead, they are quantity over quality. They can scan a text and answer questions about it way faster than any human ever could, and can do it forever without breaks or mistakes from fatigue/distraction - just like any other thing computers are "good" at. Basically, if a LLM could figure it out, an expert human also could. It's just that having expert humans look through millions of lines of text is error prone and expensive, so those are the applications LLMs will be best suited for.

The other thing that I didn't see mentioned is tokenization. These systems need text to be tokenized, and that tokenization is created by a different ML process that optimizes the tokens for a corpus. You can see where I'm going here - an undeciphered language isn't going to be included in that tokenization, so the characters from the text literally don't exist as tokens you could enter. Sure, you could map them to arbitrary existing tokens, but then the learned meanings of those tokens will pollute your data. I guarantee the choice of what tokens you map to will drastically change your results, which kind of makes the whole thing moot.
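To make that concrete, here's a toy sketch (not any real model's tokenizer; the character-level vocabulary and the <unk> convention are just illustrative) of why glyphs from an unseen script give a trained vocabulary nothing to work with:

code:

def build_vocab(corpus: str) -> dict[str, int]:
    """Assign an ID to every character seen in the training corpus."""
    vocab = {"<unk>": 0}
    for ch in sorted(set(corpus)):
        vocab[ch] = len(vocab)
    return vocab

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Map each character to its ID; anything unseen collapses to <unk>."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]

english_vocab = build_vocab("the quick brown fox jumps over the lazy dog")

print(tokenize("the fox", english_vocab))  # distinct, meaningful IDs
print(tokenize("ϙοκεδυ", english_vocab))   # stand-in glyphs: every one maps to 0

Mapping the unseen glyphs onto arbitrary existing IDs instead would "work" mechanically, but the model's learned associations for those IDs would then leak into the analysis, which is exactly the pollution problem above.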


On the topic of biological intelligence, I just found this video, and it is incredibly fascinating.
https://www.youtube.com/watch?v=cufOEzoVMVA

It's a dense video and there is a lot in there (and a lot of it is honestly beyond my ability to speak confidently about), but here's my best summary along with the thing I want to talk about from it:

- Author is a computational neuroscience student / researcher
- Video explains how a Tolman-Eichenbaum Machine works. Basically, it's a machine meant to emulate the function of a hippocampus, by having a position encoding piece and a memory encoding piece.
- By creating these systems and training them on data that resembles sensory input from a real animal, you can compare the NN activations to brain scans from said real animals. I am not well versed in biological neuroscience at all, but when he showed the images of the neuron activations from the machine, my first thought was "holy poo poo those look exactly like pictures I've seen illustrating how biological brains process position data" and sure enough, the author claims you see neuron specialization broken down in ways that are very similar to real cells.
- (This is the part I want to talk about specifically): At one point, the video covered an experiment where the AI neural net was used to predict a certain type of cell activation, and then the researchers found that type of cell in rat brains. That's the exciting part to me - if we can look at an AI's "brain patterns" and use that to predict biological analogues, we might start to have the bare basics of some framework to describe a relationship between artificial brains and real ones. Potentially? I don't want to read too much into it because it's not my area, but that seems to be the conclusion being hinted at.
- Towards the end of the video, it is pointed out that TEMs are very closely related to transformer models, so it's hinted that there is a "promising link" between neuroscience and modern machine learning. I don't know if I buy that - it sounds very handwavy to me, but I figured I'd mention it.

Granted there are a lot of nuances here - obviously researchers didn't make anything nearly as complicated as a rat's actual hippocampus, and this is just an approximation based on a model that emulates a few functions we know about. But the fact that we can look at it and go "yeah, it's doing a lot of the same things we would expect a real brain to do in practice" is super exciting (to me).

KillHour fucked around with this message at 06:21 on May 25, 2023

Goa Tse-tung
Feb 11, 2008

;3

Yams Fan

Iamgoofball posted:

should we ban photocopiers because i can put a printed photograph of a nintendo game in a photocopier and then press a button and it gives me a near perfect copy of that nintendo game photograph?

no we tax or prohibit persons from profiting from that exact copy, we should do the same to AI

Tei
Feb 19, 2011

Jaxyon posted:

I'm not sure why people think that an LLM would be able to do that, given that it's not trained on Voynich or whatever.

As stated, it would make a guess that would likely be super wrong as soon as a human checked its work.

Backtracking would be a good argument, imo.

Many machine learning systems seem to work by starting from the result and trying to answer "what was the question?"

Like how an AI-powered xerox machine gets a blob of blurry data and tries to figure out what it means, or NVIDIA's DLSS technology in games. We could treat "Voynich" as the answer and ask the computer to figure out where it came from. Maybe an ML model could find that Voynich is Russian written the way Japanese speakers would pronounce it, if it were written in an Indian alphabet.

I have this book, and it says that to play a phonograph record, you need the entire universe.

https://www.youtube.com/watch?v=BkHCO8f2TWs

I saw this dude's TV series, and he said that to make a cake you need the entire universe.

Perhaps the reason we can't translate the Voynich is that part of the original information is lost, so we can't "play" it again; the parts of the universe required to read it are lost forever.

GlyphGryph
Jun 23, 2013

Down came the glitches and burned us in ditches and we slept after eating our dead.

Goa Tse-tung posted:

no we tax or prohibit persons from profiting from that exact copy, we should do the same to AI

That's the situation as it stands, though. If you use AI to make an exact copy or even something close and then try to profit from it, you are on the hook for copyright violation.

Count Roland
Oct 6, 2013

Goa Tse-tung posted:

no we tax or prohibit persons from profiting from that exact copy, we should do the same to AI

Yeah, which is the case now.

Not sure if that's your point or not, but this is how I view it. Why ban or restrict AI when the enforcement and outcomes are basically the same?

NoiseAnnoys
May 17, 2010

it's also entirely possible there was no information in the voynich manuscript whatsoever, so even if our hypothetical ai was advanced enough to decode a completely alien script and language from a corpus that consists of a single entry, there's a fairly good possibility there just isn't anything to decipher.

Tei
Feb 19, 2011

NoiseAnnoys posted:

it's also entirely possible there was no information in the voynich manuscript whatsoever, so even if our hypothetical ai was advanced enough to decode a completely alien script and language from a corpus that consists of a single entry, there's a fairly good possibility there just isn't anything to decipher.

yeah, but the statistical analysis said that it resembles a human language (or so I heard), so even if it is an elaborate joke, it seems based on a real language, maybe obfuscated

we are only discussing this because of the possibility of AI uncovering long-standing mysteries like this one

Tei fucked around with this message at 17:46 on May 25, 2023

KillHour
Oct 28, 2007


Tei posted:

yeah, but the statistical analysis said that it resembles a human language (or so I heard), so even if it is an elaborate joke, it seems based on a real language, maybe obfuscated

we are only discussing this because of the possibility of AI uncovering long-standing mysteries like this one

Given the vagueness of "AI" in regards to specific implementation, that's pretty much equivalent to asking "can math be used to solve x"? I know I'm stating the obvious, and I'm not really directing this at you, but more to remind everyone that AI isn't a singular thing that you use like a crystal ball, but a whole class of statistical techniques. Just like any other kind of math, you can get an "answer" out of banging equations together, but it won't get you anywhere if you don't have a justification for why those equations produce a useful result.

It's not a perfect comparison, but the hype around string theory comes to mind. String theory is a mathematical technique that you can apply, and it will give you "answers," but there's no way to prove or disprove that the answers it gives you mean anything in the sense that they can't be used to make predictions that you can experimentally verify or disprove. In the same way, if you stick an unknown piece of text into an LLM and it tells you "this is a joke about farts," what's really happening is you're putting X into some equation and getting "fart joke" out. Given that we have no way of knowing if the text is really about farts or not, all you can say is "math happened and the result is farts." It is a true statement, but it's not a particularly useful statement.

SubG
Aug 19, 2004

It's a hard world for little things.

Tei posted:

yeah, but the statistical analysis said that it resembles a human language (or so I heard), so even if it is an elaborate joke, it seems based on a real language, maybe obfuscated

we are only discussing this because of the possibility of AI uncovering long-standing mysteries like this one
There's no "the" statistical analysis of the VM, there are many different statistical analyses, and they don't all point in the same direction. As a specific, concrete example, if you consider the frequency of "words" of a given length (that is, how many times 1-letter words are used, 2-letter words are used, and so on versus the total word count) you get a distribution similar to many European languages (e.g. Latin). But if, on the other hand, you consider the number of unique words of a given length (that is, how many unique 1-letter words, 2-letter words, and so on there are versus the total number of unique words) Voynichese doesn't look like any known language (it has roughly twice as many unique words compared to a Latin text of the same length). And in fact in the latter case the distribution is nearly identical to a binomial distribution over nine trials (that is, the distribution looks like what you'd get if you were picking words by flipping nine coins and picking a word based on the number of heads). As an example.
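A minimal sketch of that second comparison, for the curious (the input file name and the p = 0.5 coin are placeholders; a real run would use a transliteration of the manuscript):

code:

from collections import Counter
from math import comb

def unique_word_length_distribution(text: str) -> dict[int, float]:
    """Fraction of *unique* words having each length."""
    words = set(text.lower().split())
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

def binomial(n: int = 9, p: float = 0.5) -> dict[int, float]:
    """'Flip nine coins, count the heads' distribution for comparison."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

with open("voynich_transliteration.txt", encoding="utf-8") as f:
    observed = unique_word_length_distribution(f.read())

expected = binomial()
for length in sorted(set(observed) | set(expected)):
    print(f"{length:2d}  observed {observed.get(length, 0.0):.3f}"
          f"  binomial {expected.get(length, 0.0):.3f}")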

KillHour posted:

Given the vagueness of "AI" in regards to specific implementation, that's pretty much equivalent to asking "can math be used to solve x"? I know I'm stating the obvious, and I'm not really directing this at you, but more to remind everyone that AI isn't a singular thing that you use like a crystal ball, but a whole class of statistical techniques. Just like any other kind of math, you can get an "answer" out of banging equations together, but it won't get you anywhere if you don't have a justification for why those equations produce a useful result.
More strongly, we can often say that we can't provide an answer, from a purely information-theoretic standpoint. I don't think anyone has developed a formal theory around this for otherwise unknown languages, but when you're talking about finding a potsherd with "foo" written on it in an otherwise unknown script/language, not knowing whether "foo" means "Bob was here" or "gently caress you" is not just a linguistic problem, it's intractable from first principles.

KillHour
Oct 28, 2007


SubG posted:

More strongly, we can often say that we can't provide an answer, from a purely information-theoretic standpoint. I don't think anyone has developed a formal theory around this for otherwise unknown languages, but when you're talking about finding a potsherd with "foo" written on it in an otherwise unknown script/language, not knowing whether "foo" means "Bob was here" or "gently caress you" is not just a linguistic problem, it's intractable from first principles.

If you want to crack out the information theory, it's technically more like having a bell curve of possibilities that gets narrower as you have more information, such that you can never be completely sure, but the amount of data we have on it puts us somewhere between ¯\(°_o)/¯ and w(゚Д゚)w

SubG
Aug 19, 2004

It's a hard world for little things.

KillHour posted:

If you want to crack out the information theory, it's technically more like having a bell curve of possibilities that gets narrower as you have more information, such that you can never be completely sure, but the amount of data we have on it puts us somewhere between ¯\(°_o)/¯ and w(゚Д゚)w
With the VM? Nah. There's no "curve of possibilities" because there's nothing indicating what the underlying distribution is. We can estimate e.g. the entropy of the script and from that estimate the amount of information the VM encodes...but to a first order approximation that just tells us how much additional information we'd need in order to produce a meaningful "solution".
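For what it's worth, that kind of entropy estimate is just a plug-in calculation over symbol frequencies; a rough sketch, with the sample string standing in for a transliteration:

code:

from collections import Counter
from math import log2

def empirical_entropy_bits(symbols: str) -> float:
    """Plug-in Shannon entropy: H = -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

sample = "daiin daiin qokeedy chol chor shol qokedy"  # stand-in snippet
print(f"{empirical_entropy_bits(sample):.2f} bits per character")

And as above, a number like that only bounds how much information the text could carry per symbol; it says nothing about what any of it means.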

A lot of the approaches to understanding the VM start out with a hypothesis like "maybe it's actually Chinese" (or Hebrew or Vietnamese or whatever) because if that happens to be true then you get a huge amount of information about the text more or less for free. But all "solutions" of this form are explicitly predicated on the idea that Voynichese isn't a previously unknown language.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

SubG posted:

With the VM? Nah. There's no "curve of possibilities" because there's nothing indicating what the underlying distribution is. We can estimate e.g. the entropy of the script and from that estimate the amount of information the VM encodes...but to a first order approximation that just tells us how much additional information we'd need in order to produce a meaningful "solution".

A lot of the approaches to understanding the VM start out with a hypothesis like "maybe it's actually Chinese" (or Hebrew or Vietnamese or whatever) because if that happens to be true then you get a huge amount of information about the text more or less for free. But all "solutions" of this form are explicitly predicated on the idea that Voynichese isn't a previously unknown language.
Probability theory and information theory are two sides of the same coin. KL divergence (either for variational Bayes or information gain from prior to posterior) is just relative entropy. The Bayesian optimal model is the one that has minimal message length. It's all the same thing but from different perspectives.

As you say, you never know what the underlying distribution is, but you also never can know the actual entropy of the script, because it depends on an unknown optimal code, or equivalently, a distribution, to describe it. Information entropy is just the (negative) expected value of the log probability, but that requires knowledge of the probability distribution to calculate in the first place. Thus, information theory is inseparable from probability. No matter what, there are some kinds of assumptions that must be made, and anything we infer is colored by those decisions.
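For reference, the standard definitions being leaned on there, with the sign convention spelled out (nothing VM-specific):

code:

% Entropy is the negative expected log probability under p;
% KL divergence is the relative entropy between p and q.
\begin{align*}
  H(p) &= -\mathbb{E}_{x \sim p}\bigl[\log p(x)\bigr] = -\sum_x p(x)\,\log p(x) \\
  D_{\mathrm{KL}}(p \,\|\, q) &= \mathbb{E}_{x \sim p}\!\left[\log \frac{p(x)}{q(x)}\right] = \sum_x p(x)\,\log \frac{p(x)}{q(x)}
\end{align*}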

cat botherer fucked around with this message at 00:10 on May 26, 2023

SubG
Aug 19, 2004

It's a hard world for little things.

cat botherer posted:

Probability theory and information theory are two sides of the same coin. KL divergence (either for variational Bayes or information gain from prior to posterior) is just relative entropy. The Bayesian optimal model is the one that has minimal message length. It's all the same thing but from different perspectives.

As you say, you never know what the underlying distribution is, but you also never can know the actual entropy of the script, because it depends on an unknown optimal code, or equivalently, a distribution, to describe it. Information entropy is just the (negative) expected value of the log probability, but that requires knowledge of the probability distribution to calculate in the first place. Thus, information theory is inseparable from probability. No matter what, there are some kinds of assumptions that must be made, and anything we infer is colored by those decisions.
I understand what you're trying to say, but I don't see how this contradicts anything I said. The issue is that any estimation of the entropy of Voynichese, and therefore information in the VM, just tells us how much there is, not what it is. Put in slightly different terms: it lets us figure out how to compress the text, not how to decrypt/translate it.

cat botherer
Jan 6, 2022

I am interested in most phases of data processing.

SubG posted:

I understand what you're trying to say, but I don't see how this contradicts anything I said. The issue is that any estimation of the entropy of Voynichese, and therefore information in the VM, just tells us how much there is, not what it is. Put in slightly different terms: it lets us figure out how to compress the text, not how to decrypt/translate it.
That’s very true. However, a good parsimonious (eg good compression) statistical model of the manuscript would be more-or-less optimal for describing the essential features. If the model structure were interpretable linguistically, it would give you meaningful linguistic and semantic information. Of course, it’s probably impossible to come up with any such well-reasoned model in the case of the Voynich manuscript. We don’t know enough about it to propose any kind of meaningful prior (we’re obviously in agreement there).

SubG
Aug 19, 2004

It's a hard world for little things.
Yes. And since we lack any additional information about the "meaning" of the VM (assuming it has one), then we don't have a "curve of possibilities" of possible translations/decryptions.

And really the only way I see machine learning helping with this is by providing the ability to sorta brute force investigate possible external correlations that might otherwise not be discovered except by chance. The "rediscovering Menander's plays because pages from them were used to repair a mummy" kind of thing.

KillHour
Oct 28, 2007


SubG posted:

Yes. And since we lack any additional information about the "meaning" of the VM (assuming it has one), then we don't have a "curve of possibilities" of possible translations/decryptions.

And really the only way I see machine learning helping with this is by providing the ability to sorta brute force investigate possible external correlations that might otherwise not be discovered except by chance. The "rediscovering Menander's plays because pages from them were used to repair a mummy" kind of thing.

I was being imprecise, sorry. What I meant to say is that you can assign a statement like "Foo means KillHour is the best poster" a probability of being true, but not only is that going to be really low, the error bars on your probability are also huge to the point where any random guess is basically as good as any other.

Tei
Feb 19, 2011

KillHour posted:

Given the vagueness of "AI" in regards to specific implementation, that's pretty much equivalent to asking "can math be used to solve x"? I know I'm stating the obvious, and I'm not really directing this at you, but more to remind everyone that AI isn't a singular thing that you use like a crystal ball, but a whole class of statistical techniques.

AI is also an algorithm that checks every possible solution to a problem, exhausting all of them, then spits out the solution.

AI is not just ML the way it's being built right now. AI can be heuristics and tools that use the power of the computer to amplify the mental muscle of humans, to reach things we could not do manually.

A bad trait of the current uses of AI is that they try to solve problems that did not need solving and don't synergize with humans.

AI art is just a bad idea; we already have artists making art and we don't need to turn that mechanical. It's not some disgusting work that is best given to machines rather than done by humans. AI art does very little to enhance the art humans already do (it's not zero, but it's small).

The "AI is just math" oversimplification: I am NOT against it, I just don't support it.

KillHour
Oct 28, 2007


Tei posted:

AI is also an algorithm that checks every possible solution to a problem, exhausting all of them, then spits out the solution.

This is extremely not true. There are algorithms that do that, but what most people call "AI" isn't doing that at all, or anything even remotely close to that.

Tei
Feb 19, 2011

KillHour posted:

This is extremely not true. There are algorithms that do that, but what most people call "AI" isn't doing that at all, or anything even remotely close to that.

haha


I can't imagine your opinion on an expert system that is just a database of solutions, where the algorithm is somewhat like "SELECT solution FROM expert_system__medical_problems WHERE issue IN ('bad cough', 'fever')"

Edit:
I am not against the "everything is math" view, I just don't do it.

Tei fucked around with this message at 11:02 on May 26, 2023

KwegiboHB
Feb 2, 2004

nonconformist art brut
Negative prompt: amenable, compliant, docile, law-abiding, lawful, legal, legitimate, obedient, orderly, submissive, tractable
Steps: 32, Sampler: DPM++ 2M Karras, CFG scale: 11, Seed: 520244594, Size: 512x512, Model hash: 99fd5c4b6f, Model: seekArtMEGA_mega20

Tei posted:

haha


I can't imagine your opinion on an expert system that is just a database of solutions, where the algorithm is somewhat like "SELECT solution FROM expert_system__medical_problems WHERE issue IN ('bad cough', 'fever')"

Oh you mean like "SELECT Oxycontin FROM *"?

Bar Ran Dun
Jan 22, 2006




Most Decision Support systems are basically just automated flow charts with checklists, where some of the information feeds in automatically and some of the actions happen automatically.

I mean, some of them are impressive. Some of the cruise ship ones are really neat to watch in action. But they don’t even have to be computerized. Others are just binders full of cards. Some are set up almost like a CYOA book: if the fire is in the engine compartment, go to card 56.
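That card-deck style is easy to picture as data; a minimal sketch, where every card number, question, and action is made up for illustration:

code:

# Hypothetical card-based decision flow; all card numbers and wording invented.
CARDS = {
    1:  {"question": "Is the fire in the engine compartment?", "yes": 56, "no": 2},
    2:  {"question": "Is the fire in the galley?", "yes": 14, "no": 3},
    3:  {"action": "Proceed with the general fire stations checklist."},
    14: {"action": "Use the galley extinguisher and secure ventilation."},
    56: {"action": "Shut the fuel supply, release the fixed CO2 system, muster the crew."},
}

def run(card_id: int = 1) -> None:
    card = CARDS[card_id]
    while "action" not in card:  # question cards branch until an action card is reached
        answer = input(card["question"] + " (y/n) ").strip().lower()
        card = CARDS[card["yes"] if answer == "y" else card["no"]]
    print(card["action"])

if __name__ == "__main__":
    run()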

Bar Ran Dun
Jan 22, 2006




Modern SMS (safety management systems) are this type of expert system; to some extent this is, in a literal sense, what a bureaucracy is. The root of the word bureau refers to desks and cabinets in an office: the storage place of the physical documents that made up what we now call management or expert systems.

Bar Ran Dun fucked around with this message at 17:58 on May 26, 2023

Bar Ran Dun
Jan 22, 2006




Sorry to keep posting in follow-up posts, but historically the origins of these expert systems… and of bureaucracy itself… are colonialism. Not that it's merely like colonialism: the administration of colonies… this is practically how it's done. The other place it starts is vessels. These systems are how you have a vessel on the other side of the world being operated in the standardized manner Capital wants it operated in.

These are tools created as systems for colonialism and capitalism.


StratGoatCom
Aug 6, 2019

Our security is guaranteed by being able to melt the eyeballs of any other forum's denizens at 15 minutes notice


SCheeseman posted:

Consciousness doesn't mean an innate desire to be free. We don't weep for ants toiling under a queen.

Queens are often somewhat subordinate to the workers in eusocial species. Not always, but often.
