Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
instead of continuing to fill up the idiot spare time projects thread with my literal nonsense, it was suggested i make a new thread about it so here it is

inspired by sarah palin's most recent word salad, i decided to make a markov text bot to generate virtual sarah palin quotes. turns out this is an idea a million other people have had, because anyone who knows both palin and markov text bots sees the rather obvious connection. however, this is only step 1 in our voyage.

oh before we get started here's what a markov text generator is

https://en.wikipedia.org/wiki/Markov_chain posted:

A Markov chain (discrete-time Markov chain or DTMC), named after Andrey Markov, is a random process that undergoes transitions from one state to another on a state space. It must possess a property that is usually characterized as "memorylessness": the probability distribution of the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

blah blah blah a bunch of poo poo about math and then something about baseball?

Markov processes can also be used to generate superficially real-looking text given a sample document: they are used in a variety of recreational "parody generator" software (see dissociated press, Jeff Harrison, Mark V Shaney).

These processes are also used by spammers to inject real-looking hidden paragraphs into unsolicited email and post comments in an attempt to get these messages past spam filters.

In the bioinformatics field, they can be used to simulate DNA sequences.

okay, so basically the procedure is this: you take a source text and break it down into n-grams. an n-gram is a run of n consecutive words, so with n = 2 you just take every pair of sequential words, stick them in a table, and count how often each one happens. using that n-gram table, you pick a random starting point, then look up which n-grams can follow the current one and how often each one did in the source. you pick the next one with an RNG weighted by those counts (the next step depends only on the current n-gram, that's the markov part), and you just keep doing that until you get tired.
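for reference, here's roughly what that procedure looks like for n = 2 in plain base R, no packages. this is a minimal sketch, not how the ngram package does it internally, and `build_chain` / `markov_babble` are made-up names. keeping every duplicate follower in the table means common pairs get picked proportionally more often, which is the same thing as sampling from the counted probabilities.

```r
# Build a word-level bigram table: for each word, every word that follows it
# in the source (duplicates kept, so frequent followers get sampled more often).
build_chain <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < 2) stop("need at least two words")
  # split() groups each follower under the word that precedes it
  split(words[-1], words[-length(words)])
}

# Random walk over the table: the next word depends only on the current word.
markov_babble <- function(chain, n_words = 30, start = NULL) {
  if (is.null(start)) start <- sample(names(chain), 1)
  out <- character(n_words)
  out[1] <- start
  for (i in seq_len(n_words - 1) + 1) {
    successors <- chain[[out[i - 1]]]
    # dead end (word only seen at the very end of the text): jump to a random word
    if (is.null(successors)) successors <- names(chain)
    out[i] <- successors[sample.int(length(successors), 1)]
  }
  paste(out, collapse = " ")
}
```

once the corpus string from the next block is loaded, usage would be something like `markov_babble(build_chain(diarrhea.in), 50)`.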

all very simple, but i'm still lazy as a motherfuck so i'm just using the ngram package in R. that looks like this:

code:
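library(ngram)
# read the whole corpus and glue it into one long string
infile <- file("palin.txt")
diarrhea.in <- paste(readLines(infile), collapse = " ")
# build the table of 2-grams (word pairs) and their counts
palin.ngram <- ngram(diarrhea.in, 2)
# random walk through that table
palin.babble <- babble(palin.ngram)

where "palin.txt" is just a bunch of palin interviews, speeches, and debate performances stuffed into a text file. that gives you this: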

quote:

"“pay-to-play.” Between bailouts for Wall Street cronies and stimulus projects or, as someone put it, this was all about Denali, mom, dad, ungulate eyeballs, slaying salmon on the floor of the world really works in order to accomplish after he's done turning back the waters and healing the planet? The answer is to challenge the status quo has got to call the devastation that a bill wouldn't be signed into law before we probably even got that first revolution.” We are the ones, right? You’re the ones who pay the bills in our enemies — proving peace through strength. In that respect, I applaud the president and his American dream endures. He knew the best of America are open, unfortunately though, some would want you to succeed too. And that we love. We’re here to stop that they inherited. Real reform never sits well with entrenched interests and power brokers. "

so it's basically perfect

NEXT UP: FROM SHITPOSTER TO TWITPOSTER


Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
okay it's fun to generate nonsense for your friends and relatives, but what you really want to do is spread your garbage far and wide. of course that means you need twitter, the web's primary outlet for meaningless garbage. luckily R has a p great package for accessing the twits, called twitteR. first you need to register an app on twitter and get your keys though. that means:

1. register an account for your garbage bot
2. while logged into your garbage bot account, go to apps.twitter.com
3. click "create new app"
4. fill in some details here:



5. click "i agree" to the user agreement, which probably says that you're not going to do the things we're about to do. i don't know for sure because i never read it, i just like agreeing to stuff
6. go to the tab that says "keys and access tokens". you'll need to generate a token. then you need to copy down the gibberish after Consumer Key (API Key), Consumer Secret (API Secret), and then Access Token and Access Token Secret, lower down on the page. now twitteR can talk to twitter

code:
library(twitteR)

# cache the OAuth token locally so you don't re-authenticate every run
options(httr_oauth_cache = TRUE)

# paste in the four strings from the "keys and access tokens" tab
apikey <- "apikey"
apisecret <- "apisecret"
token <- "token"
tokensecret <- "tokensecret"
setup_twitter_oauth(apikey, apisecret, token, tokensecret)

you now have the ability to do all sorts of stuff: post, search, get trending hashtags, etc. we're just going to set sarah up to post her garbage to the tweets, so all we need to do is take that babble, do some regex on it to get vaguely sentence-like objects of 140 characters or fewer, and tweet them. thusly

code:
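palin.babble <- babble(palin.ngram)
sentences <- c()
# each match is sentence-ending punctuation, some spaces, then a capital letter
m <- gregexpr("[?.!] +[A-Z]", palin.babble)[[1]]
sentence.ends <- as.vector(m)
# the capital letter that opens the next sentence is the last character of each match
sentence.starts <- sentence.ends + attr(m, "match.length") - 1
# seq_len() skips the loop cleanly when there are fewer than two matches
for(i in seq_len(length(sentence.ends) - 1)){
   this.sentence <- substr(palin.babble, sentence.starts[i], sentence.ends[i+1])
   if(nchar(this.sentence) <= 140){
      sentences <- c(sentences, this.sentence)
   }
}
if(length(sentences) > 0) tweet(sentences[1])

then hook that into a loop that runs every 30 minutes or something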
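a minimal sketch of that loop, assuming you pass in a `generate` function (e.g. `function() babble(palin.ngram)`) and a `post` function (e.g. `tweet` from twitteR). `tweetable_sentences` and `run_bot` are hypothetical helpers, and the sentence splitting here swaps in `strsplit` with PCRE lookarounds instead of the `gregexpr` position arithmetic above. it keeps only chunks that start with a capital letter, end in . ? or !, and fit in 140 characters, and it picks a random sentence each time instead of always the first one.

```r
# Split babble into sentence-like chunks that fit in one tweet.
tweetable_sentences <- function(text, max_chars = 140) {
  # break after . ? or ! whenever the next word starts with a capital letter
  pieces <- trimws(strsplit(text, "(?<=[.?!])\\s+(?=[A-Z])", perl = TRUE)[[1]])
  # keep only complete-looking sentences: capital at the start, punctuation at the end
  keep <- grepl("^[A-Z].*[.?!]$", pieces) & nchar(pieces) <= max_chars
  pieces[keep]
}

# Every `interval_secs`, generate fresh babble and post one random sentence from it.
run_bot <- function(generate, post, interval_secs = 30 * 60, n_iter = Inf) {
  i <- 0
  while (i < n_iter) {
    i <- i + 1
    candidates <- tweetable_sentences(generate())
    if (length(candidates) > 0) post(sample(candidates, 1))
    # no need to sleep after the final iteration
    if (i < n_iter) Sys.sleep(interval_secs)
  }
  invisible(i)
}
```

with the real pieces plugged in, `run_bot(function() babble(palin.ngram), tweet)` would run forever at 30-minute intervals.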

result: mostly tedious gibberish but sometimes something entertaining comes out

https://twitter.com/markov_palin/status/690819879537131520

https://twitter.com/markov_palin/status/690818464269869056

https://twitter.com/markov_palin/status/690856828675104771

https://twitter.com/markov_palin/status/693176875078660096


and sometimes something chilling

https://twitter.com/markov_palin/status/690823441524604928




might as well make a markov trump while we're at it, it's basically just a matter of plugging in a new text file

result: markov trump is feeling romantic

https://twitter.com/markov_trump/status/693132349282787329



but not so romantic that he can't still be a brutal dictator

https://twitter.com/markov_trump/status/693011475791790081

https://twitter.com/markov_trump/status/692935871893471232

NEXT UP: ROOTING THROUGH THE TRASH

Trig Discipline fucked around with this message at 00:02 on Feb 14, 2016

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
okay how can we get that extra dose of twitter realness? answer: harvest twitter itself

turns out we can again do that super-easy with the twitteR package, just by using the userTimeline function

code:
# up to 3200 of the account's most recent tweets (the API's cap), text column only
trump.tweets <- twListToDF(userTimeline('realDonaldTrump', n=3200))$text
write(trump.tweets, file="trumptweets.txt")

now we have a text file with all of donald's tweets, which we can load in and append to our speeches, debates, and interviews. this results in true social media engagement
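raw tweet text usually needs a little tidying before it mixes well with transcripts: retweets start with `RT @someone:`, links are t.co URLs, and the API tends to hand back `&`, `<` and `>` HTML-escaped (worth double-checking against your own data). a minimal base-R sketch, with hypothetical helper names `clean_tweets` and `append_tweets`:

```r
# Tidy raw tweet text so it blends in with speech transcripts.
clean_tweets <- function(tweets) {
  x <- sub("^RT @[A-Za-z0-9_]+: ", "", tweets)    # drop "RT @user: " prefixes
  x <- gsub("https?://[^[:space:]]+", "", x)       # drop links
  # unescape HTML entities; &amp; goes last so "&amp;lt;" doesn't become "<"
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&amp;", "&", x, fixed = TRUE)
  trimws(gsub("[[:space:]]+", " ", x))             # squash whitespace and newlines
}

# Glue the cleaned tweets onto an existing corpus string.
append_tweets <- function(corpus, tweets) {
  paste(c(corpus, clean_tweets(tweets)), collapse = " ")
}
```

usage would be something like `trump.in <- append_tweets(paste(readLines("trump.txt"), collapse = " "), readLines("trumptweets.txt"))`, where "trump.txt" is a stand-in name for whatever file holds the speeches.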

https://twitter.com/markov_trump/status/691789588885544960



now let's get sarah in on the action

https://twitter.com/markov_palin/status/693288342524284928

https://twitter.com/markov_palin/status/693282630360432640

https://twitter.com/markov_palin/status/693244859478515712

https://twitter.com/markov_palin/status/693169316821233664

UP NEXT: KEEPING UP WITH CURRENT EVENTS

Trig Discipline fucked around with this message at 00:08 on Feb 14, 2016

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
okay, so we're generating meaningless chaos, but we still want to keep up with the hottest trends and news items. we can use the tm text mining package and its various add-on packages (here, tm.plugin.webmining) to see what's going on in the news

we're going to do this in the context of generating prophecies. we're going to mix the Book of Revelation, The Necronomicon, The Egyptian Book of the Dead, and Nostradamus with today's hot news and hashtags

code:
library(tm)
library(tm.plugin.webmining)

# flatten a WebCorpus into one big string of article text
corpus.text <- function(corpus){
   txt <- paste(unlist(lapply(corpus$content, function(x) x$content)), collapse = " ")
   txt <- gsub("\\n", " ", txt)     # newlines become spaces
   txt <- gsub("([\\])", " ", txt)  # stray backslashes become spaces
   txt
}

# top stories for the search term from google news and yahoo news
googlenews.in <- corpus.text(WebCorpus(GoogleNewsSource("Microsoft")))
yahoonews.in <- corpus.text(WebCorpus(YahooNewsSource("Microsoft")))

that gets us the text of the top stories on google and yahoo news. then we need our holy texts

code:
# holy.txt: all the holy texts mashed together into one file
holy.in <- paste(readLines("holy.txt"), collapse=" ")

and finally we need the hot hot tweets. since twitter gets really shirty when you ask for too much info at once, i'm just grabbing the top 100 tweets from each of the 20 most popular trending topics in the USA (23424977, the number in the getTrends call, is the WOEID, Yahoo's "Where On Earth" ID, for the United States)

code:
tweets <- c()
trends <- getTrends(23424977)
# don't assume there are always 20 trends
for(i in seq_len(min(20, nrow(trends)))){
   thistag <- trends[i, 1]
   print(paste("Harvesting tag", i, ":", thistag))
   these.tweets <- searchTwitter(thistag, 100)
   # skip trends whose search came back empty
   if(length(these.tweets) == 0) next
   these.tweets <- paste(twListToDF(these.tweets)$text, collapse = " ")
   tweets <- paste(tweets, these.tweets)
}

then you basically paste all of that into one big string, ngram it, and markov_thebeast is born
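since the chain just counts word pairs, each source pulls on the output roughly in proportion to how many words it contributes, so a few thousand tweets can drown out a short holy text. a small base-R sketch (hypothetical `combine_sources`) that does the pasting and reports each source's share of the words, so you can see the balance before you ngram it:

```r
# Paste several sources into one corpus string and report each source's share of the words.
combine_sources <- function(...) {
  sources <- list(...)
  # one whitespace-normalised string per source
  texts <- vapply(sources, function(s) {
    trimws(gsub("[[:space:]]+", " ", paste(s, collapse = " ")))
  }, character(1))
  # count words as runs of non-space characters
  n_words <- lengths(regmatches(texts, gregexpr("[^ ]+", texts)))
  share <- n_words / sum(n_words)
  names(share) <- names(sources)
  list(text = paste(texts[nchar(texts) > 0], collapse = " "), share = round(share, 3))
}
```

with the variables from above that would be `mix <- combine_sources(holy = holy.in, google = googlenews.in, yahoo = yahoonews.in, twitter = tweets)`, then a look at `mix$share` before `ngram(mix$text, 2)`.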

https://twitter.com/markov_thebeast/status/697993864607498241

https://twitter.com/markov_thebeast/status/698103558290280448

https://twitter.com/markov_thebeast/status/698043117576957953


NEXT UP: THE INTERNET BARFS UP ITS OWN rear end in a top hat

Trig Discipline fucked around with this message at 00:21 on Feb 14, 2016

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
for our (current) final iteration, we're going to turn twitter into a literal echo chamber. in response to the above, cheese-cube posted the following

cheese-cube posted:

has anyone made a twitter bot that makes markov chains using the tweets of users who follow it? idk could either be terrible or funny.

which is a frickin' genius idea. turns out to be pretty easy to do, too! this one is named markov_polov

polov runs as two separate scripts. one just grabs all of the followers of the twitter account, then scrapes their tweets and does some regex stuff to get rid of special characters. it also strips URLs so it doesn't end up reposting whatever weird porn you guys are passing around on twitter. it waits five minutes between users so that twitter doesn't boot it off. then, after it's done one pass of all of the users, it writes their tweets to a text file. the second script just reads that text file every ten minutes, builds an ngram table, and spouts some bullshit

code:
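library(twitteR)
# assumes setup_twitter_oauth() has already been called, as in the earlier post

while(TRUE){
    mp <- getUser('markov_polov')
    followers <- mp$getFollowers()
    follower.tweets <- c()
    # seq_along() does nothing if nobody follows the bot yet
    for(i in seq_along(followers)){
        print(paste("Grabbing tweets from", followers[[i]]$getScreenName()))
        # up to 3200 tweets per user; protected accounts throw an error, so skip those
        timeline <- tryCatch(userTimeline(followers[[i]], n=3200), error = function(e) list())
        if(length(timeline) > 0){
            follower.tweets <- paste(follower.tweets, paste(twListToDF(timeline)$text, collapse = " "))
        }
        print(paste("Got", sum(nchar(follower.tweets)), "characters so far..."))
        # five minutes between users so twitter's rate limits don't boot us
        Sys.sleep(300)
    }

    tweets <- gsub("http[^[:space:]]*", "", follower.tweets)  # strip URLs
    tweets <- gsub('\\\\n', " ", tweets, perl=TRUE)           # literal "\n" text
    tweets <- gsub('\\n', " ", tweets, perl=TRUE)             # real newlines
    tweets <- gsub("([\\])", " ", tweets)                     # backslashes
    tweets <- gsub("([\"])", " ", tweets)                     # double quotes
    tweets <- gsub(" , ", " ", tweets)                        # orphaned commas
    tweets <- iconv(tweets, "UTF-8", "ASCII", sub="")         # emoji and other non-ASCII
    # the babbler script rereads this file every ten minutes
    write(tweets, file="tweets.txt")
}

and the result is lovely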

https://twitter.com/markov_polov/status/698366265946038272

https://twitter.com/markov_polov/status/698489628798488577

https://twitter.com/markov_polov/status/698604570893660160

https://twitter.com/markov_polov/status/698651563330383873


particularly when it catches someone who doesn't know wtf is going on

https://twitter.com/Pleasure__Kevin/status/698621578817343489

bonus: it has already passed the australian turing test by becoming self-aware enough to complain about telstra

https://twitter.com/Telstra/status/698479305781698560

Trig Discipline fucked around with this message at 00:45 on Feb 14, 2016

echinopsis
Apr 13, 2004

by Fluffdaddy
:69snypa:

Bloody
Mar 3, 2013

pity reply

echinopsis
Apr 13, 2004

by Fluffdaddy
also all you morons not on the twitter... get on it and follow this sweet mother fucker

pram
Jun 10, 2001

Bloody posted:

pity reply

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
side note: markov trump's followers are mostly actual trump supporters now who seem to have no idea that it's a bot. loving amazing

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
oh my god

https://twitter.com/markov_polov/status/698654084681768960

craisins
May 17, 2004

A DRIIIIIIIIIIIIVE!

fishmech?

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
jesus dont say its name. if it starts tweeting runescape thats him too

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
oh yeah, a few notes:

* once it hits an ngram from a statistically unusual sentence, it has a tendency to repeat the rest of the sentence verbatim. the longer the corpus gets (i.e., the more users in markov polov's case), the less this happens

* the ngram package in R is buggy as gently caress, so every one of these bots just dies and hard-crashes R at random intervals. since all of the ngram processing is done via C calls and since i am both (1) lazy and (2) a poo poo C programmer, i am just restarting the bots when they die instead of fixing the issue. i suppose i could just write my own ngram package for R, but see point (1)

* because of the way the twitscraper script works for markov polov, the twitscraping gets five minutes slower for each new user. if you follow the bot, it may be a few hours or even days before your tweets get incorporated
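the verbatim-repeat thing in the first note is easy to measure. with 2-grams, a word that only ever shows up followed by one particular word leaves the chain no choice, and a run of those just replays the source. this base-R sketch (hypothetical `single_successor_share`) reports what fraction of words have exactly one distinct follower; as the corpus grows, more words tend to pick up alternative followers, so the number tends to drop.

```r
# Fraction of words (2-gram states) that are only ever followed by one distinct word.
# From those the chain has no choice, so long runs of them replay the source verbatim.
single_successor_share <- function(text) {
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  # group every follower under the word it follows
  successors <- split(words[-1], words[-length(words)])
  n_distinct <- vapply(successors, function(s) length(unique(s)), integer(1))
  mean(n_distinct == 1)
}
```

e.g. `single_successor_share(paste(readLines("tweets.txt"), collapse = " "))` after each scrape pass would show whether adding followers is actually loosening things up.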

Trig Discipline fucked around with this message at 00:59 on Feb 14, 2016

graph
Nov 22, 2006

aaag peanuts

goldmine

oh no blimp issue
Feb 23, 2011

i wrote a markov tweet generator years ago when i was bored at work, it was pretty funny for a while as i fed my friends tweets into it

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
oh i've also been thinking that i might wait until palin and trump got a hundred followers or so and then gradually start feeding other texts into them. i'm thinking a handmaid's tale fed a chapter at a time into palinbot would be fun. not sure about trump, though. the wife suggested a combination of yosemite sam quotes and mein kampf, but i don't think there's that much text for the former

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
idk i think this bears investigation http://yosemitesamquotes.com/yosemite-sam-sayings/

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
well i'll be damned

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Trig Discipline posted:

oh i've also been thinking that i might wait until palin and trump got a hundred followers or so and then gradually start feeding other texts into them. i'm thinking a handmaid's tale fed a chapter at a time into palinbot would be fun. not sure about trump, though. the wife suggested a combination of yosemite sam quotes and mein kampf, but i don't think there's that much text for the former

mein kampf translated through the Simple English vocabulary

qntm
Jun 17, 2009
you could extract the text from mario games and call it markov chain chomp

this is an original idea I've had

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer

PCjr sidecar posted:

mein kampf translated through the Simple English vocabulary

ooooh

definitely want to wait until he gets more followers though. i'm getting 2-5 new people a day

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
markov polov seems to have decided to just rip on Pleasure Kevin today

https://twitter.com/markov_polov/status/698669209794949121

big scary monsters
Sep 2, 2011

-~Skullwave~-
have you had anyone @ed by them try to respond to the palin/trump bots? anything good?

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer

big scary monsters posted:

have you had anyone @ed by them try to respond to the palin/trump bots? anything good?

a lot of retweets, but no actual engagement. if and when that happens, i'm just going to run the babbler locally and keep pasting replies until they realize what's up

craisins
May 17, 2004

A DRIIIIIIIIIIIIVE!
does it go through historical posts and add them to the markov bot? or only new posts from its followers?

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer

craisins posted:

does it go through historical posts and add them to the markov bot? or only new posts from its followers?

all posts, up to 3200 posts for each user. it rescrapes on a regular basis, at intervals determined by how many followers it has

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

a reminder that manually approving your bot's tweets may be a good idea when potentially corresponding with people with secret service protection http://www.slate.com/blogs/future_tense/2015/02/25/who_is_responsible_for_death_threats_from_a_twitter_bot.html

or at least removing words like 'kill' or 'death' from your corpus
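the second suggestion is quick to sketch in base R. `scrub_corpus` is a hypothetical helper and the word list is a placeholder; it deletes whole words case-insensitively, so "skill" survives while "Kill" goes. since a word-level chain can only emit tokens that are in its corpus, scrubbing the corpus keeps those exact words out of the output, though it won't catch threatening phrasing built from other words.

```r
# Delete blocklisted words from the corpus before building the n-gram table.
# Matches whole words only, ignoring case; entries are used as regex, so keep them plain.
scrub_corpus <- function(text, blocklist) {
  pattern <- paste0("\\b(", paste(blocklist, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", text, ignore.case = TRUE, perl = TRUE)
  # tidy up the double spaces left behind
  trimws(gsub("[[:space:]]+", " ", out))
}

# placeholder list, pick your own
blocked.words <- c("kill", "killing", "death", "dead", "die", "shoot")
```

in the polov scraper this would slot in right before `write(tweets, file="tweets.txt")` as `tweets <- scrub_corpus(tweets, blocked.words)`.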

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
O_O

well it seems like they mainly just wanted him to turn it off, and i would definitely do that if it came to that

as it is, it's just endorsing alternative medicine

https://twitter.com/markov_polov/status/698681810432053248

NoneMoreNegative
Jul 20, 2000
GOTH FASCISTIC
PAIN
MASTER




shit wizard dad

https://twitter.com/markov_polov/status/698684329396842496

vodkat
Jun 30, 2012



cannot legally be sold as vodka
this is v. cool and I am definitely stealing this idea as my next idiot spare time project. I'm going to make a little zoo of markov bots and pretend that real people still use twitter :nsa:

Pile Of Garbage
May 28, 2007



vodkat posted:

this is v. cool and I am definitely stealing this idea as my next idiot spare time project. I'm going to make a little zoo of markov bots and pretend that real people still use twitter :nsa:

https://www.youtube.com/watch?v=t-7mQhSZRgM&t=17s

markov polo is killing it, nice work. thanks for the props re the idea, trig!

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer

cheese-cube posted:

https://www.youtube.com/watch?v=t-7mQhSZRgM&t=17s

markov polo is killing it, nice work. thanks for the props re the idea, trig!

it's a killer idea. should i credit your twitter handle instead of your forums handle?


also that video is magical

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
drat yospos y'all some nasty tweeters

https://twitter.com/markov_polov/status/698701970811392001

Quebec Bagnet
Apr 28, 2009

mess with the honk
you get the bonk
Lipstick Apathy

Trig Discipline posted:

oh i've also been thinking that i might wait until palin and trump got a hundred followers or so and then gradually start feeding other texts into them. i'm thinking a handmaid's tale fed a chapter at a time into palinbot would be fun. not sure about trump, though. the wife suggested a combination of yosemite sam quotes and mein kampf, but i don't think there's that much text for the former

francis e. dec + time cube


might come out incredibly racist though

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

Trig Discipline posted:

side note: markov trump's followers are mostly actual trump supporters now who seem to have no idea that it's a bot. loving amazing

i had a ronpaul twitter bot about 5 years ago that had political pundits following him.

and a real senator or rep i forget which

jony ive aces
Jun 14, 2012

designer of the lomarf car


Buglord
cool thread :waycool:

i got some tweet notification yesterday that a bunch of yosposters had followed some markov thing but the stupid app couldn't find the actual account for some reason. following it now

echinopsis?

Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
i'm thinkin yeah

jony ive aces
Jun 14, 2012

designer of the lomarf car


Buglord
https://twitter.com/markov_polov/status/698654084681768960

https://twitter.com/markov_polov/status/698691888589635585

https://twitter.com/markov_polov/status/698712047861653504

fishmeched again


Trig Discipline
Jun 3, 2008

Please leave the room if you think this might offend you.
Grimey Drawer
yeah seriously how many of those are there ffs
