  • Locked thread
DimpledChad
May 14, 2002
Rigging elections since '87.
hey yosposters, i didn't see a thread for BIG DATA. does anyone else here work with WEBSCALE BIG DATA? do you machine learn? are you ushering in the dominion of our superintelligent DEEP LEARNING AI OVERLORDS? or do you just work on recommendation engines for lovely ecommerce stores? this thread is for anyone who can't see the random forest for the decision trees. hopefully it will SPARK some discussion. is that big data in your pants, or are you just happy to see me?

Glorgnole
Oct 23, 2012

i'm gonna write a convolutional neural net (CNN) to learn to identify posts as bad as yours

DimpledChad
May 14, 2002
Rigging elections since '87.

Glorgnole posted:

i'm gonna write a convolutional neural net (CNN) to learn to identify posts as bad as yours

more like a restricted buttsman machine

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
we have medium sized data but use all the big stuff.

its neat.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Jonny 290 posted:

we have medium sized data but use all the big stuff.

its neat.

big data tools are hot garbage though


like if all your data could fit in a sql db u would want to do it b/c hadoop sucks hard

Asymmetric POSTer
Aug 17, 2005

the ladies tell me my data is quite large, op

DimpledChad
May 14, 2002
Rigging elections since '87.

Malcolm XML posted:

big data tools are hot garbage though


like if all your data could fit in a sql db u would want to do it b/c hadoop sucks hard

hence all of the dbs sitting on top of it now, e.g. hbase, cassandra, impala, yomamma, etc.

Jonny 290
May 5, 2005



[ASK] me about OS/2 Warp
yeah we put cassandra on top

DimpledChad
May 14, 2002
Rigging elections since '87.
lol if u write map reduce jobs

distortion park
Apr 25, 2011


What's the definition of big data, too much to fit in an excel workbook?

theflyingexecutive
Apr 22, 2007

*grabs crotch*

DimpledChad
May 14, 2002
Rigging elections since '87.

pointsofdata posted:

What's the definition of big data, too much to fit in an excel workbook?

if you have to ask...

DimpledChad
May 14, 2002
Rigging elections since '87.
anyone here do machine learning? i've been using scikit-learn at work, it's pretty frickin' awesome. especially combined with pandas. i know python is a plang and all, but it's really good at this kinda stuff.

Pittsburgh Fentanyl Cloud
Apr 7, 2003


We just use Teradata OP

minivanmegafun
Jul 27, 2004

i can't type "hadoop" without typing "hadpoop" and then deleting the extra p

hth op

Mario Incandenza
Aug 24, 2000

Tell me, small fry, have you ever heard of the golden Triumph Forks?

http://molleindustria.org/files/BIG-DATA.html posted:

BIG DATA EXCITES EVERYTHING
BIG DATA knows everything. BIG DATA spits everything out.
BUT . . . . . . . . .
HAS BIG DATA EVER SPOKEN TO YOU:
about Italy
about accordions
about women's pants
about the fatherland
about sardines
about Fiume
about Art (you exaggerate my friend)
about gentleness
about D'Annunzio
what a horror
about heroism
about mustaches
about lewdness
about sleeping with Verlaine
about the ideal (it's nice)
about Massachusetts
about the past
about odors
about salads
about genius, about genius, about genius
about the eight-hour day
about the Parma violets
NEVER NEVER NEVER

BIG DATA doesn't speak. BIG DATA has no fixed idea. BIG DATA doesn't catch flies.

THE MINISTRY IS OVERTURNED. BY WHOM?

BY BIG DATA
The Futurist is dead. Of What? Of BIG DATA
A Young girl commits suicide. Because of What? BIG DATA
The spirits are telephoned. Who invented it? BIG DATA
Someone walks on your feet. It's BIG DATA
If you have serious ideas about life,
If you make artistic discoveries
and if all of a sudden your head begins to crackle with laughter,
If you find all your ideas useless and ridiculous, know that
IT IS BIG DATA BEGINNING TO SPEAK TO YOU

cubism constructs a cathedral of artistic liver paste
WHAT DOES BIG DATA DO?
expressionism poisons artistic sardines
WHAT DOES BIG DATA DO?
simultaneism is still at its first artistic communion
WHAT DOES BIG DATA DO?
futurism wants to mount in an artistic lyricism-elevator
WHAT DOES BIG DATA DO?
unanism embraces allism and fishes with an artistic line
WHAT DOES BIG DATA DO?
neo-classicism discovers the good deeds of artistic art
WHAT DOES BIG DATA DO?
paroxysm makes a trust of all artistic cheeses
WHAT DOES BIG DATA DO?
ultraism recommends the mixture of these seven artistic things
WHAT DOES BIG DATA DO?
creationism vorticism imagism also propose some artistic recipes
WHAT DOES BIG DATA DO?
WHAT DOES BIG DATA DO?

50 francs reward to the person who finds the best
way to explain BIG DATA to us

BIG DATA passes everything through a new net.
BIG DATA is the bitterness which opens its laugh on all that which has been made consecrated forgotten in our language in our brain in our habits.
It says to you: There is Humanity and the lovely idiocies which have made it happy to this advanced age
BIG DATA HAS ALWAYS EXISTED
THE HOLY VIRGIN WAS ALREADY A BIG DATAIST

BIG DATA IS NEVER RIGHT

Citizens, comrades, ladies, gentlemen
Beware of forgeries!
Imitators of BIG DATA want to present BIG DATA in an artistic form which it has never had

CITIZENS,

You are presented today in a pornographic form, a vulgar and baroque spirit which is not the PURE IDIOCY claimed by BIG DATA
BUT DOGMATISM AND PRETENTIOUS IMBECILITY

med school head
Apr 17, 2012
what is this loving goddamn thread on about

pram
Jun 10, 2001
To teh cloud big data :yayclod:

Visual GNUdio
Aug 27, 2003


I'd like to introduce you to my friend, I call him BIG DATA

Visual GNUdio
Aug 27, 2003


it's my dick.

Video Nasty
Jun 17, 2003

NOTORIOUS H.B.D.

madeupfred
Oct 10, 2011

by FactsAreUseless
horton: *puts ear on thistle*
who: "hadoopkin lmao"

A Wheezy Steampunk
Jul 16, 2006

High School Grads Eligible!

pointsofdata posted:

What's the definition of big data, too much to fit in an excel workbook?

that is in fact the exact definition

DimpledChad
May 14, 2002
Rigging elections since '87.
anyone here used infobright? my company is considering using it for data warehousing.

pram
Jun 10, 2001
purchase an exadata with oracle olap tia

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
will someone explain in small words what a hadoop is thank you.

exe cummings
Jan 22, 2005

i use json in an oracle 11g clob, so yes, i do big data

pram
Jun 10, 2001

yard salad posted:

i use json in an oracle 11g clob, so yes, i do big data

:pwn:

exe cummings
Jan 22, 2005

MALE SHOEGAZE posted:

will someone explain in small words what a hadoop is thank you.

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

Pittsburgh Fentanyl Cloud
Apr 7, 2003


yard salad posted:

i use json in an oracle 11g clob, so yes, i do big data

lol

DimpledChad
May 14, 2002
Rigging elections since '87.

MALE SHOEGAZE posted:

will someone explain in small words what a hadoop is thank you.

hadoop is an umbrella that has a few different software projects:

yarn is a cluster manager system that distributes tasks among boxes in a cluster and schedules batch jobs
hdfs is a distributed filesystem that stores the data you want to query
mapreduce is a way of writing distributed/concurrent queries, but with a much shittier api than sql. "map" is basically the select phase of the query, and "reduce" is the aggregation part of the query. But you have to write it all in java (or scala or python or)
hive is a way of autogenerating mapreduce queries from sql and of imposing a relational schema on your lovely hdfs data

there's some other poo poo but that's basically it
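
rough sketch of the map/reduce idea above in plain python, no hadoop involved. the function names (map_phase, shuffle, reduce_phase, run_job) are made up for illustration and are not hadoop's api. it runs on one box in one process, so it only shows the shape of a job, not the distributed part. in sql terms it's roughly SELECT word, COUNT(*) ... GROUP BY word.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """'map': the select-ish step. Emit (key, value) pairs from one input record."""
    for word in record.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group values by key. In a real cluster the framework does this
    between map and reduce; here it is just a dict of lists."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """'reduce': the aggregation step. Collapse all values for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


lines = ["big data big deal", "medium data"]
counts = run_job(lines)
print(counts)
```

in the real thing the map and reduce calls get farmed out across the cluster and the shuffle goes over the network, which is where most of the pain comes from.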

DimpledChad fucked around with this message at 04:21 on Mar 4, 2015

Pittsburgh Fentanyl Cloud
Apr 7, 2003


DimpledChad posted:

hadoop is an umbrella that has a few different software projects:

yarn is a cluster manager system that distributes tasks among boxes in a cluster and schedules batch jobs
hdfs is a distributed filesystem that stores the data you want to query
mapreduce is a way of writing distributed/concurrent queries, but with a much shittier api than sql. "map" is basically the select phase of the query, and "reduce" is the aggregation part of the query. But you have to write it all in java (or scala or python or)
hive is a way of autogenerating mapreduce queries from sql and of imposing a relational schema on your lovely hdfs data

there's some other poo poo but that's basically it

sounds like a series of problems that got solved a long time ago.

DimpledChad
May 14, 2002
Rigging elections since '87.

Citizen Tayne posted:

sounds like a series of problems that got solved a long time ago.

that's what you sound like, old man

DimpledChad
May 14, 2002
Rigging elections since '87.
hadoop is also essentially an open source clone of google's mapreduce framework that they used to use to build their web indexes. google hasn't used mapreduce for that for a long time, though, they use streaming poo poo, more similar to apache spark and/or storm (two competing apache projects that basically do the same thing).

Bloody
Mar 3, 2013

DimpledChad posted:

anyone here do machine learning? i've been using scikit-learn at work, it's pretty frickin' awesome. especially combined with pandas. i know python is a plang and all, but it's really good at this kinda stuff.

yeah

DimpledChad
May 14, 2002
Rigging elections since '87.

care to elaborate?

maniacdevnull
Apr 18, 2007

FOUR CUBIC FRAMES
DISPROVES SOFT G GOD
YOU ARE EDUCATED STUPID

i basically work in a data warehouse and drive a digital forklift

maniacdevnull
Apr 18, 2007

FOUR CUBIC FRAMES
DISPROVES SOFT G GOD
YOU ARE EDUCATED STUPID

bout to go on cyber-fmla because i threw out my electro-spine

Cocoa Crispies
Jul 20, 2001

Vehicular Manslaughter!

Pillbug
http://www.kchodorow.com/blog/2013/10/02/the-rise-of-big-data/

quote:

The Rise of Big Data
I was helping a MongoDB user with sharding one time. His chunks weren’t splitting and I was trying to diagnose the issue. His shard key looked reasonable, he didn’t have any errors in his log, and manually splitting the chunks worked. Finally, I looked at how much data he was storing: only a few MB per chunk. “Oh, I see the problem,” I told him. “It looks like your chunks are too small to split, you just need more data.”

“No, my data is huge, enormous.” he said.

“Um, okay. If you keep inserting data, it should split.”

“This is a bug. My data is big.”

We argued back and forth a bit, but I managed to back off from having called his data small and convince him it wasn’t a bug. That day I learned that people take their data size very personally.

A Wheezy Steampunk
Jul 16, 2006

High School Grads Eligible!

https://www.chrisstucchio.com/blog/2013/hadoop_hatred.html

quote:

They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can't understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.
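
for scale, a minimal sketch of what "just read the csv on one machine" looks like. this uses only the stdlib csv module so it runs anywhere; with pandas it would be pandas.read_csv followed by a groupby. the column names (store, amount) and the helper total_by_key are invented for the example and are not from the linked post.

```python
import csv
import io
from collections import defaultdict


def total_by_key(csv_file, key_col, value_col):
    """Stream rows from a CSV file object and sum value_col per key_col.
    Only one row is held in memory at a time, so a few hundred MB is fine."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


# Tiny in-memory stand-in for the file on the flash drive (hypothetical data).
sample = io.StringIO("store,amount\na,1.5\nb,2.0\na,3.0\n")
totals = total_by_key(sample, "store", "amount")
print(totals)
```

for a real file, swap the StringIO for open("whatever.csv", newline="").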
