p-lang thread: (now (have you (problems two)))

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

Kevin Mitnick P.E. posted:

ok walk me through it. lets say i've got 1b json objects totaling 1tb to load nightly and join with 150gb already present to produce a 100m row x 1k column output table. what hardware do i need, whose code can i use to do the hard part, and how is this going to be easier than running a jar on the EMR cluster datapipeline spins up for me

1tb a day is different from 1tb my dog but load that fucker into a postgres table using json and then join hth

for more information you can pay me~~~~~

non poo poo post: do whatever works

# ? Nov 25, 2018 20:01

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

Kevin Mitnick P.E. posted:

glue is poop from a butt

Athena doesn’t do output. also tends to choke on queries that take greater than constant space

glue is bad

alternatively:redshift

# ? Nov 25, 2018 20:02

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

also a tb of json is probably way way less than that in a reasonable representation

# ? Nov 25, 2018 23:35

ComradeCosmobot: Dec 4, 2004; USPOL July

oh i didn’t see you were processing json with spark lol

yeah get that poo poo into protobuf or avro stat, and THEN do your spark pipeline because compression ain’t gonna help ya

if you absolutely have to load it as json then checkpoint and persist after wrangling it into a sensible compact non-text format

that is rule number 1 for working in spark once you hit memory errors

rule number 2 is “throw away absolutely everything you don’t absolutely need before that checkpoint-and-persist”

after you do all that and you still get OOM errors good luck figuring out where you should be doing the checkpoint-persist dance elsewhere in your code

# ? Nov 26, 2018 00:06

Nomnom Cookie: Aug 30, 2009

ugh now i'm going to debug this job tomorrow. thanks for forcing me to remember it exists, buttheads

# ? Nov 26, 2018 00:48

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

Kevin Mitnick P.E. posted:

ugh now i'm going to debug this job tomorrow. thanks for forcing me to remember it exists, buttheads

I predict it takes you at least a week to find and solve the problem
not because of you, because of spark

# ? Nov 26, 2018 01:09

Nomnom Cookie: Aug 30, 2009

tbh im probably just going to increase executor memory again and check back in a week

# ? Nov 26, 2018 02:57

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

Kevin Mitnick P.E. posted:

tbh im probably just going to increase executor memory again and check back in a week

now that’s job security

# ? Nov 26, 2018 03:03

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

Kevin Mitnick P.E. posted:

tbh im probably just going to increase executor memory again and check back in a week

now you're cooking with gas

# ? Nov 26, 2018 03:06

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

Kevin Mitnick P.E. posted:

tbh im probably just going to increase executor memory again and check back in a week

now you’re playing with a full deck

# ? Nov 26, 2018 03:16

JawnV6: Jul 4, 2004; So hot ...

but
why wouldn't you get a backtrace out of an OOM error

# ? Nov 26, 2018 03:18

Nomnom Cookie: Aug 30, 2009

first cause its java so the OOM killer gives you a useless backtrace. second cause EMR doesn't archive syslog so unless i spot the problem and ssh to the node with the failing container prior to data pipeline terminating it, I'll never get to see the log

# ? Nov 26, 2018 03:25

Nomnom Cookie: Aug 30, 2009

third how does it help me to see the allocation that failed, the problem is the ones that succeeded

# ? Nov 26, 2018 03:27

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

JawnV6 posted:

but
why wouldn't you get a backtrace out of an OOM error

spark nearly always blows away the specific stack trace with one that's just a useless pile of spark and mr framework calls and no reference to the actual application code that ate poo poo

sometimes the container the whole thing is running in also eats poo poo and then all you have is the log that it ate poo poo and it doesn't know why.

I hate spark and I hate the people who write scala spark applications that I then have to debug even more

# ? Nov 26, 2018 03:30

ComradeCosmobot: Dec 4, 2004; USPOL July

mind you, an oom stacktrace is almost certainly not gonna be helpful anyway since it’s probably gonna be some internal allocation that has nothing to do with your program logic

# ? Nov 26, 2018 04:13

Nomnom Cookie: Aug 30, 2009

jit bull transpile posted:

sometimes the container the whole thing is running in also eats poo poo and then all you have is the log that it ate poo poo and it doesn't know why.

oh yeah now i remember. i've just been assuming its the OOM killer zapping executors cause all i get is the exit code saying the thing got SIGKILLed

# ? Nov 26, 2018 04:28

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

Kevin Mitnick P.E. posted:

oh yeah now i remember. i've just been assuming its the OOM killer zapping executors cause all i get is the exit code saying the thing got SIGKILLed

yeah this means you're overloading an executor somewhere and the jvms gc lost the race

# ? Nov 26, 2018 04:33

CPColin: Sep 9, 2003; Big ol' smile.

Speaking of missing stack traces, that's still one of my favorite Java warts: when you have so many null pointers, they get JIT-compiled away and the stack traces are empty for the remaining life of the JVM.

# ? Nov 26, 2018 05:51

Nomnom Cookie: Aug 30, 2009

theres a -XX flag for that OmitStackTraceInFastThrow or smth

# ? Nov 26, 2018 06:00
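
For reference, the HotSpot option mentioned above is spelled `-XX:-OmitStackTraceInFastThrow`; the minus sign switches the optimization off, and it is on by default. Below is a minimal sketch of how one might watch for the effect. The class and method names are made up for illustration, and whether a trace ever goes missing depends on the JVM version, the JIT, and the iteration count, so no particular output is promised.

```java
class FastThrowDemo {
    // Hypothetical helper: dereferences its argument so that passing null
    // makes the JVM raise an implicit NullPointerException.
    static int lengthOf(String s) {
        return s.length();
    }

    // Returns the first iteration whose NPE arrived with an empty stack
    // trace, or -1 if every exception still carried one.
    static int firstEmptyTrace(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int first = firstEmptyTrace(2_000_000);
        System.out.println(first < 0
                ? "every NPE kept its stack trace"
                : "stack trace first went missing at iteration " + first);
    }
}
```

Running it once as `java FastThrowDemo` and once as `java -XX:-OmitStackTraceInFastThrow FastThrowDemo` is one way to compare; with the option disabled, the second run should keep every trace.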

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

I love when it truncates the trace to 20 lines but the line that matters is like number 96

# ? Nov 26, 2018 06:10

CPColin: Sep 9, 2003; Big ol' smile.

Kevin Mitnick P.E. posted:

theres a -XX flag for that OmitStackTraceInFastThrow or smth

Yeah, I remember convincing DevOps to add that one in there so we could finally figure out what was throwing so many loving exceptions. I think they agreed to do it on a single server only, just in case it had some other effect.

# ? Nov 26, 2018 06:24

Nomnom Cookie: Aug 30, 2009

operational maturity is gartner for oh god dont touch it we dont know why its working

# ? Nov 26, 2018 08:09

Sweeper: Nov 29, 2007; The Joe Buck of Posting; Dinosaur Gum

CPColin posted:

Yeah, I remember convincing DevOps to add that one in there so we could finally figure out what was throwing so many loving exceptions. I think they agreed to do it on a single server only, just in case it had some other effect.

making stacks can be expensive if you use them for control flow, omitting the stack is fine then. lots of people do try {} catch (npe) {} I assume

# ? Nov 26, 2018 12:47

feedmegin: Jul 30, 2008

Sweeper posted:

making stacks can be expensive if you use them for control flow, omitting the stack is fine then. lots of people do try {} catch (npe) {} I assume

Isnt using exceptions for regular control flow (as opposed to, you know, exceptional events) generally considered a p bad idea?

# ? Nov 26, 2018 19:12

Doom Mathematic: Sep 2, 2008

Only if they're expensive. It's just a name.

# ? Nov 26, 2018 19:21

Cybernetic Vermin: Apr 18, 2005

i mean, it is pretty much a dynamically targeted goto, so the appropriate use-cases aren't exactly many

otoh relying on the stack trace programmatically is *very* dubious practice, so it is not like suppressing the generation does anything other than specifically mess with diagnostic messages which will often be sort of worthless anyway

# ? Nov 26, 2018 19:22

Shaggar: Apr 26, 2006

feedmegin posted:

Isnt using exceptions for regular control flow (as opposed to, you know, exceptional events) generally considered a p bad idea?

yes. theres no reason to use exceptions for anything other than actual exceptions

# ? Nov 26, 2018 19:23

CPColin: Sep 9, 2003; Big ol' smile.

Cybernetic Vermin posted:

i mean, it is pretty much a dynamically targeted goto, so the appropriate use-cases aren't exactly many

otoh relying on the stack trace programmatically is *very* dubious practice, so it is not like suppressing the generation does anything other than specifically mess with diagnostic messages which will often be sort of worthless anyway

In our case, there were a couple problems. One was that something was causing a flood of null pointer exceptions that were indeed worthless. A bunch of these were swallowed exceptions caused by failing to check values before using them. The other problem was that, later in the life of the JVM, we were seeing an extremely rare null pointer exception that was actually disrupting things and that we couldn't track down, because the earlier exceptions had led to the stack trace being optimized away.

I think there was something dumb that kept us from fixing the first problem right away, so we had to resort to the command-line option so we'd get a stack trace on the rare exceptions, in the meantime.

It was real dumb! That architecture had a lot of problems with non-defensive and indefensible code. When I started introducing @NonNull and @Nullable, I had to do it very slowly, to keep the compiler from making GBS threads itself.

CPColin fucked around with this message at 19:49 on Nov 26, 2018

# ? Nov 26, 2018 19:32

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Cybernetic Vermin posted:

i mean, it is pretty much a dynamically targeted goto, so the appropriate use-cases aren't exactly many

otoh relying on the stack trace programmatically is *very* dubious practice, so it is not like suppressing the generation does anything other than specifically mess with diagnostic messages which will often be sort of worthless anyway

perl 5 did it right by having break and continue take a label so you could effectively goto out of an inner loop without having to sully your hands by writing goto. that's it, that's the usecase. don't remember if python still wants you to use an exception for that.

# ? Nov 26, 2018 19:37

Carthag Tuek: Oct 15, 2005; Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang

TheFluff posted:

perl 5 did it right by having break and continue take a label so you could effectively goto out of an inner loop without having to sully your hands by writing goto. that's it, that's the usecase. don't remember if python still wants you to use an exception for that.

use an inline function and return from that instead of deeply nested break/continue bullshit :cmon:

# ? Nov 26, 2018 19:44

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Krankenstyle posted:

use an inline function and return from that instead of deeply nested break/continue bullshit

a c programmer can and will write c in any language :colbert:

# ? Nov 26, 2018 19:45

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

not that i'm a c programmer, and i don't think i've ever really written nested loops like that in my entire career, but still

# ? Nov 26, 2018 19:46

Soricidus: Oct 21, 2010; freedom-hating statist shill

TheFluff posted:

perl 5 did it right by having break and continue take a label so you could effectively goto out of an inner loop without having to sully your hands by writing goto. that's it, that's the usecase. don't remember if python still wants you to use an exception for that.

java does this too. it’s only very occasionally useful but it’s nice to have

the “pythonic” solution is to wrap the outer loop in a function so you can return from it for early exits, or wrap the inside in a function if you want continue instead. don’t worry about the overhead, if you cared about performance you wouldn’t be writing python

# ? Nov 26, 2018 19:55
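
A minimal sketch of the labeled `break` and `continue` that Java offers, for anyone who has not run into them. The class, method, and label names here are invented for illustration and do not come from anyone's codebase.

```java
class LabeledBreakDemo {
    // Returns {row, col} of the first cell equal to target, or null if absent.
    static int[] find(int[][] grid, int target) {
        int[] found = null;
        outer:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = new int[] {r, c};
                    break outer; // leaves both loops at once
                }
            }
        }
        return found;
    }

    // Counts the rows that contain no negative number.
    static int countNonNegativeRows(int[][] grid) {
        int count = 0;
        rows:
        for (int[] row : grid) {
            for (int v : row) {
                if (v < 0) {
                    continue rows; // abandon this row, move to the next one
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2, 3}, {4, -5, 6}, {7, 8, 9} };
        int[] hit = find(grid, 6);
        System.out.println(hit[0] + "," + hit[1]);      // prints 1,2
        System.out.println(countNonNegativeRows(grid)); // prints 2
    }
}
```

The wrap-it-in-a-function approach gets the same result in `find` by returning from inside the inner loop, which is usually the more readable choice when the loop already lives in its own small method.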

Shaggar: Apr 26, 2006

CPColin posted:

In our case, there were a couple problems. One was that something was causing a flood of null pointer exceptions that were indeed worthless. A bunch of these were swallowed exceptions caused by failing to check values before using them. The other problem was that, later in the life of the JVM, we were seeing an extremely rare null pointer exception that was actually disrupting things and that we couldn't track down, because the earlier exceptions had led to the stack trace being optimized away.

I think there was something dumb that kept us from fixing the first problem right away, so we had to resort to the command-line option so we'd get a stack trace on the rare exceptions, in the meantime.

It was real dumb! That architecture had a lot of problems with non-defensive and indefensible code. When I started introducing @NonNull and @Nullable, I had to do it very slowly, to keep the compiler from making GBS threads itself.

npes should never be eaten because they are always code defects.

# ? Nov 26, 2018 19:56

CPColin: Sep 9, 2003; Big ol' smile.

Shaggar posted:

npes should never be eaten because they are always code defects.

Tell me about it. Our interface with the database would have pairs of getWith[Fields]() and getWith[Fields]OrNull() functions for loading stuff from the database. When I first got in there, the getWith() functions would throw a "not found" exception if no value matched the given fields. Fine. The problem was that the getWithOrNull() functions would implement the "or null" part by calling the getWith() functions, catching the exception, and returning null. I switched it so getWith() would call getWithOrNull() and throw the "not found" exception on null, because come on.

We had plenty of this kind of construct too:

Java code:

if (getWithIdOrNull(1) != null) {
   return getWithIdOrNull(1).getWhatever();
}

And these values were not cached. Stuff like this was why I couldn't just crank Eclipse's compiler settings to treat unchecked use of @Nullable values as an error, because poo poo like this was just peppered through the code. Explicitly nullable types are so much safer!

# ? Nov 26, 2018 20:06
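
A rough sketch of the arrangement described above, where the "or null" lookup is the primitive and the throwing variant is layered on top of it. Everything here is an assumption made for illustration: `WidgetStore`, the in-memory map standing in for the database, the `lookups` counter, and `NoSuchElementException` standing in for the "not found" exception are not from the original codebase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

class WidgetStore {
    private final Map<Integer, String> rows = new HashMap<>();
    int lookups = 0; // counts simulated database round trips

    void put(int id, String value) {
        rows.put(id, value);
    }

    // The primitive: a miss is an ordinary result, so no exception is built.
    String getWithIdOrNull(int id) {
        lookups++;
        return rows.get(id);
    }

    // The strict variant delegates and throws only on a real miss.
    String getWithId(int id) {
        String value = getWithIdOrNull(id);
        if (value == null) {
            throw new NoSuchElementException("no row with id " + id);
        }
        return value;
    }

    // Look up once, keep the result in a local, then test and use it,
    // instead of calling getWithIdOrNull twice.
    int lengthOrZero(int id) {
        String value = getWithIdOrNull(id);
        return value != null ? value.length() : 0;
    }

    public static void main(String[] args) {
        WidgetStore store = new WidgetStore();
        store.put(1, "abc");
        System.out.println(store.lengthOrZero(1)); // prints 3
        System.out.println(store.lookups);         // prints 1
    }
}
```

Holding the result in a local also closes the gap in the double-call version, where the second uncached lookup could in principle return null after the first one did not.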