|
I was pretty tired last night and almost downloaded this https://www.microsoft.com/Web/webmatrix/node.aspx
|
# ? Apr 23, 2016 14:06 |
|
I'm gonna start a spare-time node project though
|
# ? Apr 23, 2016 14:22 |
ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers
|
|
# ? Apr 23, 2016 16:24 |
|
use luigi to ssh into em all and coordinate which ones working on what ftw
|
# ? Apr 23, 2016 17:11 |
|
kalstrams posted:ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers that's pretty much the only way to make python parallel, yes gently caress the gil forever
|
# ? Apr 23, 2016 17:38 |
|
just spawn infinite processes
|
# ? Apr 23, 2016 17:41 |
|
orchestrate your python scripts from an Erlang node
|
# ? Apr 23, 2016 18:11 |
|
MononcQc posted:orchestrate your python scripts from an Erlang node use SD Erlang
|
# ? Apr 23, 2016 18:16 |
|
MononcQc posted:orchestrate your python scripts from an Erlang node A few years ago I saw a talk in SF about doing this with Java. OTP supervision tree with jvms as worker processes. It was cool as hell
|
# ? Apr 23, 2016 18:27 |
|
http://erlport.org/ is a decent thing. I've used it in toy projects where I wanted to do property-based testing of python code with Erlang tools, works decently enough.
|
# ? Apr 23, 2016 18:30 |
|
Soricidus posted:that's pretty much the only way to make python parallel, yes multiprocessing starmap works fine
|
# ? Apr 23, 2016 19:24 |
|
my experience of multiprocessing is that it scales really badly due to ipc overhead, but I guess it's probably fine as long as you're doing something that doesn't require passing much data to the subprocesses or getting much data back from them
|
# ? Apr 23, 2016 19:32 |
|
also breaks if the data cannot be pickled. but hey, it's fine for the example above. large data processing should be done via url or file passing anyways
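the pickle limitation is easy to demo: everything multiprocessing sends between processes goes through pickle, so unpicklable objects (lambdas, open files, sockets) blow up before any work happens, which is why passing cheap references like paths or urls instead of the data itself is the usual workaround:

```python
import pickle

# anything handed to a worker process gets serialized with pickle first
try:
    pickle.dumps(lambda x: x + 1)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable = False

print(picklable)  # False: lambdas can't be pickled
```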
|
# ? Apr 23, 2016 19:59 |
|
MononcQc posted:orchestrate your python scripts from an mesos cluster disclaimer this is a thing I work on
|
# ? Apr 23, 2016 20:22 |
|
oh yeah here's a neat thing if you're stuck with javascript: the typescript team made a javascript language service that tries to expose types in javascript the way typescript does it, but without the need to write typescript. it's called "salsa" and vscode apparently uses it. worth a look I guess! piratepilates fucked around with this message at 00:33 on Apr 24, 2016 |
# ? Apr 24, 2016 00:30 |
|
JewKiller 3000 posted:metrics are for showing off to management how well your team met their goals, or even exceeded them. this should always be the case, because your team's goals should always be set so that even in the worst possible situation, you will easily achieve them. this is called being a good manager, and if your team isn't like this every single quarter, look to update your resume asap, or at least switch teams within the organization if possible this sounds nice
|
# ? Apr 24, 2016 01:53 |
|
sure, but a great manager sets goals that will make his bosses happy should they be achieved, and then blames the team for failing to follow scrum and self-manage well enough to achieve the otherwise realistic goals that they set (kill your scrum master), and thus still retains their bonus and sets themselves up for further promotion through the ranks
|
# ? Apr 24, 2016 03:09 |
|
an actual method in an actual codebase I'm working in:code:
|
# ? Apr 26, 2016 10:51 |
|
terrible programmer question: i have a query that returns 22k rows. the index on this data is perfect i.e. mysql only looks at the correct number of rows. to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db. is it purely a matter of transferring that much data (the json is ~5mb) taking that much time, and the only thing we can do is throw more cpu at it? i'm trying to reduce the number of fields we pull for each of those rows, which seems to speed it up, but it's still the bottleneck. any suggestions? the table has 2M rows so i don't think partitioning it will help much. can mysql compress query results?
|
# ? Apr 26, 2016 10:58 |
|
Why do you need 22k rows immediately? If you're doing some sort of large rear end report then I could maybe see it being okay to just request the rows and munge them yourself. If you're putting this into some kind of user-facing thing then just paginate and ask for however many you need on each page. If this isn't for anything user-facing then consider writing a more complex SQL statement and have the DB server do your work for you. I know it's gonna end up being harder to write than scala, but it will definitely be faster since I doubt you're vomiting those 22k rows out verbatim. You have hosed up somewhere if you absolutely need specific fields from 22k rows instantaneously. There might be some kind of academic use case I can't think of, but in general it's the case. ErIog fucked around with this message at 11:10 on Apr 26, 2016 |
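the paginate suggestion, sketched as keyset pagination (cheaper than LIMIT/OFFSET for deep pages since the index seeks straight to the next id). `fetch_page` here is a hypothetical stand-in for a real parameterized query like `SELECT id, payload FROM things WHERE id > %s ORDER BY id LIMIT %s`:

```python
def fetch_page(rows, after_id, limit):
    # stand-in for the real indexed query against the db
    return [r for r in rows if r["id"] > after_id][:limit]


def paginate(rows, limit=2):
    after_id = 0
    while True:
        page = fetch_page(rows, after_id, limit)
        if not page:
            break
        yield page
        # resume from the last id seen instead of an ever-growing OFFSET
        after_id = page[-1]["id"]


rows = [{"id": i} for i in range(1, 6)]
pages = list(paginate(rows))
print([len(p) for p in pages])  # [2, 2, 1]
```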
# ? Apr 26, 2016 11:08 |
|
what sort of hardware does your database have? ssds? spinning metal? or is it beefy enough that the entire dataset lives in memory? looking up 22k random rows is going to murder you with access latency if you're running it off hard drives.
|
# ? Apr 26, 2016 11:13 |
|
this isn't user facing, this is two different servers needing to reconcile their data. so the idea is to send off a http request from A to B, then B dumps the db into the json body response. the response really is just a db dump; there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query. this is on an aws instance. no idea about the configuration of the machine or the limits we've paid for. i guess we could also just up the timeouts since the jobs run asynchronously, so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though
|
# ? Apr 26, 2016 11:44 |
|
gonadic io posted:the response really is just a db dump, there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query. ime a competent database engine does a pretty good job of estimating whether using an index will actually be faster than a table scan in straightforward scenarios like what you're indicating. so if you used to have one big query and were like "wtf it's not using the index, let me write it as four separate queries to try and trick the query planner" you weren't necessarily improving things.
|
# ? Apr 26, 2016 12:08 |
|
gonadic io posted:i guess we could also just up the timeouts since the jobs run asyncronously so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though
|
# ? Apr 26, 2016 12:09 |
|
are you loading the data into scala objects just to filter them? could you do the filtering in the db to reduce the number of rows? it kinda sounds like you might be somewhat underprovisioned, especially if your working set doesn't all fit into memory at once. bad programmer idea: since you're on aws already, dump a backup of B into an s3 bucket somewhere and send A the location. A can then pull the file, restore a copy of B locally and reconcile poo poo without sending an entire database as a json response.
|
# ? Apr 26, 2016 12:59 |
|
why doesnt linq have a proper map function that forces evaluation? like at least half the time i would use select itd be for the side effects of the mapping function rather than the returned value, and it feels really gross to have to .select.toList a thing
|
# ? Apr 26, 2016 14:24 |
|
the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/
|
# ? Apr 26, 2016 14:30 |
|
gonadic io posted:to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db. Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step?
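a minimal sketch of what that pipelining could look like, in Python generator form (the row source and filter are hypothetical stand-ins; the real version would iterate a server-side cursor so rows flow through filtering and serialization as the db produces them, instead of materializing all 22k first):

```python
import json


def fetch_rows():
    # stand-in for a server-side cursor yielding rows as the db returns them
    for i in range(5):
        yield {"id": i, "category": "butt" if i % 2 else "fart"}


def stream_json(rows):
    # serialize incrementally so encoding overlaps the fetch; nothing here
    # ever holds the whole result set in memory at once
    yield "["
    first = True
    for row in rows:
        if row["category"] != "butt":  # the "little filtering" step
            continue
        if not first:
            yield ","
        yield json.dumps(row)
        first = False
    yield "]"


# an http framework would write these chunks straight to the response body
body = "".join(stream_json(fetch_rows()))
print(body)
```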
|
# ? Apr 26, 2016 15:13 |
|
Jabor posted:a competent database engine gonadic io posted:mysql hmm.
|
# ? Apr 26, 2016 15:34 |
|
i see all these questions and they are fine but why the gently caress aren't you using builtin db features to do this? PostgreSQL has streaming replication, don't use mysql
|
# ? Apr 26, 2016 15:36 |
|
Asymmetrikon posted:the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/ yeah i get that but itd be nice to have a terminal function (like returning void rather than a transformed collection or something?) so that i wouldnt have to make dumb one or two line foreaches around linq queries just to cleanly execute side effecty code on the results of a series of linq operations
|
# ? Apr 26, 2016 16:23 |
|
oh, like a .ForEach that you can use instead of a .ToList? yeah, that would be useful if you needed to consume the sequence immediately instead of putting it in list form
|
# ? Apr 26, 2016 16:26 |
|
yeah. there's a .ForEach in the PLINQ set but not in the LINQ set and its kind of strange imo or maybe its ForAll? either way
|
# ? Apr 26, 2016 16:45 |
|
CPColin posted:Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step? This would be very nice.
Brain Candy posted:i see all these questions and they are fine Sure, let me just migrate all of our infrastructure and servers. Guess I could create the json from inside mysql
|
# ? Apr 26, 2016 16:46 |
|
theres no reason for a ForEach for IEnumerable because any scenario where you'd want it you'd just do a normal foreach.
|
# ? Apr 26, 2016 16:47 |
|
foreach (var x in logfile.Reverse().Take(20).Reverse()) _logger.Info(x); would be a lot nicer as logfile.Reverse().Take(20).Reverse().Foreach(_logger.Info) imo
|
# ? Apr 26, 2016 16:48 |
|
nah the first one is better and also should not be one line.
|
# ? Apr 26, 2016 17:00 |
|
also there should be braces
|
# ? Apr 26, 2016 17:06 |
|
Shaggar posted:nah the first one is better and also should not be one line. yep.
|
# ? Apr 26, 2016 17:10 |
|
|
prefect posted:also there should be braces also there should be dental plan
|
# ? Apr 26, 2016 17:12 |