Valeyard
Mar 30, 2012


Grimey Drawer
I was pretty tired last night and almost downloaded this https://www.microsoft.com/Web/webmatrix/node.aspx


Valeyard
Mar 30, 2012


Grimey Drawer
I'm gonna start a spare-time node project though

cinci zoo sniper
Mar 15, 2013




ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers

Corla Plankun
May 8, 2007

improve the lives of everyone
use luigi to ssh into em all and coordinate which one's working on what ftw
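
(for the curious, a minimal sketch of roughly what that would look like — plain luigi for the coordination, subprocess for the ssh hop. hostnames, dataset paths, and the remote command are all made up for illustration; luigi also ships an ssh contrib module, not shown here:)
code:
import subprocess
import luigi

class RunChunk(luigi.Task):
    # hypothetical host/dataset pairing, one task per machine
    host = luigi.Parameter()
    dataset = luigi.Parameter()

    def output(self):
        # marker file so luigi knows which chunks already finished
        return luigi.LocalTarget(f"done/{self.host}.marker")

    def run(self):
        # ssh into the box and run the script over its slice of the data
        subprocess.run(
            ["ssh", str(self.host), f"python compute.py {self.dataset}"],
            check=True,
        )
        with self.output().open("w") as f:
            f.write("ok")

if __name__ == "__main__":
    hosts = ["lab-pc-1", "lab-pc-2", "lab-pc-3"]
    luigi.build(
        [RunChunk(host=h, dataset=f"data/part{i}.csv")
         for i, h in enumerate(hosts)],
        local_scheduler=True,
        workers=3,  # run the ssh sessions in parallel
    )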

Soricidus
Oct 21, 2010
freedom-hating statist shill

cinci zoo sniper posted:

ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers

that's pretty much the only way to make python parallel, yes

gently caress the gil forever

Valeyard
Mar 30, 2012


Grimey Drawer
just spawn infinite processes

MononcQc
May 29, 2007

orchestrate your python scripts from an Erlang node

Valeyard
Mar 30, 2012


Grimey Drawer

MononcQc posted:

orchestrate your python scripts from an Erlang node

use SD Erlang

more like dICK
Feb 15, 2010

This is inevitable.

MononcQc posted:

orchestrate your python scripts from an Erlang node

A few years ago I saw a talk in SF about doing this with Java. OTP supervision tree with jvms as worker processes. It was cool as hell

MononcQc
May 29, 2007


http://erlport.org/ is a decent thing. I've used it in toy projects where I wanted to do property-based testing of python code with Erlang tools, works decently enough.

MeruFM
Jul 27, 2010

Soricidus posted:

that's pretty much the only way to make python parallel, yes

gently caress the gil forever

multiprocessing starmap works fine
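
(for anyone following along, this is roughly what "works fine" looks like — a minimal sketch with a toy function, since the real workload wasn't posted:)
code:
from multiprocessing import Pool

def crunch(dataset_id, scale):
    # stand-in for the actual per-dataset work
    return dataset_id * scale

if __name__ == "__main__":
    jobs = [(i, 10) for i in range(8)]  # one argument tuple per task
    with Pool() as pool:
        # starmap unpacks each tuple into crunch(dataset_id, scale);
        # note args and results are pickled and shipped over IPC
        results = pool.starmap(crunch, jobs)
    print(results)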

Soricidus
Oct 21, 2010
freedom-hating statist shill
my experience of multiprocessing is that it scales really badly due to ipc overhead, but I guess it's probably fine as long as you're doing something that doesn't require passing much data to the subprocesses or getting much data back from them

MeruFM
Jul 27, 2010
also breaks if the data cannot be pickled. but hey, it's fine for the example above

large data processing should be done via url or file passing anyways
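
(i.e. ship paths over IPC instead of the data itself — each worker opens its own file, so only short strings and small results cross the process boundary. sketched with a made-up summarise step and invented filenames:)
code:
from multiprocessing import Pool

def summarise(path):
    # the worker opens the file itself; nothing big gets pickled
    with open(path) as f:
        return path, sum(1 for _ in f)

if __name__ == "__main__":
    paths = ["data/part0.csv", "data/part1.csv", "data/part2.csv"]
    with Pool() as pool:
        for path, n_rows in pool.map(summarise, paths):
            print(path, n_rows)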

Progressive JPEG
Feb 19, 2003

MononcQc posted:

orchestrate your python scripts from a mesos cluster

disclaimer this is a thing I work on

piratepilates
Mar 28, 2004

So I will learn to live with it. Because I can live with it. I can live with it.



oh yeah here's a neat thing if you're stuck with javascript

the typescript team made a javascript language service that tries to expose types in javascript the way typescript does it, but without the need to write typescript. it's called "salsa" and vscode apparently uses it. worth a look I guess!

piratepilates fucked around with this message at 00:33 on Apr 24, 2016

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

JewKiller 3000 posted:

metrics are for showing off to management how well your team met their goals, or even exceeded them. this should always be the case, because your team's goals should always be set so that even in the worst possible situation, you will easily achieve them. this is called being a good manager, and if your team isn't like this every single quarter, look to update your resume asap, or at least switch teams within the organization if possible

this sounds nice

Clockwerk
Apr 6, 2005


sure, but a great manager sets goals that will make his bosses happy should they be achieved, then blames the team for failing to follow scrum and self-manage well enough to achieve the otherwise realistic goals that were set (kill your scrum master), and thus still retains their bonus and sets themselves up for further promotion through the ranks

dick traceroute
Feb 24, 2010

Open the pod bay doors, Hal.
Grimey Drawer
an actual method in an actual codebase I'm working in:
code:
public void KillYourself()

gonadic io
Feb 16, 2011

>>=
terrible programmer question: i have a query that returns 22k rows. the index on this data is perfect, i.e. mysql only looks at the correct number of rows.

to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db.

is it purely that transferring that much data (the json is ~5mb) takes that much time, and the only thing we can do is throw more cpu at it? i'm trying to reduce the number of fields we pull for each of those rows, which seems to speed it up, but it's still the bottleneck. any suggestions? the table has 2M rows so i don't think partitioning it will help much. can mysql compress query results?
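
(answering that last bit: yes — the mysql client/server protocol can zlib-compress result sets if the client asks for it, which helps exactly this case of a big repetitive payload over the network. e.g. with mysql-connector-python; host, credentials, and the query are placeholders:)
code:
import mysql.connector

# compress=True enables protocol-level compression on the wire
conn = mysql.connector.connect(
    host="db.example.com", user="app", password="...",
    database="stuff", compress=True,
)
cur = conn.cursor()
cur.execute(
    "SELECT id, category, payload FROM things WHERE date = %s",
    ("2016-04-26",),
)
rows = cur.fetchall()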

ErIog
Jul 11, 2001

:nsacloud:
Why do you need 22k rows immediately? If you're doing some sort of large rear end report then I could maybe see it being okay to just request the rows and munge them yourself.

If you're putting this into some kind of user-facing thing then just paginate and ask for however many you need on each page.

If this isn't for anything user-facing then consider writing a more complex SQL statement and have the DB server do your work for you. I know it's gonna end up being harder to write than scala, but it will definitely be faster since I doubt you're vomiting those 22k rows out verbatim.

You have hosed up somewhere if you absolutely need specific fields from 22k rows instantaneously. There might be some kind of academic use case I can't think of, but in general that holds.

ErIog fucked around with this message at 11:10 on Apr 26, 2016
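
(if pagination is the route, keyset pagination keeps each page cheap — sketched with pymysql standing in for whatever driver is actually in play, table and column names guessed to match the thread:)
code:
import pymysql

conn = pymysql.connect(host="db.example.com", user="app",
                       password="...", database="stuff")

def pages(page_size=1000):
    last_id = 0
    with conn.cursor() as cur:
        while True:
            # WHERE id > last seeks straight to the next page via the
            # primary key, instead of OFFSET-scanning past earlier rows
            cur.execute(
                "SELECT id, category, payload FROM things "
                "WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, page_size),
            )
            rows = cur.fetchall()
            if not rows:
                return
            yield rows
            last_id = rows[-1][0]

for batch in pages():
    print(f"got {len(batch)} rows")  # real processing would go here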

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
what sort of hardware does your database have? ssds? spinning metal? or is it beefy enough that the entire dataset lives in memory?

looking up 22k random rows is going to murder you with access latency if you're running it off hard drives.

gonadic io
Feb 16, 2011

>>=
this isn't user facing, this is two different servers needing to reconcile their data. so the idea is to send off a http request from A to B, then B dumps the db into the json body response.

the response really is just a db dump, there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query.

this is on an aws instance. no idea about the configuration of the machine or the limits we've paid for.

i guess we could also just up the timeouts since the jobs run asynchronously, so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

gonadic io posted:

the response really is just a db dump, there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query.

ime a competent database engine does a pretty good job of estimating whether using an index will actually be faster than a table scan in straightforward scenarios like what you're indicating. so if you used to have one big query and were like "wtf it's not using the index, let me write it as four separate queries to try and trick the query planner" you weren't necessarily improving things.
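
(the cheap way to settle it rather than guessing: ask the planner directly. a sketch, with the thread's invented table/column names:)
code:
import pymysql

conn = pymysql.connect(host="db.example.com", user="app",
                       password="...", database="stuff")
with conn.cursor() as cur:
    # EXPLAIN shows whether mysql actually picks the index
    # (the key column) and how many rows it expects to examine
    cur.execute(
        "EXPLAIN SELECT id, category, payload FROM things "
        "WHERE category = %s AND date = %s",
        ("butt", "2016-04-26"),
    )
    for row in cur.fetchall():
        print(row)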

Sagacity
May 2, 2003
Hopefully my epitaph will be funnier than my custom title.

gonadic io posted:

i guess we could also just up the timeouts since the jobs run asynchronously, so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though

If you batch the data you can at least upper-bound the timeout? e.g. always send max 20k rows so you know it'll only take ~15sec per batch

redleader
Aug 18, 2005

Engage according to operational parameters
are you loading the data into scala objects just to filter them? could you do the filtering in the db to reduce the number of rows?

it kinda sounds like you might be somewhat underprovisioned, especially if your working set doesn't all fit into memory at once.

bad programmer idea: since you're on aws already, dump a backup of B into an s3 bucket somewhere and send A the location. A can then pull the file, restore a copy of B locally and reconcile poo poo without sending an entire database as a json response.
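
(the "bad programmer idea" in sketch form, with boto3 — bucket, key, and filenames invented:)
code:
import boto3

s3 = boto3.client("s3")

# B dumps its snapshot into a bucket...
s3.upload_file("dump.sql.gz", "reconcile-bucket", "b/2016-04-26/dump.sql.gz")

# ...and sends A a time-limited link instead of a 5mb json body
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "reconcile-bucket", "Key": "b/2016-04-26/dump.sql.gz"},
    ExpiresIn=3600,  # seconds
)
print(url)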

Bloody
Mar 3, 2013

why doesnt linq have a proper map function that forces evaluation? like, at least half the time i would use select itd be for the side effects of the mapping function rather than the returned value, and it feels really gross to have to .select.toList a thing

Asymmetrikon
Oct 30, 2009

I believe you're a big dork!
the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/

CPColin
Sep 9, 2003

Big ol' smile.

gonadic io posted:

to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db.

Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step?
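
(concretely: an unbuffered cursor streams rows as the server produces them, so the json can be written out incrementally instead of after a full materialise. a minimal pymysql sketch, names invented as before:)
code:
import json
import pymysql
import pymysql.cursors

conn = pymysql.connect(host="db.example.com", user="app",
                       password="...", database="stuff")

# SSCursor doesn't buffer the whole result set client-side:
# rows stream through as they arrive off the wire
with conn.cursor(pymysql.cursors.SSCursor) as cur:
    cur.execute("SELECT id, category, payload FROM things")
    with open("out.json", "w") as out:
        out.write("[")
        first = True
        for row in cur:
            if not first:
                out.write(",")
            json.dump({"id": row[0], "category": row[1],
                       "payload": row[2]}, out)
            first = False
        out.write("]")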

jony neuemonic
Nov 13, 2009

Jabor posted:

a competent database engine


hmm.

Brain Candy
May 18, 2006

i see all these questions and they are fine

but why the gently caress aren't you using builtin db features to do this? PostgreSQL has streaming replication, don't use mysql

Bloody
Mar 3, 2013

Asymmetrikon posted:

the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/

yeah i get that but itd be nice to have a terminal function (like returning void rather than a transformed collection or something?) so that i wouldnt have to make dumb one or two line foreaches around linq queries just to cleanly execute side effecty code on the results of a series of linq operations

Asymmetrikon
Oct 30, 2009

I believe you're a big dork!
oh, like a .ForEach that you can use instead of a .ToList? yeah, that would be useful if you needed to consume the sequence immediately instead of putting it in list form

Bloody
Mar 3, 2013

yeah. there's a .ForEach in the PLINQ set but not in the LINQ set and its kind of strange imo

or maybe its ForAll? either way
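
(same idea translated to python, where generators are lazy the way linq queries are — a terminal for_each that consumes the pipeline for its side effects without building a list. all names invented:)
code:
from collections import deque

def for_each(action, iterable):
    # terminal operation: forces evaluation, returns nothing
    for item in iterable:
        action(item)

logfile = (f"line {i}" for i in range(100))  # lazy, like a linq query
# deque(..., maxlen=20) is the lazy "last 20 lines" step
for_each(print, deque(logfile, maxlen=20))   # side effects, no .toList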

gonadic io
Feb 16, 2011

>>=

CPColin posted:

Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step?

This would be very nice.

Brain Candy posted:

i see all these questions and they are fine

but why the gently caress aren't you using builtin db features to do this? PostgreSQL has streaming replication, don't use mysql

Sure let me just migrate all of our infrastructure and servers

Guess I could create the json from inside mysql :v:

Shaggar
Apr 26, 2006
theres no reason for a ForEach for IEnumerable because any scenario where you'd want it you'd just do a normal foreach.

Bloody
Mar 3, 2013

foreach (var x in logfile.Reverse().Take(20).Reverse()) _logger.Info(x);

would be a lot nicer as

logfile.Reverse().Take(20).Reverse().ForEach(_logger.Info) imo

Shaggar
Apr 26, 2006
nah the first one is better and also should not be one line.

prefect
Sep 11, 2001

No one, Woodhouse.
No one.




Dead Man’s Band
also there should be braces

akadajet
Sep 14, 2003

Shaggar posted:

nah the first one is better and also should not be one line.

yep.


Luigi Thirty
Apr 30, 2006

Emergency confection port.

prefect posted:

also there should be braces

also there should be dental plan
