|
I was pretty tired last night and almost downloaded this https://www.microsoft.com/Web/webmatrix/node.aspx
|
# ? Apr 23, 2016 14:06 |
|
I'm gonna start a spare-time node project though
|
# ? Apr 23, 2016 14:22 |
ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers
|
|
# ? Apr 23, 2016 16:24 |
|
use luigi to ssh into em all and coordinate which ones working on what ftw
|
# ? Apr 23, 2016 17:11 |
|
kalstrams posted:ive come up with a p drat efficient way to implement parallel computing into this python thing i wrote for my uni project. ill just run it over separate datasets on separate computers that's pretty much the only way to make python parallel, yes gently caress the gil forever
|
# ? Apr 23, 2016 17:38 |
|
just spawn infinite processes
|
# ? Apr 23, 2016 17:41 |
|
orchestrate your python scripts from an Erlang node
|
# ? Apr 23, 2016 18:11 |
|
MononcQc posted:orchestrate your python scripts from an Erlang node use SD Erlang
|
# ? Apr 23, 2016 18:16 |
|
MononcQc posted:orchestrate your python scripts from an Erlang node A few years ago I saw a talk in SF about doing this with Java. OTP supervision tree with jvms as worker processes. It was cool as hell
|
# ? Apr 23, 2016 18:27 |
|
http://erlport.org/ is a decent thing. I've used it in toy projects where I wanted to do property-based testing of python code with Erlang tools, works decently enough.
|
# ? Apr 23, 2016 18:30 |
|
Soricidus posted:that's pretty much the only way to make python parallel, yes multiprocessing starmap works fine
|
# ? Apr 23, 2016 19:24 |
|
my experience of multiprocessing is that it scales really badly due to ipc overhead, but I guess it's probably fine as long as you're doing something that doesn't require passing much data to the subprocesses or getting much data back from them
|
# ? Apr 23, 2016 19:32 |
|
also breaks if the data cannot be pickled. but hey, it's fine for the example above. large data processing should be done via url or file passing anyways
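the pickle limitation is easy to demo: everything multiprocessing sends between processes goes through pickle, so unpicklable objects (lambdas, open files, sockets) blow up before any work happens, which is why passing cheap references like paths or urls instead of the data itself is the usual workaround:

```python
import pickle

# anything handed to a worker process gets serialized with pickle first
try:
    pickle.dumps(lambda x: x + 1)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable = False

print(picklable)  # False: lambdas can't be pickled
```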
|
# ? Apr 23, 2016 19:59 |
|
MononcQc posted:orchestrate your python scripts from an mesos cluster disclaimer this is a thing I work on
|
# ? Apr 23, 2016 20:22 |
|
oh yeah here's a neat thing if you're stuck with javascript: the typescript team made a javascript language service that tries to expose types in javascript the way typescript does it, but without the need to write typescript. it's called "salsa" and vscode apparently uses it. worth a look I guess! piratepilates fucked around with this message at 00:33 on Apr 24, 2016 |
# ? Apr 24, 2016 00:30 |
|
JewKiller 3000 posted:metrics are for showing off to management how well your team met their goals, or even exceeded them. this should always be the case, because your team's goals should always be set so that even in the worst possible situation, you will easily achieve them. this is called being a good manager, and if your team isn't like this every single quarter, look to update your resume asap, or at least switch teams within the organization if possible this sounds nice
|
# ? Apr 24, 2016 01:53 |
|
sure, but a great manager sets goals that will make his bosses happy should they be achieved, and then blames the team for failing to follow scrum and self-manage well enough to achieve the otherwise realistic goals that they set (kill your scrum master), and thus still retains their bonus and sets themselves up for further promotion through the ranks
|
# ? Apr 24, 2016 03:09 |
|
an actual method in an actual codebase I'm working in:code:
|
# ? Apr 26, 2016 10:51 |
|
terrible programmer question: i have a query that returns 22k rows. the index on this data is perfect i.e. mysql only looks at the correct number of rows. to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db. is it purely a matter of transferring that much data (the json is ~5mb) taking that much time, and the only thing we can do is throw more cpu at it? i'm trying to reduce the number of fields we pull for each of those rows, which seems to speed it up, but it's still the bottleneck. any suggestions? the table has 2M rows so i don't think partitioning it will help much. can mysql compress query results?
|
# ? Apr 26, 2016 10:58 |
|
Why do you need 22k rows immediately? If you're doing some sort of large rear end report then I could maybe see it being okay to just request the rows and munge them yourself. If you're putting this into some kind of user-facing thing then just paginate and ask for however many you need on each page. If this isn't for anything user-facing then consider writing a more complex SQL statement and have the DB server do your work for you. I know it's gonna end up being harder to write than scala, but it will definitely be faster since I doubt you're vomiting those 22k rows out verbatim. You have hosed up somewhere if you absolutely need specific fields from 22k rows instantaneously. There might be some kind of academic use case I can't think of, but in general it's the case. ErIog fucked around with this message at 11:10 on Apr 26, 2016 |
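the paginate suggestion, sketched as keyset pagination (cheaper than LIMIT/OFFSET for deep pages since the index seeks straight to the next id). `fetch_page` here is a hypothetical stand-in for a real parameterized query like `SELECT id, payload FROM things WHERE id > %s ORDER BY id LIMIT %s`:

```python
def fetch_page(rows, after_id, limit):
    # stand-in for the real indexed query against the db
    return [r for r in rows if r["id"] > after_id][:limit]


def paginate(rows, limit=2):
    after_id = 0
    while True:
        page = fetch_page(rows, after_id, limit)
        if not page:
            break
        yield page
        # resume from the last id seen instead of an ever-growing OFFSET
        after_id = page[-1]["id"]


rows = [{"id": i} for i in range(1, 6)]
pages = list(paginate(rows))
print([len(p) for p in pages])  # [2, 2, 1]
```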
# ? Apr 26, 2016 11:08 |
|
what sort of hardware does your database have? ssds? spinning metal? or is it beefy enough that the entire dataset lives in memory? looking up 22k random rows is going to murder you with access latency if you're running it off hard drives.
|
# ? Apr 26, 2016 11:13 |
|
this isn't user facing, this is two different servers needing to reconcile their data. so the idea is to send off a http request from A to B, then B dumps the db into the json body response. the response really is just a db dump; there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query. this is on an aws instance. no idea about the configuration of the machine or the limits we've paid for. i guess we could also just up the timeouts since the jobs run asynchronously, so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though
|
# ? Apr 26, 2016 11:44 |
|
gonadic io posted:the response really is just a db dump, there's 4 queries for different dates/categories (there are 3-4 categories total) which are separate because the index won't work with a bunch of conditional logic for "if category = butt and date = today else if category = fart and date = yesterday" in the query. ime a competent database engine does a pretty good job of estimating whether using an index will actually be faster than a table scan in straightforward scenarios like what you're indicating. so if you used to have one big query and were like "wtf it's not using the index, let me write it as four separate queries to try and trick the query planner" you weren't necessarily improving things.
|
# ? Apr 26, 2016 12:08 |
|
gonadic io posted:i guess we could also just up the timeouts since the jobs run asyncronously so as long as A sends out an email being all "yep, the data from B matched what's in my db" i don't care if it takes a minute or two. don't want to keep being all "poo poo it failed, up the timeout again" as the dataset grows though
|
# ? Apr 26, 2016 12:09 |
|
are you loading the data into scala objects just to filter them? could you do the filtering in the db to reduce the number of rows? it kinda sounds like you might be somewhat underprovisioned, especially if your working set doesn't all fit into memory at once. bad programmer idea: since you're on aws already, dump a backup of B into an s3 bucket somewhere and send A the location. A can then pull the file, restore a copy of B locally and reconcile poo poo without sending an entire database as a json response.
|
# ? Apr 26, 2016 12:59 |
|
why doesnt linq have a proper map function that forces evaluation? like at least half the time i would use select itd be for the side effects of the mapping function rather than the returned value, and it feels really gross to have to .select.toList a thing
|
# ? Apr 26, 2016 14:24 |
|
the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/
|
# ? Apr 26, 2016 14:30 |
|
gonadic io posted:to pull this data from the db, parse it as scala objects, do a little filtering, turn it into json and send it over http is taking 10-15 seconds. the vast majority of this time (7 secs or so) is in pulling the data from the db. Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step?
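a minimal sketch of what that pipelining could look like, in Python generator form (the row source and filter are hypothetical stand-ins; the real version would iterate a server-side cursor so rows flow through filtering and serialization as the db produces them, instead of materializing all 22k first):

```python
import json


def fetch_rows():
    # stand-in for a server-side cursor yielding rows as the db returns them
    for i in range(5):
        yield {"id": i, "category": "butt" if i % 2 else "fart"}


def stream_json(rows):
    # serialize incrementally so encoding overlaps the fetch; nothing here
    # ever holds the whole result set in memory at once
    yield "["
    first = True
    for row in rows:
        if row["category"] != "butt":  # the "little filtering" step
            continue
        if not first:
            yield ","
        yield json.dumps(row)
        first = False
    yield "]"


# an http framework would write these chunks straight to the response body
body = "".join(stream_json(fetch_rows()))
print(body)
```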
|
# ? Apr 26, 2016 15:13 |
|
Jabor posted:a competent database engine gonadic io posted:mysql hmm.
|
# ? Apr 26, 2016 15:34 |
|
i see all these questions and they are fine but why the gently caress aren't you using builtin db features to do this? PostgreSQL has streaming replication, don't use mysql
|
# ? Apr 26, 2016 15:36 |
|
Asymmetrikon posted:the designers wanted linq querying to not have side-effects, apparently: https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/ yeah i get that but itd be nice to have a terminal function (like returning void rather than a transformed collection or something?) so that i wouldnt have to make dumb one or two line foreaches around linq queries just to cleanly execute side effecty code on the results of a series of linq operations
|
# ? Apr 26, 2016 16:23 |
|
oh, like a .ForEach that you can use instead of a .ToList? yeah, that would be useful if you needed to consume the sequence immediately instead of putting it in list form
|
# ? Apr 26, 2016 16:26 |
|
yeah. there's a .ForEach in the PLINQ set but not in the LINQ set and its kind of strange imo or maybe its ForAll? either way
|
# ? Apr 26, 2016 16:45 |
|
CPColin posted:Are you pipelining this operation at all? Like, are you sending the rows through your Scala code as they come out of the database? Or do you have to wait until all the rows have been loaded before you can send the data to the next step? This would be very nice.
Brain Candy posted:i see all these questions and they are fine Sure, let me just migrate all of our infrastructure and servers. Guess I could create the json from inside mysql
|
# ? Apr 26, 2016 16:46 |
|
theres no reason for a ForEach for IEnumerable because any scenario where you'd want it you'd just do a normal foreach.
|
# ? Apr 26, 2016 16:47 |
|
foreach (var x in logfile.Reverse().Take(20).Reverse()) _logger.Info(x); would be a lot nicer as logfile.Reverse().Take(20).Reverse().Foreach(_logger.Info) imo
|
# ? Apr 26, 2016 16:48 |
|
nah the first one is better and also should not be one line.
|
# ? Apr 26, 2016 17:00 |
|
also there should be braces
|
# ? Apr 26, 2016 17:06 |
|
Shaggar posted:nah the first one is better and also should not be one line. yep.
|
# ? Apr 26, 2016 17:10 |
|
|
prefect posted:also there should be braces also there should be dental plan
|
# ? Apr 26, 2016 17:12 |