Blaziken386
Jun 27, 2013

I'm what the kids call: a big nerd

Eela6 posted:

You can do this in pretty much any language. I would personally recommend Python 3 as a starter language, especially for something like this. Java is a language with a lot of boilerplate code that beginners often find frustrating. (Also, I just love Python.)

The Anaconda Distribution will give you everything you need to get up and running in Python. It even has a built-in IDE, Spyder, which I find to be excellent for beginner programmers. It makes it easy to see and correct your mistakes.
Started looking into Python on your recommendation. Currently about a third or so of the way through Codecademy's course on it.

Eela6 posted:

So to give you an example of where to start, I took this wiki page on battle network 2:

http://megaman.wikia.com/wiki/List_of_Mega_Man_Battle_Network_2_Battle_Chips
I scraped the data
http://pastebin.com/0T9E8R2s

I cleaned it up using this script
http://pastebin.com/EcW5maSC

and I outputted it to CSV
http://pastebin.com/eGEm36QA
(you might want to start here!)

There's also some other example code in there that might help you on your way.

I would try and work on your project in three steps.

1. Build your database of all the battle network games (probably by scraping and cleaning fan wikis)
2. Build a series of functions that give you the data you're looking for as text
3. Learn how to build a basic web-page and GUI that gives you access to that data (I recommend flask).

Anyways, if you don't end up using it, no big. I enjoyed this little project.
:toot:
2 is mainly what I'm stuck on.
I'll look into flask as well, but making it look nice and fancy will probably be saved until I can get v1 to work in the first place.
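
For step 2 in the quoted list, here is a minimal sketch of a "give me the data as text" function. It assumes the cleaned CSV has a header row. The column names (`Name`, `Damage`, `Element`) and the two sample rows are placeholders invented for illustration; the real CSV in the pastebin may be laid out differently.

```python
import csv
import io

# Placeholder data standing in for the cleaned CSV; the real columns may differ.
SAMPLE_CSV = """Name,Damage,Element
Cannon,40,None
HeatShot,40,Fire
"""

def load_chips(csv_file):
    """Read chip rows into a dict keyed by lower-cased chip name."""
    return {row["Name"].lower(): row for row in csv.DictReader(csv_file)}

def describe_chip(chips, name):
    """Return a one-line text description of a chip, or a not-found message."""
    chip = chips.get(name.lower())
    if chip is None:
        return f"No chip named {name!r}"
    return f"{chip['Name']}: {chip['Damage']} damage, element {chip['Element']}"

chips = load_chips(io.StringIO(SAMPLE_CSV))
print(describe_chip(chips, "cannon"))
```

Swapping `io.StringIO(SAMPLE_CSV)` for `open("chips.csv", newline="")` (file name hypothetical) would read a real file instead. Keeping the lookup separate from the loading means a web front end can reuse `describe_chip` later.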

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
I need to render 2D and 3D grids of arbitrary size with selectable coloring for each cell.

Please tell me the Python library that does this. I know there has to be one. Or else, recommend another tool.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Paul MaudDib posted:

I need to render 2D and 3D grids of arbitrary size with selectable coloring for each cell.

Please tell me the Python library that does this. I know there has to be one. Or else, recommend another tool.

...Excel? Or Google sheets or similar.

Is this interactive? If not, you could install ImageMagick, a commandline image editing tool. It has a textual image format that's easy to generate by programs, which can then be converted into normal PNGs and so on using its `convert` utility. Very slow though. If you need faster, you could look into the SDL, which is a reasonably straightforward library for 2D videogames that does sprite rendering. Make a sprite for each color you want to support (or figure out how to build arbitrary sprites on the fly) and render them at the desired locations.

Probably one of the graphing libraries could also do this for you, but I'm less familiar with those.

If it is interactive, then you're going to need to process mouse input, have some method of choosing colors, etc. and then you're getting into widget libraries and graphical utilities and so on. That's a lot more work.
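
For the non-interactive 2D case, a rough sketch of the "generate a textual image, then convert it" idea. It writes plain-text PPM (the Netpbm `P3` format), which is one text format ImageMagick can read; it may not be the exact format the post above had in mind. The function name and `cell_size` parameter are made up for this example.

```python
def grid_to_ppm(grid, cell_size=8):
    """Render a 2D grid of (r, g, b) tuples as plain-text PPM (P3)."""
    rows = len(grid)
    cols = len(grid[0])
    # Header: magic number, width and height in pixels, maximum colour value.
    lines = ["P3", f"{cols * cell_size} {rows * cell_size}", "255"]
    for row in grid:
        for _ in range(cell_size):              # repeat for the cell's height
            for (r, g, b) in row:
                for _ in range(cell_size):      # repeat for the cell's width
                    lines.append(f"{r} {g} {b}")  # one pixel per line
    return "\n".join(lines) + "\n"

RED, BLUE = (255, 0, 0), (0, 0, 255)
ppm_text = grid_to_ppm([[RED, BLUE], [BLUE, RED]], cell_size=2)
print(ppm_text.splitlines()[:3])
```

Saving `ppm_text` to a file such as `grid.ppm` and running `convert grid.ppm grid.png` should then produce a PNG. This only covers flat 2D grids, not the 3D case.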

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Paul MaudDib posted:

I need to render 2D and 3D grids of arbitrary size with selectable coloring for each cell.

Please tell me the Python library that does this. I know there has to be one. Or else, recommend another tool.

http://bokeh.pydata.org/en/latest/

Trick Question
Apr 9, 2007


Does anyone know where to find the actual executables you use to create server/client stubs for RPC? I've managed to find a few tutorials on the process, but they all reach a point where they're like "now just run rpcgen" or "now just run uuidgen and then run midl" with no further explanation. I've installed the Windows SDK and done a full install of Visual Studio, but they don't seem to have actually registered to the command prompt at the very least.

I'm on Windows 10, if it matters.

UZR IS BULLSHIT
Jan 25, 2004
Are there any parallel programming people in the house, specifically MPI users?

Peristalsis
Apr 5, 2004
Move along.

UZR IS BULLSHIT posted:

Are there any parallel programming people in the house, specifically MPI users?

I took a course using it 12 or 14 years ago on a Beowulf cluster. I don't remember much about it, but I don't recall the basics being very difficult. Do you have specific questions?

UZR IS BULLSHIT
Jan 25, 2004
I posted this in the scientific computing thread as well:

quote:

Who here is an MPI expert?

I have a little program manager I'm writing where I need to run a very large number of serial jobs on a very large number of cores. I'm writing it in such a way that there's a master process that keeps track of which jobs have been run, and once a slave is done with its current job it sends a message to CPU 0 asking which job it should run next. (The actual executable doing the calculations doesn't have the ability to do this natively, and it's a huge piece of institutional code I have no business loving with).

I'm kinda stuck at the pseudo code level. The way it's written now is that after distributing out the first round of jobs to every process, CPU 0 sits and waits for a new message to come in, and responds to whoever comes asking for a new job until there's nothing left to do.

The problem is the slave processes have no way to know when to stop asking for jobs. I'm thinking I could code CPU 0 so that, once it has no more jobs to give out, it waits for one more message from each slave and sends them a -1 to tell them to break out of their loop.

That feels kinda hacky though, is there a more elegant way to do it?

Peristalsis
Apr 5, 2004
Move along.

UZR IS BULLSHIT posted:

I posted this in the scientific computing thread as well:

If I understand this correctly, it sounds like what you really want is HTCondor and/or DAGMan. As I recall, the latter can handle dependencies between multiple sets of programs.

But for your actual question, I think returning a -1 is a fine idea, especially for a first iteration, and especially if you're writing in C or Fortran. I assume you have some control over the slave processes, and they're what call the black-box executable, so you can extend or generalize this logic later if you need to, but people use MPI for speed and efficiency, not for elegance of code.

So, get it working, then worry about making it pretty later on, if you have time (you won't).
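
To make the "-1 means stop" protocol concrete, here is a small simulation of it. It is not MPI code: threads and `queue.Queue` objects stand in for ranks and messages so the sketch runs anywhere, and all names are invented. In a real MPI program the queue operations would be replaced by sends and receives between rank 0 and the workers.

```python
import queue
import threading

STOP = -1  # sentinel the master sends when no jobs are left

def master(jobs, requests, replies):
    """Hand out jobs on request, then send STOP to each worker exactly once."""
    pending = list(jobs)
    active = len(replies)
    while active:
        worker_id = requests.get()          # wait for any worker to ask
        if pending:
            replies[worker_id].put(pending.pop(0))
        else:
            replies[worker_id].put(STOP)    # this worker will now exit
            active -= 1

def worker(worker_id, requests, reply, done):
    """Ask for work until the master replies with STOP."""
    while True:
        requests.put(worker_id)
        job = reply.get()
        if job == STOP:
            break
        done.append((worker_id, job))       # stand-in for running the real job

def run(jobs, n_workers):
    requests = queue.Queue()
    replies = [queue.Queue() for _ in range(n_workers)]
    done = []
    threads = [
        threading.Thread(target=worker, args=(i, requests, replies[i], done))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    master(jobs, requests, replies)
    for t in threads:
        t.join()
    return sorted(job for _, job in done)

print(run(range(10), n_workers=3))
```

The master's loop ends when every worker has been told to stop, so no separate shutdown step is needed. Unlike the description above, this sketch has workers ask for their first job too rather than receiving an initial round up front.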

Gul Banana
Nov 28, 2003

Trick Question posted:

Does anyone know where to find the actual executables you use to create server/client stubs for RPC? I've managed to find a few tutorials on the process, but they all reach a point where they're like "now just run rpcgen" or "now just run uuidgen and then run midl" with no further explanation. I've installed the Windows SDK and done a full install of Visual Studio, but they don't seem to have actually registered to the command prompt at the very least.

I'm on Windows 10, if it matters.

try running "visual studio command prompt", which adds some stuff to your PATH

Trick Question
Apr 9, 2007


Gul Banana posted:

try running "visual studio command prompt", which adds some stuff to your PATH

That got it working, thanks!

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Bingo, I can work with Categorical for 2d. Muchas gracias :D

For 3D, the recommendation I got was working directly with Blender through Python to draw cubes. But that's overkill/clunky for 2D.

Paul MaudDib fucked around with this message at 08:06 on Feb 8, 2017

LP0 ON FIRE
Jan 25, 2006

beep boop
I found source for a raycaster made in C++, and something is confusing me about this loop:

code:
for(int x = 0; x < w; x++)
    {
      //calculate ray position and direction
      double cameraX = 2*x/double(w)-1; //x-coordinate in camera space
w is not defined before the for loop as far as I can see. Is it defined when cameraX is defined? What is the value of w?

source: http://lodev.org/cgtutor/files/raycaster_textured.cpp

site: http://lodev.org/cgtutor/raycasting.html

nielsm
Jun 1, 2009



LP0 ON FIRE posted:

I found source for a raycaster made in C++, and something is confusing me about this loop:

code:
for(int x = 0; x < w; x++)
    {
      //calculate ray position and direction
      double cameraX = 2*x/double(w)-1; //x-coordinate in camera space
w is not defined before the for loop as far as I can see. Is it defined when cameraX is defined? What is the value of w?

source: http://lodev.org/cgtutor/files/raycaster_textured.cpp

site: http://lodev.org/cgtutor/raycasting.html

http://lodev.org/cgtutor/quickcg.html#The_Codebase_Global_Variables

It's defined by the graphics library used.

LP0 ON FIRE
Jan 25, 2006

beep boop

Thank you

plasticbugs
Dec 13, 2006

Special Batman and Robin
I'm a terrible sysadmin. I have a toy website which I've neglected that has about 2000 or so users. The site allows users to create a permanent page (text only, very simple) on my site to connect with other users. These pages can be created without an account - which is a feature. The site is hosted on Heroku and I pay the $9 a month to keep the thing running - mostly as a way for employers to see some working code I've created.

I stupidly do not have a captcha on the home page which allows users to create one of these pages.

My site is not indexed by any search engines that respect the robots.txt file, so there is seemingly no benefit to creating a page if you're a spammer.

That said, spammers have automated the creation of pages on my site to the tune of 5,000,000 spammy entries. These bogus entries are pretty much invisible, except for the ballooning size of my database. I don't want to brush up against a row limit that could send me into a higher pricing tier.

Is my only option to crawl through and find spam terms to delete all the spam entries? Other than a captcha, any suggestions for preventing this kind of madness on future toy sites?

B-Nasty
May 25, 2005

plasticbugs posted:

Other than a captcha, any suggestions for preventing this kind of madness on future toy sites?

A captcha is probably your best bet. You could go with some enter email, get email, confirm email thing, but that has its own problems with abuse and you already mentioned you don't want accounts. Google's reCAPTCHA is easy enough to work with that it shouldn't take more than an hour or two to implement.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

UZR IS BULLSHIT posted:

I posted this in the scientific computing thread as well:

I feel like you're overcomplicating things by wanting to build it yourself. Why not use Moab or whatever free/alternative version? They were literally written to do what you want to do: distribute jobs across cores. "A large number of serial jobs on many cores" is possible too. Just request a single core in your job scripts.

Peristalsis
Apr 5, 2004
Move along.
I've just been given a project to map and import data from our legacy system to our half-built, home-grown replacement. I'll need to analyze the data tables in the old system (StarLIMS), figure out how that data is stored on the new system (if it is at all), and suggest changes to the existing and future schemas in the home-grown program to accommodate the import of the old data.

I guess you'd call this a data migration, though it almost seems like an ETL process. In any case, I've hacked at existing ETL processes in previous jobs, but I've never designed something like this from scratch. I'm doing a Google search now, but I'm wondering if anyone has any first-hand advice. Also, I assume this must be a common enough request that there are some existing tools which can automate much of it - are they worth investigating, or do they tend to require so much customization and configuration that you might as well have written it all from scratch?

Jose Cuervo
Aug 25, 2004
I know that I can use the Google Maps Directions API to return a json file containing the directions from point A to point B, including waypoints and alternative routes if so desired. Doing this requires formatting a string to the specifications provided (e.g.: https://maps.googleapis.com/maps/api/directions/json?origin=New+York,NY&destination=Orlando,FL), and then sending a request:

code:
import requests

r = requests.get('https://maps.googleapis.com/maps/api/directions/json?origin=New+York,NY&destination=Orlando,FL')
print(r.json())  # parenthesized form works on both Python 2 and Python 3
However, I was wondering if it was possible to set up a simple website that uses a Google Maps Embed API and then somehow take the route selected on the Maps Embed API and get the json for that route? This would allow me to select the precise route I wanted (for instance select a specific alternative route or reroute a small part of the trip by dragging the blue line). If it is possible, any pointers on what I should look into to do this would be appreciated.
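
On the "formatting a string to the specifications" part only (this does not answer the Embed API question): a sketch of building the request URL from a dict instead of by hand, so spaces and commas are escaped automatically. The helper name is made up, and the `alternatives` parameter is included as an assumption that should be checked against the Directions API documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://maps.googleapis.com/maps/api/directions/json"

def directions_url(origin, destination, **extra):
    """Build a Directions request URL; extra keyword args become query parameters."""
    params = {"origin": origin, "destination": destination, **extra}
    return BASE_URL + "?" + urlencode(params)

url = directions_url("New York,NY", "Orlando,FL", alternatives="true")
print(url)

# Round-trip check: decoding the query string gives back the original values.
print(parse_qs(urlsplit(url).query))
```

With `requests`, passing the same dict as `requests.get(BASE_URL, params=params)` does the encoding for you.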

Trick Question
Apr 9, 2007


Okay, this is an extremely stupid question, but: Is there any reason that #include <time.h> would fail to work in c?

I'm not getting any compiler errors about that line, but when it hits the line:
code:
time_t rawtime;
it thinks time_t is a label, and a whole bunch of following lines fail as well. Anyone know what might be causing that?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Can you reproduce the error in a short program?

csammis
Aug 26, 2003

Mental Institution
Unfortunately that's a vague description of the problem. Please post the complete failing code and the exact compiler errors.

Trick Question
Apr 9, 2007


Yeah, and also it wasn't the problem at all - code worked fine once I moved those declarations outside of a switch structure I had set up, so whatever. It turns out I get fed up enough to ask for help about five minutes before I solve the problem myself.

The Laplace Demon
Jul 23, 2009

"Oh dear! Oh dear! Heisenberg is a douche!"

Trick Question posted:

Yeah, and also it wasn't the problem at all - code worked fine once I moved those declarations outside of a switch structure I had set up, so whatever. It turns out I get fed up enough to ask for help about five minutes before I solve the problem myself.

These are the kinds of questions that most benefit from rubber ducking or adhering to the Stack Overflow question guidelines. Synthesizing a precise and reproducible problem statement is effective as a debugging technique.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Peristalsis posted:

I've just been given a project to map and import data from our legacy system to our half-built, home-grown replacement. I'll need to analyze the data tables in the old system (StarLIMS), figure out how that data is stored on the new system (if it is at all), and suggest changes to the existing and future schemas in the home-grown program to accommodate the import of the old data.

I guess you'd call this a data migration, though it almost seems like an ETL process. In any case, I've hacked at existing ETL processes in previous jobs, but I've never designed something like this from scratch. I'm doing a Google search now, but I'm wondering if anyone has any first-hand advice. Also, I assume this must be a common enough request that there are some existing tools which can automate much of it - are they worth investigating, or do they tend to require so much customization and configuration that you might as well have written it all from scratch?

I dealt with something just like this a few years back: porting a large clinical trial from a legacy db to a new db and system (that actually worked). Some distilled wisdom:

* I think quibbling about whether it's a migration or ETL is neither here nor there
* The scale of the task is very hard to predict and all depends on the original data. (Mine was this weird hyper-denormalised and versioned format that was super tough to extract.)
* The standard data science approach would be to write a one-shot ETL / transformation script. My feeling is that this will probably be an iterative task: you run it, something bombs out or is wrong in the output, you correct it and run again, rinse and repeat. This places a big emphasis on being able to correct and add to your script and see what it's doing. I wrote mine as a big table of transformations (get data from X, put it in Y after running it through these functions), which made it easy to see what was happening to anything and modify it.
* Never do anything destructive to your original data.
* There are solutions for this sort of stuff out there, but the only one I'm acquainted with is Pentaho, which is fairly heavyweight.
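
As an illustration of the "big table of transformations" idea from the list above, a minimal sketch follows. It is not the poster's actual script. The column names and helper functions are hypothetical.

```python
def strip_text(value):
    return value.strip()

def to_int(value):
    return int(value)

# Each entry: (source column, destination column, functions applied in order).
# All column names here are hypothetical.
TRANSFORMS = [
    ("SAMPLE_NM", "sample_name", [strip_text, str.title]),
    ("QTY_TXT", "quantity", [strip_text, to_int]),
]

def transform_row(source_row, transforms=TRANSFORMS):
    """Build a new row from source_row without modifying the original."""
    target = {}
    for src, dst, funcs in transforms:
        value = source_row[src]
        for func in funcs:
            value = func(value)
        target[dst] = value
    return target

legacy = {"SAMPLE_NM": "  seed batch ", "QTY_TXT": " 36 "}
print(transform_row(legacy))
```

Because the mapping is data, fixing a bad conversion usually means editing one table entry and rerunning. The source row is only read, never changed.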

Peristalsis
Apr 5, 2004
Move along.

outlier posted:

I dealt with something just like this a few years back: porting a large clinical trial from a legacy db to a new db and system (that actually worked). Some distilled wisdom:

* I think quibbling about whether it's a migration or ETL is neither here nor there
* The scale of the task is very hard to predict and all depends on the original data. (Mine was this weird hyper-denormalised and versioned format that was super tough to extract.)
* The standard data science approach would be to write a one-shot ETL / transformation script. My feeling is that this will probably be an iterative task: you run it, something bombs out or is wrong in the output, you correct it and run again, rinse and repeat. This places a big emphasis on being able to correct and add to your script and see what it's doing. I wrote mine as a big table of transformations (get data from X, put it in Y after running it through these functions), which made it easy to see what was happening to anything and modify it.
* Never do anything destructive to your original data.
* There are solutions for this sort of stuff out there, but the only one I'm acquainted with is Pentaho, which is fairly heavyweight.

Thanks, that's exactly the kind of info I'm looking for.

I already planned to do this iteratively, either an overall migration from test site to test site that I run over and over until I get it right, or a series of exports/migrations/transformations/whatever, with each doing some related portion of the database (seed data in first round, hydrolysate data in another script, etc.). Or maybe some of both. In either case, I think I'm going to plan to have some sort of audit trail so that additional runs of the script(s) can see what rows they've already processed, and which they haven't.
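
One possible shape for that audit trail, sketched with an in-memory SQLite database standing in for the real target (the actual systems here are Oracle and a home-grown app, so treat this purely as an illustration). Table and column names are invented.

```python
import sqlite3

def migrate(source_rows, target_db):
    """Copy rows not yet recorded in the audit table; return how many were copied."""
    target_db.execute(
        "CREATE TABLE IF NOT EXISTS samples (id INTEGER PRIMARY KEY, name TEXT)")
    target_db.execute(
        "CREATE TABLE IF NOT EXISTS migration_audit (source_id INTEGER PRIMARY KEY)")
    already_done = {
        row[0] for row in target_db.execute("SELECT source_id FROM migration_audit")
    }
    copied = 0
    for source_id, name in source_rows:
        if source_id in already_done:
            continue  # processed by an earlier run
        with target_db:  # commit the row and its audit entry together, or neither
            target_db.execute(
                "INSERT INTO samples (id, name) VALUES (?, ?)", (source_id, name))
            target_db.execute(
                "INSERT INTO migration_audit (source_id) VALUES (?)", (source_id,))
        copied += 1
    return copied

db = sqlite3.connect(":memory:")
first = migrate([(1, "seed A"), (2, "seed B")], db)
second = migrate([(1, "seed A"), (2, "seed B"), (3, "hydrolysate C")], db)
print(first, second)
```

The second run only copies the row it has not seen before, which is the behaviour that makes repeated runs safe.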

And as far as I know, we can keep the old data around as long as we like - we just won't get maintenance and updates once we stop paying the yearly licensing fees, so I certainly don't intend to modify the original Oracle database at all.

ArcticZombie
Sep 15, 2010
A question about best practice for when to check user inputted data for errors.

A program takes input from a CSV file. The columns and format of the data in this CSV are well-defined. Each row is information about a person. The class responsible for reading the CSV returns the data in the form of an array of maps which map column headings to data for each row. This data is used to create Person objects. In the case of data not matching the specified format, should the program throw any exceptions when the data is read or should it just accept it and throw any exceptions when creation of a Person object fails? E.g. if a field which is expected to be a number contains something else?

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

ArcticZombie posted:

A question about best practice for when to check user inputted data for errors.

A program takes input from a CSV file. The columns and format of the data in this CSV are well-defined. Each row is information about a person. The class responsible for reading the CSV returns the data in the form of an array of maps which map column headings to data for each row. This data is used to create Person objects. In the case of data not matching the specified format, should the program throw any exceptions when the data is read or should it just accept it and throw any exceptions when creation of a Person object fails? E.g. if a field which is expected to be a number contains something else?

Generally you're best off validating input at the time it is read. Error out at the earliest possible point. The alternative (waiting until the data is used) means that the person debugging the problem has a much larger "search space" to figure out what went wrong, since in principle anything that happened prior to the attempt to create the Person could be the cause of the error.

Suggestion: attach a static validate_params() method to Person, which accepts one of those maps and returns whether or not the contents can be used to create a Person.
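
A rough sketch of what that `validate_params()` suggestion could look like in Python. The `name` and `age` fields are assumptions, since the thread never lists the real columns.

```python
class Person:
    REQUIRED = ("name", "age")  # hypothetical fields

    def __init__(self, name, age):
        self.name = name
        self.age = int(age)

    @staticmethod
    def validate_params(params):
        """Return True if the map's contents can be used to create a Person."""
        if any(key not in params for key in Person.REQUIRED):
            return False
        if not params["name"].strip():
            return False
        # The parser hands over strings, so check the text looks like a number.
        return params["age"].strip().isdecimal()

rows = [{"name": "Ann", "age": "34"}, {"name": "Bob", "age": "abc"}]
for line_number, row in enumerate(rows, start=1):
    if not Person.validate_params(row):
        print(f"Row {line_number} is invalid: {row}")
```

The reading code can call this as each row arrives and report the row number at once, which keeps the error close to its cause.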

Eela6
May 25, 2007
Shredded Hen
I am of the opinion errors should be raised as early as possible. This is a problem with the input, so it should be addressed at that point.

Edit: TooMuchAbstraction beat me to it. I agree with his point about a validation method.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer
Also keep in mind that if you're not sanitizing the input before you shunt it off to create objects, there's always a chance you're opening yourself up to some sort of code injection. At the very least you should be sanity-checking for malicious input while you parse the file.

ExcessBLarg!
Sep 1, 2001

ArcticZombie posted:

The columns and format of the data in this CSV are well-defined. Each row is information about a person. The class responsible for reading the CSV returns the data in the form of an array of maps which map column headings to data for each row.
What is the type of the maps? Presumably the key is a string, since it consists of headers, but are the values also all strings? If so, I think it makes sense for the CSV parser to validate the CSV syntax, but it shouldn't care about the specifics of the data.

ArcticZombie posted:

This data is used to create Person objects.
Whatever method is responsible for calling to construct a Person object should be responsible for validating that the fields that define a Person are appropriately typed, and the Person constructor itself should validate that they're appropriately valued.

Think of it this way: if you were to change the definition of a Person you would only want to have to change input validation in one place, and have it apply to all instances in which a Person may be constructed. Similarly, there are contexts in which you may want to create a Person where you don't otherwise care about the details of the CSV syntax. That's one way to guide the separation of responsibility.
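
A sketch of that split, under the assumption that a Person has just a `name` and an `age` (the thread doesn't say). The calling function handles typing and the constructor handles values.

```python
class Person:
    def __init__(self, name: str, age: int):
        # Value checks live here, so every way of building a Person gets them.
        if not name:
            raise ValueError("name must not be empty")
        if age < 0:
            raise ValueError("age must not be negative")
        self.name = name
        self.age = age

def person_from_row(row):
    """Convert the parser's all-string map to typed arguments, then construct."""
    try:
        age = int(row["age"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"bad 'age' field in row {row!r}") from exc
    return Person(name=row.get("name", ""), age=age)

person = person_from_row({"name": "Ann", "age": "34"})
print(person.name, person.age)
```

A Person built from some other source (a form, a test fixture) skips `person_from_row` entirely but still passes through the constructor's checks.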

ArcticZombie
Sep 15, 2010

ExcessBLarg! posted:

What is the type of the maps? Presumably the key is a string, since it consists of headers, but are the values also all strings? If so, I think it makes sense for the CSV parser to validate the CSV syntax, but it shouldn't care about the specifics of the data.

Whatever method is responsible for calling to construct a Person object should be responsible for validating that the fields that define a Person are appropriately typed, and the Person constructor itself should validate that they're appropriately valued.

Think of it this way: if you were to change the definition of a Person you would only want to have to change input validation in one place, and have it apply to all instances in which a Person may be constructed. Similarly, there are contexts in which you may want to create a Person where you don't otherwise care about the details of the CSV syntax. That's one way to guide the separation of responsibility.

You're thinking along the same lines as I am.

Currently input is validated as it is read. I thought that this was somewhat outside the responsibility of a CSV parser and that it should just do the job of reading a CSV file. The columns are not all the same type, so for the map they are just kept as strings. The function responsible for creating Person objects casts the data to the correct type before calling the constructor. That function does not currently do any validation; the data it is receiving has already been validated. I think that this is poor cohesion in that it pushes the responsibility of data validation away from the code which actually uses the data.

The data is not being stored for any length of time before being used to create Person objects. The sole purpose of reading the data is to create these objects, the objects are created immediately after the data is read so I don't think that it would add any extra debugging.

I used the example of creating a Person with data not read from a CSV, but my partner's response was that that isn't going to happen in this project (a university assignment) so it isn't a concern. He's right, but I think that we should follow good practice regardless of it having any effect.

Nippashish
Nov 2, 2005

Let me see you dance!
In an ideal world every piece of code would validate exactly all the assumptions it makes about its input. No more and no less.

sarehu
Apr 20, 2007

(call/cc call/cc)
So every program validates its assumptions, every function validates its assumptions, every statement validates its assumptions, and every expression validates its assumptions?

Nippashish
Nov 2, 2005

Let me see you dance!
In real life you don't validate everything everywhere because that would be insane and not actually possible, but thinking about the ideal world is a good way to answer "What should I validate where?" questions.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe
In real life you'd probably have the CSV parser provide an iterator, allowing you to validate input without the parser needing to know specifics of the application, and resulting in code something like this:
code:
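import sys

parser = CSVParser(filename)  # hypothetical parser; read() is assumed to yield one dict per row
for i, row in enumerate(parser.read()):
  try:
    person = Person(**row)
  except Exception:
    # str(i) is needed because concatenating an int to a string raises TypeError
    print("Row " + str(i) + " is not properly-formatted", file=sys.stderr)
    break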

Sedro
Dec 31, 2008

sarehu posted:

So every program validates its assumptions, every function validates its assumptions, every statement validates its assumptions, and every expression validates its assumptions?

so like, type systems?

Nippashish
Nov 2, 2005

Let me see you dance!

TooMuchAbstraction posted:

In real life you'd probably have the CSV parser provide an iterator, allowing you to validate input without the parser needing to know specifics of the application, and resulting in code something like this:

Exactly. It would be appropriate for the CSVParser to validate that it really is reading something that looks like whatever it thinks a well-formed csv file is, but it would not be appropriate for it to check that a Person's age is a positive number.

Unless your point is that IRL you'd use a library for parsing a csv file. That is neither here nor there because at some point you will need to write two pieces of code that interact with each other, or you will need to write the library that someone else treats as a black box. CSV parsers are a dime a dozen but there's a general point to be made here and csv parsing just happens to be the example at hand.
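
To show where that line might sit, a small sketch of a reader that checks only CSV shape (every row has as many fields as the header) and knows nothing about what a Person is. The function name is made up, and "same field count as the header" is just one reasonable definition of well-formed.

```python
import csv
import io

def read_rows(csv_file):
    """Yield (line_number, row_dict); raise if a row's field count differs from the header's."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to yield
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(fields)}")
        yield reader.line_num, dict(zip(header, fields))

for line_number, row in read_rows(io.StringIO("name,age\nAnn,34\nBob,-5\n")):
    print(line_number, row)
```

The row with age `-5` passes straight through here. Rejecting it is the job of whatever code builds the Person, not the reader.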

ReelBigLizard
Feb 27, 2003

Fallen Rib
Maybe an odd question - Is there any particular language/environment you guys would recommend for modelling a real world system for simulation purposes? Other than "whatever you use for everything else".

I'm contracting to a company that is trying to optimise a short hop air taxi service. We're trying to optimise in terms of keeping planes as full as possible, pilots as un-fatigued as possible, meeting customer demand, etc. I've done similar work optimising stuff like warehouse and logistics operations before, although this one probably has more inputs and variables than all of my previous work combined. Historically I've just written simulation scripts in whatever was handy and analysed the output, but I was wondering if there are more specialised environments available for feeling out new methodologies and system logic in a more broad sense?
