JawnV6
Jul 4, 2004

So hot ...
Seems a fine time to link this: http://reproducibility.cs.arizona.edu/

e: and the paper http://reproducibility.cs.arizona.edu/tr.pdf

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

HappyHippo posted:

A large portion of scientific code is a one-off script.
Including a lot of things that shouldn't be. A paper written based on the output of a one-off script is pretty much worthless.

Nippashish
Nov 2, 2005

Let me see you dance!

Plorkyeran posted:

Including a lot of things that shouldn't be. A paper written based on the output of a one-off script is pretty much worthless.

This is objectively untrue, no matter how much it disagrees with the idealist view of science.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
I have zero confidence in my ability to write a nontrivial one-off script that has correct results without some reliable way to verify the results. I see no reason to assume that scientists would be significantly better at writing code without bugs.

If you can externally verify the results then that verification should be the focus of the paper, not the initial generation of stuff.

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

Plorkyeran posted:

Including a lot of things that shouldn't be. A paper written based on the output of a one-off script is pretty much worthless.

Yeah not really. A lot of scientific "code" is really just doing a numerical calculation of some equation. A "one-off" is appropriate in that kind of situation.

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

Plorkyeran posted:

I have zero confidence in my ability to write a nontrivial one-off script that has correct results without some reliable way to verify the results. I see no reason to assume that scientists would be significantly better at writing code without bugs.

If you can externally verify the results then that verification should be the focus of the paper, not the initial generation of stuff.

Bugs in this sense are no different from any other confounding factor that could affect an experiment. Results are "verified" by comparing them to what's expected by the theory, what's seen in actual experiments, what's seen in other papers studying similar topics, etc. A result that's unexpected or significant will be checked over by the same group and other groups.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

HappyHippo posted:

Bugs in this sense are no different from any other confounding factor that could affect an experiment. Results are "verified" by comparing them to what's expected by the theory, what's seen in actual experiments, what's seen in other papers studying similar topics, etc. A result that's unexpected or significant will be checked over by the same group and other groups.

Assuming it gets replicated at all...

QuarkJets
Sep 8, 2008

GrumpyDoctor posted:

The objective of scientific code isn't to produce reproducible, comprehensible, robust data processing applications, it's to Get Published, which terrible one-offs perfectly facilitate. This is a much larger problem within the scientific community than "oh scientists are just lazy."

And what about the many scientific projects that use someone's terrible legacy "working" one-off code as a foundation? Your vision of publishing only by writing one-offs is a nice one, but that scenario is uncommon.

e: This just goes back to the argument of "I can't comment anything or make my code comprehensible because I just need to get this project done right now and that's all that matters." Sometimes that's fine! But if you're writing code that has a decent chance of being used and modified in the future, even by yourself, then it's not fine because you're just screwing yourself and others for a tiny time saving now. Nearly all fields of scientific research build on prior results, allowing for code re-use and huge time savings if you do simple things like "write a few comments to remind yourself of what's going on" and "don't name your variables obscure things like 'a', 'b', 'c', etc".

e2: And we are lazy as gently caress, don't kid yourself.

QuarkJets fucked around with this message at 18:52 on May 16, 2014

One Eye Open
Sep 19, 2006
Am I awake?

substitute posted:

Graphic designers are the worst.

I've been a scientist, a mathematician and a graphic designer. My code must be hideous!

QuarkJets
Sep 8, 2008

HappyHippo posted:

Yeah not really. A lot of scientific "code" is really just doing a numerical calculation of some equation. A "one-off" is appropriate in that kind of situation.

What are you basing this statement on?

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

QuarkJets posted:

What are you basing this statement on?

My research doing scientific calculations, and interactions with other scientists doing the same?

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Ender.uNF posted:

When all you have is screws, every tool becomes a screwdriver.

(You can write asynchronous code in a wide variety of languages and frameworks; see ASP MVC async controllers)

Sure can. But if you're using Node then you know that everyone was writing code expecting asynchrony, so you don't have some 3rd-party library that decides to do main-thread DNS or database or filesystem access.
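
To illustrate the hazard being described (one blocking call on the main thread stalls everything else), here is a minimal sketch. It uses Python's asyncio as a stand-in for Node's event loop, and the names `worker`, `ticker` and `demo` are made up for the example. The blocking `time.sleep` plays the part of a 3rd-party library doing synchronous DNS, database or filesystem access.

```python
import asyncio
import time


async def worker(log, blocking):
    """Simulate a library call that either blocks the loop or yields to it."""
    log.append("work:start")
    if blocking:
        time.sleep(0.05)           # synchronous call: nothing else can run
    else:
        await asyncio.sleep(0.05)  # cooperative call: other tasks keep running
    log.append("work:end")


async def ticker(log):
    """A second task that wants to run shortly after the worker starts."""
    await asyncio.sleep(0.01)
    log.append("tick")


async def _run(blocking):
    log = []
    await asyncio.gather(worker(log, blocking), ticker(log))
    return log


def demo(blocking):
    """Return the order in which events were logged."""
    return asyncio.run(_run(blocking))


if __name__ == "__main__":
    print("blocking:   ", demo(blocking=True))
    print("cooperative:", demo(blocking=False))
```

In the blocking run the ticker only gets a turn after the worker has finished; in the cooperative run it fires while the worker is still waiting.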

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?
It's not that I'm saying scientific code isn't terrible by the standards of normal programming. It is. The thing is that those standards were developed in a context where code spends far more time being maintained than being written, which is why they emphasize readability, encapsulation, etc. That is not necessarily the context in which scientific code is always written, so you can't just blindly apply the same standards to it without recognizing why they're appropriate in the first place.

It's not that I think the pendulum should be swung all the way to no comments or "a b c d" variable names. A small amount of effort could certainly make for better code; unfortunately, a lot of scientists have virtually no formal training in programming. I just think it's silly to expect scientific code to live up to the standards applied to large scale applications when such over-engineering would be inappropriate given the context it's written in.

Zombywuf
Mar 29, 2008

HappyHippo posted:

It's not that I think the pendulum should be swung all the way to no comments or "a b c d" variable names. A small amount of effort could certainly make for better code; unfortunately, a lot of scientists have virtually no formal training in programming. I just think it's silly to expect scientific code to live up to the standards applied to large scale applications when such over-engineering would be inappropriate given the context it's written in.

Comprehensible readable code is not over-engineering. Quite the opposite in fact.

Verloc
Feb 15, 2001

Note to self: Posting 'lulz' is not a good idea.
I'll just leave this here and you guys can choose your own adventure. Please note that __Entity is never null checked, and __Entity.CompleteDate is of type DateTime?
code:
try{
this.compPickerDate.SelectedDate = System.Convert.ToDateTime(__Entity.CompleteDate);
}
catch
{
}
:allears:

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

Zombywuf posted:

Comprehensible readable code is not over-engineering. Quite the opposite in fact.

Yes, clearly I'm advocating incomprehensible code. Good post.

JawnV6
Jul 4, 2004

So hot ...

HappyHippo posted:

It's not that I'm saying scientific code isn't terrible by the standards of normal programming. It is. The thing is that those standards were developed in a context where code spends far more time being maintained than being written, which is why they emphasize readability, encapsulation, etc. That is not necessarily the context in which scientific code is always written, so you can't just blindly apply the same standards to it without recognizing why they're appropriate in the first place.

It's not that I think the pendulum should be swung all the way to no comments or "a b c d" variable names. A small amount of effort could certainly make for better code; unfortunately, a lot of scientists have virtually no formal training in programming. I just think it's silly to expect scientific code to live up to the standards applied to large scale applications when such over-engineering would be inappropriate given the context it's written in.

Can you give an example of a situation where it would be totally appropriate to generate a result from a program, write the paper about it, then discard the program? I'm not doubting one exists, it's just that you haven't given a discipline narrower than "science" that you're talking about and it's a little difficult to engage with that paucity of context.

substitute
Aug 30, 2003

you for my mum
From a developer at our ad agency.

php:
<?
    if( isset($_GET['vars']) && $_GET['vars'] != '') {
        $url = $_GET['vars'];
    }
    

    if($url == '') {
        try {
            $conn = new PDO('mysql:host=localhost;dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
            $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);    
             
            $stmt = $conn->prepare('SELECT * FROM articles ORDER BY article_date DESC');
            $stmt->execute();
            $articles_array = $stmt->fetchAll();

            if( count($articles_array) > 0) {
                $article_template = 'articles_listing.php';
            } else {
                header('Location: /articles/');
                exit;
            }
         
        } catch(PDOException $e) {
            header('Location: /articles/');
            exit;
        }

    } else {
        try {
            $conn = new PDO('mysql:host=localhost;dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
            $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);    
             
            $stmt = $conn->prepare('SELECT * FROM articles WHERE url = ?');
            $stmt->execute(array($url));
            $articles_array = $stmt->fetchAll();

            if( count($articles_array) > 0) {

                //SET META TITLE TO ARTICLE TITLE
                $page_title = some_unnecessary_function_copied_from_codeigniter($articles_array[0]['title']);

                /* CREATE PREV/NEXT ARRAY FOR PAGINATION */
                $stmt = $conn->prepare('SELECT * FROM articles ORDER BY article_date DESC');
                $stmt->execute();
                $articles_list_array = $stmt->fetchAll();

                foreach($articles_list_array as $article) {
                    $pagination_array[] = $article['id'];
                    $url_array[$article['id']] = $article['url'];
                }

                $array_key = array_search($articles_array[0]['id'], $pagination_array);

                if(array_key_exists($array_key-1, $pagination_array)) {
                    $prev_link = $url_array[ $pagination_array[$array_key-1] ];
                }
                if(array_key_exists($array_key+1, $pagination_array)) {
                    $next_link = $url_array[ $pagination_array[$array_key+1] ];
                }

                $article_template = 'articles_single.php';
            } else {
                header('Location: /articles/');
                exit;
            }
         
        } catch(PDOException $e) {
            header('Location: /articles/');
            exit;
        }
    }


// then there was some HTML below all of this, wrapping the included $article_template 

?>

substitute fucked around with this message at 22:57 on May 16, 2014

HappyHippo
Nov 19, 2003
Do you have an Air Miles Card?

JawnV6 posted:

Can you give an example of a situation where it would be totally appropriate to generate a result from a program, write the paper about it, then discard the program? I'm not doubting one exists, it's just that you haven't given a discipline narrower than "science" that you're talking about and it's a little difficult to engage with that paucity of context.

Well, what I do involves models which are expressed as differential equations. Since non-trivial systems of differential equations usually can't be solved exactly, we need to get results numerically. So there's a lot of code that a) sets up arrays of data, b) iterates the equations in time, and c) outputs the result. Some of that is reusable, and I extract those parts into reusable components; we also use libraries for things like fast Fourier transforms and parallel processing. But a lot of it is just the particulars of whatever model and initial conditions you want to simulate. Those parts aren't really general enough to be reused, and they never look very pretty. You can sort of reuse it, but you tend to rip out the old equations and put new ones in.
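
For readers who haven't seen this kind of program, a minimal sketch of the a) set up, b) iterate, c) output shape described above. This is not HappyHippo's code: the model (exponential decay, dy/dt = -rate * y), the explicit Euler scheme, the step size and every name here are assumptions picked only to keep the example short.

```python
import math


def simulate_decay(y0=1.0, rate=0.5, dt=0.001, t_end=2.0):
    """Integrate dy/dt = -rate * y with the explicit Euler method."""
    n_steps = int(round(t_end / dt))

    # a) set up arrays of data
    times = [i * dt for i in range(n_steps + 1)]
    values = [0.0] * (n_steps + 1)
    values[0] = y0

    # b) iterate the equation in time
    for i in range(n_steps):
        values[i + 1] = values[i] + dt * (-rate * values[i])

    return times, values


if __name__ == "__main__":
    # c) output the result, next to the exact solution for comparison
    times, values = simulate_decay()
    for t, y in zip(times[::500], values[::500]):
        print(f"{t:.3f}\t{y:.6f}\t{math.exp(-0.5 * t):.6f}")
```

Swapping in a different model means rewriting the one line inside the loop and the initial conditions, which is the "rip out the old equations and put new ones in" pattern; the array setup and output are the parts that could plausibly be factored out.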

QuarkJets
Sep 8, 2008

HappyHippo posted:

My research doing scientific calculations, and interactions with other scientists doing the same?

Why did you frame your answer as a question?

What field are you in? I'm not accusing you of lying, but as an experimental physicist, almost none of my code is just a simple one-off computation of a formula. Even when that's what I am doing, it's still part of a larger software project (e.g. solving a complex integral so that the result can be an input to something else, eventually feeding into some sort of analysis function). So I am curious to see whether most of your work is theoretical, or something where pure one-off computation for everything may make sense.

E: that's what I get for leaving my comment open for so long, you already answered my question. Yes, if you are just solving sets of equations over and over then it's great to just script it and be done, but I still don't know why you believe that this is representative of most scientific computing.

QuarkJets fucked around with this message at 22:35 on May 16, 2014

necrotic
Aug 2, 2005
I owe my brother big time for this!

substitute posted:

From a developer at our ad agency.

Not the worst I've seen. At least he's using PDO and prepared statements.

QuarkJets
Sep 8, 2008

As an example of a scientific project reusing code, consider the thousands of analysis projects using Large Hadron Collider data. Starting over from scratch after every new collection period would be a massive waste. In that environment, relatively few parameters change much over time and there is significant effort to validate results before publishing them, so it's a good idea to make code that is easy to comprehend and reusable.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

As an example of a scientific project reusing code, consider the thousands of analysis projects using Large Hadron Collider data. Starting over from scratch after every new collection period would be a massive waste. In that environment, relatively few parameters change much over time and there is significant effort to validate results before publishing them, so it's a good idea to make code that is easy to comprehend and reusable.

It is never a bad idea to write good code. It is also false to say that no good science comes from bad code.

substitute
Aug 30, 2003

you for my mum

necrotic posted:

Not the worst I've seen. At least he's using PDO and prepared statements.

I think he was told to use PDO or given an example. I could post the "templates" which are just two more files of procedural code looping / printing the content, depending on whichever IF branch. Could have just left it in considering everything, or saved the output to a variable and printed it in the HTML further down (I deleted the HTML). And then there was another DB connection with query / output loop, copied in the homepage to display 3 results there, with an empty catch. Just a mess of repeating yourself all around.

I re-wrote it before go live.

Fellatio del Toro
Mar 21, 2009

Well the worst code I've probably ever seen was helping out some of the scientists here. It's a laboratory environment and their approach seemed to be 'gently caress it we just need to get this data into that controller.'

They were working in some scientific software, using some sort of C-based scripting language to send numerical data, formatted as an ASCII string, to some lab equipment with a print() function. The machine would only take a limited amount of data in ASCII format, so they needed to convert it to binary and send it that way. They actually managed to get the data into whatever 16-bit floating point format the machine wanted, but they didn't know how to send it except in a string. So they dumped all their binary data into a big string and tried to print.

Their data happened to include zeroes so every time they ran the print command it would terminate part of the way through the data. I spent like, 20+ minutes trying to explain to them why they couldn't stick that data in a string and expect it to work. They were inexplicably adamant about continuing to try and do it that way and instead just sat around trying to come up with some sort of scheme to manipulate the data so that it just didn't have any zeroes. Instead of using the block data format that the documentation explicitly told them to use.
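
A rough reconstruction of why that fails, sketched in Python since the original scripting language isn't named. The sample values, the big-endian byte order and the helper names are assumptions; `ctypes.create_string_buffer(...).value` is used here only to mimic how a C-style string API stops reading at the first zero byte.

```python
import ctypes
import struct


def pack_half_floats(samples):
    """Pack samples as big-endian IEEE 754 half-precision (16-bit) floats."""
    return struct.pack(f">{len(samples)}e", *samples)


def as_c_string(payload):
    """Return the part of payload a NUL-terminated string API would see."""
    return ctypes.create_string_buffer(payload).value


if __name__ == "__main__":
    payload = pack_half_floats([1.5, 0.0, 2.5])
    seen = as_c_string(payload)
    print(len(payload), "bytes packed,", len(seen), "bytes survive the string")
```

Note that zero bytes show up even in values that aren't zero, so massaging the data to avoid them is a losing game; a length-prefixed block format avoids the problem because the receiver never has to guess where the data ends.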

Coffee Mugshot
Jun 26, 2010

by Lowtax
Well of course the code from seasoned software engineers looks better, they just bicker over the syntax, highlighting, use of white space, readability, etc, and never actually write any lines of code.

ohgodwhat
Aug 6, 2005

HardDisk posted:

Can't you hook up something like Jinja2?

Yeah, let me go rewrite ipython nbconvert, instead of just installing node.js.

raminasi
Jan 25, 2005

a last drink with no ice
oops misread something

Simulated
Sep 28, 2001
Lowtax giveth, and Lowtax taketh away.
College Slice

Subjunctive posted:

Sure can. But if you're using Node then you know that everyone was writing code expecting asynchrony, so you don't have some 3rd-party library that decides to do main-thread DNS or database or filesystem access.

Oh sure, and I don't have any hate for node.js; I plan on using it for my next project. I just think it's hilarious how the new hotness exposes junior developers to a concept they were previously unaware of and they think it invented the concept. :damnkids:

QuarkJets
Sep 8, 2008

Nippashish posted:

It is never a bad idea to write good code. It is also false to say that no good science comes from bad code.

I don't think that anyone has claimed that bad code only produces bad science. The bizarre part is where people argue that bad code is always good so long as it produces good results right now, even if the code is impossible to maintain or comprehend. This is fine if the code doesn't actually need to be maintained or re-used.

Doctor w-rw-rw-
Jun 24, 2008

Ender.uNF posted:

Oh sure, and I don't have any hate for node.js; I plan on using it for my next project. I just think it's hilarious how the new hotness exposes junior developers to a concept they were previously unaware of and they think it invented the concept. :damnkids:
I think "constantly reinventing the wheel" pretty much describes my experience with node and node developers. It's not bad per se, but after a certain threshold of complexity or scalability, it just doesn't feel up to snuff to me.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

I don't think that anyone has claimed that bad code only produces bad science.

That is how I interpret "A paper written based on the output of a one-off script is pretty much worthless.".

silvergoose
Mar 18, 2006

IT IS SAID THE TEARS OF THE BWEENIX CAN HEAL ALL WOUNDS




Nippashish posted:

That is how I interpret "A paper written based on the output of a one-off script is pretty much worthless.".

Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"?

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Doctor w-rw-rw- posted:

I think "constantly reinventing the wheel" pretty much describes my experience with node and node developers. It's not bad per se, but after a certain threshold of complexity or scalability, it just doesn't feel up to snuff to me.

I feel the same way about most languages: they all hit limits of some kind. Transpiling ES6 features helps a lot with managing flow complexity, though.

Everyone reinvents the wheel; I think the Node community does it in a more visible way than the billion different smart pointer templates of different C++ projects or whatever. There are definitely lots of "my slight variant on work queues", but I think that's mostly just people feeling their way through a new programming model, and being happy enough with the experience that they share it. It's not like I have to use any of them, so it doesn't really hurt to have them around.

As far as scalability goes, though, at least one very big site uses Node to render React on the server side, and I think it works pretty well for them.

silvergoose posted:

Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"?

Wouldn't you want to verify independently based on the computation as described in the paper, rather than assume that the software was a faithful implementation of it?

Subjunctive fucked around with this message at 15:33 on May 17, 2014

SurgicalOntologist
Jun 17, 2004

ohgodwhat posted:

Yeah, let me go rewrite ipython nbconvert, instead of just installing node.js.

For me, it has a backup implementation it used when I didn't have node. But then I installed node so I don't remember what it was. The output looked the same though.


silvergoose posted:

Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"?

Is verification really always necessary? What if you have a 20-line script that generates your figures, puts the labels in the right places, etc? I can at least understand what verification would entail for (e.g.) a script that reads data and spits out the results of a statistical test, but even then, if scientists can't trust any packages/libraries they're using, nothing would get done.

There are times when one-off scripts are okay and times when they're not okay. Is that really controversial?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

SurgicalOntologist posted:

There are times when one-off scripts are okay and times when they're not okay. Is that really controversial?

I don't think anyone sane would claim otherwise, but the flip side is also true. You should write code to ease replication, and it often isn't. Of course, the incentives aren't there to encourage this when scientists have to live by publish-or-die.

A lot of the arguments are really around where the line is between one-off script and code you should be providing to others who want to replicate your results.

Science has a replication problem. Hard-to-understand code, and unspecified computing environments are a part of that.
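
As a sketch of how little it takes to pin down the "unspecified computing environment" part: a one-off script can emit its seed and interpreter details next to the result. Everything here is hypothetical (`run_analysis` is a stand-in for a real analysis, and the field names are made up); it only uses the standard library.

```python
import json
import platform
import random


def run_analysis(seed):
    """Stand-in for a real analysis: the mean of 1000 seeded random draws."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(1000)]
    return sum(draws) / len(draws)


def run_with_provenance(seed=12345):
    """Return the result together with the details needed to re-run it."""
    return {
        "result": run_analysis(seed),
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    print(json.dumps(run_with_provenance(), indent=2))
```

This doesn't make the code any easier to understand, but it at least lets someone else re-run the same script with the same seed and see whether they get the same number.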

SurgicalOntologist
Jun 17, 2004

Yes, I agree completely. In my field, writing re-usable code is rarely necessary (generally, someone's already written it. We're not breaking new ground computationally), but never done. In other fields it's probably often necessary and rarely done.

Doctor w-rw-rw-
Jun 24, 2008

Subjunctive posted:

I feel the same way about most languages: they all hit limits of some kind. Transpiling ES6 features helps a lot with managing flow complexity, though.

Everyone reinvents the wheel; I think the Node community does it in a more visible way than the billion different smart pointer templates of different C++ projects or whatever. There are definitely lots of "my slight variant on work queues", but I think that's mostly just people feeling their way through a new programming model, and being happy enough with the experience that they share it. It's not like I have to use any of them, so it doesn't really hurt to have them around.
<most of my reply deleted because it rehashes previous discussions>

I find the choice paralysis in learning C++ very unsettling, much like I do in JavaScript. Still, though, even C++ has the standard lib and Boost as a first line of defense against reinventing the wheel. JavaScript has too many answers which aren't wrong enough to ignore, and that's actually not that great. I generally like TOOWTDI, or something as close to it as possible, but Node is so diverse that code reuse and domain knowledge are much harder to share. I don't want to do Node full-time, so that's a deal-breaker for me, personally. It doesn't have C++'s closest-to-the-metal-without-being-C-or-assembly performance redeeming it, either.

Subjunctive posted:

As far as scalability goes, though, at least one very big site uses Node to render React on the server side, and I think it works pretty well for them.

Facebook's limits are not a person's limits and not a startup's limits. Facebook has the benefit of teams dedicated to product infrastructure, world-class talent, lots of performance-tuned servers running in Facebook-built datacenters with Facebook-owned pipes on the bare metal, monitored closely by a dedicated ops staff, with many thousands - possibly even millions - of man-hours of experience and institutional knowledge on how to scale. And the scope of its usage is limited in comparison. Facebook still uses Thrift to use C++, Java, Hack/PHP, or others when it matters. It also has the manpower to kill path dependencies where it identifies a need, i.e. to rebuild some important piece of infrastructure (like comments) from the ground up, better, even when a lot of stuff depends on it.

For startups, defaults matter much more. "The normal way of doing things" matters a lot more. Previous implementation decisions matter a lot more. With fewer engineering resources at hand, the impact of bad decisions is far scarier.

ohgodwhat
Aug 6, 2005

SurgicalOntologist posted:

For me, it has a backup implementation it used when I didn't have node. But then I installed node so I don't remember what it was. The output looked the same though.

Yeah, pandoc I think, which wouldn't work without pywin32 either, and which for *reasons* I can't install. It just seems so asinine to me that ipython, which is otherwise a pretty decent project, would go about this bit in probably the most inconvenient way.

Hughlander
May 11, 2005

SurgicalOntologist posted:


Is verification really always necessary?

Yes. If a third party can't verify it, it's not science.
