|
Seems a fine time to link this: http://reproducibility.cs.arizona.edu/ e: and the paper http://reproducibility.cs.arizona.edu/tr.pdf
|
# ? May 16, 2014 17:25 |
|
HappyHippo posted:A large portion of scientific code is a one-off script.
|
# ? May 16, 2014 17:40 |
|
Plorkyeran posted:Including a lot of things that shouldn't be. A paper written based on the output of a one-off script is pretty much worthless. This is objectively untrue, no matter how much it disagrees with the idealist view of science.
|
# ? May 16, 2014 17:59 |
|
I have zero confidence in my ability to write a nontrivial one-off script that has correct results without some reliable way to verify the results. I see no reason to assume that scientists would be significantly better at writing code without bugs. If you can externally verify the results then that verification should be the focus of the paper, not the initial generation of stuff.
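To make that concrete, one cheap form of external verification for numerical code is checking it against a case with a known closed-form answer before trusting it on the real problem. A hypothetical Python sketch, not anyone's actual pipeline:

```python
import math

def euler(f, y0, t0, t1, n):
    """Fixed-step forward Euler integrator; returns the approximate y(t1)."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# Sanity check against a problem with an exact solution:
# y' = -y with y(0) = 1 has the closed form y(t) = exp(-t).
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 10_000)
exact = math.exp(-1.0)
assert abs(approx - exact) < 1e-4  # if this fails, don't trust the hard cases either
```

The same idea scales up: conserved quantities, limiting cases, and comparisons against published results all serve as the "reliable way to verify" being asked for here.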
|
# ? May 16, 2014 18:07 |
|
Plorkyeran posted:Including a lot of things that shouldn't be. A paper written based on the output of a one-off script is pretty much worthless. Yeah not really. A lot of scientific "code" is really just doing a numerical calculation of some equation. A "one-off" is appropriate in that kind of situation.
|
# ? May 16, 2014 18:08 |
|
Plorkyeran posted:I have zero confidence in my ability to write a nontrivial one-off script that has correct results without some reliable way to verify the results. I see no reason to assume that scientists would be significantly better at writing code without bugs. Bugs in this sense are no different from any other confounding factor that could affect an experiment. Results are "verified" by comparing them to what's expected by the theory, what's seen in actual experiments, what's seen in other papers studying similar topics, etc. A result that's unexpected or significant will be checked over by the same group and other groups.
|
# ? May 16, 2014 18:19 |
|
HappyHippo posted:Bugs in this sense are no different from any other confounding factor that could affect an experiment. Results are "verified" by comparing them to what's expected by the theory, what's seen in actual experiments, what's seen in other papers studying similar topics, etc. A result that's unexpected or significant will be checked over by the same group and other groups. Assuming it gets replicated at all...
|
# ? May 16, 2014 18:24 |
|
GrumpyDoctor posted:The objective of scientific code isn't to produce reproducible, comprehensible, robust data processing applications, it's to Get Published, which terrible one-offs perfectly facilitate. This is a much larger problem within the scientific community than "oh scientists are just lazy." And what about the many scientific projects that use someone's terrible legacy "working" one-off code as a foundation? Your vision of publishing only by writing one-offs is a nice one, but that scenario is uncommon. e: This just goes back to the argument of "I can't comment anything or make my code comprehensible because I just need to get this project done right now and that's all that matters." Sometimes that's fine! But if you're writing code that has a decent chance of being used and modified in the future, even by yourself, then it's not fine because you're just screwing yourself and others for a tiny time saving now. Nearly all fields of scientific research build on prior results, allowing for code re-use and huge time savings if you do simple things like "write a few comments to remind yourself of what's going on" and "don't name your variables obscure things like 'a', 'b', 'c', etc". e2: And we are lazy as gently caress, don't kid yourself. QuarkJets fucked around with this message at 18:52 on May 16, 2014 |
# ? May 16, 2014 18:39 |
|
substitute posted:Graphic designers are the worst. I've been a scientist, a mathematician and a graphic designer. My code must be hideous!
|
# ? May 16, 2014 18:44 |
|
HappyHippo posted:Yeah not really. A lot of scientific "code" is really just doing a numerical calculation of some equation. A "one-off" is appropriate in that kind of situation. What are you basing this statement on?
|
# ? May 16, 2014 19:04 |
|
QuarkJets posted:What are you basing this statement on? My research doing scientific calculations, and interactions with other scientists doing the same?
|
# ? May 16, 2014 19:43 |
|
Ender.uNF posted:When all you have is screws, every tool becomes a screwdriver. Sure can. But if you're using Node then you know that everyone was writing code expecting asynchrony, so you don't have some 3rd-party library that decides to do main-thread DNS or database or filesystem access.
|
# ? May 16, 2014 20:01 |
|
It's not that I'm saying scientific code isn't terrible by the standards of normal programming. It is. The thing is that those standards were developed in a context where code spends far more time being maintained than being written, which is why they emphasize readability, encapsulation, etc. That is not necessarily the context in which scientific code is written, so you can't just blindly apply the same standards to it without recognizing why they're appropriate in the first place. It's not that I think the pendulum should be swung all the way to no comments or "a b c d" variable names. A small amount of effort could certainly make for better code; unfortunately, a lot of scientists have virtually no formal training in programming. I just think it's silly to expect scientific code to live up to the standards applied to large-scale applications when such over-engineering would be inappropriate given the context it's written in.
|
# ? May 16, 2014 20:10 |
|
HappyHippo posted:It's not that I think the pendulum should be swung all the way to no comments or "a b c d" variable names. A small amount of effort could certainly make for better code; unfortunately, a lot of scientists have virtually no formal training in programming. I just think it's silly to expect scientific code to live up to the standards applied to large-scale applications when such over-engineering would be inappropriate given the context it's written in. Comprehensible readable code is not over-engineering. Quite the opposite in fact.
|
# ? May 16, 2014 20:51 |
|
I'll just leave this here and you guys can choose your own adventure. Please note that __Entity is never null-checked, and __Entity.CompleteDate is of type DateTime? code:
|
# ? May 16, 2014 21:25 |
|
Zombywuf posted:Comprehensible readable code is not over-engineering. Quite the opposite in fact. Yes, clearly I'm advocating incomprehensible code. Good post.
|
# ? May 16, 2014 21:35 |
|
HappyHippo posted:It's not that I'm saying scientific code isn't terrible from the standards of normal programming. It is. The thing is that those standards were developed in a context where code spends far more time being maintained than being written, which is why they emphasize readability, encapsulation, etc. That is not necessarily the context in which scientific code is always written, so you can't just blindly apply the same standards to it without recognizing why they're appropriate in the first place. Can you give an example of a situation where it would be totally appropriate to generate a result from a program, write the paper about it, then discard the program? I'm not doubting one exists, it's just that you haven't given a discipline narrower than "science" that you're talking about and it's a little difficult to engage with that paucity of context.
|
# ? May 16, 2014 21:39 |
|
From developer at our ad agency.

php:
<?
if( isset($_GET['vars']) && $_GET['vars'] != '') {
    $url = $_GET['vars'];
}

if($url == '') {
    try {
        $conn = new PDO('mysql:host=localhost;dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
        $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $stmt = $conn->prepare('SELECT * FROM articles ORDER BY article_date DESC');
        $stmt->execute();
        $articles_array = $stmt->fetchAll();
        if( count($articles_array) > 0) {
            $article_template = 'articles_listing.php';
        } else {
            header('Location: /articles/');
            exit;
        }
    } catch(PDOException $e) {
        header('Location: /articles/');
        exit;
    }
} else {
    try {
        $conn = new PDO('mysql:host=localhost;dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
        $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $stmt = $conn->prepare('SELECT * FROM articles WHERE url = ?');
        $stmt->execute(array($url));
        $articles_array = $stmt->fetchAll();
        if( count($articles_array) > 0) {
            //SET META TITLE TO ARTICLE TITLE
            $page_title = some_unnecessary_function_copied_from_codeigniter($articles_array[0]['title']);

            /* CREATE PREV/NEXT ARRAY FOR PAGINATION */
            $stmt = $conn->prepare('SELECT * FROM articles ORDER BY article_date DESC');
            $stmt->execute();
            $articles_list_array = $stmt->fetchAll();
            foreach($articles_list_array as $article) {
                $pagination_array[] = $article['id'];
                $url_array[$article['id']] = $article['url'];
            }
            $array_key = array_search($articles_array[0]['id'], $pagination_array);
            if(array_key_exists($array_key-1, $pagination_array)) {
                $prev_link = $url_array[ $pagination_array[$array_key-1] ];
            }
            if(array_key_exists($array_key+1, $pagination_array)) {
                $next_link = $url_array[ $pagination_array[$array_key+1] ];
            }
            $article_template = 'articles_single.php';
        } else {
            header('Location: /articles/');
            exit;
        }
    } catch(PDOException $e) {
        header('Location: /articles/');
        exit;
    }
}
// then there was some HTML below all of this, wrapping the included $article_template
?>

substitute fucked around with this message at 22:57 on May 16, 2014 |
# ? May 16, 2014 21:47 |
|
JawnV6 posted:Can you give an example of a situation where it would be totally appropriate to generate a result from a program, write the paper about it, then discard the program? I'm not doubting one exists, it's just that you haven't given a discipline narrower than "science" that you're talking about and it's a little difficult to engage with that paucity of context. Well what I do involves models which are expressed as differential equations. Since non-trivial systems of differential equations can't be solved exactly, we need to get results numerically. So there's a lot of code that a) sets up arrays of data, b) iterates the equations in time, c) outputs the result. Now some of that is reusable, and I extract those parts into reusable components, and we also use libraries for things like fast fourier transforms and parallel processing, but a lot of it is just the particulars of whatever model and initial conditions you want to simulate, and those parts aren't really general enough to be reused, and it never looks very pretty. You can sort of reuse it, but you tend to rip out the old equations and put new ones in.
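The a/b/c structure described there can be sketched in a few lines; this is a hypothetical 1D diffusion example in Python/NumPy, not HappyHippo's actual code:

```python
import numpy as np

# a) set up arrays of data: a 1D field with a point source in the middle
nx, nt = 100, 500
dx, dt, alpha = 1.0, 0.1, 1.0
u = np.zeros(nx)
u[nx // 2] = 1.0

# b) iterate the (discretized) equations in time: here du/dt = alpha * d2u/dx2,
# stepped explicitly (alpha * dt / dx**2 = 0.1, within the stability limit of 0.5)
for _ in range(nt):
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

# c) output the result
print(u.sum())  # total "heat" stays close to 1.0, a cheap conservation check
```

Swapping in a different model mostly means replacing the update line and the initial conditions, which is exactly the "rip out the old equations and put new ones in" style of reuse described above.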
|
# ? May 16, 2014 21:55 |
|
HappyHippo posted:My research doing scientific calculations, and interactions with other scientists doing the same? Why did you frame your answer as a question? What field are you in? I'm not accusing you of lying, but as an experimental physicist almost none of my code is just simple one-off computation of a formula, and even when that's what I am doing it's still part of a larger software project (eg solving a complex integral so that the result can be an input to something else, eventually feeding into some sort of analysis function), so I am curious to see whether most of your work is theoretical or something where pure one-off computation for everything may make sense. E: that's what I get for leaving my comment open for so long, you already answered my question. Yes, if you are just solving sets of equations over and over then it's great to just script it and be done, but I still don't know why you believe that this is representative of most scientific computing. QuarkJets fucked around with this message at 22:35 on May 16, 2014 |
# ? May 16, 2014 22:25 |
|
substitute posted:From developer at our ad agency. Not the worst I've seen. At least he's using PDO and prepared statements.
|
# ? May 16, 2014 22:28 |
|
As an example of a scientific project reusing code, consider the thousands of analysis projects using large hadron collider data. Starting over from scratch after every new collection period would be a massive waste. In that environment, relatively few parameters change much over time and there is significant effort to validate results before publishing them, so it's a good idea to make code that is easy to comprehend and reusable.
|
# ? May 16, 2014 22:41 |
|
QuarkJets posted:As an example of a scientific project reusing code, consider the thousands of analysis projects using large hadron collider data. Starting over from scratch after every new collection period would be a massive waste. In that environment, relatively few parameters change much over time and there is significant effort to validate results before publishing them, so it's a good idea to make code that is easy to comprehend and reusable. It is never a bad idea to write good code. It is also false to say that no good science comes from bad code.
|
# ? May 16, 2014 22:57 |
|
necrotic posted:Not the worst I've seen. At least he's using PDO and prepared statements. I think he was told to use PDO or given an example. I could post the "templates" which are just two more files of procedural code looping / printing the content, depending on whichever IF branch. Could have just left it in considering everything, or saved the output to a variable and printed it in the HTML further down (I deleted the HTML). And then there was another DB connection with query / output loop, copied in the homepage to display 3 results there, with an empty catch. Just a mess of repeating yourself all around. I re-wrote it before go live.
|
# ? May 16, 2014 22:58 |
|
Well the worst code I've probably ever seen was helping out some of the scientists here. It's a laboratory environment and their approach seemed to be 'gently caress it, we just need to get this data into that controller.' They were working in some scientific software, using some sort of C-based scripting language, sending numerical data to some lab equipment as an ASCII string via a print() function. The machine would only take a limited amount of data in ASCII format, so they needed to convert it to binary and send it that way. They actually managed to successfully get the data into whatever 16-bit floating point format the machine wanted, but they didn't know how to send it except in a string. So they dumped all their binary data into a big string and tried to print it. Their data happened to include zeroes, so every time they ran the print command it would terminate part of the way through the data. I spent like, 20+ minutes trying to explain to them why they couldn't stick that data in a string and expect it to work. They were inexplicably adamant about continuing to do it that way, and instead just sat around trying to come up with some sort of scheme to manipulate the data so that it just didn't have any zeroes. Instead of using the block data format that the documentation explicitly told them to use.
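The underlying bug is easy to demonstrate: packed floating point data routinely contains 0x00 bytes, and anything that treats the buffer as a NUL-terminated string will silently truncate it. A hypothetical Python illustration (their actual language was some C-based scripting thing):

```python
import struct

# Pack a few values as 16-bit floats, like the instrument expected
values = [1.0, 0.0, 2.5]
payload = b"".join(struct.pack("<e", v) for v in values)

# The raw bytes inevitably contain NULs: 0.0 packs to b"\x00\x00",
# and even 1.0 packs to b"\x00\x3c" in little-endian half precision
assert b"\x00" in payload

# A C-style string print stops at the first NUL, silently dropping the
# rest of the data, which is what their print() call was doing
as_c_string = payload.split(b"\x00")[0]
assert len(as_c_string) < len(payload)
```

Hence the block data format the documentation pointed them at: such formats are typically length-prefixed (e.g. IEEE 488.2 arbitrary block data sends an explicit byte count up front), so zero bytes in the payload are harmless.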
|
# ? May 16, 2014 23:43 |
|
Well of course the code from a seasoned software engineer looks better; they just bicker over the syntax, highlighting, use of white space, readability, etc, and never actually write any lines of code.
|
# ? May 17, 2014 00:02 |
|
HardDisk posted:Can't you hook up something like Jinja2? Yeah, let me go rewrite ipython nbconvert, instead of just installing node.js.
|
# ? May 17, 2014 00:37 |
|
oops misread something
|
# ? May 17, 2014 03:32 |
|
Subjunctive posted:Sure can. But if you're using Node then you know that everyone was writing code expecting asynchrony, so you don't have some 3rd-party library that decides to do main-thread DNS or database or filesystem access. Oh sure, and I don't have any hate for node.js; I plan on using it for my next project. I just think it's hilarious how the new hotness exposes junior developers to a concept they were previously unaware of and they think it invented the concept. :damnkids:
|
# ? May 17, 2014 05:08 |
|
Nippashish posted:It is never a bad idea to write good code. It is also false to say that no good science comes from bad code. I don't think that anyone has claimed that bad code only produces bad science. The bizarre part is where people argue that bad code is always acceptable so long as it produces good results right now, even if the code is impossible to maintain or comprehend. This is fine only if the code doesn't actually need to be maintained or re-used.
|
# ? May 17, 2014 06:00 |
|
Ender.uNF posted:Oh sure, and I don't have any hate for node.js; I plan on using it for my next project. I just think it's hilarious how the new hotness exposes junior developers to a concept they were previously unaware of and they think it invented the concept. :damnkids: I think "constantly reinventing the wheel" pretty much describes my experience with node and node developers. It's not bad per se, but after a certain threshold of complexity or scalability, it just doesn't feel up to snuff to me.
|
# ? May 17, 2014 06:27 |
|
QuarkJets posted:I don't think that anyone has claimed that bad code only produces bad science. That is how I interpret "A paper written based on the output of a one-off script is pretty much worthless.".
|
# ? May 17, 2014 10:54 |
|
Nippashish posted:That is how I interpret "A paper written based on the output of a one-off script is pretty much worthless." Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"?
|
# ? May 17, 2014 13:53 |
|
Doctor w-rw-rw- posted:I think "constantly reinventing the wheel" pretty much describes my experience with node and node developers. It's not bad per se, but after a certain threshold of complexity or scalability, it just doesn't feel up to snuff to me. I feel the same way about most languages: they all hit limits of some kind. Transpiling ES6 features helps a lot with managing flow complexity, though. Everyone reinvents the wheel, I think the Node community does it in a more visible way than the billion different smart pointer templates of different C++ projects or whatever. There are definitely lots of "my slight variant on work queues", but I think that's mostly just people feeling their way through a new programming model, and being happy enough with the experience that they share it. It's not like I have to use any of them, so it doesn't really hurt to have them around. As far as scalability goes, though, at least one very big site uses Node to render React on the server side, and I think it works pretty well for them. silvergoose posted:Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"? Wouldn't you want to verify independently based on the computation as described in the paper, rather than assume that the software was a faithful implementation of it? Subjunctive fucked around with this message at 15:33 on May 17, 2014 |
# ? May 17, 2014 15:31 |
|
ohgodwhat posted:Yeah, let me go rewrite ipython nbconvert, instead of just installing node.js. For me, it has a backup implementation it used when I didn't have node. But then I installed node so I don't remember what it was. The output looked the same though. silvergoose posted:Would you accept "is near impossible to verify the results of" rather than "pretty much worthless"? Is verification really always necessary? What if you have a 20-line script that generates your figures, puts the labels in the right places, etc? I can at least understand what verification would entail for (e.g.) a script that reads data and spits out the results of a statistical test, but even then, if scientists can't trust any packages/libraries they're using, nothing would get done. There are times when one-off scripts are okay and times when they're not okay. Is that really controversial?
|
# ? May 17, 2014 16:35 |
|
SurgicalOntologist posted:There are times when one-off scripts are okay and times when they're not okay. Is that really controversial? I don't think anyone sane would claim otherwise, but the flip side is also true. You should write code to ease replication, and it often isn't. Of course, the incentives aren't there to encourage this when scientists have to live by publish-or-die. A lot of the arguments are really around where the line is between one-off script and code you should be providing to others who want to replicate your results. Science has a replication problem. Hard-to-understand code, and unspecified computing environments are a part of that.
|
# ? May 17, 2014 17:01 |
|
Yes, I agree completely. In my field, writing re-usable code is rarely necessary (generally, someone's already written it; we're not breaking new ground computationally), but never done. In other fields it's probably often necessary and rarely done.
|
# ? May 17, 2014 17:12 |
|
Subjunctive posted:I feel the same way about most languages: they all hit limits of some kind. Transpiling ES6 features helps a lot with managing flow complexity, though. I find the choice paralysis in learning C++ very unsettling, much like I do in Javascript. Still, though, even C++ has the standard lib and Boost as a first line of defense against reinventing the wheel. Javascript has too many answers which aren't wrong enough to ignore, and that's actually not that great. I generally like TOOWTDI, or something as close to it as possible, but Node is so diverse that code reuse and domain knowledge are much harder to share. I don't want to do Node full-time, so that's a deal-breaker for me, personally. It doesn't have C++'s closest-to-the-metal-without-being-C-or-assembly performance redeeming it, either. Subjunctive posted:As far as scalability goes, though, at least one very big site uses Node to render React on the server side, and I think it works pretty well for them. Facebook's limits are not a person's limits and not a startup's limits. Facebook has the benefit of teams dedicated to product infrastructure, world-class talent, lots of performance-tuned servers running in Facebook-built datacenters with Facebook-owned pipes on the bare metal, monitored closely by a dedicated ops staff, with many thousands - possibly even millions - of man-hours of experience and institutional knowledge on how to scale. And the scope of its usage is limited in comparison. Facebook still uses Thrift to use C++, Java, Hack/PHP, or others when it matters. It also has the manpower to kill path dependencies where it identifies a need, i.e. to rebuild some important piece of infrastructure (like comments) from the ground up, better, even when a lot of stuff depends on it. For startups, defaults matter much more. "The normal way of doing things" matters a lot more. Previous implementation decisions matter a lot more. With fewer engineering resources at hand, the impact of bad decisions is far scarier.
|
# ? May 17, 2014 17:32 |
|
SurgicalOntologist posted:For me, it has a backup implementation it used when I didn't have node. But then I installed node so I don't remember what it was. The output looked the same though. Yeah, pandoc I think, which wouldn't work without pywin32 either, and which for *reasons* I can't install. It just seems so asinine to me that ipython, which is otherwise a pretty decent project, would go about this bit in probably the most inconvenient way.
|
# ? May 17, 2014 17:49 |
|
SurgicalOntologist posted:Is verification really always necessary? Yes. If a third party can't verify it, it's not science.
|
# ? May 17, 2014 18:13 |