ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

such a nice boy posted:

What's wrong with that code? Is it just that
code:
if (_a > _b)
would have been better?

That's not exactly the same... he would need something like
code:
(abs(_a) > abs(_b)) && (sign(_a) == sign(_b))
although your code would work for the unsigned case.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

That Turkey Story posted:

(in other words he thought return 1 + 3 could potentially behave something like (return 1) + 3 where "return 1" returned the value 1 and had a value in the encapsulating expression of 1)

In Haskell, this is broadly how the order of operations actually works: return is just an ordinary function that lifts a value into a monad, and function application has higher precedence than any infix operator.
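A quick sketch of why (the negate line is just to show the same precedence rule on plain numbers):
code:
-- Function application binds tighter than any infix operator, so
--   f x + y   always parses as   (f x) + y
parseDemo :: Int
parseDemo = negate 1 + 3   -- (negate 1) + 3 == 2, not negate (1 + 3) == -4

-- return is just a function, so `return 1 + 3` likewise parses as
-- (return 1) + 3; whether that typechecks then depends on the monadic
-- type having a Num instance.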

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Standish posted:

So exactly how insecure are keys generated by Debian, is it actually a realistic vulnerability or is it a case of "should have taken billions of years to brute-force, now takes thousands of years"?

It is a very real and serious vulnerability; keys built on Debian systems over the last two years have not had any realistic source of entropy.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Standish posted:

Yeah but what exactly are the risks and attack vectors and how easy are they going to be to exploit, for example if I have a server which isn't affected by the bug itself but has a couple of pubkeys in authorized_keys2 which might have been generated on someone's Ubuntu desktop machine, is that server going to be part of a botnet by the time I get into work tomorrow morning?

On Debian and Debian-derived systems, keys were being generated over a period of two years by a very large number of people, with a small-to-nonexistent entropy source. It is highly likely that there are currently people with deployed keys that are identical to your own. It's a little unclear how small the entropy pool actually was; I'm currently looking at some evidence that it may be as small as 32 to 48 bits. An initial guess was that there were a total of only 262144 keys generated over that entire period - 18 bits (!!!) - and the pool does appear to be heavily biased towards these keys on x86, at least.

It might not be this bad; but the way it looks right now is scary.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

That Turkey Story posted:

I completely disagree with this, unless you mean checks that only exist when compiling a debug build such as an assert. If by context your data can never be in a particular state, checking for that state in a release build every time that bit of code is run is just silly. For example, if I'm writing an iterator for something like a linked list, I'm not going to check at every increment or decrement if the pointer I'm dereferencing is NULL. Write proper unit tests rather than flooding your applications with redundant checks that you often can't do anything appropriate about if they fail anyway. If you really want to just inform the reader of the code that the data can't be NULL there as you claim, then use a comment rather than introducing a runtime check.

Why? If you're about to dereference a pointer in a dynamically-allocated structure, then there's a very good chance you've lost memory locality and are about to stall on memory access anyway, so the check is essentially free. It inflates code size slightly, so I wouldn't do it inside a tight loop, but I'd be trying not to traverse large structures in a tight loop anyway.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Steampunk Mario posted:

So you might prefix something with 'idx' or some such to indicate that it's an index

The only reason that isn't type information is because languages like C don't have strong typing on typedefs. Being able to do typedef unsigned int index_t; typedef unsigned int age_t; and have the compiler warn when you use an index_t where it expects an age_t would completely remove even that flimsy justification for Hungarian notation, as you would encode all of your "usage context" in the type system where it belongs.
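For comparison, here's a minimal sketch of that wished-for behaviour using Haskell newtypes (the names are made up):
code:
-- Index and Age share a representation, but mixing them up is a compile-time error.
newtype Index = Index Int
newtype Age   = Age   Int

ageAt :: Index -> Age
ageAt (Index i) = Age (i + 20)   -- placeholder body

demo :: Age
demo = ageAt (Index 3)
-- ageAt (Age 3) would be rejected: couldn't match expected type Index with Age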

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

rjmccall posted:

Haskell makes that substantially easier, but converting between types is still a pain in the rear end.

I've never found converting between types in Haskell to be very painful - I just pattern match to get at the contained type I care about, or make my newtypes instances of the typeclasses I care about.
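A small sketch of both approaches (the Meters type is made up for illustration):
code:
newtype Meters = Meters Double
  deriving (Show)

-- Pattern matching to get at the contained value:
toFeet :: Meters -> Double
toFeet (Meters m) = m * 3.28084

-- Making the newtype an instance of the classes we care about, so it can be
-- used directly wherever a Num is expected:
instance Num Meters where
  Meters a + Meters b = Meters (a + b)
  Meters a * Meters b = Meters (a * b)
  abs (Meters a)      = Meters (abs a)
  signum (Meters a)   = Meters (signum a)
  negate (Meters a)   = Meters (negate a)
  fromInteger n       = Meters (fromInteger n)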

But we're getting off-topic here. To contribute, I'm dealing with code generated by automated proof systems that is sort of painful. Here's a short example, the extracted definition of sum over a list of natural numbers:
code:
sum :: (List Nat) -> Nat
sum l = foldr plus 0 l
Reasonable so far, it's folding addition over the list to make a sum. Let's see how foldr is written...
code:
foldr :: (a1 -> a2 -> a2) -> a2 -> (List a1) -> a2
foldr f a0 l = list_rec a0 (\a l0 iHl -> f a iHl) l
Okay... there's some sort of list recursion primitive or something, and it's passing in the base-case value and what appears to be an inductive rule that takes three parameters and applies the folding function to two of them...
code:
list_rec :: a2 -> (a1 -> (List a1) -> a2 -> a2) -> (List a1) -> a2
list_rec f f0 l = list_rect f f0 l

list_rect :: a2 -> (a1 -> (List a1) -> a2 -> a2) -> (List a1) -> a2
list_rect f f0 l = case l of
  Nil -> f
  Cons a l0 -> f0 a l0 (list_rect f f0 l0)
Two functions later, we finally discover the actual recursive step. Apparently those mystery parameters earlier are the current element in the list, the remainder of the list, and the accumulated value we've created so far working through the list. Also, for some reason there's a function stuck in the middle here that does absolutely nothing.
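For contrast, this is roughly what a human would write for those two definitions (using ordinary lists rather than the extracted List and Nat types):
code:
sum' :: [Int] -> Int
sum' = foldr (+) 0

foldr' :: (a -> b -> b) -> b -> [a] -> b
foldr' _ z []       = z
foldr' f z (x : xs) = f x (foldr' f z xs)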

I'm sure all these layers of indirection represent something useful within the context of the proof-checker, but when the code it's spitting out is so far removed from what any human would write even on simple examples like this, it's very hard to correctly use extracted code on complicated models, where it would be most useful.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Steampunk Mario posted:

I'm not some hungarian evangelist - I'm just saying its current usage is unambiguously worthless, but its original intent had potential to make more readable, self-documenting code. Not that it matters since most of us don't get to decide on the coding style for our particular codebase anyway.

"Purpose and usage" of a variable are a kind of typing; even if the compiler doesn't enforce them, typedefs and the like are a better way to encode that information than naming conventions. Hungarian notation sort of vaguely makes sense if you are stuck with a type system less adequate than C's and don't even have something like a preprocessor available; even then, I would advise changing toolchains or supplying descriptive comments at variable declarations instead.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

twodot posted:

Hungarian notation isn't anything more than a standardized method of encoding information about a variable into its name. This is something that is always done when a variable name is created, standardizing it isn't intrinsically bad.

Types aren't anything more than a method of encoding information about a variable in the same place the variable is created. They have many advantages over name-based methods: they are easier for tools to parse and reason about, many languages feature some form of checking that expressions are made of compatible types, many compilers are able to use them to decide on optimal storage representations, and you don't have to retype them over and over whenever you use the variable.

But we're getting way afield of the topic of the thread; if someone wants to debate this more, maybe they should start another "hungarian notation isn't that bad really" thread for it.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

nebby posted:

I'll admit I'm naive here, but I think the weakness of MD5 is a little more constrained than that at this point. Maybe someone who knows more about it can chime in, but I thought the general state of MD5 is "if you're using it today it's not quite an open door yet, but you should think about migrating" not "oh poo poo everyone who has MD5 hashes in their database is hosed!"

At present, MD5 is only publicly known to be subject to an extremely easy collision generator attack. This attack is very fast and bodes very poorly for the health of the algorithm; it would not be beyond belief for governmental bodies to already have stronger attacks.

There are real-world applications for which MD5 is currently completely compromised by the present attacks (third-party signatures being an incredibly relevant example, see recent X.509 CA shenanigans), but traditional first-party signatures and password hashes are not known to be broken. It is probably best for everyone to migrate away from MD5 for security purposes at this point.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Ryouga Inverse posted:

There is literally no, I repeat, no way that you can defend against someone who has stolen your db. If someone has hosed your poo poo that thoroughly you can't stop them from logging in as anyone.

Uh, this is blatantly not true. There are ways to verify that a party holds a piece of information that you do not have. Trivial passwords, for example: a server stores a hash of the password, the client sends the password, and the server verifies that it hashes to the stored value. If an attacker has only the hash, he needs to solve a cryptographically hard problem before he can generate a token that will authenticate him to the server.
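A minimal sketch of that flow; hash here is a stand-in for a real cryptographic hash from a proper library, not something to write yourself:
code:
type Hash = String

-- Placeholder for a real cryptographic hash function.
hash :: String -> Hash
hash = undefined

-- Stored on the server when the account is created:
storedHash :: Hash
storedHash = hash "correct horse battery staple"

-- At login the client sends the password; the server hashes and compares.
-- An attacker holding only storedHash has to invert the hash to log in.
verify :: String -> Bool
verify candidate = hash candidate == storedHash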

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

hexadecimal posted:

Can you talk about the md5 exploit? Given an md5 hash, how hard is it to generate a string with a certain size limit that will generate same md5 hash? If that is no longer a hard problem, then if one stole your DB including md5 hashes, you are pretty much hosed right?

With the current public state of the art, given an MD5 hash, it is computationally infeasible to generate a preimage. However, it is strongly believed that this will change within the next few years, and there may be parties with nonpublic information (such as large governments) who can already form preimages with tractable complexity.

Edit: And yes, once preimages become tractable, MD5 hashes of passphrases are no longer secure. Salting, length-limiting, and character-set-limiting all help a little here, but not enough to be realistic deterrents.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

hexadecimal posted:

What is, in your opinion, the best hash algorithm right now that is not likely to be broken in the future?

Currently I would design to a SHA-2 family hash, planning to migrate to SHA-3 when it is standardized. But I'm a traditionalist, and tend to avoid the lesser-used hash functions simply because they have seen much less cryptanalysis, not because I have any reasonable doubts about their security.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender
Sorry for responding out of order; I missed this post somehow.

Ryouga Inverse posted:

Well, except then you can't fight network sniffing.

Trivial password authentication is vulnerable to sniffing, yes. If you can perform computations on the client you can get around this several ways: Either wrap the conversation in SSL or a similar transport security mechanism, or execute a protocol such as A-EKE to perform a zero-knowledge password proof. It's a reasonably well-understood problem in the field to authenticate using a passphrase such that the verifier does not know the passphrase, the conversation cannot be usefully intercepted or proxied, and authentication to a false verifier does not compromise the secret.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

zergstain posted:

I found a paper on A-EKE, and if it was implemented over HTTP, I'm still not sure how it wouldn't be vulnerable to the same kind of attack, unless it was digitally signed or something, and how do you do that kind of stuff without using HTTPS? I tried to find a web-based implementation of this so I could gently caress around and see where I'm wrong, and I have to be, as this poo poo was written by people way the gently caress smarter than I am.

I don't understand why you think HTTP is relevant. I mean, it's not like it's magically different from every other communication channel.

EKE and A-EKE are essentially digital signature algorithms, designed to operate on key material rather than static data. They are used in key-setup systems like SSL; by analogy, they operate by mutually establishing identity by way of a shared secret as is done at the start of an encrypted connection, proving to both parties that the session was established correctly, then discarding the generated session key and returning to unencrypted communication.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

zergstain posted:

By HTTP, I meant in a browser via html/javascript.

You'd have to use AJAX or something to handle multiple-exchange authentication techniques, but other than that there's nothing particularly interesting.

zergstain posted:

The paper didn't explain the implementation of the predicate T(H(P),F(P,K),K) where H and F are one-way hash functions and K is a session key, and why F(H(P),K) won't work. How does the server verify F(P,K) without knowledge of P?

The three-way predicate doesn't work with traditional hashing functions. The idea is that given that the server only stores H(P) and is sent F(P,K) from the client, it should be able to combine them somehow to arrive at some G(K) which it can then verify, confirming that the F(P,K) it was sent could only have been made with full knowledge of P and K simultaneously. If the server were instead sent F(H(P),K), then an attacker with a precomputed H(P) (obtained, for example, by gaining read-only access to the server's database) could authenticate without ever knowing P.

The most common way I've seen it implemented is with factor-based public keys derived from the passphrase instead of from a random generator; when the account is created, the server memorizes the public component of the generated key and uses that to verify the augmentation signature, while the client regenerates the private component of the key every time the user logs in, by repeating the calculation to derive it from the passphrase. There are alternatives, such as composable hash functions (which are really incredibly hard to make secure) and passphrase partitioning methods, which require a longer passphrase but can provide similar protections. The augmented SPEKE algorithm family provides a similar solution to A-EKE, but using different primitives you may find easier to understand.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

zergstain posted:

The private key is derived just from the passphrase, not the passphrase and the session key, correct?

The purpose of the three-way predicate is to make the final authentication step depend on both the passphrase and the session key. If this isn't done, there's a MITM attack on the second phase of the authentication.

zergstain posted:

Can neither H nor F be traditional hash functions?

As far as I know, there isn't a system with the right algebraic structure in which either of those can be a traditional hash function, but I've never really thought about it.

zergstain posted:

This all sounds like it would be really slow if it were done over javascript. I'd hate to wait 5 minutes to login to some site because of a combination of waiting for next part to come back from the server, and the computations being done over slow-rear end javascript.

In C implementations the same number of network round trips is required, and typically EKE protocols finish in under a few hundred milliseconds. Even full-on S-DH, which has to have blinding delays added in for a secure implementation, will take less than a second from start to finish. Javascript may be slow, and HTTP may impose a lot of per-message overhead, but it's seriously not going to take that long.

Edit: A far more significant problem would be that you'd be implementing a crypto protocol by hand, which, Javascript or no, is hands-down one of the single dumbest things you can do when programming, and for which there is pretty much never, ever a good excuse.

ShoulderDaemon fucked around with this message at 22:40 on Jan 11, 2009

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

crazypenguin posted:

Without SSL you have no assurances you're talking to who you think you're talking to. An attacker just goes from sniffing to injecting a different .js file that "accidentally" broadcasts your password in plaintext. It prevents passive attacks, but not active ones.

Of course, just about every website in existence puts their login prompts on unsecure pages and just submits to secure ones, which completely destroys the point, and you might as well be using javascript based encryption. (facebook :argh:)

Yeah, that is true; most of the protocols are only secure if you assume that the client is not actively trying to compromise itself, which HTTP is remarkably bad at.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Sartak posted:

That was my original solution too. It's fine for positive integers, but if you want to match negative numbers or rationals then you need something more complex. I'd just use Scalar::Util::looks_like_number which also matches scientific notation.

They multiplied by 1 because strings that do not look like numbers evaluate to zero in numeric contexts. So "post" * 1 is zero.

Honestly, as an experienced Perl programmer I'd probably use $var + 0 eq $var, which is only trivially different from the horror. It's fast, it's fairly clear what it's doing, it works for every number format that perl supports, and it doesn't pull in a module. If I was already using Scalar::Util for something else, I'd use looks_like_number unless it's doing something incredibly stupid.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Sartak posted:

$var + 0 eq $var misses a lot of numbers you should probably handle, most importantly zero-padded numbers like 007.

I guess I'd specifically not want to handle zero-padded numbers because when I see those I expect them to be parsed as octal, and if the program isn't going to do that I'd prefer it to give me a format error rather than silently mis-handle the number. But maybe that's just me.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Zombywuf posted:

amivalid@123.23.3

I see what you were trying to do, but for what it's worth, that is a valid email address. Direct delivery to IP addresses is acceptable, and that is a valid (although not normalized) IP address. No-one ever does IP address validation correctly.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Zombywuf posted:

Also can 1.2.3.4 be a valid domain name?

Strictly speaking I think it can be a valid domain name, but none of the top-level domains start with numerals, and I don't expect ICANN to ever change that.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Erasmus Darwin posted:

I don't think amivalid@123.23.3 is direct delivery to an IP address. It's been a while since I read RFC822, but I seem to recall that you need to stick brackets around the IP address.

Yeah, I think you're right. Silly me.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Ryouga Inverse posted:

What exactly is it about the for-switch loop that causes it to spread so widely?

People frequently internalize processes as a sequence of steps which are, mentally, highly related. When they code this, they promptly put down a looping construct, because it seems natural to code a sequence as a loop. Then, while realizing the loop, they wind up specializing each step, because the language they are in does not operate the same way their mental processes do. If they're not the kind of person who looks over their own code immediately after writing it to take out stupid mistakes, they will never notice that the for loop is now completely unneeded; it does not even occur to them that the loop in their mind does not correspond to a loop in the code unless they are prompted to reexamine their own code.

Edit: In my experience, this very commonly happens when the code is being written inside a function which will itself be called many times in a loop; the aspiring programmer is in a "loop mindset" of sorts and winds up duplicating looping constructs because they are unclear in their mental model of exactly what the control flow they want is. This seems especially to happen if the control flow is being changed because they are still revising their overall design.

ShoulderDaemon fucked around with this message at 02:30 on Mar 17, 2009

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Ugg boots posted:

How would you conditionally branch with goto?

Short-circuiting boolean operators.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

RussianManiac posted:

In any case the do/while(false) is not any less efficient than similar alternatives, and is about as easy to read/comprehend as anything else.

I disagree. Consider the relatively common problem of progressively constructing a resource, which can fail at different stages, requiring different amounts of cleanup. Furthermore, sometimes we can't detect failure without going a conditional or two deeper:

code:
do {
  // ...
  if ( ... ) {
    // ...
    if ( ... )
      break;
    // ...
  };
  do {
    // ...
    if ( ... ) {
      // ...
    };
    // ...
    if ( ... )
      break;
    do {
      // ...
      if ( ... )
        break;
      // ...
    } while (0);
    // cleanup 3
  } while (0);
  // cleanup 2
} while (0);
// cleanup 1
Notice that for any given break, we have to scan up to work out which do we're breaking out of, then scan down to find the matching while that we jump to. Because our first failure case is itself in some conditional code, the first and second break are at the same indentation level, even though indentation is the only direct indication of where the break is going to go; at a first glance, it is very easy to assume they are going to the same place. Finally, notice how the "normal" codepath gradually moves farther and farther to the right.

In contrast, the version with gotos:

code:
  // ...
  if ( ... ) {
    // ...
    if ( ... )
      goto CLEANUP_1;
    // ...
  };
  // ...
  if ( ... ) {
    // ...
  };
  // ...
  if ( ... )
    goto CLEANUP_2;
  // ...
  if ( ... )
    goto CLEANUP_3;
  // ...
CLEANUP_3:
  // ...
CLEANUP_2:
  // ...
CLEANUP_1:
  // ...
It's very clear at each failure condition exactly where the control flow moves. The normal codepath is left-flushed and isn't interrupted by do/while structures.

And this isn't even considering if there are legitimate uses for do/while in your algorithm, which would make the dual use of do/while for exceptions even worse for someone reading the code.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Zombywuf posted:

It is demonstably false, as can be seen from making the set S the union of the set of integers and pointers.

But if we take S as the union set, then we have to admit + as a partial function, as (pointer + pointer) is a nonsense operation.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

wrok posted:

That's bullshit. They're only a mindfuck because you're* coming from a whole lifetime of being drilled with procedural stuff. If you did not grok programming at all you'd probably find functional stuff MUCH more welcoming, logical and sensical than 'traditional' procedural programming.

As someone whose first real introduction to programming was declarative and functional, I'll stand by this statement here. I still don't really understand why anyone would ever choose to use object-oriented programming unless it's all they know or they were somehow forced to, but I've learned this isn't an argument I'm likely to win.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Free Bees posted:

I couldn't tell you, but it's a sentiment I've heard from other people stuck firmly in the static typing world.

Most of the big type inference systems prior to C# used the Hindley-Milner algorithm, which chooses most-general types. All of the programmers used to it have grown to expect type inference to work in a certain way, as a result. So when C# introduced its alternative type inference algorithm, it managed to simultaneously anger the non-type-inference crowd who think it's spooky witchcraft that is somehow "weak" or "dynamic" typing AND anger the type-inference crowd who get irritated that they still have to include some type annotations to make their program typecheck, and they don't have a good intuition for why those should be required. The end result seems to be a feature that is only used in relatively few simple cases, as neither group really trusts it to do what they want.
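For instance, a Hindley-Milner style inferencer gives an unannotated definition its most general type:
code:
-- With no annotations, GHC infers the fully polymorphic type
--   compose :: (b -> c) -> (a -> b) -> a -> c
compose f g x = f (g x)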

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

jandrese posted:

Isn't that kind of like asking "Why can't we make a compiler that takes a statement in the form of 'I want a calculator that looks like a TI-85 but operates in base 11', and make it for me? Why do I have to write all of this 'code' stuff?"

Haskell compilers seem to manage it nowadays, although they don't scale quite as well as you might hope because there's still a fair amount of contention inside the runtime.

Frankly there just isn't as much parallelism in most algorithms as some people think before you have to go speculative anyway. The programmer still has to be intentionally writing parallel algorithms if they want to scale up well. They just don't have to bother with creating threads or passing messages or all that muck.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Janin posted:

Haskell compilers don't parallelize automatically; the programmer has to provide annotations to describe which sections should be parallel.

You can provide annotations, but the modern GHC runtime will create sparks on its own if you don't.
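The explicit annotation style looks roughly like this (par and pseq are from Control.Parallel in the parallel package; compile with -threaded and run with +RTS -N2 to actually use two cores):
code:
import Control.Parallel (par, pseq)

-- `par` sparks its first argument for possible parallel evaluation;
-- `pseq` sequences evaluation so the main thread works on b while
-- the spark for a may run elsewhere.
parSum :: [Int] -> [Int] -> Int
parSum xs ys = a `par` (b `pseq` (a + b))
  where
    a = sum xs
    b = sum ys

main :: IO ()
main = print (parSum [1 .. 1000000] [2 .. 2000000])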

Janin posted:

Apparently when they tried to do it automatically, it was really tricky to figure out what computations took a lot of time, so GHC would spawn way more threads than is useful.

Yes, it is really tricky, and ghc does not do an amazing job. The modern implementation uses a fixed number of threads, typically equal to the number of cores on the system, and every thread performs the highest unclaimed strict block it can find on the code graph, resulting in speculative parallelism that preserves sharing and benefits reasonably well from the strictness analysis and fusion we generally believe we can do well. It has real contention issues, and probably needs to move to something like work stealing with a queue per thread, but it's slowly getting better. Recently, for example, garbage collection was restructured to be a mostly per-thread activity instead of taking a global lock every time.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Janin posted:

How modern do you mean? The GHC 6.12.1 docs indicate that parallel code must still be explicitly annotated using the par and pseq combinators, and I haven't heard anything about it being included in 6.14.

I'd like to play around with it, but the only links I can find are basically "neat idea; doesn't work yet".

As I recall, you need to be running PARALLEL_HASKELL and not just the threaded GHC runtime, which means you may need to do a little source-diving and recompile to get your toolchain ready. At that point, any legal thunk should be a potential spark. You'll need to ask the RTS to use more than one thread (obviously) to get actual parallelism.

And yeah, it doesn't really work yet. Last time I seriously played with it was right before the 6.10 release when I was trying (and failing) to make the locking overhead marginally lower. The sparks are really small unless your algorithm is massively-serial, and if it is they don't help anyway because they tend to speculate randomly and just waste memory. At the time I was observing <10% speedups at -N2, and no gains for -N3 and higher, although that was before the garbage collector was fixed.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Mustach posted:

I suspect that this is a horror, but my knowledge of hashing isn't strong enough that I can prove my hunch, so I ask CoC for some assistance:
code:
// Java
public int hashCode(){
  return (int)(field1 | field2);
}
Both fields are longs. If this is as bad as I think it is, please give me as many reasons as you can think of.

If both field1 and field2 are uniformly distributed over the range of an int, and they are uncorrelated to each other, then field1 ^ field2 would be an acceptable quick-and-dirty hash function. Using field1 | field2, even in that optimal case, will be strongly biased towards one bits in the hash, giving a very poor hash function. If field1 and field2 are not uniformly distributed, or are correlated to each other, then either approach will probably give a poor hash.
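A quick illustration of the bias (in Haskell for brevity, with arbitrary sample words; the same point holds for Java's | and ^): OR of two independent uniform words sets roughly three quarters of the bits, while XOR stays near half, so OR collapses the output space.
code:
import Data.Bits (popCount, xor, (.|.))
import Data.Word (Word32)

sampleA, sampleB :: [Word32]
sampleA = [0xDEADBEEF, 0x12345678, 0x0F0F0F0F, 0xCAFEBABE]
sampleB = [0x31415926, 0xFEEDFACE, 0xF0F0F0F0, 0x8BADF00D]

-- Count the one-bits produced by combining the samples with a given operator.
onesWith :: (Word32 -> Word32 -> Word32) -> Int
onesWith op = sum (zipWith (\a b -> popCount (op a b)) sampleA sampleB)

main :: IO ()
main = do
  putStrLn ("one-bits with (.|.): " ++ show (onesWith (.|.)))  -- biased high
  putStrLn ("one-bits with xor:   " ++ show (onesWith xor))    -- roughly half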

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

GrumpyDoctor posted:

I hope this is just a typo!

Haskell can make 2 complete turns.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Avenging Dentist posted:

Neither does any feature once you have Turing completeness.

This isn't strictly true, as there are programming language features which are not related to the primitive computational model, and thus aren't always reachable through pure Turing programming. For example, the SK combinator language is Turing-complete, but has no ability to perform IO with the outside world, so adding additional combinators which allow IO is a genuine increase in expressivity.

That said, lambdas are not such a feature for Java.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

shrughes posted:

What features are these? I was going to assume that you're a moron, but first, I'd like to know how you are a moron.

Isn't it obvious? Mutable references, looping constructs, and non-Church-encoded data structures. They're clearly just pandering to the hardcore iterative assembly programmers, without any thought given to the added complexity forced upon your average, everyday pure functional programmer.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Janin posted:

Strictness optimizations like foldl' vs foldl aren't part of the compiler, though; there's no magical transformation which will make "sum = foldl (+) 0" work, because the definition of foldl precludes them.

You are aware that sum is defined as a left fold, and does in practice wildly change its behaviour depending on whether optimizations are enabled? Strictness analysis and stream fusion are extremely general optimizations and Haskell very much depends on them in many common cases. The foldl to foldl' transformation is something that ghc really can do all on its own.
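For anyone following along, the difference in question is roughly this:
code:
import Data.List (foldl')

-- Without optimizations, the lazy left fold builds a chain of (+) thunks
-- proportional to the list length; foldl' forces the accumulator at each
-- step and runs in constant space. With -O, strictness analysis usually
-- turns the first into something equivalent to the second.
sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0
sumStrict = foldl' (+) 0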

Janin posted:

It's not like somebody ever looks at a piece of Haskell code and wonders "gee, how much memory will this use?", and it certainly doesn't depend on compiler optimizations. The only compiler optimization which can change big-O behavior is stream fusion, which is performed only when requested by the code.

Ahaha, wondering about the memory/time complexity of various pieces of Haskell is how I spend most of my days. When you have a long-running program that runs in constant memory on ghc 6.10, but grows at a constant rate in memory on ghc 6.12, it's a bit of a mystery. That turned out to be the common subexpression optimization getting a little bit more capable and deciding to never throw away some results that it might need in the future. Of course, turning off CSE resulted in both 6.10 and 6.12 never completing the task, because I was missing a single let-binding I needed; without CSE the program wasn't sharing that value and would have to constantly recompute it.

I've worked with Haskell for years and I am still regularly surprised by various memory/time properties of large pieces of code. I can generally explain them eventually, and the profiling and debugging tools have been getting a lot better at helping programmers answer questions about performance, but it's still extremely hard to develop a good intuition for performance in Haskell.

Janin posted:

Most of the features you complain about are compiler-specific extensions, or have been removed from the language. This is the actual list of features, which I don't find particularly complex:

quote:

do notation, list comprehensions, pattern guards

do-notation is brain-dead simple to understand. List comprehensions and pattern guards are a bit more unusual, but hardly anywhere near LINQ.

Well, n+k patterns, bang notation in types, lazy patterns, and scoped type variables are all Haskell98. And they're all at least marginally crazy. Not to mention that the layout rules, especially when combined with do notation, are not at all predictable for new users to the language; there's no way you can look at the syntax for an if statement in a do block and tell me that the right decision was made there.

And if you want to use the extremely common State monad, you need to deal with undecidable/overlapping instances and functional dependencies. If you want any of the generic programming stuff, you'll probably want standalone deriving support. If you do anything with large data types you'll most likely find yourself using either record punning or GADTs (or both!). Programming Haskell without extensions in the current community is a joke.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Janin posted:

'sum' is only defined as a left fold in the H'98 report -- there's no reason to implement it using foldl, and I'm not aware of any compiler which does. Almost any Haskell code will break if optimizations are disabled, because so many are required for usable space/time performance.

You have to implement it as something equivalent to foldl, because otherwise it has the wrong strictness properties - in particular, compilers are required to support Num instances with non-strict (+) operators correctly, even though there aren't any such instances in common usage. The optimization is just a special case based on feedback from the strictness checker.
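A contrived sketch of why the two aren't interchangeable in general, and why the rewrite has to be driven by the strictness checker:
code:
import Data.List (foldl')

-- step is not strict in its accumulator, so foldl and foldl' disagree:
step :: Int -> Int -> Int
step _acc x = if x == 3 then x else undefined

lazyResult, strictResult :: Int
lazyResult   = foldl  step 0 [1, 2, 3]   -- 3: the undefined thunks are never forced
strictResult = foldl' step 0 [1, 2, 3]   -- diverges: forces the undefined accumulator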

And yeah, optimizations change the correctness and complexity of Haskell programs, which is sort of what I was complaining about here. It should be possible to determine the correctness of a program independently of whatever magic your compiler does behind your back.

Janin posted:

This is more of a bug in the compiler implementation (which is chock-full of them). GHC, in my opinion, shouldn't be performing any CSE until they can figure out how to make it work (they never will).

Granted. The various automatic inlining rules are also chock-full of this kind of crap that they will never be able to make work "correctly".

Janin posted:

Scoped type variables and bang notation are GHC-specific language extensions. Lazy patterns are simply part of pattern matching; there's nothing complex there. n+k patterns were removed from the language.

Nope. Creating typeclasses or typeclass instances creates local scopes for some type variables which are larger than a single declaration, although fortunately not in a manner which is particularly likely to be confusing. Bang notation in constructors is Haskell98. n+k patterns are Haskell98, and if you get to dismiss them for not being part of the next version of Haskell, then you also have to acknowledge all the extensions that will be part of the next version of Haskell.

Janin posted:

Could you elaborate on what's wrong with do-if syntax? I find it quite readable, especially compared to (for example) Ruby or LISP.
code:
foo x = do
    startFoo
    if someTest x
        then fooLoudly
        else fooQuietly
    doneFoo

This is a syntax error in Haskell98:
code:
foo = do
  if x then
    actionOne
  else
    actionTwo
Programming languages should not forbid you from using the if-then-else indentation style that 99% of programming languages in the world use, especially if they only forbid that syntax in some constructs while allowing and encouraging it in pure code. Fortunately, this is being corrected.

Janin posted:

'State' doesn't require any language extensions, and certainly not anything as dangerous as undecidable/overlapping instances. You'll need type families to define a 'MonadState m' class/instance, but those can be understood after about a minute of reading.

You can't write automatic instances for MonadState of the form "(MonadState s m) => MonadState s (t m)" for some transformer t, which are the most common form of instance you will want to write if you are building transformer stacks, without undecidable instances. You can't build or import the mtl on a compiler which does not allow undecidable instances or overlapping instances, because the mtl itself creates a number of these instances so that things like RWS work correctly. In short, because data families weren't introduced early enough, many of the more critical modules in the Haskell world now depend directly or indirectly on unsafe bypasses of the type checker.
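The shape of the instance in question looks something like this (class and transformer names are made up rather than the real mtl ones); the functional dependency's coverage condition can only be satisfied by looking at the instance context, which is exactly what UndecidableInstances permits:
code:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, UndecidableInstances #-}

class Monad m => HasState s m | m -> s where
  getState :: m s

newtype IdentityT m a = IdentityT { runIdentityT :: m a }

instance Functor m => Functor (IdentityT m) where
  fmap f (IdentityT m) = IdentityT (fmap f m)

instance Applicative m => Applicative (IdentityT m) where
  pure = IdentityT . pure
  IdentityT f <*> IdentityT x = IdentityT (f <*> x)

instance Monad m => Monad (IdentityT m) where
  IdentityT m >>= k = IdentityT (m >>= runIdentityT . k)

-- The pass-through instance every transformer wants; it needs
-- UndecidableInstances because s is only determined via the context.
instance HasState s m => HasState s (IdentityT m) where
  getState = IdentityT getState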

Janin posted:

Generic programming in Haskell is a catastrophe; from where I stand, it looks like a bunch of type-system wankery and bad ideas. Anybody trying to implement it in Haskell via a half-dozen unsafe extensions deserves their migraines.

Granted, but you frequently cannot escape it if you are using other peoples' code.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Athas posted:

It is a snippet of Bourne shell script for checking whether a tarball ($1) extracts to more than a single file/directory.
I do not think your snippet works on this.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Jabor posted:

Perl is designed so that common tasks can be achieved really easily, even if it results in unintuitive behaviour when taken out of context.

So in this case, instead of erroring out when you compare a string to a number, it converts the string to a number and then compares them. If you want a string-based comparison, convert the number to a string and then compare.

Perl requires the programmer to explicitly specify whether they want numeric comparison or string comparison for each and every comparison. It never decides based on the values being compared, and the programmer can always look at any given comparison and say whether it will involve coercing the values to numbers.

Jabor posted:

Perl also assumes that the programmer knows what they're doing - if you convert a string that is only partially numeric to a number, it converts the numeric part and ignores the rest. If you want to do something different if the string is not just a number, explicitly test for that.

When Perl converts a string to a number, if the conversion discarded some of the string (except in a small number of very specific cases), it will emit a warning rather than proceeding silently.

What PHP is doing is very nearly completely indefensible. If you have a comparison of two variables in PHP, it is very difficult (if not impossible) for the programmer to look at that comparison and decide which of the byzantine comparison rules will be used, as it is heavily dependent on the internal representation of both of the values involved. Because equality in PHP is non-transitive, this is potentially a correctness-altering property, and it's incredibly weird that there is action at a distance here.
