Foxfire_
Nov 8, 2010

British Fact: They decided to add a y to the word 'tire' in the 1900s because they thought it looked cool.


Foxfire_
Nov 8, 2010

code:
// This place is not a place of honor.  
// No highly esteemed deed is commemorated here… nothing valued is here.
//
// What is here is dangerous and repulsive to us. 
// The danger is still present, in your time, as it was in ours.

Foxfire_
Nov 8, 2010

auto* = "Deduce me a type and it should be a pointer or I screwed something up"

It may not be parallel to the other constructions depending on your point of view, but it's certainly useful for making code more readable and less error prone.
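A minimal sketch of it catching a mistake (hypothetical code):
code:
#include <vector>

void example(std::vector<int>& v)
{
    auto* first = v.data();  // ok: data() returns int*, so first is int*
    // auto* n = v.size();   // compile error: size_t is not a pointer
    (void)first;
}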

Foxfire_
Nov 8, 2010

The types seem appropriateish to me if the function is "Take an int-sized value yanked out of dwarf fortress and test if it has one of these known enumeration values". Casting to an enum would be worse if the code doesn't actually know the full range of possible values DF might put in it. Better to have an int than an enum that might hold non-listed values. Something shaped like this (hypothetical names, not actual code from the project):
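
code:
// Values yanked out of DF's memory; DF may use values we never listed
enum KnownJob : int { Dig = 0, Haul = 1, Fish = 2 };

bool is_known_job(int raw)
{
    // Keep raw as an int and just test membership
    return raw == Dig || raw == Haul || raw == Fish;
}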

Foxfire_
Nov 8, 2010

A tuple is usually just shorthand for a perfectly ordinary type. If I have a C++ function that takes a std::tuple<int, float, string> parameter and I try to call it with a std::tuple<int, float, float>, I'll get a compile error because those types don't match. I could have just as well defined a struct type with those fields; the std::tuple type is just shorthand.
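
For instance (a sketch):
code:
#include <string>
#include <tuple>

void f(std::tuple<int, float, std::string> t);

void caller()
{
    f(std::make_tuple(1, 2.0f, std::string("hi")));  // ok: types match
    // f(std::make_tuple(1, 2.0f, 3.0f));            // compile error:
    // tuple<int, float, float> doesn't convert to tuple<int, float, string>
}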

C/C++-style variadic functions defer the type-checking to runtime, if it exists at all. If I call printf("int=%d, float=%f, string=%s\n", 5, 1.0f, 2.0f), I won't get a compile-time error*, and in C/C++ there won't be a runtime type check either; it's just undefined behavior. It acts something like a python-esque language where there's no static type checking, but without even the guaranteed runtime errors for wrong types.

Java printf() effectively takes an array of Object, just with some syntactic sugar to create the array and box anything that wasn't an Object already. That only works since there's a conversion from any type -> Object, and every Object has a toString(), which is all printf() needs. You can't make a function that takes any number of nonhomogeneous types (i.e. something like python's struct.pack(), which takes a format string and some data and returns a byte array with those values packed into it)


* for printf, you probably do get a warning on modern compilers, but that's just because they have special case code to recognize printf() formats specifically.

Foxfire_
Nov 8, 2010

I was being dumb above; Java gets variadics of heterogeneous types exactly the same way python does:
- Take all the excess parameters and dump them into a single array of generic objects
- Do a runtime type test when they are accessed and fail there if they're wrong

It abandons static type safety just like the C version (python never had static safety to begin with). C/C++ doesn't have a way to implement the runtime type test and has uglier syntax, but is otherwise the same.

How do Rust/Haskell/etc.. that have static type-safe variadics work?

Foxfire_
Nov 8, 2010

That seems like a good solution; no variadics in the actual language, only in a not-sucky metaprogramming layer (where the metaprogramming's run time is the overall program's compile time)
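
C++ lands in roughly the same place with variadic templates: the "any number of any types" only exists at the template layer, and each call site stamps out an ordinary statically-typed function at compile time. A sketch:
code:
#include <iostream>

void print_all() {}  // base case: no arguments left

template <typename T, typename... Rest>
void print_all(const T& first, const Rest&... rest)
{
    std::cout << first << '\n';  // first always has a known static type
    print_all(rest...);          // recurse on the remaining arguments
}

int main()
{
    print_all(5, 1.0f, "hello");  // generates a chain of typed overloads at compile time
}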

Foxfire_
Nov 8, 2010

The clang justification for that optimization is some amusing pedantry.

- When compiled without ffreestanding, clang knows what malloc() does.
- Specifically, it knows that its specification does not require any visible side-effects
- It is therefore permissible to swap in a different malloc()/free() implementation for that one pair.
- If it can prove you never access through the returned pointer, the new implementation never actually needs to allocate memory
- New implementation never fails on any size allocation so it never returns NULL
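
A sketch of what that chain of reasoning lets it do (not the exact code from the article):
code:
#include <cstdlib>

int probe()
{
    void* p = std::malloc(1000000000);
    if (p == nullptr)
        return 1;      // "allocation failed" path
    std::free(p);
    return 0;
}
// With optimizations on, clang can compile this whole function down to
// "return 0": the pointer is never accessed, so the never-fail
// malloc()/free() pair is substituted and the failure branch goes dead.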

Foxfire_
Nov 8, 2010

That code seems fine to me. The integer is not a pointer.

The original tweet says 'could be called' because there is no guarantee that the two calls return different values.

code:
#include <cstdio>
#include <cstdlib>
#include <cstdint>

void Foo()
{
    void* a = malloc(10000000);
    void* b = malloc(10000000);

    uintptr_t a_num = (uintptr_t)a;
    uintptr_t b_num = (uintptr_t)b;

    free(a);
    free(b);

    // a & b are dead, a_num & b_num are not

    // This one is UB though:
    // uintptr_t bad = (uintptr_t)a;

    if (a_num == b_num)
    {
         printf("dickbutt");
    }
}
is still permissibly optimized into a no-op though.

Foxfire_
Nov 8, 2010

xtal posted:

Strictly UB yes, not because it's guaranteed to compile or not, but just undefined. Imagine if you were reading bytes from urandom instead of memory addresses. Depending on the way you use them the program may still work, but converting random bytes to a string doesn't make them more meaningful, they're still opaque to you.

edit: efb but also glad someone else compared to randomness

The program has some unpredictable output. It does not have undefined behavior.

code:
#include <cstdio>
#include <cstdlib>
#include <cstdint>

int main()
{
    void* foo = malloc(4);
    uintptr_t bar = (uintptr_t)foo;
    free(foo);

    printf("The pointer was %d\n", (int)bar);
    return 0;
}
is a well defined program. It will print something of the form 'The pointer was <some number>\n' then terminate with exit code 0.

An undefined behavior program might:
- Also print that
- Crash
- Print nothing
- Infinite loop
- Return some other exit code
etc...

Foxfire_
Nov 8, 2010

yeah, if you stick a return nullptr at the end, gcc inserts the comparison.

Nothing to do with the malloc, it's going:
- Falling off the end of a function that doesn't return void is undefined behavior and I may assume it never happens
- Therefore the if can't ever be false because then I would fall off the end
- Therefore I can skip the comparison because I already know the answer.
- call call() [and it better never return]

If you swap the test on the integers-whose-values-came-from-pointers for a test on an unknown global bool, it will still be trimmed:
https://godbolt.org/z/tZ6Rwc


(also the void* return wasn't in the original tweet, I think you typo'ed it in)

Foxfire_ fucked around with this message at 04:22 on Jul 9, 2020

Foxfire_
Nov 8, 2010

more falafel please posted:

Is casting a pointer to an integer type ever not UB? What about casting an integer type to a pointer?

For a big enough integer, both directions are always defined. The values you get out may not be very useful though:

C++ 7.6.1.9 posted:

4) A pointer can be explicitly converted to any integral type large enough to hold all values of its type. The mapping function is implementation-defined. [Note: It is intended to be unsurprising to those who know the addressing structure of the underlying machine. — end note] A value of type std::nullptr_­t can be converted to an integral type; the conversion has the same meaning and validity as a conversion of (void*)0 to the integral type. [Note: A reinterpret_­cast cannot be used to convert a value of any type to the type std::nullptr_­t. — end note]

5) A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. [Note: Except as described in [basic.stc.dynamic.safety], the result of such a conversion will not be a safely-derived pointer value. — end note]

Language lawyering is fun!
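
Paragraph 5's round-trip guarantee in code (a sketch):
code:
#include <cstdint>
#include <cstdio>

int main()
{
    int x = 42;
    auto bits = reinterpret_cast<std::uintptr_t>(&x);  // implementation-defined value
    auto* back = reinterpret_cast<int*>(bits);         // round trip: same pointer back
    std::printf("%d\n", *back);                        // ok, prints 42
}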

Foxfire_
Nov 8, 2010

If -ffreestanding isn't specified, clang knows what malloc does.

The only effects of malloc that are specified are the pointer it returns and errno if it failed. A real heap implementation has to manipulate other state, but the caller can't see or rely on it (you can't legally expect that any allocation will always succeed/fail).

The 2nd pointer is provably never written through, so it is free to pretend that the malloc always succeeded, always failed, or sometimes succeeded.

Suppose you got an already-compiled copy of this program and all you could do was run it. Even if it did a real malloc and comparison, it would either run call() or not depending on whether malloc succeeded or not. You might suspect something after you run it a billion times and always see the same result, but maybe you're just very lucky. You can never prove it's not doing the real call.

Foxfire_
Nov 8, 2010

rjmccall posted:

Well, arguably the compiler isn’t allowed to optimize a call to malloc to fail if the actual malloc function might not have failed. But otherwise yes.

code:
#include <cerrno>   // errno, ENOMEM
#include <cstddef>  // size_t

void* malloc(size_t size)
{
    errno = ENOMEM;   // always report out-of-memory
    return nullptr;
}
is a perfectly valid malloc implementation. It's one that even actually exists in practice in embedded where you do not have a heap but link to something that expects the symbol to exist.

Foxfire_
Nov 8, 2010

The Clang reasoning is:

- Command line switches are telling it to assume malloc/free come from libc and have known effects
- For the purposes of this particular malloc/free pair, use an alternative implementation that is also standards compliant instead of using the one in libc
- Specifically, this implementation will allocate memory from infinite magical space beyond time. It will never fail. This is compliant with the API & standard (but not actually implementable). It is analogous to swapping a strlen() implementation for a compile-time one, or swapping malloc for one that allocates on the stack instead
- Now optimize based on that
- Optimization eliminated all actual uses of the magical malloc, so it doesn't matter that it couldn't have actually been implemented

Foxfire_ fucked around with this message at 00:49 on Jul 11, 2020

Foxfire_
Nov 8, 2010

code:
#include <cstdint>
#include <cstdlib>

void func(void*);  // may write through its argument
void call(void*);

void hmm(size_t sz) {
  void* p = malloc(sz);
  uintptr_t ip = (uintptr_t)p;

  func(p);

  void* q = malloc(sz);
  uintptr_t iq = (uintptr_t)q;

  if (ip != iq)
    call(p);
}
- Pointer from 1st malloc escapes and is potentially written through.
- 1st malloc must be real libc implementation, so it may fail
- Pointer from 2nd malloc never escapes. It is never written through
- Implement the 2nd malloc with magical never-fail implementation
- Live pointers that are returned from malloc are never equal, so the comparison is always true and can be removed

code:
#include <cstdint>
#include <cstdlib>

void func(void*);  // may write through its argument
void call(void*);

void hmm(size_t sz) {
  void* p = malloc(sz);
  uintptr_t ip = (uintptr_t)p;

  func(p);

  void* q = malloc(sz);
  uintptr_t iq = (uintptr_t)q;

  if (ip != iq)
    call(q);
}
- Pointer from 1st malloc escapes and is potentially written through.
- 1st malloc must be real libc implementation, so it may fail
- Pointer from 2nd malloc escapes via call(). call() may write through it.
- 2nd malloc must be real libc implementation, so it may fail
- Comparison may or may not be true, so must do it


e: I would also say that this isn't a horror, despite it being complicated to figure out why this transformation is legal. It's hidden in the compiler output. If all you had was the C code and the ability to run the program, everything would work like you expected. The only possible exception would be "Why doesn't my program fail when I try to allocate (but not use) more memory than I have?"

Foxfire_ fucked around with this message at 02:35 on Jul 11, 2020

Foxfire_
Nov 8, 2010

Another way to think of it: Everywhere a malloc/free pair appears in the program, the compiler gets to pick an implementation to use. It doesn't have to make the same choice everywhere.

Some of its options:
1) normal malloc that can fail
2) stack malloc that points into the current stack frame and free() does nothing
3) fake malloc that doesn't do anything besides returning a unique pointer

(3) can't be used if anything actually accesses through the pointer, but is okay otherwise. You could think of it as a transformation of (2) where there was a never-touched stack-allocated backing array that got elided afterwards.

If it can prove that (3) or (2) are safe, it will prefer those since they are simpler, faster, and can lead to further optimizations.

The malloc specification doesn't forbid it from succeeding 100% of the time. Subbing in a malloc that always failed would be technically valid too, it's just not useful.


If you consider malloc (2) or (3) first and snip all the unreachable code (since it can never fail), then check whether anything ever tried to access through the pointer: nothing does. So you never need to fall back to malloc (1).

Foxfire_ fucked around with this message at 04:39 on Jul 11, 2020

Foxfire_
Nov 8, 2010

Yes. If you have some code to handle malloc failing, and the compiler picks an implementation that never does, that code is unreachable and can be removed.

The way this is actually useful is for code like
code:
void Foo(size_t sz)
{
    void* ptr = malloc(sz);
    if (ptr == nullptr)
    {
        // Handle allocation failure somehow
    }
    else
    {
        // do a bunch of stuff that actually uses that memory
        // possibly across many function calls that got inlined together
        // full of many branches, loops, etc..

        // But all paths eventually provably call
        free(ptr);
    }
}
can be turned into
code:
void Foo(size_t sz)
{
    char ptr[sz];  // conceptually: the same memory, now in the stack frame
    // do a bunch of stuff that actually uses that memory
    // possibly across many function calls that got inlined together
    // full of many branches, loops, etc..
}
where now it doesn't need to run heap code at all

Foxfire_
Nov 8, 2010

Tei posted:

I believe when you declare space instead of begging the OS for it, you get that space when the program loads into memory, as pages marked as data. Your program, when it loads, will use several pages, most of them marked as code, but a few marked as data.

If you declare a giant array that way that doesn't fit in memory, I guess most OSes will just fail to start the program. But a few may use a lot of virtual pages and let the virtual paging system deal with it.

Stack usage for a function happens when the function is called. Think about something recursive; maximum stack depth isn't known beforehand. If you can't map out the entire call stack at compile time (you usually can't, virtual calls make this hard even without recursion), you can't come up with a maximum stack usage.

Ideally, programmer choice of stack vs heap for a variable is dictated by its lifetime. Annoyingly, sometimes practical concerns limit this and lead to rules like 'big things in heap'.

- Very old operating systems would reserve real memory for some maximum stack size for every process. Every process always uses at least that much RAM, so it has to be small.
- Better things use virtual memory for it. When the program touches an address that's off the end of the stack, it page faults. The operating system allocates a physical memory page to those addresses, and retries the faulted instruction. If a process doesn't ever touch a lot of stack, no actual RAM is used. Big stacks are easy to do here.
- But if you add multiple threads with multiple stacks, it's hard again. Their virtual addresses have to be contiguous, so you've got to figure out how to arrange them in the address space so they won't bump into each other as they grow. In practice, applications tell the threading routines a max stack size to separate them by.

Detecting collisions is also hard since the application code is just manipulating the stack pointer and doesn't know anything about the overall organization. If you do an access like old stack pointer+50MB and that happens to collide with some other address allocation instead of being in unmapped space, it will just run and smash something without triggering a page fault.

Foxfire_
Nov 8, 2010

Yep

https://godbolt.org/z/daesqG

The stack starts at high addresses and grows down towards lower addresses. rsp is a register pointing at the top (lowest address) of the stack. At the start of the function, the sub rsp, 10000 is moving the stack pointer down by 10000 bytes to make room for the local array. Then at the end, it adds 10000 back to pop the array off the stack.
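
Sketched out (hypothetical use(); real output will add alignment padding):
code:
void use(char*);

void f()
{
    char buf[10000];
    use(buf);
}

// compiles to something shaped like:
//   f:
//     sub rsp, 10000   ; move stack pointer down to make room for buf
//     ...              ; body runs with buf living at [rsp]
//     add rsp, 10000   ; pop the array off the stack
//     ret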

Foxfire_
Nov 8, 2010

Your colleague was right?

Things with stack-lifetime ought to be allocated on the stack. It avoids global locks, avoids fragmentation, can't leak, and is generally simpler.

The downsides to it I can think of are:
- It won't run on systems with low stack limits, which are common for historical reasons
- There's no interface for unmapping pages, so each thread's usage stays at its high-water mark, even if the peak only happened briefly.

If you're making a numerical program to run on a specific system, it's a perfectly reasonable choice. Like if you need a couple hundred MBs of temp space total scattered across a bunch of functions to compute a timestep, taking it from the stack makes more sense than either doing a bunch of heap calls every step (forcing every thread to contend on a lock) or building some thing to cache memory allocations across timesteps. System administration shouldn't care about anything besides how much RAM you're using, not whether you call it heap or stack.

What problem is a low stack limit trying to solve?

Foxfire_
Nov 8, 2010

None, Report1, Report2, Both is fine since there's an argument for having an enum value always be one of the enumerated items instead of being a bitfield. Yes/No don't make any sense though.

Foxfire_
Nov 8, 2010

What did you need to use EBCDIC for?


IEEE-754 because floating point numbers do not behave like real numbers and many, many, many software bugs are from people thinking they do
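
The classic two-liner version of that:
code:
#include <cstdio>

int main()
{
    double a = 0.1 + 0.2;
    std::printf("%.17g\n", a);      // 0.30000000000000004
    std::printf("%d\n", a == 0.3);  // 0: not equal; neither value is exactly 3/10
}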

2's complement you can get away with not knowing if you accept the min/max of signed integral types as being magic. It's still useful for things like looking at a binary file with a hex editor, looking at memory with a debugger, or doing stupid bit manipulation tricks in embedded.

Foxfire_
Nov 8, 2010

Jaded Burnout posted:

In the case of two's complement I'd say that making us learn how to actually do it with pen and paper was not the most valuable use of our time, but then again they thought that prolog was a good call for the main language we'd learn, so, academia.

Was the degree 'Computer Science' or 'Computer Engineering/Programming'? They aren't the same thing and any worthwhile CS program should include functional/logic programming. Saying it's not practically useful is like a statistician saying "I've never used group theory in my career, so it was a waste of time including it in my generic Mathematics degree". A CS degree is not a prep program for a job as a programmer.

(also two's complement is not a hard thing and shouldn't even take a full lecture. Yeah you can probably skip it and lots of people will be fine, but it's small enough that skipping it would be weird. It'd be like skipping while loops in a programming class because you can get by with just for loops)
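
The whole topic more or less fits in two lines:
code:
#include <cstdint>
#include <cstdio>

int main()
{
    std::int8_t x = -1;
    // Two's complement stores -n as 2^8 - n, so -1 comes out as 0xff
    std::printf("%02x\n", static_cast<std::uint8_t>(x));
}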


Jaded Burnout posted:

As for the specifics of floating point being useful all over the place, I mean, of the two, sure, but I've worked on two fully-fledged accounting applications that didn't require in depth knowledge of it, because that's a problem we solved a long time ago. The one and only time even a vague understanding of how floating point numbers are represented in software has been helpful was when I was writing a government's calculator for benefit payments, because there was a lot of division and rounding going on. Still, 15 seconds reading the docs on the language's stdlib for floating point was enough to cover all the bases.

If those programs' usage of floats was anything besides "don't use them for anything besides display" (which is what accounting programs should be doing), they are probably either buggy in corner cases or functioning by happening to stay in a region where float vs real doesn't matter. Doing numerical things correctly with floats just inherently isn't a 15s-type thing to learn. It's a thing like threading, with deep dangerous waters, where it's easy to make something that seems fine but is actually subtly wrong.

Foxfire_ fucked around with this message at 20:26 on Sep 28, 2020

Foxfire_
Nov 8, 2010

ultrafilter posted:

I agree with you completely but this is a bad example. "What is a group?" is basic math that everyone in a statistics department would be expected to know and it shows up often enough that you can't completely forget it.

I was thinking like someone working in drug trials, which I imagine is mostly knowing what tests to use and when, but I don't actually know anything about day-to-day.

Foxfire_
Nov 8, 2010

All integers between -2^53 and +2^53 are exactly representable in IEEE754 doubles, so integer math that stays in that range works out without rounding or precision loss. Some integers outside that range are also exactly representable, but there are holes
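
You can poke at the edge directly:
code:
#include <cstdio>

int main()
{
    double big = 9007199254740992.0;   // 2^53
    std::printf("%.0f\n", big + 1.0);  // 9007199254740992: 2^53 + 1 has no double, rounds back
    std::printf("%.0f\n", big + 2.0);  // 9007199254740994: representable, but there's a hole at +1
}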

Foxfire_
Nov 8, 2010

JavaScript: The function named 'parseInt' returns a floating point number

Foxfire_
Nov 8, 2010

It should be though. How often does someone purposely use JavaScript's letting you not pass all of a function's parameters, compared to it happening by accident? Less error-prone ways to make optional parameters aren't a mystery

Foxfire_
Nov 8, 2010

JavaScript JITs are honestly a marvel of turd polishing. There's a huge amount of effort to smash its objects back into normal types, track the actual types of everything, and undo all the dynamic stuff the syntax lets you do.

Foxfire_
Nov 8, 2010

Tei posted:

And a language with 4 ways to do something is actually worse than a language with 1 way to do something.

Despite this being a common sentiment, I don't think it's actually true. Having more than one way to do something allows you to pick a way that is more suitable and clean for what you are trying to do.

Analogy: If I want to put a screw in a board, having a manual screwdriver, an offset screwdriver, and a drill with a screw bit are going to let me do a better job than only having one way to do it.

Programming specific examples:
- for loops and while loops are redundant with each other. A language with both is still better and more clear than one with only one.
- If I want to make a TCP server in python, I can do it with bare sockets or its built-in socketserver library. I can pick the most appropriate method for whatever I am doing exactly.

Foxfire_
Nov 8, 2010

OddObserver posted:

What uses a register calling convention on x86 anyway? Or is it x86-64 using 32-bit registers (don't know what the ABI is for it..)

Anything where the compiler can see both the code being called and the call site, and decides it's a good idea. There's no rule that all calls to the same function have to go to the same implementation.

Foxfire_
Nov 8, 2010

Volte posted:

Does Windows 10 really use a hierarchical code signing model where any process that gets spawned by a signed binary is considered signed by the certificate that signed the parent executable?

It does not.

Leaving Cyberpunk aside, did you know that cmd.exe is installed on your computer right now, is signed by Microsoft, and will run whatever command you pass to it! :kingsley:

HACKERS! posted:

cmd.exe /C calc

Foxfire_
Nov 8, 2010

C# is willing to sacrifice ideological purity to be a more useful tool. That's not unusual, pretty much every general-purpose systems language makes the same decision. Even Rust lets you do the same thing.

Deffon posted:

Really disliked C# enums, it's a half-baked abstraction that's useful for c-api integration, but less useful for writing modern code.

While it's sometimes the best solution, it's not usually a good idea to create an enum with the idea of adding new values over time.
Adding new values is a pain because you have to update all call sites, some of which may be out of your control.

This is where interfaces really shine, all you have to do is create a new implementation, and you are required to implement all operations and they are all in the same place.
On the flip-side, adding a new operation to an interface is hard, and adding a new operation for an enum is easy.

:confused: enums and interfaces are completely orthogonal concepts that don't have anything to do with each other

Foxfire_
Nov 8, 2010

I guess you could try to shove all the logic that a thing uses into some interface'd thing, but it seems to me like it'd usually be a mess and more confusing than just doing the simple thing. 99% of enums are going to be things like:

- This serial receiver has a state that is one of IDLE, RX_NORMAL, or RX_ESCAPE_BYTE
- This window is either MINIMIZED, MAXIMIZED, FLOATING, or FULLSCREEN
- The game's difficulty is one of EASY, NORMAL, or HARD

Doing something like having WindowState be an interface with functions for every operation you're going to ever do that cares about a window is going to be grouping logic from lots of unrelated things and make reading any particular thing a lot harder.

e:
I guess it kind of comes down to: are you more likely to be asking 'what does this particular thing do for various enumerated values?' or 'how do the various enumerations change the entire universe of stuff?'. I could see the second being useful occasionally, like you might want to see all the code that changes in a game by difficulty in one place, but usually the first seems more likely.

Foxfire_ fucked around with this message at 00:19 on Feb 14, 2021

Foxfire_
Nov 8, 2010

Xarn posted:

And if the safe parts of your language then have to safeguard against bool having 256 potential states/representations, you are doing it wrong.

That's impossible to avoid. If you could algorithmically prove that some unsafe construct didn't break any other state and always produced valid bit patterns in everything it touched, it wouldn't need to be an unsafe construct to begin with.

It's also kind of a separate issue. C# enums could have been defined so that it wasn't permitted to have an unlisted integer value in one (with normal code doing runtime checks to enforce that). Then a switch wouldn't need paths for handling unlisted values since the runtime could assume they never happened. If you used unsafe code to make one anyway, you'd crash the runtime, but that's not different from all the other ways you can use unsafe code to violate assumptions and crash the runtime (put random garbage in a pointer/reference).

Boolwise, C++ is still worse since it doesn't prescribe any particular bit representation. It mandates the conversions (true->1, false->0, 0->false, nonzero->true), but not what bits are actually stored in the bool. If you memcpy() to or from a bool, you technically have no portable expectation of what the bytes mean. memcpy()-ing stuff in could also plausibly break real compilers if they implement an equality test by subtracting and then testing a zero flag.

Foxfire_
Nov 8, 2010

Xerophyte posted:

I wonder if there's an actual json library that's that awful or if they just rolled their own crappy parser. My heart wants me to believe it's the latter, but the former would absolutely not surprise me.

If I were writing a JSON parser, making it performant when parsing a 10MB string would not be high on my list of cases to optimize for.

Foxfire_
Nov 8, 2010

I'd expect most JSON to be short, and I'd optimize for that plus simplicity (fewer bugs), since I wouldn't expect JSON parsing to generally be the bottleneck vs IO. That might be a poor choice for a general-purpose library, depending on how it's actually used

Also, the thing that's bad is actually the C runtime code, not the JSON library. sscanf() [parse some tokens from the start of a string] is calling strlen() [how long is this entire string] and it doesn't really have any reason to. The standards don't promise any particular complexity for either, but it's turning a function that ought to be proportional to the size of the tokens being matched into something proportional to the size of the entire string. My guess for why it's not slow on some systems is that they're using different runtime versions.

You would think sscanf(someString, "%d,", &variable) would only be examining the string up to either the first , or the end-of-string, not that it touches every single byte always
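
Which is what makes the obvious token-at-a-time loop accidentally quadratic (a sketch; assumes a runtime whose sscanf() really does strlen() its input):
code:
#include <cstdio>

// Parse every integer out of a long comma-separated string.
// If sscanf() strlen()s its input first, every iteration walks the
// entire remaining buffer and the loop is O(n^2) overall.
void parse_all(const char* s)
{
    int value = 0;
    int consumed = 0;
    while (std::sscanf(s, "%d%n", &value, &consumed) == 1) {
        // ... use value ...
        s += consumed;       // skip the matched digits
        if (*s == ',') ++s;  // and the separator
    }
}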

Foxfire_ fucked around with this message at 02:00 on Mar 2, 2021

Foxfire_
Nov 8, 2010

You say that, but it would also not surprise me if the person who originally wrote it was not anticipating the 'possible microtransactions' list to be 63,000 things long. I feel bad about condemning them for that

Foxfire_
Nov 8, 2010

more falafel please posted:

What JSON parsing library is using sscanf? That seems like a terrible idea for a vaguely EBNF-style language. The whole thing smacks of hand rolled.

Apparently real-world atof() and strtod() implementations call sscanf(), which calls strlen(). The RapidYAML issue someone linked earlier is the same thing: once they figured out the next thing in the input string was a float, they called atof() on it, which goes badly if there's a lot of trailing string. The nlohmann json library other people in this thread liked does basically the same thing (but with strtof()), except it happens to have copied the content being parsed into a short temporary first, so it doesn't explode. RapidJSON rolled their own strtod() implementation instead of using a stdlib one, so they don't have the problem at all.


Foxfire_
Nov 8, 2010

Fails code review
