pseudorandom name
May 6, 2007

You're treating "undefined" the way compiler writers treat "undefined", as if it were some kind of inscrutable holy concept and Thou Shalt Not, as opposed to the way that actual programmers treat "undefined", meaning this is the precise behavior of the implementation, which may be unpredictable and random based on circumstances because we aren't doing anything to control that randomness, but may on the other hand be completely predictable and utterly non-random, in which case we can take advantage of what the implementation actually does (especially when we're the ones doing the implementation), as opposed to the behavior of the Abstract Machine written by some eggheads on the Standards Committee.

I don't know why they're doing if data[size()] != '\0' data[size()] = '\0', instead of just unconditionally writing the 0. I skipped the beginning of the video, but I assume they're trying to avoid invalidating cache lines on other CPU cores via unnecessary writes -- if there is already a 0 at that memory location then they don't need to write a 0 at that location again, which means the current CPU core doesn't need to take exclusive ownership of that cache line and cause every other CPU core to evict that line from their cache.

And I don't think he describes the scenario correctly, because it seems like he's talking about a situation where the second page is freed using madvise(MADV_DONTNEED) i.e. the kernel is informed that the process doesn't need that page anymore and enough happens on the kernel-side that future reads from that page end up coming from the zero page, but when the process gets around to writing to that page again the original page still hasn't been fully freed and the kernel just slots that original page back in, which contains the non-zero data. But in order for jemalloc to mark a page as DONTNEED, it'd have to not be used by anything in the first place, including this std::string which ostensibly jemalloc knows spans pages A and B.

The situation I think he's attempting to describe is as follows:

1) jemalloc allocates page B, the program writes data to it, then the program frees the data in page B and jemalloc marks it as DONTNEED because no allocations are using it.
2) the kernel wires up the zero page in at page B, but never gets around to actually freeing the original page B
3) the program allocates the std::string and just the right circumstances result in the string spanning pages A and B, but the string data is too short for it to write to page B
4) c_str() gets called, the std::string reads the first byte of page B and sees that it is 0, so doesn't bother to write the 0 byte as the aforementioned cache optimization
5) the char* gets passed around the program, assumed to be safely NUL terminated
6) the program mallocs more memory, jemalloc returns the unused portion of page B
7) the program writes to the newly-allocated memory in page B, a page fault happens, the kernel does its bookkeeping and sees that page B was marked DONTNEED but is still around, it puts the original page B back into that location
8) the original page B had a non-zero value at byte offset zero due to it being used to store data in step 1, so that cache optimization back in step 4 was wrong
9) now that char* isn't NUL terminated

pseudorandom name fucked around with this message at 19:32 on Aug 23, 2022


Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

pseudorandom name posted:

I don't know why they're doing if data[size()] != '\0' data[size()] = '\0', instead of just unconditionally writing the 0. I skipped the beginning of the video, but I assume they're trying to avoid invalidating cache lines on other CPU cores via unnecessary writes -- if there is already a 0 at that memory location then they don't need to write a 0 at that location again, which means the current CPU core doesn't need to take exclusive ownership of that cache line and cause every other CPU core to evict that line from their cache.

This is apparently exactly what was happening, the build group was calling c_str() on const strings shared by many threads, so the unconditional write was murdering performance.



quote:

The situation I think he's attempting to describe is as follows:

1) jemalloc allocates page B, the program writes data to it, then the program frees the data in page B and jemalloc marks it as DONTNEED because no allocations are using it.
2) the kernel wires up the zero page in at page B, but never gets around to actually freeing the original page B
3) the program allocates the std::string and just the right circumstances result in the string spanning pages A and B, but the string data is too short for it to write to page B
4) c_str() gets called, the std::string reads the first byte of page B and sees that it is 0, so doesn't bother to write the 0 byte as the aforementioned cache optimization
5) the char* gets passed around the program, assumed to be safely NUL terminated
6) the program mallocs more memory, jemalloc returns the unused portion of page B
7) the program writes to the newly-allocated memory in page B, a page fault happens, the kernel does its bookkeeping and sees that page B was marked DONTNEED but is still around, it puts the original page B back into that location
8) the original page B had a non-zero value at byte offset zero due to it being used to store data in step 1, so that cache optimization back in step 4 was wrong
9) now that char* isn't NUL terminated

This looks like a flow that could actually cause it with the specific behaviors of jemalloc on Linux! The Stack Overflow post I linked mentions that this bug couldn't happen if allocating the string is the first time page B is being used, because jemalloc currently returns zeroed pages anyway, so to make the bug happen you'd need a page that was already being used and had some junk in it, and then the front of that page would have to be freed before being allocated as part of data.

quote:

Now, jemalloc's only mmap() call uses the flags MAP_PRIVATE | MAP_ANONYMOUS with the occasional inclusion of MAP_FIXED, and in particular it doesn't use MAP_UNINITIALIZED. This means that pages are always zero-initialized when allocated.

Additionally, even madvise() with MADV_DONTNEED will, for anonymous mappings, return "zero-fill-on-demand pages", which I read as "zero-initialized pages."

Foxfire_
Nov 8, 2010

pseudorandom name posted:

You're treating "undefined" the way compiler writers treat "undefined", as if it were some kind of inscrutable holy concept and Thou Shalt Not, as opposed to the way that actual programmers treat "undefined", meaning this is the precise behavior of the implementation, which may be unpredictable and random based on circumstances because we aren't doing anything to control that randomness, but may on the other hand be completely predictable and utterly non-random, in which case we can take advantage of what the implementation actually does (especially when we're the ones doing the implementation), as opposed to the behavior of the Abstract Machine written by some eggheads on the Standards Committee.
You absolutely care about the way compiler & OS writers interpret "undefined behavior" (as "It is a bug for any program to do this") because they do not care about whether any potential transformations preserve what the output is 'supposed' to be for buggy programs.

The exact meaning of 'unspecified value' and whether repeated reads to them must return the same value has practical concerns here. If it does, Facebook's code is correct and jemalloc is buggy. If it doesn't, Facebook's code is buggy and jemalloc is correct.

pseudorandom name
May 6, 2007

Reading from memory outside of the bounds returned by malloc or new is undefined behavior and reading memory that has never been written is also undefined behavior. The compiler is perfectly allowed to put an abort() where such reads occurred. Sometimes compiler writers are assholes who actually do this kind of thing, and that's why programmers want to stab them.

In this situation, reading outside the bounds of what was previously written is a perfectly reasonable thing to do, because we know how computers actually work even if compiler writers are confused by the concept. We know that memory is sub-divided into pages, we know that it is safe to read anywhere within the bounds of a page if that page is mapped, even if we have never written any data to some or all parts of that page. It will naturally contain *some* value, by virtue of how computers work, even if the eggheads say doing this is undefined. We're the ones defining what actually happens, after all.

The bug here doesn't have anything to do with undefined behavior, it was a simple failure to take into account the possibility that MADV_DONTNEED could be in use and that it would cause data that we own to spontaneously change in value even if the program was correct and nothing else ever wrote to it.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Debugging that must have been an intense experience.

cheetah7071
Oct 20, 2010

honk honk
College Slice
I like the joke compiler that starts nethack any time a program does undefined behavior

Foxfire_
Nov 8, 2010

You (& the facebook person that wrote that code) have adopted the view that reading from an unspecified never-been-written location must always return the same value. jemalloc's authors adopted the view that reading from it may return different values each read. Both views are potentially reasonable.

If only there was some sort of consensus document that picked one or the other so that we could write code or change libraries without digging into all of the implementation details of every thing in the system.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Odds are pretty good that Jason wrote that jemalloc code while at Facebook, so really they did this to themselves.

pseudorandom name
May 6, 2007

Foxfire_ posted:

You (& the facebook person that wrote that code) have adopted the view that reading from an unspecified never-been-written location must always return the same value. jemalloc's authors adopted the view that reading from it may return different values each read. Both views are potentially reasonable.

If only there was some sort of consensus document that picked one or the other so that we could write code or change libraries without digging into all of the implementation details of every thing in the system.


No, I adopted the view that reading from a mapped page will return the value at that offset in that page, and the value can change without any writes to that location if the underlying page mapping changes.

The language spec isn't even aware that pages exist, and the twist here is that the jemalloc implementation can randomly cause the page mapping to change unexpectedly without any system calls taking place.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
IIRC there are kinds of memory hardware that behave this way — they don’t really hold a reliable value until they process a store.

Anyway, don’t blame the standard here, jemalloc’s behavior here is clearly legal and this code is clearly wrong. It’s not the standard’s fault that some programmer “knew better” but then didn’t.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Anyway, if I ever needed to optimize c_str, I’d just start fixing the interfaces that required a nul-terminated string to take a pointer and length instead, and I wouldn’t stop until I no longer cared about c_str.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

rjmccall posted:

Anyway, if I ever needed to optimize c_str, I’d just start fixing the interfaces that required a nul-terminated string to take a pointer and length instead, and I wouldn’t stop until I no longer cared about c_str.
Fix them to take a string_view, you mean. (Even in C the function could take an equivalent struct, right?)

Computer viking
May 30, 2011
Now with less breakage.

I did once handcode my own string view style structs in plain C to work with subsets of a mmap-ed file, and that felt easy enough - so yeah, you can absolutely do it in C if you like.

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

roomforthetuna posted:

Fix them to take a string_view, you mean. (Even in C the function could take an equivalent struct, right?)

code:

struct string_view {
    char *ptr;
    unsigned int len;
};

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

leper khan posted:

code:
struct string_view {
    char *ptr;
    unsigned int len;
};

For small strings at low addresses you could embed the length in the upper values of the pointer, this struct is not sufficiently complicated I’m sorry

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Sweeper posted:

For small strings at low addresses you could embed the length in the upper values of the pointer, this struct is not sufficiently complicated I’m sorry

It's a struct. To do the thing you're talking about you'd need a union to support multiple types.

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.
Err.. union wouldn't bit pack that way. Not sure it's well represented without macros in C

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

leper khan posted:

Err.. union wouldn't bit pack that way. Not sure it's well represented without macros in C
C code:
typedef union {
    struct sizepacked {
        uint32_t size : 8;
        uint32_t str : 24;
    } sp;
    char* ptr;
} string_view_smalladdr;
?

you'd have to do some casting but you could put that in helper functions. also not sure if you really get anything out of this "optimization" because either it has to take up the same space as a pointer-size pair or you always have to pointer-pass it and dynamically cast

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
It's a joke. Varying the size of the struct based on the runtime contents is not something that actually works - consider any case where a struct is currently in the "small" state, and then someone writes a new value that requires it to change to the "big" state, and how you would accomplish that if that full size hadn't been allocated and kept free for use from the beginning.

It's also a knowing reference to other things you run into in C and C++, like bitpacking flags into unused pointer bits and the short-string optimizations commonly seen in C++ implementations.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
I have actually encountered a scenario where I had views into strings that were known to always be less than 4095 bytes long and so I could encode the string's length into the extra bits of the pointer.

It ended up being slower than just doing sane things so it thankfully didn't ship.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Plorkyeran posted:

I have actually encountered a scenario where I had views into strings that were known to always be less than 4095 bytes long and so I could encode the string's length into the extra bits of the pointer.

It ended up being slower than just doing sane things so it thankfully didn't ship.
It's not *totally* unreasonable. If you're memory/storage/network constrained and not processing-speed constrained then doing bonkers packing things makes sense. But that's pretty rare circumstances these days, maybe some sort of very unusual embedded context.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Making data more cache friendly can have pretty dramatic performance benefits on basically all modern platforms. In this case I just failed at profiling. It was spending lots of time on cache misses and had lots of arrays that were between one and two cachelines in size, so making them all fit in one would be beneficial, right?

It then turned out that we basically never actually made it to the end of the longer arrays so eliminating a cache miss when reading the last few elements wasn't useful. Also basically all the cache misses were on the string data being pointed to.

Beef
Jul 26, 2004
Prefetch is your friend, guy

Computer viking
May 30, 2011
Now with less breakage.

I was vaguely remembering that Java did something weird with pointers, and looking for it I ran into this lovely quote from the parallel universe that is Java internals:

quote:

One, possibly compressed, Klass word, which represents a pointer to class metadata. Before Java 7, they were pointing to the Permanent Generation, but from the Java 8 onward, they are pointing to the Metaspace
(From here)

I, too, look forward to the migration from the permanent generation to the metaspace.

As for Java pointers, it seems that by default they align their objects to 64 bits, and since that means all pointers end in (at least) three zeroes, they shift them three bits to the right and get 35 bits of useful addressing in 32 bits of what they refer to as compressed pointers. Honestly more reasonable than I remembered.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination. Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up, what's the cleanest way to run it?

I've got a bunch of hosed up scientific computing code that's been depending on transitive includes and specifically ordered includes for years, and I'd like a tool assisted way to rapidly unfuck that problem.

Also, does anyone have a textbook recommendation for cmake? I've just been fumbling my way around, and most of what's out there seems to use old, deprecated ways of doing things and I'd love to get up to date with modern cmake.

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Twerk from Home posted:

Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination. Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up, what's the cleanest way to run it?

I've got a bunch of hosed up scientific computing code that's been depending on transitive includes and specifically ordered includes for years, and I'd like a tool assisted way to rapidly unfuck that problem.

Also, does anyone have a textbook recommendation for cmake? I've just been fumbling my way around, and most of what's out there seems to use old, deprecated ways of doing things and I'd love to get up to date with modern cmake.

CLion marks unused includes, but doesn't note transitive includes (at least not by default)

Not sure where it's scraping the analysis from.

I usually just FAFO with cmake until things work. Then I forget everything about it again.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Twerk from Home posted:

Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination. Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up, what's the cleanest way to run it?

I've got a bunch of hosed up scientific computing code that's been depending on transitive includes and specifically ordered includes for years, and I'd like a tool assisted way to rapidly unfuck that problem.

Also, does anyone have a textbook recommendation for cmake? I've just been fumbling my way around, and most of what's out there seems to use old, deprecated ways of doing things and I'd love to get up to date with modern cmake.

Google made a tool for this called “include what you use” which seems to still be maintained, probably because it was pushed by a very senior employee who didn’t seem to have much else going on. Very nice guy.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

rjmccall posted:

Google made a tool for this called “include what you use” which seems to still be maintained, probably because it was pushed by a very senior employee who didn’t seem to have much else going on. Very nice guy.

Thanks!! This is exactly up my alley. The first time I read your post, I thought you were sarcastically telling me that Google engineers were too smart to make this mistake, and a specific person would yell at people who did.

https://github.com/include-what-you-use/include-what-you-use

This is going to be a time saver once I get it set up. These kinds of problems exist in a ton of the software we use, and I've already found 3 projects so far that got blown up by a change in GCC 11 which means that including <algorithm> no longer includes <limits>, among other things. Bugs like this are what made conda-forge decide to stay on a CentOS 6 sysroot instead of risking the change to CentOS 7.

I had been poking at things manually with g++ -H, which was already finding lots of places that have only been working due to the order of #includes.

Edit: Sweet, it's even packaged in Debian. Thanks LLVM team! https://packages.debian.org/bullseye/iwyu

Twerk from Home fucked around with this message at 04:33 on Aug 30, 2022

Xarn
Jun 26, 2015
IWYU is useful, but takes too much manual shepherding for me. It doesn't properly model the stdlib, so it will tell you to include <iosfwd> to get size_t, to include bits/foo (an internal libstdc++ header), and so on.

My project also doesn't use Google's include style, so for every #include <my-project/foo/bar.hpp> it will spam out a suggestion to remove it because it is superfluous, oh and also to add the missing #include "my-project/foo/bar.hpp".

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Xarn posted:

IWYU is useful, but takes too much manual shepherding for me. It doesn't properly model the stdlib, so it will tell you to include <iosfwd> to get size_t, to include bits/foo (an internal libstdc++ header), and so on.

Wow, you were not kidding about this. There's no way that this is how it's supposed to be working, so I'm going to read docs / config a bit in the hopes of being able to use its automated fixup tools on a whole lot of projects. There must be a way to make it understand libstdc++'s public API.

Its current insane suggestions are:
code:
#include <__algorithm/copy.h>            // for copy
#include <__algorithm/find.h>            // for find
#include <__algorithm/max.h>             // for max
#include <__algorithm/min.h>             // for min
#include <__functional/operations.h>     // for plus
#include <__memory/unique_ptr.h>         // for unique_ptr
#include <__utility/forward.h>           // for forward
#include <__utility/integer_sequence.h>  // for index_sequence, make_index_s...
#include <__string>                      // for char_traits<>::pos_type
#include <__tuple>                       // for tuple_size
It did find a whole bunch of missing includes as well though!

Maybe this is a mismatch of iwyu and my local clang environment? I'm on clang-13 at the moment, but using iwyu 0.18. I'll try fixing that first.

Twerk from Home fucked around with this message at 15:58 on Aug 30, 2022

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Isn't there a mapping file for the stdlib that tells IWYU which headers are the public ones that should be included vs. which ones are implementation details that should just be transitively included via the public headers?

giogadi
Oct 27, 2009

I do dev on an old laptop and my compilation times in c++ are just excruciating. The worst cases of course are if I change a declaration in a header file that a lot of translation units refer to, which causes all those files to recompile.

Just thinking aloud: the alternative to using a common header file would be to define those util functions as free functions with external linkage and then let the linker handle it instead. I realize there are issues with this approach, but it would probably mean way fewer recompiles when those util functions change, right? Or does the time saved in compilation get eaten up at link time or something?

pseudorandom name
May 6, 2007

If you are actually changing the declarations then that won't help you, because the declarations matter to everything that uses them.

If you meant to say definitions (for inline functions), then, yes, absolutely.

giogadi
Oct 27, 2009

Sorry, by “changing declarations” I meant harmless stuff like adding new symbols or deleting unused ones. Even changing an existing declaration would only require recompiling the files that use that declaration, which would already be a huge win (theoretically) for recompile times

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Only if you religiously put every function declaration in its own header which you only include from exactly the files that will use it.

C build systems and compilers are not designed to avoid work at a granularity less than “this file or something it includes changed, so rebuild it completely”.

Absurd Alhazred
Mar 27, 2010

by Athanatos

rjmccall posted:

Only if you religiously put every function declaration in its own header which you only include from exactly the files that will use it.

C build systems and compilers are not designed to avoid work at a granularity less than “this file or something it includes changed, so rebuild it completely”.

MSVC will sometimes not recompile things entirely if the changes you made were insignificant enough, but that process has its own overhead.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Need -f function-sections but for editors.

giogadi
Oct 27, 2009

Sorry, I think I was being unclear. This is what I'm suggesting:

code:
// foo.cpp
int HowdyPardner() {
    return 5;
}
code:
// main.cpp
extern int HowdyPardner();

int main() {
    int x = HowdyPardner();
    return x;
}
This avoids the use of a header file for main.cpp to use the functions in foo.cpp. If I add more functions to foo.cpp, it won't require recompiles of main.cpp. Like I said, this of course has issues, like requiring me to keep the extern declarations in sync with the definitions. My question is: are there any other downsides to this approach that I'm not aware of? Obviously it becomes more problematic with lots of users of the code in foo.cpp, because if a function signature changes, I'd have to go through and change not only call sites but also extern declarations.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
A couple of solutions to this, in decreasing sanity order would be:

1. Finer grained headers. It sounds like you may be growing a gigantic header?
2. The module system introduced in C++20, specifically designed to fix this: https://en.cppreference.com/w/cpp/language/modules
3. Just locally forward declare things that you don't need to know the full signature of.


giogadi
Oct 27, 2009

Hmmmm modules indeed look like what I really want. I’ve heard people complain about them so I haven’t looked at them yet, but it might be worth at least experimenting with them a bit
