|
You're treating "undefined" the way compiler writers treat "undefined", as if it were some kind of inscrutable holy concept and Thou Shalt Not, as opposed to the way that actual programmers treat "undefined", meaning "this is the precise behavior of the implementation", which may be unpredictable and random based on circumstances because we aren't doing anything to control that randomness, but may on the other hand be completely predictable and utterly non-random, and we can take advantage of what the implementation actually does (especially when we're the ones doing the implementation), as opposed to the behavior of the Abstract Machine written by some eggheads on the Standards Committee.

I don't know why they're doing if data[size()] != '\0' data[size()] = '\0', instead of just unconditionally writing the 0. I skipped the beginning of the video, but I assume they're trying to avoid invalidating cache lines on other CPU cores via unnecessary writes -- if there is already a 0 at that memory location then they don't need to write a 0 at that location again, which means the current CPU core doesn't need to take exclusive ownership of that cache line and cause every other CPU core to evict that line from their cache.

And I don't think he describes the scenario correctly, because it seems like he's talking about a situation where the second page is freed using madvise(MADV_DONTNEED), i.e. the kernel is informed that the process doesn't need that page anymore, and enough happens on the kernel side that future reads from that page end up coming from the zero page, but when the process gets around to writing to that page again the original page still hasn't been fully freed and the kernel just slots that original page back in, non-zero data and all. But in order for jemalloc to mark a page as DONTNEED, it'd have to not be used by anything in the first place, including this std::string which ostensibly jemalloc knows spans pages A and B. 
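The conditional terminator write under discussion, as a stripped-down sketch (struct and member names are invented here; this is not the actual fbstring/libstdc++ code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the optimization being discussed: read the
// terminator slot first, and only store a 0 if one isn't there already.
// Skipping the store keeps this core from taking exclusive ownership of
// the cache line, so other cores reading the same shared string keep
// their cached copies.
struct LazyTerminated {
    char* data;
    std::size_t size;

    const char* c_str() {
        if (data[size] != '\0')   // usually already 0: no write, no invalidation
            data[size] = '\0';    // write only when actually needed
        return data;
    }
};
```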
The situation I think he's attempting to describe is as follows:

1) jemalloc allocates page B, the program writes data to it, then the program frees the data in page B and jemalloc marks it as DONTNEED because no allocations are using it.
2) The kernel wires the zero page in at page B, but never gets around to actually freeing the original page B.
3) The program allocates the std::string and just the right circumstances result in the string spanning pages A and B, but the string data is too short for it to write to page B.
4) c_str() gets called, the std::string reads the first byte of page B and sees that it is 0, so it doesn't bother to write the 0 byte, per the aforementioned cache optimization.
5) The char* gets passed around the program, assumed to be safely NUL terminated.
6) The program mallocs more memory, and jemalloc returns the unused portion of page B.
7) The program writes to the newly-allocated memory in page B, a page fault happens, the kernel does its bookkeeping and sees that page B was marked DONTNEED but is still around, and it puts the original page B back into that location.
8) The original page B had a non-zero value at byte offset zero due to it being used to store data in step 1, so that cache optimization back in step 4 was wrong.
9) Now that char* isn't NUL terminated.

pseudorandom name fucked around with this message at 19:32 on Aug 23, 2022 |
# ? Aug 23, 2022 19:29 |
|
|
|
pseudorandom name posted:I don't know why they're doing if data[size()] != '\0' data[size()] = '\0', instead of just unconditionally writing the 0. I skipped the beginning of the video, but I assume they're trying to avoid invalidating cache lines on other CPU cores via unnecessary writes -- if there is already a 0 at that memory location then they don't need to write a 0 at that location again, which means the current CPU core doesn't need to take exclusive ownership of that cache line and cause every other CPU core to evict that line from their cache.

This is apparently exactly what was happening; the build group was calling c_str() on const strings shared by many threads, so the unconditional write was murdering performance.

quote:The situation I think he's attempting to describe is as follows:

This looks like a flow that could actually cause it with the specific behaviors of jemalloc on Linux! The Stack Overflow post I linked mentions that this bug couldn't happen if allocating the string is the first time page B is being used, because jemalloc returns 0-ed pages anyway right now, so to make the bug happen you'd need a page that was already being used and had some junk in it, and then the front of that page would have to be freed before being allocated as part of data.

quote:Now, jemalloc's only mmap() call uses the flags MAP_PRIVATE | MAP_ANONYMOUS with the occasional inclusion of MAP_FIXED, and in particular it doesn't use MAP_UNINITIALIZED. This means that pages are always zero-initialized when allocated.
|
# ? Aug 23, 2022 19:45 |
|
pseudorandom name posted:You're treating "undefined" the way compiler writers treat "undefined", as if it were some kind of inscrutable holy concept and Thou Shalt Not, as opposed to the way that actual programmers treat "undefined", meaning "this is the precise behavior of the implementation", which may be unpredictable and random based on circumstances because we aren't doing anything to control that randomness, but may on the other hand be completely predictable and utterly non-random, and we can take advantage of what the implementation actually does (especially when we're the ones doing the implementation), as opposed to the behavior of the Abstract Machine written by some eggheads on the Standards Committee.

The exact meaning of 'unspecified value', and whether repeated reads of one must return the same value, has practical consequences here. If it does, Facebook's code is correct and jemalloc is buggy. If it doesn't, Facebook's code is buggy and jemalloc is correct.
|
# ? Aug 23, 2022 19:59 |
|
Reading from memory outside of the bounds returned by malloc or new is undefined behavior and reading memory that has never been written is also undefined behavior. The compiler is perfectly allowed to put an abort() where such reads occurred. Sometimes compiler writers are assholes who actually do this kind of thing, and that's why programmers want to stab them. In this situation, reading outside the bounds of what was previously written is a perfectly reasonable thing to do, because we know how computers actually work even if compiler writers are confused by the concept. We know that memory is sub-divided into pages, we know that it is safe to read anywhere within the bounds of a page if that page is mapped, even if we have never written any data to some or all parts of that page. It will naturally contain *some* value, by virtue of how computers work, even if the eggheads say doing this is undefined. We're the ones defining what actually happens, after all. The bug here doesn't have anything to do with undefined behavior, it was a simple failure to take into account the possibility that MADV_DONTNEED could be in use and that it would cause data that we own to spontaneously change in value even if the program was correct and nothing else ever wrote to it.
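That "spontaneously change in value" behavior is easy to see directly; a Linux-specific sketch (error handling omitted, function name invented):

```cpp
#include <cstddef>
#include <sys/mman.h>

// Linux-specific demonstration: write to a private anonymous page, tell
// the kernel we don't need it via MADV_DONTNEED, then read it back. The
// read faults in a fresh zero page, so the value changes even though this
// program never stored anything else there.
char read_after_dontneed() {
    const std::size_t page = 4096;
    char* p = static_cast<char*>(mmap(nullptr, page, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    p[0] = 'x';                       // dirty the page
    madvise(p, page, MADV_DONTNEED);  // contents may now be discarded
    char c = p[0];                    // on Linux: reads 0 from the zero page
    munmap(p, page);
    return c;
}
```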
|
# ? Aug 23, 2022 20:08 |
|
Debugging that must have been an intense experience.
|
# ? Aug 23, 2022 20:27 |
|
I like the joke compiler that starts nethack any time a program does undefined behavior
|
# ? Aug 23, 2022 20:32 |
|
You (& the facebook person that wrote that code) have adopted the view that reading from an unspecified never-been-written location must always return the same value. jemalloc's authors adopted the view that reading from it may return different values each read. Both views are potentially reasonable. If only there was some sort of consensus document that picked one or the other so that we could write code or change libraries without digging into all of the implementation details of every thing in the system.
|
# ? Aug 23, 2022 20:38 |
|
Odds are pretty good that Jason wrote that jemalloc code while at Facebook, so really they did this to themselves.
|
# ? Aug 23, 2022 20:44 |
|
Foxfire_ posted:You (& the facebook person that wrote that code) have adopted the view that reading from an unspecified never-been-written location must always return the same value. jemalloc's authors adopted the view that reading from it may return different values each read. Both views are potentially reasonable. No, I adopted the view that reading from a mapped page will return the value at that offset in that page, and the value can change without any writes to that location if the underlying page mapping changes. The language spec isn't even aware that pages exist, and the twist here is that the jemalloc implementation can randomly cause the page mapping to change unexpectedly without any system calls taking place.
|
# ? Aug 23, 2022 21:24 |
|
IIRC there are kinds of memory hardware that behave this way — they don’t really hold a reliable value until they process a store. Anyway, don’t blame the standard here, jemalloc’s behavior here is clearly legal and this code is clearly wrong. It’s not the standard’s fault that some programmer “knew better” but then didn’t.
|
# ? Aug 24, 2022 23:26 |
|
Anyway, if I ever needed to optimize c_str, I’d just start fixing the interfaces that required a nul-terminated string to take a pointer and length instead, and I wouldn’t stop until I no longer cared about c_str.
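In the spirit of that refactor, a consumer that needs no terminator at all (function name invented for illustration):

```cpp
#include <cstddef>

// Instead of `std::size_t count_spaces(const char* nul_terminated)`, take
// an explicit pointer and length; callers can pass string.data() and
// string.size() and never call c_str() at all.
std::size_t count_spaces(const char* data, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        n += (data[i] == ' ') ? 1 : 0;
    return n;
}
```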
|
# ? Aug 24, 2022 23:30 |
|
rjmccall posted:Anyway, if I ever needed to optimize c_str, I’d just start fixing the interfaces that required a nul-terminated string to take a pointer and length instead, and I wouldn’t stop until I no longer cared about c_str.
|
# ? Aug 25, 2022 00:26 |
|
I did once handcode my own string view style structs in plain C to work with subsets of a mmap-ed file, and that felt easy enough - so yeah, you can absolutely do it in C if you like.
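A minimal version of that kind of hand-rolled view, written C-style (all names here are invented; the actual structs in question weren't posted):

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// A plain-C string view: a pointer into someone else's buffer plus a
// length. It doesn't own the memory and doesn't need a NUL terminator,
// which is exactly what you want for slices of an mmap-ed file.
struct str_view {
    const char* data;
    size_t len;
};

static struct str_view sv_make(const char* s, size_t len) {
    struct str_view v = { s, len };
    return v;
}

static int sv_eq(struct str_view a, const char* s) {
    return strlen(s) == a.len && memcmp(a.data, s, a.len) == 0;
}
```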
|
# ? Aug 25, 2022 00:59 |
|
roomforthetuna posted:Fix them to take a string_view, you mean. (Even in C the function could take an equivalent struct, right?) code:
|
# ? Aug 25, 2022 01:01 |
|
leper khan posted:
For small strings at low addresses you could embed the length in the upper bits of the pointer, this struct is not sufficiently complicated, I’m sorry
|
# ? Aug 25, 2022 01:18 |
|
Sweeper posted:For small strings at low addresses you could embed the length in the upper bits of the pointer, this struct is not sufficiently complicated I’m sorry

It's a struct. To do the thing you're talking about you'd need a union to support multiple types.
|
# ? Aug 25, 2022 01:30 |
|
Err.. union wouldn't bit pack that way. Not sure it's well represented without macros in C
|
# ? Aug 25, 2022 01:31 |
|
leper khan posted:Err.. union wouldn't bit pack that way. Not sure it's well represented without macros in C C code:
you'd have to do some casting but you could put that in helper functions. also not sure if you really get anything out of this "optimization" because either it has to take up the same space as a pointer-size pair or you always have to pointer-pass it and dynamically cast
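For reference, the "length in the unused pointer bits" trick itself looks roughly like this (all names invented; assumes a typical x86-64 user space where addresses fit in 48 bits and strings are shorter than 2^16):

```cpp
#include <stdint.h>

// Pack a 16-bit length into the unused upper bits of a 48-bit pointer.
// This only works under the stated assumptions about the address space,
// which is part of why it's a joke / footgun rather than a library.
typedef uint64_t packed_view;

static packed_view pv_pack(const char* p, uint16_t len) {
    return ((uint64_t)len << 48) | (uint64_t)(uintptr_t)p;
}

static const char* pv_data(packed_view v) {
    // mask off the length to recover the original 48-bit pointer
    return (const char*)(uintptr_t)(v & (((uint64_t)1 << 48) - 1));
}

static uint16_t pv_len(packed_view v) {
    return (uint16_t)(v >> 48);
}
```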
|
# ? Aug 25, 2022 01:41 |
|
It's a joke. Varying the size of the struct based on the runtime contents is not something that actually works - consider any case where a struct is currently in the "small" state, and then someone writes a new value that requires it to change to the "big" state, and how you would accomplish that if that full size hadn't been allocated and kept free for use from the beginning. It's also a knowing reference to other things you run into in C and C++, like bitpacking flags into unused pointer bits, and the short-string optimizations commonly seen in C++ implementations.
|
# ? Aug 25, 2022 01:51 |
|
I have actually encountered a scenario where I had views into strings that were known to always be less than 4095 bytes long and so I could encode the string's length into the extra bits of the pointer. It ended up being slower than just doing sane things so it thankfully didn't ship.
|
# ? Aug 25, 2022 04:20 |
|
Plorkyeran posted:I have actually encountered a scenario where I had views into strings that were known to always be less than 4095 bytes long and so I could encode the string's length into the extra bits of the pointer.
|
# ? Aug 25, 2022 04:23 |
|
Making data more cache friendly can have pretty dramatic performance benefits on basically all modern platforms. In this case I just failed at profiling. It was spending lots of time on cache misses and had lots of arrays that were between one and two cachelines in size, so making them all fit in one would be beneficial, right? It then turned out that we basically never actually made it to the end of the longer arrays so eliminating a cache miss when reading the last few elements wasn't useful. Also basically all the cache misses were on the string data being pointed to.
|
# ? Aug 25, 2022 04:47 |
|
Prefetch is your friend, guy
|
# ? Aug 25, 2022 07:41 |
|
I was vaguely remembering that Java did something weird with pointers, and looking for it I ran into this lovely quote from the parallel universe that is Java internals:

quote:One, possibly compressed, Klass word, which represents a pointer to class metadata. Before Java 7, they were pointing to the Permanent Generation, but from the Java 8 onward, they are pointing to the Metaspace

I, too, look forward to the migration from the permanent generation to the metaspace. As for Java pointers, it seems that by default they align their objects to 64 bits, and since that means all pointers end in (at least) three zeroes, they shift them three bits to the right and get 35 bits of useful addressing in 32 bits of what they refer to as compressed pointers. Honestly more reasonable than I remembered.
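The encode/decode being described is just a shift each way; a sketch (function names invented, HotSpot calls the real mechanism "compressed oops"):

```cpp
#include <cstdint>

// With 8-byte object alignment the low 3 address bits are always 0, so a
// 35-bit address round-trips through a 32-bit "compressed" value: shift
// right by 3 on store, shift left by 3 on load.
std::uint32_t compress(std::uint64_t addr)   { return static_cast<std::uint32_t>(addr >> 3); }
std::uint64_t decompress(std::uint32_t comp) { return static_cast<std::uint64_t>(comp) << 3; }
```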
|
# ? Aug 25, 2022 10:56 |
|
Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination? Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up; what's the cleanest way to run it?

I've got a bunch of hosed up scientific computing code that's been depending on transitive includes and specifically ordered includes for years, and I'd like a tool-assisted way to rapidly unfuck that problem.

Also, does anyone have a textbook recommendation for cmake? I've just been fumbling my way around, and most of what's out there seems to use old, deprecated ways of doing things and I'd love to get up to date with modern cmake.
|
# ? Aug 29, 2022 21:20 |
|
Twerk from Home posted:Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination. Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up, what's the cleanest way to run it?

CLion marks unused includes, but doesn't note transitive includes (at least not by default). Not sure where it's scraping the analysis from.

I usually just FAFO with cmake until things work. Then I forget everything about it again.
|
# ? Aug 29, 2022 21:41 |
|
Twerk from Home posted:Does anyone have recommendations for a static analysis tool that can help identify when transitive #includes are being used within a file, and also mark unused #includes for elimination. Maybe clang-tidy does this? I'll give that a shot to start. I haven't run clang-tidy in a while and remember it being a bit finicky to set up, what's the cleanest way to run it? Google made a tool for this called “include what you use” which seems to still be maintained, probably because it was pushed by a very senior employee who didn’t seem to have much else going on. Very nice guy.
|
# ? Aug 29, 2022 23:42 |
|
rjmccall posted:Google made a tool for this called “include what you use” which seems to still be maintained, probably because it was pushed by a very senior employee who didn’t seem to have much else going on. Very nice guy.

Thanks!! This is right up my alley. The first time I read your post, I thought you were sarcastically telling me that Google engineers were too smart to make this mistake, and a specific person would yell at people who did. https://github.com/include-what-you-use/include-what-you-use

This is going to be a time saver once I get it set up. These kinds of problems exist in a ton of the software we use, and I've already found 3 projects so far that got blown up by a change that went into GCC 11, which means that including <algorithm> no longer includes <limits>, among other things. Bugs like this are what made conda-forge decide to stay on a CentOS 6 sysroot instead of risking the change to CentOS 7. I had been poking at things manually with g++ -H, which was already finding lots of places that have only been working due to the order of #includes.

Edit: Sweet, it's even packaged in Debian. Thanks LLVM team! https://packages.debian.org/bullseye/iwyu

Twerk from Home fucked around with this message at 04:33 on Aug 30, 2022 |
# ? Aug 30, 2022 04:30 |
|
IWYU is useful, but takes too much manual shepherding for me. It doesn't properly model the stdlib, so it will tell you to include <iosfwd> to get size_t, to include bits/foo (an internal libstdc++ header), and so on. My project also doesn't use Google's include style, so for every #include <my-project/foo/bar.hpp> it will spam out a suggestion to remove it because it is superfluous, oh, and also to add the missing #include "my-project/foo/bar.hpp".
|
# ? Aug 30, 2022 05:23 |
|
Xarn posted:IWYU is useful, but takes too much manual shepherding for me. It doesn't properly model stdlib, so you will have it tell you to include <iosfwd> to get size_t, include bits/foo (internal libstdc++ header) and so on.

Wow, you were not kidding about this. There's no way that this is how it's supposed to be working; I'm going to read docs / config a bit in the hopes of being able to use its automated fixup tools on a whole lot of projects. There must be a way to make it understand libstdc++'s public API. Its current insane suggestions are: code:
Maybe this is a mismatch of iwyu and my local clang environment? I'm on clang-13 at the moment, but using iwyu 0.18. I'll try fixing that first. Twerk from Home fucked around with this message at 15:58 on Aug 30, 2022 |
# ? Aug 30, 2022 15:49 |
|
Isn't there a mapping file for the stdlib that tells IWYU which headers are the public ones that should be included vs. which ones are implementation details that should just be transitively included via the public headers?
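The mapping-file format being referred to is IWYU's .imp files; illustrative entries in that shape (the real libstdc++ mappings ship with IWYU itself, so these exact lines are just an example):

```
[
  # private header -> the public header that should be included instead
  { include: ["<bits/stl_algo.h>", private, "<algorithm>", public] },
  # a symbol can also be mapped straight to its public home
  { symbol: ["std::size_t", private, "<cstddef>", public] }
]
```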
|
# ? Aug 30, 2022 15:58 |
|
I do dev on an old laptop and my compilation times in C++ are just excruciating. The worst cases of course are if I change a declaration in a header file that a lot of translation units refer to, which causes all those files to recompile. Just thinking aloud: the alternative to using a common header file would be to define those util functions as free functions with external linkage and then let the linker handle it instead. I realize there are issues with this approach, but it would probably mean way fewer recompiles when those util functions change, right? Or does the time saved in compilation get eaten up at link time or something?
|
# ? Sep 2, 2022 19:01 |
|
If you actually are changing the declarations then that won't help you, because the declarations matter to everything that uses them. If you meant to say definitions (for inline functions), then, yes, absolutely.
|
# ? Sep 2, 2022 19:09 |
|
Sorry, by “changing declarations” I meant harmless stuff like adding new symbols or deleting unused ones. Even changing an existing declaration would only require changing the files that use that declaration, which would already be a huge win (theoretically) for recompile times
|
# ? Sep 2, 2022 19:12 |
|
Only if you religiously put every function declaration in its own header which you only include from exactly the files that will use it. C build systems and compilers are not designed to avoid work at a granularity less than “this file or something it includes changed, so rebuild it completely”.
|
# ? Sep 2, 2022 19:19 |
|
rjmccall posted:Only if you religiously put every function declaration in its own header which you only include from exactly the files that will use it. MSVC will sometimes not recompile things entirely if the changes you made were insignificant enough, but that process has its own overhead.
|
# ? Sep 2, 2022 19:21 |
|
Need -ffunction-sections but for editors.
|
# ? Sep 2, 2022 19:22 |
|
Sorry, I think I was being unclear. This is what I'm suggesting:code:
code:
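The idea, sketched with invented names: the consumer writes its own local declaration instead of including a shared header, so edits to other helpers never touch this translation unit.

```cpp
// util.cpp: defines the helper; no widely-included header has to change
// when helpers are added or removed here.
int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// consumer.cpp: declares just the one function it calls (external linkage
// by default) instead of including a util.h that everything else sees.
// (Both files are mashed into one listing here so it compiles standalone.)
int clamp_to_byte(int v);

int scale_pixel(int v) { return clamp_to_byte(v * 2); }
```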
|
# ? Sep 2, 2022 19:28 |
|
A couple of solutions to this, in decreasing sanity order, would be:

1. Finer grained headers. It sounds like you may be growing a gigantic header?
2. The module system introduced in C++20, specifically designed to fix this: https://en.cppreference.com/w/cpp/language/modules
3. Just locally forward declare things that you don't need to know the full signature of.
|
# ? Sep 2, 2022 19:30 |
|
|
|
Hmmmm modules indeed look like what I really want. I’ve heard people complain about them so I haven’t looked at them yet, but it might be worth at least experimenting with them a bit
|
# ? Sep 2, 2022 19:39 |