|
Adhemar posted:This is routinely done in the HFT space, but it’s possible because the exchange protocol spec makes strong guarantees about how the wire data is laid out. It’s just a matter of writing/generating your structs to match. C++ does *not* make sufficiently strong guarantees about how types are laid out.
|
# ? Jul 14, 2019 04:06 |
|
|
Ralith posted:C++ does *not* make sufficiently strong guarantees about how types are laid out.
|
# ? Jul 14, 2019 04:46 |
|
Jeffrey of YOSPOS posted:Right, but it's still what you do if you don't want to copy the data. You do have to actually know what your compiler does. It's not like attribute((packed)) doesn't exist, spec or no spec.
|
# ? Jul 14, 2019 05:35 |
|
Ralith posted:It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing. I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine
|
# ? Jul 14, 2019 06:06 |
|
Qwertycoatl posted:I don't know what HFT codebases look like, but in my field (embedded) -fno-strict-aliasing is routine
|
# ? Jul 14, 2019 07:35 |
|
I overlaid structs for years using reinterpret_cast when I worked in HFT, never had issues. I remember having to use #pragma pack. Things are simpler when you have 100% control over your compiler, OS, and hardware. You’re not writing general purpose code that has to run on any permutation of those. Speed is the only thing that matters.
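For concreteness, the overlay style being described might look something like this (the message layout and field names are invented for illustration, not from any real exchange spec):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical packed wire message; layout is made up for illustration.
#pragma pack(push, 1)
struct NewOrder {
    std::uint16_t msg_type;
    std::uint64_t order_id;
    std::uint32_t qty;
    std::int64_t  price;
};
#pragma pack(pop)

static_assert(sizeof(NewOrder) == 22, "packed wire layout, no padding");

// The overlay under discussion: UB per the standard (aliasing and
// alignment rules), but workable when compiler/OS/hardware are fixed.
inline const NewOrder* overlay(const char* buf) {
    return reinterpret_cast<const NewOrder*>(buf);
}
```

Per the standard the cast is UB on both aliasing and alignment grounds; it just happens to behave when you control the entire toolchain, which is the point being made.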
|
# ? Jul 14, 2019 07:40 |
|
speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on
|
# ? Jul 14, 2019 07:59 |
|
Ralith posted:It is absolutely not the only way to avoid copying data; it's just a particularly lazy one. See e.g. the flyweight pattern I discussed a few posts ago for an alternative. Unpredictable memory layout isn't the only danger here either; you're most likely violating strict aliasing.

Jabor posted:speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on

Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?
|
# ? Jul 14, 2019 14:46 |
|
Ralith posted:If you want safe binary deserialization without copying your entire object, you can use the flyweight pattern, i.e. an object that wraps a pointer to your buffer and has accessors to pull out individual fields. That's how most encoding scheme tooling like flatbuffers or capnp work. Downside is lots of boilerplate if you're not using tools to generate it for you, upside is it's defined behavior and you can lay things out on the wire differently than a C++ struct. Over here is a discussion that said most of what has been said in this thread, where even the commenters mostly seem to agree that using memcpy is the least disastrous option. :/
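A minimal sketch of that accessor-wrapper shape (the layout, a u64 id at offset 0 and a u32 quantity at offset 8, is made up):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// A trivially copyable view holding one pointer; accessors memcpy fields
// out of the buffer at fixed wire offsets, so no aliasing violation.
class OrderView {
public:
    explicit OrderView(const char* buf) : buf_(buf) {}

    std::uint64_t order_id() const { return read<std::uint64_t>(0); }
    std::uint32_t qty()      const { return read<std::uint32_t>(8); }

private:
    template <typename T>
    T read(std::size_t offset) const {
        T out;
        std::memcpy(&out, buf_ + offset, sizeof out);  // defined behavior
        return out;
    }
    const char* buf_;  // the view owns nothing; many views can share one buffer
};
```

The memcpy compiles down to a plain load on mainstream compilers, so the wrapper costs nothing at runtime.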
|
# ? Jul 14, 2019 15:31 |
|
Jabor posted:speed is the only thing that matters, which is why i turn off optimizations instead of writing stuff that's just as fast but also works with the optimizations turned on You’re conflating posters, I never said anything about turning off optimizations. Anyway, I was only talking about a very specific use case. In general I wouldn’t recommend doing this as the performance gain is rarely worth the complexity.
|
# ? Jul 14, 2019 17:36 |
|
Subjunctive posted:Flyweight seems great if you’re in C++, but pretty cumbersome if you are working in C without member functions, and where I’m not sure the compiler can/will boil away the extra indirection. Have you seen a version of it work for free in plain-if-modern C?

roomforthetuna posted:Do you have a reference for what you mean by flyweight? I found this which seems to be talking about something quite different involving caches and such. I think you've probably explained what you meant sufficiently inline in the sentence, I'm more just questioning whether the term "flyweight" actually refers to what you described.

Adhemar posted:You’re conflating posters, I never said anything about turning off optimizations.
|
# ? Jul 14, 2019 22:03 |
|
Ralith posted:It might be an unconventional use of the term, but I think it's consistent with the definition given on Wikipedia. In particular, you're sharing the underlying buffer between arbitrarily many flyweight objects in the case of e.g. nested structures. If anyone can think of a clearer term, I'm all ears.

The clearer term for me would have been if you had just not said "flyweight pattern" and only said the sentence that described what you meant, which you said anyway. I couldn't find any term for the "accessors into buffer" pattern though. I did find one person saying "I'm not a comp-sci person but I think cap'n proto is flyweight", while searching for what the cap'n proto guy himself would call this pattern.

Edit: Thinking about it a bit more, I think this is kind of the converse of the flyweight pattern as described by Wikipedia. That's about having many small objects referring to a set of static-ish immutable larger objects (e.g. thousands of 7-bit ASCII characters being used to refer to <128 typeset glyphs), whereas this accessor thing is more like a small number of static immutable accessor objects (or sets of functions) that can each be used to operate upon any number of buffers. Anyway, it doesn't seem to be something that has a name.

roomforthetuna fucked around with this message at 22:50 on Jul 14, 2019 |
# ? Jul 14, 2019 22:38 |
|
Ralith posted:Violating strict aliasing safely necessarily requires disabling the optimizations that rely on it. I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time.
|
# ? Jul 14, 2019 22:48 |
|
It's nice to have a shorthand, because it's a widely useful pattern (and indeed applies to encoding messages as well as decoding them). We're trying to avoid repetition in the sense of not making copies of the data in the buffer, but instead exposing zero cost wrappers that kinda act like a copy. But it's definitely a bit of a stretch, I agree. Maybe a dedicated term needs to be coined.

roomforthetuna posted:

To clarify, I'm talking about objects containing a single pointer data member and with many accessor methods. You could have large numbers of these referencing the same buffer, since they're trivially copyable and might refer to different parts of the buffer. Ralith fucked around with this message at 22:54 on Jul 14, 2019 |
# ? Jul 14, 2019 22:48 |
|
Adhemar posted:I never violated it because I’m only aliasing between char* (the socket buffer) and integers of various sizes (the message fields). To my knowledge this is allowed. At least it was never an issue using the compilers I used at the time. It's (in certain cases) okay to cast a valid T* to a char*, but not vice versa. UB is often difficult to detect; that's why it's so insidious.
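A sketch of the two directions (function names are just for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The allowed direction: inspecting an object's representation via
// unsigned char* / char* is blessed by the aliasing rules.
unsigned char first_byte(const std::uint32_t& v) {
    return reinterpret_cast<const unsigned char*>(&v)[0];  // OK
}

// Going the other way (char buffer -> T) through reinterpret_cast is the
// UB case; memcpy expresses the same load with defined behavior, and
// compilers lower it to a single mov anyway.
std::uint32_t load_u32(const char* buf) {
    std::uint32_t v;
    std::memcpy(&v, buf, sizeof v);
    return v;
}
```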
|
# ? Jul 14, 2019 22:50 |
|
Ralith posted:It's (in certain cases) okay to cast a valid T* to a char*, but not vis versa. UB is often difficult to detect; that's why it's so insidious. How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.
|
# ? Jul 15, 2019 02:58 |
|
hmm, sounds like the C standard should be updated to solve this extremely serious flaw in the language design. after all it's a tool that is supposed to make machines easier to program, not harder, so not supporting memory overlays is a pretty big problem.
|
# ? Jul 15, 2019 03:29 |
|
my god. considering the ubiquity of embedded devices it would be shocking if the primary language used to program them was so grossly deficient
|
# ? Jul 15, 2019 03:32 |
|
Adhemar posted:How is code using the Flyweight pattern implemented then? Seems you would have the same problem there.
|
# ? Jul 15, 2019 05:55 |
|
So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned. For example, is there a difference in layout between code:
code:
roomforthetuna fucked around with this message at 15:17 on Jul 16, 2019 |
# ? Jul 16, 2019 15:12 |
|
roomforthetuna posted:So if you're doing the not-quite-flyweight thing of having a buffer and a bunch of accessors and using memcpy in the accessors to avoid problems with strict aliasing, is there a recommended way to make sure the buffer is aligned to allow for maximum optimization? Will things just naturally fall on well aligned boundaries if you don't actively go out of your way to ruin it? I mean assuming your data structure itself is designed with padding so values are internally aligned. Aligning the buffer itself is easy; C++ has the alignas specifier since C++11. char arrays in larger structures can easily be unaligned without that, because if you were using them just as char arrays there'd be no benefit. There may be cases where you additionally want to specify the alignment of the pointer into the buffer to the compiler, e.g. perhaps by saying "using aligned alignas(8) = char;" then using "aligned*" where appropriate. This doesn't seem to have any effect on x86 where unaligned loads Just Work, though. IIRC operator new always returns memory aligned to 8 bytes or better on most platforms, but to be sure you can use the C++17 version which has an alignment argument.
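Both options sketched out (buffer names invented; the heap version needs C++17):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// alignas on the buffer itself guarantees its start is aligned (C++11).
alignas(8) static char rx_buffer[256];

// C++17: operator new with an explicit alignment for heap buffers.
void* alloc_aligned(std::size_t n) {
    return ::operator new(n, std::align_val_t{64});
}

void free_aligned(void* p) {
    ::operator delete(p, std::align_val_t{64});
}
```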
|
# ? Jul 16, 2019 15:59 |
|
std::vector<bool> was a mistake.
|
# ? Jul 17, 2019 10:26 |
|
Ralith posted:This doesn't seem to have any effect on x86 where unaligned loads Just Work, though. That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses.
|
# ? Jul 17, 2019 12:22 |
|
Subjunctive posted:That’s not universally true, is it? I thought that some of the SIMD instructions would fault on unaligned addresses. Yeah, for example, the aligned XMM load/store variants (movaps and friends) require 16-byte (four 32-bit float) alignment and fault otherwise. What's nice is that libraries like GLM will choose the SSE-based implementation of matrix operations for properly aligned types.
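A quick illustration with the intrinsics (assumes an SSE-capable x86/x86-64 target):

```cpp
#include <cassert>
#include <immintrin.h>

// movaps (aligned) faults on a misaligned address; movups (unaligned)
// works anywhere, at worst a bit slower when crossing a cache line.
__m128 load_aligned(const float* p)   { return _mm_load_ps(p);  }  // p must be 16B aligned
__m128 load_unaligned(const float* p) { return _mm_loadu_ps(p); }  // any alignment
```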
|
# ? Jul 17, 2019 13:08 |
|
Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.
|
# ? Jul 17, 2019 18:49 |
|
rjmccall posted:Misalignment will also generally have some performance impact, although recent microarchitectures are a lot better at hiding this.
|
# ? Jul 17, 2019 18:51 |
|
I think the current recommendation with SSE/AVX is to always use unaligned loads, but I don’t remember why I believe this.
|
# ? Jul 17, 2019 19:00 |
|
If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned.
|
# ? Jul 17, 2019 19:11 |
|
rjmccall posted:If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned. Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable.
|
# ? Jul 17, 2019 20:03 |
|
rjmccall posted:If the processor supports both unaligned and alignment-enforced variants of an operation, then the latter is almost certainly just the former but with an extra check which triggers a trap. So there's no penalty for using the unaligned instruction. But that doesn't mean the operation won't be faster if the pointer is actually aligned. That is making me very angry.
|
# ? Jul 17, 2019 23:42 |
|
Well, if you’re going to trap on alignment violations, you have to check those bits, and if you check those bits and you have the circuitry to make the operation work when they’re nonzero then I guess you could have an operation which just ignores those bits completely, like ARM64 TBI but low bits, but hoo boy does that sound like nothing but a source of crazy bugs. It’s not like it would be faster.
|
# ? Jul 17, 2019 23:52 |
|
Oh, you also generally get weaker memory-ordering / atomicity guarantees with unaligned accesses, of course.
|
# ? Jul 17, 2019 23:53 |
|
Jeffrey of YOSPOS posted:Yeah I gotta imagine spanning a cache line still has performance issues even if other unaligned accesses are unnoticeable. The point is that the same code is generated regardless of whether you promise at compile time that the pointer will always be aligned. The dynamic alignment still matters, of course.
|
# ? Jul 18, 2019 02:56 |
|
I have a lovely library that returns raw pointers managed by the library through reference counting methods AddRef and RemRef. I want to wrap them in a smart pointer, preferably std::shared_ptr, and call AddRef/RemRef when copies are made/destructed. The latter seems easy enough with a custom deleter, but I'm struggling with making the right call to AddRef when it's copied.
|
# ? Jul 20, 2019 00:30 |
|
boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.
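A sketch of roughly those 10 lines, written against the AddRef/RemRef names from the post above (a real boost::intrusive_ptr also takes an add_ref flag so you can adopt an existing reference without bumping the count):

```cpp
#include <cassert>
#include <utility>

// Minimal intrusive smart pointer: the count lives in the pointee, so
// copies just call the library's own AddRef/RemRef.
template <typename T>
class RefPtr {
public:
    explicit RefPtr(T* p = nullptr) : p_(p) { if (p_) p_->AddRef(); }
    RefPtr(const RefPtr& o) : p_(o.p_)      { if (p_) p_->AddRef(); }
    RefPtr& operator=(RefPtr o) noexcept { std::swap(p_, o.p_); return *this; }
    ~RefPtr() { if (p_) p_->RemRef(); }

    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_;
};
```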
|
# ? Jul 20, 2019 01:38 |
|
Yes it is. Thanks!
|
# ? Jul 20, 2019 02:02 |
|
I will not have people besmirching the good name of COM!
|
# ? Jul 20, 2019 04:33 |
|
Plorkyeran posted:boost::intrusive_ptr or something like it is what you actually want. It's also like 10 lines of code to implement yourself.

I might suggest looking into my proposed (and currently on track for C++23) retain_ptr instead, since you don't have to worry about the "retain by default" semantic and having to pass "false" everywhere you want to adopt a pointer: https://github.com/slurps-mad-rips/retain-ptr/

There's a bug with the builtin mixin type, but just specialize a retain_traits<YourType> and you're good to go.
|
# ? Jul 20, 2019 06:29 |
|
You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time.
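A sketch of that scheme (assuming the same AddRef/RemRef library type as above):

```cpp
#include <cassert>
#include <memory>

// Hand the raw pointer to shared_ptr exactly once, with a deleter that
// releases the library's single reference. Copies are then tracked by
// shared_ptr's own count; the library count stays at 1 throughout.
template <typename T>
std::shared_ptr<T> adopt(T* raw) {
    return std::shared_ptr<T>(raw, [](T* p) { p->RemRef(); });
}
```

The cost, as noted below, is that shared_ptr allocates a separate control block, so you carry two reference counts.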
|
# ? Jul 20, 2019 12:37 |
|
|
Lime posted:You could use shared_ptr in this situation though, right? Custom deleters aren't called until the last shared_ptr destructs, so it doesn't matter that the shared_ptrs can't call AddRef when copied, just so long as the ref count is 1 when it first goes into the shared_ptr. You'd basically be nesting shared_ptr's ref count under the lovely library's ref count, which would just stay 1 the whole time. Bit of a waste, since shared_ptr allocates a control block. Now you've got two ref counts, *and* you're wasting a lot of extra space for just one pointer. 🤷♀️
|
# ? Jul 20, 2019 15:27 |