Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
It would be very bad if optimizer hints which the optimizer can't use produced warnings or errors. It would make supporting more than exactly one version of one compiler a nightmare.


roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Plorkyeran posted:

It would be very bad if optimizer hints which the optimizer can't use produced warnings or errors. It would make supporting more than exactly one version of one compiler a nightmare.
As a general thing that seems appropriate, but for "the compiler recognizes this token but this is explicitly documented as the wrong place for it" it seems like a warning would probably be more useful than ignoring it in case a different compiler might use the token there.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
I feel like there should be at least an optional diagnostic for that sort of mistake. It's not like it'd be the only warning that doesn't get turned on by -Wall.

You can just turn it off if you're compiling with multiple compiler versions and some of them don't understand a subset of your optimizer hints.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


A linter might catch that but I don't know if you actually want it to be an official warning.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

roomforthetuna posted:

As a general thing that seems appropriate, but for "the compiler recognizes this token but this is explicitly documented as the wrong place for it" it seems like a warning would probably be more useful than ignoring it in case a different compiler might use the token there.

[[likely]] and [[unlikely]] are part of the C++ standard, so another compiler doing something useful with them in places where clang does not is not exactly wildly implausible. Last I checked, each of the three major compilers does in practice have different rules for what the annotations actually do.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

You do get a warning and have the hints ignored if you give conflicting hints for the sides of a branch.
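For reference, a minimal sketch of the standard spelling (C++20; the function and values here are illustrative). The attribute goes on the statement after the condition, and marking both arms of the same if as [[likely]] is the conflicting case that gets diagnosed and ignored:

```cpp
// Consistent hints: the then-arm is marked hot, the else-arm cold.
// Annotating both arms [[likely]] instead would be contradictory, and
// compilers that notice (e.g. clang) warn and drop the hints.
int clamp_positive(int x) {
    if (x > 0) [[likely]] {
        return x;       // expected hot path
    } else [[unlikely]] {
        return 0;       // expected cold path
    }
}
```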

Xarn
Jun 26, 2015
[[likely]] and [[unlikely]] are complete rear end and their design sucks and should've never made it into the standard. :colbert:

Also C++ attributes are stupid, but that is a much longer and involved discussion.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

Xarn posted:

[[likely]] and [[unlikely]] are complete rear end and their design sucks and should've never made it into the standard. :colbert:

Also C++ attributes are stupid, but that is a much longer and involved discussion.

I’ll continue to use the builtin expect macros, which work great; not sure why I’d want to plop weird bracket things into my ifs instead, harder to read imo

Beef
Jul 26, 2004
You prefer writing `__attribute__((__foo__))` to `[[foo]]`?

Absurd Alhazred
Mar 27, 2010

by Athanatos
Macros are horrible and inconsistent and hard to parse.

Also here are 300+ additional uses of existing punctuation marks in weird ways to sometimes provide slight improvements to your code and output if your programmers bother learning about it and your toolchain gets to supporting it.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

Beef posted:

You prefer writing `__attribute__((__foo__))` to `[[foo]]`?

I prefer
code:
#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

if (likely(mything)) {}

qsvui
Aug 23, 2003
some crazy thing
Sheesh, at least use all caps for those

Xarn
Jun 26, 2015

Beef posted:

You prefer writing `__attribute__((__foo__))` to `[[foo]]`?

I assume this is about my assertion that C++ attributes are stupid in general. The issue isn't with the syntax (although it could've been better), the issue is that there isn't a real agreement on what attributes actually mean, and what should be their standard semantics.

The original design for attributes is that they should be ignorable and thus forward compatible, the idea being that if a compiler doesn't understand attributes from a newer standard, that's fine: code that is well-formed with the attribute should also be well-formed without it. But this means that it is much easier to add more attributes than to add keywords, and people started backdooring keywords in as attributes with some furious handwaving about how they are technically still attributes (and thus ignorable). This culminated with C++20's [[no_unique_address]], which really stretches the ignorability of attributes, because people want to use it as a replacement for costly EBO-based compressible-pair impls. But if your code can be used with a compiler where the attr might be ignored, then your code suddenly becomes a lot shittier, and likely violates a bunch of assumptions you've made about it.

Also, to make things really suck, MSVC understands but silently ignores [[no_unique_address]]; to get the correct behaviour, you need [[msvc::no_unique_address]]. The reason behind this boils down to ABI stability :ignorance:
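A small sketch of why the "ignorable" story breaks down here (type names are illustrative): an empty member annotated with [[no_unique_address]] can overlap other storage, which is exactly the EBO compressed-pair trick. A compiler that ignores the attribute still accepts the code, but the layout — and anything assuming it — silently changes.

```cpp
struct Empty {};  // stand-in for a stateless allocator

struct Pair {
    [[no_unique_address]] Empty alloc;  // may occupy no storage at all
    int value;
};

// On GCC/Clang, sizeof(Pair) == sizeof(int): the empty member overlaps.
// Where the attribute is quietly ignored (e.g. MSVC without
// [[msvc::no_unique_address]]), Pair grows and layout assumptions break.
```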

Xarn
Jun 26, 2015

Sweeper posted:

I prefer
code:
#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

if (likely(mything)) {}

Same, at least this version has well-defined semantics :v:

pseudorandom name
May 6, 2007

*ahem*

code:
#define likely(x)  __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

pseudorandom name posted:

*ahem*

code:
#define likely(x)  __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

How wasteful.
code:
#define likely(x)  __builtin_expect(!(x), 0)
#define unlikely(x) __builtin_expect(!(x), 1)

pseudorandom name
May 6, 2007

OK, whatever, but the example I was replying to doesn't work correctly.
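To spell out the point (GCC/Clang builtin; the helper function is illustrative): `__builtin_expect` compares its first argument against the expected value, so `__builtin_expect((x), 1)` only matches when `x` is exactly 1. The `!!` normalizes any truthy value — a pointer, a count — to 0 or 1 first, so the hint actually applies:

```cpp
// !! collapses any nonzero value to exactly 1 before the comparison.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int first_or_zero(const int* p) {
    if (likely(p)) {  // p is a pointer, not 0/1; without !! the hint misfires
        return *p;
    }
    return 0;
}
```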

b0lt
Apr 29, 2005

Zopotantor posted:

How wasteful.
code:
#define likely(x)  __builtin_expect(!(x), 0)
#define unlikely(x) __builtin_expect(!(x), 1)

code:
#define likely(x) true
#define unlikely(x) false

Jeffrey of YOSPOS
Dec 22, 2005

GET LOSE, YOU CAN'T COMPARE WITH MY POWERS

b0lt posted:

code:
#define likely(x) ((x),true)
#define unlikely(x) ((x),false)
It's pretty rude not to evaluate x - I fixed this.

Foxfire_
Nov 8, 2010

Zopotantor posted:

How wasteful.
code:
#define likely(x)  __builtin_expect(!(x), 0)
#define unlikely(x) __builtin_expect(!(x), 1)
Gotta get rid of the magic numbers to pass code review
code:
#define ZERO 0
#define ONE 1

#define likely(x)  __builtin_expect(!(x), ZERO)
#define unlikely(x) __builtin_expect(!(x), ONE)

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Foxfire_ posted:

Gotta get rid of the magic numbers to pass code review
code:
#define ZERO 0
#define ONE 1

#define likely(x)  __builtin_expect(!(x), ZERO)
#define unlikely(x) __builtin_expect(!(x), ONE)

Style guide indicates that these should be meaningful.

code:
#define ZERO 0
#define ONE 1
#define LIKE ZERO
#define UNLIKE ONE

#define likely(x)  __builtin_expect(!(x), LIKE)
#define unlikely(x) __builtin_expect(!(x), UNLIKE)

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I'm back with more built-in questions, specifically about https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html

https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html posted:

If you specify command-line switches such as -msse, the compiler could use the extended instruction sets even if the built-ins are not used explicitly in the program. For this reason, applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.

This suggests that if I use something like -mavx2, the compiler may emit AVX2 instructions as optimizations even in places where I am not explicitly using AVX2 intrinsics myself, right? That makes it seem like, to create a binary that uses AVX2 or AVX-512 if available, I am required to compile multiple binaries and have an entrypoint binary that detects CPU feature support and launches the correct one.

Am I reading this right? How does anything get packaged correctly such that it uses features if available, but is able to run on CPUs that don't support those instructions?

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Most stuff doesn't, but no, you don't need separate binaries. Suppose you currently have the following:

foo.h:
C++ code:
void foo();
foo.cpp:
C++ code:
void foo() {
  // do stuff
}
You need to turn it into something like the following:

foo_impl.h:
C++ code:
void FOO_NAME() {
  // do stuff
}
foo_sse.cpp
C++ code:
#define FOO_NAME foo_sse
#include "foo_impl.h"
foo_avx.cpp
C++ code:
#define FOO_NAME foo_avx
#include "foo_impl.h"
foo.cpp
C++ code:
void foo_avx();
void foo_sse();
void foo() {
    if (avx_supported) {
        foo_avx();
    }
    else {
        foo_sse();
    }
}
You then compile foo_avx.cpp with -mavx and everything else with -msse.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Or you use something like multiversioning to explicitly enable architecture support only in specific functions.

But yes, -mavx is the “I guarantee my target CPU supports AVX” switch, not the “stop complaining if I use AVX intrinsics, but it’s my fault if I use them wrong” switch. Instruction set extensions are frequently useful for normal code generation.

Beef
Jul 26, 2004
Compiling from source with -march=native makes my programs go brrrrr.

Grep for xmm and ymm in your objdump and you will see the compiler using it everywhere.
E.g. passing struct args in the wider vector registers and just vmov'ing instead of calling memcpy.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
My mental model of compiler SIMD flags was extremely primitive and wrong. Now I'm curious how effectively the compiler will auto-vectorize tight numerical loops with -march=native -mtune=native -O3 (or is Linus right that -O2 is better in most situations due to the smaller code generated?)

I'm guessing that the compiler still isn't willing to do things like pad vectors or have a vector loop followed by a scalar loop to get the leftovers, but I hadn't even thought of being able to fit small structs entirely in 256 or even 512 bit registers.

It looks like the default target is only SSE2 if you don't pass any of these flags, but it also looks like passing, say, -mavx2 implies AVX1, SSE4.2, POPCNT and all.

My interest in this is more than academic, we've got a cluster of varying vintages and also extremely vectorizable workloads. The sooner we're able to more effectively use AVX-512 while still being able to execute on Broadwell, the better.

Edit: Also, it looks like mtune is pretty important. I'm seeing that the default -mtune=generic produces some code that leaves a good amount of performance on the table for modern Intel or AMD CPUs: https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd . I can think of several applications off the top of my head that are using -mavx2 -mfma without -mtune.

Twerk from Home fucked around with this message at 05:00 on Jul 8, 2022

nielsm
Jun 1, 2009



Plorkyeran posted:

Most stuff doesn't, but no, you don't need separate binaries.

[snip]

You then compile foo_avx.cpp with -mavx and everything else with -msse.

One thing to be careful of when doing this: If the source files you compile with additional -m flags include any headers with inline functions or templates, those functions could be instantiated with AVX opcodes in them, and linked with the rest of the program due to the One Definition Rule. Then your entire program is dependent on AVX regardless.

So you either have to avoid including headers that have templates or inline functions in your source files using intrinsics, or alternatively only enable the machine flags on a per-function basis.
OpenTTD got a bug report about this issue recently, and it was solved with the per-function machine flag.

Microsoft C++ lets you use any intrinsics regardless of compiler flags, and doesn't have this issue.
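A sketch of the per-function approach mentioned above (GCC/Clang target attribute; function names are illustrative). Only the annotated function is compiled with AVX2 enabled, so headers included elsewhere can't leak AVX2 code into the rest of the binary, and a runtime check keeps it working on older CPUs:

```cpp
#include <immintrin.h>

// Only this function is compiled as if -mavx2 were in effect, so AVX2
// instructions cannot leak into any other function in this TU.
__attribute__((target("avx2")))
static void add8_avx2(const int* a, const int* b, int* out) {
    __m256i va = _mm256_loadu_si256((const __m256i*)a);
    __m256i vb = _mm256_loadu_si256((const __m256i*)b);
    _mm256_storeu_si256((__m256i*)out, _mm256_add_epi32(va, vb));
}

// Guard the call with runtime detection so the binary still runs on
// CPUs without AVX2.
void add8(const int* a, const int* b, int* out) {
    if (__builtin_cpu_supports("avx2")) {
        add8_avx2(a, b, out);
    } else {
        for (int i = 0; i < 8; i++) out[i] = a[i] + b[i];  // scalar fallback
    }
}
```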

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Twerk from Home posted:

My interest in this is more than academic, we've got a cluster of varying vintages and also extremely vectorizable workloads. The sooner we're able to more effectively use AVX-512 while still being able to execute on Broadwell, the better.

If you're just running programs on machines you control and not distributing software to third-parties then it seems like it'd be easiest to just build a separate version for each architecture and push the complexity to however you're deploying the software on your cluster (unless your deployment mechanism is an utter nightmare or something I guess).

Beef
Jul 26, 2004
Compilers can give you vectorization advice, in the form of annotations added to your code. It can be really helpful. I used to rely on the Intel compiler and the vector advisor tool, but gcc was producing pretty good vectorized code too.

You don't have to split your own loops into canonical form etc. The compiler does that for you. The vectorization feedback can help you make slight tweaks to your code so the compiler can do that job better.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Beef posted:

Compilers can give you vectorization advice, in the form of annotations added to your code. It can be really helpful. I used to rely on the Intel compiler and the vector advisor tool, but gcc was producing pretty good vectorized code too.

You don't have to split your own loops into canonical form etc. The compiler does that for you. The vectorization feedback can help you make slight tweaks to your code so the compiler can do that job better.

Yeah, my thought through all of this is that I really would prefer to just try to keep the hot regions of code as tight loops without branching, and seeing if the compiler can get to a good-enough result. GCC's target_clones looks like a great approach towards that. In fact, while poking around a bit in Godbolt, I found out that GCC's target_clones will produce code that uses the AVX-512 registers, while compiling with -O3 -march=icelake-server -mtune=icelake-server will not!

code:
__attribute__((target_clones("default", "avx2", "arch=icelake-server")))
void loop(int a[], const int b[], const int c[], int size) {
    for(int i = 0; i < size; i++) {
        a[i] = b[i] + c[i];
    }
}
Here's the tiny dumb loop I've been poking at, and the only way I've gotten GCC 12.1 to use the zmm registers is via target_clones. It's interesting to me that march and mtune won't do it, but target_clones will. With those registers available, it seems like there's room for incremental improvements in a lot of more varied code as well.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
What's the most comfortable pattern for doing I/O stream filtering type work like compression or deserializing on a separate thread? Ideally i'm looking for a way to do something like a Boost filtering_streambuf that you can super easily pass a boost::iostreams::gzip_compressor() to to compress or decompress a stream, but have that work scheduled on a different thread than what is writing to or reading from the I/O stream.

The sanest way that jumps out at me is using a concurrent queue, and having the reader or writer just read or write from the queue and the actual source or sink has the filtering_streambuf, but then I'm adding another layer of buffering and batching to worry about.

I'd also appreciate a lay of the land for concurrent queues, in Java land I default to a basic BlockingQueue if performance isn't critical and the LMAX disruptor if performance is, but in C++ I have no idea what reputation the boost::lockfree:queue or just doing std::queue with a basic mutex have.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

Twerk from Home posted:

The sanest way that jumps out at me is using a concurrent queue, and having the reader or writer just read or write from the queue and the actual source or sink has the filtering_streambuf, but then I'm adding another layer of buffering and batching to worry about.
It depends what you're doing with the output really. For example, if you're decompressing to disk or you can't do anything until you've decompressed the entire thing, then you could just use a promise or queue to notify of progress / that the task is finished, rather than loving with more buffering.

Envoyproxy does this kind of thing with moveable buffer-chunks and an event queue though, so if you need that degree of control you're not barking up the wrong tree.

Xarn
Jun 26, 2015
What is it with C++ tools and being incredibly half-baked?

clang-format? sounds good, but if you do something relatively normal like include all headers with angle brackets, its main header detection won't work, because that's just too novel
IWYU? sounds good, but it will suggest you remove all <> includes and replace them with "" includes, because that's what Google does, and the issue to maybe let users change that has been open for 5 years now
vcpkg? some things work properly in manifest mode, some work properly in classic mode, both of them are shite when you do something incredible like having transitive dependencies.
make? lol there is a space in some path get loving hosed
cmake? miserable pile of backwards-compatible hacks that let you do 70% of what you want with only a reasonable amount of effort — if you are up to date with the latest best practices. These are never actually documented anywhere, and people like to trade off good eng. design for ease of use.
Compilers? DON'T GET ME STARTED ON THE loving COMPILERS

Xarn fucked around with this message at 21:03 on Jul 16, 2022

Foxfire_
Nov 8, 2010

Xarn posted:

you do something relatively normal like include all headers with angle brackets
That isn't relatively normal

Xarn
Jun 26, 2015
It is pretty normal, see e.g. Boost. It is also the objectively superior option.

Beef
Jul 26, 2004
I use both. Angle for system headers, quotes for local ones. Isn't that the intention?

Foxfire_
Nov 8, 2010

Xarn posted:

It is pretty normal, see e.g. Boost. It is also the objectively superior option.
Boost doesn't do that. The header-only parts of it inconsistently use angle brackets or quotes when accessing other public headers in itself. The few separately compiled parts of it consistently use quotes when including their internal headers as far as I can tell.

e.g. Locale's std\numeric.cpp has
code:
#include <locale>
#include <string>
#include <ios>
#include <boost/locale/formatting.hpp>
#include <boost/locale/generator.hpp>
#include <boost/locale/encoding.hpp>
#include <sstream>
#include <stdlib.h>

#include "../util/numeric.hpp"
#include "all_generator.hpp"
Why do you think always using <> is better?

Xarn
Jun 26, 2015

Foxfire_ posted:

Boost doesn't do that. The header-only parts of it inconsistently use angle brackets or quotes when accessing other public headers in itself. The few separately compiled parts of it consistently use quotes when including their internal headers as far as I can tell.


Ok, this might be an artifact of different libraries being made by different people. I opened Boost.Container and that looks like this

code:
// monotonic_buffer_resource.cpp
#include <boost/container/detail/config_begin.hpp>
#include <boost/container/detail/workaround.hpp>

#include <boost/container/pmr/monotonic_buffer_resource.hpp>
#include <boost/container/pmr/global_resource.hpp>

#include <boost/container/detail/min_max.hpp>
#include <boost/intrusive/detail/math.hpp>
#include <boost/container/throw_exception.hpp>
Then I opened Boost.DateTime and that one is just inconsistent between files :suicide:, so I will admit that it is not a uniform policy across Boost.


Foxfire_ posted:

Why do you think always using <> is better?


Combination of multiple factors

1) Includes using relative paths are stupid. I am not interested in reasoning about the current file location to know whether #include "../utils.hpp" will include cool-project/audio-manipulators/utils.hpp or cool-project/video-manipulators/utils.hpp.
2) The only part of include resolution defined by the standard is that if the preprocessor fails to resolve #include "some-identifier-string", it retries it as #include <some-identifier-string>. This means that #include "vector" will happily include the stdlib's vector, as long as there isn't a file called vector right next to the file being processed. In practice, the usual suspects have agreed on "" search paths being local-relative and <> not.
3) I can tell whether a specific header is from my project based on the path prefix, I don't need to check for "" vs <>.

Taken together, using "" for includes introduces compile-time overhead*, allows you to include a different file than you thought**, and doesn't provide any advantage in return.

* failed stat when the preprocessor looks for the path relative to current path
** I've actually had to deal with this just this week. At some point someone got sloppy, didn't include a file with full path from project root and due to somewhat messy include dirs, suddenly the include was picking up a different file than intended.

Xarn fucked around with this message at 23:02 on Jul 16, 2022

Xarn
Jun 26, 2015

Xarn posted:

so I will admit that it is not a uniform policy across Boost.


Actually if I am reading this right, you are supposed to use angled includes everywhere, but uh, it isn't enforced and people are bad at this consistency thing.

https://www.boost.org/development/header.html

quote:

Then both your code and user code specifies the sub-directory in #include directives.


Foxfire_
Nov 8, 2010

I think that's just about the publicly visible headers that would end up somewhere like /usr/local/include/boost, since you want #includes for peer boost stuff to not find conflicting names in the local project.

Headers specific to separately compiled libraries whose binaries end up in /usr/local/lib (the ones that are under libs/LibraryName/src/, not boost/include/LibraryName/ in the boost source code) use normal quotes, since if there's a local vs system conflict while building that library, they want the local file. Like if you put a file named cpuid.hpp in /usr/local/include/, compiling Boost's Atomic library still wants its own file, not that one.


Using from-the-base-of-your-project paths seems like a reasonable goal to me, but it'd be better implemented by sticking to the normal <> vs "" convention and modifying the user include path to include the root of the project instead of modifying the system include path (-iquote on gcc vs -I). One extra stat doesn't seem likely to actually matter to compile time enough to be worth doing something unusual.
