Twerk from Home
Jan 17, 2009

I'm taking over maintenance of a small C++ codebase, and it doesn't have an internally consistent code style. I've used JIndent previously for Java codebases, and am a big fan of Prettier on the js side.

Is clang-format a good tool to use here, or should I buy a JIndent license? I'm looking for something opinionated that just formats code and doesn't make me think. There are no specific style requirements, so I'm probably just going to reach for clang-format's Google or LLVM presets and use those.
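
If clang-format is the answer, I'm picturing the entire configuration being a tiny checked-in .clang-format (Google preset shown here as an example) and nothing else:
code:
# .clang-format: inherit everything from the preset, override nothing
BasedOnStyle: Google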


Twerk from Home
Jan 17, 2009

I have to get a C++ project from 2011 to build, and I'm having a hell of a time. I'd like to be able to just get a working binary out with as little fuss as possible, but right at the entry point in global scope it starts with
code:
using namespace boost;
using namespace std;
which causes pretty much any modern compiler and C++ standard library to explode. I guess that when this was originally being developed, the version of boost bundled in the includes directory happened to be in sync with the stdlib version on the development machine?

Long story short, I've got a ton of ambiguous references, and I'm not even sure where to start. Just remove all the using directives and be explicit about namespaces at the call sites?
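
To make sure I'm picturing the mechanical change correctly, it's basically this, repeated a few hundred times (Thing is a made-up stand-in for one of this project's types):
code:
#include <string>
#include <vector>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>

struct Thing {};  // stand-in for one of the project's types

// Before: "using namespace boost; using namespace std;" at global scope let
// unqualified shared_ptr / vector / etc. resolve to whichever library won.
// After: no using-directives, every name qualified at the point of use.
boost::shared_ptr<Thing> p = boost::make_shared<Thing>();
std::vector<std::string> names;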

Twerk from Home
Jan 17, 2009


Absurd Alhazred posted:

It's more that since then a lot of boost stuff emigrated to std. I think shared_ptr, unique_ptr, etc, started in boost. So just be specific; I think it might be easier for you to start by removing "using namespace boost", adding "boost::" to the names that aren't found, then maybe doing the same with "using::std", if you're not one of those freaks who is okay with that.

That's where I was starting. It looks like I might just need to update boost as well, because even after starting down that path, I'm still getting an ambiguous 'pair' just from including a single boost header. "include/boost/functional/hash/extensions.hpp:34:33: error: reference to 'pair' is ambiguous" looks a hell of a lot like https://marc.info/?l=boost-bugs&m=132481728414582&w=2, and that issue was fixed about 6 years ago: https://svn.boost.org/trac10/ticket/6323

OK. Updating boost it is!

Twerk from Home
Jan 17, 2009


Absurd Alhazred posted:

That's usually a good idea anyway. :unsmith:

Not if the original boost library included with the code was customized! I guess I'm just hosed, because it was using a custom boost that had subgraph::remove_vertex implemented. I've been looking through old boost versions to figure out which version was originally included, and that function shows up in 1.34, was commented out in 1.45, and was never anything but an assert(false) in between.

Either this boost was pulled from trunk during the one hot minute when that function was implemented and then reverted before a release, it's from some branch of boost that never merged to master, or the original developers of this application modified boost themselves. Does this look familiar to anyone? This is the included boost subgraph.hpp, which doesn't match any released version of boost that I can find:
code:
// TODO: Under Construction
template <typename G>
void remove_vertex(typename subgraph<G>::vertex_descriptor u, subgraph<G>& g)
{
	if ( !g.is_root() )
	{
		typename std::map< typename subgraph<G>::vertex_descriptor, typename subgraph<G>::vertex_descriptor>::iterator
	        i = g.m_global_vertex.find(u);
		if ( i != g.m_global_vertex.end() )
		{
			g.m_local_vertex.erase( i->second );
			g.m_global_vertex.erase( i );
		}
	}
	remove_vertex( u , g.m_graph );
}
Another update: I found a built binary from the machine it was developed on. It fails the very tests that binary was used to create 7 years ago. I have no idea what's going on. I wonder if I can just paste in this implementation of remove_vertex.

Here's some discussion about why subgraph::remove_vertex is marked wontfix: https://svn.boost.org/trac10/ticket/4752. I guess I don't really have a question anymore, but I wanted to vent about this insanity.

Twerk from Home fucked around with this message at 06:12 on Feb 11, 2018

Twerk from Home
Jan 17, 2009

I'm looking for a recommended IDE or refactoring tool that could help me un-gently caress codebases that are using namespace std;, have been neglected for a few years, and are now running into namespace collisions with the stdlib. The quick and dirty fix is to rename things so they don't collide, but I think it would save future pain to get out of the std namespace and be clear about where things are coming from.

Any suggestions?

Twerk from Home
Jan 17, 2009

I'm trying to revive an ancient C++ project for an academic lab. If anybody wants to actually snag a copy and follow along to help my dumb rear end, it's this: https://www.decode.com/software/allegro/

Original work seems to have been done in the late 90s through 2003-ish. I only need it to build on modern Linux distros that are still supported in 2020. It doesn't seem to specify a C++ standard, and I'm in way over my head with autotools, so I've resorted to hacking on the generated src/configure file. I'd really like to fix the autotools setup properly, though.

The biggest issues seem to be that it's using macros that don't exist out of the box in modern autotools distributions. Also, I can potentially simplify this by throwing Windows / PowerPC Mac / old-rear end Linux compatibility out the window.

For example:
code:
AC_TRY_COMPILE([#include <hash_map>], [],
    [
        AC_MSG_RESULT(yes)
        AC_DEFINE(HAVE_HASH_MAP, 1, [have <hash_map>])
    ], AC_MSG_RESULT(no))
This block seems to cause <hash_map> to be detected, even when it's not there, which makes the application fail the first time it tries to define a type that uses hash_map:

typedef hash_map<string, Double, stringhash> String2Double;

Also, I could use a hand modernizing what seem to be a lot of null pointer checks that compare to 0 and are now hitting ambiguous overloads. Is it safe to just check the truthiness / falsiness of a pointer to see if it's null? Alternatively, are there compiler flags I can set to make this build like it's 2003 without actually having to use a toolchain from 2003? I'd love to be able to run autoreconf instead of continuing to hack on the configure script, but there are several more macros that don't seem to exist in modern autotools, like:

code:
configure.in:48: error: possibly undefined macro: AC_CREATE_STDINT_H
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.in:97: error: possibly undefined macro: AC_COMPILE_CHECK_SIZEOF
Other stuff that I'm fixing that I have questions about:
1. Missing #includes of string.h and limits.h: how was this ever building? Were those pulled in transitively by some other header at some point?
2. I switched to using clang because the first place I got this to build in 2020 was locally on my Mac, and GCC is having a harder time dealing with this mess on both Mac and Ubuntu 18.04. This isn't a giant red flag, right?
3. Target platforms for this to run are Ubuntu 16.04 and newer, CentOS 6 and newer, and whatever SUSEs are supported right now. All of these can be assumed to have a modern-ish C++ stdlib where I can ditch a lot of the out of date detection of alternative stdlibs, right?
4. Where and how would I set a C++ standard to be able to build this thing on modern compilers without having to hack away at code like I'm doing? The dream would be a combination of flags that make this build & run, missing #includes and sketchy null pointer checking and all.

Twerk from Home fucked around with this message at 17:07 on Feb 2, 2020

Twerk from Home
Jan 17, 2009

I really appreciate all this help, this is starting to explain things. My day job is mostly JVM langs (Scala, Java, Kotlin, Groovy for Jenkins) and Typescript/JS stuff. I would appreciate basic C/C++ explanations, but am comfortable googling stuff or referring to texts too.

Transitive includes in the stdlib explain why it was easier to get this building on a Mac than on Linux, but the missing includes are easy enough to add and don't feel very dangerous.

The other issues I'm encountering are like:
code:
findlr.cc:245:26: error: use of overloaded operator '!=' is ambiguous (with operand types 'Outfile' and 'int')
      if (options->rfile != 0 && lm <= int(options->maxlocusforoutput))
Outfile extends fstream, and this still fails with -std=c++98 even though my understanding is that streams are implicitly convertible to void* in C++98. I'm fixing these by just checking options->rfile instead of comparing to zero, and it seems fine.
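
For concreteness, this is the shape of the fix I'm applying (sketch; Outfile extends fstream in the real code, and the names mirror the snippet above):
code:
#include <fstream>

// Minimal stand-in: the real Outfile extends fstream.
struct Outfile : std::fstream {};

void write_maybe(Outfile& rfile, int lm, int maxlocusforoutput) {
    // Old: if (rfile != 0 && ...) relied on the stream-to-pointer conversion
    // and is now an ambiguous overload. Testing the stream state directly
    // works in both C++98 (operator void*) and C++11 (explicit operator bool).
    if (rfile && lm <= maxlocusforoutput) {
        rfile << "locus " << lm << '\n';
    }
}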

code:
template<class Iter>
Iter next(Iter i) {
  i++;
  return i;
}
This project seems to have implemented its own iterator-advancing helper? I'm assuming this is compatible enough with std::next, so we'll just use std::next instead. Somebody tell me if this is different from std::next and I should rename it to myNext or something and keep using it.
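
i.e. I'm assuming this holds, which is what I'll sanity-check before swapping it in (C++11):
code:
#include <cassert>
#include <iterator>
#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(1); v.push_back(2); v.push_back(3);
    // Same observable behaviour as the project's next(): copy, advance by one, return.
    assert(*std::next(v.begin()) == 2);
    // std::next also takes an optional distance, which the homegrown one doesn't.
    assert(*std::next(v.begin(), 2) == 3);
    return 0;
}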

code:
#ifdef HAVE_HASH_MAP
#include <hash_map>
#elif HAVE_EXT_HASH_MAP
#include <ext/hash_map>
using __gnu_cxx::hash_map;
#else
#error "No hash_map found!"
#endif // HASH_MAP

struct stringhash {
private:
#ifdef HAVE_HASH_MAP
  hash<const char*> hasher;
#elif HAVE_EXT_HASH_MAP
  __gnu_cxx::hash<const char*> hasher;
#endif
I can see that hash_map wasn't very portable, because autotools checks for it in several places and then does this to use whichever one is available. I'm assuming that I can just alias hash_map to unordered_map if they're interface compatible, but I haven't looked yet because <ext/hash_map> is still around.
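
The aliasing I have in mind is roughly this (sketch; assumes C++11 is available and that stringhash's operator() accepts the key type, which I still need to verify):
code:
#include <string>
#include <unordered_map>

// Stand-in for the project's hash functor; the real one wraps hash<const char*>.
struct stringhash {
    std::size_t operator()(const std::string& s) const {
        return std::hash<std::string>()(s);
    }
};

// Alias template so the old hash_map spelling keeps working:
template <class Key, class T, class Hash = std::hash<Key> >
using hash_map = std::unordered_map<Key, T, Hash>;

typedef double Double;  // the project's typedef, stubbed here
typedef hash_map<std::string, Double, stringhash> String2Double;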

If anybody wants to hear the gory details, the lab has been passing down golden binaries of this stuff that eventually stop running on newer OSes. In general, ancient things that were still working get "fixed" by rebuilding against a modern library, or running Valgrind to figure out which statically sized buffer overflowed, and then making it 10x bigger. This one is trickier than I've helped them with previously. I'd considered dockerizing this, but couldn't easily find a CentOS 3 or similar vintage docker image to be able to run it as-is, and if I get it building on modern machines it should be sufficiently flexible for recent supported OSes anyway. I'll confirm it builds on both Debian family stuff and Red Hat family stuff, though.

This is far from the oldest thing I've beaten into running on modern machines for them. Another recent one had a README that said:
code:
Unix executables for the Sun, Dec and Sgi are available at present.
and talked about using Code Warrior to compile it on Mac.

Twerk from Home
Jan 17, 2009


Dren posted:

building your own CentOS 3 or w/e docker image sure seems like it would be easier than actually porting this code

the problem you hinted at with “oh this buffer overflowed just make it 10x bigger, lol” speaks to a larger problem that without a good test suite you really have no idea if doing something seemingly innocuous like switching from hash_map to unordered_map or using std::next instead of some pre-stl iterator implementation will horribly break something in some subtle way.

On the other hand, if you can be reasonably confident this thing isn’t making deprecated syscalls, you could gin up an old as hell docker for it and it can live on for the foreseeable future.

I'm terrified that things are breaking in a subtle way, but I've been reassured that every time these tools are used, a trained human is evaluating the output and looking for questionable things. Additionally, many of these programs have other programs that exist solely to sanity check outputs. Most of these ship with small smoke tests, but certainly not an exhaustive suite.

My experience has been that scientists are incredibly reckless about software. These things still get used by dozens of labs and cited regularly, meaning they're still in active use for research. Pedcheck, another application written in the late 90s with statically sized buffers that cause memory corruption instead of outright crashes, was cited more than 100 times in the last 3 years. When fixing another broken program, I found out that labs fork software all the time, meaning there are dozens of home-hacked versions of all of these tool sets out there, including some private forks of GPL things and unauthorized forks of commercial things. I'm doing due diligence by getting the lab PI to try to reach out to the owners / maintainers of these things to see if they're interested in releasing an updated version that runs on modern machines with modern toolchains. We've had one success so far, with an application getting its first update in more than 5 years!

Anyway, I tell my wife to keep full documentation about her processing pipelines around, so if there's a bug she can just re-run things: https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/

I'm excited about the idea of a "Linux from 2003" docker image, but also concerned: these things now need more than 4GB of RAM because of how much bigger DNA chips have gotten since then, and I'm worried that 64-bit Linux wasn't very stable until some time after that.

Twerk from Home
Jan 17, 2009

Can somebody direct me towards best practices for C project organization? I keep just seeing a ton of .c and .h files in a flat directory, and thinking there must be a better way to do this.

Twerk from Home
Jan 17, 2009


Phobeste posted:

Honestly just do the same poo poo as you would in any other language. Separate your functionality logically into separatable modules that hold contracts with each other, separate the files along the same lines, etc. Just like you would in python or anything else. Could do worse than looking at big c projects like the linux kernel or systemd or something.

Awesome, thanks for this. https://github.com/systemd/systemd/blob/main/docs/ARCHITECTURE.md is exactly the kind of thing that I was looking for, even though the projects I'm dealing with are unlikely to get to a fraction of the size.

Twerk from Home
Jan 17, 2009

Is https://github.com/DLTcollab/sse2neon a decent way to go for getting code that directly includes SSE intrinsics (<emmintrin.h>) working on arm64?

I'm just shoving it in there like this. Things are building & running, but I don't have great confidence that I'm targeting the appropriate ARM uarch or using the full SIMD capabilities of the Apple M1 & friends.
code:
#ifdef __AARCH64_SIMD__
#include "sse2neon.h"
#else
#include <emmintrin.h>                 // Define SSE2 intrinsics. Will fail on ARM
#endif
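
One thing I'm unsure about is __AARCH64_SIMD__, which I believe is our own define rather than a compiler one; a guard built on the predefined macros (on GCC/Clang, at least) would look something like:
code:
#if defined(__aarch64__) && defined(__ARM_NEON)
#include "sse2neon.h"     // translate the SSE2 intrinsics to NEON
#else
#include <emmintrin.h>    // real SSE2 intrinsics on x86
#endif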

Twerk from Home
Jan 17, 2009

Can someone point me to a crash course in autotools? I'm hacking away at https://github.com/samtools/htslib, experimenting with adding zstd as a compression format over libz or libdeflate, and I can't get the error message to yell at me if libzstd-dev isn't installed and available:

I've added
code:
AC_ARG_ENABLE([lzstd],
  [AS_HELP_STRING([--enable-lzstd],
                  [enable support for zstd, allowing build of bzstd])],
  [], [enable_lzstd=no])

if test "$enable_lzstd" != no; then
    zstd_devel=ok
    AC_CHECK_HEADERS([zstd.h], [], [zstd_devel=header-missing], [;])
    AC_CHECK_LIB([zstd], [ZSTD_compress], [], [zstd_devel=missing])
    if test zstd_devel = missing; then
MSG_ERROR([cannot find lzstd])
    fi
    pc_requires="$pc_requires libzstd"
    static_LIBS="$static_LIBS -lzstd"
fi
following the existing norms in https://github.com/samtools/htslib/blob/develop/configure.ac, but I honestly have no idea what I'm doing and need to take a step back and hit the books about Autotools. I'm historically a Java / C# / Scala / that family of things developer.

Twerk from Home
Jan 17, 2009


Plorkyeran posted:

You're missing a $ in if test zstd_devel = missing and so are just comparing some string literals.

There's also a bunch of nitpicky autotools things that that code isn't doing properly but I'm assuming you copied and pasted that from existing checks and so you should just stick with matching the existing style.

I am absolutely interested in hearing about the nitpicky autotools things!

Twerk from Home
Jan 17, 2009

What is current industry standard for C/C++ dependency management? Is it as big of a mess as it seems?

I've seen projects that check in the source for all of their dependencies, projects that use Git submodules to pull in complete repos of their dependencies, and projects that say "this needs armadillo" with no mention of an Armadillo version; I've seen a mix of all three in a single project.

Just today I was installing https://github.com/xqwen/fastenloc, which doesn't have a word about its dependencies anywhere in the documentation, so I built it by running Make, seeing what headers it couldn't find, and installing packages based on header names. Is this normal? Is there a tool that can look at a project and tell me what dependencies it needs so that I wouldn't need to do repeated trial and error?

Also, they included a statically linked target for that application, but when building it I am warned:
code:
/usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o): In function `gomp_target_init':
(.text+0x8b): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
This sounds to me like it will only work against the specific glibc version it was linked with, meaning these "static" binaries aren't actually portable between Linux distros with different glibc versions?

Twerk from Home fucked around with this message at 02:09 on Sep 16, 2021

Twerk from Home
Jan 17, 2009

I've got some really basic Make questions, and realized that I have never really learned make, but rather just muddled through. It looks like I'm going to be spending a lot more time in C/C++ than I ever have over the last decade, so it's time to finally learn Make. Is there a better place to start than https://www.gnu.org/software/make/manual/make.html ?

I'm going to be dealing with a lot of scientific projects that probably only ever worked at a single university in their default computing environment, without any thought to portability. While my group is installing & using these projects I'm hoping that I can improve documentation and the build process a bit where it's easily possible. Here's a great example of a Makefile that I'm starting with: https://github.com/xqwen/fastenloc/blob/master/src/Makefile

code:
main: main.o controller.o sigCluster.o
	g++ -fopenmp -O3 main.o controller.o sigCluster.o  -lm -lgsl -lgslcblas -lboost_iostreams -lz  -o fastenloc
static: main.o controller.o sigCluster.o
	g++ -fopenmp -O3 main.o controller.o sigCluster.o  -lm -lgsl -lgslcblas -lboost_iostreams -lz  -static -o fastenloc.static
main.o: main.cc
	g++ -c  main.cc
controller.o: controller.cc controller.h
	g++ -fopenmp   -c  controller.cc
sigCluster.o: sigCluster.h sigCluster.cc
	g++ -c sigCluster.cc 
clean:
	rm *.o fastenloc
One thing that jumps out at me is that they only pass -O3 at the link step, not in the rules that actually compile the code, meaning that the binary this group uses is entirely unoptimized, right? Binary size changed a lot when I applied the -O3 flag to the rules that actually compile the .cc files, and it ran faster on some very small test datasets.

My goals to improve this are to:
  • Allow environment variables to extend or change compiler options
  • Let it build on GCC or Clang, and ideally on Mac or Linux
  • Turn on some warnings because the code quality is pretty awful

This is as far as I got before I realized I should read the Make documentation, because I have no idea whether any of what I'm doing follows norms, nor do I have a good way right now to stick "-march=broadwell" or similar on there via an environment variable. I've also added the include and library paths for libraries installed via Homebrew on Mac, but I only want those present if we're building on a Mac. I'm also not sure how to handle clang vs gcc, because gcc just needs -fopenmp while clang needs it passed to the preprocessor with -Xpreprocessor.

code:
CXX=g++
CXXFLAGS=-O3 -Wall
INC=-isystem /opt/homebrew/include # Mac libraries installed via Homebrew
LIBS=-lm -lgsl -lgslcblas -lboost_iostreams -lz -lomp
LDFLAGS=-L/opt/homebrew/lib

all: main

main: main.o controller.o sigCluster.o
	$(CXX) $(CXXFLAGS) $(INC) -Xpreprocessor -fopenmp $(LDFLAGS) main.o controller.o sigCluster.o $(LIBS) -o fastenloc
static: main.o controller.o sigCluster.o
	$(CXX) $(CXXFLAGS) $(INC) -Xpreprocessor -fopenmp $(LDFLAGS) main.o controller.o sigCluster.o $(LIBS) -static -o fastenloc.static
main.o: main.cc
	$(CXX) $(CXXFLAGS) $(INC) -c  main.cc
controller.o: controller.cc controller.h
	$(CXX) $(CXXFLAGS) $(INC) -Xpreprocessor -fopenmp -c  controller.cc
sigCluster.o: sigCluster.h sigCluster.cc
	$(CXX) $(CXXFLAGS) $(INC) -c sigCluster.cc
clean:
	rm *.o fastenloc
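
The best idea I've had so far for the Mac-only Homebrew paths and the gcc-vs-clang OpenMP flag is a uname check with GNU Make conditionals, roughly like this (untested, and it assumes Mac means clang, which happens to be true for us):
code:
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
  # Homebrew library locations and the clang/libomp spelling of OpenMP
  INC     += -isystem /opt/homebrew/include
  LDFLAGS += -L/opt/homebrew/lib
  OMP      = -Xpreprocessor -fopenmp
  LIBS    += -lomp
else
  OMP      = -fopenmp
endif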

Can I get some tips? Should I just read the Make documentation and then come back and try again? Did I miss something in the original Makefile as distributed where it was actually optimizing, and I just misunderstood?

If I get this one cleaned up, there's a related, even more widely used project that has -fpermissive all over the place that I'd like to clean up.

Edit: It looks like to allow CXXFLAGS to be extended by environment variables, I just need to do
code:
CXXFLAGS+=-O3 -Wall
and then call make with an environment variable set, like CXXFLAGS=-march=broadwell make

Twerk from Home fucked around with this message at 21:38 on Sep 16, 2021

Twerk from Home
Jan 17, 2009


Plorkyeran posted:

Cmake suffers a lot from all the outdated tutorials on the internet. If you do everything the modern way and require a cmake version from the last year, even moderately complex projects can be mostly straightforward and simple. A lot of features you'd consider incredibly basic were added very recently, though, so good luck actually finding a guide that tells you the right way to do things. At best you might find something which is twice as complicated as it needs to be because it's working around a missing feature, and if you're really unlucky it might tell you do to everything incorrectly.

Once you're off the happy path and trying to write your own functions or macros everything is completely insane still.

A basic CMake task that I expected it to be able to do trivially: Find generic Lapack on Ubuntu 20.04 in /usr/lib/x86_64-linux-gnu for static linking, which is used by OpenBLAS as well as generic BLAS and multiple other BLAS library options.

It can't, thanks to a bug that wasn't fixed until 3.21, which is too new for me to ask people to use for something I'm distributing: https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6036. So instead, I'm just going to direct people to tell it where LAPACK is by hand with -DLAPACK_LIBRARIES=/usr/lib/x86_64-linux-gnu/liblapack.a .

I don't have a sense of how frequently CMake releases, but given that C build systems tend to use system packages for all of the libraries you're linking against, I was surprised to find that the CMake version (3.16) distributed in the Ubuntu 20.04 repositories is so old, and apparently too old to Just Work for basic tasks.

Twerk from Home
Jan 17, 2009

Does anyone have any favorite books / references / articles / blog posts about static vs dynamic linking? I'm trying to wrap my head around a good plan for an HPC cluster that I inherited without much guidance.

The most common options in the HPC space seem to be either statically linking everything, or putting a network share on LD_LIBRARY_PATH and putting all of the .sos that binaries need there. I didn't know about these approaches and have just been merrily installing all of the library dependencies across the cluster with Ansible, but I think it's time to fall more in line with norms.

More broadly on the topic of static vs dynamic linking, I found the approach of dynamically linking but shipping all of your dependencies other than libc alongside your binary pretty pragmatic.

Am I overthinking this, and is statically linking everything just the way to go for portable binaries of software that's built outside of a distro's package infrastructure?

Some relevant thoughts about why static linking is king for scientific computing: https://ro-che.info/articles/2016-09-09-static-binaries-scientific-computing

Twerk from Home fucked around with this message at 04:14 on Sep 29, 2021

Twerk from Home
Jan 17, 2009

I'm back with more CMake troubles.

I'm trying to statically link an application that uses BLAS/LAPACK with a CMake build. I intend to have this support both OpenBLAS and Intel MKL. CMake is finding my libraries just fine with FindBLAS, like this:

code:
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.a;/usr/lib/x86_64-linux-gnu/libmkl_sequential.a;/usr/lib/x86_64-linux-gnu/libmkl_core.a;-lpthread;-lm;-ldl  
However, because FindBLAS sticks -lpthread;-lm;-ldl on the end of the BLAS library string, CMake marks those as dynamic when it comes time to link. The link.txt that CMake generates (which is what the linker actually gets called with) ends up looking like this, when what I want is an entirely static link:

code:
-static -Wl,--start-group -Wl,-Bstatic -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
-Wl,-Bdynamic -lpthread -lm -ldl -Wl,-Bstatic -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
-Wl,-Bdynamic -lpthread -lm -ldl -Wl,--end-group \
-larmadillo /usr/lib/x86_64-linux-gnu/libboost_iostreams.a -Wl,-Bstatic -lz -lpthread -lgfortran -lquadmath -Wl,-Bdynamic
I need some way to get CMake to stop stuffing -Bdynamic and -Bstatic all over the place and just let -static do its work. I hacked through this to actually get, validate, and profile a static executable by editing CMake's generated link.txt file and replacing every -Bdynamic with -Bstatic, and it accomplished what I wanted. The best solution would be if CMake wouldn't put those there in the first place. Ideas?

Edit:

I worked around this by doing an incredibly un-CMake thing and manually passing it the linker flags I want, so now my static build is incredibly unportable and Intel MKL-only. I think what I really want is two completely different CMakeLists.txt: one that can build on most machines, letting CMake find the libraries and building a dynamically linked executable against whatever it finds, and another entirely separate build process that is tightly coupled to Ubuntu 20.04 with Intel MKL but produces a static executable that will run everywhere. Thoughts on that approach?

Twerk from Home fucked around with this message at 23:01 on Oct 1, 2021

Twerk from Home
Jan 17, 2009


Winter Stormer posted:

Sucks, but you do what you gotta do sometimes.

For what it's worth, I tried to recreate your problem on Fedora and failed.

pre:
[khulud@phoenix build]$ cat ../CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(test)

add_executable(exe main.cpp)
set(CMAKE_EXE_LINKER_FLAGS -static)

set(BLA_STATIC on)
find_package(BLAS)

target_link_libraries(exe PRIVATE BLAS::BLAS)
[khulud@phoenix build]$ cmake --version
cmake version 3.20.5

CMake suite maintained and supported by Kitware (kitware.com/cmake).
[khulud@phoenix build]$ cmake -G Ninja ..
[...]
-- Found BLAS: -Wl,--start-group;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_intel_lp64.a;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_sequential.a;/opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_core.a;-Wl,--end-group;-lpthread;-lm;-ldl  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/wow/build
[khulud@phoenix build]$ ninja
[2/2] Linking CXX executable exe
[khulud@phoenix build]$ ninja -t commands
/usr/bin/c++    -MD -MT CMakeFiles/exe.dir/main.cpp.o -MF CMakeFiles/exe.dir/main.cpp.o.d -o CMakeFiles/exe.dir/main.cpp.o -c ../main.cpp
: && /usr/bin/c++  -static CMakeFiles/exe.dir/main.cpp.o -o exe  -Wl,--start-group  /opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_intel_lp64.a  /opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_sequential.a  /opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_core.a  -Wl,--end-group  -lpthread  -lm  -ldl && :
[khulud@phoenix build]$ file exe 
exe: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=2f2cf350d5012a37de7b552295e438dd0d334552, for GNU/Linux 3.2.0, not stripped, too many notes (256)
[khulud@phoenix build]$ ./exe
Major version:           2021
Minor version:           0
Update version:          4
Product status:          Product
Build:                   20210904
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

I really appreciate this! I was going to put together a minimal example, pulled out from all the other libraries I'm linking, in the hope of getting exactly this kind of help, but you went and did it for me. I'm going to fix up my CMakeLists and probably be in business. I'm a CMake novice muddling through, and what I had was more like:

pre:
cmake_minimum_required(VERSION 3.0)
project(main)

set(CMAKE_CXX_STANDARD 17)

set(BLA_STATIC ON)
find_package(LAPACK REQUIRED)
find_package(BLAS REQUIRED)

... other libs and all project support files

target_link_libraries(main -static)
target_link_libraries(main ${BLAS_LIBRARIES})

Twerk from Home
Jan 17, 2009


Winter Stormer posted:

I futzed with it a little further just now and did manage to break my build by setting cmake_version_required < version 3.3. Since that function sets CMake policy versions, I went looking at what policies 3.3 added, and I'm pretty certain you're running into the problem fixed by https://cmake.org/cmake/help/latest/policy/CMP0060.html. Try setting your minimum to be 3.3 or better.

I checked out the commit that had me pulling my hair out, set cmake_minimum_required to (VERSION 3.16), and everything worked. That would have saved me about 2 hours of trying things and deep diving on how CMake chooses how to interpret linker flags.

Thanks!

Twerk from Home
Jan 17, 2009

What's the industry standard way to link against an older version of glibc so that a dynamic executable is portable across, say, CentOS 7 and newer and Ubuntu 18.04 and newer? I'd prefer to actually do the build in a newer distro, so I was thinking maybe the way to go is to build glibc locally from a git tag of the version I want to target, and link against that?

Twerk from Home
Jan 17, 2009


pseudorandom name posted:

Build using mock targeting your desired minimal distribution version.

Although you’re still likely to run into incompatibilities between Debian and Red Hat distros.

Thanks, I'll check that tool out.

Is this likely even if the only dynamically linked libraries are libc, libpthread, and libm?

Twerk from Home
Jan 17, 2009

I'm trying to get an application that's using the simde library for portable SIMD to run happily on Apple Silicon. So far, the biggest tripping point is that the ARM NEON bit-shift intrinsics can only be called with a compile-time constant shift amount. For example, I'm calling _mm_slli_epi64, which simde maps onto NEON intrinsics like vshlq_n_s16. I can't find an elegant way to make sure these are always called with compile-time constants: we had wrapper functions around them, and as I understand it there's no way to guarantee that a function argument is a compile-time constant, so I changed the wrappers into macros.

The other problem is a small number of places where these are called with runtime values, for which I could build a lookup table like this project did: https://github.com/VectorCamp/vectorscan/pull/81/files#diff-1f738fe3ab986614e926b3ce01fcd42dbf3e8ccb79337931af69450f56319554R178

Doing a lookup table like that feels incredibly repetitive and inelegant; is there some way with templates to generate something like this over a range of numbers that I define?

code:
VecW vecw_slli_lookup(VecW vv, uint8_t ct) {
  switch(ct) {
    default: return _mm_slli_epi64(vv, 0);
    case 1: return _mm_slli_epi64(vv, 1);
    case 2: return _mm_slli_epi64(vv, 2);
    case 3: return _mm_slli_epi64(vv, 3);
    case 4: return _mm_slli_epi64(vv, 4);
    case 5: return _mm_slli_epi64(vv, 5);
    case 6: return _mm_slli_epi64(vv, 6);
    case 7: return _mm_slli_epi64(vv, 7);
  }
}
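
The closest I've gotten on my own is a recursive template that expands to the same chain of compile-time-constant shifts, so the 0..7 range only has to be written once, but I'm not sure it's actually any nicer (VecW and _mm_slli_epi64 are the existing names from the snippet above):
code:
// Each instantiation compares against one candidate count N; because N is a
// template parameter, the intrinsic still sees a compile-time constant.
template <int N>
VecW vecw_slli_dispatch(VecW vv, uint8_t ct) {
  return (ct == N) ? _mm_slli_epi64(vv, N) : vecw_slli_dispatch<N - 1>(vv, ct);
}

// Base case: out-of-range counts fall through to a shift by 0, matching the
// default: label in the switch above.
template <>
VecW vecw_slli_dispatch<0>(VecW vv, uint8_t) {
  return _mm_slli_epi64(vv, 0);
}

// usage: vecw_slli_dispatch<7>(vv, ct);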

Twerk from Home
Jan 17, 2009


Jabor posted:

NEON has "shift by a runtime value" operations (e.g. vshlq_s16 instead of vshlq_n_s16), it seems strange that your portability library is using the compile-time-constant-only variant of the intrinsic to emulate an SSE instruction that isn't restricted to compile-time constants.

I looked at vshlq_s16; it looks like it takes a vector instead of an integer as the argument to shift by, so if I wanted to bit-shift a whole vector by 4, I'd need to create a second vector of equal length filled with 4s.
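
i.e., if I'm reading the intrinsics right, filling that second vector is a single vdupq_n_s16, so the runtime-count version would be roughly this (untested):
code:
#include <arm_neon.h>

// Shift every 16-bit lane left by a runtime count: broadcast the count into a
// vector with vdupq_n_s16, then use the non-immediate shift vshlq_s16.
int16x8_t shift_left_by_runtime_count(int16x8_t v, int count) {
    return vshlq_s16(v, vdupq_n_s16((int16_t)count));
}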

I appreciate all of the suggestions that people gave here! The project is using C++11, and I think that the lookup table will be easier to live with, given that it's only needed in one place. The value to shift by is actually determined at runtime, so maybe I should inline the function to keep overhead reasonably small.

Twerk from Home
Jan 17, 2009

Using GNU autotools, what is the best practice for managing config.sub and config.guess? Should they be checked into the repository at all?

I'm updating a project that uses GNU autotools so that it will build and run on the new Macs. It has config.guess and config.sub checked into the project, but running autoreconf -fi did not do anything to update them. I'm coming at this with very little existing autotools knowledge, so basically all I know is that these scripts are used to canonicalize the platform and architecture strings, and that older versions won't know anything about aarch64 Macs.

I hacked through this by copying in the config.guess and config.sub that ship with my system automake, from /opt/homebrew/Cellar/automake/1.16.5/share/automake-1.16/ (I installed autotools with Homebrew).

Is there some way to set up autotools to use the system config.sub and config.guess instead of checking it into the project?

This is basically all that I can find as far as official documentation: https://www.gnu.org/software/gettext/manual/html_node/config_002eguess.html

Twerk from Home fucked around with this message at 17:32 on Feb 11, 2022

Twerk from Home
Jan 17, 2009


Plorkyeran posted:

There don't appear to have been any apple silicon-related changes to config.sub or config.guess.

The intended use of autotools is that building the project doesn't require having autotools installed, and the project should include all of the generated files in its distribution package. If the git repository is the primary way for users to get the project, that implies they should be committed to the repository.

Interesting that it's not Apple Silicon related. When I ran ./configure without replacing config.sub and config.guess, I saw this:

pre:
checking build system type... configure: error: /bin/sh ./config.sub -apple-darwin21.3.0 failed
and when I ran config.sub directly to see why it was failing, I got:

pre:
$ src/config.sub -apple-darwin21.3.0 
config.sub: invalid option -apple-darwin21.3.0
Try `config.sub --help' for more information.

Twerk from Home
Jan 17, 2009


Plorkyeran posted:

That was incidentally fixed by a change in 2004 which made it no longer have a hardcoded list of possible processors.

autoreconf -ivf should be overwriting the existing config.guess and config.sub files with the up-to-date ones. If it's not, you can also try running automake --add-missing --copy --force-missing directly. Copying them over manually is fine too if you don't feel like trying to debug why autotools isn't doing the right thing (but you probably will need to regenerate the configure script with a modern version of autoconf and if automake isn't working that might not be either).

Sweet! I had run autoreconf -ivf and it did not replace them. It did update the configure script, and put me into a state where I had a new configure script and old, broken config.sub and .guess.

Thanks!

Twerk from Home
Jan 17, 2009

What's best practice for linking boost libraries? Static or dynamic?

I got blindsided by some dynamically linked binaries that I had built against boost on Ubuntu 18.04 not working on 20.04, because the boost version changed between the two, and felt like a fool.

Twerk from Home
Jan 17, 2009


ultrafilter posted:

Do you care about the size of your executable? If not, go static.

In general I'm good with fully static binaries, but in reality some libraries are really hard to statically link, including glibc and some math libraries I've been working with that use dlopen. The practical middle ground seems to be statically linking most libraries and dynamically linking glibc. I've also mistakenly thought I had something working with static linking, only to have it fail completely when exercising certain paths in the application.

I guess that I could also just bundle the .sos that it was linked against with the binary and use a start script to set up LD_LIBRARY_PATH.
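
i.e. a launcher along these lines (sketch; "mytool.real" and the libs/ layout are made up):
code:
#!/bin/sh
# Wrapper shipped next to the real binary and a libs/ directory of bundled .so files.
HERE=$(dirname "$(readlink -f "$0")")
LD_LIBRARY_PATH="$HERE/libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "$HERE/mytool.real" "$@"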

Twerk from Home
Jan 17, 2009


ultrafilter posted:

I guess I've been lucky cause I've never seen anything like that. How can static linking succeed but something fail at runtime? Is it because something brings in a .dll/.so?

It must have been. It was a project that uses BLAS / LAPACK, which I was attempting to update to build two separate versions: one with Intel MKL and one with OpenBLAS.

Linking Intel MKL is insanity already; they've got a tool to tell you what link-line arguments to use: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html, and then I'm trying to take that and shove it into CMake.

Twerk from Home
Jan 17, 2009

Two questions:

I'd like a reference textbook around for C, ideally one that has some best practices or common patterns. I'm 10 years into a mostly Java, JavaScript, .NET, etc. career and have found myself in a place where we need to make changes to some older tools implemented in C, and nobody else here has ever managed their own memory either, so I'm on my own. What's a good textbook to have around? All I have right now is the college textbook that I held onto, "C Programming: A Modern Approach" by King: https://www.amazon.com/C-Programming-Modern-Approach-2nd/dp/0393979504

What's a good pattern for reading in huge lines from a file, when I really want the whole line in memory at once and don't know how big it is? These lines may be textual and end with a newline character, or may be binary and terminated by some other specific multi-byte pattern. I'm updating older human genetics analysis tools to work on modern biobank-scale datasets, and running into a lot of fixed-size buffers without bounds checking, which are clearly failing spectacularly. Basically, I need to allocate enough space to hold a line of data without knowing how big it is. I verified that just upsizing these buffers makes it work for some datasets, but I'd like to fix this according to best practices.

Edit: I just discovered getline. I am a fool. I will use getline.

Let's start with textual files with lines ending in \n, and then I'll figure out binary files with multi-byte line endings later. The existing application is using fscanf all over the place without bounds checking and constantly corrupting itself. The options that my naive rear end came up with to fix this are:

fgets into a buffer; if the copied string doesn't contain a newline, realloc to make the buffer bigger and read again. Do I want to double the size of the buffer every time?

fgets into a buffer; if it fills up, fgets into another buffer and keep an array of pointers to these buffers. When I eventually reach the end of the line, allocate a new buffer that's the right size by adding together the sizes of all the smaller buffers.

Allocate an enormous buffer in the first place and let virtual memory sort it out. Unused memory that I never write to will never have any physical memory assigned to it, right? So if I over-allocate a huge buffer and only use the first 1/4 of it, the unused end of the array won't cause any problems, because it never gets mapped to physical memory and just occupies virtual address space.

Just mmap the drat file.

The machines this will be running on have at least 512GB of memory on the smallest node, and extrapolating from these tools' memory usage on smaller files makes us think we're looking at several hundred gigs of memory being used. Which of these strategies sounds sanest for this?
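
For the textual case, the getline loop I'm leaning towards is just this (sketch, minimal error handling):
code:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* getline() grows the buffer itself, so one allocation gets reused and
 * resized across however many huge lines the file throws at it. */
static void process_lines(FILE* fp) {
    char* line = NULL;
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* line holds the whole line (including the trailing '\n', if any)
         * and len is its length; hand it to the parser here. */
    }
    free(line);
}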

Twerk from Home
Jan 17, 2009


roomforthetuna posted:

I don't know if it's best practices, but Envoy is a performance-oriented networking thing with buffers that don't know how big they're gonna need to be. They're using C++, but you could do the same behavior in C with a little more effort. Their method involves a buffer class that essentially represents a linked list of arbitrary-sized chunks of data, and knows its own total size. It has a whole bunch of helper stuff to minimize copying, like you can move any amount of data between buffers and it moves the chunks rather than copying them if the amount you're consuming goes over whole chunks.

But that's a model for handling data that's streaming from network. If all your data is in files and you're going to be operating on it non-sequentially and mmap is supported by the platform then using mmap seems like a no-brainer, let the operating system figure out what's worth caching in RAM and what's not. It usually does a pretty good job, and if you're reading unknown-sized lines with terminators, you're obviously starting from a data design that wasn't planned for doing anything more efficient than the OS will do with mmap.

Thanks to both of you for the input!

The filesystem in question is CephFS, which looks like it supports mmap with only a few small caveats, but gives no guidance about how mmap would perform. I'll start there.

Twerk from Home
Jan 17, 2009

I'm trying to expose a Python API to a binary that we have, which means packaging it as a shared library that Python will then open (with dlopen, I'm pretty sure). However, I've got a couple of questions about how to link and package this thing.

We've got a lot of libs, some of which are enormous (Intel MKL). A lot of these we have been linking statically from .a archives. Is it possible to statically link those archives into an .so? It seems not, which means I'm looking at reworking the packaging so that I ship the dependencies, which is going to make for an enormous Python module. I looked at how NumPy & similar do this, and they link against the system BLAS libraries, which can be a big performance trap if the reference BLAS & LAPACK end up being used: https://stackoverflow.com/questions/37184618/find-out-if-which-blas-library-is-used-by-numpy

Twerk from Home
Jan 17, 2009


Zopotantor posted:

You can absolutely link static libraries into a dynamic one.

Zopotantor posted:

An executable also doesn’t need to (but can) be position independent.


Sweet! What initially made me think that statically linking .a archives into an .so isn't easy is that my .a files seem to have been built without PIC, while my .so needs to be (or maybe just chooses to be?) position independent. So it looks like I may need to build these dependencies from source with -fPIC in order to link them into an .so, or else find archives that were built position independent.

Twerk from Home
Jan 17, 2009

I'm pulling my hair out here trying to figure out why an application won't build inside of conda, but builds just fine with system libraries. It's a brand new conda environment from the conda-forge channel with nothing in it but python (3.10 by default) and r-devtools.

I cannot get the macros in <cinttypes> to work inside of the conda environment: https://www.cplusplus.com/reference/cinttypes/

code:
#include <cinttypes>
...
error: expected ')' before 'PRId64'
out += std::sprintf(out, "%" PRId64, v);
It compiles just fine against system libraries, so I'm sitting here trying to figure out what's different about conda's set of libraries and headers. It ships its own complete C++ environment, including all of the standard headers. Conda's <cinttypes> looks like this, and so does the system one. When I compile with the system compiler and libraries on Ubuntu 20.04, the macros work and life is good.

code:
#ifndef _GLIBCXX_CINTTYPES
#define _GLIBCXX_CINTTYPES 1

#pragma GCC system_header

#if __cplusplus < 201103L
# include <bits/c++0x_warning.h>
#else

#include <cstdint>

// For 27.9.2/3 (see C99, Note 184)
#if _GLIBCXX_HAVE_INTTYPES_H
# ifndef __STDC_FORMAT_MACROS
#  define _UNDEF__STDC_FORMAT_MACROS
#  define __STDC_FORMAT_MACROS
# endif
# include <inttypes.h>
# ifdef _UNDEF__STDC_FORMAT_MACROS
#  undef __STDC_FORMAT_MACROS
#  undef _UNDEF__STDC_FORMAT_MACROS
# endif
#endif
I can see that _GLIBCXX_HAVE_INTTYPES_H should be defined in c++config.h, but when I examine my whole include path with g++ -H, I don't see c++config.h in it at any point. Should c++config.h be automatically included by the compiler? I haven't found much discussion around this, other than this: https://github.com/nfrechette/sjson-cpp/issues/15. I did find c++config.h way down inside the conda environment at x86_64-conda-linux-gnu/include/c++/9.4.0/x86_64-conda-linux-gnu/bits/c++config.h, and it has #define _GLIBCXX_HAVE_INTTYPES_H 1.

I've been able to power through this by manually adding -D__STDC_FORMAT_MACROS to the Makevars, which feels wrong, but it makes the macros work when building in conda.

I guess it's also possible that something on the conda include path includes <inttypes.h> before this, and without __STDC_FORMAT_MACROS set it doesn't define the macros, which would explain why setting it makes things work. I didn't see anything including inttypes.h directly when I looked at all of the includes with g++ -H, though!
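
The source-level equivalent of that -D flag, if I end up fixing it in code instead of the Makevars, would just be defining the macro before anything can drag in <inttypes.h>, something like:
code:
// Must come before the first direct or transitive include of <inttypes.h>,
// so that older glibc headers define the PRI* macros in C++ too.
#define __STDC_FORMAT_MACROS 1
#include <cinttypes>
#include <cstdio>

void format_count(char* out, std::int64_t v) {
    std::sprintf(out, "%" PRId64, v);
}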

Twerk from Home
Jan 17, 2009

OK, I figured out some of my conda mess!

Before #including <cinttypes> , another header in the include chain was including <inttypes.h>, the C version of this header. The system <inttypes.h> on Ubuntu 20.04 looks like this, matching a recent glibc: https://github.com/bminor/glibc/blob/master/stdlib/inttypes.h

But the <inttypes.h> included with conda includes this!
code:
/* The ISO C99 standard specifies that these macros must only be
   defined if explicitly requested.  */
#if !defined __cplusplus || defined __STDC_FORMAT_MACROS
It looks like conda is shipping headers from before the change that removed that guard: https://sourceware.org/git/?p=glibc.git;a=commit;h=1ef74943ce2f114c78b215af57c2ccc72ccdb0b7

But when I check conda info, I see that it self-reports that it's using glibc 2.31, which is pretty recent. Is it sane for an environment to have old glibc headers and a new glibc .so? Because it looks like that's what's happening.

Also, I have questions about the C99 / C11 standards now, because C99 says that the macros in <inttypes.h> shouldn't work unless __STDC_FORMAT_MACROS is defined, but it looks like C11 reverted that change and now they always work. Wouldn't this make it impossible for a compiler to fully implement C99 now because the headers are not following the C99 spec?

Twerk from Home
Jan 17, 2009

How can I package executables that will use AVX2 / AVX-512 if available, but don't require them? I've been packaging some scientific applications into Docker containers to make them easier for our group to use and deploy, and although we're able to require AVX2 (and will move to requiring AVX-512 in the near future), I'm curious what best practices are for distributing binaries.

A concrete example: https://odelaneau.github.io/shapeit4/

I was thrilled initially to see that it had already been packaged for Debian / Ubuntu by the Debian Med team, which was going to make installation really easy: https://packages.ubuntu.com/jammy/shapeit4. However, their patchset has disabled AVX2, which this application really benefits from. This is common for a lot of bioinformatics tools; we've got ones that speed up 80% going from AVX2 to AVX-512.

Debian / Ubuntu package maintainer patchset: https://salsa.debian.org/med-team/shapeit4/-/blob/master/debian/patches/use_shared_libs.patch

How could something like this be built so that it uses AVX2 or AVX512 if available, but is still able to run on older CPUs without it? Or is "must have AVX2" a sane requirement for a modern widely consumed docker image? The only strategy that I'm personally familiar with is building an executable per instruction set, and having an entry-point executable that examines the CPU and chooses which one to launch. bwa-mem2, which was written mostly by Intel engineers, does that: https://github.com/bwa-mem2/bwa-mem2

Twerk from Home
Jan 17, 2009

I'm back with more built-in questions, specifically about https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html

https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html posted:

If you specify command-line switches such as -msse, the compiler could use the extended instruction sets even if the built-ins are not used explicitly in the program. For this reason, applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.

This suggests that if I use something like -mavx2, the compiler may emit AVX2 instructions as optimizations even in places where I'm not explicitly using AVX2 intrinsics, right? That makes it seem like, to create a binary that uses AVX2 or AVX-512 if available, I'm required to compile multiple binaries and have an entry-point binary that detects CPU feature support and launches the correct one.

Am I reading this right? How does anything get packaged correctly such that it uses features if available, but is able to run on CPUs that don't support those instructions?
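
The closest thing I've found to doing this per-function instead of per-binary is GCC's target attribute plus __builtin_cpu_supports, something like this sketch (names made up, and I haven't measured whether the dispatch overhead matters):
code:
// The AVX2 body lives in a function compiled with the avx2 target attribute,
// so AVX2 instructions can't leak into code paths that run on older CPUs.
__attribute__((target("avx2")))
static void add_avx2(float* a, const float* b, int n) {
    for (int i = 0; i < n; ++i) a[i] += b[i];  // compiler is free to use AVX2 here
}

static void add_generic(float* a, const float* b, int n) {
    for (int i = 0; i < n; ++i) a[i] += b[i];  // baseline (SSE2 on x86-64)
}

void add(float* a, const float* b, int n) {
    if (__builtin_cpu_supports("avx2"))
        add_avx2(a, b, n);
    else
        add_generic(a, b, n);
}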

Twerk from Home
Jan 17, 2009

My mental model of compiler SIMD flags was extremely primitive and wrong. Now I'm curious how effectively the compiler will auto-vectorize tight numerical loops with -march=native -mtune=native -O3 (or is Linus right and -O2 is better in most situations due to smaller code generated?)

I'm guessing that the compiler still isn't willing to do things like pad vectors or have a vector loop followed by a scalar loop to get the leftovers, but I hadn't even thought of being able to fit small structs entirely in 256 or even 512 bit registers.

It looks like the default target is only SSE2 if you don't pass any of these flags, but it also looks like passing, say, -mavx2 implies AVX1, SSE4.2, POPCNT and all.

My interest in this is more than academic: we've got a cluster of varying vintages and also some extremely vectorizable workloads. The sooner we're able to make effective use of AVX-512 while still being able to execute on Broadwell, the better.

Edit: Also, it looks like mtune is pretty important. I'm seeing that the default -mtune=generic produces code that leaves a good amount of performance on the table on modern Intel or AMD CPUs: https://stackoverflow.com/questions/52626726/why-doesnt-gcc-resolve-mm256-loadu-pd-as-single-vmovupd . I can think of several applications off the top of my head that are using -mavx2 -mfma without -mtune.

Twerk from Home fucked around with this message at 05:00 on Jul 8, 2022


Twerk from Home
Jan 17, 2009


Beef posted:

Compilers can give you vectorization advice, in the form of annotations added to your code. It can be really helpful. I used to rely on the Intel compiler and the vector advisor tool, but gcc was producing pretty good vectorized code too.

You don't have to split your own loops into canonical form etc. The compiler does that for you. The vectorization feedback can help you make slight tweaks to your code so the compiler can do that job better.

Yeah, my thought through all of this is that I'd really prefer to keep the hot regions of code as tight loops without branching and see whether the compiler can get to a good-enough result. GCC's target_clones looks like a great approach towards that. In fact, while poking around a bit in Godbolt, I found that GCC's target_clones will produce code that uses the AVX-512 registers, while compiling with -O3 -march=icelake-server -mtune=icelake-server will not!

code:
__attribute__((target_clones("default", "avx2", "arch=icelake-server")))
void loop(int a[], const int b[], const int c[], int size) {
    for(int i = 0; i < size; i++) {
        a[i] = b[i] + c[i];
    }
}
Here's the tiny dumb loop I've been poking at, and the only way I've gotten GCC 12.1 to use the zmm registers is via target_clones. It's interesting to me that march and mtune won't do it, but target_clones will. With those registers available, it seems like there's room for incremental improvements in a lot of more varied code as well.
