So I'm trying to track down a bug in a multi-threaded program, which I believe comes from a third-party library I am using. After spending a significant amount of time in lldb, I realized that a vector of strings is being allocated over the same memory as an object in another thread; the two then stomp on each other, and eventually this causes a segfault. This is clear from looking at the addresses of the vector elements and the object, and from the fact that the object's members clearly contain data from the vector (e.g. the vector contains "poo", and the object has an int member which is 0x6f6f7003 at the same address). The vector of strings is global static, and gets initialized at runtime with some code that looks like this:
C++ code:
But I'm not quite seeing how this ends up actually overwriting another thread's object. My best guess is that the call to resize causes the data associated with s_items to be re-allocated, and that the thread doing that re-allocation then allocates that memory over the object from the other thread. But I thought that heap allocation was thread-aware, so that sort of thing should be impossible. So my question is, can the fact that the s_items is global static mess with the heap allocator's thread awareness, thus failing to prevent a collision? Or is something else going on? This is all being compiled with clang++, if that matters.
|
|
# ? Sep 29, 2017 09:45 |
|
hackbunny posted:correct! but the microsoft C runtime (until the "universal runtime" refactoring, which split the core C runtime from the compiler intrinsics library) was never meant to be used by other compilers, and they freely changed standard conformance and even the ABI from one version to the next. the "funniest" (in the "funniest home videos" sense of a football hitting someone in the crotch) one is probably msvcrt.dll, which while continuously upgraded with the latest bells and whistles, has to retain backwards compatibility with Visual Studio 6.
i was once working on a soccer game for pc with dx5 or 6, and we came across a bug where the anim system would animate all the players apart from their left foot, which would stay at 0,0,0 - the floor under the player's center of mass. but only on some pcs.
it was eventually tracked down to the version of MSVCRT.dll - if we used the latest version, all the left feet stayed under the player while they ran. if we used the previous dll, it worked fine. i have no idea why or how it could affect what was essentially a for loop, but if there's a way to bugger up your code when you're not looking, vstudio 6 would do it. we shipped with statically linked CRT instead.
|
# ? Sep 29, 2017 11:15 |
|
having fun with the good old nullable boolean rn. hurray mysql.
|
# ? Sep 29, 2017 11:19 |
|
gonadic io posted:having fun with the good old nullable boolean rn. hurray mysql. Use MongoDB. And by "use MongoDB", I mean PostgreSQL
|
# ? Sep 29, 2017 12:00 |
ratbert90 posted:Use MongoDB. hoo gently caress i was ready to start fighting
|
|
# ? Sep 29, 2017 12:04 |
|
VikingofRock posted:
jesus christ why are you doing this in c++ code, make that a const size_t (or maybe a static const size_t depending on context)
|
# ? Sep 29, 2017 15:03 |
|
quiggy posted:jesus christ why are you doing this in c++ code, make that a const size_t (or maybe a static const size_t depending on context) hi quiggy hope you're good
|
# ? Sep 29, 2017 15:04 |
|
also the answer to your question is that thread support in c++ is janky at best and you shouldn't be invoking undefined behavior
|
# ? Sep 29, 2017 15:05 |
|
MALE SHOEGAZE posted:hi quiggy hope you're good i am well, thank you friend
|
# ? Sep 29, 2017 15:05 |
im starting to be really surprised with popularity of python in data analysis. more specifically, for some reason i assumed that pandas is much better than it actually is - im not sure its good for anything other than very small, low-dimensional datasets
|
|
# ? Sep 29, 2017 15:36 |
|
VikingofRock posted:So my question is, can the fact that the s_items is global static mess with the heap allocator's thread awareness, thus failing to prevent a collision? Or is something else going on? This is all being compiled with clang++, if that matters.
sometimes these races can result in memory being freed on one thread while another thread is still using it. that can screw up the allocator's data structures, but more likely in this case the memory is just being reallocated for a new purpose; allocators like to return memory that's just been freed
libc++ or libstdc++ would be the more important difference because of the difference in the layout of std::string. libc++ uses a small-string optimization that sometimes puts character data directly in the std::string object, libstdc++ doesn't (at least in older versions). when the string data is out-of-line, racy assignments can trigger malloc/free problems
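One way to check from code (rather than from a hexdump) whether a given string is using the small-string optimization is to test whether its character buffer lives inside the std::string object itself. This is only a sketch: `stored_inline` is a made-up helper name, not part of any library, and the result for short strings depends on which standard library and ABI you build against.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character data of `s` lives
// inside the std::string object itself (small-string optimization), and
// false when it points at a separate heap allocation.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* buf = s.data();
    assert(buf != 0);  // data() never returns a null pointer

    // std::less gives a well-defined total order even for pointers that
    // do not point into the same object.
    std::less<const char*> before;
    return !before(buf, obj_begin) && before(buf, obj_end);
}
```

A 200-character string can never fit inside the object, so `stored_inline` returns false for it on any implementation. For something short like "poo" it should return true with libc++, and with older copy-on-write libstdc++ builds it should return false.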
|
# ? Sep 29, 2017 16:22 |
quiggy posted:jesus christ why are you doing this in c++ code, make that a const size_t (or maybe a static const size_t depending on context)
quiggy posted:also the answer to your question is that thread support in c++ is janky at best and you shouldn't be invoking undefined behavior
For the record, this is all in a third-party library that I am using (and now debugging). It's full of coding horrors, and the whole thing is written as "object oriented C" instead of C++. Shockingly the library actually seems to work when things are single-threaded, and it's only been recently when I've been scaling up my concurrency that I've started getting random segfaults coming from it. There's a C library which does the same thing as this C++ library, which I will probably be switching to, since that library has been re-written in the past few years to take concurrency into account. That library is a horrifying maze of #ifdefs, but at least it seems a little more battle-tested than this one, since it's one of the most widely-used libraries in astronomy. So at this point I am mostly just trying to figure this out so I can be a good astronomy citizen and submit a good bug report / patch to the C++ library people at NASA.
rjmccall posted:sometimes these races can result in memory being freed on one thread while another thread is still using it. that can screw up the allocator's data structures, but more likely in this case the memory is just being reallocated for a new purpose, allocators like to return memory that's just been freed
This is using libc++, and the small-string optimization seems to definitely be in effect: when I hexdump the vector data, I can see the contents of the strings. My guess is that this is related to the former problem that you mention. In any case, this is all good enough for a solid bug report and suggested fix at this point.
For when I go to submit a bug report / patch: Is this a good way to fix this in C++98? 99% of the C++ I've written has been C++11 or later, so I can never remember the idiomatic way to do things pre-C++11. C++ code:
|
|
# ? Sep 29, 2017 19:07 |
|
cinci zoo sniper posted:im starting to be really surprised with popularity of python in data analysis. more specifically, for some reason i assumed that pandas is much better than it actually is - im not sure its good for anything other than very small, low-dimensional datasets
well, a lot of data scientists will do work with a much smaller representative data set that a lot of popular tools work well with, but only work well with machine-size sets
then they bring in the systems people to scale all that up
|
# ? Sep 29, 2017 20:56 |
|
and then of course you have all the tooling for people that have the resources to run it like databricks and zeppelin and hue that let them screw around with larger datasets to begin with
|
# ? Sep 29, 2017 20:58 |
lancemantis posted:well, a lot of data scientists will do work with a much smaller representative data set that a lot of popular tools work well with, but only work well with machine-size sets
lancemantis posted:and then of course you have all the tooling for people that have the resources to run it like databricks and zeppelin and hue that let them screw around with larger datasets to begin with
i mean sure, if you've got the resource you can work with it, but for how much everyone is touting data science i did imagine that ~the package~ defining not exclusively numerical analysis would be able to not be somewhat hamstrung to two-dimensional data and datasets of about total_ram*0.2
maybe i vastly overestimate how much data people use on average (since in my domain answer always is a fuckton (the representative dataset = the dataset) and you are mathematically wrong if you do anything else) or how dimensional it is, or vastly underestimate the clouds/serverfarms at disposal of an average pandas user
cinci zoo sniper fucked around with this message at 21:16 on Sep 29, 2017 |
|
# ? Sep 29, 2017 21:11 |
not to say i dislike it or think its poo poo, im just mildly disappointed that i can make a strong case for using r over python in my jerb without really trying or pulling arguments by their ears
|
|
# ? Sep 29, 2017 21:14 |
|
VikingofRock posted:For when I go to submit a bug report / patch: Is this a good way to fix this in C++98? 99% of the C++ I've written has been C++11 or later, so I can never remember the idiomatic way to do things pre-C++11.
if you're just trying to initialize your vector with the empty string, you should be able to do C++ code:
C++ code:
sadly c++98/03 don't have the c++11 initializer list syntax so you have to do it this ugly way instead
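The snippets from this post didn't survive, so what follows is a guess at the general idea and not the original code. It's a minimal C++98-compatible sketch of two ways to get a static vector of strings initialized at load time: the fill constructor for N empty strings, and the iterator-range constructor over a plain array when you need distinct values. All the names here (`kItemCount`, `s_empty_items`, `kInitialValues`, `s_named_items`, `item_at`) are made up for illustration; `kItemCount` stands in for the library's N_ITEMS macro.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical replacement for "#define N_ITEMS ...": a typed constant.
static const std::size_t kItemCount = 4;

// Fill constructor: kItemCount default-constructed (empty) strings,
// built once during static initialization instead of by a runtime resize.
static std::vector<std::string> s_empty_items(kItemCount);

// C++98 has no initializer lists, so distinct values go through a plain
// array and the iterator-range constructor.
static const char* const kInitialValues[] = { "foo", "bar", "baz" };
static const std::size_t kInitialCount =
    sizeof(kInitialValues) / sizeof(kInitialValues[0]);
static std::vector<std::string> s_named_items(
    kInitialValues, kInitialValues + kInitialCount);

// Read-only accessor with a bounds check.
const std::string& item_at(std::size_t i) {
    assert(i < s_named_items.size());
    return s_named_items[i];
}
```

Both vectors are fully built before main() runs, so as long as the threads only read them afterwards there's no lazy first-use initialization left to race on. Any later writes would still need their own locking.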
|
# ? Sep 29, 2017 21:24 |
|
also if you have any control over it please do not #define N_ITEMS, that's horrible bullshit and you shouldn't even be doing it remotely modern c let alone c++
|
# ? Sep 29, 2017 21:25 |
|
c++ can std::suck my std::dick
|
# ? Sep 29, 2017 21:27 |
|
JewKiller 3000 posted:c++ can std::suck my std::dick
|
# ? Sep 29, 2017 21:29 |
|
JewKiller 3000 posted:c++ can std::suck my std::dick i think these are from boost actually
|
# ? Sep 29, 2017 21:34 |
|
JewKiller 3000 posted:c++ can std::suck my std::dick remember to always #include<protection> in your fun time, nobody wants an std::dick
|
# ? Sep 29, 2017 21:38 |
|
VikingofRock posted:For when I go to submit a bug report / patch: Is this a good way to fix this in C++98? 99% of the C++ I've written has been C++11 or later, so I can never remember the idiomatic way to do things pre-C++11.
moving to a global initializer is definitely better if they're fine with the initializer being executed eagerly during load. if they really want to make it lazy, they should move it into a static local variable and make sure they're compiling with thread-safe statics, which are the default on most compilers
VikingofRock posted:Now that I think about it, the last thing that might be relevant here is that s_items is actually a member of a class. Not sure if that changes things (other than the syntax, slightly).
being a static class member shouldn't make any difference vs. being a true global
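For reference, the lazy variant could look roughly like the sketch below. The names (`kItemCount`, `items`, `item`) are invented, and this isn't the library's actual code. C++11 and later guarantee that the initialization of a function-local static happens exactly once even with concurrent callers. In C++98 mode it depends on the compiler: GCC and Clang emit the guard by default unless you pass -fno-threadsafe-statics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's N_ITEMS macro.
static const std::size_t kItemCount = 4;

// Lazily constructed on the first call. Only the construction is guarded;
// returning a const reference keeps callers from mutating the vector
// without their own synchronization.
const std::vector<std::string>& items() {
    static const std::vector<std::string> instance(kItemCount);
    return instance;
}

// Read-only accessor with a bounds check.
const std::string& item(std::size_t i) {
    assert(i < items().size());
    return items()[i];
}
```

Every call to `items()` returns a reference to the same vector, so callers never see a half-resized state. The guard only covers construction, so it won't help if the vector is written to after that.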
|
# ? Sep 29, 2017 21:39 |
|
cinci zoo sniper posted:not to say i dislike it or think its poo poo, im just mildly disappointed that i can make a strong case for using r over python in my jerb without really trying or pulling arguments by their ears Yeah, some of the cluster frameworks support R as well, so unless you drive them towards ones that are python/java/scala only they can always argue to continue on that path
|
# ? Sep 29, 2017 21:53 |
|
rjmccall posted:being a static class member shouldn't make any difference vs. being a true global
|
# ? Sep 29, 2017 22:15 |
|
cinci zoo sniper posted:im starting to be really surprised with popularity of python in data analysis. more specifically, for some reason i assumed that pandas is much better than it actually is - im not sure its good for anything other than very small, low-dimensional datasets
there's some pretty nice looking "live notepad" type deal that hooks into data sources and lets you just gently caress around with python queries against boxed datasets or something that I've seen people at work using but I haven't had time to try it.
i mean, it's just like having ssms and a proper data source instead of 8 billion csv files but I guess without using God's own language aka SQL
|
# ? Sep 29, 2017 22:32 |
|
im a terrible (old) programmer and whenever i see BSS the first expansion that comes to mind is The Verve's
|
# ? Sep 29, 2017 22:33 |
rjmccall posted:moving to a global initializer is definitely better if they're fine with the initializer being executed eagerly during load. if they really want to make it lazy, they should move it into a static local variable and make sure they're compiling with thread-safe statics, which are the default on most compilers
That's exactly what I thought, but I wasn't sure. Cool cool. And now that I think about it, I think this vector only gets used in the .cxx file, so I think I can actually just remove it as a class member altogether. Thanks for your help, everyone.
|
|
# ? Sep 29, 2017 22:35 |
Powerful Two-Hander posted:there's some pretty nice looking "live notepad" type deal that hooks into data sources and lets you just gently caress around with python queries against boxed datasets or something that I've seen people at work using but I haven't had time to try it.
that's jupyter. r has something similar, and they both are equally worthless to do actual work. if you're 13.5x coding megaburrito or something that's your lazy coverup for presentation, but that's pretty much where it ends
as for tons of csv files, i can imagine with some nosql garbage. our mongo currently works like a flipcoin if you're querying more than 2 weeks of data (a few hundo megabytes) at once, whereas postgres/mysql is what you would expect, you can poo poo out entire database into a single csv if you want
|
|
# ? Sep 29, 2017 22:39 |
also not sure where i stand on the overall scale of sql usage in data analysis. all my coworkers like hundreds or thousands of line long sql scripts to do it all in, and i just dont get the point of doing so if you aren't limited to sql and excel. i just pull the data with a few lines of sql and janitor up something actually maintainable in r instead, at a fraction of the time or effort.
|
|
# ? Sep 29, 2017 22:42 |
|
the problem is that r is garbage while sql is extremely cool and good
|
# ? Sep 29, 2017 23:27 |
JewKiller 3000 posted:the problem is that r is garbage while sql is extremely cool and good
for the data my work deals in, sql is poo poo too: you're limited to the most trivial of operations, and insisting on making an analytical tool out of sql will just lead to dumb poo poo like the 80 kilobyte sql scripts my coworker writes for a single calculation that takes uhh, 100 lines in r?
|
|
# ? Sep 30, 2017 00:08 |
"what do you mean saying you dont want to use my script"
|
|
# ? Sep 30, 2017 00:10 |
|
my experience of R is some guy kept emailing our team edl demanding that we install R for him and we got so fed up telling him to gently caress off we just deleted the edl
but that was years ago and I think now you can integrate it into SQL server 2016 or something so who knows!
|
# ? Sep 30, 2017 00:18 |
Powerful Two-Hander posted:my experience of R is some guy kept emailing our team edl demanding that we install R for him and we got so fed up telling him to gently caress off we just deleted the edl
microsoft bought out an r vendor and started strapping lots of poo poo for analytics together using the vendor's stuff and their inhouse tooling/db stuff, but i wouldnt risk putting it all together. getting r to the point where it is worthy of a non-local environment (or dedicated computing farm) involves disproportionate effort
there also are libraries for r that allow you to query a db directly, but as you imagine, they are absolutely inferior to using something like sqlworkbench/j or datagrip or what have you, any db tool with a decent developer. the closest direct sql and r intersection that i can admit being legit useful is the libraries that allow writing sql queries inside the r environment, which can be useful if you are used to sql stuff.
other than that, imo, a separation between r and rest of the world is due
big smart r people still seem to have troubles figuring out this whole "reproducible analytics" thing for a reason
|
|
# ? Sep 30, 2017 01:03 |
|
is the new coding horror horror poster another how!! rereg?
|
# ? Sep 30, 2017 03:09 |
|
cinci zoo sniper posted:im starting to be really surprised with popularity of python in data analysis. more specifically, for some reason i assumed that pandas is much better than it actually is - im not sure its good for anything other than very small, low-dimensional datasets
ultimately i think the question is 'what else are you gonna use', python's got a long history in scientific computing and sure it could be better but it's not loving matlab
and now if you're gonna ask why python's got that history, again go back to 1997 and tell me 'what else are you gonna use', and remember you gotta make it palatable to scientists used to fortran and matlab, and consider the alternate reality in which the other major scripting language of the day won in more fields besides bioinformatics
|
# ? Sep 30, 2017 03:18 |
|
JawnV6 posted:is the new coding horror horror poster another how!! rereg? i like how!!
|
# ? Sep 30, 2017 03:18 |
|
i remember how!!
|
# ? Sep 30, 2017 03:21 |
|
JawnV6 posted:is the new coding horror horror poster another how!! rereg? I have so far resisted the urge to reply and yell a lot I am proud of me
|
# ? Sep 30, 2017 03:31 |