|
Plorkyeran posted:Asymptotic performance isn't everything. Caching is incredibly important - a cache line is 64 bytes wide on a modern CPU, which means it always loads at least that much per memory access. In my case, the entire datastructure happened to fit in one most of the time.
|
# ? Jan 20, 2016 22:12 |
|
|
# ? Jun 1, 2024 12:07 |
|
I worked on adapting an image filtering algorithm from a reference implementation by [Some Researcher] about a year or so back. The reference had input data laid out in scanlines with each color channel batched, so you first had all the R values for line 0, then all the G values, then all the B values, then repeat for line 1 and so on. To output the final value for a filter the filter itself then accessed all the surrounding pixels in a nice Gaussian region (no, the filter wasn't separable). A colleague quite accurately described this as "one huge cache miss". I think we got a 3x-5x speedup by just adjusting the layout to be in cache line-appropriate 4x4 pixel tiles. In short: sometimes access patterns really matter. Hashmaps have famously crappy ones. Especially std::unordered_thing, since in most implementations they're implemented as std::lists with a table of pointers into the list attached, which is also the sort of thing one might characterize as one huge cache miss.
|
# ? Jan 20, 2016 23:15 |
|
Xerophyte posted:In short: sometimes access patterns really matter. Hashmaps have famously crappy ones. Especially std::unordered_thing, since in most implementations they're implemented as std::lists with a table of pointers into the list attached, which is also the sort of thing one might characterize as one huge cache miss. All of the STL sets and maps are more-or-less required to be implemented with separately-allocated nodes, since the spec says that pointers and iterators into the collection can only be invalidated by removing the referenced entry. That includes std::unordered_foo. Beef posted:It is pretty hard to beat simple linear search under 10k elements. And even if hashing beats you under that number, it's because you are doing a silly microbenchmark and all the buckets get loaded into your cache hierarchy. Linear search can definitely beat hashing on small sets, but this is a wildly exaggerated claim. If you're scanning an array of very small structures (such that you can get 4-8 elements per cache line) and doing a single integer/pointer comparison per element (as opposed to something that requires touching out-of-line memory, like comparing strings), then yes, the threshold vs. a chaining hashtable might get as high as the low thousands if the loop gets properly unrolled and vectorized. But those assumptions also mean that hashing is very cheap, and a well-designed probing hashtable will probably be able to probe once or twice without leaving the cache line. It also really depends on your construction and access patterns. If you build this data structure in one pass and then immediately execute a ton of queries against it while doing very little else, then that 10k threshold also means that your entire array will probably stay in cache. If you're incrementally building and querying this data structure while a ton of other things are going on, then your array will probably not be in cache every time you want to do a search, and the linear scan will displace everything else you're working on.
|
# ? Jan 21, 2016 03:35 |
|
nielsm posted:Making proper new build steps for MSBuild, especially the C++ compilation framework in it, can be quite a task. It's insane that there doesn't seem to be any kind of a "here's how to use proto files in visual studio in a way that isn't annoying" thing anywhere. Edit: Fuuuck, nmake can't handle it simply either because the double-extension .pb.h confuses it, and the only person asking about that I could find online didn't find a solution. I'm just going to use loving JSON for my data. Edit2: Time spent trying to get protobuf into a VC++ friendly state: an entire day without success. Time spent setting up working JSON starting from not even knowing what libraries are available and including rewriting my initial small sample data into JSON format: about 20 minutes. And this was starting from "pretty familiar with protobuf". I guess it's not protobuf's fault per-se that Visual Studio loving sucks at special build rules. (And apparently changes how it does them with every new version, getting worse each time.) roomforthetuna fucked around with this message at 08:45 on Jan 21, 2016 |
# ? Jan 21, 2016 04:44 |
|
rjmccall posted:All of the STL sets and maps are more-or-less required to be implemented with separately-allocated nodes, since the spec says that pointers and iterators into the collection can only be invalidated by removing the referenced entry. That includes std::unordered_foo. I agree, but it is a slight exaggeration, not wildly Branch prediction, prefetching and cache eviction strategies still work best on flat predictable data access patterns. Although in this discussion, the fact that stl::unordered_map fragments memory by allocating nodes on the heap is probably a worse violation of data locality than the hashed-key access. The point is: measure and optimize if something becomes a problem. Do not assume a fancy data structure will be faster because of O(1) or O(log) unless you know beforehand that you are dealing with a large amount of data. Even then, in my experience, problems that are large enough to be dominated by such asymptotic behavior (e.g. gene sequencing) get vastly more performance from exploiting domain-specific information than any one algorithm/datastructure choice. It's messy. Beef fucked around with this message at 13:02 on Jan 21, 2016 |
# ? Jan 21, 2016 12:03 |
|
rjmccall posted:All of the STL sets and maps are more-or-less required to be implemented with separately-allocated nodes, since the spec says that pointers and iterators into the collection can only be invalidated by removing the referenced entry. That includes std::unordered_foo. The spec also requires explicitly iterable buckets for the unordered sets/maps as I recall, which also feeds into one or more linked lists being the only real way to implement them. There's nothing wrong with that as such, but it does have quite an impact on coherency and if someone is using the STL containers where performance matters they need to be aware of the implementation and what that means for them. I think one consequence is that there are more situations where it's a good idea to roll your own vector-backed linear probing hash map than there are situations where the same is true for the other containers.
|
# ? Jan 21, 2016 13:31 |
|
Speaking of unordered_map, holy poo poo did I have a hard time finding a performance regression involving tbb::unordered_map. Using plain pointers as keys can be a really bad idea.
|
# ? Jan 21, 2016 15:55 |
|
Beef posted:I agree, but it is a slight exaggeration, not wildly Branch prediction, prefetching and cache eviction strategies still work best on flat predictable data access patterns. Although in this discussion, the fact that stl::unordered_map fragments memory by allocating nodes on the heap is probably a worse violation of data locality than the hashed-key access. For sure. Hash tables definitely defeat prefetching, and that can suck if your access pattern is likely to hit most of the table in one go, but the fragmentation of nodes is what really kills chaining implementations. Branch prediction... I mean, yes, you execute a lot of well-predicted branches in a linear scan, but only because you're executing a lot of branches. Hash tables are actually pretty good for branch prediction; if the table lacks collisions (true for most keys if you pay any attention to your hash code!), you can get down to the one unpredictable-except-in-context branch for whether the key is actually in the map, which you're going to get for any implementation. Beef posted:Even then, in my experience, problems that are large enough to be dominated by such asymptotic behavior (e.g. gene sequencing) get vastly more performance from exploiting domain-specific information than any one algorithm/datastructure choice. It's messy. A lot of programming problems are solidly in the "dozens" range where asymptotic behavior completely doesn't matter, but there's also plenty of work in the "thousands" range where ubiquitous use of quadratic algorithms can really drag down performance. Problems like gene sequencing in the "hundreds of millions" range require completely different approaches, but they also preclude the use of standard collections because of course you can no longer rely on keeping the whole data set in RAM. Compilers (my field) do a lot of stuff in the tens/hundreds of thousands range where keeping things in memory is still defensible but the algorithms matter a lot. (I used to do more database-y stuff, though.) In LLVM, we heavily use hash tables, but our core implementation is probed for exactly these reasons. It's also power-of-two-sized, which punishes crappy hash code implementations but helps performance a lot for very cheap keys/hashes. It does, however, require the user to explicitly provide two illegal key values in order to get optimal packing of the table.
|
# ? Jan 21, 2016 20:09 |
|
Just a quick sanity check on a slightly hackish heap approach, to implement a C++ version of poppush: I'm trying to improve performance, and one of the bottlenecks is constantly removing the front element of a heap, modifying it, and adding it back to the heap, e.g.: code:
This means that doing code:
The posts I've found on stackoverflow all seem to suggest a much more complicated approach, so I am worried I am missing an edge case. Ika fucked around with this message at 10:46 on Jan 22, 2016 |
# ? Jan 22, 2016 10:29 |
|
Ika posted:Just a quick sanity check on a slightly hackish heap approach, to implement a C++ version of poppush: The standard also says first, last is a range of elements defining a valid nonempty heap to modify. Usually these functions are garbage-in-garbage-out. If it works, that's a happy coincidence for your specific standard library implementation. Have you tried it? Verifying is easy with std::is_heap, so you should write a generative test using a quickcheck clone.
|
# ? Jan 22, 2016 16:29 |
|
The Laplace Demon posted:The standard also says first, last is a range of elements defining a valid nonempty heap to modify. Usually these functions are garbage-in-garbage-out. If it works, that's a happy coincidence for your specific standard library implementation. Have you tried it? Verifying is easy with std::is_heap, so you should write a generative test using a quickcheck clone. I've tried it and it works with the compiler I am using, my question was aiming at whether it really is implementation dependent if the standard also says the first action is to copy the last element of the heap to the front, does it then matter whether the last element is sorted in correctly. I may end up using a different workaround just to be absolutely sure it remains valid.
|
# ? Jan 22, 2016 17:40 |
|
code:
I note indefinitely* because if there's an easy way to have the loop break on user input (maybe set butt = 0), I'd prefer that. Like if they hit a key or enter a command during the wait it will automatically break.
|
# ? Jan 28, 2016 20:49 |
|
Flush the iostream after each write.
|
# ? Jan 28, 2016 20:51 |
|
Your continue statement is unnecessary. You can have your loop test any condition you like, but the question of "how do I handle user input" has wildly different answers depending on your environment.
|
# ? Jan 28, 2016 21:06 |
|
Ralith posted:Your continue statement is unnecessary. You can have your loop test any condition you like, but the question of "how do I handle user input" has wildly different answers depending on your environment. If I put a cin somewhere, it will pause on the cin to wait for input, and thus, won't loop. Normally that's what I'd do, but I want the script to run indefinitely unless the user actually does something while it's looping.
|
# ? Jan 28, 2016 21:24 |
|
std::endl should flush the stream as well, as might a simple '\n' in your string if line buffering is enabled.
|
# ? Jan 28, 2016 21:24 |
|
PRADA SLUT posted:If I put a cin somewhere, it will pause on the cin to wait for input, and thus, won't loop. Normally that's what I'd do, but I want the script to run indefinitely unless the user actually does something while it's looping.
|
# ? Jan 28, 2016 22:04 |
|
Ralith posted:You'll need to either use a platform-specific multiplexing scheme (e.g. select) or multiple threads. What would be the simplest way to do this under OS X?
|
# ? Jan 28, 2016 22:19 |
|
If you want a simple solution, ncurses can wait on input with a timeout.
|
# ? Jan 28, 2016 22:24 |
|
Couldn't you just check "is a key down right now no ok" at the end of each cycle?
|
# ? Jan 29, 2016 00:08 |
|
Not using the garbage C++ IO library. Also, what you want to check for is if there have been any key presses since the last time you checked.
|
# ? Jan 29, 2016 00:12 |
|
PRADA SLUT posted:What would be the simplest way to do this under OS X? Selecting on the standard input file descriptor should work there.
|
# ? Jan 29, 2016 01:25 |
|
If you're willing to press ctrl+c to stop your program instead of an arbitrary key then this is infinitely simpler and you can just do this:code:
|
# ? Jan 29, 2016 01:39 |
|
PRADA SLUT posted:What would be the simplest way to do this under OS X? Not sure if this is a comedy option or serious suggestion: use read-type dispatch_source on stdin and write-type dispatch_source on stout's file descriptors, and set their event handlers to blocks that dispatch to a background queue.
|
# ? Jan 29, 2016 03:14 |
|
Secondly, is there a way to make my script always have administrative privileges? Like a way to set permissions on that individual file that would always allow it to do what it does without asking for my password every time it runs? It doesn't save my password since it runs nonstop in the background and requires my password every hour.GrumpyDoctor posted:Couldn't you just check "is a key down right now no ok" at the end of each cycle? My loop timer is an hour long, so it wouldn't work. Nippashish posted:If you're willing to press ctrl+c to stop your program instead of an arbitrary key then this is infinitely simpler and you can just do this: The idea is that my program will automatically do something every hour, but I can switch it to doing that thing manually on the users input instead at will, without having to close and re-run it every time. Basically it does A Thing every loop, but I was looking to automate it so it would do A Thing every hour without my intervention, but I would be able to switch it back to manual execution without re-launching the script each time. As it stands right now, it runs, and the user can do A Thing manually while it's open, or can elect to have it do A Thing every hour automatically, but there's no way to switch it back once it's automatically running without closing and re-opening. PRADA SLUT fucked around with this message at 06:43 on Jan 29, 2016 |
# ? Jan 29, 2016 06:40 |
|
PRADA SLUT posted:Secondly, is there a way to make my script always have administrative privileges? Like a way to set permissions on that individual file that would always allow it to do what it does without asking for my password every time it runs? It doesn't save my password since it runs nonstop in the background and requires my password every hour. (The select option is also relatively simple as it doesn't involve multithreading at all.)
|
# ? Jan 29, 2016 07:58 |
|
roomforthetuna posted:Here are a selection of answers to this question. The "select" option that's the highest rated answer should be supported to some extent regardless of OS. To make the relevance clearer I should probably mention that stdin is a file descriptor. http://stackoverflow.com/questions/2917881/how-to-implement-a-timeout-in-read-function-call Worth noting: Windows select() only works on network sockets, because network sockets are not file handles on Windows and select was/is a be-compatible-with-Unix hack on top of the OS.
|
# ? Jan 29, 2016 12:15 |
feedmegin posted:Worth noting: Windows select() only works on network sockets, because network sockets are not file handles on Windows and select was/is a be-compatible-with-Unix hack on top of the OS. To elaborate: On Windows, you'll have to use the WaitForSingleObject() function.
|
|
# ? Jan 29, 2016 12:43 |
|
PRADA SLUT posted:Secondly, is there a way to make my script always have administrative privileges? Like a way to set permissions on that individual file that would always allow it to do what it does without asking for my password every time it runs? It doesn't save my password since it runs nonstop in the background and requires my password every hour. setuid root?
|
# ? Jan 29, 2016 14:37 |
|
nielsm posted:To elaborate: On Windows, you'll have to use the WaitForSingleObject() function. No, Windows has no equivalent of select/poll/epoll/kpoll, period. Windows asynchronous I/O follows the aio model, scratch that, Windows asynchronous I/O is aio after a search-replace: OVERLAPPED is struct aiocb, GetOverlappedResult is aio_suspend/aio_return, CancelIoEx is aio_cancel Use a library/framework that hides all the complexity of asynchronous I/O or you'll go insane. There is no portable way to do it efficiently, and in some combinations of platform/device/operations you will have to spin a thread (Windows console I/O is a big one) or use non-standard operations (for example WaitForSingleObject on a console input handle works like select instead of working like aio_suspend, like it does for all other file handles) I can't recommend any in particular, though, but never ever ever write low level async I/O code
|
# ? Jan 29, 2016 14:45 |
Do JetBrains have any plans to make CLion work with MS toolchain? Anyone knows?
|
|
# ? Jan 30, 2016 18:51 |
|
Can someone explain a bit what is happening with this type deduction...code:
|
# ? Jan 31, 2016 13:02 |
|
Hyvok posted:Can someone explain a bit what is happening with this type deduction... If the initial value is an int then all the intermediate values are int and the final return value is an int. It's just int = int * int64_t at every step and it's your fault if the operation overflows or truncates. Look at the example implementations on cppreference.com. MutantBlue fucked around with this message at 13:47 on Jan 31, 2016 |
# ? Jan 31, 2016 13:33 |
|
I feel so dumb that I can't figure this out but I'm simply not getting it. I'm trying to generate 4 random numbers between 1 and 25 and none of the 4 can match. I'm able to generate 4 random numbers but the way I'm doing it, I'm only comparing the last random number to the current one, meaning that I end up with two of the four numbers being the same occasionally. I know this has to be something easy that I'm not understanding but I can't figure it out. Should I be reading each random number into its own variable like randOne, randTwo, randThree, randFour? How would I do that? Here is the code I have: code:
Scrapez fucked around with this message at 17:23 on Feb 1, 2016 |
# ? Feb 1, 2016 07:42 |
|
If you want to use a for loop like that, the easiest approach would be to define an array of 4 ints, then in each iteration of your outer loop, repeatedly generate numbers until you get one that isn't in the part of the array you already filled in. Be careful not to examine elements of the array that you haven't filled in yet (they'll have undefined values). For something a little more modern and well-behaved than (s)rand, you might also want to look into C++11 random number generation. e: Also, not sure if you realize this, but standard convention is to do loops such that the counter starts at 0 and the the condition is < N for some N. This makes manipulating arrays more convenient. Ralith fucked around with this message at 08:22 on Feb 1, 2016 |
# ? Feb 1, 2016 07:52 |
|
Declare an array of four ints, int nums[4]; Then use your loop variable as an index into the array; you know you need to be storing the random number into the ith slot in the array. Then iterate over each number up to (but not including) the ith slot in the array, and if any of them are equal, redo that number and try again. At the end just print all four numbers in the array out: you know they're all unique. E: that'll teach me to answer on my phone
|
# ? Feb 1, 2016 07:55 |
|
I wrote this really quick:C++ code:
It's also got a bunch of various outputs so you can see what it's doing (checking array positions, generating a different number, etc), and finally just outputs the final results at the end. PRADA SLUT fucked around with this message at 08:32 on Feb 1, 2016 |
# ? Feb 1, 2016 08:30 |
quote:
|
|
# ? Feb 1, 2016 08:53 |
|
Since it's such a small range of numbers you might consider just shuffling the numbers 1..25 and then taking the first 4.C++ code:
|
# ? Feb 1, 2016 09:00 |
|
|
# ? Jun 1, 2024 12:07 |
|
nielsm posted:That's an array of three elements, indexed 0, 1 and 2. Your code then proceeds to stuff 4 elements into it. I noticed that after. It still outputs correctly.
|
# ? Feb 1, 2016 09:18 |