leper khan posted: That looks correct.

disgusting. let's talk indentation: tabs for indenting, spaces for alignment

# ? Nov 4, 2023 15:11

Sweeper posted: disgusting

I want to do tabs to indent, spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

# ? Nov 4, 2023 15:18

leper khan posted: I want to do tabs to indent, spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

tools don't support it well, so if it can't be done automatically it can't be done at all. that's the problem with formatting
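A tiny invented illustration of the convention, since the whitespace itself is invisible on the page (the comments spell out which is which):

C++ code:
// Tabs take each line to its block's indent level; spaces then pad the
// continuation line so the second argument stays aligned at any tab width.
int add(int a, int b) { return a + b; }

int demo() {
	return add(1,   // indented with one tab
	           2);  // one tab, then spaces to align under the '('
}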

# ? Nov 4, 2023 15:22

roomforthetuna posted: Yes it still is sometimes an else after a brace, you're just saying you should format that like a maniac.

You appear to be suggesting this monstrosity?

# ? Nov 4, 2023 16:51

roomforthetuna posted: [code snippet not preserved]

big "code generated by a shell script" energy in this indentation style

# ? Nov 4, 2023 16:54

leper khan posted: That looks correct.

There is something wrong with your eyes.

# ? Nov 4, 2023 18:54

Same line braces, none of this mixed crap. That way lies madness
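For reference, the two layouts being argued over, in an invented example:

C++ code:
#include <cstdio>

void sameLineBraces(bool cond) {
    if (cond) {
        std::puts("then");
    } else {            // cuddled else (K&R style): else shares a line with the braces
        std::puts("else");
    }
}

void bracesOnOwnLine(bool cond)
{
    if (cond)
    {
        std::puts("then");
    }
    else                // else on its own line (Allman style)
    {
        std::puts("else");
    }
}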

# ? Nov 4, 2023 19:20

When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:

code: [snippet not preserved]

I deleted the entire codebase and rewrote the thing.

# ? Nov 4, 2023 19:41

ShoulderDaemon posted: When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:

How did you find my university assignments?

# ? Nov 5, 2023 20:30

2 space indentation, except use 4 when you need to point out the really important parts of your code

# ? Nov 6, 2023 02:14

4 space indentation is best; it prevents you from building horrible pyramids of doom. 2 space lets you construct huge nested crap without realizing it.

---

Anyways, I'm at a school project yet again. Basically we have a database of publications and affiliations, and everything is done in C++ without external libs. Threading is not allowed. The program adds, removes, modifies and queries data based on different parameters and sorting orders. I did this first:

C++ code: [snippet not preserved]

That made the subsequent function calls a magnitude faster, because it sorts the data only once, and returns pre-sorted data if the data did not change. Next I realized that performance is estimated only with the following functions:

code: [snippet not preserved]

So I could use functions like add_affiliation or remove_affiliation to keep a few always-sorted indexes up to date, without it affecting the performance measured by the perftests. Of course I'd have to do some extra work in change_affiliation_coord too, but maybe it would not be that bad.

My question is: does it make sense to keep a few pre-sorted index lists, and keep the lists sorted during add/remove/modify operations, instead of sorting the data only when requested? As a downside, n adds would be O(n^2) instead of O(n), for example...

2nd question: would it be fast enough to just iterate over the data once and add the distances to a 3-item std::priority_queue (min heap) in get_affiliations_closest_to? It must return the 3 closest affiliations to a random coordinate. Or should I start researching & implementing some kind of spatial indexing solution for that?

Ihmemies fucked around with this message at 13:38 on Nov 6, 2023
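A minimal sketch of the sort-once-and-cache idea described above; AffiliationIndex and its members are invented names, not the course API:

C++ code:
#include <algorithm>
#include <string>
#include <vector>

using AffiliationID = std::string; // assumption: string ids

// Sort once on the first query after a change; later queries return the
// cached result. (Ordering by raw id for brevity; the real code would sort
// by name or by distance.)
class AffiliationIndex {
    std::vector<AffiliationID> ids_;
    mutable std::vector<AffiliationID> sortedCache_;
    mutable bool dirty_ = true; // flipped by every add/remove/modify

public:
    void add(AffiliationID id) {
        ids_.push_back(std::move(id));
        dirty_ = true;
    }

    std::vector<AffiliationID> sortedIds() const {
        if (dirty_) {
            sortedCache_ = ids_;
            std::sort(sortedCache_.begin(), sortedCache_.end());
            dirty_ = false;
        }
        return sortedCache_; // unchanged data => no re-sort, just a copy
    }
};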

# ? Nov 6, 2023 12:50

I would say a std::map (or std::multimap, if you can have duplicate names) from name to id, keyed on the alphabetical collation of the names. You just need to supply a Compare function to the std::map template to get the ordering you want.
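A minimal sketch of that suggestion; the case-insensitive NameCollate is just one invented example of "the ordering you want":

C++ code:
#include <cctype>
#include <map>
#include <string>

// Case-insensitive alphabetical collation as an example Compare functor.
struct NameCollate {
    bool operator()(const std::string& a, const std::string& b) const {
        auto ai = a.begin(), bi = b.begin();
        for (; ai != a.end() && bi != b.end(); ++ai, ++bi) {
            int ca = std::tolower(static_cast<unsigned char>(*ai));
            int cb = std::tolower(static_cast<unsigned char>(*bi));
            if (ca != cb) return ca < cb;   // first differing letter decides
        }
        return a.size() < b.size();         // shorter string sorts first
    }
};

// name -> id, kept in collated order; multimap because names may repeat.
using AffiliationID = std::string;          // assumption: string ids
std::multimap<std::string, AffiliationID, NameCollate> nameToId;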

# ? Nov 6, 2023 14:24

Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go. For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

# ? Nov 6, 2023 14:24

Jabor posted: Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go.

Hmm yes, I could use std::set with a custom comparator. The comparator would compare the names, or the distances to (0,0), for the affiliations, and hold the IDs. The problem here is that I need to return a vector<AffiliationID>; I forgot to mention that. Basically all the queries want a vector back, and creating a vector from a set is an O(n) operation (slow)... which kind of defeats the purpose of sorting beforehand. Affiliations sorted by name? They want a vector of IDs. Affiliations sorted by distance to (0,0)? A vector of IDs as the return value... etc.

quote: For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

The least-effort way is to do it as I described and see if it's good enough. There are 5 additional functions and that is one of them; if they are fast enough the points are just awarded directly.

# ? Nov 6, 2023 14:42

Copying one vector to another is also O(n), although with smaller constants.

# ? Nov 6, 2023 14:45

Meh. I guess I'll just wait it out; the final "performance targets" or whatever should be locked a week before the deadline. So I can see a week before the deadline whether my program is too slow or not, and defer doing any extra optimizations to that moment. Implementing a set or a map sorted by a custom comparator should not take too much time.

Grading works like this: every student's program is performance tested, and then compared with the others. Average performance gets a 3, piss poor gets a 1, very well performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

I used hashmaps because they check everything with Valgrind, and my pointer handling is never flawless. With the standard library's data structures you can be fairly sure you don't gently caress up memory management, at least.

Ihmemies fucked around with this message at 14:54 on Nov 6, 2023

# ? Nov 6, 2023 14:51

Ihmemies posted: Hmm yes, I could use std::set with a custom comparator. Comparator would compare the names or distances to 0,0 for the affiliations, and hold the id's.

O(n) over a collection of n elements is O(1) per element; it's not that slow. memcpy is O(n) in the size of the memory to be copied. You need to look at what n denotes to determine whether the thing is fast or slow. Otherwise I'll just redefine n to confound you into thinking slow operations are fast and fast operations are slow.

It's impossible to create a vector of n elements in less than O(n) time, so anything you do to create a vector will be at least O(n). You're free to perform any constant number of O(n) operations without affecting the asymptotic bound.

# ? Nov 6, 2023 14:55

Ihmemies posted: Grading works like this: every student's programs are performance tested, and then compared with each other. Average performance gets a 3, piss poor gets 1, very good performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

What an incredibly bad way to structure this.

# ? Nov 6, 2023 15:14

What you're describing is often called a secondary index. You could go further than your proposal: observe that your secondary index structure would always be "mostly in order" (from previous sorts), so re-sorting the whole thing involves a lot of unnecessary work.

So, how about maintaining a sorted structure with the preexisting entries, as you propose, and having a side data structure where new inserts go. On removal, just find the entry in the sorted index and mark it as invalid with a special sentinel entry (this is sometimes called "tombstoning") so you don't take the linear-time penalty of removing from the vector and shifting all the subsequent elements down. Modifying an element, if it changes the ordering, would involve both tombstoning the old version and inserting the new version; in essence, a deletion followed by an insertion.

code:
vector<AffiliationID> affiliationsAlphabetically; // Invariant: always ordered
vector<AffiliationID> affiliationAdditions;       // Invariant: none! Just push_back() into this on inserts.
(edit: of course, morally, `affiliationAdditions` could well be an std::set.)

When the user wants the vector with all entries returned, sort the smaller one (or use a set where ordering is always maintained, whatever), which would be O(m log m) for m <<< n, then merge the two structures together (which is O(n), since they're now both sorted, so you can scan through both), skipping over any tombstoned values.

As others have said, since you're returning a vector by copy you can't do better than O(n) anyway.

Dijkstracula fucked around with this message at 15:41 on Nov 6, 2023
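A minimal sketch of the merge step; AffiliationID, the empty-string tombstone sentinel, and ordering by raw id are all assumptions for illustration:

C++ code:
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using AffiliationID = std::string;   // assumption: string ids
const AffiliationID TOMBSTONE = "";  // assumed sentinel for removed entries

// Merge the big always-sorted index with the small unsorted additions,
// skipping tombstones. The real code would order by name or distance.
std::vector<AffiliationID> mergeIndexes(const std::vector<AffiliationID>& sorted,
                                        std::vector<AffiliationID> additions) {
    std::sort(additions.begin(), additions.end()); // O(m log m), m <<< n
    std::vector<AffiliationID> out;
    out.reserve(sorted.size() + additions.size());
    std::size_t i = 0, j = 0;
    while (i < sorted.size() || j < additions.size()) {
        // Take the smaller head of the two sorted sequences.
        bool takeLeft = j == additions.size() ||
                        (i < sorted.size() && sorted[i] <= additions[j]);
        const AffiliationID& id = takeLeft ? sorted[i++] : additions[j++];
        if (id != TOMBSTONE) out.push_back(id); // skip tombstoned entries
    }
    return out;
}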

# ? Nov 6, 2023 15:38

Subjunctive posted: What an incredibly bad way to structure this.

I am glad I am not the only bad thing here. Yes, it makes sense that copying a vector is O(n); I am just being bad.

I implemented two sets with custom comparators, since that was relatively straightforward. As a bonus, my comparison logic moved over from the mess of the class implementation to my header... which looks nicer.

Before, doing a sort once and returning cached data in subsequent queries:

code: [perf output not preserved]

code: [perf output not preserved]

I forgot to cache the return vectors after the set change, so my optimization score went to zero... despite the functions performing a lot faster! Oh well. The sets with custom comparators seem to be a "good enough" solution; basically it's as fast as retrieving all the affiliations, because the affiliation IDs are already pre-sorted. I still need to re-add the caching of vectors so I don't get penalties for not being optimized.

The tombstone thing looks interesting, but maybe it's not necessary now that using the sets already gave such a big perf boost in the tests that matter. It does add significant execution time to operations like add, but that isn't measured, so:

code: [snippet not preserved]

Ihmemies fucked around with this message at 17:37 on Nov 6, 2023

# ? Nov 6, 2023 17:30

Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

# ? Nov 6, 2023 17:45

Twerk from Home posted: Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

We have an online course environment; we add gitlab commit hashes as submissions. The school's automatic grader pulls the code from gitlab, then compiles and runs it automatically.

# ? Nov 6, 2023 18:18

Well, at least they are encouraging you to use version control.

# ? Nov 6, 2023 18:22

The only micro things that stand out to me are that you're keeping some of the publication and affiliation data in separate maps instead of storing it in the same record, as if you can't just put a std::vector in a struct or something. Maybe that makes sense given the access patterns in your API, but it could also be forcing extra lookups. You could also store the distance from the home coord used by get_affiliations_distance_increasing (I assume that's how that API works?) so that you don't have to recompute it in your comparator. Note that std::unordered_map has stable references: the entries don't have to get moved as the hash table grows, so there's no surprise penalty for making those records bigger or more expensive to move.

Just caching the results of some queries is a reasonable way to optimize them if changes are uncommon.

For getting the 3 closest to a particular point, you could definitely use a spatial index if this is an important operation. Otherwise, it's a bit of a shame to build a vector with every ID in it and then drop most of them; you could hand-roll this, just remembering the IDs and distances for your current top three.
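A sketch of that hand-rolled top-three scan; Coord, the parallel coords vector, and returning positions are invented for illustration:

C++ code:
#include <algorithm>
#include <cstddef>
#include <vector>

struct Coord { double x, y; }; // stand-in for the course's coordinate type

// One O(n) pass that remembers only the best three candidates, instead of
// building and sorting a vector of every affiliation.
std::vector<std::size_t> threeClosest(const std::vector<Coord>& coords, Coord target) {
    struct Entry { double dist2; std::size_t index; };
    std::vector<Entry> best; // at most 3 entries, kept sorted by dist2
    for (std::size_t i = 0; i < coords.size(); ++i) {
        double dx = coords[i].x - target.x, dy = coords[i].y - target.y;
        Entry e{dx * dx + dy * dy, i}; // squared distance: no sqrt needed for ranking
        auto pos = std::lower_bound(best.begin(), best.end(), e,
            [](const Entry& a, const Entry& b) { return a.dist2 < b.dist2; });
        best.insert(pos, e);
        if (best.size() > 3) best.pop_back(); // drop the now-4th-best candidate
    }
    std::vector<std::size_t> out;
    for (const Entry& e : best) out.push_back(e.index);
    return out; // positions of the (up to) 3 closest coords
}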

# ? Nov 6, 2023 18:45

Now the problem is that I lost optimization points from the get_affiliations_alphabetically and get_affiliations_distance_increasing functions. In Plussa's prg1 grading system (specifically perfestimate-grader), 150 optimization points are available. The functions subject to optimization are:

- get_affiliations_alphabetically
- get_affiliations_distance_increasing
- find_affiliation_with_coord

There are two testing approaches for optimization points: one for the functions get_affiliations_alphabetically and get_affiliations_distance_increasing, and another for the function find_affiliation_with_coord. For both testing approaches, we calculate what we call the "optimization percentage," which is a number between -100% and 100%:

- To be considered optimized, the performance should, on average, be more efficient than single executions. A more efficient execution has a lower command counter value. The optimization percentage helps determine if a function has been optimized. When the optimization percentage is significantly negative, it means that a single execution is more efficient than the average of multiple executions. Conversely, when the percentage is significantly positive, it indicates that a single execution is less efficient than the average performance. Optimization points are granted if the optimization percentage is greater than 2%.

Note: The values of command counters may vary depending on the computer, and optimization testing should be conducted without any compiler optimizations.

So the repeated function calls should be faster than the first one. Since the data is basically already sorted, what is there still to optimize? How could I return the data even faster? Do I need some kind of delay when the first function call occurs, where the function spends a bit of extra time just waiting...

# ? Nov 6, 2023 18:47

rjmccall posted: The only micro things that stand out to me are that you're keeping some of the publication and affiliation data in separate maps instead of storing that in the same record [...]

Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to (0,0) and store it. It probably doesn't affect performance, only the resource usage. I'm not sure if anyone will look at my code, though, so I don't think there's an incentive to make it look "nice".

# ? Nov 6, 2023 18:53

Ihmemies posted: Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to 0,0 and store it. It probably doesn't affect performance, only the resource usage.

I assume so, but is the testing API fixed? Could you return a shared_ptr<vector<...>>, or the vector by reference, instead of the vector by value?

Hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there.

# ? Nov 6, 2023 18:57

Dijkstracula posted: I assume so, but, is the testing API fixed? Could you return a shared_ptr<vector<...>> or the vector by reference instead of the vector by value?

It is fixed. The header file's public part has all the function definitions, and they are not allowed to be changed.

quote: hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there

And this goes beyond the scope I understand or can estimate the down/upsides of... maybe I should test it. At least calculating the distance when adding or modifying an affiliation is a super good idea.

# ? Nov 6, 2023 19:08

Ihmemies posted: And this goes beyond the scope I understand, or can estimate the down/upsides

(though, now that I think about it some more, if your dataset isn't at least multiple megabytes in size then I doubt you'd see enough cache pressure for what I said to matter anyway)

# ? Nov 6, 2023 19:33

Edit: here I am being stupid again. Of course the IDs must first be removed from the sets, and only then from the map, because the sets' comparators depend on the map data... and it causes undefined behaviour otherwise.

... the indexed sets cause some weird behaviour. Has anyone ever experienced keys returning to an unordered map, with values initialized to default values, after erasing the key from the map?

C++ code: [snippet not preserved]

At the next step, in affiliationsSortedByName.erase(id), the key returns to the affiliations map.........? Surely the sets' comparators can't cause something like this.

Ihmemies fucked around with this message at 21:11 on Nov 6, 2023
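For what it's worth, that symptom (an erased key reappearing with a default value) is exactly what unordered_map::operator[] does when a comparator looks up a missing key. A hypothetical reconstruction of the hazard, not the actual project code:

C++ code:
#include <set>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, std::string> affiliations; // id -> name

struct ByName {
    bool operator()(const std::string& a, const std::string& b) const {
        // operator[] default-inserts a missing key, which is how an erased id
        // can "come back" with an empty value; .at() would throw instead.
        return affiliations[a] < affiliations[b];
    }
};

int main() {
    std::set<std::string, ByName> affiliationsSortedByName;
    affiliations["id1"] = "Aalto";
    affiliations["id2"] = "Tampere";
    affiliationsSortedByName.insert("id1");
    affiliationsSortedByName.insert("id2");

    affiliations.erase("id1");             // wrong order: map entry gone first...
    affiliationsSortedByName.erase("id1"); // ...so the comparator resurrects "id1",
                                           // and the set's ordering invariant is
                                           // broken mid-erase: undefined behaviour.
    // Correct order: erase from the set while the map entry still exists,
    // then erase from the map.
}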

# ? Nov 6, 2023 20:56

My erase order was wrong; it caused undefined behaviour when the set's comparator was missing values.

Anyways, I need to store some affiliations (universities) in a weighted graph and make queries against it. Shortest path, etc. I thought about something like:

C++ code: [snippet not preserved]

Does it make any sense to attempt to implement a custom Fibonacci heap instead of using the standard STL containers? Data insertion speed and memory usage don't really matter, since they aren't measured; only the pathfinding must be fast.

Ihmemies fucked around with this message at 09:23 on Nov 13, 2023
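As a sketch of the kind of structure described, a weighted adjacency list might look like this (every name is invented, and edge weight as a shared-publication count is an assumption based on the description downthread):

C++ code:
#include <string>
#include <unordered_map>
#include <vector>

using AffiliationID = std::string; // assumption, as elsewhere in the thread

// One outgoing edge: neighbour plus a weight (assumed here to count shared
// publications).
struct Edge {
    AffiliationID to;
    int weight = 0;
};

// Weighted undirected graph as an adjacency list keyed by affiliation id.
std::unordered_map<AffiliationID, std::vector<Edge>> adjacency;

// Record one shared publication between a and b by bumping the edge weight,
// creating the edge in both directions if it doesn't exist yet.
void addConnection(const AffiliationID& a, const AffiliationID& b) {
    for (Edge& e : adjacency[a]) {
        if (e.to == b) {
            ++e.weight;
            for (Edge& back : adjacency[b]) {
                if (back.to == a) { ++back.weight; break; }
            }
            return;
        }
    }
    adjacency[a].push_back({b, 1});
    adjacency[b].push_back({a, 1});
}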

# ? Nov 13, 2023 09:04

In any case, don't bother with fib heaps. If data insertion isn't measured, precalculate all the paths on insertion and just do constant time queries, lmao.
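One way to read "precalculate all the paths" is all-pairs shortest paths, e.g. Floyd-Warshall; a sketch assuming nodes are indexed 0..n-1 and dist is a dense n x n matrix:

C++ code:
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// All-pairs shortest paths with Floyd-Warshall. dist holds edge weights,
// INF where no edge exists, 0 on the diagonal; afterwards dist[i][j] is the
// length of the shortest path from i to j.
const std::int64_t INF = std::numeric_limits<std::int64_t>::max() / 4; // headroom so INF + w can't overflow

void floydWarshall(std::vector<std::vector<std::int64_t>>& dist) {
    const std::size_t n = dist.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j]; // relax through k
}

At the 100000-node scale mentioned downthread, the O(n^3) time and O(n^2) memory make this a non-starter, so it only pays off for small graphs or when queries vastly outnumber inserts.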

# ? Nov 13, 2023 10:03

Ihmemies posted: My erase order was wrong, it caused undefined behaviour when the set's comparator was missing values. [...]

No. Just use a sparse matrix to represent your graph; compressed sparse row (CSR) format is the most common. https://en.wikipedia.org/wiki/Sparse_matrix
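A minimal CSR construction for reference, assuming integer node ids 0..n-1 (names invented, not from the post):

C++ code:
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) adjacency: the neighbours of node u live in
// targets[rowStart[u] .. rowStart[u+1]), with parallel weights[].
struct CsrGraph {
    std::vector<std::size_t> rowStart; // n+1 offsets into targets/weights
    std::vector<std::size_t> targets;
    std::vector<int> weights;
};

struct EdgeRec { std::size_t from, to; int weight; };

// Build CSR from an edge list with a counting sort on the source node.
CsrGraph buildCsr(std::size_t n, const std::vector<EdgeRec>& edges) {
    CsrGraph g;
    g.rowStart.assign(n + 1, 0);
    for (const EdgeRec& e : edges) ++g.rowStart[e.from + 1];                // count per row
    for (std::size_t u = 0; u < n; ++u) g.rowStart[u + 1] += g.rowStart[u]; // prefix sums
    g.targets.resize(edges.size());
    g.weights.resize(edges.size());
    std::vector<std::size_t> cursor(g.rowStart.begin(), g.rowStart.end() - 1);
    for (const EdgeRec& e : edges) {
        std::size_t slot = cursor[e.from]++; // next free slot in e.from's row
        g.targets[slot] = e.to;
        g.weights[slot] = e.weight;
    }
    return g;
}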

# ? Nov 13, 2023 10:07

Xarn posted: In any case, don't bother with fib heaps.

That's what I did in part 1: I presorted lists for queries during insertion, so I can return pre-sorted lists. I asked now, and apparently the insertion is timed somehow, since some students got a fail from their insertion speed tests. But for my part 1 solution, a 4x command count increase because of the sorts during insertion was still OK. So apparently a limit on insertion speed exists, but it is quite lax... I can try to do the path calculation during insertion, but it actually might be too slow.

# ? Nov 13, 2023 10:50

Edit: I think I need a book. Like perhaps https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X

I am still trying to understand this assignment. I think I managed to dynamically create all the connections between nodes, and to increase a connection's weight for added publications. The problem now is how to use the connections. I have to write 4 different pathfinding functions, with different heuristics. A* would fit well, but if I implement my nodes like this:

C++ code: [snippet not preserved]

then I have to either:

a) reset the state of my nodes after each pathfinding run
b) keep a list of modified nodes, and only reset those afterwards
c) mark nodes as "used" or something, and reset them only when they are encountered again in a new pathfinding run...
d) maybe create some kind of temporary graph based on the Affiliations and delete it after pathfinding
e) something else?

In any case I think I need a different structure for storing the pathfinding-related information, like this:

C++ code: [snippet not preserved]

CSR is a bit out of my league, unless I can find some "ELI5" guide for implementing it.

Ihmemies fucked around with this message at 12:27 on Nov 15, 2023
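One shape options (d)/(e) could take: keep the per-search bookkeeping outside the graph in a throwaway struct, so the graph itself never needs resetting. A sketch with invented names:

C++ code:
#include <limits>
#include <string>
#include <unordered_map>

using AffiliationID = std::string; // assumption, as elsewhere in the thread

// Per-search scratch data kept outside the graph nodes. Build one per query
// and let it die afterwards: no reset pass over the graph is ever needed.
struct SearchState {
    std::unordered_map<AffiliationID, double> gScore;          // best cost so far
    std::unordered_map<AffiliationID, AffiliationID> cameFrom; // for path recovery

    double g(const AffiliationID& id) const {
        auto it = gScore.find(id);
        return it == gScore.end() ? std::numeric_limits<double>::infinity()
                                  : it->second; // unknown nodes are "infinitely far"
    }
};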

# ? Nov 15, 2023 12:12

The fastest/easiest way to handle the state across multiple runs would be to change state into an int and keep a count of the number of searches run (this is a way of implementing (c) on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited, etc. This means you don't need to reset anything after a search. It doesn't scale to multithreading without a bit more thought, but your current structures don't allow for that anyway.

Are there enough nodes to make A* worthwhile? It's quite common for the overhead to make it slower than Dijkstra on small graphs.
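A sketch of that generation-counter idea, with invented names and per-node state held in a vector indexed by node id:

C++ code:
#include <cstddef>
#include <cstdint>
#include <vector>

// Generation counter: instead of clearing visited flags between searches,
// bump a search id. A node counts as visited only if its stamp matches the
// current id.
struct NodeState {
    std::uint64_t stamp = 0; // 0 == never touched by any search
    double gScore = 0.0;     // only meaningful when stamp == currentSearch
};

struct Pathfinder {
    std::vector<NodeState> states; // one entry per node, sized once
    std::uint64_t currentSearch = 0;

    void beginSearch() { ++currentSearch; } // O(1) "reset" of every node

    bool visited(std::size_t node) const {
        return states[node].stamp == currentSearch;
    }
    void visit(std::size_t node, double g) {
        states[node].stamp = currentSearch;
        states[node].gScore = g;
    }
};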

# ? Nov 15, 2023 12:57

robostac posted: The fastest/easiest way to handle the state across multiple runs would be to change state into an int and keep count of the number of searches run (this is a way of implementing c on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited etc. This means you don't need to reset anything after a search. Doesn't scale with multithreading without a bit more thought, but your current structures don't allow for that anyway.

Overhead meaning keeping track of f/g/h etc. and resetting values after searches? Speed will be tested on 100000 nodes (and thus more connections than that), and I don't know what counts as large enough. I need to find:

- any path (maybe BFS/DFS is suitable for this because of the smaller overhead)
- the path with the fewest nodes
- the path with the least friction (using the connections with the heaviest weights)
- the shortest path (nodes have x,y coordinates in 2D space, so literally the shortest)

And I optimistically thought A* would perhaps be flexible enough to cover all the cases.

Ihmemies fucked around with this message at 13:42 on Nov 15, 2023
# ? Nov 15, 2023 13:31 |
|
|
# ? Jun 7, 2024 15:12 |
|
Actually even creating the graph dynamically is tough… oh crap. Quite a jump in difficulty from project 1 to 2.

# ? Nov 15, 2023 16:57