Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

leper khan posted:

That looks correct.

disgusting

lets talk indentation

tabs for indenting, spaces for alignment


leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Sweeper posted:

disgusting

lets talk indentation

tabs for indenting, spaces for alignment

I want to do tabs to indent spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

leper khan posted:

I want to do tabs to indent spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

tools don't support it well, so if it can't be done automatically it can't be done at all; that's the problem with formatting
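For what it's worth, clang-format does have a knob for exactly that mixed style: UseTab: AlignWithSpaces (clang-format 11+) uses tabs for indentation and spaces for alignment. A hypothetical .clang-format fragment:

```yaml
# hypothetical .clang-format fragment: tabs for indentation,
# spaces for alignment (requires clang-format 11 or newer)
UseTab: AlignWithSpaces
TabWidth: 4
IndentWidth: 4
```

Whether every editor and diff tool in the team's pipeline respects it is, of course, another matter.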

Presto
Nov 22, 2002

Keep calm and Harry on.

roomforthetuna posted:

Yes it still is sometimes an else after a brace, you're just saying you should format that like a maniac. You appear to be suggesting this monstrosity?
code:
if (banana) {
  doStuff();
}
else {
  doOtherStuff();
}
Yes. That monstrosity is the correct way. I will not budge.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

roomforthetuna posted:

code:
if (banana) {
  doStuff();
}
else {
  doOtherStuff();
}

big "code generated by a shell script" energy in this indentation style

qsvui
Aug 23, 2003
some crazy thing

roomforthetuna posted:

code:
if (banana) doStuff();
else doOtherStuff();

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

leper khan posted:

That looks correct.

There is something wrong with your eyes.

Apex Rogers
Jun 12, 2006

disturbingly functional

Same line braces, none of this mixed crap. That way lies madness

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender
When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
As near as I could tell, only one other person had ever touched the code, and they had left the company more than 5 years before I came onboard. Their choice of indentation was not even close to being the worst thing about it.

I deleted the entire codebase and rewrote the thing.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

ShoulderDaemon posted:

When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
I think we should all standardize on this, to better reflect the actual quality of the code.

mmkay
Oct 21, 2010

ShoulderDaemon posted:

When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
As near as I could tell, only one other person had ever touched the code, and they had left the company more than 5 years before I came onboard. Their choice of indentation was not even close to being the worst thing about it.

I deleted the entire codebase and rewrote the thing.

How did you find my university assignments?

Nalin
Sep 29, 2007

Hair Elf
2 space indentation, except use 4 when you need to point out the really important parts of your code

Ihmemies
Oct 6, 2012

4 space indentation is best, it prevents you from doing horrible pyramids of doom.. 2 space lets you construct huge nested crap without you realizing it.

---

Anyways, I'm working on a school project yet again. Basically we have a database of publications and affiliations, and everything is done in C++ without external libs. Threading is not allowed. The program adds, removes, modifies and queries data based on different parameters and sorting orders.

I did first this:

C++ code:
    struct Affiliation {
        AffiliationID id;
        Name name;
        Coord coord;
    };

    struct Publication {
        PublicationID id;
        Name name;
        Year year;
        std::vector<AffiliationID> affiliations;
    };

    std::unordered_map<AffiliationID, Affiliation> affiliationDetails;
    std::unordered_map<PublicationID, Publication> publicationDetails;
    std::unordered_map<AffiliationID, std::vector<PublicationID>> affiliationToPublication;

    std::unordered_map<Coord, AffiliationID, CoordHash> affiliationCoords;
    std::unordered_map<PublicationID, PublicationID> childToParent;
    std::unordered_map<PublicationID, std::vector<PublicationID>> parentToChildren;

    bool affiliationsChanged = false;
    bool affiliationsCoordsChanged = false;
    vector<AffiliationID> affiliationsAlphabetically = {};
    vector<AffiliationID> affiliationsDistanceIncreasing = {};
This is my minimum effort implementation, and it works as requested. I added the affiliationsAlphabetically and affiliationsDistanceIncreasing vectors after realizing that some functions must return data faster on subsequent runs. So basically functions like get_affiliations_distance_increasing or get_affiliations_alphabetically save the vector before returning, and just return the saved vector if the data was not modified.

That made subsequent function calls an order of magnitude faster, because the data is sorted only once, and pre-sorted data is returned if nothing changed.
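A minimal sketch of that caching scheme (the map, flag, and vector names follow the post; the function body is illustrative, not the actual assignment code):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

using AffiliationID = int;
using Name = std::string;

struct Affiliation { AffiliationID id; Name name; };

std::unordered_map<AffiliationID, Affiliation> affiliationDetails;
std::vector<AffiliationID> affiliationsAlphabetically;
bool affiliationsChanged = true;   // add/remove/modify set this back to true

// Sort once, then hand back the cached vector until the data changes again.
std::vector<AffiliationID> get_affiliations_alphabetically()
{
    if (affiliationsChanged) {
        affiliationsAlphabetically.clear();
        for (const auto& [id, aff] : affiliationDetails)
            affiliationsAlphabetically.push_back(id);
        std::sort(affiliationsAlphabetically.begin(), affiliationsAlphabetically.end(),
                  [](AffiliationID a, AffiliationID b) {
                      return affiliationDetails.at(a).name < affiliationDetails.at(b).name;
                  });
        affiliationsChanged = false;
    }
    return affiliationsAlphabetically;  // subsequent calls skip the O(n log n) sort
}
```

The first call pays the full sort; every later call until the next mutation is just a vector copy.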

Next I realized that performance is estimated only with the following functions:

code:
7573 * 10 / 161 (affiliation_info) +
85601857 * 1 / 161 (get_all_affiliations) +
146749592 * 5 / 161 (get_affiliations_alphabetically) +
128233483 * 5 / 161 (get_affiliations_distance_increasing) +
124079 * 10 / 161 (find_affiliation_with_coord) +
9032 * 10 / 161 (change_affiliation_coord) +
6145 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 9083562
Despite their low per-call weight, the two heaviest functions get_affiliations_alphabetically and get_affiliations_distance_increasing account for 94% of the total performance score (9083562).

So I could use functions like add_affiliation or remove_affiliation to keep a few always sorted indexes up to date, without it affecting the performance measured by perftests. Of course I'd have to do some extra work in change_affiliation_coord too, but maybe it would not be that bad.

My question is: does it make sense to keep a few pre-sorted index lists, and keep the lists sorted during add/remove/modify operations, instead of sorting the data only when requested?

As a downside, n adds would be O(n^2) instead of O(n), for example..

2nd question is: would it be fast enough to just iterate over the data once and add the distances to a 3 item std::priority_queue (min heap) with function get_affiliations_closest_to. It must return 3 closest affiliations to a random coordinate.

Or should I start researching & implementing some kind of spatial indexing solution for that..? :effort:

Ihmemies fucked around with this message at 13:38 on Nov 6, 2023

nielsm
Jun 1, 2009



I would say use a std::map (or std::multimap, if you can have duplicate names) from name to id, keyed on the alphabetical collation of the names. You just need to supply a Compare function to the std::map template to get the ordering you want.
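A rough sketch of that idea (nameIndex, index_add, and ids_alphabetically are made-up names; std::less on std::string already gives plain alphabetical ordering, so a custom Compare is only needed for a different collation):

```cpp
#include <map>
#include <string>
#include <vector>

using AffiliationID = int;
using Name = std::string;

// Multimap keyed on name: iteration order IS the alphabetical order,
// so producing the sorted ID list is a single linear scan, no sort call.
std::multimap<Name, AffiliationID> nameIndex;

void index_add(const Name& name, AffiliationID id)
{
    nameIndex.emplace(name, id);   // O(log n) insert keeps the ordering invariant
}

std::vector<AffiliationID> ids_alphabetically()
{
    std::vector<AffiliationID> out;
    out.reserve(nameIndex.size());
    for (const auto& [name, id] : nameIndex)
        out.push_back(id);
    return out;
}
```

Removal and rename would also be O(log n) against this index, which is the trade-off versus re-sorting a vector on demand.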

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go.

For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

Ihmemies
Oct 6, 2012

Jabor posted:

Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go.

Hmm yes, I could use std::set with a custom comparator. Comparator would compare the names or distances to 0,0 for the affiliations, and hold the id's.

The problem here is that I need to return a vector&lt;AffiliationID&gt;. I forgot to mention that. Basically all the queries want a vector back, and creating a vector from a set is an O(n) operation (slow).. which kind of defeats the purpose of sorting beforehand.

Affiliations sorted by name? They want a vector with id's. Affiliations sorted by distance to 0,0? A vector with id's as return value.. etc.

quote:

For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

Least effort way is to do it as I described and see if it's good enough. There are 5 additional functions and that is one of them; if they are fast enough, the points are just awarded directly.

nielsm
Jun 1, 2009



Copying one vector to another is also O(n), although with smaller constants.

Ihmemies
Oct 6, 2012

Meh. I guess I'll just wait it out, the final "performance targets" or whatever should be locked a week before deadline. So I can see a week before deadline if my program is too slow or not. And defer doing any extra optimizations to that moment. Implementing a set or a map sorted by custom comparator should not take too much time.

Grading works like this: every student's programs are performance tested, and then compared with each other. Average performance gets a 3, piss poor gets 1, very good performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

I used hashmaps because they check everything with Valgrind, and my pointer handling is never flawless. With the standard library's data structures you can be fairly sure you don't gently caress up memory management at least.

Ihmemies fucked around with this message at 14:54 on Nov 6, 2023

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Ihmemies posted:

Hmm yes, I could use std::set with a custom comparator. Comparator would compare the names or distances to 0,0 for the affiliations, and hold the id's.

The problem here is that I need to return a vector&lt;AffiliationID&gt;. I forgot to mention that. Basically all the queries want a vector back, and creating a vector from a set is an O(n) operation (slow).. which kind of defeats the purpose of sorting beforehand.

Affiliations sorted by name? They want a vector with id's. Affiliations sorted by distance to 0,0? A vector with id's as return value.. etc.

Least effort way is to do it as I described and see if it's good enough. There are 5 additional functions and that is one of them; if they are fast enough, the points are just awarded directly.

O(n) in elements over a collection is O(1) per element. It's not that slow. memcpy is O(n) in memory size to be copied.

You need to look at how n is defined to determine whether the thing is fast or slow. Otherwise I'll just redefine n to confound you into thinking slow operations are fast and fast operations are slow.

It's impossible to create a vector of n elements in less than O(n) time. So anything you do to create a vector will be at least O(n). You're free to perform as many [constant number valued] O(n) operations as you like without impacting the asymptotic bound.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Ihmemies posted:

Grading works like this: every student's programs are performance tested, and then compared with each other. Average performance gets a 3, piss poor gets 1, very good performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

What an incredibly bad way to structure this.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

What you're describing is often called a secondary index.

You could go further than your proposal: observe that your secondary index structure would always be "mostly in order" (from previous sorts), so re-sorting the whole thing involves a lot of unnecessary work. So, how about maintaining a sorted structure with the preexisting entries, as you propose, and having a side data structure where new inserts go. On removal, just find the entry in the sorted index and mark it as invalid with a special sentinel entry (this is sometimes called "tombstoning") so you don't take the linear-time penalty of removing from the vector and shifting all the subsequent elements down. Modifying an element, if it changes the ordering, would involve both tombstoning the old version and inserting the new version; in essence, a deletion followed by an insertion.


code:
vector<AffiliationID> affiliationsAlphabetically; // Invariant: always ordered
vector<AffiliationID> affiliationAdditions;       // Invariant: none! Just push_back() into this on inserts.


(edit: of course, morally, `affiliationAdditions` could well be an std::set.)

When the user wants the vector with all entries returned, sort the smaller one (or use a set where ordering is always maintained, whatever), which is O(m log m) for m <<< n, then merge the two structures together (which is O(n), since they're now both sorted, so you can scan through both in one pass), skipping over any tombstoned values. As others have said, since you're returning a vector by copy you can't do better than O(n) anyway.
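The tombstone-plus-side-buffer merge described above could be sketched like this (the TOMBSTONE sentinel, the names lookup map, and the function names are all hypothetical; the two vectors follow the snippet above):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using AffiliationID = int;
const AffiliationID TOMBSTONE = -1;   // hypothetical sentinel marking removed entries

std::unordered_map<AffiliationID, std::string> names;
std::vector<AffiliationID> sortedIndex;   // invariant: ordered by name, may contain TOMBSTONEs
std::vector<AffiliationID> additions;     // no invariant: push_back on insert

bool nameLess(AffiliationID a, AffiliationID b)
{
    return names.at(a) < names.at(b);
}

std::vector<AffiliationID> merged_alphabetical()
{
    std::sort(additions.begin(), additions.end(), nameLess);  // O(m log m), m << n
    std::vector<AffiliationID> out;
    out.reserve(sortedIndex.size() + additions.size());
    std::size_t i = 0, j = 0;
    // Classic two-pointer merge of two sorted sequences, skipping tombstones.
    while (i < sortedIndex.size() || j < additions.size()) {
        if (i < sortedIndex.size() && sortedIndex[i] == TOMBSTONE) { ++i; continue; }
        if (j >= additions.size() ||
            (i < sortedIndex.size() && nameLess(sortedIndex[i], additions[j])))
            out.push_back(sortedIndex[i++]);
        else
            out.push_back(additions[j++]);
    }
    // The merged result becomes the new clean index; the side buffer is drained.
    sortedIndex = out;
    additions.clear();
    return out;
}
```

The merge itself is O(n); the win is that only the small additions buffer ever gets sorted.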

Dijkstracula fucked around with this message at 15:41 on Nov 6, 2023

Ihmemies
Oct 6, 2012

Subjunctive posted:

What an incredibly bad way to structure this.

I am glad I am not the only bad thing here :D Yes, it makes sense that copying a vector is O(n); I am just being bad.

I implemented two sets with custom comparators, since that was relatively straightforward. As a bonus my comparison logic moved over from the mess of class implementation to my header... which looks nicer.

Before, doing a sort once, and returning cached data in subsequent queries:
code:
7573 * 10 / 161 (affiliation_info) +
85601857 * 1 / 161 (get_all_affiliations) +
146749592 * 5 / 161 (get_affiliations_alphabetically) +
128233483 * 5 / 161 (get_affiliations_distance_increasing) +
124079 * 10 / 161 (find_affiliation_with_coord) +
9032 * 10 / 161 (change_affiliation_coord) +
6145 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 9083562
After, returning the already ordered set as a vector..
code:
7569 * 10 / 161 (affiliation_info) +
85601859 * 1 / 161 (get_all_affiliations) +
80499006 * 5 / 161 (get_affiliations_alphabetically) +
80455190 * 5 / 161 (get_affiliations_distance_increasing) +
123938 * 10 / 161 (find_affiliation_with_coord) +
86683 * 10 / 161 (change_affiliation_coord) +
6141 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 5547107
Overall the program is a lot faster in the performance-tested functions, but on the downside change_affiliation_coord is a lot slower now.

I forgot to cache the return vectors after the set change, so my optimization score went to zero... despite the functions performing a lot faster! Oh well.

The sets with custom comparator seem to be a "good enough" solution, basically it's as fast as retrieving all the affiliations, because the affiliation id's are already pre-sorted. I still need to re-add the caching of vectors so I don't get penalties from not being optimized :D

The tombstone thing looks interesting, but maybe it's not necessary now that using the sets already gave such a big perf boost in the tests that matter.

It adds significant execution time to operations like add, but that isn't measured so :shrug:

code:
Without indexed sets:
          N ,    add (sec) ,   cmds (sec) ,  total (sec)
        100 ,    0.0003712 ,     3.31e-05 ,    0.0004043
       1000 ,    0.0030583 ,    0.0002615 ,    0.0033198
      10000 ,    0.0299062 ,    0.0026818 ,     0.032588
     100000 ,     0.303994 ,    0.0336798 ,     0.337674
    1000000 ,      3.33097 ,     0.503077 ,      3.83405

With indexed sets:

      N ,    add (sec) ,   cmds (sec) ,  total (sec)
    100 ,    0.0004797 ,    0.0001669 ,    0.0006466
   1000 ,    0.0053462 ,    0.0002854 ,    0.0056316
  10000 ,    0.0604075 ,    0.0027036 ,    0.0631111
 100000 ,     0.811617 ,    0.0314377 ,     0.843055
1000000 ,       12.334 ,     0.658712 ,      12.9927

Ihmemies fucked around with this message at 17:37 on Nov 6, 2023

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

Ihmemies
Oct 6, 2012

Twerk from Home posted:

Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

We have an online course environment, we add gitlab commit hashes as submissions.
So the school’s automatic grader pulls code from gitlab and compiles and runs it automatically.

OddObserver
Apr 3, 2009
Well, at least they are encouraging you to use version control.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
The only micro things that stand out to me are that you’re keeping some of the publication and affiliation data in separate maps instead of storing that in the same record, as if you can’t just put a std::vector in a struct or something. Maybe that makes sense given the access patterns in your API, but it could also be forcing extra lookups. You could also store the distance from the home coord used by get_affiliations_distance_increasing (I assume that’s how that API works?) so that you don’t have to recompute that in your comparator. Note that std::unordered_map has stable references — the entries don’t have to get moved as the hash table grows, so there’s no surprise penalty for making those records bigger or more expensive to move.

Just caching the results of some queries is a reasonable way to optimize them if changes are uncommon.

For getting the 3 closest to a particular point, you could definitely use a spatial index if this is an important operation. Otherwise, it's a bit of a shame to build a vector with every ID in it and then drop most of them; you could hand-roll this, just remembering the IDs and distances for your current top three.
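A sketch of that hand-rolled top-three scan (closest_three and the Coord layout are assumptions, not the assignment's API; squared distance is used since only the ordering matters):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using AffiliationID = int;
struct Coord { int x, y; };

// Squared Euclidean distance: sufficient for ranking, avoids sqrt entirely.
long long dist2(Coord a, Coord b)
{
    long long dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// One linear pass, remembering only the best three seen so far.
std::vector<AffiliationID>
closest_three(const std::vector<std::pair<AffiliationID, Coord>>& affs, Coord target)
{
    std::vector<std::pair<long long, AffiliationID>> best;  // (distance², id), size <= 3
    for (const auto& [id, c] : affs) {
        best.emplace_back(dist2(c, target), id);
        std::sort(best.begin(), best.end());      // at most 4 elements: constant work
        if (best.size() > 3) best.pop_back();     // drop the current 4th-closest
    }
    std::vector<AffiliationID> out;
    for (const auto& [d, id] : best) out.push_back(id);
    return out;
}
```

Sorting a vector capped at four elements every iteration is constant-time, so the whole thing stays O(n) with no allocation proportional to n.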

Ihmemies
Oct 6, 2012

Now the problem is, that I lost optimization points from get_affiliations_alphabetically and get_affiliations_distance_increasing functions.


In Plussa's prg1 grading system (specifically perfestimate-grader), 150 optimization points are available. The functions subject to optimization are:

get_affiliations_alphabetically
get_affiliations_distance_increasing
find_affiliation_with_coord

There are two testing approaches for optimization points:

The approach for functions get_affiliations_alphabetically and get_affiliations_distance_increasing.
Another approach for the function find_affiliation_with_coord.

For both testing approaches, we calculate what we call the "optimization percentage," which is a number between -% and 100%:

- To be considered optimized, the performance should, on average, be more efficient than single executions. A more efficient execution has a lower command counter value.

The optimization percentage helps determine if a function has been optimized. When the optimization percentage is significantly negative, it means that a single execution is more efficient than the average of multiple executions. Conversely, when the percentage is significantly positive, it indicates that a single execution is less efficient than the average performance.

Optimization points are granted if the optimization percentage is greater than 2%.

Note: The values of command counters may vary depending on the computer, and optimization testing should be conducted without any compiler optimizations.


So the repetitive function calls should be faster than the 1st one.

Since the data is basically already sorted, what is there still to optimize? How could I return the data even faster? Do I need to add some kind of delay when the first function call occurs, where the function spends a bit of extra time just waiting...

Ihmemies
Oct 6, 2012

rjmccall posted:

The only micro things that stand out to me are that you’re keeping some of the publication and affiliation data in separate maps instead of storing that in the same record, as if you can’t just put a std::vector in a struct or something. Maybe that makes sense given the access patterns in your API, but it could also be forcing extra lookups. You could also store the distance from the home coord used by get_affiliations_distance_increasing (I assume that’s how that API works?) so that you don’t have to recompute that in your comparator. Note that std::unordered_map has stable references — the entries don’t have to get moved as the hash table grows, so there’s no surprise penalty for making those records bigger or more expensive to move.

Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to 0,0 and store it. It probably doesn't affect performance, only the resource usage. I'm not sure if anyone will look at my code though so I don't think there's an incentive to make it look "nice".

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

I assume so, but, is the testing API fixed? Could you return a shared_ptr<vector<...>> or the vector by reference instead of the vector by value?

Ihmemies posted:

Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to 0,0 and store it. It probably doesn't affect performance, only the resource usage.
hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there

Ihmemies
Oct 6, 2012

Dijkstracula posted:

I assume so, but, is the testing API fixed? Could you return a shared_ptr<vector<...>> or the vector by reference instead of the vector by value?

It is fixed. The header file's public part has all the function definitions, and they are not allowed to be changed.

quote:

hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there

And this goes beyond the scope I understand, or can estimate the down/upsides of.. maybe I should test it. At least calculating the distance when adding or modifying an affiliation sounds like a super good idea.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

Ihmemies posted:

And this goes beyond the scope I understand, or can estimate the down/upsides
for sure, and it isn't just you - this sort of performance tuning is hard to reason about and so "try it and see" is typically the best practice anyway

(though, now that I think about it some more, if your dataset isn't at least multiple-megabytes in size then I doubt you'd see enough cache pressure for what I said to matter anyway)

Ihmemies
Oct 6, 2012

Edit: here I am being stupid again. Of course the IDs must first be removed from the sets, then from the map, because the sets' comparators depend on the map data... and it causes undefined behaviour otherwise.

... the indexed sets cause some weird behaviour. Has anyone ever experienced keys returning to an unordered map, with values initialized to default values... after erasing the key from the map?

C++ code:
bool Datastructures::remove_affiliation(AffiliationID id)
{
    const auto it = affiliations.find(id);
    if (it == affiliations.end()) {
        return false;
    }

    std::vector<PublicationID> aff_publications = it->second.publications;

    if(affiliations.erase(id) != 1) {
        std::cerr << "Error while erasing an affiliation with id " << id << std::endl;
        return false;
    }
    affiliationsSortedByName.erase(id);
    affiliationsSortedByCoord.erase(id);

    for (const auto& publicationID : aff_publications)
    {
        auto pub_it = publications.find(publicationID);

        if (pub_it != publications.end())
        {
            auto& affIDs = pub_it->second.affiliations;
            auto newEnd = std::remove_if(affIDs.begin(), affIDs.end(),
                [id](const AffiliationID& affID) {
                    return affID == id;
                });
            affIDs.erase(newEnd, affIDs.end());
        }
    }

    return true;
}
affiliations.erase(id) works.
At the next step, in affiliationsSortedByName.erase(id), the key returns to the affiliations map.........? :psyduck:

Surely set's comparators can't cause something like this.

Ihmemies fucked around with this message at 21:11 on Nov 6, 2023

Ihmemies
Oct 6, 2012

My erase order was wrong; it caused undefined behaviour when the set’s comparator was missing values.

Anyways, I need to store some affiliations (universities) in a weighted graph and make queries to it. Shortest path etc. I thought about something like

C++ code:
std::unordered_map<AffiliationID, std::vector<std::pair<AffiliationID, double>>> adjacencyList;
since my pointer handling sucks. The double would contain the weight, which is calculated based on common Publications between 2 Affiliations. Low weight causes more friction between paths. Then I read about Fibonacci heaps, which might be faster especially in Dijkstra’s priority queue, and in general with my graph too with large enough datasets.

Does it make any sense to attempt to implement a custom Fibonacci heap instead of using standard STL containers? Data insertion speed and memory usage don’t really matter, since they aren’t measured; only the pathfinding must be fast.
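For comparison, plain Dijkstra over that exact adjacencyList shape needs nothing fancier than std::priority_queue (a binary heap); stale queue entries are skipped rather than decrease-key'd, which is the usual STL workaround. This is only a sketch, and it assumes lower weight = cheaper edge, which may be the opposite of how the grader defines friction:

```cpp
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

using AffiliationID = int;

// Adjacency list shape from the post: node -> list of (neighbour, weight).
std::unordered_map<AffiliationID,
                   std::vector<std::pair<AffiliationID, double>>> adjacencyList;

// Returns the cheapest known cost from source to every reachable node.
std::unordered_map<AffiliationID, double> dijkstra(AffiliationID source)
{
    std::unordered_map<AffiliationID, double> dist;
    using Item = std::pair<double, AffiliationID>;  // (cost so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0.0;
    pq.emplace(0.0, source);
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;                  // stale entry, already improved
        for (auto [v, w] : adjacencyList[u]) {
            double nd = d + w;
            auto it = dist.find(v);
            if (it == dist.end() || nd < it->second) {
                dist[v] = nd;                       // relax edge and re-push
                pq.emplace(nd, v);
            }
        }
    }
    return dist;
}
```

Fibonacci heaps only improve the asymptotic bound on decrease-key; in practice their constant factors usually lose to a binary heap on graphs of this size.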

Ihmemies fucked around with this message at 09:23 on Nov 13, 2023

Xarn
Jun 26, 2015
In any case, don't bother with fib heaps.

If data insertion isn't measured, precalculate all the paths on insertion and just do constant time queries, lmao.

Beef
Jul 26, 2004

Ihmemies posted:

My erase order was wrong, it caused undefined behaviour when the set’s comparator was missing values.

Anyways, I need to store some affiliations (universities) in a weighted graph and make queries to it. Shortest path etc. I thought about something like

C++ code:
std::unordered_map<AffiliationID, std::vector<std::pair<AffiliationID, double>>> adjacencyList;
since my pointer handling sucks. Double would contain the weight, which is calculated based on common Publications between 2 Affiliations. Low weight causes more friction between paths. Then I read about Fibonacci heaps, which might be faster especially in Djikstra’s priority queue and in general with my graph too, with large enough datasets.

Does it make any sense to attempt to implement a custom Fibonacci heap instead of using standard STL containers? Data insertion speed or memory usage doesn’t really matter, since it isn’t measured, only the path finding must be fast.

No. Just use a sparse matrix to represent your graph; compressed sparse row (CSR) format is the most common. https://en.wikipedia.org/wiki/Sparse_matrix
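In case it helps, CSR is less scary than it sounds: three flat arrays. A sketch (build_csr and the assumption that node ids are 0..n-1 are mine, not part of the assignment):

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// CSR (compressed sparse row) view of a graph with nodes 0..n-1.
// The neighbours of node u live in colIndex[rowStart[u] .. rowStart[u+1])
// with matching entries in weights[]; rowStart has n+1 entries.
struct CsrGraph {
    std::vector<std::size_t> rowStart;
    std::vector<int> colIndex;
    std::vector<double> weights;
};

// Build CSR from an edge list of (u, v, w) triples in two passes:
// count edges per row, prefix-sum, then scatter edges into place.
CsrGraph build_csr(int n, const std::vector<std::tuple<int, int, double>>& edges)
{
    CsrGraph g;
    g.rowStart.assign(n + 1, 0);
    for (const auto& [u, v, w] : edges) ++g.rowStart[u + 1];          // per-row counts
    for (int i = 0; i < n; ++i) g.rowStart[i + 1] += g.rowStart[i];   // prefix sums
    g.colIndex.resize(edges.size());
    g.weights.resize(edges.size());
    std::vector<std::size_t> cursor(g.rowStart.begin(), g.rowStart.end() - 1);
    for (const auto& [u, v, w] : edges) {
        g.colIndex[cursor[u]] = v;
        g.weights[cursor[u]] = w;
        ++cursor[u];
    }
    return g;
}
```

Iterating a node's neighbours is then a contiguous array slice, which is what makes CSR so cache-friendly compared to a map of vectors.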

Ihmemies
Oct 6, 2012

Xarn posted:

In any case, don't bother with fib heaps.

If data insertion isn't measured, precalculate all the paths on insertion and just do constant time queries, lmao.

That’s what I did in part 1, I presorted lists for queries during insertion so I can return pre-sorted lists. I asked now, and apparently the insertion is timed somehow, since some students got a fail from their insertion speed tests.

But for my part 1 solution, a 4x command count increase because of the sorts during insertion was still OK. So apparently a limit on insertion speed exists, but it is quite lax... I can try to do the path calculation during insertion, but it actually might be too slow.

Ihmemies
Oct 6, 2012

Edit: I think I need a book. Like perhaps https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X

I am still trying to understand this assignment. I think I managed to dynamically create all the connections between nodes, and increase the connection's weight for added publications.

The problem now is how to use the connections. I have to do 4 different pathfinding functions with different heuristics. A* would fit well, but if I implement my nodes like this:

C++ code:
   struct Affiliation {
        AffiliationID id;
        AffiliationID parent;
        Coord coord;
        Name name;
        std::vector<PublicationID> aff_pubs;
        std::vector<ConnectionID> conn_ids;
        NodeState state;
        float g;    // cheapest path cost from start to this
        float h;    // heuristic cost (distance to end node)
        float f;    // G + H cost (= current best total cost guess)
    };

Connections are stored in a separate map:
    unordered_map<ConnectionID, Connection> conn_map;

Connections contain start and end nodes and the weight:
struct Connection
{
    AffiliationID aff1 = NO_AFFILIATION;
    AffiliationID aff2 = NO_AFFILIATION;
    Weight weight = NO_WEIGHT;
    float length; 

    // Overloaded == operator to determine if Connections are equal
    bool operator==(const Connection& c1) const{
        return (aff1==c1.aff1) && (aff2==c1.aff2) && (weight==c1.weight);
    }

};
Running A* once will alter the state of my data. So either I'd have to
a) reset the state of my nodes after each pathfinding run
b) keep a list of modified nodes, and only reset them afterwards
c) mark nodes as "used" or something and reset them only when they are encountered again in a new pathfinding run..
d) maybe create some kind of temporary graph based on the Affiliations and delete it after pathfinding
e) something else?

In any case I think I need a different structure for storing the pathfinding related information, like this:

C++ code:
struct AStarNodeData {
    float g = INFINITY; 
    float h = 0; 
    float f = INFINITY; 
    AffiliationID parent; 
    NodeState state = NodeState::UNVISITED;
};

std::unordered_map<AffiliationID, AStarNodeData> a_star_data;
All of this was touched extremely lightly on the course lecture&materials, so I'm kind of on my own here... Based on this, I can most likely implement the A* with perhaps case b) (keep list of modified AStarNodeDatas and reset them afterwards) being the best compromise. I'd add the AStarNodeDatas initially while adding new affiliations.

CSR is a bit out of my league, unless I can find some "ELI5" guide for implementing that :v:

Ihmemies fucked around with this message at 12:27 on Nov 15, 2023

robostac
Sep 23, 2009
The fastest/easiest way to handle the state across multiple runs would be to change state into an int and keep a count of the number of searches run (this is a way of implementing c on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited, etc. This means you don't need to reset anything after a search. It doesn't scale to multithreading without a bit more thought, but your current structures don't allow for that anyway.
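That generation-counter trick could look roughly like this (the names are made up; the AStarNodeData fields loosely follow the earlier post):

```cpp
#include <unordered_map>

using AffiliationID = int;

struct AStarNodeData {
    float g = 0.0f;
    AffiliationID parent = -1;   // hypothetical "no parent" value
    int visitedGen = 0;          // generation of the last search that touched this node
};

std::unordered_map<AffiliationID, AStarNodeData> nodeData;
int currentSearch = 0;           // bumped once per pathfinding run

void begin_search() { ++currentSearch; }

// A node counts as visited only if its generation matches the current search,
// so stale data from earlier searches needs no reset pass at all.
bool is_visited(AffiliationID id)
{
    auto it = nodeData.find(id);
    return it != nodeData.end() && it->second.visitedGen == currentSearch;
}

void mark_visited(AffiliationID id, float g, AffiliationID parent)
{
    auto& nd = nodeData[id];
    nd.g = g;
    nd.parent = parent;
    nd.visitedGen = currentSearch;
}
```

Starting a new search is then a single integer increment instead of an O(n) sweep over every node.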

Are there enough nodes to make A* worthwhile? It's quite common for the overhead to make it slower than Dijkstra on small graphs.

Ihmemies
Oct 6, 2012

robostac posted:

The fastest/easiest way too handle the state across multiple runs would be to change state into an int and keep count of the number of searches run (this is a way of implementing c on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited etc. This means you don't need to reset anything after a search. Doesn't scale with multithreading without a bit more thought, but your current structures don't allow for that anyway.

Are there enough nodes to make A* worthwhile? It's quite common for the overhead to make it slower than djikstra on small graphs.

Overhead meaning keeping track of f/g/h etc. and resetting values after searches?

Speed will be tested on 100000 nodes (and thus more connections than that). I don’t know what is large enough.

I need to find:
- any path (maybe bfs/dfs is suitable for this because of smaller overhead)
- path with least nodes
- path with least friction (using connections with heaviest weight)
- shortest path (nodes have x,y coordinates in 2D space so literally shortest)

And I optimistically thought A* would be perhaps flexible enough to cover all the cases.

Ihmemies fucked around with this message at 13:42 on Nov 15, 2023


Ihmemies
Oct 6, 2012

Actually even creating the graph dynamically is tough… oh crap. Quite a jump in difficulty from project 1 to 2.
