Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

leper khan posted:

That looks correct.

disgusting

lets talk indentation

tabs for indenting, spaces for alignment


leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Sweeper posted:

disgusting

lets talk indentation

tabs for indenting, spaces for alignment

I want to do tabs to indent spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

leper khan posted:

I want to do tabs to indent spaces to align, but no one I've worked with was able to keep that consistent. So only spaces it is.

tools don't support it well, so if it can't be done automatically it can't be done at all; that's the problem with formatting
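For what it's worth, clang-format does have a knob for exactly that mixed style: UseTab: AlignWithSpaces (clang-format 11+) uses tabs for indentation and spaces for alignment. A hypothetical .clang-format fragment:

```yaml
# hypothetical .clang-format fragment: tabs for indentation,
# spaces for alignment (requires clang-format 11 or newer)
UseTab: AlignWithSpaces
TabWidth: 4
IndentWidth: 4
```

Whether every editor and diff tool in the team's pipeline respects it is, of course, another matter.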

Presto
Nov 22, 2002

Keep calm and Harry on.

roomforthetuna posted:

Yes it still is sometimes an else after a brace, you're just saying you should format that like a maniac. You appear to be suggesting this monstrosity?
code:
if (banana) {
  doStuff();
}
else {
  doOtherStuff();
}
Yes. That monstrosity is the correct way. I will not budge.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

roomforthetuna posted:

code:
if (banana) {
  doStuff();
}
else {
  doOtherStuff();
}

big "code generated by a shell script" energy in this indentation style

qsvui
Aug 23, 2003
some crazy thing

roomforthetuna posted:

code:
if (banana) doStuff();
else doOtherStuff();

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

leper khan posted:

That looks correct.

There is something wrong with your eyes.

Apex Rogers
Jun 12, 2006

disturbingly functional

Same line braces, none of this mixed crap. That way lies madness

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender
When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
As near as I could tell, only one other person had ever touched the code, and they had left the company more than 5 years before I came onboard. Their choice of indentation was not even close to being the worst thing about it.

I deleted the entire codebase and rewrote the thing.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

ShoulderDaemon posted:

When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
I think we should all standardize on this, to better reflect the actual quality of the code.

mmkay
Oct 21, 2010

ShoulderDaemon posted:

When I first started at my current employer, one of the simulators I became owner of was consistently formatted like this:
code:
if
    (foo) doStuff();
    else
{   doOtherStuff();
    doSomethingElse(); }
As near as I could tell, only one other person had ever touched the code, and they had left the company more than 5 years before I came onboard. Their choice of indentation was not even close to being the worst thing about it.

I deleted the entire codebase and rewrote the thing.

How did you find my university assignments?

Nalin
Sep 29, 2007

Hair Elf
2 space indentation, except use 4 when you need to point out the really important parts of your code

Ihmemies
Oct 6, 2012

4 space indentation is best, it prevents you from doing horrible pyramids of doom.. 2 space lets you construct huge nested crap without you realizing it.

---

Anyways, I'm working on a school project yet again. Basically we have a database of publications and affiliations, and everything is done in C++ without external libs. Threading is not allowed. The program adds, removes, modifies and queries data based on different parameters and sorting orders.

I did first this:

C++ code:
    struct Affiliation {
        AffiliationID id;
        Name name;
        Coord coord;
    };

    struct Publication {
        PublicationID id;
        Name name;
        Year year;
        std::vector<AffiliationID> affiliations;
    };

    std::unordered_map<AffiliationID, Affiliation> affiliationDetails;
    std::unordered_map<PublicationID, Publication> publicationDetails;
    std::unordered_map<AffiliationID, std::vector<PublicationID>> affiliationToPublication;

    std::unordered_map<Coord, AffiliationID, CoordHash> affiliationCoords;
    std::unordered_map<PublicationID, PublicationID> childToParent;
    std::unordered_map<PublicationID, std::vector<PublicationID>> parentToChildren;

    bool affiliationsChanged = false;
    bool affiliationsCoordsChanged = false;
    vector<AffiliationID> affiliationsAlphabetically = {};
    vector<AffiliationID> affiliationsDistanceIncreasing = {};
This is my minimum effort implementation, and it works as requested. I added the affiliationsAlphabetically and affiliationsDistanceIncreasing vectors after realizing that some functions must return data faster on subsequent runs. So basically functions like get_affiliations_distance_increasing or get_affiliations_alphabetically save the vector before returning, and just return the saved vector if the data was not modified.

That made subsequent function calls an order of magnitude faster, because the data is sorted only once, and pre-sorted data is returned if nothing changed.
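A minimal sketch of that caching scheme (the map, flag, and vector names follow the post; the function body is illustrative, not the actual assignment code):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

using AffiliationID = int;
using Name = std::string;

struct Affiliation { AffiliationID id; Name name; };

std::unordered_map<AffiliationID, Affiliation> affiliationDetails;
std::vector<AffiliationID> affiliationsAlphabetically;
bool affiliationsChanged = true;   // add/remove/modify set this back to true

// Sort once, then hand back the cached vector until the data changes again.
std::vector<AffiliationID> get_affiliations_alphabetically()
{
    if (affiliationsChanged) {
        affiliationsAlphabetically.clear();
        for (const auto& [id, aff] : affiliationDetails)
            affiliationsAlphabetically.push_back(id);
        std::sort(affiliationsAlphabetically.begin(), affiliationsAlphabetically.end(),
                  [](AffiliationID a, AffiliationID b) {
                      return affiliationDetails.at(a).name < affiliationDetails.at(b).name;
                  });
        affiliationsChanged = false;
    }
    return affiliationsAlphabetically;  // subsequent calls skip the O(n log n) sort
}
```

The first call pays the full sort; every later call until the next mutation is just a vector copy.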

Next I realized that performance is estimated only with the following functions:

code:
7573 * 10 / 161 (affiliation_info) +
85601857 * 1 / 161 (get_all_affiliations) +
146749592 * 5 / 161 (get_affiliations_alphabetically) +
128233483 * 5 / 161 (get_affiliations_distance_increasing) +
124079 * 10 / 161 (find_affiliation_with_coord) +
9032 * 10 / 161 (change_affiliation_coord) +
6145 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 9083562
Despite their low per-call weight, the two heaviest functions get_affiliations_alphabetically and get_affiliations_distance_increasing account for 94% of the total performance score (9083562).

So I could use functions like add_affiliation or remove_affiliation to keep a few always sorted indexes up to date, without it affecting the performance measured by perftests. Of course I'd have to do some extra work in change_affiliation_coord too, but maybe it would not be that bad.

My question is: does it make sense to keep a few pre-sorted index lists, and keep the lists sorted during add/remove/modify operations, instead of sorting the data only when requested?

As a downside, n adds would be O(n^2) instead of O(n), for example..

2nd question is: would it be fast enough to just iterate over the data once and add the distances to a 3 item std::priority_queue (min heap) with function get_affiliations_closest_to. It must return 3 closest affiliations to a random coordinate.

Or should I start researching & implementing some kind of spatial indexing solution for that..? :effort:

Ihmemies fucked around with this message at 13:38 on Nov 6, 2023

nielsm
Jun 1, 2009



I would say use a std::map (or std::multimap, if you can have duplicate names) from name to id, keyed on the alphabetical collation of the names. You just need to supply a Compare function to the std::map template to get the ordering you want.
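A rough sketch of that idea (nameIndex, index_add, and ids_alphabetically are made-up names; std::less on std::string already gives plain alphabetical ordering, so a custom Compare is only needed for a different collation):

```cpp
#include <map>
#include <string>
#include <vector>

using AffiliationID = int;
using Name = std::string;

// Multimap keyed on name: iteration order IS the alphabetical order,
// so producing the sorted ID list is a single linear scan, no sort call.
std::multimap<Name, AffiliationID> nameIndex;

void index_add(const Name& name, AffiliationID id)
{
    nameIndex.emplace(name, id);   // O(log n) insert keeps the ordering invariant
}

std::vector<AffiliationID> ids_alphabetically()
{
    std::vector<AffiliationID> out;
    out.reserve(nameIndex.size());
    for (const auto& [name, id] : nameIndex)
        out.push_back(id);
    return out;
}
```

Removal and rename would also be O(log n) against this index, which is the trade-off versus re-sorting a vector on demand.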

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go.

For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

Ihmemies
Oct 6, 2012

Jabor posted:

Instead of a vector, you could use a sorted data structure that offers O(log n) inserts, like std::set. But since it doesn't look like the insert time is actually part of the grading rubric, making inserts as slow as you like so that the actually graded functions are as fast as possible is probably the way to go.

Hmm yes, I could use std::set with a custom comparator. Comparator would compare the names or distances to 0,0 for the affiliations, and hold the id's.

The problem here is that I need to return a vector&lt;AffiliationID&gt;. I forgot to mention that. Basically all the queries want a vector back, and creating a vector from a set is an O(n) operation (slow).. which kind of defeats the purpose of sorting beforehand.

Affiliations sorted by name? They want a vector with id's. Affiliations sorted by distance to 0,0? A vector with id's as return value.. etc.

quote:

For your second question, only you can know whether a single pass over every element is "fast enough" for a good grade.

Least effort way is to do it as I described and see if it's good enough. There are 5 additional functions and that is one of them; if they are fast enough, the points are just awarded directly.

nielsm
Jun 1, 2009



Copying one vector to another is also O(n), although with smaller constants.

Ihmemies
Oct 6, 2012

Meh. I guess I'll just wait it out, the final "performance targets" or whatever should be locked a week before deadline. So I can see a week before deadline if my program is too slow or not. And defer doing any extra optimizations to that moment. Implementing a set or a map sorted by custom comparator should not take too much time.

Grading works like this: every student's programs are performance tested, and then compared with each other. Average performance gets a 3, piss poor gets 1, very good performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

I used hashmaps because they check everything with Valgrind, and my pointer handling is never flawless. With the standard library's data structures you can be fairly sure you don't gently caress up memory management at least.

Ihmemies fucked around with this message at 14:54 on Nov 6, 2023

leper khan
Dec 28, 2010
Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

Ihmemies posted:

Hmm yes, I could use std::set with a custom comparator. Comparator would compare the names or distances to 0,0 for the affiliations, and hold the id's.

The problem here is that I need to return a vector&lt;AffiliationID&gt;. I forgot to mention that. Basically all the queries want a vector back, and creating a vector from a set is an O(n) operation (slow).. which kind of defeats the purpose of sorting beforehand.

Affiliations sorted by name? They want a vector with id's. Affiliations sorted by distance to 0,0? A vector with id's as return value.. etc.

Least effort way is to do it as I described and see if it's good enough. There are 5 additional functions and that is one of them; if they are fast enough, the points are just awarded directly.

O(n) in elements over a collection is O(1) per element. It's not that slow. memcpy is O(n) in memory size to be copied.

You need to look at how n is defined to determine whether the thing is fast or slow. Otherwise I'll just redefine n to confound you into thinking slow operations are fast and fast operations are slow.

It's impossible to create a vector of n elements in less than O(n) time. So anything you do to create a vector will be at least O(n). You're free to perform as many [constant number valued] O(n) operations as you like without impacting the asymptotic bound.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Ihmemies posted:

Grading works like this: every student's programs are performance tested, and then compared with each other. Average performance gets a 3, piss poor gets 1, very good performing programs get a 5. So the performance target is moving, and depends on how good/bad the other people are with their performance optimizations.

What an incredibly bad way to structure this.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

What you're describing is often called a secondary index.

You could go further than your proposal: observe that your secondary index structure would always be "mostly in order" (from previous sorts), so re-sorting the whole thing involves a lot of unnecessary work. So, how about maintaining a sorted structure with the preexisting entries, as you propose, and having a side data structure where new inserts go. On removal, just find the entry in the sorted index and mark it as invalid with a special sentinel entry (this is sometimes called "tombstoning") so you don't take the linear-time penalty of removing from the vector and shifting all the subsequent elements down. Modifying an element, if it changes the ordering, would involve both tombstoning the old version and inserting the new version; in essence, a deletion followed by an insertion.


code:
vector<AffiliationID> affiliationsAlphabetically; // Invariant: always ordered
vector<AffiliationID> affiliationAdditions;       // Invariant: none! Just push_back() into this on inserts.


(edit: of course, morally, `affiliationAdditions` could well be an std::set.)

When the user wants the vector with all entries returned, sort the smaller one (or use a set where ordering is always maintained, whatever), which is O(m log m) for m <<< n, then merge the two structures together (which is O(n), since they're now both sorted, so you can scan through both in one pass), skipping over any tombstoned values. As others have said, since you're returning a vector by copy you can't do better than O(n) anyway.
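The tombstone-plus-side-buffer merge described above could be sketched like this (the TOMBSTONE sentinel, the names lookup map, and the function names are all hypothetical; the two vectors follow the snippet above):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using AffiliationID = int;
const AffiliationID TOMBSTONE = -1;   // hypothetical sentinel marking removed entries

std::unordered_map<AffiliationID, std::string> names;
std::vector<AffiliationID> sortedIndex;   // invariant: ordered by name, may contain TOMBSTONEs
std::vector<AffiliationID> additions;     // no invariant: push_back on insert

bool nameLess(AffiliationID a, AffiliationID b)
{
    return names.at(a) < names.at(b);
}

std::vector<AffiliationID> merged_alphabetical()
{
    std::sort(additions.begin(), additions.end(), nameLess);  // O(m log m), m << n
    std::vector<AffiliationID> out;
    out.reserve(sortedIndex.size() + additions.size());
    std::size_t i = 0, j = 0;
    // Classic two-pointer merge of two sorted sequences, skipping tombstones.
    while (i < sortedIndex.size() || j < additions.size()) {
        if (i < sortedIndex.size() && sortedIndex[i] == TOMBSTONE) { ++i; continue; }
        if (j >= additions.size() ||
            (i < sortedIndex.size() && nameLess(sortedIndex[i], additions[j])))
            out.push_back(sortedIndex[i++]);
        else
            out.push_back(additions[j++]);
    }
    // The merged result becomes the new clean index; the side buffer is drained.
    sortedIndex = out;
    additions.clear();
    return out;
}
```

The merge itself is O(n); the win is that only the small additions buffer ever gets sorted.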

Dijkstracula fucked around with this message at 15:41 on Nov 6, 2023

Ihmemies
Oct 6, 2012

Subjunctive posted:

What an incredibly bad way to structure this.

I am glad I am not the only bad thing here :D Yes, it makes sense that copying a vector is O(n); I am just being bad.

I implemented two sets with custom comparators, since that was relatively straightforward. As a bonus my comparison logic moved over from the mess of class implementation to my header... which looks nicer.

Before, doing a sort once, and returning cached data in subsequent queries:
code:
7573 * 10 / 161 (affiliation_info) +
85601857 * 1 / 161 (get_all_affiliations) +
146749592 * 5 / 161 (get_affiliations_alphabetically) +
128233483 * 5 / 161 (get_affiliations_distance_increasing) +
124079 * 10 / 161 (find_affiliation_with_coord) +
9032 * 10 / 161 (change_affiliation_coord) +
6145 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 9083562
After, returning the already ordered set as a vector..
code:
7569 * 10 / 161 (affiliation_info) +
85601859 * 1 / 161 (get_all_affiliations) +
80499006 * 5 / 161 (get_affiliations_alphabetically) +
80455190 * 5 / 161 (get_affiliations_distance_increasing) +
123938 * 10 / 161 (find_affiliation_with_coord) +
86683 * 10 / 161 (change_affiliation_coord) +
6141 * 10 / 161 (get_publications_after) +
2128 * 100 / 161 (publication_info) +
25529 * 10 / 161 (get_referenced_by_chain) = 5547107
Overall the program is a lot faster in the performance-tested functions, but on the downside change_affiliation_coord is a lot slower now.

I forgot to cache the return vectors after the set change, so my optimization score went to zero... despite the functions performing a lot faster! Oh well.

The sets with custom comparator seem to be a "good enough" solution, basically it's as fast as retrieving all the affiliations, because the affiliation id's are already pre-sorted. I still need to re-add the caching of vectors so I don't get penalties from not being optimized :D

The tombstone thing looks interesting, but maybe it's not necessary now that using the sets already gave such a big perf boost in the tests that matter.

It adds significant execution time to operations like add, but that isn't measured so :shrug:

code:
Without indexed sets:
          N ,    add (sec) ,   cmds (sec) ,  total (sec)
        100 ,    0.0003712 ,     3.31e-05 ,    0.0004043
       1000 ,    0.0030583 ,    0.0002615 ,    0.0033198
      10000 ,    0.0299062 ,    0.0026818 ,     0.032588
     100000 ,     0.303994 ,    0.0336798 ,     0.337674
    1000000 ,      3.33097 ,     0.503077 ,      3.83405

With indexed sets:

      N ,    add (sec) ,   cmds (sec) ,  total (sec)
    100 ,    0.0004797 ,    0.0001669 ,    0.0006466
   1000 ,    0.0053462 ,    0.0002854 ,    0.0056316
  10000 ,    0.0604075 ,    0.0027036 ,    0.0631111
 100000 ,     0.811617 ,    0.0314377 ,     0.843055
1000000 ,       12.334 ,     0.658712 ,      12.9927

Ihmemies fucked around with this message at 17:37 on Nov 6, 2023

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

Ihmemies
Oct 6, 2012

Twerk from Home posted:

Are you submitting code and the professor chooses how to compile it and with what flags, or is there room for somebody to forget to pass flags to optimize and get a bad grade?

We have an online course environment, we add gitlab commit hashes as submissions.
So the school’s automatic grader pulls code from gitlab and compiles and runs it automatically.

OddObserver
Apr 3, 2009
Well, at least they are encouraging you to use version control.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
The only micro things that stand out to me are that you’re keeping some of the publication and affiliation data in separate maps instead of storing that in the same record, as if you can’t just put a std::vector in a struct or something. Maybe that makes sense given the access patterns in your API, but it could also be forcing extra lookups. You could also store the distance from the home coord used by get_affiliations_distance_increasing (I assume that’s how that API works?) so that you don’t have to recompute that in your comparator. Note that std::unordered_map has stable references — the entries don’t have to get moved as the hash table grows, so there’s no surprise penalty for making those records bigger or more expensive to move.

Just caching the results of some queries is a reasonable way to optimize them if changes are uncommon.

For getting the 3 closest to a particular point, you could definitely use a spatial index if this is an important operation. Otherwise, it's a bit of a shame to build a vector with every ID in it and then drop most of them; you could hand-roll this, just remembering the IDs and distances for your current top three.
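A sketch of that hand-rolled top-three scan (closest_three and the Coord layout are assumptions, not the assignment's API; squared distance is used since only the ordering matters):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using AffiliationID = int;
struct Coord { int x, y; };

// Squared Euclidean distance: sufficient for ranking, avoids sqrt entirely.
long long dist2(Coord a, Coord b)
{
    long long dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// One linear pass, remembering only the best three seen so far.
std::vector<AffiliationID>
closest_three(const std::vector<std::pair<AffiliationID, Coord>>& affs, Coord target)
{
    std::vector<std::pair<long long, AffiliationID>> best;  // (distance², id), size <= 3
    for (const auto& [id, c] : affs) {
        best.emplace_back(dist2(c, target), id);
        std::sort(best.begin(), best.end());      // at most 4 elements: constant work
        if (best.size() > 3) best.pop_back();     // drop the current 4th-closest
    }
    std::vector<AffiliationID> out;
    for (const auto& [d, id] : best) out.push_back(id);
    return out;
}
```

Sorting a vector capped at four elements every iteration is constant-time, so the whole thing stays O(n) with no allocation proportional to n.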

Ihmemies
Oct 6, 2012

Now the problem is, that I lost optimization points from get_affiliations_alphabetically and get_affiliations_distance_increasing functions.


In Plussa's prg1 grading system (specifically perfestimate-grader), 150 optimization points are available. The functions subject to optimization are:

get_affiliations_alphabetically
get_affiliations_distance_increasing
find_affiliation_with_coord

There are two testing approaches for optimization points:

The approach for functions get_affiliations_alphabetically and get_affiliations_distance_increasing.
Another approach for the function find_affiliation_with_coord.

For both testing approaches, we calculate what we call the "optimization percentage," which is a number between -% and 100%:

- To be considered optimized, the performance should, on average, be more efficient than single executions. A more efficient execution has a lower command counter value.

The optimization percentage helps determine if a function has been optimized. When the optimization percentage is significantly negative, it means that a single execution is more efficient than the average of multiple executions. Conversely, when the percentage is significantly positive, it indicates that a single execution is less efficient than the average performance.

Optimization points are granted if the optimization percentage is greater than 2%.

Note: The values of command counters may vary depending on the computer, and optimization testing should be conducted without any compiler optimizations.


So the repetitive function calls should be faster than the 1st one.

Since the data is basically already sorted, what is there still to optimize? How could I return the data even faster? Do I need to add some kind of delay when the first function call occurs, where the function spends a bit of extra time just waiting...

Ihmemies
Oct 6, 2012

rjmccall posted:

The only micro things that stand out to me are that you’re keeping some of the publication and affiliation data in separate maps instead of storing that in the same record, as if you can’t just put a std::vector in a struct or something. Maybe that makes sense given the access patterns in your API, but it could also be forcing extra lookups. You could also store the distance from the home coord used by get_affiliations_distance_increasing (I assume that’s how that API works?) so that you don’t have to recompute that in your comparator. Note that std::unordered_map has stable references — the entries don’t have to get moved as the hash table grows, so there’s no surprise penalty for making those records bigger or more expensive to move.

Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to 0,0 and store it. It probably doesn't affect performance, only the resource usage. I'm not sure if anyone will look at my code though so I don't think there's an incentive to make it look "nice".

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

I assume so, but, is the testing API fixed? Could you return a shared_ptr<vector<...>> or the vector by reference instead of the vector by value?

Ihmemies posted:

Yes, I could definitely store more data in the Affiliation and Publication structs. I also should compute the distance to 0,0 and store it. It probably doesn't affect performance, only the resource usage.
hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there

Ihmemies
Oct 6, 2012

Dijkstracula posted:

I assume so, but, is the testing API fixed? Could you return a shared_ptr<vector<...>> or the vector by reference instead of the vector by value?

It is fixed. The header file's public part has all the function definitions, and they are not allowed to be changed.

quote:

hard to say until you benchmark it, but wider datatypes will be less CPU cache-friendly so, even though the prefetcher will figure out that you're walking through an array, you may take a hit from there

And this goes beyond the scope I understand, or can estimate the down/upsides of.. maybe I should test it. At least calculating the distance when adding or modifying an affiliation sounds like a super good idea.

Dijkstracula
Mar 18, 2003

You can't spell 'vector field' without me, Professor!

Ihmemies posted:

And this goes beyond the scope I understand, or can estimate the down/upsides
for sure, and it isn't just you - this sort of performance tuning is hard to reason about and so "try it and see" is typically the best practice anyway

(though, now that I think about it some more, if your dataset isn't at least multiple-megabytes in size then I doubt you'd see enough cache pressure for what I said to matter anyway)

Ihmemies
Oct 6, 2012

Edit: here I am being stupid again. Of course the IDs must first be removed from the sets, then from the map, because the sets' comparators depend on the map data... and it causes undefined behaviour otherwise.

... the indexed sets cause some weird behaviour. Has anyone ever experienced keys returning to an unordered map, with values initialized to default values... after erasing the key from the map?

C++ code:
bool Datastructures::remove_affiliation(AffiliationID id)
{
    const auto it = affiliations.find(id);
    if (it == affiliations.end()) {
        return false;
    }

    std::vector<PublicationID> aff_publications = it->second.publications;

    if(affiliations.erase(id) != 1) {
        std::cerr << "Error while erasing an affiliation with id " << id << std::endl;
        return false;
    }
    affiliationsSortedByName.erase(id);
    affiliationsSortedByCoord.erase(id);

    for (const auto& publicationID : aff_publications)
    {
        auto pub_it = publications.find(publicationID);

        if (pub_it != publications.end())
        {
            auto& affIDs = pub_it->second.affiliations;
            auto newEnd = std::remove_if(affIDs.begin(), affIDs.end(),
                [id](const AffiliationID& affID) {
                    return affID == id;
                });
            affIDs.erase(newEnd, affIDs.end());
        }
    }

    return true;
}
affiliations.erase(id) works.
At the next step, in affiliationsSortedByName.erase(id), the key returns to the affiliations map.........? :psyduck:

Surely set's comparators can't cause something like this.

Ihmemies fucked around with this message at 21:11 on Nov 6, 2023

Ihmemies
Oct 6, 2012

My erase order was wrong; it caused undefined behaviour when the set’s comparator was missing values.

Anyways, I need to store some affiliations (universities) in a weighted graph and make queries to it. Shortest path etc. I thought about something like

C++ code:
std::unordered_map<AffiliationID, std::vector<std::pair<AffiliationID, double>>> adjacencyList;
since my pointer handling sucks. The double would contain the weight, which is calculated based on common Publications between 2 Affiliations. Low weight causes more friction between paths. Then I read about Fibonacci heaps, which might be faster especially in Dijkstra’s priority queue, and in general with my graph too with large enough datasets.

Does it make any sense to attempt to implement a custom Fibonacci heap instead of using standard STL containers? Data insertion speed and memory usage don’t really matter, since they aren’t measured; only the pathfinding must be fast.
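For comparison, plain Dijkstra over that exact adjacencyList shape needs nothing fancier than std::priority_queue (a binary heap); stale queue entries are skipped rather than decrease-key'd, which is the usual STL workaround. This is only a sketch, and it assumes lower weight = cheaper edge, which may be the opposite of how the grader defines friction:

```cpp
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

using AffiliationID = int;

// Adjacency list shape from the post: node -> list of (neighbour, weight).
std::unordered_map<AffiliationID,
                   std::vector<std::pair<AffiliationID, double>>> adjacencyList;

// Returns the cheapest known cost from source to every reachable node.
std::unordered_map<AffiliationID, double> dijkstra(AffiliationID source)
{
    std::unordered_map<AffiliationID, double> dist;
    using Item = std::pair<double, AffiliationID>;  // (cost so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0.0;
    pq.emplace(0.0, source);
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;                  // stale entry, already improved
        for (auto [v, w] : adjacencyList[u]) {
            double nd = d + w;
            auto it = dist.find(v);
            if (it == dist.end() || nd < it->second) {
                dist[v] = nd;                       // relax edge and re-push
                pq.emplace(nd, v);
            }
        }
    }
    return dist;
}
```

Fibonacci heaps only improve the asymptotic bound on decrease-key; in practice their constant factors usually lose to a binary heap on graphs of this size.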

Ihmemies fucked around with this message at 09:23 on Nov 13, 2023

Xarn
Jun 26, 2015
In any case, don't bother with fib heaps.

If data insertion isn't measured, precalculate all the paths on insertion and just do constant time queries, lmao.

Beef
Jul 26, 2004

Ihmemies posted:

My erase order was wrong, it caused undefined behaviour when the set’s comparator was missing values.

Anyways, I need to store some affiliations (universities) in a weighted graph and make queries to it. Shortest path etc. I thought about something like

C++ code:
std::unordered_map<AffiliationID, std::vector<std::pair<AffiliationID, double>>> adjacencyList;
since my pointer handling sucks. Double would contain the weight, which is calculated based on common Publications between 2 Affiliations. Low weight causes more friction between paths. Then I read about Fibonacci heaps, which might be faster especially in Djikstra’s priority queue and in general with my graph too, with large enough datasets.

Does it make any sense to attempt to implement a custom Fibonacci heap instead of using standard STL containers? Data insertion speed or memory usage doesn’t really matter, since it isn’t measured, only the path finding must be fast.

No. Just use a sparse matrix to represent your graph; compressed sparse row (CSR) format is the most common. https://en.wikipedia.org/wiki/Sparse_matrix
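In case it helps, CSR is less scary than it sounds: three flat arrays. A sketch (build_csr and the assumption that node ids are 0..n-1 are mine, not part of the assignment):

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// CSR (compressed sparse row) view of a graph with nodes 0..n-1.
// The neighbours of node u live in colIndex[rowStart[u] .. rowStart[u+1])
// with matching entries in weights[]; rowStart has n+1 entries.
struct CsrGraph {
    std::vector<std::size_t> rowStart;
    std::vector<int> colIndex;
    std::vector<double> weights;
};

// Build CSR from an edge list of (u, v, w) triples in two passes:
// count edges per row, prefix-sum, then scatter edges into place.
CsrGraph build_csr(int n, const std::vector<std::tuple<int, int, double>>& edges)
{
    CsrGraph g;
    g.rowStart.assign(n + 1, 0);
    for (const auto& [u, v, w] : edges) ++g.rowStart[u + 1];          // per-row counts
    for (int i = 0; i < n; ++i) g.rowStart[i + 1] += g.rowStart[i];   // prefix sums
    g.colIndex.resize(edges.size());
    g.weights.resize(edges.size());
    std::vector<std::size_t> cursor(g.rowStart.begin(), g.rowStart.end() - 1);
    for (const auto& [u, v, w] : edges) {
        g.colIndex[cursor[u]] = v;
        g.weights[cursor[u]] = w;
        ++cursor[u];
    }
    return g;
}
```

Iterating a node's neighbours is then a contiguous array slice, which is what makes CSR so cache-friendly compared to a map of vectors.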

Ihmemies
Oct 6, 2012

Xarn posted:

In any case, don't bother with fib heaps.

If data insertion isn't measured, precalculate all the paths on insertion and just do constant time queries, lmao.

That’s what I did in part 1, I presorted lists for queries during insertion so I can return pre-sorted lists. I asked now, and apparently the insertion is timed somehow, since some students got a fail from their insertion speed tests.

But for my part 1 solution, a 4x command count increase because of the sorts during insertion was still OK. So apparently a limit on insertion speed exists, but it is quite lax... I can try to do the path calculation during insertion, but it actually might be too slow.

Ihmemies
Oct 6, 2012

Edit: I think I need a book. Like perhaps https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X

I am still trying to understand this assignment. I think I managed to dynamically create all the connections between nodes, and increase the connection's weight for added publications.

The problem now is how to use the connections. I have to do 4 different pathfinding functions with different heuristics. A* would fit well, but if I implement my nodes like this:

C++ code:
   struct Affiliation {
        AffiliationID id;
        AffiliationID parent;
        Coord coord;
        Name name;
        std::vector<PublicationID> aff_pubs;
        std::vector<ConnectionID> conn_ids;
        NodeState state;
        float g;    // cheapest path cost from start to this
        float h;    // heuristic cost (distance to end node)
        float f;    // G + H cost (= current best total cost guess)
    };

Connections are stored in a separate map:
    unordered_map<ConnectionID, Connection> conn_map;

Connections contain start and end nodes and the weight:
struct Connection
{
    AffiliationID aff1 = NO_AFFILIATION;
    AffiliationID aff2 = NO_AFFILIATION;
    Weight weight = NO_WEIGHT;
    float length; 

    // Overloaded == operator to determine if Connections are equal
    bool operator==(const Connection& c1) const{
        return (aff1==c1.aff1) && (aff2==c1.aff2) && (weight==c1.weight);
    }

};
Running A* once will alter the state of my data. So either I'd have to
a) reset the state of my nodes after each pathfinding run
b) keep a list of modified nodes, and only reset them afterwards
c) mark nodes as "used" or something and reset them only when they are encountered again in a new pathfinding run..
d) maybe create some kind of temporary graph based on the Affiliations and delete it after pathfinding
e) something else?

In any case I think I need a different structure for storing the pathfinding related information, like this:

C++ code:
struct AStarNodeData {
    float g = INFINITY; 
    float h = 0; 
    float f = INFINITY; 
    AffiliationID parent; 
    NodeState state = NodeState::UNVISITED;
};

std::unordered_map<AffiliationID, AStarNodeData> a_star_data;
All of this was touched extremely lightly on the course lecture&materials, so I'm kind of on my own here... Based on this, I can most likely implement the A* with perhaps case b) (keep list of modified AStarNodeDatas and reset them afterwards) being the best compromise. I'd add the AStarNodeDatas initially while adding new affiliations.

CSR is a bit out of my league, unless I can find some "ELI5" guide for implementing that :v:

Ihmemies fucked around with this message at 12:27 on Nov 15, 2023

robostac
Sep 23, 2009
The fastest/easiest way to handle the state across multiple runs would be to change state into an int and keep a count of the number of searches run (this is a way of implementing c on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited, etc. This means you don't need to reset anything after a search. It doesn't scale to multithreading without a bit more thought, but your current structures don't allow for that anyway.
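That generation-counter trick could look roughly like this (the names are made up; the AStarNodeData fields loosely follow the earlier post):

```cpp
#include <unordered_map>

using AffiliationID = int;

struct AStarNodeData {
    float g = 0.0f;
    AffiliationID parent = -1;   // hypothetical "no parent" value
    int visitedGen = 0;          // generation of the last search that touched this node
};

std::unordered_map<AffiliationID, AStarNodeData> nodeData;
int currentSearch = 0;           // bumped once per pathfinding run

void begin_search() { ++currentSearch; }

// A node counts as visited only if its generation matches the current search,
// so stale data from earlier searches needs no reset pass at all.
bool is_visited(AffiliationID id)
{
    auto it = nodeData.find(id);
    return it != nodeData.end() && it->second.visitedGen == currentSearch;
}

void mark_visited(AffiliationID id, float g, AffiliationID parent)
{
    auto& nd = nodeData[id];
    nd.g = g;
    nd.parent = parent;
    nd.visitedGen = currentSearch;
}
```

Starting a new search is then a single integer increment instead of an O(n) sweep over every node.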

Are there enough nodes to make A* worthwhile? It's quite common for the overhead to make it slower than Dijkstra on small graphs.

Ihmemies
Oct 6, 2012

robostac posted:

The fastest/easiest way too handle the state across multiple runs would be to change state into an int and keep count of the number of searches run (this is a way of implementing c on your list). On the first search anything without state == 1 is unvisited, on the second run anything without state == 2 is unvisited etc. This means you don't need to reset anything after a search. Doesn't scale with multithreading without a bit more thought, but your current structures don't allow for that anyway.

Are there enough nodes to make A* worthwhile? It's quite common for the overhead to make it slower than djikstra on small graphs.

Overhead meaning keeping track of f/g/h etc. and resetting values after searches?

Speed will be tested on 100000 nodes (and thus more connections than that). I don’t know what is large enough.

I need to find:
- any path (maybe bfs/dfs is suitable for this because of smaller overhead)
- path with least nodes
- path with least friction (using connections with heaviest weight)
- shortest path (nodes have x,y coordinates in 2D space so literally shortest)

And I optimistically thought A* would be perhaps flexible enough to cover all the cases.

Ihmemies fucked around with this message at 13:42 on Nov 15, 2023


Ihmemies
Oct 6, 2012

Actually even creating the graph dynamically is tough… oh crap. Quite a jump in difficulty from project 1 to 2.
