New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

LongSack posted:

Question about MSTest and test order.

I have a series of tests for a registration service that must run in sequence, as prior tests set up conditions in the service for following tests. According to this post, the tests should run in alphabetical order, so I named the tests Test01_.... through Test06_..... They are not, however, executing in alphabetical order.

I've added them to a playlist to work around this, and that works, but it seems that it should be unnecessary given the link above. Are other test frameworks (e.g., NUnit or xUnit) better at this? TIA

Tests should never rely on previous tests having been run in order to pass. Any individual test should set up all of the preconditions necessary to pass.
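Here is a minimal MSTest sketch of that advice. It uses a hypothetical in-memory `RegistrationService`, since the real service isn't shown. `[TestInitialize]` runs before every `[TestMethod]`, so each test starts from a fresh service and arranges its own preconditions. Nothing depends on another test having run first.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical stand-in for the registration service under test.
public class RegistrationService
{
    private readonly HashSet<string> _users = new(StringComparer.OrdinalIgnoreCase);

    // Returns false if the user name is already taken.
    public bool Register(string userName) => _users.Add(userName);

    public bool IsRegistered(string userName) => _users.Contains(userName);
}

[TestClass]
public class RegistrationServiceTests
{
    private RegistrationService _service;

    // Runs before each [TestMethod], so no test sees state left behind by another.
    [TestInitialize]
    public void SetUp() => _service = new RegistrationService();

    [TestMethod]
    public void Register_NewUser_Succeeds()
    {
        Assert.IsTrue(_service.Register("alice"));
        Assert.IsTrue(_service.IsRegistered("alice"));
    }

    [TestMethod]
    public void Register_DuplicateUser_Fails()
    {
        // Arrange the precondition here instead of relying on the test above having run.
        _service.Register("alice");

        Assert.IsFalse(_service.Register("alice"));
    }
}
```

Because each test builds its own state, the tests pass in any order. That means the playlist and the Test01_ naming scheme are no longer needed.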


Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

LongSack posted:

Question about MSTest and test order.

Snip....

I wasn't aware there was a way to control the order that tests run in. What order do you see them run in? Is it the same every time, or does it vary? Is there any observable pattern (e.g. order of definition in the class)? I assume the tests you're worrying about are all in the same class? Is there a difference between the behaviour of mstest.exe and vstest.console.exe?

LongSack
Jan 17, 2003

New Yorp New Yorp posted:

Tests should never rely on previous tests having been run in order to pass. Any individual test should set up all of the preconditions necessary to pass.

I knew that would be one of the responses, and you're correct, of course. I'll fix them.

Hammerite posted:

I wasn't aware there was a way to control the order that tests run in. What order do you see them run in? Is it the same every time, or does it vary? Is there any observable pattern (e.g. order of definition in the class)? I assume the tests you're worrying about are all in the same class? Is there a difference between the behaviour of mstest.exe and vstest.console.exe?

According to the documentation, they should run in alphabetical order. As for the rest, I don't know - I use a playlist now, and I've never run vstest.console.exe. I'm going to fix the tests as suggested by NYNY.

Thanks both for the responses.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
anyone know their way around Azure DevOps? I have a YAML build pipeline that includes tasks for building a solution and running unit tests. At present all tasks run (in sequence, depending on the success of previous tasks). I want to modify it so that it works as follows:

1. When code is merged into master, the build pipeline is run. The solution is built, but unit tests are not run. When someone creates a PR, the build pipeline is run and has to succeed for the PR to go ahead - same deal, unit tests don't run.
2. Every night at a fixed time, the build pipeline is run. The solution is built, and unit tests are run.
3. When a user manually triggers a build, they have to choose whether the unit tests are run or not.

I can achieve any of 1, 2 and 3 individually (and I can do 1 and 2 together) but I'm not clear on how to combine 1 or 2 with 3. The documentation pages are completely lacking in any information about how parameters relate to scheduled or triggered pipeline runs. What's the value of a parameter if the pipeline was triggered as a CI build or on a schedule? The default value? What if there's no default value?

This is self-hosted ADO. The pipeline is a simple one - single YAML file, single stage, single job, several tasks. No parameters or conditional tasks as yet (but the aim is to make the unit-tests task conditional).

raminasi
Jan 25, 2005

a last drink with no ice

Hammerite posted:

anyone know their way around Azure DevOps? I have a YAML build pipeline that includes tasks for building a solution and running unit tests. At present all tasks run (in sequence, depending on the success of previous tasks). I want to modify it so that it works as follows:

Snip....

You can determine the pipeline trigger source by checking the right combination of predefined variables, and then use the result to conditionally run stages. IIRC you sometimes have to use a dummy job or stage or something to map the defined variables into your own, because of some of the arcane rules around when parameters are visible, but that's the bird's-eye view of how I've done this kind of thing before.
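Here is a rough sketch of that approach for a single-job YAML pipeline like the one described. It rests on several assumptions. Your Azure DevOps Server version must support runtime parameters. The repo must be Azure Repos Git, so PR validation comes from a branch policy rather than a `pr:` block. The `runTests` name, cron time, pool name and task inputs are all placeholders. As far as I know, runs that start automatically (CI, PR, schedule) get each parameter's default value. That is why the default here is `false`, and the nightly case is detected through `Build.Reason` instead.

```yaml
# Hypothetical parameter; shows up as a checkbox when someone queues a run manually.
parameters:
- name: runTests
  displayName: Run unit tests
  type: boolean
  default: false

# CI build on merges to master (PR builds assumed to come from a branch policy).
trigger:
  branches:
    include:
    - master

# Nightly run; the time is an example and is interpreted as UTC.
schedules:
- cron: "0 2 * * *"
  displayName: Nightly build with tests
  branches:
    include:
    - master
  always: true

pool:
  name: Default  # placeholder self-hosted agent pool

steps:
- task: VSBuild@1
  inputs:
    solution: '**/*.sln'

# Runs on the nightly schedule, or on a manual run where the box was ticked.
# Automatic CI/PR runs use the default (false), so they skip this step.
- task: VSTest@2
  condition: and(succeeded(), or(eq(variables['Build.Reason'], 'Schedule'), eq('${{ parameters.runTests }}', 'true')))
  inputs:
    testSelector: testAssemblies
    testAssemblyVer2: |
      **\*Tests*.dll
      !**\obj\**
```

`${{ parameters.runTests }}` is expanded when the YAML is compiled. `Build.Reason` is only known at runtime, so it is checked in the runtime `condition` instead.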

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

Hammerite posted:

anyone know their way around Azure DevOps? I have a YAML build pipeline that includes tasks for building a solution and running unit tests. At present all tasks run (in sequence, depending on the success of previous tasks). I want to modify it so that it works as follows:

Snip....

Why would you want PRs to be merged without confirming that all of the unit tests run and pass? That sounds completely wrong. You are hugely reducing the utility and defeating the purpose of unit testing if you're relying on people to periodically manually run the tests or on nightly scheduled builds. Your worst case feedback loop is 24 hours between a test breaking and people knowing that the test broke. Meanwhile, there is no excuse to be merging code with broken tests.

New Yorp New Yorp fucked around with this message at 18:45 on Jun 8, 2021

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

New Yorp New Yorp posted:

Why would you want PRs to be merged without confirming that all of the unit tests run and pass? That sounds completely wrong. You are hugely reducing the utility and defeating the purpose of unit testing if you're relying on people to periodically manually run the tests or on nightly scheduled builds. Your worst case feedback loop is 24 hours between a test breaking and people knowing that the test broke. Meanwhile, there is no excuse to be merging code with broken tests.

The unit tests take a long time to run (nearly an hour) and this is a compromise between allowing speedy resolution of work items, having confidence that code that gets into master builds successfully, and having confidence that the code in master passes all the tests.

I wrote a bunch more words, but then I deleted them, because the details really aren't important. Do not respond to this post, please, unless your response is going to consist of something more than the kind of declarations of religious conviction that are in the above quote.

LOOK I AM A TURTLE
May 22, 2003

"I'm actually a tortoise."
Grimey Drawer
If you haven't already you could look into the "run only impacted tests" option, although I've had little luck with it myself.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
btw I did read raminasi's post and that is the kind of thing I expect I will end up doing, I'm just ticked off that MS haven't bothered to explain in their documentation what happens to parameters if the pipeline is started automatically. I found it so irritating I filed feedback on GitHub complaining about it like some kind of huge nerd.

LOOK I AM A TURTLE posted:

If you haven't already you could look into the "run only impacted tests" option, although I've had little luck with it myself.

I don't feel like that offers anything. In spite of what I just wrote in response to New Yorp New Yorp about compromising between two things, it seems like that would be an unsatisfying compromise. I feel like you would end up with time being devoted to the test run and yet without the confidence of knowing that all your tests ran and they all passed.

raminasi
Jan 25, 2005

a last drink with no ice

Hammerite posted:

btw I did read raminasi's post and that is the kind of thing I expect I will end up doing, I'm just ticked off that MS haven't bothered to explain in their documentation what happens to parameters if the pipeline is started automatically. I found it so irritating I filed feedback on github complaining about it like some kind of huge nerd.

I don't feel like that offers anything. In spite of what I just wrote in response to New Yorp New Yorp about compromising between two things, it seems like that would be an unsatisfying compromise. I feel like you would end up with time being devoted to the test run and yet without the confidence of knowing that all your tests ran and they all passed.

If you're running all the impacted tests, then you can assume that unimpacted tests that were green before the change will be green after it and not lose any confidence. Depending on what exactly your tests are doing this may be difficult or impossible, though.

zokie
Feb 13, 2006

Out of many, Sweden
What I would look into is whether you really need all those tests from the start. Are you measuring coverage? Why are they sooo slow? Are you a massive monolith? Maybe you can split the tests out into multiple projects and only run some?

I would really like to have green tests before a PR is merged, and you should have them run on all CI builds at least, so even if I close a PR without them I get an email about a broken build today rather than tomorrow.

But really, if you don’t feel that you need the tests to be confident in a PR being merged to master what use are they? When it comes to C# I’ve seen a lot of “tests” that are just magic reflection creating test cases but not really testing anything…

I really like AutoFixture though and have adopted doing something similar for TypeScript.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I should have invented some other use case for parameters in ADO, because all anyone is doing is quibbling over whether it's ok to treat unit tests like that or not, and not the thing I actually asked about.

WorkerThread
Feb 15, 2012

If you ask how to use a hammer for driving screws, you just have to accept that people are going to ask why you're doing that.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

WorkerThread posted:

If you ask how to use a hammer for driving screws, you just have to accept that people are going to ask why you're doing that.

that's not a good analogy because no-one is disputing that parameters are a means of controlling which tasks in an azure pipeline run. yet that has no equivalent in your analogy (hammer => parameters, driving screws => controlling whether unit tests run, ??? => controlling whether tasks run)

your analogy needs 3 elements, not 2

here, i'll do it for you: "if you ask how to use a toolbox to carry a hammer for driving screws, you have to blah blah blah"

we could still agree that it is valid to want to use a toolbox to carry a hammer. hence why I should have invented another thing for the hammer to do, so you wouldn't get distracted accusing me of trying to drive screws with a hammer. like i said

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Hammerite posted:

that's not a good analogy because no-one is disputing that parameters are a means of controlling which tasks in an azure pipeline run. yet that has no equivalent in your analogy (hammer => parameters, driving screws => controlling whether unit tests run, ??? => controlling whether tasks run)

Snip....

people get stuck on the word "unit" and for good reason - it's pretty unlikely that "unit" tests are going to take an hour to run, because generally unit tests should not be doing anything expensive like file I/O etc.; they should just be covering your in-house logic, and everything slow should be stubs/fakes/etc. unit tests shouldn't do poo poo like actually write/read files or actually read/write from the database.

the tests you have are called "integration tests." that's the magic word to make people not freak out about your slow tests or about not running them every time you build the product.

ideally, you should have both.
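Here is a minimal sketch of the stubs/fakes point; all the names are made up. The validation logic depends on a small interface instead of the file system. The unit test feeds it in-memory lines, and the slower tests against real data files can run the same check through the file-backed implementation.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical abstraction over "where the records come from".
public interface IRecordSource
{
    IEnumerable<string> ReadLines();
}

// Production implementation: real file I/O, i.e. integration-test territory.
public sealed class FileRecordSource : IRecordSource
{
    private readonly string _path;
    public FileRecordSource(string path) => _path = path;
    public IEnumerable<string> ReadLines() => File.ReadLines(_path);
}

// Fake for unit tests: no I/O, just canned lines.
public sealed class InMemoryRecordSource : IRecordSource
{
    private readonly string[] _lines;
    public InMemoryRecordSource(params string[] lines) => _lines = lines;
    public IEnumerable<string> ReadLines() => _lines;
}

// Hypothetical validation check: every record must have exactly three comma-separated fields.
public static class FieldCountCheck
{
    // Returns the 1-based line numbers of invalid records.
    public static IReadOnlyList<int> FindInvalidLines(IRecordSource source) =>
        source.ReadLines()
              .Select((line, index) => (line, number: index + 1))
              .Where(x => x.line.Split(',').Length != 3)
              .Select(x => x.number)
              .ToList();
}
```

For example, `FieldCountCheck.FindInvalidLines(new InMemoryRecordSource("a,b,c", "a,b"))` returns `[2]` without touching the disk.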

adaz
Mar 7, 2009

Yep, just say you only want your integration or end-to-end tests to run on deploys; that makes a lot more sense to folks and fits with how most apps are built, although there are exceptions of course. And as others pointed out, using the built-in variables to control your job stages and which ones run is by far the best method of controlling this. Same thing in GitLab/GitHub for the same problem.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Bruegels Fuckbooks posted:

Snip....

the tests you have are called "integration tests." that's the magic word to make people not freak out about your slow tests or about not running them every time you build the product.

we call them unit tests. you could instead call them "regression tests" I suppose. basically the product is a validation system for certain kinds of complicated data files and we have hundreds of validation checks programmed in, most with their own tests that run the validation check against examples of real data (both passing and failing) and assert about what results come out. there are also tests in there that are unit tests, as in ones that even a person really fussy about terminology would call that.

zokie
Feb 13, 2006

Out of many, Sweden
I was more thinking along the line of: why do you need to drive so many screws?

Which is why I suggested looking into coverage, you might see that the difference between having 100 real files tested and 10 is several minutes of time for only a minuscule increase in test coverage.

Maybe there are a lot of test files that overlap enough that if you test A, D, and E you can skip B and C.

For what you are doing, it would make sense to have a regression test suite with a real-world example for every bug ever, and to run that nightly. But you could probably do something where new test files are run for the PR, while the old ones are left for the nightly.

But for your CI build pick out enough tests to give a nice level of coverage, maybe you can remove tests that just re-verify things that are covered elsewhere. Personally I mostly use true unit tests for when I need them to TDD the logic and otherwise try to have my tests match requirements or behaviors of the system at a higher level.

I think it’s important to keep the tests and test results close to the developer and give them the shortest possible feedback cycle. Otherwise you’ll end up with useless tests because no one will run them.
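One way to do that PR/nightly split with MSTest is sketched below. The test names, category names and sample-file path are hypothetical. The idea is to tag the slow real-data tests with `[TestCategory]` and filter on that category.

```csharp
using System.IO;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ValidationCheckTests
{
    // Hypothetical stand-in for one of the product's validation checks.
    private static bool IsValidRecord(string record) => !string.IsNullOrWhiteSpace(record);

    // Fast and I/O-free: cheap enough for every CI and PR build.
    [TestMethod]
    [TestCategory("Unit")]
    public void BlankRecord_IsRejected()
    {
        Assert.IsFalse(IsValidRecord("   "));
        Assert.IsTrue(IsValidRecord("id,name,value"));
    }

    // Slow: runs the check over a real sample file, so leave it to the nightly build.
    [TestMethod]
    [TestCategory("Regression")]
    public void KnownGoodSampleFile_PassesCheck()
    {
        // Hypothetical path to a checked-in sample data file.
        string[] records = File.ReadAllLines(Path.Combine("Samples", "known-good.csv"));
        Assert.IsTrue(records.All(IsValidRecord));
    }
}
```

The PR/CI build can then run `dotnet test --filter "TestCategory!=Regression"`. With vstest.console.exe, the equivalent is `/TestCaseFilter:"TestCategory!=Regression"`. The nightly build runs everything.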

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Are there individual test cases in those that take an hour to run, or does it just take a long time to run the tests in aggregate because there are so many and you don't have enough machines to run them all in parallel?

insta
Jan 28, 2009

New Yorp New Yorp posted:

Why trigger on a timer? It seems like you'd want to use a queue and skip the blob storage bit entirely.

NO QUEUE.

ONLY FIRST-IN FIRST-OUT MECHANISM WITH PERSISTENCE.


I have no idea why so many teams gently caress up queues *so badly*

epswing
Nov 4, 2003

Soiled Meat

insta posted:

NO QUEUE.

ONLY FIRST-IN FIRST-OUT MECHANISM WITH PERSISTENCE.


I have no idea why so many teams gently caress up queues *so badly*

FIFO is how queues work, what are you talking about?

insta
Jan 28, 2009
that's the joke.


So many teams will reinvent queuing mechanisms with databases and polling instead of just using a queue.

mystes
May 31, 2006

insta posted:

So many teams will reinvent queuing mechanisms with databases and polling instead of just using a queue.

This must be one of those queue-anon conspiracy theories I've been hearing about.

epswing
Nov 4, 2003

Soiled Meat

insta posted:

that's the joke.

There are no jokes in software development. Only varying degrees of pain!

epswing fucked around with this message at 22:38 on Jun 12, 2021

distortion park
Apr 25, 2011


To be fair, SQL databases with polling can give you the queueing semantics you want, while still using a known quantity in terms of persistence, operations and transactions. Whereas some new queuing platform is likely to be very painful at first. Plus you will hopefully work out at some point that using a queue at all was a mistake, like 90% of queue usage is bad.

Ola
Jul 19, 2004

No database polling! Only an event-driven architecture with persistence.

namlosh
Feb 11, 2014

I name this haircut "The Sad Rhino".
Yeah, as someone who’s used queues a ton, mostly in Tibco and mostly for persistence or other workflows, someone wanting to poll a db/rest/soap service instead seems like absolute craziness to me.

Our databases have better things to do

Horn
Jun 18, 2004

Penetration is the key to success
College Slice
I'm looking for a new personal project to freshen up my skills and I think I've landed on creating a cocktail database with a focus on being able to easily import recipes. Does anyone know of any libraries that make it easy to parse semi-structured text into tokens? I'd like to be able to just paste in something like "1 3/4 oz whiskey" and "2 dashes bitters" and end up with the amount, unit, and ingredient broken out without having to write ugly regexes. ANTLR seems like it'll be able to do what I'm looking for, but I have no idea what kind of work is involved there or whether a better option exists.

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Horn posted:

I'm looking for a new personal project to freshen up my skills and I think I've landed on creating a cocktail database with focus on being able to easily import recipes. Does anyone know of any libraries that make it easy to parse out semi-structured text into tokens? I'd like to be able to just paste in something like "1 3/4 oz whiskey" and "2 dashes bitters" and end up with the amount, unit, and ingredient broken out without having to write ugly regex's. Antlr seems like it'll be able to do what I'm looking for but no idea what kind of work is involved there or if some better option exists.

ANTLR is not what you're looking for - it's a parser generator intended for structured languages. the use case would generally be along the lines of "I want to make my own simple programming language" or "I would like to write tools that analyze code." The problem then is that all the recipes would have to be written in this language you make up. You probably don't want to do that.

recipes probably fall more under the category of natural language processing/understanding - e.g. microsoft has a text analytics api (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/). The idea would be that you use the library to parse the string and figure out the quantity / what is being requested. The advantage of using NLP to read recipes imo is that there's a surprising number of ways quantity can be specified in recipes (and it would also handle issues like different spellings, etc.) I don't have a ready library that exactly solves the problem you're posing, but that would be the area I'd look under.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Horn posted:

I'm looking for a new personal project to freshen up my skills and I think I've landed on creating a cocktail database with focus on being able to easily import recipes. Does anyone know of any libraries that make it easy to parse out semi-structured text into tokens? I'd like to be able to just paste in something like "1 3/4 oz whiskey" and "2 dashes bitters" and end up with the amount, unit, and ingredient broken out without having to write ugly regex's. Antlr seems like it'll be able to do what I'm looking for but no idea what kind of work is involved there or if some better option exists.

Parser combinators are often a decent middle ground choice for tasks that are too complex for regexes but not complex enough to warrant a parser generator like ANTLR.

Sprache is, I think, the most approachable .NET library for parser combinators. There's a bunch of tutorials linked from its github page: https://github.com/sprache/Sprache
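Here is a small Sprache sketch for the two example lines, showing roughly what the combinator style looks like. It assumes the Sprache NuGet package. The `Ingredient` record and the grammar are my own assumptions. The grammar only handles `<amount> <unit> <ingredient>`, where the amount is a whole number, a fraction, or a mixed number. Real recipes ("juice of half a lime") would need more alternatives.

```csharp
using Sprache;

// Hypothetical result type for one recipe line.
public record Ingredient(decimal Amount, string Unit, string Name);

public static class RecipeParser
{
    // "2" -> 2
    static readonly Parser<decimal> Whole =
        Parse.Number.Select(decimal.Parse);

    // "3/4" -> 0.75
    static readonly Parser<decimal> Fraction =
        from numerator in Parse.Number
        from slash in Parse.Char('/')
        from denominator in Parse.Number
        select decimal.Parse(numerator) / decimal.Parse(denominator);

    // "1 3/4" -> 1.75
    static readonly Parser<decimal> Mixed =
        from whole in Whole
        from gap in Parse.WhiteSpace.AtLeastOnce()
        from fraction in Fraction
        select whole + fraction;

    // Sprache's Or retries the next alternative from the same position, so try the longest form first.
    static readonly Parser<decimal> Amount =
        Mixed.Or(Fraction).Or(Whole).Token();

    // The unit is a single word ("oz", "dashes"); the rest of the line is the ingredient.
    public static readonly Parser<Ingredient> Line =
        from amount in Amount
        from unit in Parse.Letter.AtLeastOnce().Text().Token()
        from name in Parse.AnyChar.AtLeastOnce().Text()
        select new Ingredient(amount, unit, name.Trim());
}
```

`RecipeParser.Line.Parse("1 3/4 oz whiskey")` should produce an `Ingredient` with `Amount` 1.75, `Unit` "oz" and `Name` "whiskey". Input that doesn't fit the grammar throws a `ParseException`.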

Horn
Jun 18, 2004

Penetration is the key to success
College Slice
Thanks for the feedback, this is completely out of my wheelhouse so I wasn't sure how to even evaluate these libraries without a ton of work.

epswing
Nov 4, 2003

Soiled Meat
I've got an EF/SQL/LINQ question. Say I've got a table with Id (int not null sequence) and Message (string) columns. I want to produce the last 5 distinct messages in reverse order (most recent first).

For example, given the following data:
pre:
Id	Message
1	apple
2	apple
3	apple
4	apple
5	banana
6	banana
7	dusty
8	elephant
9	elephant
10	hello
The result should be: hello, elephant, dusty, banana, apple

First cut

C# code:
IEnumerable<string> RecentMessages()
{
    return messageTable
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message)
        .Distinct()
        .Take(5);
}
Oops, Distinct scrambles the ordering. OK, maybe I'll Distinct before OrderByDescending, but since this is now on the whole record I need to specify how to compare with an IEqualityComparer.

C# code:
IEnumerable<string> RecentMessages()
{
    return messageTable
        .Distinct(new MessageComparer())
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message)
        .Take(5);
}

class MessageComparer : IEqualityComparer<Message>
{
    // Rows count as duplicates when their message text matches.
    public bool Equals(Message x, Message y) => x?.Message == y?.Message;

    // Hash only the field that Equals compares, so equal rows get equal hashes.
    public int GetHashCode(Message obj) => obj?.Message?.GetHashCode() ?? 0;
}
Oops, this can't be converted into a store expression, silly. OK, getting into whack-a-mole territory, I guess I could OrderByDescending, read the whole table into C#, and include my own DistinctIterator (partially stolen from a SO answer):

C# code:
IEnumerable<string> RecentMessages()
{
    var messages = messageTable
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message);
    
    return DistinctIterator(messages).Take(5);
}

IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer = null)
{
    HashSet<TSource> set = comparer != null
        ? new HashSet<TSource>(comparer)
        : new HashSet<TSource>();
    
    foreach (TSource element in source)
        if (set.Add(element))
            yield return element;
}
This works, but I really don't want to load the whole table into C#. I could do something hacky like Take(1000) in RecentMessages and only operate on at most 1000 records instead of the whole table, but that still sucks. How can I get this done in the DB via EF?

Edit: Actually, re-reading the final solution, because DistinctIterator uses yield, and everything is IEnumerable, and I'm not ToList'ing anywhere, isn't everything 'streaming' (i.e. not buffering the whole table somewhere) and this will actually just load 5 records via Take and stop there? Either way, I'm still interested in a way to do this entirely with EF if possible.
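The laziness part of the edit can be checked with plain LINQ to Objects. The sketch below uses in-memory data only, so it says nothing about what SQL EF would generate. `CountingSource` and `LazinessDemo` are made-up helpers. The sort still has to see every row: here `OrderByDescending` buffers the input, and under EF the database would do the ORDER BY. The demo counts how many of the sorted rows are actually pulled downstream of the sort.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazinessDemo
{
    // Number of rows requested from the sorted sequence in the last call.
    public static int Pulled;

    // Yields rows one at a time and counts how many were requested.
    public static IEnumerable<string> CountingSource(IEnumerable<string> rows)
    {
        foreach (var row in rows)
        {
            Pulled++;
            yield return row;
        }
    }

    // Same idea as the DistinctIterator above: emits first occurrences, in input order.
    public static IEnumerable<T> DistinctIterator<T>(IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var element in source)
            if (seen.Add(element))
                yield return element;
    }

    public static List<string> RecentMessages(IEnumerable<(int Id, string Message)> table)
    {
        Pulled = 0;
        var messages = table
            .OrderByDescending(x => x.Id)   // buffers and sorts everything
            .Select(x => x.Message);
        // Take stops asking for rows once it has five distinct messages.
        return DistinctIterator(CountingSource(messages)).Take(5).ToList();
    }
}
```

With the ten sample rows, the result is hello, elephant, dusty, banana, apple, and `Pulled` stays below the row count. The trailing duplicate "apple" rows are never requested.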

epswing fucked around with this message at 15:35 on Jun 21, 2021

EssOEss
Oct 23, 2006
128-bit approved
My experience with Entity Framework is that it tends to get in trouble very fast once you start using fancy words like group by and join. I immediately ran into client-side evaluation or EF just giving up with some internal query builder exception when I last tried to use it for nontrivial cases. I only have positive experience using it as a simple row serializer/deserializer and suggest you expect no more from it.

Thus I do not know if EF can actually turn this into nice SQL but my first try at this would be something like:

code:
using System;
using System.Linq;

record Row(int Id, string Message);

class Program
{
    static void Main(string[] args)
    {
        var data = new Row[]
        {
            new(1, "apple"),
            new(2, "apple"),
            new(3, "apple"),
            new(4, "apple"),
            new(5, "banana"),
            new(6, "banana"),
            new(7, "dusty"),
            new(8, "elephant"),
            new(9, "elephant"),
            new(10, "hello"),
        };

        var groupedByMessage = data.GroupBy(row => row.Message);

        var messageAndMaxId = groupedByMessage.Select(group => new
        {
            Message = group.Key,
            MaxId = group.Max(row => row.Id)
        });

        var fiveMostRecentMessages = messageAndMaxId
            .OrderByDescending(x => x.MaxId)
            .Select(x => x.Message)
            .Take(5);

        Console.WriteLine(string.Join(", ", fiveMostRecentMessages));
    }
}

Mr Shiny Pants
Nov 12, 2012

epswing posted:

I've got an EF/SQL/LINQ question. Say I've got a table with Id (int not null sequence) and Message (string) columns. I want to produce the last 5 distinct messages in reverse order (most recent first).

Snip....

The first one: after the Select, do a ToList()? It should make a new list that you can Distinct() on and Take(5).

Just read your edit: nvm then.



Mr Shiny Pants fucked around with this message at 16:27 on Jun 21, 2021

epswing
Nov 4, 2003

Soiled Meat

Mr Shiny Pants posted:

The first one, after the Select do a ToList()? It should make a new list that you can Distinct() on and take(5).

I believe Distinct for both Linq to Entities and Linq to Objects will not maintain ordering.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug
Although I'm not a huge fan of "throw another nuget package at it!" and I'm not sure if it works with IQueryable vs IEnumerable, MoreLINQ has a DistinctBy extension method that probably fits the bill here.

Personally I'd just do the groupby.

ThePeavstenator
Dec 18, 2012

:burger::burger::burger::burger::burger:

Establish the Buns

:burger::burger::burger::burger::burger:

epswing posted:

I've got an EF/SQL/LINQ question. Say I've got a table with Id (int not null sequence) and Message (string) columns. I want to produce the last 5 distinct messages in reverse order (most recent first).

For example, given the following data:
pre:
Id	Message
1	apple
2	apple
3	apple
4	apple
5	banana
6	banana
7	dusty
8	elephant
9	elephant
10	hello
The result should be: hello, elephant, dusty, banana, apple

First cut

C# code:
IEnumerable<string> RecentMessages()
{
    return messageTable
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message)
        .Distinct()
        .Take(5);
}
Oops, Distinct scrambles the ordering. OK, maybe I'll Distinct before OrderByDescending, but since this is now on the whole record I need to specify how to compare with an IEqualityComparer.

C# code:
IEnumerable<string> RecentMessages()
{
    return messageTable
        .Distinct(new MessageComparer())
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message)
        .Take(5);
}

class MessageComparer : IEqualityComparer<Message>
{
    public bool Equals(Message x, Message y) => x.Message == y.Message;

    public int GetHashCode(Message obj)
    {
        if (obj == null)
            return 0;
        return obj.Message.GetHashCode() ^ obj.Message.GetHashCode();
    }
}
Oops, this can't be converted into a store expression, silly. OK, getting into whack-a-mole territory, I guess I could OrderByDescending, read the whole table into C#, and include my own DistinctIterator (partially stolen from a SO answer):

C# code:
IEnumerable<string> RecentMessages()
{
    var messages = messageTable
        .OrderByDescending(x => x.Id)
        .Select(x => x.Message);
    
    return DistinctIterator(messages).Take(5);
}

IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer = null)
{
    // A null comparer makes HashSet fall back to EqualityComparer<TSource>.Default.
    var set = new HashSet<TSource>(comparer);

    // Yield each element the first time it is seen, preserving source order.
    foreach (TSource element in source)
        if (set.Add(element))
            yield return element;
}
This works, but I really don't want to load the whole table into C#. I could do something hacky like Take(1000) in RecentMessages and only operate on at most 1000 records instead of the whole table, but that still sucks. How can I get this done in the DB via EF?

Edit: Actually, re-reading the final solution, because DistinctIterator uses yield, and everything is IEnumerable, and I'm not ToList'ing anywhere, isn't everything 'streaming' (i.e. not buffering the whole table somewhere) and this will actually just load 5 records via Take and stop there? Either way, I'm still interested in a way to do this entirely with EF if possible.

IEnumerable/yield just means that evaluation and enumeration happen lazily, but in this case the IEnumerable data source is almost certainly an in-memory collection (if the work were still being done on the DB, it would still be an IQueryable).
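To make the lazy-evaluation point concrete, here is a small LINQ to Objects sketch. `NewestFirst`, `DistinctInOrder`, and the `rowsRead` counter are hypothetical names, and the array merely stands in for whatever the real source is. It shows that `yield` plus `Take(5)` stops pulling rows once five distinct messages have been produced; it says nothing about how many rows the underlying source fetched before handing them over, which is what decides whether the whole table was loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int rowsRead = 0;

// Hypothetical lazy source: yields the Message column newest-first
// (Id 10 down to Id 1 of the first sample) and counts every row pulled.
IEnumerable<string> NewestFirst()
{
    string[] messages = { "hello", "elephant", "elephant", "dusty", "banana",
                          "banana", "apple", "apple", "apple", "apple" };
    foreach (var message in messages)
    {
        rowsRead++;
        yield return message;
    }
}

// Same idea as DistinctIterator: yield an element only the first time it is seen.
IEnumerable<T> DistinctInOrder<T>(IEnumerable<T> source)
{
    var seen = new HashSet<T>();
    foreach (var element in source)
        if (seen.Add(element))
            yield return element;
}

var lastFive = DistinctInOrder(NewestFirst()).Take(5).ToList();
Console.WriteLine($"{string.Join(", ", lastFive)} (rows read: {rowsRead} of 10)");
```

The counter ends below 10 because enumeration stops as soon as Take has its five items; with a real EF query, the equivalent question is whether the SQL sent to the server was limited, which this pattern does not control.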

Ask yourself this - can you write a raw query to accomplish what you want solely on the DB? If the answer is no, then there’s nothing EF will be able to do to make that happen either.

raminasi
Jan 25, 2005

a last drink with no ice

epswing posted:

I've got an EF/SQL/LINQ question. Say I've got a table with Id (int not null sequence) and Message (string) columns. I want to produce the last 5 distinct messages in reverse order (most recent first).

For example, given the following data:
pre:
Id	Message
1	apple
2	apple
3	apple
4	apple
5	banana
6	banana
7	dusty
8	elephant
9	elephant
10	hello

What would you want returned if your table looked like this instead?

pre:
Id	Message
1	apple
2	apple
3	apple
4	apple
5	banana
6	banana
7	dusty
8	elephant
9	dusty
10	hello

epswing
Nov 4, 2003

Soiled Meat

raminasi posted:

What would you want returned if your table looked like this instead?

pre:
Id	Message
1	apple
2	apple
3	apple
4	apple
5	banana
6	banana
7	dusty
8	elephant
9	dusty
10	hello

hello, dusty, elephant, banana, apple

To produce that result, I'm traversing the list in reverse order, skipping those I've seen before.

Is "the last 5 distinct messages in reverse order" not descriptive enough, or missing information? I guess there's some ambiguity relating to "distinct" here. I want the earlier duplicates removed. Maybe "unique" is a better word than "distinct"?

epswing fucked around with this message at 17:53 on Jun 21, 2021


raminasi
Jan 25, 2005

a last drink with no ice

epswing posted:

hello, dusty, elephant, banana, apple

To produce that result, I'm traversing the list in reverse order, skipping those I've seen before.

Is "the last 5 distinct messages in reverse order" not descriptive enough, or missing information? I guess there's some ambiguity relating to "distinct" here. I want the earlier duplicates removed. Maybe "unique" is a better word than "distinct"?

I just saw ambiguity in the order in which you wanted to apply "last 5" and "distinct", and wanted to make sure we were on the same page (and that you even had an answer in mind). I'll echo that you should try to write the raw SQL query before trying to wrangle EF into producing it, although since I don't know anything about EF I can't be much help with that second step. (If I were writing the raw query, I'd probably arrange the data into (Message, greatest Id for that Message) tuples, order them by the second item descending, and take the messages from the first five tuples. You could do that with subqueries or probably also with window functions.)
