dougdrums
Feb 25, 2005
CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)

a dangerous thot posted:

That looks really neat, thanks, I'll be sure to check both suggestions out

One thing I forgot to add is how nice this modular design would be: because you can treat any event, function, or field as a signal, you can make great use of a visualization library to build an arbitrary graph. Just make probes at runtime as desired, or make further events for signal behaviour: onrise, onfall, double click, etc. The things you want emerge from design and use, rather than always requiring the foresight to add explicit extensions.

If you know and want to use C#, you could implement an IQueryable provider that spits out ExpandoObjects. Your IQueryable provider manipulates its ExpressionTree to provide the "stacks of delegates", perhaps as more IQueryables. When you get a mutated version of your object from your IQueryable, you can replace the properties of the old ExpandoObject while handling its PropertyChanged event. Also you could just straight up use events in ExpandoObjects (as in, ExpandoObject allows appending arbitrary event handlers to a property).

Some MSDN links:
https://msdn.microsoft.com/en-us/library/system.dynamic.expandoobject(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/bb351562(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.linq.expressions.expression(v=vs.110).aspx

This is a lot of "reinventing the wheel" though. You could just use some Lisp or templating ... but I think Erlang is the most suitable, from what I understand of what you're trying to do ...


huhu
Feb 24, 2006
I'm about a month and a half into my first software engineering job. I've been doing engineering in other fields for 4 years and programming in my free time for about 3.

I'm on a team of 3, and every suggestion I make gets shot down with either "we tried that once and encountered a bug but we don't remember what it was" or "we like the way things are done now". Three of those suggestions: let me develop locally on Ubuntu instead of Windows (since our production runs CentOS), use ParsleyJS instead of creating our own form validation from scratch, and use WTForms with Flask instead of doing it the old JS/HTML/AJAX/jQuery way. Is this how dev usually is?

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

TheMostFrench posted:

I had to look up some things like Yield, which seem interesting.

I get an error at this line:
loser = min(votes, key=votes.get())

code:
loser = min(votes, key=votes.get()) 
TypeError: get expected at least 1 arguments, got 0
I'm not familiar with .get() so I'll look that up. My understanding of that line is that assigns the candidate with the least votes to 'loser', but if it's not getting any information does that mean nothing is being passed to it from the dictionary?

get() here is a method of the "votes" dict object. You're supposed to pass in a key, and it returns the corresponding value. The nice thing about get is that you can pass a default value to return if the key is not found. Like so:
code:
foo = {"a": 2}
foo["a"]         # 2
foo["b"]         # raises KeyError
foo.get("a")     # 2
foo.get("b")     # None (no exception)
foo.get("b", 3)  # 3
EDIT:

huhu posted:

Is this how dev usually is?

It's not uncommon. Development processes usually have a lot of inertia behind them, for good and bad reasons. The good reasons mostly boil down to "actually, our processes do work effectively and we're not interested in wasting time on the millions of fad processes that show up every year." The bad reasons mostly boil down to "we're afraid of change and/or having to learn something new." It's up to you which of those is happening here, but as the newbie, you're unfortunately going to have to put in a lot of extra work to convince people that you know what you're talking about, if you want to change anything.

TooMuchAbstraction fucked around with this message at 16:33 on May 26, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

huhu posted:

Is this how dev usually is?

No. It's how jobs usually are.

Eela6
May 25, 2007
Shredded Hen

TheMostFrench posted:

I had to look up some things like Yield, which seem interesting.

I get an error at this line:
loser = min(votes, key=votes.get())

code:
loser = min(votes, key=votes.get()) 
TypeError: get expected at least 1 arguments, got 0
I'm not familiar with .get() so I'll look that up. My understanding of that line is that assigns the candidate with the least votes to 'loser', but if it's not getting any information does that mean nothing is being passed to it from the dictionary?

Ah! The key argument for min accepts a function: I wanted to pass
Python code:
loser = min(votes, key=votes.get)
# note we pass the method votes.get by omitting the parentheses
To go into a little more detail, when we call the min() function, it finds the minimum element among the keys of our dictionary, not the values.
IN:
Python code:
remaining_candidates = {'bernie', 'hillary', 'trump'}
votes = {'bernie': 40, 'hillary': 11, 'trump': 12}
loser = min(votes)
remaining_candidates -= {loser}
print(remaining_candidates)
OUT:
code:
 {'trump', 'hillary'}
This is wrong: min() compared the keys alphabetically, so 'bernie' was eliminated. We need to pass a key function that maps each candidate to their vote count. We could use a named function, build a lambda, or pass the bound method votes.get; I chose votes.get because it's a built-in and a little faster.

Python code:
def get_votes(candidate):
    return votes[candidate]

# these are equivalent

loser = min(votes, key = lambda candidate: votes[candidate])
loser = min(votes, key = get_votes)
loser = min(votes, key = votes.get)
Let's try it again!

IN:
Python code:
remaining_candidates = {'bernie', 'hillary', 'trump'}
votes = {'bernie': 40, 'hillary': 11, 'trump': 12}
loser = min(votes, key=votes.get)
remaining_candidates -= {loser}
print(remaining_candidates)
OUT:
Python code:
{'bernie', 'trump'}
As we can see, bernie would have won.

Eela6 fucked around with this message at 20:27 on May 26, 2017

TheMostFrench
Jul 12, 2009

Stop for me, it's the claw!



Eela6 posted:


As we can see, bernie would have won.

Thanks for the tip about min(votes, key = votes.get), that was causing me some issues with my solution (which I now figured out, below). I know it isn't very neat and probably not practical at all, but I'm glad that I was able to figure it out using just what I know right now.

code:

def ballot_count(round):
    for line in ballot:
        if str(round) in line:
            # only look up the index once we know the round is in the line
            line_num = line.index(str(round))
            for index in line:
                index_b = int(index) - 1
                if cand[index_b] in dic3:
                    if cand[line_num] == cand[index_b]:
                        dic3[cand[index_b]] += 1
                        loser = min(dic3, key=dic3.get)

    print("Vote round ", round)

    if len(dic3) > 1:
        print(dic3)
        dic3.pop(loser, None)
        print()
        print(loser, "was eliminated")
        print(len(dic3), "candidates remaining.\n")

    if len(dic3) == 1:
        print(dic3, "is the winner!")

# these calls must sit outside the function, or each call would recurse forever
ballot_count(1)
ballot_count(2)
ballot_count(3)
ballot_count(4)
My next challenge is in dealing with invalid votes.

I have yet to get your solution to fully work, it throws an error when reading the ballots, because it reads each ballot as a string and doesn't know how to deal with the commas between numbers. I'm confident that I know how to figure that out, since I did it for my own solution.

idiotmeat
Apr 3, 2010

TheMostFrench posted:


My next challenge is in dealing with invalid votes.

I have yet to get your solution to fully work, it throws an error when reading the ballots, because it reads each ballot as a string and doesn't know how to deal with the commas between numbers. I'm confident that I know how to figure that out, since I did it for my own solution.


code:

split()
is your friend for separating on the commas. For validation, you can use the regex methods provided by the "re" module.

As a side note, you might be overcomplicating the problem by reading the file every round. If every round is independent, why not do all the tallies at once?
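A minimal sketch of both ideas, assuming each ballot line looks like "1,2,3" (a comma-separated ranking; the format is a guess based on your description):

```python
from collections import Counter

# Hypothetical ballot data; swap in the real lines read from your file.
ballot_lines = ["1,2,3", "2,1,3", "1,3,2", "3,2,1"]

# split() separates the fields on the commas
ballots = [line.split(",") for line in ballot_lines]

# one pass over the ballots tallies every first choice at once
first_choices = Counter(b[0] for b in ballots)
print(first_choices)  # Counter({'1': 2, '2': 1, '3': 1})
```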

Eela6
May 25, 2007
Shredded Hen

TheMostFrench posted:

I have yet to get your solution to fully work, it throws an error when reading the ballots, because it reads each ballot as a string and doesn't know how to deal with the commas between numbers. I'm confident that I know how to figure that out, since I did it for my own solution.

Try

Python code:

import csv
for reading and writing comma-separated values.

It's better to use code you understand than to copy others' code blindly. As you grow in your understanding of Python your code will gradually grow cleaner and more 'idiomatic'.
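For instance (a sketch; io.StringIO stands in for your opened ballot file, and the line format is assumed):

```python
import csv
import io

# Hypothetical ballot data, one comma-separated ranking per line
ballot_file = io.StringIO("1,2,3\n2,1,3\n")

reader = csv.reader(ballot_file)  # yields each line already split on the commas
ballots = [[int(field) for field in row] for row in reader]
print(ballots)  # [[1, 2, 3], [2, 1, 3]]
```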

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Working my way through the Think Perl 6 book and I'm trying to figure out a clever way to solve the second part of an exercise:

quote:

Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters. Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them.

Can you find a combination of five forbidden letters that excludes the smallest number of words?

I have a words.txt file that has 113k words. My initial idea was to brute force it but I'm guessing there is likely a better way. Any ideas?

*The word list: http://greenteapress.com/thinkpython2/code/words.txt

Hughmoris fucked around with this message at 19:17 on May 29, 2017

LOOK I AM A TURTLE
May 22, 2003

"I'm actually a tortoise."
Grimey Drawer

Hughmoris posted:

Working my way through the Think Perl 6 book and I'm trying to figure out a clever way to solve the second part of an exercise:


I have a words.txt file that has 113k words. My initial idea was to brute force it but I'm guessing there is likely a better way. Any ideas?

Count the frequency distribution of the letters in the input file and start with the least common ones.
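In Python that distribution is a couple of lines (sketch with a stand-in word list; counting via set(word) so a repeated letter only counts the word once):

```python
from collections import Counter

words = ["jazz", "quiz", "fox"]  # stand-in for the 113k-word list

# number of words each letter appears in (set() deduplicates within a word)
freq = Counter(ch for word in words for ch in set(word))

least_common_first = sorted(freq, key=freq.get)
print(freq["q"], freq["z"])  # 1 2
```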

ulmont
Sep 15, 2010

IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

LOOK I AM A TURTLE posted:

Count the frequency distribution of the letters in the input file and start with the least common ones.

That will definitely get you a "good enough" solution. For the optimal answer you might need to look at the least common two-letter or three-letter combinations.

The Laplace Demon
Jul 23, 2009

"Oh dear! Oh dear! Heisenberg is a douche!"

Hughmoris posted:

Working my way through the Think Perl 6 book and I'm trying to figure out a clever way to solve the second part of an exercise:

quote:

Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters. Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them.

Can you find a combination of five forbidden letters that excludes the smallest number of words?

I have a words.txt file that has 113k words. My initial idea was to brute force it but I'm guessing there is likely a better way. Any ideas?

This is the minimum k-union problem, and it's NP-hard. You can prune the search space a bit by noticing that if there are 5 letters that each occur in only 1 word, then you don't need to try choosing letters occurring in 6 or more words. Don't forget to recalculate the frequencies after each choice.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


There are only about 65,000 groups of five letters. That's easily brute forceable if you have a data structure that lets you not iterate over every word for every letter. Have you learned about hashes and arrays yet?

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the ideas, I'll see what I can figure out.

ultrafilter posted:

There are only about 65,000 groups of five letters. That's easily brute forceable if you have a data structure that lets you not iterate over every word for every letter. Have you learned about hashes and arrays yet?

I'm familiar with hashes and arrays from poking around Python but this particular book hasn't covered that topic yet.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


If you build a frequency table, you can see how many words are excluded by a given letter without having to look for that letter in every word.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

ultrafilter posted:

If you build a frequency table, you can see how many words are excluded by a given letter without having to look for that letter in every word.

That still only gets you an approximate solution. The problem is iterating over all combinations of five letters, not iterating over the words. Of course there's only ~8M combinations of five letters, so you might as well just brute force it.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


You're counting abcde and abdce as separate. That's not right.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.
Derp. Yeah, but that only reinforces my point. Just brute force it.

Eela6
May 25, 2007
Shredded Hen

KernelSlanders posted:

That still only gets you an approximate solution. The problem is iterating over all combinations of five letter, not iterating over the words. Of course there's only ~8M combinations of five letters, so you might as well just brute force it.

Sure. A combination of the approaches seems best to me. Use a frequency table to cut down the letters in consideration, then test the top 12* or so remaining. The problem space of (12 choose 5) is way more manageable than (26 choose 5).

*12 is arbitrary. It just 'feels right' to me, for what that's worth.

Steve French
Sep 8, 2003

Eela6 posted:

Sure. A combination of the approaches seems best to me. Use a frequency table to cut down the letters in consideration, then test the top 12* or so remaining. The problem space of (12 choose 5) is way more manageable than (26 choose 5).

*12 is arbitrary. It just 'feels right' to me, for what that's worth.

And this is still not guaranteed to yield the correct result.

Eela6
May 25, 2007
Shredded Hen

Steve French posted:

And this is still not guaranteed to yield the correct result.

No. It will, though (assuming you're using real words and not just the arbitrary string "well, actually" repeated over and over again). Letters in English are not remotely uniformly distributed.

Eela6 fucked around with this message at 18:51 on May 29, 2017

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


Now that I think about it, the frequency table approach won't work. The problem is that it can tell you how many words you exclude by picking 'b' or 'c', but it won't tell you how many words you exclude by picking 'b' and 'c'. You need to be able to compute set intersections.

My original idea was to have a hash where the keys are letters and the values are arrays of words that contain that letter. That lets you compute set intersections, and then you can get the answer relatively easily.
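A sketch of that structure (stand-in word list; for this problem you want the size of the union of the per-letter word sets, which this makes cheap to compute):

```python
from collections import defaultdict
from itertools import combinations

words = ["cab", "bed", "fad", "ace"]  # stand-in word list

# letter -> set of words containing that letter
words_with = defaultdict(set)
for word in words:
    for ch in set(word):
        words_with[ch].add(word)

def excluded(combo):
    """Number of words containing at least one letter of combo."""
    hit = set()
    for ch in combo:
        hit |= words_with[ch]
    return len(hit)

# pairs instead of 5-sets, to keep the toy example small
best = min(combinations(sorted(words_with), 2), key=excluded)
print(best, excluded(best))  # ('d', 'f') 2
```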

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Hughmoris posted:

Write a subroutine named avoids that takes a word and a string of forbidden letters, and that returns True if the word doesn’t use any of the forbidden letters. Next, modify your program to prompt the user to enter a string of forbidden letters and then print the number of words that don’t contain any of them.

Can you find a combination of five forbidden letters that excludes the smallest number of words?

Here is the answer I came up with but I'm beginning to think there are multiple levels that I'm missing: [Q,J,X,Z,W]

Here is the word list if anyone is bored and wants to try their hand: http://greenteapress.com/thinkpython2/code/words.txt

Hughmoris fucked around with this message at 19:17 on May 29, 2017

Eela6
May 25, 2007
Shredded Hen
A (python) implementation of my approach:
Python code:
from collections import Counter
from itertools import combinations
from typing import List

def combination_of_5_letters_with_fewest_forbidden_words(words: List[str]) -> tuple:
    def avoids(word: str, forbidden: set) -> bool:
        return not any(char in forbidden for char in word)

    def total_avoided(combo) -> int:
        combo = set(combo)
        return sum(avoids(word, combo) for word in words)

    words_letter_appears_in = Counter(char for word in words for char in set(word))
    letters_under_consideration = sorted(words_letter_appears_in,
                                         key=words_letter_appears_in.get)[:15]
    return max(combinations(letters_under_consideration, 5),
               key=total_avoided)

Eela6 fucked around with this message at 19:24 on May 29, 2017

Dr. Stab
Sep 12, 2010
👨🏻‍⚕️🩺🔪🙀😱🙀

Eela6 posted:

No. It will, though (assuming you're using real words and not just the arbitrary string "well, actually" repeated over and over again). Letters in English are not remotely uniformly distributed.

Wouldn't a non-uniform distribution be more likely to have this method give you a wrong answer? If you assume that the letters are uniformly distributed, then the top 5 letters are most likely correct. But if, like in a language, letters appear in clusters, you're more likely to have weird outlier scenarios. Like, eg, if a tenth of all words started with "qwert" and "qwert" didn't appear anywhere else, those letters would be very frequent, but also be a good candidate for the best set. I know that's a contrived scenario, but I just mean to demonstrate that uniform distribution would help rather than hinder this approach.

Also, I really don't agree with this idea considering that the original problem is not especially unfeasible. An 83 fold speedup is cool, but not at the expense of correctness, especially since you can run the brute force solution for the same amount of time and get a similar result.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Dr. Stab posted:

Wouldn't a non-uniform distribution be more likely to have this method give you a wrong answer? If you assume that the letters are uniformly distributed, then the top 5 letters are most likely correct. But if, like in a language, letters appear in clusters, you're more likely to have weird outlier scenarios. Like, eg, if a tenth of all words started with "qwert" and "qwert" didn't appear anywhere else, those letters would be very frequent, but also be a good candidate for the best set. I know that's a contrived scenario, but I just mean to demonstrate that uniform distribution would help rather than hinder this approach.

Also, I really don't agree with this idea considering that the original problem is not especially unfeasible. An 83 fold speedup is cool, but not at the expense of correctness, especially since you can run the brute force solution for the same amount of time and get a similar result.

The question isn't whether they are uniformly distributed, it's whether they are independently distributed (hint: they're not). That means that the pair of letters that eliminates the fewest words is not necessarily the two letters that each individually eliminate the fewest and second fewest words.
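A toy demonstration of that point (hypothetical word list, letters restricted to a-d for clarity: 'c' and 'd' are individually more frequent than 'a' and 'b', but they always co-occur):

```python
from itertools import combinations

words = ["cdp", "cdq", "cdr", "as", "at", "bu", "bv"]
candidates = "abcd"  # restrict the demo to these letters

containing = {ch: {w for w in words if ch in w} for ch in candidates}
freq = {ch: len(ws) for ch, ws in containing.items()}

# greedy: take the two individually least frequent letters
greedy = sorted(candidates, key=freq.get)[:2]
# exact: minimize the size of the union of excluded words
best = min(combinations(candidates, 2),
           key=lambda pair: len(containing[pair[0]] | containing[pair[1]]))

print(greedy, len(containing[greedy[0]] | containing[greedy[1]]))  # ['a', 'b'] 4
print(best, len(containing[best[0]] | containing[best[1]]))        # ('c', 'd') 3
```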

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Eela6 posted:

A (python) implementation of my approach...

What are the 5 letters it yields with the provided word list? http://greenteapress.com/thinkpython2/code/words.txt

Eela6
May 25, 2007
Shredded Hen

Dr. Stab posted:

Wouldn't a non-uniform distribution be more likely to have this method give you a wrong answer? If you assume that the letters are uniformly distributed, then the top 5 letters are most likely correct. But if, like in a language, letters appear in clusters, you're more likely to have weird outlier scenarios. Like, eg, if a tenth of all words started with "qwert" and "qwert" didn't appear anywhere else, those letters would be very frequent, but also be a good candidate for the best set. I know that's a contrived scenario, but I just mean to demonstrate that uniform distribution would help rather than hinder this approach.

Also, I really don't agree with this idea considering that the original problem is not especially unfeasible. An 83 fold speedup is cool, but not at the expense of correctness, especially since you can run the brute force solution for the same amount of time and get a similar result.

This is a valid criticism. Thank you for explaining; I basically agree now. I don't think there's anything to lose by, say, eliminating the top 10 letters; the vowels are not going to show up in an answer to this question, full stop.

Hughmoris posted:

What are the 5 letters it yields with the provided word list?
http://greenteapress.com/thinkpython2/code/words.txt

Double Edit:
Actually, now I disagree with Dr. Stab again. The following is an alternative approach based off set intersection. Interestingly, for memory reasons, I had to cut down the search space somewhat here, because otherwise I was going over the edge of my 16GB of RAM, so I implemented the same pre-filter. The results were somewhat surprising!

Python code:

from collections import defaultdict, Counter
from itertools import combinations

class Words():
    def __init__(self, filename):
        self.filename = filename
    def __iter__(self):
        with open(self.filename) as file:
            for line in file:
                yield line.rstrip('\n')  # chop the trailing newline
        
def find_character_set_that_excludes_fewest_words(filename, k):
    words = Words(filename)

    def find_least_common_letters(words, k):
        words_letter_appears_in = Counter(char for word in words for char in set(word))
        least_common_letters = (sorted(words_letter_appears_in,
                                key=words_letter_appears_in.get)[:k])
        return least_common_letters

    def map_characters_to_words_that_contain_them(words):
        words_with_char = defaultdict(list)
        for word in words:
            for char in set(word):
                words_with_char[char].append(word)
        return {char: set(words) for char, words in words_with_char.items()}

    letters_under_consideration = find_least_common_letters(words, k)
    words_with_char = map_characters_to_words_that_contain_them(words)
    n_combos = words_with_char
    for n in range(2, 6):
        n_minus_1_combos = n_combos
        n_combos = {}
        for combination in combinations(letters_under_consideration, n):
            combo_str = ''.join(sorted(combination))
            base_str, new_addition = combo_str[:-1], combo_str[-1]
            union = n_minus_1_combos[base_str] | words_with_char[new_addition]
            n_combos[combo_str] = union
        n_minus_1_combos = n_combos

    n_combos = {key: len(item) for key, item in n_combos.items()}
    bottom_5 = sorted(n_combos, key = n_combos.get)[:5]
    return bottom_5

filename = ...  # fill in the path to your word list
if __name__ == '__main__':
    
    result_10 = find_character_set_that_excludes_fewest_words(filename, 10)
    result_12 = find_character_set_that_excludes_fewest_words(filename, 12)
    result_15 = find_character_set_that_excludes_fewest_words(filename, 15)
    assert result_10 == result_12 and result_10 == result_15
    print(result_15)

The bottom 5 combinations are the following (note: these are the # of EXCLUDED words):
code:
[('jqwxz', 17384),
 ('jqvxz', 17742),
 ('jkqxz', 17945),
 ('fjqxz', 20026),
 ('jqxyz', 21515)]
As we can see, these are all clustered around the least frequent characters in general. In fact, we get the same bottom 5 strings if we limit the characters searched to the bottom 15, the bottom 12, or even the bottom 10.

Even more strikingly, the correct result is the one obtained by a simple frequency approach! That is, the string 'jqwxz'. This could not be guaranteed, of course, but it reinforces my belief that a frequency-based filter was a way to go. There is of course a chance that if I had included all 26 letters that my result would be different; the chance is 0. :toxx:

Eela6 fucked around with this message at 22:10 on May 29, 2017

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.
There's nothing wrong with that, but how to efficiently approximate something and how to efficiently calculate an exact answer are fundamentally different problems in computer science.

a dangerous thot
Dec 19, 2016
.

a dangerous thot fucked around with this message at 07:37 on Jul 27, 2018

Eela6
May 25, 2007
Shredded Hen

KernelSlanders posted:

There's nothing wrong with that, but how to efficiently approximate something and how to efficiently calculate an exact answer are fundamentally different problems in computer science.


Sure. On the other hand, we can make some guarantees about the filter. Let's look at the actual counts of words-per-character.

Let f(x) be the # of words in which a particular character x occurs, and g(a, b, c, d, e) be the # of words in which at least one of those characters occurs. The 'worst-case' scenario is that the sets of words excluded by each letter are disjoint.

In this case,

g(a, b, c, d, e) = f(a) + f(b) + f(c) + f(d) + f(e)

Suppose we examine a character λ. No matter which other four letters accompany it,

g(a, b, c, d, λ) >= f(λ)

That is, any solution which includes a character λ where f(λ) > 17789 cannot be correct, since it is definitely worse than 'qjxzw'.


As such, the following letters cannot occur in a solution:
code:
h 19096
m 22474
p 22969
g 24979
u 28875
c 30466
d 30648
l 40133
o 44657
t 47279
n 49682
r 54901
a 56613
i 60314
s 64803
e 76168
This leaves us with a worst-case scenario of having to check 10 letters, which is fewer than my original guess of 12.
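The pruning rule itself is mechanical; a sketch with a stand-in word list (here 'e' alone excludes more words than the whole baseline combination, so it's dropped):

```python
from collections import Counter

words = ["jazz", "quiz", "whiz", "tree", "teen", "rent", "eyes"]  # stand-in
baseline = "jqwxz"  # the candidate solution from the frequency approach

f = Counter(ch for w in words for ch in set(w))
# words excluded by the baseline combination
bound = sum(1 for w in words if set(w) & set(baseline))

# any 5-set containing ch excludes at least f[ch] words,
# so f[ch] > bound means ch cannot beat the baseline
survivors = sorted(ch for ch in f if f[ch] <= bound)
print(bound, "e" in survivors)  # 3 False
```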

In conclusion, I'm not 'efficiently approximating', I'm getting the correct answer. I was right to begin with and I'm right now.

Eela6 fucked around with this message at 23:42 on May 29, 2017

Dr. Stab
Sep 12, 2010
👨🏻‍⚕️🩺🔪🙀😱🙀
Saying "just use the first 12" is an approximation. It happened to be right in this particular instance, but that's not the same. I could easily throw another set of words at it that would break it.

Now, if your algorithm actually performed that calculation, and chose which letters to exclude based on it, that would be another thing.

Also, your idea here gives me a better idea: as you iterate, whenever you find a candidate letter set, cull all letters with higher frequency from consideration. You'll probably save a bit more computation that way versus culling once ahead of time using a rough upper bound.

Nippashish
Nov 2, 2005

Let me see you dance!
If you want an algorithm that works for every possible word list then it's easy to construct one list where Eela6's filtering doesn't help (make all the letters have equal frequency). If you want an algorithm that works for this particular word list then it obviously works but this list isn't very long so it doesn't really matter. If you want an algorithm that works on any list of English words then it's probably a good choice because letter frequencies are very not-uniform.

In short, if you make different assumptions about the problem then you end up with different constraints on your solution.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Nippashish posted:

If you want an algorithm that works for every possible word list then it's easy to construct one list where Eela6's filtering doesn't help (make all the letters have equal frequency). If you want an algorithm that works for this particular word list then it obviously works but this list isn't very long so it doesn't really matter. If you want an algorithm that works on any list of English words then it's probably a good choice because letter frequencies are very not-uniform.

In short, if you make different assumptions about the problem then you end up with different constraints on your solution.

Exactly. But then again, "find the letters that ... in this list of words" and "devise an algorithm that finds the letters that ... in some list of words" are also fundamentally different problems.

Eela6
May 25, 2007
Shredded Hen

Dr. Stab posted:

Saying "just use the first 12" is an approximation. It happened to be right in this particular instance, but that's not the same. I could easily throw another set of words at it that would break it.

Now, if your algorithm actually performed that calcuation, and chose which letters to exclude based on that then that would be another thing.

Also, your idea here gives me a better idea. As you're iterating, whenever you find a candidate letter set, cull all letters with higher frequency from consideration. You'll probably save a bit more computation this way versus culling once ahead of time using a a rough upper bound.

Okay. Give me a set of ten thousand or more unique English words that breaks this.*

Edit: Thank you, KernelSlanders, NippaShish. I never claimed to have a solution for any possible arbitrary combinations of strings. I claimed (and, in fact, proved my claim) to have a solution to the problem presented to me.

*if we have less, there's clearly no reason to do any filtering at all. Or, better yet, cut out the "well, actually" and present your own superior solution.

Eela6 fucked around with this message at 01:36 on May 30, 2017

Dr. Stab
Sep 12, 2010
👨🏻‍⚕️🩺🔪🙀😱🙀

Eela6 posted:

Okay. Give me a set of ten thousand or more unique English words that breaks this.*

Edit: Thank you, KernelSlanders, NippaShish. I never claimed to have a solution for any possible arbitrary combinations of strings. I claimed (and, in fact, proved my claim) to have a solution to the problem presented to me.

*if we have less, there's clearly no reason to do any filtering at all. Or, better yet, cut out the "well, actually" and present your own superior solution.

I was responding specifically to the fact that you were saying that it wasn't an approximation. You made an assumption, and did some calculation to demonstrate that it was correct. I'm not trying to poo poo you because you made an optimizing assumption about the real world behaviour of the problem. I'm just saying that it's a different thing than a program that solves for any instance.

Eela6
May 25, 2007
Shredded Hen
And I'm saying that claiming you could easily build a list that beats it is different from actually doing so ;)

Edit: in fact, do so and you can ban me, I'll buy myself whatever red text you want, and I'll donate $100 to a charity of your choice on top of that. :toxx:

Eela6 fucked around with this message at 02:08 on May 30, 2017

Dr. Stab
Sep 12, 2010
👨🏻‍⚕️🩺🔪🙀😱🙀

Eela6 posted:

And I'm saying that claiming you could easily build a list that beats it is different from actually doing so ;)

You also added the caveat that the list needs to be very large and use english words. I'm not sure exactly what you're trying to prove but you're being a gigantic dick here. Are you saying that you don't actually believe that it's possible to synthesize an instance that doesn't work, or are you just trying to win some argument that you think you're having?

I'm still not sure why you're being so hostile in asking for this, but here's a rough idea of what I was thinking for my algorithm:

code:
for letter in alphabet:
	for word in words:
		if letter in word 
			letter.frequency += 1

sort alphabet by frequency

cap = alphabet.length; 
candidate_set = {}
candidate_frequency = words.length+1

//there's probably a less stupid way to do this
for(i = 0; i < cap-4; i ++):
	for(j = i+1; j < cap-3; j ++):
		for(k = j+1; k < cap-2; k ++):
			for(l = k+1; l < cap-1; l ++):
				for(m = l+1; m < cap; m ++):
					
					current_set = [alphabet[n] for n in [i,j,k,l,m]]
					current_frequency = 0

					for word in words:
						if word contains letter from current set:
							current_frequency += 1
					
					if current_frequency < candidate_frequency:
						candidate_set = current_set
						candidate_frequency = current_frequency
						for letter in alphabet:
							if letter.frequency > candidate_frequency:
								cap = index of letter
								break
					
return candidate_set
I'm not a very good programmer, and I don't know python, but I hope it's clear enough to be understood.

e: Also you really seem to care about the example instance I was thinking of, so something like this:

https://pastebin.com/j9y70An5

It doesn't meet your restrictions but that's the sort of thing I was thinking of.

Again, I wasn't saying that there's anything wrong with solving it as though it were a real world problem. I wasn't trying to poo poo on you in any way. It's just a different thing.

Dr. Stab fucked around with this message at 02:32 on May 30, 2017

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

Eela6 posted:

Okay. Give me a set of ten thousand or more unique English words that breaks this.*

So just to be totally clear, you're asking for a list of ten thousand or more unique English words, such that the minimum 5-union over the entire word list is identical to the minimum 5-union if you were to only look at the 12 least-frequent letters.

I won't have time to work on it for a bit, but it doesn't seem too hard to create a pathological case where your assumption is wrong.


Eela6
May 25, 2007
Shredded Hen

Jabor posted:

So just to be totally clear, you're asking for a lost of ten thousand or more unique English words, such that the minimum 5-union over the entire word list is [not] identical to the minimum 5-union if you were to only look at the 12 least-frequent letters.

I won't have time to work on it for a bit, but it doesn't seem too hard to create a pathological case where your assumption is wrong.

Sure! I am relatively confident you will not be able to build a solution; but if that's not the case, I'd love to see it.

I am completely confident Dr. Stab will not be able to create a pathological case.
