|
just have the unicode consortium provide a lookup table of every word in every language and then pattern match on that easy peasy
|
# ? Nov 18, 2019 13:05 |
|
Kazinsal posted:it won't understand word counts on logographic alphabets that don't use word separators, but then again, neither do humans
|
# ? Nov 18, 2019 13:11 |
|
any level of committal to the "unix philosophy" would have had one of the first programs be one that takes a dfa, some simple declarative actions one can take on state entry, and runs it on its input. forming the actual implementation of stuff like wc, cut and even grep. the real unix philosophy is writing a new incompatible dumb program for every dumb thing though.
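
a minimal python sketch of the idea above, not anything that exists in unix: a tiny runner that walks a dfa over its input and bumps a counter every time it lands in a state with an entry action. the state names, `step`, `ENTRY_ACTIONS` and `run` are all made up for illustration, and "word" here just means a run of non-whitespace.

```python
# States of a hypothetical wc automaton (names invented for this sketch).
SPACE, NEWLINE, WORD_START, IN_WORD = range(4)


def step(state, ch):
    """Transition function: next state given the current state and one character."""
    if ch == "\n":
        return NEWLINE
    if ch.isspace():
        return SPACE
    # A non-space character either continues a word or starts a new one.
    return IN_WORD if state in (WORD_START, IN_WORD) else WORD_START


# Declarative entry actions: landing in a state increments the named counter.
ENTRY_ACTIONS = {NEWLINE: "lines", WORD_START: "words"}


def run(text, step=step, actions=ENTRY_ACTIONS, start=SPACE):
    """Run the automaton over text and return the counters."""
    counts = {"lines": 0, "words": 0, "chars": 0}
    state = start
    for ch in text:
        state = step(state, ch)
        counts["chars"] += 1
        name = actions.get(state)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    print(run("hello world\nfoo\n"))
```

swapping in a different `step` and `ENTRY_ACTIONS` is where something like cut or a fixed-pattern grep would plug in, at least in principle.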
|
# ? Nov 18, 2019 13:23 |
|
animist posted:it's a young field with a bunch of money being pumped into it so that a lot of computer touchers can invent trendy new ways to do the exact same things, but worse
I don’t see how it’s going to mature if nobody ever evaluates their design in hindsight. sure, it happens sometimes, or privately, but in general knowledge formation about what works and what doesn’t is more evolutionary than intentional
|
# ? Nov 18, 2019 13:27 |
|
Cybernetic Vermin posted:any level of committal to the "unix philosophy" would have had one of the first programs be one that takes a dfa, some simple declarative actions one can take on state entry, and runs it on its input. forming the actual implementation of stuff like wc, cut and even grep.
is this the thread that I bitched about generic declarative frameworks last night, or was that a different one. anyway thanks for being an example of what happens when a programmer marinates in yak shaving for so long that they start thinking yak shaving is the point
|
# ? Nov 18, 2019 13:30 |
|
Athas posted:I've deigned to measure the performance of my program suitably for presentation here.
probably, but for those who missed it: https://futhark-lang.org/blog/2019-10-25-beating-c-with-futhark-on-gpu.html
|
# ? Nov 18, 2019 13:47 |
|
Internet Janitor posted:the current context is lisp, the language whose strongest exponents like to believe sprang fully-formed from the head of john mccarthy and has always had every feature ascribable to any programming language. a language which, for some, will always be a priori superior for all tasks irrespective of any measure of its fitness
like many tasks, it's perfectly adequate to the task of writing a decent 'wc'
there's just no particular reason to favor it for this task
|
# ? Nov 18, 2019 17:30 |
|
animist posted:lol no static types
common lisp has extensive static typechecking, just nobody ever uses it, aside from like, performance optimization on tight loops.
opt-in type systems are not terribly useful
|
# ? Nov 18, 2019 17:31 |
|
TheFluff posted:does U+2028 LINE SEPARATOR count as a newline?
posix defines these things for itself, so, no, neither one counts

Kazinsal posted:for the most part it'll handle word count of non-english languages using latin-ish alphabets just fine. it won't understand word counts on logographic alphabets that don't use word separators, but then again, neither does coreutils wc
coreutils wc has full unicode support and supports logographic alphabets, to the extent that it understands multibyte characters

Kazinsal posted:it could do that if we threw in full utf8 handling and started considering what constitutes a character instead of counting words in our command line word counting program
gnu coreutils wc does exactly this
bsd and macos wc don't, but, well, those are bad unix systems anyway

Soricidus posted:there isn’t even a single universally accepted way to count words in english. the idea that you can solve the problem in the general case with a trivial program is laughable
no one pretended to solve it in the general case. word counting is an approximation at best. the main use of wc is to get a line count, so even the name of the utility is dumb
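
to make the bytes vs characters vs words distinction concrete, here's a rough python sketch. it is not how gnu wc is implemented; `wc_counts` is a made-up helper that decodes utf-8, counts only `\n` as a newline, and treats a word as a run of non-whitespace.

```python
def wc_counts(data: bytes, encoding: str = "utf-8") -> dict:
    """Count lines, words, characters and bytes in a byte string.

    Assumptions of this sketch: only b"\\n" ends a line, undecodable
    bytes are replaced rather than rejected, and words are whatever
    str.split() yields (runs separated by Unicode whitespace).
    """
    text = data.decode(encoding, errors="replace")
    return {
        "lines": data.count(b"\n"),
        "words": len(text.split()),
        "chars": len(text),
        "bytes": len(data),
    }


if __name__ == "__main__":
    print(wc_counts("naïve café\n".encode("utf-8")))
    print(wc_counts("你好世界\n".encode("utf-8")))
```

the second input comes out as one "word": understanding multibyte characters gets the character count right, but with no separators in the text there's nothing for a whitespace-based counter to split on.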
|
# ? Nov 18, 2019 17:35 |
|
it seems like the purpose of wc is kinda lovely
|
# ? Nov 18, 2019 18:04 |
|
I tend to use the WC for processing my core dumps
|
# ? Nov 18, 2019 18:07 |
|
Zlodo posted:it seems like the purpose of wc is kinda lovely
yes
wc is a computer program
|
# ? Nov 18, 2019 18:08 |
|
CPColin posted:I tend to use the WC for processing my core dumps
|
# ? Nov 18, 2019 18:09 |
|
Nomnom Cookie posted:is this the thread that I bitched about generic declarative frameworks last night, or was that a different one. anyway thanks for being an example of what happens when a programmer marinates in yak shaving for so long that they start thinking yak shaving is the point
not this thread and not going to bother going through your post history to figure out what you're passive-aggressively accusing me of
|
# ? Nov 18, 2019 18:15 |
|
Cybernetic Vermin posted:not this thread and not going to bother going through your post history to figure out what you're passive-aggressively accusing me of
being an example of what happens when a programmer marinates in yak shaving for so long that they start thinking yak shaving is the point
|
# ? Nov 18, 2019 18:16 |
|
Nomnom Cookie posted:being an example of what happens when a programmer marinates in yak shaving for so long that they start thinking yak shaving is the point
i have never yak-shaved in my life though
|
# ? Nov 18, 2019 18:19 |
|
Cybernetic Vermin posted:i have never yak-shaved in my life though
lol
|
# ? Nov 18, 2019 18:27 |
|
Internet Janitor posted:just have the unicode consortium provide a lookup table of every word in every language and then pattern match on that
that's basically what icu actually does for word breaking. it has rules, lookup tables for exceptions to those rules, and then a chinese dictionary.
|
# ? Nov 18, 2019 18:39 |
|
Plorkyeran posted:that's basically what icu actually does for word breaking. it has rules, lookup tables for exceptions to those rules, and then a chinese dictionary.
automatic tools i don't think are expected to go all that way though, unicode defines "simple boundaries" for e.g. regexes: http://unicode.org/reports/tr18/#Simple_Word_Boundaries
iirc that's what java regexes use
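
for a feel of what a "simple boundary" buys you, here's a small sketch using python's `re` rather than java. the assumption is that `\w` on a `str` pattern is close enough to the simple word/non-word split to illustrate the point; it isn't claiming conformance with tr18, and `simple_words` is a made-up name.

```python
import re

# \w on str patterns is Unicode-aware by default in Python 3, so a "word"
# here is just a maximal run of letters, digits and underscores.
WORD = re.compile(r"\w+")


def simple_words(text):
    """Split text into runs of word characters, with no dictionary lookups."""
    return WORD.findall(text)


if __name__ == "__main__":
    print(simple_words("it's a naïve café"))
    print(simple_words("你好世界"))
```

the apostrophe splits "it's" in two and the chinese string comes back as a single run, which is the gap the rules-plus-dictionary approach is there to cover.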
|
# ? Nov 18, 2019 19:08 |
|
Nomnom Cookie posted:I agree that cat *.butt | wc -l is cool but also cat butt is not a silver bullet.
https://www.aosabook.org/en/index.html is the only thing i have seen that is reflective
also has some great reads in it, i recommend it unironically
|
# ? Nov 18, 2019 19:21 |
|
Bloody posted:https://www.aosabook.org/en/index.html is the only thing i have seen that is reflective
tyvm that's exactly what I was looking for
|
# ? Nov 18, 2019 19:28 |
|
Notorious b.s.d. posted:bad unix systems
but you repeat yourself
|
# ? Nov 19, 2019 01:04 |
|
Bloody posted:https://www.aosabook.org/en/index.html is the only thing i have seen that is reflective
is there anything like this for like SOAs/microservices? postmortems of why a system was designed in some way, what worked and what didn't, etc?
|
# ? Nov 19, 2019 01:09 |
|
redleader posted:is there anything like this for like SOAs/microservices? postmortems of why a system was designed in some way, what worked and what didn't, etc?
it's your company's coe aggregator op
|
# ? Nov 19, 2019 02:50 |
|
there's academic papers which can be informative, esp. if they're diagnostics of real systems like this is a neat article although it's partly sales copy
|
# ? Nov 19, 2019 05:38 |
|
redleader posted:but you repeat yourself
shaggar alt spotted
|
# ? Nov 19, 2019 17:39 |
|
c++ https://travisdowns.github.io/blog/2019/11/19/toupper.html
|
# ? Nov 20, 2019 22:32 |
|
macros
|
# ? Nov 21, 2019 05:12 |
|
cpp is so bad. lol at still using #include and #define well into the 21st century
|
# ? Nov 21, 2019 11:29 |
|
Soricidus posted:cpp is so bad. lol at still using #include and #define well into the 21st century
modules are finally a thing in c++20, but it's gonna be a while before the tooling catches up and makes them usable
|
# ? Nov 21, 2019 15:42 |
|
I thought there was a hullabaloo about those being terrible
|
# ? Nov 21, 2019 17:04 |
|
they’re differently terrible than #include and #define and such
|
# ? Nov 21, 2019 19:03 |
|
people sure are Mad about objc_direct, huh.
|
# ? Nov 21, 2019 20:56 |
|
Plorkyeran posted:people sure are Mad about objc_direct, huh.
well yea it means cracking apps by swizzling -isRegistered to always return YES is harder
|
# ? Nov 22, 2019 00:34 |
|
eschaton posted:they’re differently terrible than #include and #define and such
|
# ? Nov 22, 2019 12:13 |
|
this was a fun read in an eldritch horror kind of way https://www.jsoftware.com/papers/50/

quote:
Ken was showing some slides — and one of his slides had something on it that I was later to learn was an APL one-liner. And he tossed this off as an example of the expressiveness of the APL notation. I believe the one-liner was one of the standard ones for indicating the nesting level of the parentheses in an algebraic expression. But the one-liner was very short — ten characters, something like that — and having been involved with programming things like that for a long time and realizing that it took a reasonable amount of code to do, I looked at it and said, “My God, there must be something in this language.” Bauer, on my left, didn’t see that. What he saw or heard was Ken’s remark that APL is an extremely appropriate language for teaching algebra, and he muttered under his breath to me, in words I will never forget, “As long as I am alive, APL will never be used in Munich.” And Dijkstra, who was sitting on my other side, leaned toward Bauer and said, “Nor in Holland.” The three of us were listening to the same lecture, but we obviously heard different things.

i'm convinced apl works much like magic in your average fantasy novel - some person of old has discovered a combination of 7 arcane runes with a certain effect and you can incorporate that in your own
|
# ? Nov 22, 2019 23:19 |
|
suffix posted:this was a fun read in an eldritch horror kind of way
it's just numpy but every library function is a randomly-selected Unicode symbol
|
# ? Nov 22, 2019 23:45 |
|
suffix posted:this was a fun read in an eldritch horror kind of way
They call these combinations "idioms", make huge lists of them, and even make the APL interpreter detect them and dispatch to specialised implementations.
|
# ? Nov 22, 2019 23:48 |
|
Athas posted:They call these combinations "idioms", make huge lists of them, and even make the APL interpreter detect them and dispatch to specialised implementations.
do they have to be written with finnish characters
|
# ? Nov 22, 2019 23:51 |
|
the idiom in question is probably something like +\-⌿'()'∘.= In K it would be code:
code:
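
for anyone who doesn't read runes, a rough python rendering of the APL idiom above (this is not the K version): compare each of '(' and ')' against every character, subtract the close row from the open row, then take a running sum. `nesting_depth` is a made-up name and this is only a sketch of what the idiom appears to compute.

```python
from itertools import accumulate


def nesting_depth(expr):
    """Return the parenthesis nesting level after each character of expr."""
    # Outer-product equality: one 0/1 row for '(' and one for ')'.
    opens, closes = ([int(p == ch) for ch in expr] for p in "()")
    # Reduce with minus down the rows: +1 at '(', -1 at ')', 0 elsewhere.
    delta = [o - c for o, c in zip(opens, closes)]
    # Plus-scan: running total of the deltas.
    return list(accumulate(delta))


if __name__ == "__main__":
    print(nesting_depth("(a(b)c)"))
```

three steps, three runes' worth of work each, which is roughly the appeal.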
|
# ? Nov 23, 2019 00:02 |