Soricidus
Oct 21, 2010
freedom-hating statist shill

Krankenstyle posted:

ideas:
- martin luther but blingee
- 95 theses but on a computer screen(!)
- lets get weird: danny devito in a 1500s getup and a bag of beers (ill chip in if he asks for cash)

win95 pcs nailed to a door

JawnV6
Jul 4, 2004

So hot ...
wow i had no idea c was so bad http://verisimilitudes.net/2019-11-12

parsing arguments like a Chump

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
only 30% slower if you remove process launch overheads and any sort of argument parsing (but keep them for the c version)

and maybe use a different definition of words that’s more convenient for the lisp implementation, didn’t really follow that part and lol at re-reading it more carefully

Notorious b.s.d.
Jan 25, 2003

by Reene

rjmccall posted:

only 30% slower if you remove process launch overheads and any sort of argument parsing (but keep them for the c version)

and maybe use a different definition of words that’s more convenient for the lisp implementation, didn’t really follow that part and lol at re-reading it more carefully

but the posix standards conspire against good languages!!!!

i mean it does not surprise me that a naive implementation of wc on a language platform with a good compiler isn't horrendously slow. was this supposed to be a surprising result?

Sweeper
Nov 29, 2007
The Joe Buck of Posting
Dinosaur Gum

rjmccall posted:

only 30% slower if you remove process launch overheads and any sort of argument parsing (but keep them for the c version)

and maybe use a different definition of words that’s more convenient for the lisp implementation, didn’t really follow that part and lol at re-reading it more carefully

the “origin” article written in haskell has to use multiple threads to beat it, but somehow that is a victory over the c which is basically a while loop (hand optimized ???)

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
fwiw, as far as i know, this is still the up-to-date, “hand-optimized” wc implementation on darwin

Notorious b.s.d.
Jan 25, 2003

by Reene

rjmccall posted:

fwiw, as far as i know, this is still the up-to-date, “hand-optimized” wc implementation on darwin

i figured he must be complaining about the gnu coreutils implementation, but no, it's not substantially worse

http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/wc.c

portable C is very ugly, but this is a reasonably good example of good C. the comments explain what is going on. it will build on many crappy unix systems. etc.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
neither of these even tries to mmap to avoid copying data, shameful levels of optimization

i had no idea you could set a file to not start reading at its beginning though, wtf unix
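
For reference, a minimal sketch of the mmap idea mentioned above, in Python rather than C. `count_lines_mmap` is a hypothetical helper, not code from either wc implementation: it maps the file read-only and scans the mapping for newline bytes instead of reading chunks into a buffer. Whether this is actually faster than plain reads depends on the system and is not measured here.

```python
import mmap
import os


def count_lines_mmap(path):
    # Hypothetical helper: counts newline bytes, which is what wc -l reports.
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count
```

Like wc -l, this counts newline characters, so a final line with no trailing newline is not counted.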

Kazinsal
Dec 13, 2011


this is the oldest wc I could find, from version 5 research unix: https://github.com/dspinellis/unix-history-repo/blob/Research-V5-Snapshot-Development/usr/source/s2/wc.c

it took some work to get it to compile and not segfault constantly because libc functions on v5 are different than in modern libc but the result is:

code:
kazinsal@cygnus:~$ time ./wc-v5 timecube.html
   3362   13559 timecube.html

real    0m0.011s
user    0m0.000s
sys     0m0.000s
kazinsal@cygnus:~$ time wc timecube.html
  3362  13559 196075 timecube.html

real    0m0.016s
user    0m0.000s
sys     0m0.000s

obviously not scientific testing but the gnu version is about 50% slower than the 50-odd line naive implementation by a couple of stoner nerds at bell labs 46 years ago

Notorious b.s.d.
Jan 25, 2003

by Reene

Kazinsal posted:

this is the oldest wc I could find, from version 5 research unix: https://github.com/dspinellis/unix-history-repo/blob/Research-V5-Snapshot-Development/usr/source/s2/wc.c

it took some work to get it to compile and not segfault constantly because libc functions on v5 are different than in modern libc but the result is:

code:
kazinsal@cygnus:~$ time ./wc-v5 timecube.html
   3362   13559 timecube.html

real    0m0.011s
user    0m0.000s
sys     0m0.000s
kazinsal@cygnus:~$ time wc timecube.html
  3362  13559 196075 timecube.html

real    0m0.016s
user    0m0.000s
sys     0m0.000s

obviously not scientific testing but the gnu version is about 50% slower than the 50-odd line naive implementation by a couple of stoner nerds at bell labs 46 years ago

the gnu version has a lot more flags, handles a lot more corner cases, and builds on every weird rear end lovely unix you can imagine

Notorious b.s.d.
Jan 25, 2003

by Reene
if you want to blow your mind compare gnu grep to v6 grep

Kazinsal
Dec 13, 2011


Notorious b.s.d. posted:

the gnu version has a lot more flags, handles a lot more corner cases, and builds on every weird rear end lovely unix you can imagine

"gnu's not unix", right down to the philosophy of "small and smart programs that do a basic task well"

Notorious b.s.d.
Jan 25, 2003

by Reene

Kazinsal posted:

"gnu's not unix", right down to the philosophy of "small and smart programs that do a basic task well"

by that standard, unix hasn't been unix, ever

(no unix program has ever done a basic task well)

Notorious b.s.d.
Jan 25, 2003

by Reene
like gnu coreutils are very composable -- they combine in useful ways -- but they are not small, and what little "smart" is baked in contributes to the size and complexity

Nomnom Cookie
Aug 30, 2009



dO oNe ThInG AnD dO It WeLl

Nomnom Cookie
Aug 30, 2009



the only reason I’m dismissive of Unix philosophy is that in two decades of paying attention to computers I’ve never encountered an argument for it that isn’t either essentially aesthetic or entirely a priori

yes that does mean I ignore almost everything relating to computers. saves a lot of time

Notorious b.s.d.
Jan 25, 2003

by Reene
the unix philosophy was developed after the fact, after they had already put together a pretty nice system for day to day use

composable cli tools are a good invention and the unix pipe, as crude as it is, works pretty well for a wide variety of tasks

that does not make it a coherent approach to all design problems

Nomnom Cookie
Aug 30, 2009



I agree that cat *.butt | wc -l is cool but also cat butt is not a silver bullet.

it’s been almost 50 years. where’s the history of Unix philosophy, its successes, its failures. why is programming so relentlessly forward looking

Notorious b.s.d.
Jan 25, 2003

by Reene

Nomnom Cookie posted:

I agree that cat *.butt | wc -l is cool but also cat butt is not a silver bullet.

it’s been almost 50 years. where’s the history of Unix philosophy, its successes, its failures. why is programming so relentlessly forward looking

itym wc -l *.butt, or in the event there are too many files, find . -iname '*.butt' -exec wc -l {} \;

JawnV6
Jul 4, 2004

So hot ...

rjmccall posted:

fwiw, as far as i know, this is still the up-to-date, “hand-optimized” wc implementation on darwin



verisimilitudes 7 hours ago

I do have some testing that showed me the Common Lisp was roughly the same speed as the C, when testing against a file of a few dozen megabytes. As I explain, I wasn't interested in optimizing it further, as I feel showing this achieved in just a few minutes to be valuable on its own. I also note that a C programmer may boast about being a small fraction of a second faster, ignoring everything else that goes into the program, which I find foolish.
The reason I didn't strictly adhere to the POSIX behavior was because I don't know where this is documented and don't feel like scanning through the C to find out. On all of the files I've tested, which include a wide array of punctuation and other such things, the results were identical, but I'm merely not making any promises. I'd prefer to not be accused of being one of those Lispers who only complete part of the program; if you look at the libraries I've written, which actually concern me, then you'll find they're well-documented and rather comprehensive for their purposes.

Notorious b.s.d.
Jan 25, 2003

by Reene

JawnV6 posted:

I'd prefer to not be accused of being one of those Lispers who only complete part of the program

omg

holy poo poo

don't accuse me of doing exactly the smug lisp weenie thing that everyone else does! it's different when i'm a smug lisp weenie!

Notorious b.s.d.
Jan 25, 2003

by Reene
dude's original premise isn't wrong. his common lisp version of wc is a lot easier to read and presumably easier to maintain/port

however once he adds in all the posix conformance that is present in gnu coreutils it is not going to be a whole hell of a lot simpler or better than the original coreutils wc. easier for him to write, sure, but not necessarily easier to maintain going forward

turns out it's really hard to comply with a bunch of hastily written standards from decades ago

pseudorandom name
May 6, 2007

Kazinsal posted:

this is the oldest wc I could find, from version 5 research unix: https://github.com/dspinellis/unix-history-repo/blob/Research-V5-Snapshot-Development/usr/source/s2/wc.c

it took some work to get it to compile and not segfault constantly because libc functions on v5 are different than in modern libc but the result is:

code:
kazinsal@cygnus:~$ time ./wc-v5 timecube.html
   3362   13559 timecube.html

real    0m0.011s
user    0m0.000s
sys     0m0.000s
kazinsal@cygnus:~$ time wc timecube.html
  3362  13559 196075 timecube.html

real    0m0.016s
user    0m0.000s
sys     0m0.000s

obviously not scientific testing but the gnu version is about 50% slower than the 50-odd line naive implementation by a couple of stoner nerds at bell labs 46 years ago

now make it support non-English languages

Notorious b.s.d.
Jan 25, 2003

by Reene

pseudorandom name posted:

now make it support non-English languages

lol

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Notorious b.s.d. posted:

that does not make it a coherent approach to all design problems

but rob pike says that

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

Notorious b.s.d. posted:

itym wc -l *.butt, or in the event there are too many files, find . -iname '*.butt' -exec wc -l {} \;

itym find . -iname '*.butt' -print0 | xargs -0 wc -l
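
The find and xargs variants above exist to cope with argument-list limits and odd filenames. A rough Python equivalent, as a sketch only: `count_lines_in_tree` and its `suffix` parameter are made-up names, and unlike the find commands it skips directories whose names happen to match.

```python
from pathlib import Path


def count_lines_in_tree(root, suffix=".butt"):
    # Hypothetical helper: walks root recursively, like find, and counts
    # newline bytes per matching file, like wc -l.
    counts = {}
    for path in sorted(Path(root).rglob("*")):
        # Case-insensitive suffix match, mirroring -iname '*.butt'.
        if path.is_file() and path.name.lower().endswith(suffix):
            with open(path, "rb") as f:
                # Read in fixed-size chunks so large files are not loaded whole.
                counts[str(path)] = sum(
                    chunk.count(b"\n") for chunk in iter(lambda: f.read(65536), b"")
                )
    return counts
```

Paths are handled as objects rather than as whitespace-separated text, so spaces or newlines in filenames need no -print0 style workaround.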

redleader
Aug 18, 2005

Engage according to operational parameters

pseudorandom name posted:

now make it support non-English languages

that's not in line with the unix philosophy

pseudorandom name
May 6, 2007

redleader posted:

that's not in line with the unix philosophy

on the one hand POSIX requires wc to respect all the usual locale environment variables, on the other hand the only locales POSIX mandates are "C" and "POSIX"; but it would be a poor implementation that ignored 3/4 of the world's population

but I'm not surprised that techbros are falling back on their white privilege to complain about wc's complexity and performance

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
does U+2028 LINE SEPARATOR count as a newline?
for that matter, does U+0085 NEXT LINE? that one is even in latin-1
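
A small illustration of why the question matters, using Python's own rules rather than anything POSIX specifies. The sample string is made up. Counting newline bytes, as a byte-oriented wc -l does, sees only U+000A, while `str.splitlines()` also breaks on U+0085 and U+2028.

```python
# Made-up sample: two ordinary newlines plus one U+2028 and one U+0085.
text = "one\ntwo\u2028three\u0085four\n"

# What a byte-oriented line counter sees: only 0x0A bytes.
newline_bytes = text.encode("utf-8").count(b"\n")

# Python's str.splitlines() treats U+2028 and U+0085 as line boundaries too.
unicode_lines = len(text.splitlines())

print(newline_bytes, unicode_lines)  # prints "2 4"

# U+0085 is a C1 control character, outside the 7-bit ASCII range.
print("\u0085".isascii())  # prints "False"
```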

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."

Nomnom Cookie posted:

where’s the history of Unix philosophy, its successes, its failures. why is programming so relentlessly forward looking

the current context is lisp, the language whose strongest exponents like to believe sprang fully-formed from the head of john mccarthy and has always had every feature ascribable to any programming language. a language which, for some, will always be a priori superior for all tasks irrespective of any measure of its fitness

animist
Aug 28, 2018

Internet Janitor posted:

the current context is lisp, the language whose strongest exponents like to believe sprang fully-formed from the head of john mccarthy and has always had every feature ascribable to any programming language. a language which, for some, will always be a priori superior for all tasks irrespective of any measure of its fitness

lol no static types

animist
Aug 28, 2018

Nomnom Cookie posted:

why is programming so relentlessly forward looking

it's a young field with a bunch of money being pumped into it so that a lot of computer touchers can invent trendy new ways to do the exact same things, but worse

give it time imo, either the field will mature or civilization will end, problem solved either way

Internet Janitor
May 17, 2008

"That isn't the appropriate trash receptacle."

animist posted:

lol no static types

surely a trivial exercise for the boundless capabilities of lisp's macro system

so trivial, in fact, that i shall leave it as an exercise for the reader

Kazinsal
Dec 13, 2011


pseudorandom name posted:

now make it support non-English languages

for the most part it'll handle word count of non-english languages using latin-ish alphabets just fine. it won't understand word counts on logographic scripts that don't use word separators, but then again, neither does coreutils wc

Athas
Aug 6, 2007

fuck that joker

JawnV6 posted:

wow i had no idea c was so bad http://verisimilitudes.net/2019-11-12

parsing arguments like a Chump

I've deigned to measure the performance of my program suitably for presentation here.

Who writes like that? The entire website reads a bit like a parody.

Also, everybody knows a good wc has to run on the GPU.

pseudorandom name
May 6, 2007

what about -m?

Kazinsal
Dec 13, 2011


it could do that if we threw in full utf8 handling and started considering what constitutes a character instead of counting words in our command line word counting program
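
To make the "what constitutes a character" problem concrete, here is a short sketch with a made-up sample string. It compares the byte count with the code point count, then shows the code point count changing under Unicode normalization even though the text looks identical on screen.

```python
import unicodedata

# Made-up sample, written with escapes so the precomposed forms are unambiguous.
s = "na\u00efve caf\u00e9"

byte_count = len(s.encode("utf-8"))  # bytes, as wc -c would count: 12
code_points = len(s)                 # code points in the precomposed form: 10

# NFD splits each accented letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", s)
decomposed_points = len(decomposed)  # 12 code points for the same visible text

print(byte_count, code_points, decomposed_points)  # prints "12 10 12"
```

Neither number is a count of what a reader would call characters; that needs grapheme cluster segmentation, which the standard library does not provide.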

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
the concept of a character or even a word is unsurprisingly undefined or not a concept in a lot of languages

Soricidus
Oct 21, 2010
freedom-hating statist shill
there isn’t even a single universally accepted way to count words in english. the idea that you can solve the problem in the general case with a trivial program is laughable

cinci zoo sniper
Mar 15, 2013




Soricidus posted:

there isn’t even a single universally accepted way to count words in english. the idea that you can solve the problem in the general case with a trivial program is laughable

duh obviously you trim your string from both ends and then the number of words is number of spaces + 1
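
Taking the joke literally for a moment, here is a sketch of "trim, then spaces + 1" next to splitting on runs of whitespace, which is closer to what wc -w does. Both function names are made up for the comparison.

```python
def words_by_spaces(s):
    # The rule from the post: trim both ends, then count spaces and add one.
    return s.strip().count(" ") + 1


def words_by_whitespace_runs(s):
    # Split on runs of any whitespace; empty input yields zero words.
    return len(s.split())


sample = "the  quick brown fox"  # note the double space

print(words_by_spaces(sample))           # prints "5"
print(words_by_whitespace_runs(sample))  # prints "4"
print(words_by_spaces(""))               # prints "1"
print(words_by_whitespace_runs(""))      # prints "0"
```

Even the second version only answers "how many whitespace-delimited tokens", which leaves hyphenated words, contractions, and scripts without word separators undecided.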
