p-lang thread: (now (have you (problems two)))

sarehu: Apr 20, 2007; (call/cc call/cc)

Use .NET regexes.

# ? Sep 20, 2017 01:51

JewKiller 3000: Nov 28, 2006; by Lowtax

akadajet posted:

cool

no. not cool. go is bad

not as bad as javascript though

# ? Sep 20, 2017 02:49

Carthag Tuek: Oct 15, 2005; Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang

gonadic io posted:

don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')

like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.

very much this ^

email addresses are NP hard lol

imo just send off a url for the user to click before accepting the email as valid
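a rough python sketch of that flow, for illustration only: do the loosest possible sanity check, then let the confirmation link decide. the function names and the `pending` dict are made up for this example, and actually mailing the link is left to the caller.

```python
import secrets

def plausible_email(address: str) -> bool:
    """Loosest possible check: something, then an '@', then something."""
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and bool(domain)

def start_verification(address: str, pending: dict) -> str:
    """Store a random token for the address; the caller mails out a URL containing it."""
    if not plausible_email(address):
        raise ValueError("not even close to an email address")
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token

def confirm(token: str, pending: dict):
    """Return the now-verified address, or None if the token is unknown or already used."""
    return pending.pop(token, None)
```

note there is deliberately no check for a "." after the "@": the only real test is whether the user clicks the link.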

# ? Sep 20, 2017 04:42

Carthag Tuek: Oct 15, 2005; Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang

also dont have too many fields cause youre bound to create a situation where theyre impossible to fill out correctly for some subset of users

# ? Sep 20, 2017 04:46

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

Powaqoatse posted:

email addresses are NP hard lol

really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed

# ? Sep 20, 2017 04:53

Gazpacho: Jun 18, 2004; by Fluffdaddy; Slippery Tilde

CommunistPancake posted:

he is the go guy

wish he would !! lmao

# ? Sep 20, 2017 04:58

Vanadium: Jan 8, 2005

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

# ? Sep 20, 2017 05:31

Carthag Tuek: Oct 15, 2005; Tider skal komme,
tider skal henrulle,
slægt skal følge slægters gang

pokeyman posted:

really? I'm willing to believe you but my reduction muscles are highly atrophied and my searching has failed

nah it was a joke but email addresses can get surprisingly weird so trying to validate too much on them is a fools errand

# ? Sep 20, 2017 05:35

The MUMPSorceress: Jan 6, 2012; ^SHTPSTS; Gary’s Answer

Vanadium posted:

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

epics in house custom code searcher is a million times better than anything I've found for github or java in general and that's p lol to me

# ? Sep 20, 2017 05:39

AWWNAW: Dec 30, 2008

sarehu posted:

Use .NET regexes.

I like how they added timeout arguments to their match methods haha

# ? Sep 20, 2017 05:45

Cybernetic Vermin: Apr 18, 2005

tef posted:

us:

there is posix syntax, pcre syntax, and countless variants

pcre won, posix lost

you:

so i read this webpage about why they used re2 and missed the point, it was about running arbitrary code from users

it was slower, sure, but, like it was faster on pathological cases, a bit like when someone writes a really bad for loop

you guys are being very adamant about this with a pretty confusing set of arguments. who has mentioned (posix) syntax at all? everything is indeed perl-derived syntax and perl-like semantics, which is fine, not least since posix semantics is actually quite inconsistently interpreted and implemented (the standard is unfortunately a bit unclear at one point, and some have taken it as a chance to do a perl/posix mix)

exponential cases are indeed not *that* common, but, on the one hand: what is the argument for marching into a situation where the potential for them exists? (it is *not* easy to tell ahead of time if the potential for exponential matching time is there); and, on the other hand: quadratic/cubic matching time issues are fairly common, and are fairly easily triggered by accident in practice (e.g. take a subexpression of the form (E|F) where E and F have a string in common, repeated in some way: any failing match containing instances of such a string will be retried for every occurrence that can be matched against that subgroup)
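to make the (E|F) case concrete, here is a toy model in python. it is a sketch of how a naive backtracker would explore `(a|aa)*b` against a string of n a's (which can never match, since there is no b). it does not call any real regex engine, and `backtrack_steps` is a made-up name.

```python
def backtrack_steps(n: int) -> int:
    """Count the positions a naive backtracker visits when trying
    (a|aa)*b against 'a' * n."""
    steps = 0

    def attempt(pos: int) -> None:
        nonlocal steps
        steps += 1                 # arrive at pos, look for 'b', fail
        if pos + 1 <= n:
            attempt(pos + 1)       # take the 'a' branch
        if pos + 2 <= n:
            attempt(pos + 2)       # backtrack, then take the 'aa' branch

    attempt(0)
    return steps

for n in (10, 20, 25):
    print(n, backtrack_steps(n))
```

the count follows the fibonacci numbers (232 steps for n=10, 28656 for n=20), so every extra character multiplies the work by roughly 1.6.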

in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it

AWWNAW posted:

I like how they added timeout arguments to their match methods haha

somewhat disingenuously i have skipped mentioning that pcre does this too by default, it takes 10 million "steps" then just errors out if it hasn't completed matching (can be overridden with PCRE_EXTRA_MATCH_LIMIT). which seems safer, but is hardly a complete solution to the problem

# ? Sep 20, 2017 08:39

Athas: Aug 6, 2007; fuck that joker

Cybernetic Vermin posted:

in a strange use of money (not mine) obsessing about these issues is my day job, my current research grant is in practical parser use, which has ended up being used to do a bit of diving in practical implementations (to e.g. discuss such performance aspects and semantics differences), a bit of data mining and static analysis of expressions, and then algorithm construction in support of such efforts. one of the annoying happy discoveries early on in the research direction was how cox had beaten us to a lot of our ideas in re2, so i am indeed a bit enamored with it

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

# ? Sep 20, 2017 08:51

Cybernetic Vermin: Apr 18, 2005

Athas posted:

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

Oh, met one of the Kleenex guys at a conference this summer actually! I must admit that I don't know the software very well beyond that presentation and a bit of chat over lunch, but I came away with a very positive impression of the effort. The theory appears sound (in that they have picked up something established and carefully shaped it into doing what they need) and what it actually does seems useful and interesting. Going by their numbers it indeed appears extremely fast, though with a heavy-weight compilation approach.

The ambitious compilation approach they take ended up the main chat subject, it does not appear to scale great; they construct a deterministic transducer which is then compiled into C to be compiled down further, but then a sufficiently nasty Kleenex program causes the transducer and then the C code to explode in a way which may make things impractical (e.g. an example given had a small program produce 130 kloc of C). It looks very well thought out though, so I wouldn't be surprised if they find ways to keep things under some control (perhaps not realize sufficiently dense parts of the automaton in C, rather falling back on a lookup table). One of those projects which likely lives and dies with the team getting research funding though, so fingers crossed they land more money.

# ? Sep 20, 2017 09:09

gonadic io: Feb 16, 2011; >>=

Vanadium posted:

I don't get this "don't run regexes provided by the user" talk, sometimes users want to search something in text too. Is this why github search is basically useless.

counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu.

# ? Sep 20, 2017 09:10

redleader: Aug 18, 2005; Engage according to operational parameters

gonadic io posted:

don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')

like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.

Ralith posted:

try to use it, and return an error if it doesn't work

yeah, i guess this makes sense. i couldn't really come up with any counterexamples that would really need a regex. i blame my brain damage on our lovely codebase

# ? Sep 20, 2017 11:36

gonadic io: Feb 16, 2011; >>=

It's fairly common to do bullshit like that, partly because it's not so bad if 99% of your customers are in an English first language country and have ascii names. For example at work we take the user's name and split it on the first space char to make first and last name. We also force the names to have first letter upper case and all others lowercase which especially pisses me off because that means that my name is Mcgonadic instead of the proper McGonadic
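A Python sketch of that kind of logic, to show where it goes wrong. This is a hypothetical reconstruction for illustration, not the actual work code, and `naive_name_split` is a made-up name.

```python
def naive_name_split(full_name: str) -> tuple:
    """Split on the first space, then force 'Xxxxx' casing on both halves."""
    first, _, last = full_name.strip().partition(" ")
    # str.capitalize() upper-cases the first letter and lower-cases the rest,
    # which is exactly what mangles names with inner capitals.
    return first.capitalize(), last.capitalize()

print(naive_name_split("gonadic McGonadic"))      # ('Gonadic', 'Mcgonadic')
print(naive_name_split("Mary Ann van der Berg"))  # ('Mary', 'Ann van der berg')
print(naive_name_split("Cher"))                   # ('Cher', '')
```

Anything with an inner capital, a multi-word first name, or no surname at all comes out wrong.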

# ? Sep 20, 2017 12:00

gonadic io: Feb 16, 2011; >>=

See also: timezone assumptions, address assumptions (including postcode ones)

We had fun with a customer without a street name - their small village's houses were just numbered all together. In the end we just got them to put the village name into the street field too

# ? Sep 20, 2017 12:02

leper khan: Dec 28, 2010; Honest to god thinks Half Life 2 is a bad game. But at least he likes Monster Hunter.

pretty sure email doesn't need a . after the @

# ? Sep 20, 2017 16:23

Workaday Wizard: Oct 23, 2009; by Pragmatica

Athas posted:

Do you have a seriouspinion on Kleenex? As I understand, it's an attempt to move beyond classic regexes, but without going all the way to full PCRE, and with very good performance.

is this made by the tissue company or is someone trolling for a lawsuit

# ? Sep 20, 2017 16:53

JawnV6: Jul 4, 2004; So hot ...

Cybernetic Vermin posted:

yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule use case is

see, i'd believe you if you hadn't invested a lot of time maligning me with the mongo ref and constructing elaborate fantasies where regexes are the wrong thing no matter what implementation is in use, but sure, you're totally above it lol

anyway an old friend showed up and i wanted to introduce y'all again: "A Comprehensive Study of Real-World Numerical Bug Characteristics"

# ? Sep 20, 2017 17:14

akadajet: Sep 14, 2003

JewKiller 3000 posted:

no. not cool. go is bad

not as bad as javascript though

go away "jewkiller"

# ? Sep 20, 2017 17:55

Xarn: Jun 26, 2015

gonadic io posted:

counterpoint: if you want to search for .*.*.*.*.* then do that on your own cpu.

Counter-Counterpoint: you can do it in very, very, very minimal space and O(N) where N is the size of corpus.
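A minimal Python sketch of how that works, assuming a tiny pattern subset (literal characters, `.`, and `*` after a single item) and whole-string matching. It tracks the set of live pattern positions instead of backtracking, in the style of a Thompson NFA simulation. The function names are made up for the example.

```python
def parse(pattern: str):
    """Turn a pattern into a list of (char, starred) items."""
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items

def closure(items, states):
    """Add every position reachable by skipping starred items."""
    result = set()
    for s in states:
        while True:
            result.add(s)
            if s < len(items) and items[s][1]:
                s += 1          # a starred item may match zero times
            else:
                break
    return result

def full_match(pattern: str, text: str) -> bool:
    items = parse(pattern)
    states = closure(items, {0})
    for ch in text:
        nxt = set()
        for s in states:
            if s == len(items):
                continue        # already past the end of the pattern
            c, starred = items[s]
            if c == "." or c == ch:
                nxt.add(s if starred else s + 1)
        states = closure(items, nxt)
        if not states:
            return False
    return len(items) in states
```

There are never more live states than pattern items plus one, so each input character costs work bounded by the pattern length: linear in the corpus for a fixed pattern, however many `.*` are stacked up.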

# ? Sep 20, 2017 18:05

hackbunny: Jul 22, 2007; I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Cybernetic Vermin posted:

who has mentioned (posix) syntax at all?

maybe don't barge in on a discussion you haven't been following

# ? Sep 20, 2017 18:33

Cybernetic Vermin: Apr 18, 2005

hackbunny posted:

maybe don't barge in on a discussion you haven't been following

uh, i apologize, i must have missed it, legit point me to this part of the discussion

# ? Sep 20, 2017 18:40

NihilCredo: Jun 6, 2011; iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Cybernetic Vermin posted:

uh, i apologize, i must have missed it, legit point me to this part of the discussion

Just backtrack until you find it.

# ? Sep 20, 2017 18:42

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

NihilCredo posted:

Just backtrack until you find it.

# ? Sep 20, 2017 19:25

JawnV6: Jul 4, 2004; So hot ...

NihilCredo posted:

Just backtrack until you find it.

# ? Sep 20, 2017 19:35

Workaday Wizard: Oct 23, 2009; by Pragmatica

NihilCredo posted:

Just backtrack until you find it.

# ? Sep 20, 2017 19:52

Shaggar: Apr 26, 2006

NihilCredo posted:

Just backtrack until you find it.

heh

# ? Sep 20, 2017 19:53

Athas: Aug 6, 2007; fuck that joker

Shinku ABOOKEN posted:

is this made by the tissue company or is someone trolling for a lawsuit

What would the lawsuit be about? Nobody is making any money here.

# ? Sep 20, 2017 20:21

CPColin: Sep 9, 2003; Big ol' smile.

You don't have to make money to be sued for infringing on a trademark.

# ? Sep 20, 2017 20:28

ulmont: Sep 15, 2010; IF I EVER MISS VOTING IN AN ELECTION (EVEN AMERICAN IDOL) ,OR HAVE UNPAID PARKING TICKETS, PLEASE TAKE AWAY MY FRANCHISE

CPColin posted:

You don't have to make money to be sued for infringing on a trademark.

You do, however, have to be causing a likelihood of consumer confusion or at least a likelihood of diluting or tarnishing a trademark.

# ? Sep 20, 2017 21:09

tef: May 30, 2004; -> some l-system crap ->

NihilCredo posted:

Just backtrack until you find it.

# ? Sep 20, 2017 21:18

redleader: Aug 18, 2005; Engage according to operational parameters

leper khan posted:

pretty sure email doesn't need a . after the @

yeah, but gently caress the people who try that

since google bought the .google tld, some employees are apparently starting to use name@google addresses. they deserve everything they get

# ? Sep 20, 2017 21:52

FamDav: Mar 29, 2008

redleader posted:

yeah, but gently caress the people who try that

since google bought the .google tld, some employees are apparently starting to use name@google addresses. they deserve everything they get

less than they wanted?

# ? Sep 20, 2017 22:16

hackbunny: Jul 22, 2007; I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av

Cybernetic Vermin posted:

uh, i apologize, i must have missed it, legit point me to this part of the discussion

the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true: sed has one, grep another, javascript, icu etc. someone proposed, in the usual hyperbolic yospos style, pcre as "the" standard. the word "pcre" triggered you and the rest is history

# ? Sep 20, 2017 22:24

tef: May 30, 2004; -> some l-system crap ->

hackbunny posted:

the discussion was iirc about the many frustratingly similar but incompatible re syntaxes, and it's 100% true

and most of them are 'predates pcre' or 'like pcre for most of it'

# ? Sep 21, 2017 00:09

hackbunny: Jul 22, 2007; I haven't been on SA for years but the person who gave me my previous av as a joke felt guilty for doing so and decided to get me a non-shitty av