Cybernetic Vermin
Apr 18, 2005

tef posted:

rob pike: "Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)"

well, n here is on the level of 50 for pcre to in practice never halt, where re2, a simpler and better planned library, would run in linear time (well, in |E|*|w| for an expression E and string w). i also note that we are talking about libraries here, and i think the pike quote is best applied as "since importing pcre takes one more keypress than re2 it is best to stick with re2 until pcre proves necessary"

JawnV6 posted:

also yeah ive never had to put a regex in some perf critical CDN control plane but for mashing text from a data sheet into friendly #def's with vim/sed they're needs-suiting as heck and ive never had to care if it was exponentially matching MODULE_ or not

it is not that i particularly care whatever people are doing in their editors, but suggesting pcre as the first choice among regex libraries without qualifications is pretty bad: it has few other benefits, and if nothing else it opens software up to denial of service attacks if applied to user input. even without expectations of malicious inputs it is pretty silly to spend the cycles if you are doing anything new and can go re2
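
for anyone who has not seen the blowup, here is a minimal sketch of it. it uses python's built-in `re` module as a stand-in backtracking engine (an assumption, pcre itself is not involved here), and the pattern and the helper name `time_failed_match` are made up for illustration. timings vary by machine and version, so treat whatever it prints as rough:

```python
import re
import time

# Nested quantifiers: the inner and outer + can split a run of a's in
# exponentially many ways, and a backtracking engine tries them all
# before giving up, because the trailing "b" prevents any match.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match 'a' * n + 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" means it can never match
    return elapsed


if __name__ == "__main__":
    for n in (10, 14, 18, 22):
        print(n, f"{time_failed_match(n):.4f}s")
```

each extra `a` should roughly double the time, which is the exponential behaviour in question. an automaton-based engine like re2 sidesteps it by not backtracking at all.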


Carthag Tuek
Oct 15, 2005

Times shall come,
times shall roll on,
generation shall follow generation



dont regex user input ever, that's insane no matter what lib imo

Truga
May 4, 2014
Lipstick Apathy
as a regular regex abuser

Powaqoatse posted:

dont regex ever

Carthag Tuek
Oct 15, 2005

Times shall come,
times shall roll on,
generation shall follow generation



regexes are for loving around with a bunch of text on your local machine, not for automatic parsing/handling/extraction of anything ever

gonadic io
Feb 16, 2011

>>=
but how else will i scrape web pages??

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

MALE SHOEGAZE posted:

Java packages and golang packages are very different so the scoping rules arent really comparable. One major difference is that functions and variables are allowed at the top level of a golang package. also, packages will be initialized once at runtime, and have mutable state, so it's more accurate to think of them as objects (this is why you can't import a single identifier from a go package -- that identifier is tied implicitly to the package because it might rely on the state of the package, so it doesn't make sense to be able to import a single identifier).

the end result of this is that go makes it very difficult to organize your code, which is especially an issue in a language that has no generics and demands that you have lots of functions named add_int32, add_int64, etc.

add issues with cyclic dependencies on top of this, and you have a real poo poo show. at least in my experience. i was clearly not arranging my go code properly because i had lots of problems.

also i could be wrong on any of this. it's been quite a while since i've written any go.

idk i haven't run into any of these issues. maybe make your packages smaller and more specific?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

every time the discussion comes up about pcre vs re2 I can't help but think that I've never run into the problems that PCRE has in actual practice.

the above posts made me realize why...the idea of exposing regexes to my users seems scary and bad so I never do it.

Cybernetic Vermin
Apr 18, 2005

idk though, regexes are pretty good for some initial filtering of strings, and it is certainly preferable in many situations to doing a lot of messing about by hand. it would be nice if people could be brought onto one of the nicer ways of processing text, but history suggests that regexes are one of those that programmers by and large like and use where a lot of other things have fallen by the wayside

either way, there are sort of three choices: 1) don't use regexes, 2) use whatever library you want but take a bit of care in how/where they are used, 3) use re2 or similarly predictable library (dfa-compiling ones are often a good choice really)

i think 3 is fine, and at any rate should be where unqualified library picking advice ought to head

Shaggar
Apr 26, 2006

tef posted:

rob pike: "Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)"

rob pike is a big dumb idiot

Gazpacho
Jun 18, 2004

by Fluffdaddy
Slippery Tilde
if n were usually small we wouldn't use computer

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

gonadic io posted:

but how else will i scrape web pages??

very carefully

Carthag Tuek
Oct 15, 2005

Times shall come,
times shall roll on,
generation shall follow generation



obligatory https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

JawnV6
Jul 4, 2004

So hot ...

the worst kind of pedantic poo poo

OP never asked about "parsing" html, some chuffed up nerd decided they wanted to drop unrelated truth bombs now it's shared all over as if missing the point for a sweet dunk is admirable

JawnV6
Jul 4, 2004

So hot ...

Cybernetic Vermin posted:

well, n here is on the level of 50 for pcre to in practice never halt

like have you ever hit this in production? or is it the same toy case i've seen a million times that has zero bearing on regex in practice

have you heard grep sells u out if you have lots of partial matches, check out what this fish has to say

Cybernetic Vermin posted:

it is not that i particularly care whatever people are doing in their editors, but suggesting pcre as the first choice among regex libraries without qualifications is pretty bad: it has few other benefits, and if nothing else it opens software up to denial of service attacks if applied to user input. even without expectations of malicious inputs it is pretty silly to spend the cycles if you are doing anything new and can go re2

yeah, im going to stay up at night shivering in terror that someone's going to pick up on my shitposting and put pcre into a mission critical application susceptible to DOS, this is a totally worthwhile threat model to consider and discuss at length

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

JawnV6 posted:

the worst kind of pedantic poo poo

OP never asked about "parsing" html, some chuffed up nerd decided they wanted to drop unrelated truth bombs now it's shared all over as if missing the point for a sweet dunk is admirable

Like...I'm not sure they missed the point instead of just offering information. sure, SO answers are supposed to be strict answers to the question, but that's rarely the actual case.

it's widely shared because it's funny

Cybernetic Vermin
Apr 18, 2005

JawnV6 posted:

like have you ever hit this in production? or is it the same toy case i've seen a million times that has zero bearing on regex in practice

have you heard grep sells u out if you have lots of partial matches, check out what this fish has to say

yeah, im going to stay up at night shivering in terror that someone's going to pick up on my shitposting and put pcre into a mission critical application susceptible to DOS, this is a totally worthwhile threat model to consider and discuss at length

yeah, sure, parse your data with pcre, stick it in mongodb, and glue it all together with bash scripts. i am not sure *why* you are really set on making poor choices, but variety certainly is the spice of life~


e: eh, to not just be a troll: it is way more natural patterns than you may expect that trip these things up, but that turns into a much bigger discussion. use what you please. it is rare that there are huge issues, but i maintain that re2 is a good piece of software to suggest to people, much more systematic engineering and no traps like that

Cybernetic Vermin fucked around with this message at 18:35 on Sep 19, 2017

VikingofRock
Aug 24, 2008




So what if I really want the ability to use negative lookahead / lookbehind (which it looks like re2 does not support), but I don't need full on recursive PCRE? Should I use like oniguruma or something? I know very little about the differences between regex engines.
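
For concreteness, this is roughly what negative lookahead and lookbehind look like. The sketch uses Python's built-in `re` module purely as an example of a backtracking engine that supports both (Python only allows fixed-width lookbehind); the sample string and variable names are made up:

```python
import re

words = "foobar foobaz barfoo foo"

# Negative lookahead: a word starting with "foo" where the "foo" is
# NOT immediately followed by "bar".
no_bar_after = re.findall(r"\bfoo(?!bar)\w*", words)

# Negative lookbehind: "foo" (plus any trailing word characters) that
# is NOT immediately preceded by "bar".
no_bar_before = re.findall(r"(?<!bar)foo\w*", words)

print(no_bar_after)   # ['foobaz', 'foo']
print(no_bar_before)  # ['foobar', 'foobaz', 'foo']
```

Both assertions look at neighbouring text without consuming it, which is the feature an automaton-based engine like re2 leaves out.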

Cybernetic Vermin
Apr 18, 2005

no no, the backtracking issues are endemic and re2 is the odd man out avoiding them without giving up a *lot* of features, so if you need something that re2 does not do then pcre is likely a good choice. main issue with it other than that is it has adopted a lot of the weird perl features (e.g. (*PRUNE)), in many cases with very subtle semantic differences from perl. so one may be wise to be a bit restrictive with what one uses as there is bound to be some strange stuff on the fringes of the library

gonadic io
Feb 16, 2011

>>=
The issues are a lot smaller too if you precompile all your regexes and NEVER EVER let users specify the regex itself
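
A minimal sketch of that advice, assuming Python's built-in `re` module. The pattern, the names `DEFINE_LINE`, `parse_define` and `contains_literal`, and the `MODULE_` header-line format are all hypothetical:

```python
import re

# Compiled once at import time. The pattern is written by the developer
# and is never assembled from user input.
DEFINE_LINE = re.compile(r"^#define\s+(MODULE_\w+)\s+(\S+)$")


def parse_define(line):
    """Return (name, value) for a matching #define line, else None."""
    m = DEFINE_LINE.match(line)
    return (m.group(1), m.group(2)) if m else None


def contains_literal(haystack, user_text):
    """Search for user-supplied text as a literal, never as a pattern."""
    # re.escape neutralises every metacharacter in user_text.
    return re.search(re.escape(user_text), haystack) is not None
```

For a plain substring test `user_text in haystack` would do; `re.escape` only earns its keep when the user's text has to be embedded in a larger fixed pattern.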

JawnV6
Jul 4, 2004

So hot ...

Cybernetic Vermin posted:

yeah, sure, parse your data with pcre, stick it in mongodb, and glue it all together with bash scripts. i am not sure *why* you are really set on making poor choices, but variety certainly is the spice of life~


e: eh, to not just be a troll: it is way more natural patterns than you may expect that trip these things up, but that turns into a much bigger discussion. use what you please. it is rare that there are huge issues, but i maintain that re2 is a good piece of software to suggest to people, much more systematic engineering and no traps like that

again, you're the one presuming a shitload about where i'm jamming these? are you actually running sed scripts as a part of your production workflow or needlessly maligning a console tool based on your ridiculous preconceptions

the use cases I describe don't amount to "put pcre in front of web client users and run them against arbitrary input before jamming output into other terrible choices" but you're so stuck on that irrelevant point that i'm not sure how to ground you back in reality

Xarn
Jun 26, 2015

Thermopyle posted:

every time the discussion comes up about pcre vs re2 I can't help but think that I've never run into the problems that PCRE has in actual practice.

the above posts made me realize why...the idea of exposing regexes to my users seems scary and bad so I never do it.

I have actually run into non-linear performance of PCRE once, when I wrote what I thought was a fairly simple regex (i.e. no backrefs) and then had it run over all .txt files in a specific folder (and its subfolders). I didn't run into the worst-case exponential behaviour, but since the corpus was fairly large I was not amused either.


JawnV6 posted:

have you heard grep sells u out if you have lots of partial matches, check out what this fish has to say

It sells you out a bit, but keeps the same asymptotic complexity. PCRE doesn't even attempt to.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Thermopyle posted:

Like...I'm not sure they missed the point instead of just offering information. sure, SO answers are supposed to be strict answers to the question, but that's rarely the actual case.

it's widely shared because it's funny

if he was just sharing info he wouldn't be such a huge rear end about it.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

cis autodrag posted:

if he was just sharing info he wouldn't be such a huge rear end about it.

i've never read that post as being an rear end, just being funny

but regardless i wasn't trying to say it was a pure emotionless info dump, only that it didn't have to be a direct answer to the question there

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
yeah i'm not sure I agree with the rear end in a top hat readings. it's just a bit tongue-in-cheek.

DONT THREAD ON ME fucked around with this message at 20:35 on Sep 19, 2017

Cybernetic Vermin
Apr 18, 2005

JawnV6 posted:

again, you're the one presuming a shitload about where i'm jamming these? are you actually running sed scripts as a part of your production workflow or needlessly maligning a console tool based on your ridiculous preconceptions

the use cases I describe don't amount to "put pcre in front of web client users and run them against arbitrary input before jamming output into other terrible choices" but you're so stuck on that irrelevant point that i'm not sure how to ground you back in reality

yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule usecase is

regexes have a bad reputation which is to a great extent caused by bad uses and bad implementations, but going "it is fine to use x for regex matching because you shouldn't use regex matching (for y) anyway" is pretty drat lovely advice when applied in general without a specific y in sight

if you have a well judged fragment as your actual case that's fine, good luck to you

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

MALE SHOEGAZE posted:

yeah i'm not sure I agree with the rear end in a top hat readings. it's just a bit tongue-in-cheek.

is that like reading tea leaves?

ozymandOS
Jun 9, 2004

MALE SHOEGAZE posted:

Java packages and golang packages are very different so the scoping rules arent really comparable. One major difference is that functions and variables are allowed at the top level of a golang package.

this is effectively also true in java since public classes can have public static state (which is effectively a global variable scoped to the package-level)

quote:

also, packages will be initialized once at runtime, and have mutable state, so it's more accurate to think of them as objects

java classes are also initialized once at runtime via static initializers

quote:

the end result of this is that go makes it very difficult to organize your code, which is especially an issue in a language that has no generics and demands that you have lots of functions named add_int32, add_int64, etc.

agreed that lack of generics can sometimes be painful, but TBH the lack of generics is only really a problem for pure-container classes (at least in the things I work on--I'd be curious to hear about where you ran into further problems)

quote:

add issues with cyclic dependencies on top of this, and you have a real poo poo show. at least in my experience. i was clearly not arranging my go code properly because i had lots of problems.

none of the problems are specific to go; circular dependencies in particular are a problem in most languages (are there langs where circular dependencies are allowed?)

Doom Mathematic
Sep 2, 2008

JawnV6 posted:

the worst kind of pedantic poo poo

OP never asked about "parsing" html, some chuffed up nerd decided they wanted to drop unrelated truth bombs now it's shared all over as if missing the point for a sweet dunk is admirable

To say nothing of the fact that XHTML open tags are strictly regular and can be parsed very easily using a regular expression.
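
As a rough illustration of that claim, here is a sketch in Python. The pattern is an approximation, not the XML grammar: it assumes attribute values are always quoted, uses `\w` instead of the exact XML name characters, and does not skip comments or CDATA sections, where tag-like text can legally appear. The names `NAME`, `ATTR`, `OPEN_TAG` and `open_tag_names` are hypothetical:

```python
import re

# Tag or attribute name (approximate, see the assumptions above).
NAME = r"[A-Za-z_][\w:.-]*"
# One attribute: whitespace, name, "=", then a quoted value. A ">" is
# allowed inside the quotes, a "<" is not.
ATTR = r"""\s+[\w:.-]+\s*=\s*(?:"[^"<]*"|'[^'<]*')"""
# An open tag ends in a plain ">", so self-closing "/>" tags and
# closing "</...>" tags are not matched.
OPEN_TAG = re.compile(rf"<({NAME})(?:{ATTR})*\s*>")


def open_tag_names(markup):
    """Return the names of all open (not self-closing) tags, in order."""
    return [m.group(1) for m in OPEN_TAG.finditer(markup)]


sample = '<p>Hello <br /> <a href="x>y">link</a> <img src="a.png"/></p>'
print(open_tag_names(sample))  # ['p', 'a']
```

Matching individual tags like this is a regular problem; matching nested open and close pairs is not, which is where a parser comes in.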

necrotic
Aug 2, 2005
I owe my brother big time for this!
If it's valid xhtml just use an xml parser

redleader
Aug 18, 2005

Engage according to operational parameters
never hit it in prod, but i found our email validation regex was badly constructed when a unit test took 15 minutes to run

redleader
Aug 18, 2005

Engage according to operational parameters
if not regex, then what should i use for user input validation? or is this a question for the tp thread

gonadic io
Feb 16, 2011

>>=

redleader posted:

if not regex, then what should i use for user input validation? or is this a question for the tp thread

don't do anything with user input other than display it back to them. make them enter different parts in different fields so you don't have to split it up with anything more complicated than a .split('/')

like what validation are you doing anyway? with email addresses for example imo the most you should care about is that it has an "@" and then a "." after the @. even that is probably wrong honestly.
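
a minimal sketch of that check in python, no regex needed. the function name `looks_like_email` is made up, and this is exactly as loose as described above, so it will accept plenty of junk:

```python
def looks_like_email(address):
    """Loose check: there is an '@', and a '.' somewhere after it."""
    # partition splits at the first "@"; `at` is "" when there is none.
    _local, at, domain = address.partition("@")
    return bool(at) and "." in domain
```

anything stricter tends to reject addresses that are actually deliverable, so the real test is sending mail to it.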

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

cis autodrag posted:

if he was just sharing info he wouldn't be such a huge rear end about it.

no jokes allowed on the internet

Ralith
Jan 12, 2011

I see a ship in the harbor
I can and shall obey
But if it wasn't for your misfortune
I'd be a heavenly person today

redleader posted:

if not regex, then what should i use for user input validation? or is this a question for the tp thread

try to use it, and return an error if it doesn't work
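
a minimal sketch of what that can look like, using a date field as a made-up example. `parse_birthday` and its return shape are hypothetical; `date.fromisoformat` is the python standard library parser (python 3.7+):

```python
from datetime import date


def parse_birthday(raw):
    """Return (date, None) on success or (None, error message) on failure."""
    try:
        # Let the real parser decide what is valid instead of
        # pre-screening the string with a pattern.
        return date.fromisoformat(raw.strip()), None
    except ValueError as exc:
        return None, f"could not read {raw!r} as a date: {exc}"
```

the parser also catches things a pattern would wave through, like february 30th.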

tef
May 30, 2004

-> some l-system crap ->

Cybernetic Vermin posted:

yeah, sorry for trollish reply, the conversation i was in was responding to a post saying "just use pcre" (with little context other than "how to regex?!") noting it has pitfalls and suggesting re2. i really do not care a lot what your increasingly minuscule usecase is

regexes have a bad reputation which is to a great extent caused by bad uses and bad implementations, but going "it is fine to use x for regex matching because you shouldn't use regex matching (for y) anyway" is pretty drat lovely advice when applied in general without a specific y in sight

if you have a well judged fragment as your actual case that's fine, good luck to you

try posting on hacker news

tef
May 30, 2004

-> some l-system crap ->
us:

there is posix syntax, pcre syntax, and countless variants

pcre won, posix lost


you:

so i read this webpage about why they used re2 and missed the point, it was about running arbitrary code from users

it was slower, sure, but, like it was faster on pathological cases, a bit like when someone writes a really bad for loop

akadajet
Sep 14, 2003

Shaggar posted:

rob pike is a big dumb idiot

who the gently caress is rob pike?

akadajet
Sep 14, 2003



oh...

aardvaard
Mar 4, 2013

you belong in the bog of eternal stench

he is the go guy


akadajet
Sep 14, 2003

CommunistPancake posted:

he is the go guy

cool
