Ask General Programming Questions Not Worth Their Own Thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Ask General Programming Questions Not Worth Their Own Thread


necrotic: Aug 2, 2005; I owe my brother big time for this!

Empress Brosephine posted:

ooo that's a good idea using Python to do it. Thanks!

Maybe I should learn one day what language VS extensions are programmed in and just make one for myself...all it needs to do is scan the .html for "class=" or "Class=" or whatever typing of it and then dump what follows until the next line break!

Thanks!

They are written in javascript.

# ? Apr 23, 2021 23:07

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

Empress Brosephine posted:

ooo that's a good idea using Python to do it. Thanks!

Maybe I should learn one day what language VS extensions are programmed in and just make one for myself...all it needs to do is scan the .html for "class=" or "Class=" or whatever typing of it and then dump what follows until the next line break!

Thanks!

Tangentially related but I've used regex101.com dozens of times so far for regex building stuff. Pretty useful site because of its ability to easily paste test cases

# ? Apr 30, 2021 03:02

Macichne Leainig: Jul 26, 2012; by VG

CarForumPoster posted:

Tangentially related but I've used regex101.com dozens of times so far for regex building stuff. Pretty useful site because of its ability to easily paste test cases

+1 Regex101.com, I love that it can explain how the regular expression works as well. That was the piece I needed to be able to wrap my head around regex.

# ? Apr 30, 2021 16:54

Empress Brosephine: Mar 31, 2012; by Jeffrey of YOSPOS

I'm going to check that out, thanks

# ? Apr 30, 2021 22:34

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

FWIW 50+ times over ~4 years using regexs to accomplish something and I still don’t understand them well enough to write one from scratch. I just always Google->stack overflow->regex101 if needed-> test in code.

It’s a crutch.

# ? May 1, 2021 04:00

Slimy Hog: Apr 22, 2008

CarForumPoster posted:

FWIW 50+ times over ~4 years using regexs to accomplish something and I still don’t understand them well enough to write one from scratch. I just always Google->stack overflow->regex101 if needed-> test in code.

It’s a crutch.

I usually either write a simple one by hand then move to regex101 to test a bunch of edge cases and fix my inevitable mistakes or skip all that and jump to regex101 and use the sidebar to write a regex.

If anyone tells you that they can write flawless regex without help, they're lying to you.

# ? May 1, 2021 04:12

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

No, I think a lot of people have learned regular expressions well enough to code them without help. That doesn't mean that they never make mistakes or that they know every detail of every extension in PCRE, but well enough to rattle off something basic like ^(?:\w+:\w+,)*\w+:\w+$, sure. It's just another programming language, one with at most 20 features you actually need to remember: \, ., \w, \d, \s, \S, [], [^], ^, $, (), ?:, |, *, *?, +, +?, maybe {}. I can never remember the lookahead or back-reference stuff, but some people really swear by it.
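For anyone who wants to poke at that example pattern, here is a minimal Python sketch. The function name `is_pair_list` and the test strings are made up for illustration; only the pattern itself comes from the post.

```python
import re

# The pattern from the post: comma-separated key:value pairs of word characters.
PAIRS = re.compile(r"^(?:\w+:\w+,)*\w+:\w+$")

def is_pair_list(text: str) -> bool:
    """Return True if text looks like 'k1:v1,k2:v2' with no trailing comma."""
    return PAIRS.match(text) is not None

print(is_pair_list("a:1,b:2,c:3"))  # True
print(is_pair_list("a:1,b:2,"))     # False (trailing comma)
print(is_pair_list("a:1"))          # True (a single pair is enough)
```

The non-capturing group `(?:...)` handles every pair that is followed by a comma, and the final `\w+:\w+` is what forbids a trailing comma.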

Like anything else, it's hard to retain and build expertise if you're not using it for months at a time, but if you ever start needing it day-to-day, you'll learn it quickly enough.

rjmccall fucked around with this message at 06:33 on May 1, 2021

# ? May 1, 2021 06:31

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

I can do pretty good, though I also enjoy regex crosswords so maybe I'm special.

I could never remember what \b and \w do until I tried out the Execute Program regex course. Somehow that seared it into my brain. It teaches other parts of regexes too!

# ? May 1, 2021 07:01

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

I can never remember the language-specific extensions so I always have to look them up every time. for example the syntax for named subpatterns is different between C# and Python and I have to look it up whenever I use it in either. I should really write myself a cheat sheet or something I guess.
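As a cheat-sheet starter, here is roughly what the Python side looks like. Python spells a named group `(?P<name>...)`, while .NET uses `(?<name>...)`. The date pattern and the `parse_date` helper are just an illustrative assumption, not something from a real project.

```python
import re
from typing import Optional

# Python syntax for named groups: (?P<name>...)
# The .NET/C# equivalent would be written (?<name>...)
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

def parse_date(text: str) -> Optional[dict]:
    """Return the named groups of the first date-like match, or None."""
    match = DATE.search(text)
    return match.groupdict() if match else None

print(parse_date("posted on 2021-05-01"))
# {'year': '2021', 'month': '05', 'day': '01'}
```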

# ? May 1, 2021 11:10

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Also I don't know whether the point was lost but regular expressions aren't a suitable solution for the general problem of extracting class names from an HTML page. I suggested that as a solution because it was a specific page that the poster wanted to do that to, and that page is unlikely to present false positives or if it does you could just work around them. If you had to do it for pages in general then you should use an HTML parsing library.
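For the general case, a rough sketch with Python's standard-library `html.parser` might look like this. The class name `ClassCollector`, the helper `class_names`, and the sample markup are all made up for the example; a third-party parser would work just as well.

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collects every class name that appears in a real class attribute."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so Class= and class= both arrive as "class".
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

def class_names(html: str) -> list:
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.classes)

sample = '<div Class="post quote"><p class="body">class="not-an-attribute"</p></div>'
print(class_names(sample))  # ['body', 'post', 'quote']
```

The text inside the `<p>` looks like an attribute to a naive regex, but the parser treats it as plain text, which is exactly the kind of false positive being described.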

# ? May 1, 2021 11:12

Super-NintendoUser: Jan 16, 2004; COWABUNGERDER COMPADRES; Soiled Meat

The only time I have to usually deal with regex is with SSO, since the identity provider typically has usernames stored differently than the legacy systems I'm trying to integrate and I can use regex to manipulate the name (turn f.lastname into firstname.lastname). Doing this regex is an absolute nightmare, but it's possible. Typically I just use a different feature and map the names separately since I can't trust the customer's ldap anyways. Once in a while a guy comes along with a misspelled principal name, or he's lastname.f for some reason and the regex doesn't work.

# ? May 1, 2021 16:39

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Step one of learning something is to stop trying to convince yourself that it's impossible to learn.

Step two of learning regular expressions is probably to get in the habit of using them for find/replace in your editor of choice as often as possible. It'll be slower than whatever you're currently doing at first, but you won't get better at writing regexps without just spending time doing it.

# ? May 1, 2021 17:04

Bruegels Fuckbooks: Sep 14, 2004; Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Jerk McJerkface posted:

The only time I have to usually deal with regex is with SSO, since the identity provider typically has usernames stored differently than the legacy systems I'm trying to integrate and I can use regex to manipulate the name (turn f.lastname into firstname.lastname). Doing this regex is an absolute nightmare, but it's possible. Typically I just use a different feature and map the names separately since I can't trust the customer's ldap anyways. Once in a while a guy comes along with a misspelled principal name, or he's lastname.f for some reason and the regex doesn't work.

oh man that doesn't sound like a good idea but you gotta do what you gotta do i guess

# ? May 2, 2021 05:49

KillHour: Oct 28, 2007

I personally like https://regexr.com/

Add me to the list of people who use regex on a weekly basis and still need to look up basic poo poo.

# ? May 2, 2021 06:52

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

I like https://www.debuggex.com/ but I don't do anything particularly complicated.

# ? May 2, 2021 14:08

HappyHippo: Nov 19, 2003; Do you have an Air Miles Card?

rjmccall posted:

No, I think a lot of people have learned regular expressions well enough to code them without help. That doesn't mean that they never make mistakes or that they know every detail of every extension in PCRE, but well enough to rattle off something basic like ^(?:\w+:\w+,)*\w+:\w+$, sure. It's just another programming language, one with at most 20 features you actually need to remember: \, ., \w, \d, \s, \S, [], [^], ^, $, (), ?:, |, *, *?, +, +?, maybe {}. I can never remember the lookahead or back-reference stuff, but some people really swear by it.

Like anything else, it's hard to retain and build expertise if you're not using it for months at a time, but if you ever start needing it day-to-day, you'll learn it quickly enough.

Regexs are like a programming language, although one with a syntax akin to brainfuck.

# ? May 2, 2021 15:14

nielsm: Jun 1, 2009

Sure, regular expressions have a terse syntax, but the structure would be the same if you used a keyword-based syntax with whitespace between tokens and such, but it would also be so much more to write. Some regex syntaxes do allow whitespace to not be significant and even comments, but still use the common symbols for operators.
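Python is one of the syntaxes that allows this, via the `re.VERBOSE` flag. Here is a small sketch using the key:value pattern quoted earlier in the thread; the name `PAIRS_VERBOSE` is just for the example.

```python
import re

# Same operators as the one-line version, but whitespace is ignored
# and everything after '#' on a line is a comment.
PAIRS_VERBOSE = re.compile(
    r"""
    ^                    # start of the string
    (?: \w+ : \w+ , )*   # zero or more "key:value," groups
    \w+ : \w+            # the last key:value, with no trailing comma
    $                    # end of the string
    """,
    re.VERBOSE,
)

print(bool(PAIRS_VERBOSE.match("a:1,b:2")))  # True
print(bool(PAIRS_VERBOSE.match("a:1,")))     # False
```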

# ? May 2, 2021 15:47

Super-NintendoUser: Jan 16, 2004; COWABUNGERDER COMPADRES; Soiled Meat

Bruegels Fuckbooks posted:

oh man that doesn't sound like a good idea but you gotta do what you gotta do i guess

The issue here (which I agree is bad) is that I've been doing replatforming of a CMS that has an internal identity pool/structure/permissions already inside it.

The replatforms typically also include using SSO w/SAML. It's not a hard technical task, since our SAML module is a standard open source one.

The issue is that users used to access the client via their username inside the client. Typically firstname.lastname. However when they go to saml, usually it's the LDAP name or auth token from SAML that contains a different assertion ID, and I don't know what it is. Sometimes this is initial.lastname or it's from before they were married or divorced and the names don't match. Or in ldap they are just an ID number and their people name isn't used.

In our latest release we just added an alias field to the internal user identity config. So they do a saml auth, and then the cms checks the assertion Id against the user saml alias field, and then impersonates the name of the actual user so the legacy config/permissions still work.
We also provide an API that they can pipe names/identities etc into it. It works well now and there's no regex required.
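
The alias lookup boils down to something like the sketch below. To be clear, this is not our actual code: the table contents, the names `ALIASES` and `resolve_user`, and the choice to return `None` for unknown IDs are all hypothetical, just to show the shape of it.

```python
from typing import Optional

# Hypothetical alias table: SAML assertion ID -> legacy CMS username.
ALIASES = {
    "j.smith": "jane.smith",
    "100482": "robert.jones",
}

def resolve_user(assertion_id: str, aliases: dict) -> Optional[str]:
    """Map an assertion ID to the legacy username, or None if there is no alias."""
    return aliases.get(assertion_id)

print(resolve_user("j.smith", ALIASES))  # jane.smith
print(resolve_user("unknown", ALIASES))  # None
```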

# ? May 2, 2021 15:51

HappyHippo: Nov 19, 2003; Do you have an Air Miles Card?

nielsm posted:

Sure, regular expressions have a terse syntax, but the structure would be the same if you used a keyword-based syntax with whitespace between tokens and such, but it would also be so much more to write. Some regex syntaxes do allow whitespace to not be significant and even comments, but still use the common symbols for operators.

My point is that with programming languages it's generally considered a good trade-off if the syntax is more readable but takes longer to write. Sometimes I think that attitude is taken too far, but certainly regexs have a reputation for being difficult to read, and I don't consider that surprising given the syntax.

# ? May 2, 2021 16:07

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

In a lot of cases the problem is that the logic you're trying to implement is inherently complicated. Consider the following regex for email address validation taken from the accepted answer to How to validate an email address using a regular expression?:

code:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

That's very hard to understand, but it's no shorter than it needs to be to cover the standard. Anything that implements all parts of that standard is going to be similarly complex. Maybe a different syntax would make the different parts more apparent, but the logic just isn't simple.

# ? May 2, 2021 16:39

Jabor: Jul 16, 2010; #1 Loser at SpaceChem

Generally speaking, you're better off using a selection of multiple regular expressions combined with some actual code in your programming language of choice.

Or a formal grammar and a parser generator.

Knowing when it's no longer appropriate to write "a regex" as your solution is part of being able to use regexes effectively.

# ? May 2, 2021 23:36

Munkeymon: Aug 14, 2003; Motherfucker's got an armor-piercing crowbar! Rigoddamndicu𝜆ous.

You validate an email address by sending an email with a link to click IMO, but that's not a regex issue.

# ? May 3, 2021 16:00

pokeyman: Nov 26, 2006; That elephant ate my entire platoon.

Munkeymon posted:

You validate an email address by sending an email with a link to click IMO, but that's not a regex issue.

And this lets you turn your regex into something like \S+@\S+ which is much easier to remember!

# ? May 3, 2021 16:54

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

pokeyman posted:

And this lets you turn your regex into something like \S+@\S+ which is much easier to remember!

you can technically have spaces in an email address if you want. something like "My Butt"@somethingawful.com is a valid email address (the quotes are part of the address)

I think .+@.+ or some variant like [^@]+(@[^@]*)*@[^@]+ would be ok but for all I know they're not, the rules for email addresses are extremely permissive.
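
Here's a quick Python sketch comparing those loose patterns against the quoted-space address. The `CANDIDATES` table and `accepts` helper are made up for the comparison, and none of this checks that an address is actually valid.

```python
import re

# The three loose patterns mentioned in the thread, keyed by their own source text.
CANDIDATES = {
    r"\S+@\S+": re.compile(r"\S+@\S+"),
    r".+@.+": re.compile(r".+@.+"),
    r"[^@]+(@[^@]*)*@[^@]+": re.compile(r"[^@]+(@[^@]*)*@[^@]+"),
}

def accepts(address: str) -> dict:
    """Report which patterns match the whole address."""
    return {label: bool(p.fullmatch(address)) for label, p in CANDIDATES.items()}

print(accepts('"My Butt"@somethingawful.com'))
# {'\\S+@\\S+': False, '.+@.+': True, '[^@]+(@[^@]*)*@[^@]+': True}
```

Only the `\S+@\S+` version rejects the address with a space in it, which is either a bug or a feature depending on how you feel about that address.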

tbh if you really need to check whether an email address is well-formed you probably want to use a finite state machine, not a regular expression (because of the quoting rules); and if you want to check whether it is valid you should just try sending email to it.

of course none of this stops you saying "gently caress your fancy-rear end email address that you only created for the sake of being technically correct, I'm arbitrarily disallowing it" and if you're in a position to do that then hell, you should.

# ? May 3, 2021 17:35

Macichne Leainig: Jul 26, 2012; by VG

Hammerite posted:

you can technically have spaces in an email address if you want. something like "My Butt"@somethingawful.com is a valid email address (the quotes are part of the address)

Technically sure, but basically every email service out there prevents you from registering an email address with a space in it, so I'm not gonna support some weird rear end edge case.

I think it makes sense to do some basic validations to make sure the input data looks like an email address, but beyond that implementing a regex that validates an email address down to the exact email address specifications doesn't really provide a tangible benefit.

Anyway, you are obviously on the same page about dumb "technically correct for the sake of being technically correct" email addresses so I digress.

# ? May 3, 2021 17:45

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

my email address is ＠@＠.at don't @ me

# ? May 3, 2021 17:47

Spikes32: Jul 25, 2013; Happy trees

Are there any programs or websites out there that will let me automatically diagram out IIF statements (and ideally work in reverse too)? I work with a dumb program that gives me access to limited customization options and that's the most used one, but following IIF statements 5/6 levels deep with multiple branches is a real pain.

# ? May 4, 2021 16:57

raminasi: Jan 25, 2005; a last drink with no ice

Hammerite posted:

tbh if you really need to check whether an email address is well-formed you probably want to use a finite state machine, not a regular expression (because of the quoting rules)

Regular expressions are a language for encoding finite state machines (weird backreference stuff aside)

# ? May 4, 2021 17:00

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

raminasi posted:

Regular expressions are a language for encoding finite state machines (weird backreference stuff aside)

The two representations are equivalent in power but sometimes one representation is considerably more simple than another. Think about the set of all binary strings that contain an even number of 1s and an odd number of 0s. That's very easy to describe as an FSM but the regex for it isn't quite as simple.
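
To show how small the FSM is, here is a sketch in Python. It has four states, one for each combination of parities; the names `START`, `ACCEPT`, `step`, and `accepts` are just for the example.

```python
# Each state is (parity of 1s seen, parity of 0s seen); 0 means even, 1 means odd.
START = (0, 0)
ACCEPT = {(0, 1)}  # even number of 1s, odd number of 0s

def step(state, symbol):
    """Transition function: flip the parity that matches the symbol."""
    ones, zeros = state
    if symbol == "1":
        return (ones ^ 1, zeros)
    if symbol == "0":
        return (ones, zeros ^ 1)
    raise ValueError(f"not a binary digit: {symbol!r}")

def accepts(bits: str) -> bool:
    state = START
    for ch in bits:
        state = step(state, ch)
    return state in ACCEPT

print(accepts("110"))    # True  (two 1s, one 0)
print(accepts("10"))     # False (one 1, one 0)
print(accepts("01010"))  # True  (two 1s, three 0s)
```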

# ? May 4, 2021 17:42

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

raminasi posted:

Regular expressions are a language for encoding finite state machines (weird backreference stuff aside)

After looking on wikipedia, it turns out "finite state machine" has a more restrictive definition than I thought. What I thought qualified as a finite state machine is actually a "pushdown automaton". I thought that a finite-state machine could have finitely many variables (e.g. an integer telling you how many levels of nesting you have entered), apparently not? It seems weird to conceptualise that as being backed by a "stack" with symbols from the singleton alphabet but I guess that's what wiki is telling me.
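
For concreteness, the kind of thing I had in mind looks like this sketch (the function name `balanced` is made up). The integer `depth` can grow without bound, which is what takes it out of formal finite-state territory: it behaves like a stack that only ever holds one kind of symbol.

```python
def balanced(text: str) -> bool:
    """Check that parentheses nest properly, tracking depth with one integer."""
    depth = 0  # unbounded counter, standing in for a single-symbol stack
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closed more than were opened
                return False
    return depth == 0

print(balanced("(a(b)c)"))  # True
print(balanced("(()"))      # False
print(balanced(")("))       # False
```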

# ? May 4, 2021 17:48

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

Finite state machines in an introductory theory class are very restricted--they're not even pushdown automata because they don't have stacks--but finite state machines in general are not. There are models for those as well but they don't get taught early on.

# ? May 4, 2021 17:52

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

ultrafilter posted:

Finite state machines in an introductory theory class are very restricted--they're not even pushdown automata because they don't have stacks--but finite state machines in general are not. There are models for those as well but they don't get taught early on.

so are you saying that "finite state machine" has 2 different meanings? how come? Wikipedia's article on "finite-state machine" defines them as being strictly less capable than pushdown automata.

My degree is in maths and stats, all my cs knowledge I picked up as a hobbyist or later on the job, it doesn't come from introductory classes but it is limited in some areas.

# ? May 4, 2021 18:58

rjmccall: Sep 7, 2007; no worries friend; Fun Shoe

Formally, a finite state machine is characterized by storing a statically-bounded amount of information. A pushdown automaton can store unbounded information because the stack can grow without limit.

Programmers often say “finite state machine” for the design pattern of a finite control: a system that receives discrete events and switches on some enumerated internal state to decide how to respond, and where a response can include changing that enumerated state. It’s not meant to imply that the total amount of information stored is finite, though; and formally, even Turing machines are built around a finite control.

# ? May 4, 2021 19:39

nielsm: Jun 1, 2009

A finite state machine definitely can't have any variables, other than the current state. If you want one with variables that all have a finite set of valid values then sure, you can build a multidimensional matrix with each variable in one dimension, then enumerate all positions in the matrix into new state values (in a single dimension), and have a stupidly complex state machine. At least as a mathematical model.
If you want one with variables that have a (conceptually) infinite range of valid values, then your state machine is no longer finite.
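
A tiny Python sketch of that flattening, with three made-up boolean variables (the names in `VARIABLES` and the helper `flatten_states` are purely illustrative):

```python
from itertools import product

# Hypothetical variables, each with a finite set of valid values.
VARIABLES = {
    "in_quotes": (False, True),
    "seen_at": (False, True),
    "escape_next": (False, True),
}

def flatten_states(variables: dict) -> dict:
    """Number every combination of variable values as its own single state."""
    combos = product(*variables.values())
    return {combo: index for index, combo in enumerate(combos)}

STATE_IDS = flatten_states(VARIABLES)
print(len(STATE_IDS))  # 8, i.e. 2 * 2 * 2 combined states
```

The state count is the product of the sizes of each variable's value set, which is why it gets stupidly complex so quickly.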

Pushdown automata are a different class of computational capability. They have a stack of infinite capacity, so they do have an infinite number of possible states.

As far as I remember, Turing machines are the next step up in computational capability.

You can make the argument that digital computers are finite state machines, they just tend to have somewhere around 2^{1000000000000} different states which (in isolation) makes them pretty good at pretending to be Turing machines, but on the other hand they actually also have I/O devices that let them output data to external systems of unknown capabilities, and that output could potentially affect the input stream, so I'd argue they are fully Turing complete given an appropriate I/O device. (Thematically appropriate would be a tape station with some kind of automatic cutting and splicing to make it look like there was a spool with infinite capacity in either direction.)

Edit: ^ above me is better at being concise

# ? May 4, 2021 19:52

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

rjmccall posted:

Formally, a finite state machine is characterized by storing a statically-bounded amount of information. A pushdown automaton can store unbounded information because the stack can grow without limit.

Programmers often say “finite state machine” for the design pattern of a finite control: a system that receives discrete events and switches on some enumerated internal state to decide how to respond, and where a response can include changing that enumerated state. It’s not meant to imply that the total amount of information stored is finite, though; and formally, even Turing machines are built around a finite control.

This is the distinction I was getting at. There are also different types of finite state machines that people study to develop theories about various systems. For instance, timed automata are very important for reasoning about real-time systems.

# ? May 4, 2021 20:56

Yaoi Gagarin: Feb 20, 2014

There are linear bounded automata, in between pushdown and Turing machines.

Each of these machine classes corresponds to a type of language in the Chomsky hierarchy:
FSM = regular language
Pushdown automata = context free language
LBA = context sensitive language
Turing machine = unrestricted language

# ? May 4, 2021 21:24

ultrafilter: Aug 23, 2007; It's okay if you have any questions.

Finite state machines accept the same class of languages whether or not you allow nondeterminism, and the same is true of Turing machines. Nondeterministic pushdown automata are strictly more powerful than the deterministic ones, and it's still an open question for linear bounded automata (because no one really cares).

# ? May 4, 2021 21:29

Bruegels Fuckbooks: Sep 14, 2004; Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Hammerite posted:

so are you saying that "finite state machine" has 2 different meanings? how come? Wikipedia's article on "finite-state machine" defines them as being strictly less capable than pushdown automata.

My degree is in maths and stats, all my cs knowledge I picked up as a hobbyist or later on the job, it doesn't come from introductory classes but it is limited in some areas.

the class where all the cs majors learn this poo poo is called "theory of computation" or "automata theory."

it was the first course i ever took where the professor assigned his own textbook. the warning sign was the subtitle "a gentle introduction" - that was a lie, the class was not gentle.

# ? May 4, 2021 23:36

Gothmog1065: May 14, 2009

Is there a string length limit that regex cannot handle? Namely in older systems (it's Monk code, based off of Lisp).

# ? May 7, 2021 03:46

Jabor: Jul 16, 2010; #1 Loser at SpaceChem

If there is, it would be specific to that particular system, rather than a general regex thing.

# ? May 7, 2021 03:57
