I still fucking love Perl

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > I still fucking love Perl

shoeberto: Jun 13, 2020; which way to the MACHINES?

Hughmoris posted:

Rookie perl question:

Given a string like '192.168.0.1:21,22,80,445,1432,30220' what would be a slick perl way to check if port 22 is in that string?

I can write it the naive way but I'm curious about a short-hand way to do it.

iirc perl has regex support, might try that

# ? Apr 15, 2024 03:44

biznatchio: Mar 31, 2001; Buglord

Hughmoris posted:

Rookie perl question:

Given a string like '192.168.0.1:21,22,80,445,1432,30220' what would be a slick perl way to check if port 22 is in that string?

I can write it the naive way but I'm curious about a short-hand way to do it.

An example

To explain for a perl rookie, the check is:

pre:

if ($var =~ /(:|,)22(,|$)/) {

The =~ operator tests if a string matches a regex pattern. The regex is /(:|,)22(,|$)/ which tests if the string contains:

Either the character : or , -- followed by
The digits 22 -- followed by
Either the character , or the end of the string
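A minimal runnable sketch of the check above, assuming the string always has the shape address:port,port,... The helper name has_port is made up for illustration, and it swaps (:|,) for the character class [:,], which accepts the same two characters without capturing anything.

```perl
use strict;
use warnings;

# has_port is a hypothetical helper name, not something from the thread.
# Returns 1 if $port appears as a whole entry in the port list of $str.
sub has_port {
    my ($str, $port) = @_;
    # \Q...\E makes the interpolated port match literally.
    return $str =~ /[:,]\Q$port\E(?:,|$)/ ? 1 : 0;
}

my $var = '192.168.0.1:21,22,80,445,1432,30220';
print has_port($var, 22)  ? "22 is listed\n"  : "22 is not listed\n";
print has_port($var, 143) ? "143 is listed\n" : "143 is not listed\n";
```

The second call reports "not listed" because 143 is only a prefix of 1432, which is what the trailing (,|$) part is there to rule out.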

biznatchio fucked around with this message at 04:33 on Apr 15, 2024

# ? Apr 15, 2024 04:29

Pablo Bluth: Sep 7, 2007; I've made a huge mistake.

shoeberto posted:

iirc perl has regex support, might try that

While the concept and implementation of regexp predates perl, the advanced form of regexp that we all know and love arose in perl. I doubt there will be another language where regexps will be quite so central to a language's identity.

# ? Apr 15, 2024 08:58

Hughmoris: Apr 21, 2007; Let's go to the abyss!

shoeberto posted:

iirc perl has regex support, might try that

biznatchio posted:

An example

To explain for a perl rookie, the check is:
pre:
if ($var =~ /(:|,)22(,|$)/) {
The =~ operator tests if a string matches a regex pattern. The regex is /(:|,)22(,|$)/ which tests if the string contains:

Either the character : or , -- followed by

The digits 22 -- followed by

Either the character , or the end of the string

This is what I was looking for, thanks!

I'm writing some simple utility scripts for work and, while Python would be easier for me, I just find Perl fun to write.

# ? Apr 15, 2024 12:58

shoeberto: Jun 13, 2020; which way to the MACHINES?

Pablo Bluth posted:

While the concept and implementation of regexp predates perl, the advanced form of regexp that we all know and love arose in perl. I doubt there will be another language where regexps will be quite so central to a language's identity.

Joke's on you, I was being a gigantic dumbass... on PURPOSE!!

But yeah, sorry, I just assumed OP would have had the experience to write an appropriate regex. The other post nailed it.

# ? Apr 15, 2024 14:05

shoeberto: Jun 13, 2020; which way to the MACHINES?

In the interest of being helpful, I really like iterating on regexes here: https://regex101.com/

# ? Apr 15, 2024 14:06

Pablo Bluth: Sep 7, 2007; I've made a huge mistake.

shoeberto posted:

Joke's on you, I was being a gigantic dumbass... on PURPOSE!!

But yeah, sorry, I just assumed OP would have had the experience to write an appropriate regex. The other post nailed it.

My regexp for matching sarcasm in a string was malformed...

# ? Apr 15, 2024 14:26

PhantomOfTheCopier: Aug 13, 2008; Pikabooze!

Actually measuring minor performance differences of this scale is very difficult and depends on the distribution of inputs. However, there is a different way to structure the pattern:

code:


/:.*\b22\b/

This is "find a literal 22 as a word that is somewhere after a colon". The backslash-b is a "word boundary" that automatically handles all non-letters-digits-or-underscores as well as the beginning and ending boundaries of the string. Also, since a single colon is known to separate the IP from the port list, this pattern might make more intuitive sense because it shows that everything to the left of the colon doesn't matter.

If you want additional validation, the full list of ports, etc., then it changes, but this is still twice as fast as fully splitting the string.
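For comparison, here is a rough sketch of the word-boundary pattern next to one way of "fully splitting" the string. The split-and-grep version is just an assumed baseline for illustration, not necessarily the one that was measured.

```perl
use strict;
use warnings;

my $var = '192.168.0.1:21,22,80,445,1432,30220';

# Word-boundary form: a standalone 22 anywhere after a colon.
my $by_regex = $var =~ /:.*\b22\b/ ? 1 : 0;

# Split form: peel off the address, then split the port list on commas.
my ($addr, $port_list) = split /:/, $var, 2;
my @ports    = split /,/, $port_list;
my $matches  = grep { $_ == 22 } @ports;   # grep in scalar context counts hits
my $by_split = $matches ? 1 : 0;

print "regex: $by_regex, split: $by_split\n";
```

Both approaches agree on this input; the split version builds a list of every port first, which is useful if the ports are needed afterwards and wasted work if they are not.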

# ? Apr 16, 2024 05:04

Hughmoris: Apr 21, 2007; Let's go to the abyss!

PhantomOfTheCopier posted:

Actually measuring minor performance differences of this scale is very difficult and depends on the distribution of inputs. However, there is a different way to structure the pattern:
code:
/:.*\b22\b/
This is "find a literal 22 as a word that is somewhere after a colon". The backslash-b is a "word boundary" that automatically handles all non-letters-digits-or-underscores as well as the beginning and ending boundaries of the string. Also, since a single colon is known to separate the IP from the port list, this pattern might make more intuitive sense because it shows that everything to the left of the colon doesn't matter.

If you want additional validation, the full list of ports, etc., then it changes, but this is still twice as fast as fully splitting the string.

I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.

# ? Apr 16, 2024 21:50

PhantomOfTheCopier: Aug 13, 2008; Pikabooze!

Hughmoris posted:

I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.

Use the Benchmark module, which should be included with most distributions. (Every mistake that you can make regarding benchmarking is possible to make here as well.)
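Something like the following is a reasonable starting point with Benchmark's cmpthese. It is only a sketch: the candidate names and the single hard-coded test string are assumptions, and one short string says little about a 1 GB file with a realistic mix of inputs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $var = '192.168.0.1:21,22,80,445,1432,30220';

# Each candidate returns 1 if port 22 is listed, 0 otherwise.
my %candidates = (
    alternation => sub { $var =~ /(:|,)22(,|$)/ ? 1 : 0 },
    boundary    => sub { $var =~ /:.*\b22\b/ ? 1 : 0 },
    split_grep  => sub {
        my (undef, $port_list) = split /:/, $var, 2;
        my $found = grep { $_ == 22 } split /,/, $port_list;
        return $found ? 1 : 0;
    },
);

# Check correctness first: timing code that gives wrong answers is pointless.
for my $name (sort keys %candidates) {
    die "$name gave the wrong answer\n" unless $candidates{$name}->();
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

cmpthese prints a table of rates and relative percentages. The numbers will vary by machine and perl build, so treat them as a rough ordering at best.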

# ? Apr 16, 2024 23:37

biznatchio: Mar 31, 2001; Buglord

In my experience, the only time you ever need to worry about regex performance is if you have backtracking, and even then only when you fall into a catastrophic backtracking case that generally only happens when you stack multiple variable length operators on top of each other.

Neither of the provided regexes uses a concerning backtracking pattern, so their performance differences are almost certainly immaterial, even across 1GB of data. From a performance perspective, Knuth's quote on the matter applies here:

quote:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Choose one or the other based on readability, because in this case you're going to spend more wallclock time looking at the pattern as the developer than the CPU will spend actually executing it.

(And it is worth pointing out that neither pattern will work correctly if the strings might ever contain IPv6 addresses; some additional pattern work would be necessary to distinguish colons in the address from the colon separating the address from the port list.)
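One possible way to do that extra pattern work, as a sketch only: anchor on the last colon in the string. This assumes a port list is always present and always follows the final colon (for example '[2001:db8::22]:80,443'); an IPv6 address with no port list would be misread. The helper name ports_of is hypothetical.

```perl
use strict;
use warnings;

# ports_of is a hypothetical helper. It assumes a port list is always present
# and follows the LAST colon, so colons inside an IPv6 address are skipped.
sub ports_of {
    my ($str) = @_;
    # Greedy .* runs to the last colon; the rest must be digits and commas.
    my ($port_list) = $str =~ /^.*:(\d+(?:,\d+)*)$/
        or return;
    return split /,/, $port_list;
}

for my $str ('192.168.0.1:21,22,80', '[2001:db8::22]:80,443') {
    my $found = grep { $_ == 22 } ports_of($str);
    print "$str => ", ($found ? "has 22" : "no 22"), "\n";
}
```

The second string contains 22 inside the address but not in the port list, so it reports "no 22".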

# ? Apr 18, 2024 02:45

PhantomOfTheCopier: Aug 13, 2008; Pikabooze!

Catastrophic backtracking is certainly possible, but happily much less likely in Perl because of the regexp optimizations. I remember being surprised at how awful some of the regex libraries in popular languages were a few (maybe five) years ago.

# ? Apr 19, 2024 05:18
