shoeberto
Jun 13, 2020

which way to the MACHINES?

Hughmoris posted:

Rookie perl question:

Given a string like '192.168.0.1:21,22,80,445,1432,30220' what would be a slick perl way to check if port 22 is in that string?

I can write it the naive way but I'm curious about a short-hand way to do it.

iirc perl has regex support, might try that

biznatchio
Mar 31, 2001


Buglord

Hughmoris posted:

Rookie perl question:

Given a string like '192.168.0.1:21,22,80,445,1432,30220' what would be a slick perl way to check if port 22 is in that string?

I can write it the naive way but I'm curious about a short-hand way to do it.

An example

To explain for a perl rookie, the check is:

pre:
if ($var =~ /(:|,)22(,|$)/) {
The =~ operator tests if a string matches a regex pattern. The regex is /(:|,)22(,|$)/ which tests if the string contains:

  • Either the character : or , -- followed by
  • The digits 22 -- followed by
  • Either the character , or the end of the string
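
Put together as a complete script, the check above might look something like this. It's only a sketch: the has_port helper and the second sample string are made up for illustration, and the capturing groups are swapped for a character class and a non-capturing group, which match the same strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): true if $port appears
# in the comma-separated list that follows the colon.
sub has_port {
    my ($str, $port) = @_;
    # [:,] is a colon or comma, then the port, then a comma or end of string.
    # \Q...\E keeps the interpolated port from being read as regex syntax.
    return $str =~ /[:,]\Q$port\E(?:,|$)/ ? 1 : 0;
}

for my $s ('192.168.0.1:21,22,80,445,1432,30220', '192.168.0.1:21,80,220,1022') {
    printf "%s => %s\n", $s, has_port($s, 22) ? 'has port 22' : 'no port 22';
}
```

The second string is there to show why the delimiters on both sides matter: 220 and 1022 both contain "22" but neither should count as port 22.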

biznatchio fucked around with this message at 04:33 on Apr 15, 2024

Pablo Bluth
Sep 7, 2007

I've made a huge mistake.

shoeberto posted:

iirc perl has regex support, might try that
While the concept and implementation of regexps predate perl, the advanced form of regexp that we all know and love arose in perl. I doubt there will be another language where regexps are quite so central to the language's identity.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

shoeberto posted:

iirc perl has regex support, might try that


biznatchio posted:

An example

To explain for a perl rookie, the check is:

pre:
if ($var =~ /(:|,)22(,|$)/) {
The =~ operator tests if a string matches a regex pattern. The regex is /(:|,)22(,|$)/ which tests if the string contains:

  • Either the character : or , -- followed by
  • The digits 22 -- followed by
  • Either the character , or the end of the string

This is what I was looking for, thanks!

I'm writing some simple utility scripts for work and, while Python would be easier for me, I just find Perl fun to write.

shoeberto
Jun 13, 2020

which way to the MACHINES?

Pablo Bluth posted:

While the concept and implementation of regexps predate perl, the advanced form of regexp that we all know and love arose in perl. I doubt there will be another language where regexps are quite so central to the language's identity.

Joke's on you, I was being a gigantic dumbass... on PURPOSE!!

But yeah, sorry, I just assumed OP would have had the experience to write an appropriate regex. The other post nailed it.

shoeberto
Jun 13, 2020

which way to the MACHINES?
In the interest of being helpful, I really like iterating on regexes here: https://regex101.com/

Pablo Bluth
Sep 7, 2007

I've made a huge mistake.

shoeberto posted:

Joke's on you, I was being a gigantic dumbass... on PURPOSE!!

But yeah, sorry, I just assumed OP would have had the experience to write an appropriate regex. The other post nailed it.
My regexp for matching sarcasm in a string was malformed...

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Actually measuring minor performance differences at this scale is very difficult and depends on the distribution of inputs. That said, there is a different way to structure the pattern:

code:

/:.*\b22\b/

This is "find a literal 22 as a word that is somewhere after a colon". The backslash-b is a "word boundary" that automatically handles all non-letters-digits-or-underscores as well as the beginning and ending boundaries of the string. Also as there's known to be the single colon separating the IP from the port list, this pattern might make more intuitive sense because it shows that everything to the left of the colon doesn't matter.

If you want additional validation, the full list of ports, etc., then it changes, but this is still twice as fast as fully splitting the string.
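
For anyone who wants to see the two approaches side by side, here is a rough sketch of the word-boundary pattern next to a "fully splitting" version. The helper names and sample strings are invented for illustration, and nothing here measures speed.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
sub has_22_regex {
    my ($str) = @_;
    # A colon, anything at all, then 22 standing alone as a "word".
    return $str =~ /:.*\b22\b/ ? 1 : 0;
}

sub has_22_split {
    my ($str) = @_;
    # Drop the IP, keep everything after the first colon.
    my (undef, $ports) = split /:/, $str, 2;
    return 0 unless defined $ports;
    # grep in scalar context returns the number of matching ports.
    my $count = grep { $_ eq '22' } split /,/, $ports;
    return $count ? 1 : 0;
}

for my $s ('192.168.0.1:21,22,80', '192.168.0.22:80,443', '192.168.0.1:220,1022') {
    printf "%-24s regex=%d split=%d\n", $s, has_22_regex($s), has_22_split($s);
}
```

Both versions ignore a 22 that only appears in the IP part, and both reject ports like 220 or 1022.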

Hughmoris
Apr 21, 2007
Let's go to the abyss!

PhantomOfTheCopier posted:

Actually measuring minor performance differences at this scale is very difficult and depends on the distribution of inputs. That said, there is a different way to structure the pattern:

code:
/:.*\b22\b/
This is "find a literal 22 as a word that is somewhere after a colon". The backslash-b is a "word boundary" that automatically handles all non-letters-digits-or-underscores as well as the beginning and ending boundaries of the string. Also as there's known to be the single colon separating the IP from the port list, this pattern might make more intuitive sense because it shows that everything to the left of the colon doesn't matter.

If you want additional validation, the full list of ports, etc., then it changes, but this is still twice as fast as fully splitting the string.

I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Hughmoris posted:

I was curious about performance of the different solutions. Anyone know best practices for benchmarking scripts in modern perl? I'm thinking I could generate a 1 GB file of IP addresses and ports, and benchmark the different solutions.
Use the Benchmark module, which should be included with most distributions. (Every mistake that you can make regarding benchmarking is possible to make here as well.)
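
A minimal sketch of what that could look like, using cmpthese from the core Benchmark module. The sample lines, the labels, and the iteration count are all made up; a real test would want input that resembles the actual data, and the numbers from a toy like this shouldn't be trusted much.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; a different mix of inputs can change the ranking.
my @lines = (
    '192.168.0.1:21,22,80,445,1432,30220',
    '10.0.0.7:80,443,8080',
    '172.16.4.22:25,110,2222',
);

# Each candidate returns how many lines contain port 22.
my %candidates = (
    alternation   => sub { my $n = grep { /(:|,)22(,|$)/ } @lines; $n },
    word_boundary => sub { my $n = grep { /:.*\b22\b/ } @lines; $n },
    split_list    => sub {
        my $n = grep {
            my (undef, $ports) = split /:/, $_, 2;
            grep { $_ eq '22' } split /,/, $ports;
        } @lines;
        $n;
    },
);

# Run each candidate 200,000 times and print a comparison table.
cmpthese(200_000, \%candidates);
```

Passing a negative count instead (for example -3) tells Benchmark to run each candidate for at least that many CPU seconds, which tends to give steadier numbers than a fixed iteration count.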

biznatchio
Mar 31, 2001


Buglord
In my experience, the only time you ever need to worry about regex performance is if you have backtracking; and even then only when you fall into a catastrophic backtracking case that generally only happens when you stack multiple variable length operators on top of each other.

Neither of the provided regexes uses a concerning backtracking pattern, so their performance differences are almost certainly immaterial, even across 1GB of data. From a performance perspective, Knuth's quote on the matter applies here:

quote:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Choose one or the other based on readability; because in this case you're going to spend more wallclock time looking at the pattern as the developer than the CPU will spend actually executing it.

(And it is worth pointing out that neither pattern will work correctly if the strings might ever contain IPv6 addresses; some additional pattern work would be necessary to distinguish colons in the address from the colon separating the address from the port list.)
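
One possible way to do that extra pattern work, as a sketch only: anchor the port list to the end of the string, so colons inside the address can't be mistaken for the separator. This assumes the port list is always present and always follows the last colon, which nothing in this thread guarantees. The helper name and sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: assumes the string always ends in ":port,port,...".
sub has_port_after_last_colon {
    my ($str, $port) = @_;
    # A colon, any number of "digits," entries, the port itself,
    # then any number of ",digits" entries up to the very end of the string.
    return $str =~ /:(?:\d+,)*\Q$port\E(?:,\d+)*\z/ ? 1 : 0;
}

for my $s ('192.168.0.1:21,22,80', '[2001:db8::1]:21,22', '[2001:db8::22]:80,443') {
    printf "%-24s => %d\n", $s, has_port_after_last_colon($s, 22);
}
```

Because everything after the matched colon has to be digits and commas all the way to the end, a 22 that is a group inside an IPv6 address doesn't count as a port.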

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Catastrophic backtracking is certainly possible, but happily much less likely in Perl because of its regexp optimizations. I remember being surprised at how awful some of the libraries in popular languages were a few (maybe five) years ago.
