kumba
Nov 8, 2003

First time trying to use PowerShell, and I have what is probably a simple question: I have a folder on a network drive with a bunch of .pdf files in a complicated hierarchy of subfolders. I want to copy items from these subfolders to a folder with the same hierarchy on my desktop, but only the items with certain keywords in the file name.

So, for example, if I have a folder G:\Docs\2016\9\29\JimBob\ that contains three files named Test.pdf, asdf.pdf, and Example.pdf, I want to copy only the ones with the word Test or Example in the name.

I'm doing some logic so it knows which folders to copy over (new folders are created for each day new files are generated, so I wrote something to loop through the structure and find the most recent stuff). That part is fine and I can recreate the folder structure locally; I just can't figure out the syntax to copy the actual .pdfs over - whatever I've tried, I seem to get all or nothing.

After I've recreated the folder structure, I loop through the final subfolders in the hierarchy (that's what $currentEntity is). $ratesToImport is an array containing the words I want to match in the filenames (so for the above example, I would have Test and Example as items in the array). This is what I'm using:

code:
Copy-Item -Path "G:\Docs\Pew Pew Inc\$mostRecentYear\$mostRecentMonth\$mostRecentDay\Archive\$currentEntity\*" -Recurse `
    -Destination "C:\Users\$UserName\Desktop\Spanish Docs\" -Include "*$ratesToImport[0]*", "*$ratesToImport[1]*", "*$ratesToImport[2]*"
And it doesn't seem to actually copy anything over. When I try this:

code:
Copy-Item -Path "G:\Docs\Pew Pew Inc\$mostRecentYear\$mostRecentMonth\$mostRecentDay\Archive\$currentEntity\*" -Recurse `
    -Destination "C:\Users\$UserName\Desktop\Spanish Docs\"
Where-Object {
    $_.Name -like "*$ratesToImport[0]*" -or $_.Name -like "*$ratesToImport[1]*" -or $_.Name -like "*$ratesToImport[2]*"
}
It seems to copy everything in the folder instead of filtering anything out. What on earth am I doing wrong??
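
A note for anyone hitting the same wall - two things are likely going wrong, judging from the snippets alone: inside a double-quoted string, $ratesToImport[0] expands the whole array and then appends a literal [0], so the index needs the $() subexpression operator; and in the second attempt the Where-Object block is a standalone statement rather than part of the pipeline, so it never filters anything. A minimal sketch of the intended filter, reusing the variable names from the post:

code:
# Minimal sketch (untested): filter names with Where-Object inside an actual
# pipeline, and wrap the array indexing in $() so it expands in double quotes.
$source = "G:\Docs\Pew Pew Inc\$mostRecentYear\$mostRecentMonth\$mostRecentDay\Archive\$currentEntity"
Get-ChildItem -Path $source -Recurse -File |
    Where-Object {
        $_.Name -like "*$($ratesToImport[0])*" -or
        $_.Name -like "*$($ratesToImport[1])*" -or
        $_.Name -like "*$($ratesToImport[2])*"
    } |
    Copy-Item -Destination "C:\Users\$UserName\Desktop\Spanish Docs\"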

kumba
Nov 8, 2003

I appreciate the nudge. I managed to get it to work like so:

code:
        Get-ChildItem "G:\Docs\Pew Pew Inc\*\*\*\Archive\$currentEntity\*" -Recurse -Include $ratesToImport[$j] | 
        select FullName, LastWriteTime | 
        sort LastWriteTime -Descending | 
        select -First 1 | 
        select -ExpandProperty FullName | 
        Copy-Item -Destination "C:\Users\$UserName\Desktop\Spanish Docs\$currentEntity\"
It performs like absolute garbage, presumably because of the triple wildcard in the filepath and all the sorting I need to do. I couldn't find a way around that, and unfortunately I have no power to modify the folder structure. Any tips on optimizing something like this or am I pretty much SOL?
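
For what it's worth, one guess at the main cost: the *\*\* wildcard forces Get-ChildItem to enumerate every year/month/day folder on the share on every pass, and the intermediate Select-Object builds throwaway objects before the sort. If the newest matching file always lives in the newest day's folder, the $mostRecentYear/$mostRecentMonth/$mostRecentDay variables computed earlier could narrow the walk to a single Archive folder - an untested sketch, same variables as above:

code:
# Untested sketch: point at the one most-recent day instead of the *\*\*
# wildcard, sort the FileInfo objects directly, and pipe the winner onward.
$archive = "G:\Docs\Pew Pew Inc\$mostRecentYear\$mostRecentMonth\$mostRecentDay\Archive\$currentEntity"
Get-ChildItem -Path $archive -Recurse -File -Include $ratesToImport[$j] |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1 |
    Copy-Item -Destination "C:\Users\$UserName\Desktop\Spanish Docs\$currentEntity\"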

kumba
Nov 8, 2003

stupid newbie question:

I have a summarization process I run on a monthly basis that combines data from a poo poo load of disparate spreadsheets (67 of them, to be exact). Part of this summarization relies on Excel's INDIRECT() function in the main workbook to build filepath strings based on other criteria, yadda yadda. What this means is that for my summarization to work, I have to physically open every single workbook I want to pull data from. I hate doing this manually and would rather have something do it for me while I walk away for a minute.

I wrote a script to open all the proper workbooks containing the string I need, and that part works just fine. The problem I'm running into is that having 68 spreadsheets open all at once crashes my system, so I'd like to add a pause about halfway through. How do I do that? I found a way to pause in between, but the way variable increments work in PowerShell is apparently not what I would have thought, and the pause fires off between EVERY workbook instead of at the increment I wanted. For example:

code:
$string = "*string containing wildcards*.xlsm"
$array = Get-ChildItem -Path "\\my\file\path\goeshere" -Include $string -Recurse
for($i=0; $i -lt $array.Length; $i++)
    { 
      $item = $array[$i]
      Invoke-Item $item 
      if($i = 30)
        {
        read-host "PRESS ENTER TO CONTINUE..."
        }
    }
I would have thought that this would open up the first 30 (31, I guess, technically) workbooks, then pop up a message window where I'd hit enter or click OK to proceed, but instead this window opens between every workbook. That doesn't make sense to me, but this is my 2nd time using PS ever so I'm sure I'm doing something stupid. I tried using just foreach($item in $array) as well, but the same thing happened.
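
The culprit is the if condition: in PowerShell, = is assignment and -eq is comparison, so if($i = 30) stuffs 30 into $i on every pass, and since the assigned value (30) is truthy, the branch fires every time - it also clobbers the loop counter. The fix, for the record:

code:
# -eq compares; = assigns (and here the assignment also broke the loop counter)
if($i -eq 30)
    {
    Read-Host "PRESS ENTER TO CONTINUE..."
    }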

kumba
Nov 8, 2003

see, I told you it was something stupid. thanks, y'all


FISHMANPET posted:

Is the problem all 68 workbooks opening at the same time, or having all 68 workbooks open at the same time? If it's just that launching them all at once causes your system to freak out, I'd add Start-Sleep -Seconds 5 after every Invoke-Item. This will cause your script to wait 5 seconds between calls of Invoke-Item.

Also, PowerShell is way better at iterating than the way you're doing it. This doesn't really change how your script runs, but it's much better PowerShell:
code:
$string = "*string containing wildcards*.xlsm"
$workbooks = Get-ChildItem -Path "\\my\file\path\goeshere" -Include $string -Recurse
foreach ($workbook in $workbooks)
    { 
      Invoke-Item $workbook
      Start-Sleep -Seconds 5
    }
I guess you do lose your iterator $i, but I suspect you don't actually need that.

yeah, I tried using that method first and then figured I'd go the old for-loop route to see if it worked any differently, when the problem was just the comparison operator I was using. definitely going back to the foreach method

thanks again everyone

e: after iterating a few times I could not get the if(($array.IndexOf($item) % 20) -eq 0) check to work for some reason; it just refused to trigger. instead I used a foreach loop, incremented a counter variable myself, and used that. ended up with

code:
$string = "*string containing wildcards*.xlsm"
$array = Get-ChildItem -Path "\\my\file\path\goeshere" -Include $string -Recurse | Sort-Object -Property name
$i = 0
foreach ($item in $array)
    { 
      Invoke-Item $item 
      Start-Sleep -Seconds 3
      $i++
      if(($i % 20) -eq 0)
        {
        read-host "Pausing after 20 workbooks, press enter to continue!"
        }
    }
which works perfectly, and as an added bonus, after I discovered the Sort-Object cmdlet they now actually open in the order I want them to. hooray automation

thanks again goons

kumba
Nov 8, 2003


PBS posted:

What's the purpose of leaving them open then?

If you close the workbook, the formulas that point at it revert to a #REF error, because INDIRECT can't read from a workbook that isn't open. Unfortunately, my process involves copying + pasting values over the formulas so the results stick. Hence: open 20 workbooks, pause while I copy + paste values, continue through the rest of them :v:

We're getting a permanent, actual solution to this problem in a few months; this is just a stopgap in a monthly reporting process until our infrastructure teams get everything in place.

e: for context, I'm a supervisor of an analytics team in a call center that does agent performance grades, and this is part of a monthly summarization of those graded calls
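
Until then, the manual part of the stopgap could in principle be scripted too: Excel's COM interface is drivable from PowerShell, and assigning a range's Value2 back to itself is a standard trick for overwriting formulas with their current results. A hypothetical sketch - the summary workbook path and sheet name are invented here, and it still requires the machine to survive having all the sources open at once:

code:
# Hypothetical sketch: open the sources, recalculate, then freeze the summary
# sheet's formulas to values. The path and sheet name are made up.
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $true
$main = $excel.Workbooks.Open("C:\Reports\MonthlySummary.xlsm")
foreach ($item in $array) { $excel.Workbooks.Open($item.FullName) | Out-Null }
$excel.Calculate()
$used = $main.Worksheets.Item("Summary").UsedRange
$used.Value2 = $used.Value2   # copy + paste values, effectively
$main.Save()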

kumba
Nov 8, 2003

Performance improvement question:

I'm a SQL guy starting to dabble in PowerShell, and I have a script that works but is unbelievably slow. I'm hoping for some pointers, because some of this is obviously not well-constructed.

Goal: I have a huge slew of .html files on a network drive, buried in subfolders upon subfolders - each individual file is the transcript of a chat conversation between an existing/potential customer and the agent handling the chat. I want to extract the contents of these HTML files, remove all the bullshit/extraneous HTML nonsense, and be left with a list of English words I can use to generate a simple word cloud, to get an idea of the most common types of questions, objections, etc. that we're facing.

This is what I have so far:

code:
#Create empty txt file
Set-Content -Path C:\temp\chats\test.txt ""
#Get all .html files in all subfolders that have to do with Chat - Unfortunately doing a string search is my only reliable way to identify these from the other items in these folders
Get-ChildItem -Path Z:\ -Include "*_CS Chat_*.html" -File -Recurse | ForEach-Object {
    $contents = Get-Content -Path $_.FullName.ToString() 
    $arrayToParse = $contents.Split(' ')
    $newArray = @()
#There's gotta be a better way to do this than this list of exclusions, right?
    $exclusions = @('number','number?','number.','number,','chatting','contacting','account','account.','account?','account,','please','thank','display:','.attachmentIcon','inline-block;','background-size:','background-image:','styleheadbodybCampaign','width:','height:','16px;')
    foreach($word in $arrayToParse) 
    {
#There are also *shudder* pdfs embedded in some of these .html files so I need to remove huge strings of alphanumeric characters, hence the length limitation, among other things
        if(($word.Length -lt 30) -and ($word.Length -gt 0) -and (-not $word.Contains("https")) -and (-not $word.Contains("Omni")) -and (-not $word.Contains("<")) -and (-not $word.Contains(">")))
            {
            if ($word -in $exclusions)
                {
                #Write-Host "Skipping " $word
                }
            else    
                {
                #Write-Host $word
                $word = $word -replace '[\W]',''
                $newArray += $word
                }
            }
    }
    Add-Content -Path C:\temp\chats\test.txt -Value $newArray
}
My primary question has to do with the filtering of text. There's a large amount of bullshit formatting in these documents that I need to strip out to make sense of them, and the above was my way of brute-forcing it: run the script, see what garbage comes out, and add it to the exclusions. Surely there must be a better way to do this??

My first immediate thought is to do the regex replace before comparing to the $exclusions array, so I can at least get rid of the silly formatting-induced duplicates, but I'm not sure where to go from there. Any pointers would be super appreciated :)

e: I guess most of the delay is probably from it being several gigs of data on a network drive instead of on my local machine, so any performance improvement will probably be limited
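
A few structural things likely dominate here beyond the network share: $newArray += $word copies the whole array on every append, Get-Content without -Raw creates one string object per line, and the chain of .Contains() checks runs for every word in every file. A sketch of the same idea - one tag-stripping regex pass per file, a HashSet for the stopwords, and a List instead of array += - with the caveats that the exclusions list is abbreviated, the tag regex is the usual crude approximation of real HTML parsing, and none of this is tested against the actual files:

code:
# Untested sketch of the same pipeline, restructured for speed.
$exclusions = [System.Collections.Generic.HashSet[string]]@(
    'number','chatting','contacting','account','please','thank')
$words = [System.Collections.Generic.List[string]]::new()
Get-ChildItem -Path Z:\ -Include "*_CS Chat_*.html" -File -Recurse | ForEach-Object {
    # One string per file instead of one per line
    $text = Get-Content -Path $_.FullName -Raw
    # Crude but cheap: drop anything that looks like an html tag in one pass
    $text = $text -replace '<[^>]+>', ' '
    foreach ($word in ($text -split '\s+')) {
        # Strip punctuation first so 'account.' and 'account,' collapse together
        $word = ($word -replace '[\W]', '').ToLower()
        # Skip empties, embedded-pdf blobs (long runs), urls, and stopwords
        if ($word.Length -gt 0 -and $word.Length -le 30 -and
            -not $word.StartsWith('http') -and -not $exclusions.Contains($word)) {
            $words.Add($word)
        }
    }
}
Set-Content -Path C:\temp\chats\test.txt -Value $words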

kumba
Nov 8, 2003

Thank you all for the suggestions!!

I posted it more as a learning exercise, because this is one of those things I only really needed once - and while it took an hour to run, my feeble script got the job done :D

I appreciate the feedback and have learned a bunch already!!
