icantfindaname
Jul 1, 2008


I need a way to parse large amounts of Excel data. For instance, I have an Excel column with values like "from 5 to 7", "less than 8", and "10 and up", and I need to extract 5 and 7, 8, and 10 respectively. The problem is that the entries don't all follow the same pattern, so I need some way of telling them apart.

This can be done with a fairly simple VBA script, right? I'll have to look up how to write one, but is that what I should be looking into?
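One common approach is one regular expression per phrasing, tried in order, with a loud error when nothing matches so new phrasings get noticed. Here is a minimal sketch in Python rather than VBA (a swapped-in language), assuming the three phrasings above are the only ones in the column. The names `PATTERNS` and `parse_range` are hypothetical, and `None` marks an open end of the range.

```python
import re
from typing import Optional, Tuple

NUM = r"(\d+(?:\.\d+)?)"  # an integer or decimal number

# Assumption: only these three phrasings occur. Each entry pairs a pattern
# with a function that turns the match into a (low, high) tuple.
PATTERNS = [
    (re.compile(rf"^\s*from\s+{NUM}\s+to\s+{NUM}\s*$", re.I),
     lambda m: (float(m.group(1)), float(m.group(2)))),
    (re.compile(rf"^\s*less\s+than\s+{NUM}\s*$", re.I),
     lambda m: (None, float(m.group(1)))),
    (re.compile(rf"^\s*{NUM}\s+and\s+up\s*$", re.I),
     lambda m: (float(m.group(1)), None)),
]


def parse_range(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (low, high) for a cell such as 'from 5 to 7'.

    Raises ValueError for text that matches none of the known phrasings.
    """
    for pattern, build in PATTERNS:
        m = pattern.match(text)
        if m:
            return build(m)
    raise ValueError(f"unrecognised range text: {text!r}")


if __name__ == "__main__":
    for cell in ["from 5 to 7", "less than 8", "10 and up"]:
        print(cell, "->", parse_range(cell))
```

The tuple does not record whether a bound is inclusive ("less than 8" versus "8 and up"), so add a flag if that matters. The same pattern-per-case idea carries over to VBA through the "Microsoft VBScript Regular Expressions 5.5" RegExp object.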


icantfindaname
Jul 1, 2008


Is there any straightforward way of scraping data from websites? Specifically, I'm after information about parks and recreation departments from local government websites. I found a program, Mozenda, that can do most of it, but certain sites have formatting archaic or messed up enough that it doesn't work properly. For example, LA's site doesn't work with it, and NYC doesn't have a tabular listing of its programs at all. I have a feeling this is a pretty complicated thing, but it doesn't hurt to ask. Am I stuck choosing between services like Mozenda, with no guarantees, and coding it myself from scratch?
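Coding it yourself doesn't have to start from zero. For pages that do use HTML tables, Python's standard-library `html.parser` can pull out the rows. Here is a minimal sketch, assuming the data sits in `<tr>`/`<td>`/`<th>` markup. The class and function names are hypothetical, and it tolerates the missing `</td>`/`</tr>` tags common on older sites but not nested tables.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _flush_cell(self) -> None:
        # Finish a pending cell, collapsing internal whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self) -> None:
        # Finish a pending row; empty rows are skipped.
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()  # an unclosed previous cell ends here
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._flush_row()


def extract_rows(html: str) -> List[List[str]]:
    """Return every table row in the HTML as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The HTML itself could be fetched with `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")`. Pages like NYC's, which have no tabular listing, would still need per-site logic keyed to whatever markup they actually use.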

icantfindaname
Jul 1, 2008


Alright, thanks for the suggestions. I'll look into those.

icantfindaname
Jul 1, 2008


I've been asked to port the rpart package from R to Stata. Is that even possible? I'm looking at the Mata documentation now, and it looks close enough to C that you could do it, but is there any sort of package management in Stata at all?

https://cran.r-project.org/web/packages/rpart/index.html

icantfindaname
Jul 1, 2008


Are there any applications that use COBOL that don’t really have good alternatives? Every now and then you read news articles about how state unemployment systems and the like run on COBOL, but are there actually meaningfully better ways of doing that than a mainframe and some COBOL programs? If you were to implement such a system from scratch in 2022, what tech would you use? Some Java EE thing?

icantfindaname
Jul 1, 2008


This may be more appropriate for the data engineering thread, but what is the best way to extract tables from PDF files using Python? I'm looking at these reports; the tables in them are pretty simple and don't seem like they should be that hard to extract. It's not an image, it's text with formatting.

https://www.federalreserve.gov/supervisionreg/dfast-archive.htm



I've tried pypdf, the built-in Linux pdftotext, Tabula, and Camelot, and none of them do it properly out of the box. Is there anything that does, or is this not really a solved problem? Obviously just copy-pasting it would be easy, but I'm doing this out of curiosity.
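For simple, ruled-free tables, one low-tech heuristic is to keep the page layout when converting to text and treat runs of two or more spaces as column gaps. This is not what any of the tools above does internally. Here is a minimal sketch, assuming the text came from something like `pdftotext -layout` and that no cell contains a double space or wraps onto a second line. The function name and the sample text are hypothetical, not taken from the Fed reports.

```python
import re
from typing import List

# Two or more whitespace characters count as a column gap;
# a single space stays inside a cell ("Bank A Corp").
COLUMN_GAP = re.compile(r"\s{2,}")


def split_layout_lines(text: str, min_cols: int = 2) -> List[List[str]]:
    """Split layout-preserved text into rows of cells.

    Lines with fewer than min_cols cells (titles, footnotes, blank
    lines) are dropped, which is a rough way to keep only table rows.
    """
    rows = []
    for line in text.splitlines():
        cells = [c for c in COLUMN_GAP.split(line.strip()) if c]
        if len(cells) >= min_cols:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    # Invented sample in the shape pdftotext -layout tends to produce.
    sample = (
        "Table 1. Capital ratios\n"
        "Bank            Actual    Minimum\n"
        "Bank A Corp      12.1       8.4\n"
    )
    for row in split_layout_lines(sample):
        print(row)
```

To get the input, `subprocess.run(["pdftotext", "-layout", "report.pdf", "-"], capture_output=True, text=True).stdout` writes the converted text to stdout (`report.pdf` is a placeholder). Wrapped cells and multi-line headers break this heuristic, and that is largely why general PDF table extraction remains hard.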


icantfindaname
Jul 1, 2008


mystes posted:

Is that table not in the csv files also linked there?

It is, but I’m trying to get it from the PDF directly as a challenge.
