|
Question: What is the proper way to break out these email addresses into separate rows? I get a response from an API when I look up an email address for someone, formatted like this: code:
I will eventually send this person an email at both addresses, so I need to break this into one email per line, like:
UserName1 | greatemailname@gmail.com
UserName1 | greatemailname6@yahoo.com
UserName2 | Email1
UserName2 | Email2
UserName2 | Email3
Username3 | Email1
|
# ¿ Jan 26, 2019 00:52 |
|
Bundy posted:Can we not goon up what had until now been a nice, interesting and informative thread, with a stupid slapfight thanks just thanks. This, I really enjoyed the MSS chat as I've used both OpenCV w/python and pyautogui but hadn't heard of MSS
|
# ¿ Feb 25, 2019 02:32 |
|
cinci zoo sniper posted:PyCharm 2019.1 is out. Summary - revamped Jupyter integration; improvements for data classes, debugging, type checking, pytest. The jupyter changes look good. Jupyter notebooks being better in firefox really decreased my use of pycharm even though, for .py files, I really liked pycharm. https://www.youtube.com/watch?v=TIZH4aPSN2E
|
# ¿ Mar 27, 2019 21:01 |
|
Boris Galerkin posted:Speaking of data, I'm looking to store several GB of CSV data in a single compressed HDF5 file and I was wondering what package I should use for that. Right now I've found h5py, pytables, and there's also xarray I guess. Are there any pros/cons for any of these? Why not just use pandas?
|
# ¿ Apr 3, 2019 20:16 |
|
Boris Galerkin posted:A few weeks ago Mozilla introduced Iodide which is some kind of notebook-like clone but for Javascript and I think one of the selling points is that it runs entirely in your browser, no external server needed. Why code on an ipad?
|
# ¿ Apr 17, 2019 15:42 |
|
Umbreon posted:Oh I agree, I'm only asking about the monthly sub, I wouldn't get anything longer than that. I'm saying that even $40 a month feels really expensive for something like this, I just want to make sure it's worth it. If you're using it to do personal projects, maybe it's a little costly... I'd probably just move on to projects from there (e.g. if you want to learn python to do data science/deep learning, go here, it is free: https://course.fast.ai/videos/?lesson=1). If you're trying to do your current job better, get your company to pay for it. If you're trying to do a new job, it is dirt cheap relative to a code bootcamp or other more formal training scenario.
|
# ¿ Apr 20, 2019 13:06 |
|
KICK BAMA KICK posted:A simple dict mapping strings to strings (can guarantee they're short, alphanumeric, no whitespace) that I want to read/write from disk -- is pickle the best option or would you rather write it as a text file? Few hundred entries at most, accessed/modified only a few times a day. Portability between implementations and platforms would be a huge plus. I do a lot of pickle reading/writing and csv reading/writing with the pandas implementation of each. Pickle is an order of magnitude faster, so I would take a pickle every time. (Bonus: it handles mixed data types.)
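For a small str-to-str dict like the one in the question, either route is only a few lines. This is a minimal sketch, not a benchmark: the function names and file paths are made up for illustration. Pickle is the option recommended above; JSON is shown next to it because it is plain text and readable from other languages, which speaks to the portability part of the question.

```python
import json
import pickle
from pathlib import Path


def save_pickle(mapping, path):
    """Write the dict with pickle (binary, Python-only format)."""
    with open(path, "wb") as fh:
        pickle.dump(mapping, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read the dict back. Only unpickle files you wrote yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def save_json(mapping, path):
    """Write the dict as human-readable text (portable across languages)."""
    Path(path).write_text(
        json.dumps(mapping, indent=2, sort_keys=True), encoding="utf-8"
    )


def load_json(path):
    """Read the dict back from the JSON text file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With a few hundred short entries touched a few times a day, the speed difference is unlikely to matter, so the choice mostly comes down to whether anything other than Python needs to read the file.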
|
# ¿ Apr 29, 2019 00:21 |
|
punished milkman posted:Anyone have any package suggestions for extracting tables of data from image files (.png/.jpg) ? I tried using Tesseract/pytesseract and while it's doing a great job of detecting the text, the tabular aspect of it is totally lost and I couldn't find a straight forward path to processing tables with it. I've used Camelot with PDFs before, and it worked OK (at best), but I'm hoping to use something else this time around. Extortionist posted:This isn't an easy problem. If the images are fairly consistent you can try using one of the tesseract outputs that supplies word coordinates and do your own table determination based on the relative positions of words. It might also be useful to run the images through opencv first to extract the positions of the lines (possibly also removing them from the image, or splitting into several small images prior to OCR). Chiming in to say Docucharm is pretty good at this if they're of a vaguely consistent format. For example, I needed to turn typed, printed and then scanned reports into structured data. I compared them to Textract (which I got access to early) and they were much better. I haven't used either in about 4 months, so I can't say if either has made large progress. CarForumPoster fucked around with this message at 22:11 on May 18, 2019 |
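The "do your own table determination" idea from the quoted reply can be sketched in plain Python. This is a rough illustration under assumptions, not a tested pipeline: it assumes you already have one `(text, left, top)` tuple per word in pixels (pytesseract's `image_to_data` output includes `text`, `left` and `top` fields that could be reshaped into this), and `group_into_rows` and `row_tolerance` are hypothetical names.

```python
def group_into_rows(words, row_tolerance=10):
    """Group OCR words into table rows by vertical position.

    words: iterable of (text, left, top) tuples, coordinates in pixels.
    Words whose `top` is within `row_tolerance` pixels of the first word
    in the current row are treated as being on the same row.
    """
    rows = []
    for word in sorted(words, key=lambda w: w[2]):  # top to bottom
        if rows and abs(word[2] - rows[-1][0][2]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order cells left to right and keep only the text.
    return [
        [text for text, _left, _top in sorted(row, key=lambda w: w[1])]
        for row in rows
    ]


# Example with made-up coordinates for a two-column table.
words = [("Name", 10, 12), ("Qty", 200, 10), ("Widget", 12, 50), ("3", 198, 53)]
table = group_into_rows(words)
```

This only works when the scans are fairly consistent and not skewed; multi-word cells would also need a column-boundary step that is not shown here.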
# ¿ May 18, 2019 22:09 |
|
Hughmoris posted:I need some advice on how to approach a problem at work. Why not just have a set of orders that get run through each time a configuration change is made?
|
# ¿ May 24, 2019 02:21 |
|
Hughmoris posted:That was my initial thought, too. I could have a defined order set, place the orders, then check what the final destination is for each order pre and post rule change. If you can really reliably find them by searching you could use python/selenium to get the data out the other side automatically.
|
# ¿ May 24, 2019 02:29 |
|
Going to start my first django project this week. If I like pycharm, is it worth getting the professional version over community because of the web development related features? If this project goes well I expect to be using it daily for several weeks at least.
|
# ¿ May 28, 2019 13:16 |
|
unpacked robinhood posted:
I had never heard of pendulum until now and have definitely been bitten by datetime issues or wrote too complicated of code for that kind of bs. Absolutely fantastic.
|
# ¿ Jun 2, 2019 21:30 |
|
Empress Brosephine posted:So I finished Python Crash Course and loved it; what should I read next to improve my skills? Is it worth learning more than the blade level of skills with Flask? Find a project you want to do to solve some problem and do it.
|
# ¿ Jun 4, 2019 02:24 |
|
General_Failure posted:I may have been hit by a pip typosquatting attack. Thanks for posting about this, I didn't know this was a thing.
|
# ¿ Jun 17, 2019 02:55 |
|
FCKGW posted:I learned python in community college a few years back and I'm going to be going back to school soon and would like to get a refresher course. I like codecademy but even more than that I like picking something you want to make and making it. You'll pick it up quickly.
|
# ¿ Jun 18, 2019 22:38 |
|
TLDR: Is pywin32 the only/easiest way to print PDFs with python on windows?
I want to print a few hundred PDFs per week in a specific order from a specific copier and with alternating printer properties on windows. Some are hole punched with paper from tray one, some are b&w only from tray 2. Batch printing greatly slows down the manufacturing process that follows printing (assembling documents and mailers).
Pywin32 looks complicated as gently caress for what should be a trivial problem. The “shell” method looks like it wouldn’t take enough arguments, leaving me reading MSDN docs and going very low level. Has there really been no progress since 2010 or so?!
Edit: I only have about 4 configs and 1 printer on 1 port I’m concerned with, so I am going to try installing the same printer 4 additional times with unique names and default settings. Not an elegant pythonic solution but should work. CarForumPoster fucked around with this message at 22:36 on Jul 6, 2019 |
# ¿ Jul 6, 2019 22:19 |
|
QuarkJets posted:Have you looked at pkipplib or win32print? Thermopyle posted:If you're asking about printing PDFs that already exist rather than actually creating PDF files with python, I'd just look into generic windows ways to send a file to the printer. Hell, a quick google shows some results for "windows print pdf from command line", which you could just do from python. Yea the PDFs are already created with python. (A report, an envelope and a shipping label) Win32print is the pywin32 method that gets hilariously complicated quick. Sending a PDF to a printer that’s already configured is pretty easy. The problem is configuring the printer to do what I want (eg hole punch) from Python. I think the solution is “install 4 copies of the printer driver with different default configs” but can’t test until Monday. CarForumPoster fucked around with this message at 01:05 on Jul 7, 2019 |
# ¿ Jul 7, 2019 00:40 |
|
Appreciate the help, guys. Thermopyle posted:If you're asking about printing PDFs that already exist rather than actually creating PDF files with python, I'd just look into generic windows ways to send a file to the printer. Hell, a quick google shows some results for "windows print pdf from command line", which you could just do from python. CarForumPoster posted:I am going to try installing the same printer 4 additional times with unique names and settings. Not an elegant pythonic solution but should work. This worked and is dead simple and pretty reliable. My printer or print spooler doesn't necessarily honor the order sent, but a little bit of waiting between jobs fixed that. CarForumPoster fucked around with this message at 12:39 on Jul 9, 2019 |
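The "one installed printer per config, plus a little waiting" approach can be sketched like this. Everything named here is an assumption for illustration: the printer names, the config keys, and the `send` callable, which stands in for whatever actually hands a PDF to a named Windows printer on your machine.

```python
import time

# Hypothetical names: one installed copy of the same copier per default config.
PRINTER_FOR_CONFIG = {
    "hole_punch_tray1": "Copier - Hole Punch",
    "bw_tray2": "Copier - BW Tray 2",
}


def print_in_order(jobs, send, delay_seconds=5.0, sleep=time.sleep):
    """Send PDFs one at a time, pausing so the spooler keeps the order.

    jobs: iterable of (pdf_path, config_name) in the order they must print.
    send: callable(pdf_path, printer_name) that submits one file.
    Returns the list of (pdf_path, printer_name) pairs that were sent.
    """
    sent = []
    for pdf_path, config_name in jobs:
        printer_name = PRINTER_FOR_CONFIG[config_name]
        send(pdf_path, printer_name)
        sent.append((pdf_path, printer_name))
        sleep(delay_seconds)  # crude, but it is the "little bit of waiting"
    return sent
```

Keeping `send` as a parameter means the ordering logic can be dry-run with a stub before any paper is involved; the right delay is something to find by trial on the actual copier.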
# ¿ Jul 9, 2019 12:34 |
|
What I want to do: Convert a .docx file containing comments to PDF, displaying the Word comments in the PDF.
Why: I have a Flask web application on Heroku where a user uploads a .docx, which is sent to S3. The app adds comments to the .docx, then returns the PDF, which is displayed in the browser.
The problem: Heroku runs on Linux, and LibreOffice doesn't render the comments in the PDF the pretty way Word on Win10 does; it instead embeds them as PDF comments. In Win10 Word O365 this is trivial: you just save it as a PDF, which you can do easily from python.
I'm looking for the simplest/best solution to output a PDF with comments that looks like it does with Word. A few things I'm considering:
-Have an Amazon Workspaces Win10 w/O365 running basically as a server and a Python script that somehow gets the files from S3, converts them on windows, returns the PDF to S3 and notifies my web app that the file is available for download. Not sure how to do this but it seems possible.
-Trying a bunch of other Linux DOCX->PDF solutions to see if they're any better.
-Try installing MS Word on Heroku. Not sure if possible. (Update: It's not)
EDIT: Some better ideas possibly:
-Use MS Office Online to convert the Word doc
-Find an API to do it for me.
EDIT2: Finding an API seems like the best solution but gently caress me if they don't suppress the comments output. Tried 3 so far. One works, but the docs are confusing. 2 don't work.
EDIT3: Solved it using Zamzar. It just so happens that's how their DOCX->PDF conversion works, and their docs are great. CarForumPoster fucked around with this message at 20:00 on Jul 9, 2019 |
# ¿ Jul 9, 2019 13:14 |
|
Thermopyle posted:What you're looking for is called a task queue. Me and another dev just made babbys first deployed web app and this is exactly what we did for a function that takes about 3 minutes to run. As a pro tip, the RQ/Redis/Flask/Dash combo doesn't play nice on windows 10. The worker.py file we had to grab poo poo out of the queue didn't work, so stuff just stacked up in redis. The front end "web" worker times out after 30s on heroku, so we also had to figure out how to not use a while loop to ask the queue if our jobs were done yet. Still kinda working on that last bit, but it was confusing for me for a while. CarForumPoster fucked around with this message at 02:58 on Jul 19, 2019 |
# ¿ Jul 19, 2019 02:49 |
|
Thermopyle posted:This reminds me of when I was first getting into web stuff...it was very confusing and nebulous and magical for a long time to me. This is me
|
# ¿ Jul 20, 2019 16:14 |
|
In case anyone else had this problem... I started to type quote:Is there a better way than a file I git ignore to store secrets like API keys and what not? Like maybe an AWS service that can only be accessed by whitelisted IPs? But then I googled like a good boy and there is, and it works fine through boto3: https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/
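A minimal sketch of reading a secret through boto3, assuming the secret was stored as a JSON object of key/value pairs and that AWS credentials and region are already configured in the environment. The helper name `get_secret` and the secret id in the usage note are made up for illustration.

```python
import json


def get_secret(secret_id, client=None):
    """Fetch a secret from AWS Secrets Manager and parse it as JSON.

    Assumes the secret's SecretString holds a JSON object, e.g. {"api_key": "..."}.
    A client can be passed in so the function is testable without AWS.
    """
    if client is None:
        import boto3  # imported lazily; only needed for real AWS calls

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

Usage would look something like `get_secret("my-app/api-keys")["api_key"]`, with access controlled by the IAM permissions of whatever role or user is running the code.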
|
# ¿ Jul 24, 2019 13:11 |
|
If you have a helpful link as an answer, that'd be great, don't feel like you need to give me a detailed, personalized answer. My Problem: I'm starting to have more and more python/selenium based web scrapers that get very similar data from different gov websites. For example: I scrape the same data using 6 templates from 6 different government websites and format all the data to go in the same google sheet. Right now, I can run all of these scrapers in about 10-15 minutes locally. We're considering upping this to 50 government websites/day, which goes a bit beyond what I can run sequentially on my local machine. Instead I'd like to run them simultaneously and automatically, simply making a log so I can catch any errors. One of them will run into errors about 1 in every 4 or 5 days, but with 5X the websites it'll likely happen daily for a while. TLDR: What's the current industry standard to scrape 50+ websites per day using separate processes/workers? Is it time to figure out AWS lambda? What's the industry standard way to log this? CarForumPoster fucked around with this message at 21:08 on Jul 28, 2019 |
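Before reaching for Lambda, the "run them simultaneously and log the failures" part can be sketched with the standard library alone. This is an illustration under assumptions, not an industry standard: `run_all` is a hypothetical helper, each scraper is assumed to be a zero-argument callable, and threads are used on the assumption that the scrapers mostly wait on browsers and network.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("scrapers")


def run_all(scrapers, max_workers=8):
    """Run every scraper concurrently; one failure does not stop the rest.

    scrapers: dict mapping a site name to a zero-argument callable.
    Returns (results, failures): scraped data by name, and error text by name.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func): name for name, func in scrapers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
                log.info("%s finished", name)
            except Exception as exc:
                # log.exception records the full traceback for later review
                log.exception("%s failed", name)
                failures[name] = repr(exc)
    return results, failures
```

Calling `logging.basicConfig(filename="scrapers.log", level=logging.INFO)` once at startup would send all of this to a file. `ProcessPoolExecutor` offers the same `submit` interface if separate processes turn out to be necessary, provided the scraper functions are importable at module level.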
# ¿ Jul 28, 2019 21:06 |
|
Thermopyle posted:I don't think there's really an industry-standard way. It kind of just depends on what your infrastructure is like. I have what I feel like is a stupid question... should I just make a function for each template then run it on AWS Lambda? Any real world experience as to why I'd do that versus trying to figure out python-rq?
|
# ¿ Jul 29, 2019 01:47 |
|
Thermopyle posted:I've done both and both are pretty easy. Thanks for the encouragement. That’s a good point regarding the ip ranges. I have a hilarious amount of AWS credits, more than I can use before they expire. My current infrastructure is nothing; literally all our data is on S3 or sharepoint. There’s one switch in our office for the Ethernet jacks in the walls.
|
# ¿ Jul 29, 2019 17:09 |
|
i vomit kittens posted:I'm having some trouble working with datetime in a small app I'm creating. I'm able to convert a datetime object to a string using strftime, but when I try to convert the exact same string back into a datetime using strptime in order to search a database for it, I'm told that the formatting I'm using is not valid even though I literally copy/pasted the format from the strftime function. Poster above me hit it. I deal with a lot of date strings gotten from a variety of web scraping and API calls. Working with datetime is a bitch in that use case. I've come around to prefer pendulum. Best part is much of the syntax is the same, it's just a little easier to use when dealing with date strings from the wild. Solumin posted:Can I ask why you're converting a datetime object to a string and back? My use case for this is that I get them in a bunch of formats and then I use those dates in one format to do things like send emails via boto3/Amazon SES that have the date included. CarForumPoster fucked around with this message at 16:16 on Aug 4, 2019 |
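For reference, the strftime/strptime round trip looks like the sketch below when it works. The format string and helper names are assumptions for illustration; the usual culprit when it fails is that the string being parsed is not exactly what strftime produced (stray whitespace, a different format from another source, and so on).

```python
from datetime import datetime

# One format string shared by both directions keeps them in sync.
FMT = "%Y-%m-%d %H:%M:%S"


def to_text(moment):
    """datetime -> string in the shared format."""
    return moment.strftime(FMT)


def from_text(text):
    """string -> datetime; strips surrounding whitespace before parsing."""
    return datetime.strptime(text.strip(), FMT)


original = datetime(2019, 8, 4, 16, 14, 0)
round_tripped = from_text(to_text(original))
```

A string in any other shape, such as `"08/04/2019 16:14"`, raises `ValueError` with this format, which is where a more forgiving parser like pendulum earns its keep for dates from the wild.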
# ¿ Aug 4, 2019 16:14 |
|
unpacked robinhood posted:What's the correct way to keep track of a list of files on disk in a db, with irregular user chosen names ? I have this use case, and the thing that worked for me is sending the file to S3 or SharePoint and just storing the URL and whatever content I want in that row. EDIT: I should mention, we also replaced the filenames with UUIDs when storing because we immediately ran into collisions with filenames. CarForumPoster fucked around with this message at 02:04 on Aug 6, 2019 |
# ¿ Aug 6, 2019 00:40 |
|
unpacked robinhood posted:Thanks. Do you simply call uuidN() to generate a value and use it as filename ?
import uuid
job_id = str(uuid.uuid4())
Then we append the extension to the job_id.
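Spelled out a bit more, the "UUID plus original extension" step could look like this. `storage_name` is a hypothetical helper, and lowercasing the extension is my own choice for the sketch, not something from the original setup.

```python
import uuid
from pathlib import Path


def storage_name(original_filename):
    """Build a collision-free stored name: random UUID + the original extension."""
    extension = Path(original_filename).suffix.lower()  # e.g. ".pdf"
    return f"{uuid.uuid4()}{extension}"


# The user-chosen name is kept in the database row; only the UUID name hits storage.
stored = storage_name("Quarterly Report (final) v2.PDF")
```

The database row then holds the original name, the stored name or URL, and whatever else you need to search on.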
|
# ¿ Aug 6, 2019 20:26 |
|
KICK BAMA KICK posted:From a while back, about task queues for a web app This is helpful, I'm starting down the path of a Django project now that will end up with workers calling APIs, and I'm pretty new to web app development.
|
# ¿ Aug 9, 2019 13:49 |
|
EVIL Gibson posted:Mainly with GAN networks and running them in windows. If you install python normally, it needs to have a lot of environment settings set. Conda manages all of that and I can simply use one that works and clone it rather than hoping I changed all the settings correctly to use the python environment I freshly installed (a problem with lots of applications, both Mac and PC, where it leaves cruft behind). I don't do it professionally, but I recall that setting up Tensorflow (GPU version) and getting CUDA to work right on Win 10 at that time (~1 year ago) required a shocking amount of work compared to what I was expecting. I generally prefer conda installing for exactly the reasons above, and pip install only when a conda version isn't available.
|
# ¿ Aug 12, 2019 21:30 |
|
i vomit kittens posted:Does anyone have any recommendations for a good 2D drawing library? I'm using Pillow right now but it's kind of limited as far as shapes go. Aggdraw is pretty much exactly what I'm looking for, but it's Python 2 only. I know almost nothing about this, but the free vector drawing software Inkscape seems to use python for extensions. There may be an interface there that lets you drive a rather robust piece of drawing software with python.
|
# ¿ Sep 3, 2019 02:49 |
|
I'd like your experiences using jupyter notebooks for daily or every-30-min cron jobs. We have some web scrapers and API calls that happen at a certain time every day, run on an Amazon Workspace. We're about to refactor some important code related to this and we're kinda half .py files and half .ipynb at this point. Generally the .ipynb files have the actual code a human would review and the .py files have scraper templates and 50+ helper functions in them. I really like the idea of using an ipynb for documentation. What do you guys do? Anyone ditched .py files for ipynbs? We just started down this road, any thoughts on automatically scraping 10+ websites per day using jupyter notebooks?
|
# ¿ Sep 7, 2019 22:22 |
|
Sad Panda posted:I'm writing a piece of software which will check the contents of a folder for files and display whether they are found or not. It will be used to check if students have submitted their work. IMO if you're doing this for students and make it good, and you're just their dumb ol teacher that built it, it'll probably get a few of them interested in coding. I would accomplish this using Dash. Dash uses python (Flask) to make pretty dashboards relatively easily. I'd have a function that does as you say continuously and plots the output. There's lots of open source examples: https://dash-gallery.plotly.host/Portal/ And components, such as an upload button: https://dash.plot.ly/dash-core-components ...and it even makes the HTML layout easy: https://dash.plot.ly/dash-html-components If you like bootstrap, there's also this: https://dash-bootstrap-components.opensource.faculty.ai/l/components CarForumPoster fucked around with this message at 13:18 on Sep 8, 2019 |
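Whatever ends up displaying it, the "is each expected file in the folder" check itself needs nothing beyond the standard library. A minimal sketch, assuming one expected filename per student; `submission_status` and the example filenames are hypothetical.

```python
from pathlib import Path


def submission_status(folder, expected_names):
    """Return {filename: True/False} for whether each expected file is present."""
    present = {path.name for path in Path(folder).iterdir() if path.is_file()}
    return {name: name in present for name in expected_names}


def format_report(status):
    """Turn the status dict into simple lines suitable for printing."""
    return [
        f"{name}: {'found' if found else 'MISSING'}"
        for name, found in sorted(status.items())
    ]
```

Something like `print("\n".join(format_report(submission_status("submissions", ["alice.py", "bob.py"]))))` gives a plain text version, and the same dict could feed a Dash table later if pip access gets sorted out.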
# ¿ Sep 8, 2019 13:15 |
|
Sad Panda posted:That looks super pretty, but seems to involve pip install and I still don't think I've got pip install privs. I was on a super locked down comp at a large defense contractor and could still pip install things. I had to use the proxy param. Can you browse to pypi.python.org in your browser?
|
# ¿ Sep 8, 2019 13:54 |
|
Dominoes posted:I don't get along well with Docker, and am looking for help on an open-source project. You might even be interested in using this later, if not now. What I'm specifically asking for help on is a proof-of-concept for getting official binaries hosted on python.org. I’d like to suggest amazon workspaces free tier for windows. Takes about 4 seconds to set up.
|
# ¿ Sep 10, 2019 03:42 |
|
Zerilan posted:Humble Bundle has a python bundle again, https://www.humblebundle.com/level-up-your-python. I'm starting to apply again to some PhD programs for numerical analysis and machine learning, and Python commonly comes up as a desired language. I'm pretty used to doing math stuff in MATLAB and C, but less so in Python. Admittedly I'm a couple years out of my masters' now and had to settle for a job outside my field so pretty rusty on what I do know. How useful would the books/tools in this bundle be to self-teaching myself some python compared to whatever free online (or cheaper books) I could find out there? IMO the best way to learn python is to find something you want to do and do it. There’s a stack overflow post or github pull request for literally everything.
|
# ¿ Sep 14, 2019 00:24 |
|
I can't think of a super easy way to do this and I bet there is one. I have a pandas df: code:
code:
CarForumPoster fucked around with this message at 14:19 on Sep 14, 2019 |
# ¿ Sep 14, 2019 14:16 |
|
SurgicalOntologist posted:There may be a shortcut or more clever way, but if you can assume that the list is the same length for all of them you can use the .str accessor to access list items (I never understood why this is is in the string accessor but it's handy). Melt looks like what I want, never thought to describe it as unpivoting. Thank you!
|
# ¿ Sep 14, 2019 15:25 |
|
a foolish pianist posted:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html#pandas.DataFrame.explode This owns, exactly what I want, thanks! I love you thread, you're so much nicer to me than stack overflow.
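For anyone landing here later, this is roughly what the explode approach looks like. It is a sketch with made-up data: the column names `user` and `emails` and the addresses are placeholders, since the original dataframe isn't reproduced here, and `DataFrame.explode` needs pandas 0.25 or newer.

```python
import pandas as pd


def one_email_per_row(df, column="emails"):
    """Turn a column of lists into one row per list item."""
    return df.explode(column).reset_index(drop=True)


df = pd.DataFrame(
    {
        "user": ["UserName1", "UserName2", "UserName3"],
        "emails": [
            ["a@example.com", "b@example.com"],
            ["c@example.com", "d@example.com", "e@example.com"],
            ["f@example.com"],
        ],
    }
)
long_df = one_email_per_row(df)
```

Unlike the `.str` accessor plus melt route, this doesn't need every list to be the same length.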
|
# ¿ Sep 14, 2019 22:13 |
|
I love this thread
|
# ¿ Sep 18, 2019 21:52 |