|
Not having read the problem description or the replies (much), I'm going to go ahead and say that SQLAlchemy is the bee's knees. Basically, use SQLAlchemy Core, ignore that the ORM exists, and mix and match textual SQL, reflection, transactions etc. to your heart's delight. This probably won't solve your problems (especially not if you actually have 200k columns and not rows -- for comparison, the maximum number of columns for a PostgreSQL table is ~1600), but it's nice software. This is also a great snype...
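To show what "Core without the ORM" looks like, here's a minimal sketch: textual SQL inside an explicit transaction. The sqlite URL and the `calls` table are illustrative stand-ins; for real use you'd swap in a Postgres URL.

```python
# Minimal SQLAlchemy Core sketch (no ORM): textual SQL in an explicit
# transaction. The sqlite in-memory URL is just for illustration.
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:  # commits on success, rolls back on error
    conn.execute(text("CREATE TABLE calls (id INTEGER, priority INTEGER)"))
    conn.execute(text("INSERT INTO calls VALUES (1, 2), (2, 1)"))
    rows = conn.execute(
        text("SELECT id FROM calls ORDER BY priority")
    ).fetchall()
```

The same `conn` also accepts Core expression-language constructs, so you can mix the two styles freely.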
|
# ? Apr 6, 2020 22:43 |
|
|
# ? May 14, 2024 09:40 |
|
DoctorTristan posted:I can only give vague advice on this kind of vague info, but it sounds like you are trying to do something quite difficult and have a moderate-to-severe case of running before you can walk. Yeah, this is definitely being popped out of the womb and told to fly a spaceship. A lot of my time has been spent on Stack Exchange or deep in the confines of the pandas reference pages. To answer your question: 6AM me got it backwards, it's 200,000 rows long with 60 columns. Each row represents one call, and the columns provide the lat/long of the address they were going to, the time the phone was picked up, the time the unit was sent, the time the unit got there, the time the unit left, whether the unit was canceled, in what priority order they arrived, and a bunch of other things. The dataframe represents the entirety of one year, as they want an analysis done on a yearly basis. The output ideally should be "during this year, here are the locations that had the most of these saturated incidents" and would most likely be tabular, because my main gig is geospatial analysis, so I can super easily plunk that into ArcGIS and run with it. Thankfully, my internship folks prefaced giving me this project with "we definitely can't do it, and you don't have access to the VPN during quarantine, so maybe you might be able to try?" They're optimistic about a final product but understand if I don't get too much done, what with me not really knowing Python beyond the basics.
|
# ? Apr 6, 2020 23:59 |
|
Anybody using rq -- I've got a few jobs that can get a few types of errors from an API they use and each might need to be handled differently (some jobs should just be requeued if the error looks like a transient "servers down" thing; another is scheduled to repeat anyway and can just wait till next time; and then there's errors that might be like bad credentials that should put everything on hold until I check the situation out.) I was going to attach a dict with instructions about how to handle each case to the metadata you can define on the job decorator, and evaluate that in the function you can designate as the job failure handler; just checking if there's a better strategy I'm overlooking cause this seems like a common situation somebody here might already have solved.
|
# ? Apr 7, 2020 00:13 |
|
CarForumPoster posted:You'll likely enjoy this part of the fast.ai deep learning course as they cover image localization/segmentation: Well the sound chip in my laptop took a poo poo, I guess, so I had to switch to the aux out on my DisplayPort monitor. Once I got that figured out yeah, it is a pretty good video and project. My boss has a bit of a hardon for TensorFlow unfortunately and I’m trying to convince him just to use PyTorch even for experimental purposes.
|
# ? Apr 7, 2020 00:46 |
|
KICK BAMA KICK posted:Anybody using rq -- I've got a few jobs that can get a few types of errors from an API they use and each might need to be handled differently (some jobs should just be requeued if the error looks like a transient "servers down" thing; another is scheduled to repeat anyway and can just wait till next time; and then there's errors that might be like bad credentials that should put everything on hold until I check the situation out.) https://python-rq.org/docs/exceptions/ Depending exactly on what your jobs look like, I'd consider using exception handlers for your workers.
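A sketch of what that could look like: rq calls each registered handler as `handler(job, exc_type, exc_value, traceback)`, and a handler that returns False stops the chain while True/None falls through to the next one. The handler names here and the `job.requeue()` call are illustrative assumptions about your setup, not rq gospel.

```python
# Sketch of rq-style exception handlers dispatching on exception type.
# Handlers would be registered along the lines of:
#   Worker(queues, exception_handlers=[retry_transient, ...])
# Returning False stops the handler chain; True/None falls through.

TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def retry_transient(job, exc_type, exc_value, traceback):
    # Requeue jobs that failed with a "servers down"-style error.
    if issubclass(exc_type, TRANSIENT_ERRORS):
        job.requeue()     # assumed to put the job back on its queue
        return False      # handled: stop the chain
    return True           # not ours: let the next handler (or default) run
```

Each error category gets its own small handler, so the dispatch dict from the original plan collapses into the handler chain itself.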
|
# ? Apr 7, 2020 17:36 |
|
Thermopyle posted:https://python-rq.org/docs/exceptions/ Python code:
Python code:
|
# ? Apr 7, 2020 19:30 |
|
That seems fine. Another option is using different queues for the different types of jobs. Then you start a worker for each type of queue, and the worker has its own exception handling routine for the types of jobs it runs. This way you're not doing a bunch of branching and isinstance-ing in your handlers. Sorry if I'm not clear or not understanding your question very well. I'm in a funk today!
|
# ? Apr 7, 2020 21:22 |
|
Thermopyle posted:Sorry if I'm not clear or not understanding your question very well. I'm in a funk today!
|
# ? Apr 7, 2020 22:12 |
|
Spaced God posted:Can someone help point me in the direction of some good resources on using SQL in Python, or alternatively give some advice? I would fire up a docker image with the database of my choice, then connect to that using some python client. I've had really bad experiences with sqlalchemy and prefer db-native clients, but maybe that's just me.
|
# ? Apr 8, 2020 19:52 |
|
hhhmmm posted:I would fire up a docker image with the database of my choice, then connect to that using some python client. What were the experiences? I'm generally (and genuinely) curious about this. SQLAlchemy Core (go die in a ditch, ORM) can be really unobtrusive, and it can provide raw connections to the underlying DBAPI things (cursors etc.) if needed, e.g. for COPY etc.
|
# ? Apr 8, 2020 20:11 |
|
ORM's are good though.
|
# ? Apr 8, 2020 20:39 |
|
Thermopyle posted:ORM's are good though. For OLTP-type work, sure.
|
# ? Apr 8, 2020 20:44 |
ORMs are poo poo and sqlalchemy core is fuckin amazing. Two hills I've got headstones on.
|
|
# ? Apr 8, 2020 22:27 |
|
Last two projects I’ve been able to get away with using pandas as an ORM and it’s been great. Going to be super annoyed the next time I need to do something more complicated.
|
# ? Apr 8, 2020 22:31 |
|
The last ORM I used was admittedly EF for .NET and all I remember is it being hell trying to get the EntityState set correctly for items that were modified. Though I think it was largely self-inflicted as we had some kind of object state of our own to manage as well. I don't see why they wouldn't be fine for smaller projects if you just have a few tables you need to grab data out of, but if I was starting a brand new, large scale app I'd probably go a different route. I've never heard of SQLAlchemy (Core or ORM) before and both look pretty dang solid though. As I transition into more Python based projects I should probably familiarize myself with things like that, using pymysql with cursors for everything is going to bite me in the rear end eventually. Macichne Leainig fucked around with this message at 22:36 on Apr 8, 2020 |
# ? Apr 8, 2020 22:34 |
|
SQLAlchemy Core + psycopg2 all day, every day! (Works for Postgres and Redshift)
|
# ? Apr 8, 2020 23:09 |
|
Is there a way with type checking to enforce the signature on overridden implementations in subclasses?
|
# ? Apr 9, 2020 00:23 |
|
Protocol7 posted:As I transition into more Python based projects I should probably familiarize myself with things like that, using pymysql with cursors for everything is going to bite me in the rear end eventually. Been working on a Flask (+Flask-RESTful) project using Postgres with SQLAlchemy and Marshmallow recently and it's been pretty painless.
|
# ? Apr 9, 2020 00:40 |
|
Wallet posted:Been working on a Flask (+Flask-RESTful) project using Postgres with SQLAlchemy and Marshmallow recently and it's been pretty painless. I have the same flask setup but with Pymysql. It’s only like three endpoints so it’s nothing for now but I know it’s going to get larger. I’ll definitely check that out, thanks.
|
# ? Apr 9, 2020 01:02 |
|
suddenly got the urge to make some lovely algorithmic music. what is the absolute easiest python API out there for turning code into music? Presumably it would be some sort of MIDI controller thing, but I have no idea what to expect
|
# ? Apr 9, 2020 01:55 |
|
Every time I've pinned down the exact reason someone thinks ORMs are poo poo (which usually takes days of back and forth) it is because they're using the wrong tool for the job or just an ORM that is bad.
|
# ? Apr 9, 2020 04:28 |
|
Hmmmm I just use pyodbc and make SQL queries directly, what am I missing out on?
|
# ? Apr 9, 2020 06:02 |
|
punished milkman posted:suddenly got the urge to make some lovely algorithmic music. what is the absolute easiest python API out there for turning code into music? Presumably it would be some sort of MIDI controller thing, but I have no idea what to expect SuperCollider? Never tried it but heard good things. There was a recent Hacker News thread; I did a quick google and it seemed to have good Python bindings/modules/whatever. edit: thread was about CSound, but SuperCollider was mentioned in comments https://news.ycombinator.com/item?id=22787566 Another possibility that I think is underutilized: UVI Falcon supports Lua (well, technically a subset of Lua they call 'uviscript'). I have to wonder what interesting things could be coded up there. It's designed to program instruments more than algorithmic music, but still, gotta be some interesting potential there. Ditto KSP for Kontakt, I suppose. Perhaps I'm wrong though; if it was worth doing, people would probably be doing it. mr_package fucked around with this message at 06:09 on Apr 9, 2020 |
# ? Apr 9, 2020 06:07 |
|
Spaced God posted:Yeah this is definitely being popped out of the womb and being told to fly a spaceship. A lot of my time has been spent on stackexchange or deep in the confines of panda reference pages. It sounds like if you can identify (from lat/long co-ordinates) when a call was to out of area then you get a decent chunk of the way there? You could add an extra column ‘is_out_of_area’ to the dataframe then reuse that in later queries? One additional bit of advice I’d give is: don’t try to do too much in a single step, and don’t spend too long scouring the pandas docs hoping to find the one function that does exactly what you want. There’s no shame in just iterating over the dataframe row by row to do calculations, particularly if you’re new to both pandas and Python.
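The 'is_out_of_area' column suggestion might look like this. The bounding-box values here are made-up placeholders; the real service-area boundary would come from the department's GIS layer, and the column names are assumed.

```python
# Hedged sketch: flag rows whose lat/long fall outside a (hypothetical)
# service-area bounding box. AREA values and column names are made up.
import pandas as pd

AREA = {"lat_min": 33.0, "lat_max": 34.0, "lon_min": -87.0, "lon_max": -86.0}

df = pd.DataFrame({
    "lat": [33.5, 35.2, 33.9],
    "lon": [-86.5, -86.7, -88.1],
})

# in-area means both coordinates fall inside the box; negate for out-of-area
df["is_out_of_area"] = ~(
    df["lat"].between(AREA["lat_min"], AREA["lat_max"])
    & df["lon"].between(AREA["lon_min"], AREA["lon_max"])
)
```

Once the column exists, later queries can just filter on it (`df[df["is_out_of_area"]]`) instead of re-deriving it every time.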
|
# ? Apr 9, 2020 09:53 |
Thermopyle posted:Every time I've pinned down the exact reason someone thinks ORMs are poo poo (which usually takes days of back and forth) it is because they're using the wrong tool for the job or just an ORM that is bad. The relational model is not the object model, regardless of how well you manage to fit the bits that look similar together with an ORM. I'm not arguing they are useless per se, but the usefulness is limited, particularly if you're working with a relational design that didn't grow in lockstep with whatever code you're using an ORM with now. Like using Google Translate to have a conversation in French: it will get you so far, but eventually you need to learn to speak some French. The best middle ground I've found is SQLAlchemy Core, as you still get useful tools without having to write queries and data structures by hand all the time (a problem ORMs tackled).
|
|
# ? Apr 9, 2020 11:20 |
|
Rocko Bonaparte posted:Is there a way with type checking to enforce the signature on overridden implementations in subclasses? mypy checks this: https://mypy.readthedocs.io/en/stable/class_basics.html#overriding-statically-typed-methods
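A small sketch of what that check catches. This snippet runs fine at runtime, since Python itself doesn't enforce signatures; it's `mypy` that reports an error on the incompatible override:

```python
# Override compatibility: fine at runtime, but mypy flags Incompatible.
class Base:
    def scale(self, factor: float) -> float:
        return factor * 2.0

class Compatible(Base):
    def scale(self, factor: float) -> float:  # same signature: OK
        return factor * 3.0

class Incompatible(Base):
    # mypy: error: Argument 1 of "scale" is incompatible with supertype "Base"
    def scale(self, factor: str) -> float:
        return float(factor)
```

Liskov-compatible changes (widening a parameter type, narrowing the return type) pass; anything else gets flagged.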
|
# ? Apr 9, 2020 12:19 |
|
Hollow Talk posted:What were the experiences? I'm generally (and genuinely) curious about this. For postgres, very slow compared to using psycopg2 directly.
|
# ? Apr 9, 2020 12:22 |
|
hhhmmm posted:For postgres, very slow compared to using psycopg2 directly. If you haven't seen this already, try to specify additional options when you create your engine for "executemany": https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-connect-arguments
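A connection-configuration fragment (not run here, and the URL is a placeholder) showing the SQLAlchemy 1.3/1.4 psycopg2 batching options; note that SQLAlchemy 2.0 replaced these with the `insertmanyvalues` feature, which is on by default.

```python
# Config fragment: psycopg2-dialect batching options (SQLAlchemy 1.3/1.4).
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/dbname",  # placeholder URL
    executemany_mode="values",           # use psycopg2's execute_values
    executemany_values_page_size=10000,  # rows per VALUES batch
    executemany_batch_page_size=500,     # for statements that can't use VALUES
)
```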
|
# ? Apr 9, 2020 12:34 |
|
SurgicalOntologist posted:mypy checks this: https://mypy.readthedocs.io/en/stable/class_basics.html#overriding-statically-typed-methods Ahh I thought I automatically had all this in PyCharm but it turns out I don't.
|
# ? Apr 9, 2020 17:41 |
|
You shouldn't use ORMs when the friction encountered because of the relational-OOP impedance mismatch is too high. This doesn't mean ORMs are bad, it means you, someone above you, or someone before you made a bad decision. Note that "too high" doesn't just mean "I had to think more about how to do this thing". You "just" need to choose the right tool for the job. Sometimes you may not know what the right tool will be, so you might use sqlalchemy core. Sometimes you understand the problem space well enough that you can decide up front that some ORM will or will not meet your needs. In the general case, if you just default to Method X because you think it's a good middleground, then you're never using the best tool for the job. It's like a carpenter just taking a hammer to work because it can drive nails, pry boards, beat a hole in a wall, and scoop their potato salad for lunch. Of course, some people work on a narrow range of project types and they might never, or very rarely, benefit from an ORM or from not-an-ORM, and it's just easier for them to default to whatever. IME, much of the time people say ORMs are bad they've just internalized how ORMs apply to the type of work they were doing when they formed their ORM opinion, or they're parroting what their DB instructor told them in school, or something like that. Some people then take this "lesson" they think they've learned about ORMs and build projects in a way, or select projects, that benefit from whatever they think the lesson taught them, and this just reinforces what they think they know. Other people just work in a field that benefits from an ORM and then don't see how people could not be using an ORM...or the opposite. People are very often not using an ORM when they should, or are using an ORM when they shouldn't, and everyone has Very Strong Opinions about the subject. This is a shame.
Sometimes when you're using the not-ideal tool and have strong opinions about it, it prevents you from learning about, thinking about, and making the best of that tool. FWIW, with no scientific basis, no survey of my past projects, going completely by my gut impression of my career: 40% of projects I've been significantly involved in have been best served by an ORM. Rocko Bonaparte posted:Ahh I thought I automatically had all this in PyCharm but it turns out I don't. I usually run mypy in addition to using PyCharm's type checker. Neither mypy nor PyCharm seems to be a subset of the other. Thermopyle fucked around with this message at 17:49 on Apr 9, 2020 |
# ? Apr 9, 2020 17:46 |
|
I decided to switch my project over from pymysql to SQLAlchemy ORM + marshmallow and holy poo poo that's a dream. The DB schema is just a few objects without much relation so I felt like an ORM would be appropriate, and I love it so far.
|
# ? Apr 9, 2020 18:20 |
|
edit: nvm
Hollow Talk fucked around with this message at 20:40 on Apr 9, 2020 |
# ? Apr 9, 2020 20:36 |
|
I am trying to parallelize some code I have. I have a dictionary with 5500 entries (but this number will grow to about 20000 entries later on), where each entry is a pandas series that is 288 rows in length. I have to run a function Y on each combination of entries in the dictionary, where Y computes the euclidean distance between the two series and is a fairly computationally inexpensive task. However, even with 5500 entries this comes out to approximately 15 million calls to function Y and I would like to parallelize it. I am looking into using joblib to make use of all 6 cores on my computer and I am wondering if 1. The best way to do this is to split the list of combinations into 6 approximately equal sublists, and then use joblib to run the computation for each list on a single core, and 2. Is it possible to have a single copy of the dictionary that can be accessed by each process, rather than having six copies of this large dictionary?
|
# ? Apr 9, 2020 21:54 |
|
Don't try to optimize the scheduling yourself, e.g. by creating N sublists. Libraries like multiprocessing and joblib can handle that for you; you provide one iterable of data that needs processing, and let the library deal with giving the data to the workers as they become available. I'm not familiar with joblib, but it looks like it's built on top of multiprocessing, and multiprocessing has a Manager class that can share things like dicts between processes.
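A stdlib-only sketch of that pattern: hand the pool one iterable of pairs and let `chunksize` do the batching instead of pre-splitting into N sublists. The tiny two-element "series" here stand in for the real 288-row pandas series.

```python
# One iterable in, chunksize for batching: no manual sublist splitting.
from itertools import combinations
from multiprocessing import Pool

def euclidean(pair):
    a, b = pair
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def all_pairwise(series):
    # series: dict mapping name -> sequence of floats
    pairs = [(series[i], series[j]) for i, j in combinations(series, 2)]
    with Pool(processes=4) as pool:
        # chunksize controls how much work each worker grabs at a time;
        # tune it rather than hand-partitioning the input yourself
        return pool.map(euclidean, pairs, chunksize=50)

if __name__ == "__main__":
    demo = {f"s{i}": [float(i), 0.0] for i in range(5)}
    print(all_pairwise(demo)[:3])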
|
# ? Apr 9, 2020 22:57 |
|
Splitting up the work that way should be fairly ideal in terms of speed-up. Number 2 might be possible but I wouldn’t count on it being simple if it is. You would have a much better time if those series were stored in an array like a pandas dataframe or just a plain numpy 2d array, even if just for the computation you’re doing. I’d also recommend looking into using Numba for what you’re doing instead of joblib. It’s a JIT compiler that also handles threading and works particularly well with numpy arrays.
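To illustrate the "stack it into one 2-D array" suggestion: with the series stacked into an (n, 288) numpy array, every pairwise euclidean distance comes out of one vectorized identity, with no per-pair Python calls at all (and no multiprocessing needed at this size).

```python
# Vectorized pairwise distances via |a-b|^2 = |a|^2 + |b|^2 - 2*a.b
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 288))  # 100 series of length 288, one per row

sq = (X ** 2).sum(axis=1)
# clip tiny negative values caused by floating-point error before sqrt
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
D = np.sqrt(d2)  # D[i, j] is the distance between series i and j
```

At 20,000 series the full (n, n) matrix gets large (~3 GB in float64), so at that point computing it in row blocks, or handing the array to Numba, is the natural next step.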
|
# ? Apr 9, 2020 23:29 |
|
Unimportant question time: Now and then I have a file with a lot of things that aren't broken into clearly defined classes, functions, etc—stuff like config values or dictionaries of error definitions. I try to keep those values organized in a sane way and I usually put little signpost comments in there: Python code:
I've seen people do this, but every time I see it I feel like I've stepped into a readme from 1995: Python code:
Python code:
Python code:
|
# ? Apr 10, 2020 18:27 |
Split it into different dicts, I think.
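A hedged reconstruction of the idea (the error codes are made up): group related entries into separately named dicts instead of one big dict with signpost comments, and merge them when a flat view is needed.

```python
# Named dicts replace signpost comments; merge when you need one view.
CONNECTION_ERRORS = {
    "E100": "connection refused",
    "E101": "connection timed out",
}
AUTH_ERRORS = {
    "E200": "bad credentials",
}

# flat lookup table built from the grouped dicts
ALL_ERRORS = {**CONNECTION_ERRORS, **AUTH_ERRORS}
```

The group a value belongs to is then part of its name rather than a comment that can drift out of date.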
|
|
# ? Apr 10, 2020 18:49 |
|
This is a case where I wish Python would allow enums to hold data. Python code:
Dominoes fucked around with this message at 20:33 on Apr 10, 2020 |
# ? Apr 10, 2020 20:22 |
|
Dominoes posted:This is a case where I wish Python would allow enums to hold data. I mean, they can, with mixin classes. E: oh, real Union types... nope
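For the record, a sketch of the data-holding enum, closely following the Planet example from the stdlib `enum` docs: each member's value is a tuple that `__init__` unpacks into attributes.

```python
# Enum members carrying data: value tuples unpacked in __init__.
from enum import Enum

class Planet(Enum):
    # each member's value is a (mass_kg, radius_m) tuple
    MERCURY = (3.303e23, 2.4397e6)
    EARTH = (5.976e24, 6.37814e6)

    def __init__(self, mass, radius):
        self.mass = mass
        self.radius = radius

    @property
    def surface_gravity(self):
        # universal gravitation: g = G * M / r^2
        return 6.67430e-11 * self.mass / self.radius ** 2
```

Mixing in a concrete type (`class Code(str, Enum)`) is the other common variant when each member wraps a single value.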
|
# ? Apr 10, 2020 20:29 |
|
|
|
Ninja-edited since I missed the point.
Dominoes fucked around with this message at 20:33 on Apr 10, 2020 |
# ? Apr 10, 2020 20:29 |