|
cinci zoo sniper posted:Yeah NumPy will internally route stuff to C or Fortran code to speed things up, including hardware-specific optimisations. Pandas does the same twofold, since it has both its own C stuff and is largely built on top of NumPy (wrt data structures and such). This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.
|
# ? Jun 10, 2018 19:25 |
|
Nippashish posted:This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to. Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in a "normal Python environment" or in a "Cython environment". The former will be a standard for loop with function call overhead; the latter will be slightly slower than a directly vectorised operation. It all depends fairly heavily on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series.
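For comparison, a minimal sketch of the three approaches side by side. The DataFrame and its column names are made up for illustration, and this only shows that the results agree; how much slower each variant is depends on the data and the pandas version.

```python
import pandas as pd

# Hypothetical toy data; column names are made up for this sketch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Vectorised: the addition runs in compiled code over whole columns.
vectorised = df["a"] + df["b"]

# apply with axis=1: the Python lambda is called for each row.
applied = df.apply(lambda row: row["a"] + row["b"], axis=1)

# itertuples: a plain Python loop, but each row is a lightweight namedtuple
# rather than a full Series as with iterrows.
looped = [row.a + row.b for row in df.itertuples(index=False)]
```

If the operation can be written on whole columns, that is usually the first thing to try; apply and itertuples are the fallbacks when it can't.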
|
|
# ? Jun 10, 2018 20:02 |
|
Oh yeah my bad, the count function is the best way to do that infinite thing - loops indefinitely and hands you incrementing numbers, what more could you want
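A quick sketch of that, assuming the count in question is itertools.count from the standard library (first_multiple is just a made-up helper for illustration):

```python
from itertools import count


def first_multiple(n, start=1):
    """Return the first integer >= start that is divisible by n (n > 0)."""
    # count(start) yields start, start + 1, start + 2, ... with no upper bound,
    # so the loop only ends when we return.
    for i in count(start):
        if i % n == 0:
            return i


result = first_multiple(7, start=10)
print(result)
```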
|
# ? Jun 10, 2018 20:34 |
|
cinci zoo sniper posted:Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in a "normal Python environment" or in a "Cython environment". The former will be a standard for loop with function call overhead; the latter will be slightly slower than a directly vectorised operation. It all depends fairly heavily on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series. Yeah, good point about apply. I think it evaluates the function on the first element more than once, in order to figure out the fastest way to execute it. As a result it might not be the best choice for a function with side effects, like the qr code example. It shouldn't be an issue on most filesystems, but nevertheless that might be a reason to go with iterrows in this case. Edit: or itertuples SurgicalOntologist fucked around with this message at 05:08 on Jun 11, 2018 |
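A rough sketch of the explicit-loop option for a function with side effects. The DataFrame, the column names and save_code are all hypothetical stand-ins (a list plays the role of the filesystem); whether apply really calls the function an extra time depends on the pandas version, so treat that part as an assumption.

```python
import pandas as pd

# Hypothetical data standing in for the rows to be turned into QR codes.
df = pd.DataFrame({"name": ["a", "b", "c"], "payload": ["x", "y", "z"]})

written = []  # stands in for files written to disk


def save_code(name, payload):
    """Hypothetical side-effecting function; records each call it receives."""
    written.append((name, payload))


# An explicit loop calls the function exactly once per row, in order.
for row in df.itertuples(index=False):
    save_code(row.name, row.payload)
```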
# ? Jun 11, 2018 01:24 |
|
This was really helpful
|
# ? Jun 11, 2018 03:12 |
|
Is there a PyCharm keyboard shortcut to turn code:
code:
|
# ? Jun 11, 2018 12:37 |
Boris Galerkin posted:Is there a pycharm keyboard shortcut to turn I’ll need to check but you may be able to coax the code style guidelines into disrespecting that and then just give it a pass with the code formatter (CTRL+ALT+L by default). E: I did spin up PyCharm at work and I don't think you can do this in a manner more straightforward than writing a corresponding code template, if at all (excluding some plugin or external code formatter). cinci zoo sniper fucked around with this message at 15:38 on Jun 11, 2018 |
|
# ? Jun 11, 2018 13:05 |
|
I'm trying to get some plotly scatter plots to give different colors to the points based on the value in some column but I'm running into issues. The example code I'm trying to imitate is as follows: code:
code:
code:
|
# ? Jun 11, 2018 21:30 |
You should have a layout for each data group, if I remember correctly.
|
|
# ? Jun 11, 2018 21:52 |
|
So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call.
|
# ? Jun 11, 2018 22:35 |
bamhand posted:So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call. I would first try to see if you can make it work with a generator expression in the layout section, the same one you have in data. If you wrap your entire return block inside one, you’ll get 4 single-plot figures, rather than the 1 figure with 4 subplots that you seem to want.
|
|
# ? Jun 11, 2018 22:51 |
|
No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like: code:
e: Wait I found it! You create a list of traces and then have a for loop append to the list. That makes much more sense to me than the example I had in my original post. code:
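Since the code block above didn't survive, here is a minimal sketch of the trace-list pattern being described. The data is made up, and plain dicts stand in for go.Scatter(...) so the snippet runs without plotly installed; as far as I know plotly accepts trace dicts of this shape, but treat that as an assumption.

```python
# Hypothetical (x, y, group) rows standing in for the real DataFrame.
rows = [(1, 2, "a"), (2, 3, "b"), (3, 5, "a"), (4, 1, "b")]

traces = []
for group in sorted({g for _, _, g in rows}):
    xs = [x for x, _, g in rows if g == group]
    ys = [y for _, y, g in rows if g == group]
    # One trace per group; each trace gets its own colour when plotted.
    # A plain dict is used here in place of go.Scatter(...).
    traces.append(
        {"type": "scatter", "mode": "markers", "name": group, "x": xs, "y": ys}
    )

# A single figure holding every trace, with one shared layout.
figure = {"data": traces, "layout": {"title": "Points by group"}}
```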
bamhand fucked around with this message at 02:37 on Jun 12, 2018 |
# ? Jun 12, 2018 01:32 |
|
Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists. Edit: https://bitbucket.org/zzzeek/sqlalchemy/issues/3942/add-a-way-to-fetch-a-model-or-create-one So let me revise my question: anyone know of a third-party utility library that provides this? SurgicalOntologist fucked around with this message at 02:58 on Jun 12, 2018 |
# ? Jun 12, 2018 02:33 |
bamhand posted:No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like: Rereading now with a fresh head - your original example should’ve worked without the {} around go.Scatter. And comprehensions are pretty standard stuff - [a for b in c] is a list comprehension, a syntactic shortcut for creating an empty list and then running for b in c: result.append(a). Drop the square brackets (or use parentheses) and the same syntax is a generator expression, which produces the items lazily instead of building the list up front.
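Spelled out as a small sketch (the variable names are arbitrary):

```python
c = [1, 2, 3]

# List comprehension: builds the whole list in one expression.
comp = [b * 2 for b in c]

# The equivalent explicit loop.
result = []
for b in c:
    result.append(b * 2)

# Generator expression: same syntax with parentheses, evaluated lazily.
gen = (b * 2 for b in c)
from_gen = list(gen)  # consuming the generator yields the same items
```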
|
|
# ? Jun 12, 2018 05:17 |
|
SurgicalOntologist posted:Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists. There's a whole bunch of different ways to do this in different database implementations. Off the top of my head, I can think of at least these variants:
1. INSERT IGNORE ... (MySQL/MariaDB, primitive and rarely useful)
2. INSERT ... ON CONFLICT DO NOTHING (Postgres, and SQLite 3.24+)
3. MERGE INTO ... WHEN NOT MATCHED THEN INSERT (standard SQL 2003, implemented in MSSQL and Oracle; MSSQL also accepts WHEN NOT MATCHED BY TARGET)
4. Two separate queries (select and then insert) in a transaction (dangerous, hard to get right)
Then you get all the other variants if you want to update the potentially existing target row ("upsert"). All of this has a bunch of gotchas - this SO question has a good literature overview.
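To make the idea concrete, a minimal sketch using the standard library's sqlite3 rather than SQLAlchemy. The tags table and the get_or_create helper are made up for illustration, and INSERT OR IGNORE is SQLite's own spelling of the "insert unless it already exists" variant; it relies on the UNIQUE constraint, so it is not a general answer for every database.

```python
import sqlite3


def get_or_create(conn, name):
    """Return the id of the tag with this name, inserting it first if needed."""
    with conn:  # commits on success, rolls back if an exception is raised
        # The UNIQUE constraint on name turns a duplicate insert into a no-op.
        conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        row = conn.execute(
            "SELECT id FROM tags WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
)

first = get_or_create(conn, "python")
second = get_or_create(conn, "python")  # finds the existing row
row_count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
```

Doing the insert first and the select second avoids the race in the "select, then insert" ordering, at least for a single unique column.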
|
# ? Jun 12, 2018 15:25 |
|
I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python). I know most of what I'm going to do, but I am looking for a free "coding test" style website for them to run exercises against. For privacy reasons I won't go into, I'd prefer if the site does not require a login (one strike against HackerRank). http://codingbat.com/python seems like what I want: it's lightweight, does the checking of answers on the fly in the browser, is free, and doesn't require a login. But on the downside, its examples are not very extensive. Does anyone know a better resource / website for what I need? Even better, perhaps, would be a set of ipython notebooks that could be run without an internet connection. So, they'd need to set up the problems, and have the answer available (but hidden), so the user could check that their code is producing the correct answer.
|
# ? Jun 13, 2018 02:18 |
|
pmchem posted:I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python). Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points.
|
# ? Jun 13, 2018 02:43 |
|
Project Euler focuses more on the math/stats side of things rather than teach coding so really depends on what you're trying to teach them. Codewars is more oriented towards the coding side of things.
|
# ? Jun 13, 2018 02:47 |
|
huhu posted:Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points. yeah, I've already got a few of those picked out. Euler and codingbat were my starting points. After continuing to search, I just found this group of ipython nb's: https://github.com/donnemartin/interactive-coding-challenges It's pretty much spot-on for what I wanted. Even comes with unit tests in challenge notebooks, and separate solution notebooks. I'll pick and choose the appropriate challenges for my people. Donne also has these nb's, https://github.com/donnemartin/data-science-ipython-notebooks , which are also nice.
|
# ? Jun 13, 2018 03:03 |
|
I have a numpy.ndarray of coordinate points, i.e., code:
e: Figured out the problem with argsort: I didn't check for absolute values of y. This works now (from this SO post) code:
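Since the code didn't survive here, a minimal sketch of the argsort-on-absolute-values idea. The array is made up, and it assumes the columns are (x, y, z) with y in column 1:

```python
import numpy as np

# Hypothetical (x, y, z) points; the real data is not shown in the thread.
points = np.array([
    [0.0,  0.5, 1.0],
    [1.0, -0.1, 2.0],
    [2.0,  2.0, 0.0],
    [3.0,  0.3, 1.5],
])

# Sort row indices by |y| so negative values close to zero are not ranked
# ahead of (or behind) positive ones by their sign.
order = np.argsort(np.abs(points[:, 1]))

# The two rows whose y is closest to 0.
closest_two = points[order[:2]]
```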
Boris Galerkin fucked around with this message at 16:34 on Jun 13, 2018 |
# ? Jun 13, 2018 16:16 |
|
Boris Galerkin posted:I have a numpy.ndarray of coordinate points, i.e., Do you mean the closest points to y == 0? Because in your example there are no points with y == 0...
|
# ? Jun 13, 2018 16:32 |
|
Am I missing something really stupid here, I'm kind of muddling my way through python right now. Thanks for everyone's help so far by the way. ValueError: time data '01-Jun-09' does not match format '"%d-%b-%y"' e: Nevermind it was the quotes. I am bad at this.
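For anyone hitting the same error, a small sketch of the difference. The format string is passed as an ordinary Python string, so any extra quote characters inside it are treated as literal text that the input must contain:

```python
from datetime import datetime

raw = "01-Jun-09"

# Correct: the format string contains only the directives and separators.
parsed = datetime.strptime(raw, "%d-%b-%y")

# Wrong: the double quotes inside the string become part of the format,
# so the input would itself have to start and end with a quote character.
try:
    datetime.strptime(raw, '"%d-%b-%y"')
except ValueError as exc:
    error = str(exc)
```

Note that %b matches month abbreviations in the current locale, so "Jun" assumes an English (or C) locale.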
|
# ? Jun 13, 2018 16:34 |
|
Boris Galerkin posted:vvv Yeah that's what I meant. vvv Do you want the closest indices to y == 0 where z == 0? Or do you want the closest indices to (y, z) == (0, 0)?
|
# ? Jun 13, 2018 16:54 |
|
Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections. Heck, it's frustrating to just click on one and lose the state of all the others. My fallback is to try to create a special unit test class that just references the 14 tests I need to run, but the test bench for this project is kind of ridiculous and I anticipate it being troublesome.
|
# ? Jun 18, 2018 05:40 |
|
Doesn't the Python version have a "Rerun all failed" button in the test window?
|
# ? Jun 18, 2018 06:01 |
|
Rocko Bonaparte posted:Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections. I'm not at a PC to tell you exactly what to click on, but you can run just failed tests and even better it autosaves the last X test runs so you can always open up whichever recent run you want. You can also manually save a test run. It's all in the buttons in the tests tool window...
|
# ? Jun 18, 2018 06:54 |
|
None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can.
|
# ? Jun 18, 2018 07:04 |
Rocko Bonaparte posted:None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can. You can export test results to file and import them in new session.
|
|
# ? Jun 18, 2018 07:21 |
|
I didn't know that about PyCharm, thanks thread!
|
# ? Jun 19, 2018 02:59 |
|
Oh poo poo I don't even have to explicitly export them. There's just a pile of them in "Run->Import Test Results."
|
# ? Jun 19, 2018 05:33 |
|
Yeah, that's what I was talking about.
|
# ? Jun 19, 2018 06:29 |
Rocko Bonaparte posted:Oh poo poo I don't even have to explicit export them. There's just a pile of them in "Run->Import Test Results." Yeah they are stored automatically in the IDE's memory, in some capacity.
|
|
# ? Jun 19, 2018 06:56 |
|
Are there any good packages for using Python to connect to S3? Right now I’m using Boto.
|
# ? Jun 19, 2018 15:00 |
|
Boto is all I've ever used. It's not great, but it works. Also, pipenv support is coming to PyCharm in 183.155.
|
# ? Jun 19, 2018 17:44 |
|
So f-strings are really cool but they restrict me to 3.6+. Is there a way to use f-strings with 3.5+?
|
# ? Jun 20, 2018 08:19 |
|
https://github.com/asottile/future-fstrings/blob/master/README.md ?
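If adding a dependency isn't an option, str.format covers most of what f-strings do and runs on 3.5. A rough sketch (the variable names are made up):

```python
name = "world"
count = 3

# On 3.6+ this would be: f"Hello, {name}! You have {count} new messages."
message = "Hello, {name}! You have {count} new messages.".format(
    name=name, count=count
)

# Format specs work the same way as inside an f-string.
ratio = "{:.1%}".format(0.256)
```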
|
# ? Jun 20, 2018 14:09 |
|
Holy poo poo that is amazing. Thanks.
|
# ? Jun 20, 2018 17:58 |
|
I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions?
mr_package fucked around with this message at 22:04 on Jun 21, 2018 |
# ? Jun 21, 2018 21:59 |
mr_package posted:I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions? In my experience, this is a useful thing to do for data retrieval or parsing. My context usually involves large and complex JSON or XML documents, but I still basically force everything to lists since then you write and test just one set of operations.
|
|
# ? Jun 21, 2018 22:15 |
|
|
Based on the name I would think not, but the name could be more clear with like get_top_item_from_queue. Always returning a list for even single item gets seems annoying as hell from a usage perspective. Don't make me destructure/unwrap every time I use a method.
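A small sketch of what that split could look like. The in-memory list stands in for the database, and the function bodies are hypothetical; only the names get_queue and get_top_item_from_queue come from the discussion above.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the table behind the queue.
_QUEUE = [{"id": 1, "job": "resize"}, {"id": 2, "job": "upload"}]


def get_queue() -> List[Dict]:
    """Return every queued item; an empty list if there are none."""
    return list(_QUEUE)


def get_top_item_from_queue() -> Optional[Dict]:
    """Return the first queued item, or None if the queue is empty."""
    items = get_queue()
    return items[0] if items else None


top = get_top_item_from_queue()
```

The caller of the single-item function then checks for None once instead of unwrapping a one-element list every time.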
|
# ? Jun 21, 2018 22:21 |