12 rats tied together
Sep 7, 2006

We're using Airflow at my employer for that, which is way overkill for what you asked, but does what you need it to in a reasonably robust way. I would recommend taking a look at it if you anticipate a huge growth in either the number of ETL jobs you are running or huge growth in job complexity such as inter-job dependencies, retry behaviour, conditional code paths, etc.

I pretty much set this thing up with a couple of demo jobs for examples of basic functionality (we are really big on EMR) and didn't touch it for 6 months and now we have like 70 something jobs running, the data science teams have loved it.

edit: link https://airflow.apache.org/

12 rats tied together fucked around with this message at 19:19 on Jun 5, 2020


12 rats tied together
Sep 7, 2006

It's even worse for me: "gee-unicorn"

edit: I was also saying "uwsgi" by pronouncing each letter name in sequence like it was some sort of organization name.

12 rats tied together
Sep 7, 2006

abelwingnut posted:

relatedly, has anyone used this: https://developers.google.com/sheets/api/quickstart/python

and is it decent?

I worked at a place where someone did a hack week project to run a rails app off of google sheets via active record, or something. They were calling it Spreadsheets as a Database (SaaD). It seemed to work decently and I have had a good impression of google sheets since.

12 rats tied together
Sep 7, 2006

You can pip install --user, or just normal pip install, but yeah you generally should not sudo pip install unless you intentionally set out to modify system python. It's almost always fine, until it randomly isn't, and it's just not worth the hassle of eventually having to deal with (especially if you don't particularly care about systems administration).

I like pyenv for managing python installs. The usual workflow is that you:

1. Start a new python project by creating a new directory.
2. Place a .python-version file in the directory with the contents of the version this project should use.
3. Navigate to the directory and "pyenv install" to install whatever version of python that is.
4. Use python normally from now on.

It probably doesn't work on windows, but I'd really suggest using WSL to do any sort of python development on windows anyway.

12 rats tied together
Sep 7, 2006

Just a general process thing, if you can coerce your events into a time series metric, this integration is much easier to configure in AWS and has some prebuilt policies and triggers you can set: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-policy.html

More specifically for your use case, I would suggest not mutating MaxSize. If you want to decrement or increment the number of instances in your asg, consider modifying desired only. Max can be left alone, managed manually, or set by some kind of ops team / billing team. The autoscaling service is smart enough in general to not launch 400 instances when you just want to temporarily scale from 22 to 23.

12 rats tied together
Sep 7, 2006

Jose Cuervo posted:

To be clear, this is a research data set which lives on a HIPAA compliant server. It is currently a bunch of excel files that were pulled by the IT team who run the health system databases. I would like to turn the excel files into a database so that things are much more organized and I can learn about databases. Hence why I was asking for help.

There's a pervasive attitude on this website and elsewhere throughout the internet that "important software" should not be written in anything dynamically typed. Usually no consideration is given to the awful paradigms/language sets that previous versions of "important software" have been written in.

12 rats tied together
Sep 7, 2006

It is often either convenience or performance based, performance for example in the case of uvloop, which is cython and written on top of libuv.

I also worked at a data science consulting firm where the r&d team would occasionally build some complex/performance bound functionality in C/C++/Matlab/Rust and then publish both Python and R bindings for it so that the teams involved with consulting engagements could benefit regardless of language chosen. The guy who pioneered this process at that org gave a talk about it, if you're interested.

12 rats tied together
Sep 7, 2006

the context manager is pretty easy and is a great python feature, you basically just define __enter__ and __exit__ methods on your class. __enter__ should return the thing that you want assigned as the result of your "as", __exit__ happens when your program leaves the scope of your "with". your __exit__ needs to have 4 params (self plus three) because the python runtime always passes it the exception type, value, and traceback. all three are None if the block finished without raising anything

temp file access is gnarly which is probably why nobody posted a code sample. here's a really lovely one:

Python code:
import os

class TempFile:
    def __init__(self, path):
        self.path = path
        self.file = None

    def __enter__(self):
        # 'w+' creates the file if it doesn't exist already (and truncates it
        # if it does), then lets us both write and read text
        self.file = open(self.path, 'w+')
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # close the handle before deleting; this runs even if the block raised
        self.file.close()
        os.remove(self.path)

with TempFile(path='./tmp') as tf:
    tf.write('xyz')
    tf.seek(0)
    # prints the file contents as a list of lines, in this case, one element: ['xyz']
    print(tf.readlines())

    # prints True: the file './tmp' exists at the moment
    print(os.path.exists('./tmp'))

# prints False: the file './tmp' no longer exists
print(os.path.exists('./tmp'))

12 rats tied together
Sep 7, 2006

Sure thing, keep in mind that your OS might have a convention for temp files as well, for example /tmp on a linux/unix system is often a tmpfs which resides in virtual memory and doesn't survive reboots.

e:

OnceIWasAnOstrich posted:

You could also use the built-in tempfile TemporaryDirectory which comes pre-built as a context manager and move the files out of the directory once they are successfully written. This also has the advantage of automatically choosing a platform-dependent location by default.

this is badass and yeah you should totally use this instead of writing your own thing. python standard library is pretty good
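
For reference, a minimal sketch of the TemporaryDirectory approach described in the quote. The file name and the idea of using a second temporary directory as the "final" destination are assumptions made only so the example cleans up after itself; in real code the destination would be wherever the finished files belong.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical destination for finished files. A second temporary directory
# is used here only so that the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as final_dir:
    with tempfile.TemporaryDirectory() as work_dir:
        work_path = Path(work_dir)
        scratch = work_path / "report.txt"
        scratch.write_text("xyz")

        # Move the file out only after it has been written successfully.
        kept = Path(shutil.move(str(scratch), final_dir))

    # Leaving the inner "with" deleted the working directory and its contents.
    work_dir_gone = not work_path.exists()
    kept_contents = kept.read_text()

print(work_dir_gone, kept_contents)
```

The location of both directories is picked by the platform, which is the advantage mentioned in the quote.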

12 rats tied together fucked around with this message at 20:54 on Aug 2, 2021

12 rats tied together
Sep 7, 2006

duck monster posted:

It's *almost* at the point where they'll cut the nonsense and just implement the drat anonymous blocks from Ruby, like they should have done a decade and a half ago.

To be honest I prefer the explicit __enter__ and __exit__ definitions to the decorator (contextlib.contextmanager) that converts a try/finally into something that can be context managed, but I have a probably unreasonable hate of try/catch in general. When I write python at work, it's usually for / around people who are not python developers, so I try to stick to code patterns that are either single line comprehensions that I can explain with a comment, or "c# without curly braces".

It would be cool to work somewhere I can assume that everyone knows what a decorator is.
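
To make the comparison concrete, here is a rough side-by-side of the two styles. It assumes the decorator in question is contextlib.contextmanager from the standard library; the class and function names are made up for illustration.

```python
from contextlib import contextmanager

events = []

class Explicit:
    """Class-based version: the two hooks are spelled out."""

    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        events.append("exit")
        return False  # returning False means exceptions are not swallowed

@contextmanager
def decorated():
    """Decorator version: code before the yield is "enter", finally is "exit"."""
    events.append("enter")
    try:
        yield
    finally:
        events.append("exit")

with Explicit():
    events.append("body")

with decorated():
    events.append("body")

print(events)
```

Both record the same enter/body/exit sequence. The class version is longer but does not require the reader to know about generators or decorators.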

eXXon posted:

Also, for what it's worth (not much), I checked the codebase I'm working on and found no usages of deque in Python code, though there were some std::deque usages in C++ code.

I use deque quite heavily in a personal project (an authoritative server for an online video game), but I will admit that I don't see it a lot in production business logic. I think you could probably teach deques as a decent stand-in for something like celery/sidekiq/faktory and students would have something that they can conceptually map to when working with network queues or pubsub systems in their jobs.

It seems like in general a queue is not particularly useful without an event loop, and most python courses I've seen never really get that far.

12 rats tied together
Sep 7, 2006

thats certainly a noble goal but do note that pep8 stresses consistency (and knowing when to be inconsistent) over any of its individual recommendations. you can camelCase if you want to and you'll be pep8 compliant so long as you are consistently camelCase

i set all my poo poo to 2 spaces instead of 4

e:

Loezi posted:

I mostly wanted to talk about deque to 1) contrast the performance of deque.appendleft(v) and list.insert(0, v) and 2) highlight the existence of many data structures they might not have thought of.

Cool, that's a good call. Getting students into the habit of checking the time complexity table on the wiki is a fantastic idea, I wish more people were aware of it
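
A small sketch of the contrast from the quote, for anyone who wants to try it. The helper names and the value of n are arbitrary, and the timings will vary by machine, so no numbers are claimed here.

```python
from collections import deque
from timeit import timeit

def fill_list(n):
    items = []
    for v in range(n):
        items.insert(0, v)   # O(n) per call: every existing element shifts right
    return items

def fill_deque(n):
    items = deque()
    for v in range(n):
        items.appendleft(v)  # O(1) per call
    return items

# Both approaches build the same sequence.
assert fill_list(5) == list(fill_deque(5)) == [4, 3, 2, 1, 0]

n = 10_000
print("list.insert(0, v):  ", timeit(lambda: fill_list(n), number=1))
print("deque.appendleft(v):", timeit(lambda: fill_deque(n), number=1))
```

Raising n makes the gap grow, since the list version does quadratic work overall.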

12 rats tied together fucked around with this message at 19:00 on Aug 5, 2021

12 rats tied together
Sep 7, 2006

CarForumPoster posted:

At least a few people who hire python peeps in this thread, me included, have said that camelCase var names for python coders is a red flag for bad code. Not a reason to not hire someone, but it will stick out like a sore thumb.

Agree, I wouldn't have mentioned it if OP didn't indicate that they are primarily a hobbyist. Submitting python code as part of a python tech interview to a python software shop with camelCase is not a great way to communicate "I will be pleasant to work with and productive in your ecosystem".

necrotic posted:

Use black.

yeah

12 rats tied together
Sep 7, 2006

there was one thing ruby got right, and it was spacing :twisted:

12 rats tied together
Sep 7, 2006

Python stdlib has str.casefold for this purpose, that way you don't have to care about which direction the casing goes.
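
A quick sketch of what that looks like. The helper name same_text is made up; the German example is the usual case where casefold and lower disagree.

```python
def same_text(a: str, b: str) -> bool:
    """Case-insensitive comparison using casefold on both sides."""
    return a.casefold() == b.casefold()

print(same_text("PyThOn", "python"))          # True
print(same_text("Straße", "STRASSE"))         # True: casefold maps ß to ss
print("Straße".lower() == "STRASSE".lower())  # False: lower() keeps the ß
```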

12 rats tied together
Sep 7, 2006

for a lot of developers the "how a website works" knowledge domain stops at the edge of whatever the framework is, maybe it extends a little bit into the web server itself (iis, gunicorn, puma, tomcat, etc)

it's really hard to know everything but i recommend pushing this boundary whenever you can. in this case, the web browser makes an http request to your flask application. a lot of magic happens that turns the http request (which is just a big blob of text that arrives at a network address) into something interactable inside your python files/flask app

your app runs some logic and eventually does the same thing in reverse: constructs a big blob of text that represents an http response, and sends it back over the network.

because http works in this request/response fashion, chrome only speaks http, and your python only happens "after" the http part, there's no way to build a real-time web application in python.

javascript was created to be the "client side" scripting language that runs in chrome (or any other web browser). your flask app can include javascript in its response and your users will run that javascript interactively, so, you should build the synthesizer in javascript

that's an oversimplification in basically every dimension but i hope it provides some useful context

12 rats tied together
Sep 7, 2006

The March Hare posted:

Just you wait until I write a VST in python and throw it through a wasm compiler >: )

if you ever do this please let me know because i have an electron app that i would much rather be python than javascript

12 rats tied together
Sep 7, 2006

what youre trying to do will probably work exactly as is in python, which treats functions as first-class objects (so it supports higher order functions), which means your functions can be map values etc. the syntax you have them in has the map key as a list of a single integer which is weird though

i think if you can fit your entire game into code neatly you should definitely do that and not read from the filesystem for every page turn
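
As a rough illustration of functions as map values, here is one way the pages of a game could live entirely in code. The page functions, their text, and the (text, next page) return convention are all invented for this sketch.

```python
def intro():
    # Each page returns its text and the number of the next page.
    return "You wake up in a cave.", 2

def tunnel():
    # None as the next page means the story is over.
    return "The tunnel ends at a locked door.", None

# Plain integers as keys. A list such as [1] could not be a dict key in
# Python anyway, because lists are not hashable.
PAGES = {1: intro, 2: tunnel}

def play(start=1):
    seen = []
    page = start
    while page is not None:
        text, page = PAGES[page]()  # look up the function, then call it
        seen.append(text)
    return seen

print(play())
```

Nothing is read from disk between page turns; the dict lookup is the whole "page load".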

12 rats tied together
Sep 7, 2006

the usual python answer here tends to be supervisord, in my experience, which might also be a port of foreman, i have never checked

for my personal projects i have some tasks defined in vs code for starting or bumping services, in production i just use systemd though

12 rats tied together
Sep 7, 2006

I have never been super fond of enums in general, especially not how they work in python, but you can put them into a dataclass

Python code:
>>> from dataclasses import dataclass
>>> @dataclass
... class Foo:
...     BAR: str = "bar"
... 
>>> Foo.BAR
'bar'
this is handy if your enums are really just a deduplication of effort for typing in some constant values. since python doesn't really have constant values, you could also just cram them into a module, and then import it

Python code:
# foo.py
BAR = "bar"

# some_other_file.py
import foo

foo.BAR

12 rats tied together
Sep 7, 2006

The rule of thumb here is that you aren't writing the AWS API, so you shouldn't unit test it; it's fine to use a mock response. As part of upgrading boto3 versions you would check to see if your stored mock responses are invalid, which will be fairly obvious because the new boto3 release notes will have a section on breaking changes (if any).

Python specifically has the moto package for mocking AWS services in this manner.
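
A minimal sketch of the "mock response" idea. To keep it dependency-free it uses unittest.mock from the standard library rather than moto, so it is a stand-in for the technique and not a moto example. bucket_names is a hypothetical function under test, and the stored response is trimmed down to the fields that function reads.

```python
from unittest.mock import Mock

def bucket_names(s3_client):
    """Hypothetical code under test: our own logic on top of the AWS response."""
    response = s3_client.list_buckets()
    return sorted(bucket["Name"] for bucket in response["Buckets"])

# Stand-in for a boto3 S3 client, returning a stored mock response.
fake_s3 = Mock()
fake_s3.list_buckets.return_value = {
    "Buckets": [{"Name": "logs"}, {"Name": "assets"}],
}

names = bucket_names(fake_s3)

# We test our sorting and extraction, not AWS itself.
fake_s3.list_buckets.assert_called_once_with()
print(names)
```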

12 rats tied together
Sep 7, 2006

The editable package install is super useful for something like pulumi, where in order to use the tool you'll have dozens of isolated python programs that get invoked by a CLI utility, but you'll also want some way of defining and importing abstractions into those programs. If you use an editable package you don't have to juggle relative paths to get to your shared code in each program, and changes to the programs and the shared code can also be tightly coupled in a monorepo, which saves you from tedious and error prone version bumps and staged releases.

mr_package posted:

If you have two parallel files [...]

I think you're correct about this, if you can consider something as being "parallel" to something else there are probably better facilities for handling it than putting it into its own module. I don't spend a lot of time working on super large codebases but IMHO namespaces are useful insofar as they provide a descriptive name, so you should use them exactly until the descriptive name stops being useful. I don't think there's anything inherently valuable about, for example, Online.FoodOrders.Delivery vs Online.FoodOrders.Pickup. You could just have them be OnlinePickupOrder and OnlineDeliveryOrder.

12 rats tied together
Sep 7, 2006

Lightbulb on moment for me was Sandi Metz' "Nothing is Something" talk which should be on youtube, and is also referenced throughout other links provided so far.

12 rats tied together
Sep 7, 2006

That's probably the point where I would start breaking out Roof into its own class and compose a BuildingReport from valid instances of Roof, Wall, Window, etc. PolyRoof and PanRoof are both reasonable to intuit as "specializations of Roof". Tracking what kind of roof is on the building in the name of the building report's class feels like it will result in subclass explosion almost immediately.
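
Roughly what that composition could look like. The class names come from the discussion, but every field and method here (area_m2, material, describe, summary) is made up for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Roof:
    area_m2: float  # hypothetical field

    def describe(self) -> str:
        return f"{type(self).__name__} ({self.area_m2} m2)"

class PolyRoof(Roof):
    """Specialization of Roof; inherits the generated __init__."""

class PanRoof(Roof):
    """Another specialization of Roof."""

@dataclass
class Wall:
    material: str  # hypothetical field

@dataclass
class BuildingReport:
    # The report is composed of parts, so the roof type is data on the
    # instance, not part of the report's class name.
    roof: Roof
    walls: list[Wall]

    def summary(self) -> str:
        return f"{self.roof.describe()}, {len(self.walls)} walls"

report = BuildingReport(roof=PanRoof(120.0), walls=[Wall("brick"), Wall("brick")])
print(report.summary())
```

Adding a new roof type means one new Roof subclass, with no new report classes.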

12 rats tied together
Sep 7, 2006

Epsilon Plus posted:

How should I handle a situation in which code wants to create an instance of a class, but shouldn't because it's being passed the wrong data?
<snip>

Let's say a really needs to be an integer - if we pass it something that can't be converted into an integer, I need to make sure an instance of SampleClass isn't created.

There are a lot of ways to tackle this, but probably the easiest would be
Python code:
from typing import Any

def __init__(self, a: int, b: Any, c: Any):
    self.a = int(a)
[...]
which will raise a ValueError if the input can't be parsed as an int (or a TypeError if it is something like None). You might want to write a try_cast_to_int() helper if you need more complicated logic than that.

If you have the time, it would be better to use something like https://github.com/python-desert/desert to generate serialization schemas and validate against that schema while constructing. I assume you're doing some kind of network stuff here as well, but if you're not, you can also use a static type checker to make sure that your codebase isn't passing anything that isn't an integer to this function.
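
A runnable version of the simple approach, as a sketch. SampleClass and its a, b, c parameters come from the question; try_make is a hypothetical caller added to show what happens on bad input.

```python
from typing import Any, Optional

class SampleClass:
    def __init__(self, a: Any, b: Any = None, c: Any = None):
        # int() raises before __init__ finishes, so the caller never
        # receives an instance built from bad data.
        self.a = int(a)
        self.b = b
        self.c = c

def try_make(a: Any) -> Optional[SampleClass]:
    """Hypothetical caller: returns None when construction fails."""
    try:
        return SampleClass(a)
    except (TypeError, ValueError):
        return None

print(try_make("42").a)       # 42
print(try_make("forty-two"))  # None, int() raised ValueError
print(try_make(None))         # None, int() raised TypeError
```

One thing to keep in mind is that int() also accepts floats and silently truncates them, so int(3.9) is 3. If that is not acceptable, that is where a stricter helper or a schema library earns its keep.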

Popete posted:

Question on how best to handle shared data between two python modules. I have a class that holds state information within a dict and I have 2 modules using this class to read and modify the dict data but right now they each have their own separate objects of this class so the state data within the dict isn't shared. I'd like there to be one shared object of this class that each module can access. What is the Pythonic way of doing this?

You can put it in a module and import it, just create a "shared.py" or something and then "from shared import mydict".

12 rats tied together
Sep 7, 2006

id use twisted, but it does (a lot) more than that, and its kinda heavyweight

12 rats tied together
Sep 7, 2006

another data structure you might find relevant to this assignment is the set

e: actually, this is quite wrong, but I'm leaving it up for your reference anyway. :)

12 rats tied together
Sep 7, 2006

If you have unique ids that would scream set difference to me, which is what you said yeah

edit: there's of course like 2 dozen other things that are wrong with it, but, i guess one of the nice things about python is that you can fall back to list index nonsense if you didnt bother to read any of the documentation

12 rats tied together fucked around with this message at 22:03 on Mar 11, 2022

12 rats tied together
Sep 7, 2006

given that the only (presented) desire is to produce the resulting set, I'd probably use a set difference which reduces it to one line (GeneList is a list in your code, so it has to be converted to a set first):
Python code:
set(GeneList).difference(Names)
the control flow is still confusing to me though:
Python code:
    if GeneList[x] == Names[y]:

    elif GeneList[x] < Names[y]:
            GeneList2.append(GeneList[x])

    elif GeneList[x] > Names[y]:
Which is like:

- only if the items are not equal,
- and the GeneList item is smaller than(?) the Names item
- save it and return it

What possible application could this have? The first time we hit an item pair that fails the equality test we start either only walking GeneList or only walking Names, it seems like that goes out of sync instantly and then does absolutely nothing until it catches back up? why not just always walk both lists?
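
For comparison, a short sketch of the set-based version with made-up sample data. GeneList and Names are the names from the assignment; the contents are invented.

```python
GeneList = ["BRCA1", "TP53", "EGFR", "MYC"]  # made-up sample data
Names = ["TP53", "MYC", "KRAS"]

# Everything in GeneList that does not appear in Names, order not preserved.
only_in_genelist = set(GeneList) - set(Names)

# If the original order matters, filter the list against a set instead.
names = set(Names)
ordered = [gene for gene in GeneList if gene not in names]

print(only_in_genelist)
print(ordered)
```

Neither version needs the two lists to be sorted or walked in step with each other.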

12 rats tied together
Sep 7, 2006

It's a fine language to use for a 3d engine but like most other things that are python but need to be performant, you have to put a lot of work into having as little of it execute in the interpreter as possible. Since Python is simultaneously so slow and so popular, a lot of stuff exists for this that you generally don't have to think about. For example, you can write Panda3D in python but most of your hot path runs as C++ or GLSL/Cg anyway.

12 rats tied together
Sep 7, 2006

IMHO the best resource for learning more about that concept in <15 minutes is to read the section on the stack vs the heap in the rust book's ownership chapter.

After that you can read the python documentation on memory management for some insight into at least one dimension of Python's slowness.

12 rats tied together
Sep 7, 2006

I would go for Python -> Java -> cython -> C++ if you still feel like it.

If you aren't already using it, pick up mypy or one of the other static type checkers for python and start getting used to it before learning Java.

For your actual question, C is different enough from C++ as to be actively harmful to your experience learning either after the other. The ++ is more of a generational leap thing than a simple increment.

12 rats tied together
Sep 7, 2006

you can use triple quoted strings as multi line comments, just don't have them also be docstrings. iirc they get evaluated and thrown away unless they are the first statement in a module, class, or function
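
A tiny sketch that shows the difference. The function names are made up.

```python
def documented():
    """This string is the first statement, so it becomes the docstring."""
    return 1

def commented():
    x = 1
    """
    This string is not the first statement, so it is just an
    expression whose value is never used.
    """
    return x

print(documented.__doc__)
print(commented.__doc__)  # None: the string above is not a docstring
```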

12 rats tied together
Sep 7, 2006

e: adding quote since this is a new page

Falcon2001 posted:

So I have a bit of a best practices question. In the event that you have a function that mutates a data structure like a list/etc, is it best practice to return the mutated version, or to just make it clear in the docstring that the function mutates the object? For the sake of argument, let's assume that this isn't a library or other externally facing function and is pretty tightly bound business logic specific to a situation, where you're separating code out into functions to improve readability/etc.

code:
big_dict = retrieve_data()
remove_butts(big_dict)
vs
code:
big_dict = retrieve_data()
big_dict = remove_butts(big_dict)

IMHO you should return a copy of the original that is mutated. I wouldn't muck about in caller state without it being an explicit requirement, since otherwise it seems more fragile for no benefit.

Ideally "remove_butts" can be expressed directly in the caller state as a dict comprehension, or similar, and it does not need to be a function though.
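
A minimal sketch of the "return a copy" version, written as a dict comprehension. What counts as a butt is unknown from the question, so the key test below is an assumption, as is the sample data.

```python
def remove_butts(data: dict) -> dict:
    """Return a new dict without the unwanted keys; the input is not modified."""
    # Assumption: a "butt" is any entry whose key contains "butt".
    return {key: value for key, value in data.items() if "butt" not in key}

big_dict = {"butt_count": 3, "name": "goon", "butt_size": "large"}  # made-up data
cleaned = remove_butts(big_dict)

print(cleaned)
print(len(big_dict))  # still 3, the caller's dict was left alone
```

This is a shallow copy, so nested values are still shared between the two dicts.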

12 rats tied together
Sep 7, 2006

QuarkJets posted:

If you have a bunch of grouped-together data that you're using over and over, then that sounds like a slam dunk case for a dataclass. I don't think "overkill" is the right way to describe this, it's likely going to make your code a lot simpler to read and write once you've defined the object. "I have these 5 values that a bunch of different functions need" is totally dataclass territory

absolutely agreed, because then you can also annotate your functions as accepting objects of that type, and then you can run one of the static type checkers to give you red squiggly lines if you ever screw that up
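
Something like this, as a sketch. Measurement and its fields are a hypothetical grouping of values, not anything from the original code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:  # hypothetical grouping of values
    site: str
    depth_m: float
    temperature_c: float

def describe(m: Measurement) -> str:
    # The annotation is what a static type checker uses to flag bad callers.
    return f"{m.site}: {m.temperature_c} C at {m.depth_m} m"

reading = Measurement(site="A1", depth_m=2.5, temperature_c=11.0)
print(describe(reading))

# describe(("A1", 2.5, 11.0)) would be flagged by a checker such as mypy.
# Python itself would only fail later, at the first attribute access.
```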

Sleepy Robot posted:

Is it best practice to have some python packages installed on the system python for programs that you don't really use in a project but instead use kind of like command line utilities? I mean rather than having them installed from a virtual environment. For example, I have yt-dlp installed for downloading youtube videos, or dte for doing date calculations on the fly. I just have these in my system python and it seems to be working ok so far.

I use a ton of python-based utilities at my job, which is being a janitor for computers, and the thing I usually do is have pyenv set a version inside of my "junk drawer" folder that all of my stuff uses. I never install anything to the system python on purpose.

12 rats tied together
Sep 7, 2006

Why not? That seems like fairly fundamental OOP to me.

I've worked on codebases where oops, the first kwarg of the most core behavior of the app was based on an assumption that stopped being true after 11 years, and now I get to push a PR that changes the signature of many dozens of functions across the entire codebase.

Would have been way easier if we were already accepting OurThing instances instead of what is essentially an unpacked version of OurThing in random scopes with no guarantee that they haven't been mutated in some way.

12 rats tied together
Sep 7, 2006

python has switch case now, but you can also use a dict of string -> function to handle that string

python calls it "match": https://docs.python.org/3/tutorial/controlflow.html#match-statements

12 rats tied together
Sep 7, 2006

If this logic will result in "the body of an email" I would probably use a jinja2 template for it, which would let you more cleanly express "this line of the email should contain this text, or be empty".

Using a templating tool to render a document, like an email, tends to result in more maintainable and community-understandable code in the long term, IME.

12 rats tied together
Sep 7, 2006

punk rebel ecks posted:

You lost me here. The tutorial has no "requirements.txt".

Also how do I run my python file from the virtual environment? Navigate to its folder via the terminal?

I wish there was a way I could do all this from VSC. :(

Other people have been doing a good job of fielding these questions, but I wanted to try and provide a zoomed-out explanation for you real quick because virtualenvs are one of those things that are easiest to understand if you start at the bottom (IMO), but it's hard to find resources online that actually start at the bottom and don't presume some foreknowledge on your part.
  • In order to run python stuff, your computer needs to be able to know where "python.exe" is (assuming you're on windows).
  • Your python.exe is preconfigured to look for python-related files inside a certain set of directories, some more info on this can be found here, but you don't have to fully understand it.
  • One of the folders your python is configured to look in is called "site-packages". Some info here, but it's not all the info and you don't have to fully understand it either.
  • When you "pip install something", a folder named "something" will be downloaded to your site-packages folder.
  • When you "import something", python scans your site-packages folder for a folder named "something", and then tries to run an __init__.py that exists inside that folder.
A virtualenv is like if you made isolated copies of these site-packages folders and swapped them around behind the scenes. In order to "use" a virtualenv, then, you need to tell Python about it in some way, so it knows to look in one site-packages instead of another. On the shell you do this by "activating" it.
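
If you want to see these pieces for yourself, a short script like the one below prints them. It only uses the standard library. Run it once with a virtualenv activated and once without, and the paths should change.

```python
import sys
import sysconfig

# Inside a virtualenv, sys.prefix points at the environment while
# sys.base_prefix still points at the python it was created from.
in_virtualenv = sys.prefix != sys.base_prefix

# Where "pip install something" puts pure-python packages for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]

print("interpreter:  ", sys.executable)
print("in virtualenv:", in_virtualenv)
print("site-packages:", site_packages)
print("import search path:")
for entry in sys.path:
    print("   ", entry)
```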

VS Code is slightly more complicated and your best bet will be to read the documentation for the python extension, which explains how VS Code finds your environments and how it decides which one to activate for you.

You could also do it by hand instead, and move these directories around yourself, but that would suck and you'd probably want to script it.

12 rats tied together fucked around with this message at 20:22 on Apr 27, 2022

12 rats tied together
Sep 7, 2006

The square brackets are what python calls a "subscription". It's from a math thing that I can't fully explain but is probably available on the internet. The main implication, IMHO, is that the item in the subscript is a member of (exists in) the container.

Under the hood, when you call "[1]" on some python thing, you're (usually) calling thing.__getitem__(1)
Python code:
>>> l = [1,2,3]
>>> l[1]
2
>>> l.__getitem__(1)
2

>>> dir(l)
[ [...],  '__class_getitem__' [...], '__getitem__', [...]]
Sometimes you're calling .__class_getitem__, the documentation goes into more detail about when and why. The square brackets are syntactic sugar for calling a method on an object, just like you might call e.g. l.count() to get the number of times an item appears in the list.
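
To show that nothing about the brackets is special to lists, here is a sketch of a made-up class that supports them just by defining __getitem__. The last lines assume Python 3.9 or newer, where subscripting the list class itself goes through __class_getitem__.

```python
class Squares:
    """Hypothetical container: s[n] returns n squared, nothing is stored."""

    def __getitem__(self, index):
        return index * index

s = Squares()
print(s[4])              # the bracket form
print(s.__getitem__(4))  # the method it is sugar for

# Subscripting the class itself, not an instance, is the
# __class_getitem__ case (Python 3.9+).
alias = list[int]
print(alias)
```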


12 rats tied together
Sep 7, 2006

Phone posting but you should try "for l in string_one, print l". The type error I imagine is because you're using a letter as a list index, but list indexes should only be integers.

e: poster above me said it much better :tipshat:
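
Spelled out as real code, the suggestion looks roughly like this. The value of string_one is made up, and the second loop guesses at what the original code did, since it is not shown here.

```python
string_one = "goon"  # made-up value

# Looping over a string hands you each letter directly.
for letter in string_one:
    print(letter)

# Using one of those letters as an index is what raises the TypeError.
try:
    for letter in string_one:
        string_one[letter]  # indexing with a str, not an int
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```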
