D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
:bang: Now I'm really loving confused. I've got the following code. The only possibilities for self.end_left and self.end_right are EndTypes.HOST and EndTypes.WALL.

Python code:
	# Both ends are a host.
        if self.end_left == EndTypes.HOST and self.end_right == EndTypes.HOST:
            post_openings = self.posts + 1
        # One end is a host, the other is a glass wall.
        elif (
            self.end_left == EndTypes.HOST and self.end_left == EndTypes.WALL
        ) or (
            self.end_left == EndTypes.WALL and self.end_right == EndTypes.HOST
        ):
            post_openings = self.posts
        # Both ends are glass walls.
        elif (
            self.end_left == EndTypes.WALL and self.end_right == EndTypes.WALL
        ):
            post_openings = self.posts - 1
        else:
            print(type(self.end_left), type(self.end_right), "wat")
Attempting to run results in:
Python code:
>>> EndTypes.HOST EndTypes.WALL wat 	# The previous iteration of the print() statement
>>> <enum 'EndTypes'> <enum 'EndTypes'> wat	# The current iteration of the print statement
Apparently the if-elif is completely missing that self.end_left is EndTypes.HOST and self.end_right is EndTypes.WALL? How the gently caress is this not being picked up?

EDIT: I threw the following in the print() statement and now I'm even more confused.
Python code:
print(type(self.end_left), type(self.end_right), self.end_left == EndTypes.HOST, 
	self.end_right == EndTypes.WALL, self.end_left == EndTypes.HOST and self.end_right == EndTypes.WALL, "wat",)

>>> <enum 'EndTypes'> <enum 'EndTypes'> True True True wat
By this logic, the middle elif should be picking this up on the first condition of the or statement...?

D34THROW fucked around with this message at 19:17 on Mar 31, 2022

Deffon
Mar 28, 2010

The first elif checks end_left twice.

This is always false:
code:
self.end_left == EndTypes.HOST and self.end_left == EndTypes.WALL

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:

Deffon posted:

The first elif checks end_left twice.

This is always false:
code:
self.end_left == EndTypes.HOST and self.end_left == EndTypes.WALL

I found it right before I checked the thread :suicide:

All works as intended now! Everything pretty-prints for debug and the debug JSON dump works properly. All quantities are as expected given the input parameters. This was the toughest bit so far, a lot of loving logic goes into determining how the wall is built from the parameters.

pre:
╒═════════════════════════════════════════╤════════╕
│ ITEM                                    │    QTY │
╞═════════════════════════════════════════╪════════╡
│ ANGLE 2 X 2 X 1/8 (EA)                  │   0.75 │
├─────────────────────────────────────────┼────────┤
│ ANGLE CLIP (EA)                         │  41.00 │
├─────────────────────────────────────────┼────────┤
     ...
├─────────────────────────────────────────┼────────┤
│ TEK 12 X 3/4 (EA)                       │ 175.00 │
├─────────────────────────────────────────┼────────┤
│ TEK 14 X 3/4 (EA)                       │ 150.00 │
╘═════════════════════════════════════════╧════════╛

╒════════════════════════════════════════════╤═══════╕
│ LABOR TYPE                                 │   QTY │
╞════════════════════════════════════════════╪═══════╡
│ FRAME OUT GLASS ROOM / 1ST CHAIR RAIL (LF) │    21 │
├────────────────────────────────────────────┼───────┤
│ INSTALL GLASS ROOM DOOR (EA)               │     1 │
├────────────────────────────────────────────┼───────┤
│ INSTALL GLASS ROOM WINDOW (EA)             │    11 │
├────────────────────────────────────────────┼───────┤
│ INSTALL INSULATION/FOAM <= 24"             │    21 │
╘════════════════════════════════════════════╧═══════╛
I can't help but :neckbeard: every time I commit a new working feature.

ExcessBLarg!
Sep 1, 2001

Deffon posted:

The first elif checks end_left twice.
This is a perfect use case for the match statement added in Python 3.10, which avoids these kinds of errors by having the variables-to-match specified once.

Bit of a shame it took Python so long to gain such a basic feature, but there it is.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I know pickling is a security risk, but it seems that that's only when accepting pickles from unknown sources. If I am only unpickling objects that are pickled by the program itself, is that less of a security risk? I'm talking pickle.dumps() stored directly in the DB.

The reason I'm asking is because I have some objects that are container objects - e.g. GlassRoom which contains one or more GlassWall objects in a dict's values. I don't think I can serialize those to JSON, unless...let me see if I can think this out here in pseudocode. Both GlassRoom and GlassWall include a CalculatorMixin class which provides methods to_dict() and from_dict() to dump/restore from the object's __dict__.

Python code:
class GlassRoom(CalculatorMixin):
	...
	def serialize(self) -> str:
		for k, v in self.line_items.items():
			self.line_items[k] = v.to_dict()
		return self.to_dict()

	def unserialize(self, json_to_load: str) -> None:
		for k, v in self.line_items.items():
			self.line_items[k] = GlassWall.from_dict(v)
		self.from_dict(json.loads(json_to_load))
Or something like that?

ExcessBLarg!
Sep 1, 2001

D34THROW posted:

I know pickling is a security risk, but it seems that that's only when accepting pickles from unknown sources. If I am only unpickling objects that are pickled by the program itself, is that less of a security risk? I'm talking pickle.dumps() stored directly in the DB.
The security concern around pickling is fairly straightforward: if the data is untrusted or potentially modified, then unpickling can result in arbitrary code execution. But if the pickled data is trusted and not plausibly modified, then it's fine. There aren't other nebulous security issues that arise from pickling itself.

I'm a bit more concerned about why you're shoving pickled objects into a database blob column in the first place.

Alternatively you could manually serialize/deserialize your nested objects into JSON as you suggest (though JSON is only marginally better than blob data in a relational database context).

A third option would be to use YAML, which supports arbitrary objects but is still human readable. If you set things up correctly you can use the safe_load function to support your classes but not load arbitrary objects.
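
On the "not plausibly modified" part: one way to get some assurance, if the blobs do end up somewhere other people can write to, is to sign them and check the signature before unpickling. A minimal sketch, assuming a secret key that lives outside the database (SECRET_KEY, dump_signed and load_signed are made-up names):

```python
import hashlib
import hmac
import pickle

# Assumption: in a real app this comes from config or the environment,
# never from the same place the pickles are stored.
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def dump_signed(obj) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload


def load_signed(blob: bytes):
    """Verify the signature, and only then unpickle the payload."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("pickle payload failed its integrity check")
    return pickle.loads(payload)
```

This only shows the blob was written by something holding the key; it does nothing for the versioning concerns.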

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:

ExcessBLarg! posted:

I'm a bit more concerned about why you're shoving pickled objects into a database blob column in the first place.

I'm starting to wonder that myself. I specifically created a table storm_panel_lines that contains the information necessary to recreate a storm panel line item, which is FK'd to a storm_panel_order record (or something like that). Could do the same with a glass_room table that's FK'd to a job and then a glass_wall table that's FK'd to the glass_room record.

How could I like...store an enum in a DB column though? Store it as enum.value and restore it by enum(value)?

I think I'm gonna take this back to the drawing board and see where I can work with this. What I did with StormPanelOpening is have the __init__ function take a StormPanelLine object (from the ORM) as an argument, which it then pulls necessary data from.

I do believe I completely forgot about this methodology. Ideally, I need to take several calculators back and refactor them now. :suicide:
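
On the enum question, storing the value and calling the enum class on it when reading does round-trip. A small sketch using sqlite3 directly rather than the ORM; the table layout and the EndTypes values are assumptions for illustration:

```python
import sqlite3
from enum import Enum


class EndTypes(Enum):
    # Hypothetical values; the thread only shows the member names.
    HOST = "host"
    WALL = "wall"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glass_wall ("
    "id INTEGER PRIMARY KEY, end_left TEXT NOT NULL, end_right TEXT NOT NULL)"
)

# Store the plain value, not the enum member itself.
conn.execute(
    "INSERT INTO glass_wall (end_left, end_right) VALUES (?, ?)",
    (EndTypes.HOST.value, EndTypes.WALL.value),
)

# Look the member back up by value when reading the row.
row = conn.execute("SELECT end_left, end_right FROM glass_wall").fetchone()
end_left, end_right = EndTypes(row[0]), EndTypes(row[1])
```

Calling EndTypes() on a value that isn't defined raises ValueError, so a bad row fails loudly instead of falling through to a "wat" branch.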

D34THROW fucked around with this message at 18:36 on Apr 1, 2022

Foxfire_
Nov 8, 2010

There is no expectation that anything pickled is unpicklable when the version of anything changes. It might work, might give you an error, and might silently give you corrupted data. Pickles are not suitable for a nontransient serialized format

ExcessBLarg!
Sep 1, 2001

Foxfire_ posted:

There is no expectation that anything pickled is unpicklable when the version of anything changes.
This is an unhelpfully vague statement. The entire premise of the pickle module (vice marshal) is that protocol changes are versioned and it's guaranteed to have compatibility across python versions.

Sure, if your user data objects contain third-party objects nested within, and the schema of those objects suddenly changes, then your pickled objects are now broken. But if your objects consist entirely of your own defined user objects and basic types, you have to go out of your way to break compatibility.

QuarkJets
Sep 8, 2008

ExcessBLarg! posted:

This is an unhelpfully vague statement. The entire premise of the pickle module (vice marshal) is that protocol changes are versioned and it's guaranteed to have compatibility across python versions.

Sure, if your user data objects contain third-party objects nested within, and the schema of those objects suddenly changes, then your pickled objects are now broken. But if your objects consist entirely of your own defined user objects and basic types, you have to go out of your way to break compatibility.

That second part is what they're getting at. If you rely on pickling it's extremely easy to needlessly lock yourself into a specific frozen environment. So it's generally fine if your pickled data is transient but unsuitable for something like a database column, unless the database is itself transient

QuarkJets fucked around with this message at 20:44 on Apr 1, 2022

ExcessBLarg!
Sep 1, 2001
Yeah I'm not an advocate of throwing blob data into a database, but the schema of his objects may well be frozen. That's not an unreasonable assumption.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I'm doing everything from the ground up on this. I just refactored GlassWall to accept an ORM object as an argument as opposed to individual arguments and I'm now working on doing the same for the roof module.

QuarkJets
Sep 8, 2008

ExcessBLarg! posted:

Yeah I'm not an advocate of throwing blob data into a database, but the schema of his objects may well be frozen. That's not an unreasonable assumption.

It's frozen so far

Foxfire_
Nov 8, 2010

The pickle protocol may be backwards compatible (that's broken a couple times in Python history, but at least those were considered bugs), but your likelihood of getting usable data back from a pickle that's sat somewhere while a couple years of people and documentation turnover went by isn't that great unless you're doing something like building a manual "convert to/from a dict of language primitives" pre-serialization step. And at that point you might as well be using either a self-describing serialization format, or a binary format that's less 'do everything' but better at the things your application cares about specifically.

(mostly I'm bitter about scikit having no serialization format besides pickles and having to deal with people's broken files from some unknown past version)

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I'm stuck on Win7 for dev. No 3.10 for me :qq:

EDIT: To be clear, I'm not pickling. I'm preserving via tables. In between refactoring my documentation so I can remember what poo poo does. And building a better command line suite of management tools instead of fiddling with cli <arg> at the command line.

D34THROW fucked around with this message at 22:13 on Apr 2, 2022

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Speaking of pickling:

We have a (currently) single threaded job that runs regularly and operates as a pipeline where it starts by gathering data from multiple sources, and then merges and transforms those into a composite file we use for the rest of whatever job is running.

I'm looking into ways to make this more flexible, so I'm debating setting up separate jobs and setting up a 'cache' by pickling the composite at the end of merge and then storing it for reuse, and then regenerating it whenever the stored data is more than 30 minutes old. This would cut about 60-75 seconds off of each of our jobs (minus the whole unpickling process).

Is pickle a reasonable tool for this use case or should I be considering something else? The data is not easily serializable to json since it's fairly complex and includes a lot of custom data types.

QuarkJets
Sep 8, 2008

Falcon2001 posted:

Speaking of pickling:

We have a (currently) single threaded job that runs regularly and operates as a pipeline where it starts by gathering data from multiple sources, and then merges and transforms those into a composite file we use for the rest of whatever job is running.

I'm looking into ways to make this more flexible, so I'm debating setting up separate jobs and setting up a 'cache' by pickling the composite at the end of merge and then storing it for reuse, and then regenerating it whenever the stored data is more than 30 minutes old. This would cut about 60-75 seconds off of each of our jobs (minus the whole unpickling process).

Is pickle a reasonable tool for this use case or should I be considering something else? The data is not easily serializable to json since it's fairly complex and includes a lot of custom data types.

Make it a continuous process and then use lru_cache, it's a functools decorator:
https://www.geeksforgeeks.org/python-functools-lru_cache/
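
A minimal sketch of the lru_cache route, assuming the whole pipeline stays alive as one long-running process. get_composite stands in for the slow gather/merge step and build_calls is only there to show when a real build happens:

```python
from functools import lru_cache

build_calls = []  # illustration only: one entry per real (uncached) build


@lru_cache(maxsize=1)
def get_composite() -> dict:
    """Hypothetical stand-in for the slow gather/merge/transform step."""
    build_calls.append(1)
    return {"rows": [1, 2, 3]}


def refresh_composite() -> dict:
    """Drop the cached value and rebuild it."""
    get_composite.cache_clear()
    return get_composite()
```

lru_cache has no time-based expiry of its own, so the "older than 30 minutes" rule would need something like a scheduled refresh_composite() call. The cache also lives in process memory, so it won't be shared between separate jobs.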

ExcessBLarg!
Sep 1, 2001

Falcon2001 posted:

Is pickle a reasonable tool for this use case or should I be considering something else?
I'd give it a try and if it works fine, you're unlikely to run into much trouble later with this use case.
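
If the separate-jobs route wins out, the pickle cache could be as small as this. It is a sketch under assumptions: load_composite, the build callable and the cache path are made-up names, and freshness is judged from the file's modification time:

```python
import os
import pickle
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 60  # regenerate once the cache is older than 30 minutes


def load_composite(cache_path: Path, build, max_age: float = MAX_AGE_SECONDS):
    """Return the cached composite if it is fresh enough, else rebuild it."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age:
            with cache_path.open("rb") as fh:
                return pickle.load(fh)

    composite = build()  # the slow gather/merge step, supplied by the caller

    # Write to a temp file and swap it into place, so another job never
    # reads a half-written pickle.
    tmp_path = cache_path.with_suffix(".tmp")
    with tmp_path.open("wb") as fh:
        pickle.dump(composite, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)
    return composite
```

Since the cache is thrown away every half hour, the usual pickle versioning worries mostly don't apply, as long as every job reading it runs the same code.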

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm poking about some Wordle analysis...

I have a dataframe with one column, full of 5-letter words. Is there a pythonic way to split that word into 5 additional columns, each containing a letter of the word?

End result being a dataframe with the columns: FiveLetterWord | Character1 | Character2 | Character3 | Character4 | Character5

*Edit: This ended up working but it ain't pretty:
Python code:
df['c1'], df['c2'], df['c3'], df['c4'], df['c5'] = df['raw_words'].map(lambda x: list(x)).str

Hughmoris fucked around with this message at 02:28 on Apr 3, 2022

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Phoneposting, but list comprehension? I would do it on dataframe creation maybe, char for char in word or something like that.

I'd fiddle with it if I was at my computer.

lazerwolf
Dec 22, 2009

Orange and Black
I believe you can do something like

code:
df[[col1, col2, col3, col4, col5]] = df.column_name.str.split(expand=True)
And that will return as many new columns as items in the split.

Check docs https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html

Hughmoris
Apr 21, 2007
Let's go to the abyss!

lazerwolf posted:

I believe you can do something like

code:
df[[col1, col2, col3, col4, col5]] = df.column_name.str.split(expand=True)
And that will return as many new columns as items in the split.

Check docs https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html

From what I can tell (and tested) split will split on a character (or whitespace by default). Split won't split a word in to characters I don't think.

list(word) will break a word in to letters but I'm not sure of a clean way to do it within pandas.

QuarkJets
Sep 8, 2008

Hughmoris posted:

From what I can tell (and tested) split will split on a character (or whitespace by default). Split won't split a word in to characters I don't think.

list(word) will break a word in to letters but I'm not sure of a clean way to do it within pandas.

In pandas you can delimit on an empty string to split a word into characters:

Python code:
df.words.str.split('', expand=True)
You get empty columns at the start and end, but you can easily drop those. Or maybe use the regex option to eliminate them

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

In pandas you can delimit on an empty string to split a word into characters:

Python code:
df.words.str.split('', expand=True)
You get empty columns at the start and end, but you can easily drop those. Or maybe use the regex option to eliminate them

I was testing splitting strings and split() fails with an empty separator but it works fine with pandas so I'm set. Thanks!

QuarkJets
Sep 8, 2008

Hughmoris posted:

I was testing splitting strings and split() fails with an empty separator but it works fine with pandas so I'm set. Thanks!

That's correct! This special behavior is just for pandas. Python's str.split() does not support empty separators since you can just slice the string or use the splat operator (*).
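
For reference, the plain-Python side of that looks roughly like this (standard library only; "crane" is just an example word). The re.split line assumes Python 3.7+, where splitting on an empty pattern is allowed:

```python
import re

word = "crane"

# str.split refuses an empty separator outright.
try:
    word.split("")
except ValueError as exc:
    error = str(exc)

# The usual alternatives: build a list, or unpack with the splat operator.
letters_list = list(word)
letters_splat = [*word]

# A regex split on the empty pattern gives the same empty strings at the
# start and end that show up as empty columns in the pandas result.
letters_regex = re.split("", word)
letters_trimmed = letters_regex[1:-1]
```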

ExcessBLarg!
Sep 1, 2001
It's quite refreshing that Pandas adopted Ruby-style string split/join operations instead of the terrible Python native ones.

QuarkJets
Sep 8, 2008

ExcessBLarg! posted:

It's quite refreshing that Pandas adopted Ruby-style string split/join operations instead of the terrible Python native ones.

I think split is basically the same across all 3 of those (Ruby, Pandas, Python)?

str.join in Python is the odd man out only because Pandas is in the business of defining collections, not strings. I think that both approaches are valid. In plain English it's like saying "I want comma-separated values" (Python) vs "I want values separated by commas" (Ruby and Pandas)

ExcessBLarg!
Sep 1, 2001

QuarkJets posted:

I think split is basically the same across all 3 of those (Ruby, Pandas, Python)?
I meant Ruby and Pandas both taking an empty string as the split delimiter to return individual characters. Although I prefer Ruby's String#each_char for its explicitness.

QuarkJets posted:

In plain English it's like saying "I want comma-separated values" (Python) vs "I want values separated by commas" (Ruby and Pandas)
Let's not be silly here. Join is a method on the string delimiter because Guido found a clever way to avoid making it another built-in, which itself is only a problem because Python makes anything that deals with iterables built-ins because, historically, there was no interface hierarchy among built-in data types.

ExcessBLarg! fucked around with this message at 01:01 on Apr 4, 2022

QuarkJets
Sep 8, 2008

ExcessBLarg! posted:

I meant Ruby and Pandas both taking an empty string as the split delimiter to return individual characters. Although I prefer Ruby's String#each_char for its explicitness.

Let's not be silly here. Join is a method on the string delimiter because Guido found a clever way to avoid making it another built-in, which itself is only a problem because Python makes anything that deals with iterables built-ins because, historically, there was no interface hierarchy among built-in data types.

It's not silly to think of code in terms of what's easy to interpret by humans, in fact I thought that was the whole point of programming languages. That's all that I mean. I don't really care what the history of Python's join is, I just find the hysteria over it to be... well, hysterical

punk rebel ecks
Dec 11, 2010

A shitty post? This calls for a dance of deduction.
So I finally finished my pokedex project.

The file is on my Github.

Would anyone like to check it out and give me their thoughts?

I was wondering if I was at the point where I can start applying for (likely entry level) software developer/engineer/programming jobs?

lazerwolf
Dec 22, 2009

Orange and Black

punk rebel ecks posted:

So I finally finished my pokedex project.

The file is on my Github.

Would anyone like to check it out and give me their thoughts?

I was wondering if I was at the point where I can start applying for (likely entry level) software developer/engineer/programming jobs?

Initial thoughts

Instead of specifying which packages are dependencies in your ReadMe, create a requirements.txt file with pinned versions of your dependencies.

Why is your code zipped?

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I was today years old when I discovered float.as_integer_ratio. It used to be I hosed with a function that determined a fraction based on the modulo of the decimal part or some weird poo poo. I can't even remember how I did it, but this is so much loving cleaner. (I'm aware the initial declarations aren't needed with duck typing. Perhaps it's not the ~Pythonic~ way to do it but I come from the world of VBA where you need to pre-declare variables and I vastly prefer declaring them for my own readability.)

Python code:
def to_fraction(number: float, as_fraction: bool = False) -> str:
    """...docstring..."""

    whole = 0
    numerator = 0
    denominator = 0
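    if int(number) == number:
        # Return a str to match the annotation (the original returned the float).
        return str(int(number))
    # as_integer_ratio is a method; without the call, fraction[0] raises TypeError.
    fraction = number.as_integer_ratio()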
    if as_fraction:
        return f"{fraction[0]}/{fraction[1]}"
    else:
        whole = int(fraction[0] / fraction[1])
        numerator = fraction[0] - (whole * fraction[1])
        denominator = fraction[1]

        return f"{whole} {numerator}/{denominator}"
EDIT: Or version 2.
Python code:
    # Removed the declarations.

    if int(number) == number:
        return number
    frac = Fraction(number)
    if improper:
        return f"{frac.numerator}/{frac.denominator}"
    else:
        return (
            f"{frac.numerator // frac.denominator} "
            f"{frac.numerator % frac.denominator}/{frac.denominator}"
        )

D34THROW fucked around with this message at 19:08 on Apr 4, 2022

punk rebel ecks
Dec 11, 2010

A shitty post? This calls for a dance of deduction.

lazerwolf posted:

Initial thoughts

Instead of specifying which packages are dependencies in your ReadMe, create a requirements.txt file with pinned versions of your dependencies.

Good idea.

lazerwolf posted:

Why is your code zipped?

Because I’m an idiot who doesn’t know how to use GitHub properly.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:

pip freeze > requirements.txt will take care of that for you.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

punk rebel ecks posted:

So I finally finished my pokedex project.

The file is on my Github.

Would anyone like to check it out and give me their thoughts?

I was wondering if I was at the point where I can start applying for (likely entry level) software developer/engineer/programming jobs?

I check GitHub for sw interns. This would be an instant fail because the code is zipped.

Ideally I should be able to clone your github, pip install -r requirements.txt into my virtual env following the instructions in your read me and the code should just work for those instructions.

If you pip freeze your code make sure you can pip install it into a fresh environment because it will be in alphabetical order which doesn’t always work.

Data Graham
Dec 28, 2009

📈📊🍪😋



D34THROW posted:

pip freeze > requirements.txt will take care of that for you.

And while this will do for a quick-n-dirty first pass, ideally you would want to go back and weed out all the things that got installed that aren't strictly "requirements" for your project and that you don't care about pinning.

Right? My instinct is to pin as little as possible, but is that best practice? I'm second-guessing myself now and thinking that might lead to more trouble than overspecifying.

Macichne Leainig
Jul 26, 2012

by VG

Data Graham posted:

And while this will do for a quick-n-dirty first pass, ideally you would want to go back and weed out all the things that got installed that aren't strictly "requirements" for your project and that you don't care about pinning.

Right? My instinct is to pin as little as possible, but is that best practice? I'm second-guessing myself now and thinking that might lead to more trouble than overspecifying.

I try to keep a mental map of the actual top level packages I'm installing and keep requirements.txt limited to that, but there's certainly a better way than trying to keep all that straight in your head, I'm sure.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I cannot for the life of me remember the name of the tool I used which walked your code and created a requirements.txt based on what was actually used. Like...someone running my app doesn't need black or flake8 and poo poo like that.

Foxfire_
Nov 8, 2010

D34THROW posted:

I was today years old when I discovered float.as_integer_ratio.
I'm having trouble thinking of something that's actually useful for. Display code would want some application-specific approximation instead of the exact values (humans usually don't want to be told that "0.1" is 3602879701896397 / 36028797018963968 instead of 1/10). And if you were doing arbitrary precision math, you shouldn't have floats to begin with
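
To make that concrete, here is a small sketch of the "application-specific approximation" idea using Fraction.limit_denominator. The to_mixed name and the default cap of 64 (think 64ths of an inch) are assumptions for illustration:

```python
from fractions import Fraction

# The exact ratio behind the float 0.1, as quoted above.
exact = (0.1).as_integer_ratio()


def to_mixed(number: float, max_denominator: int = 64) -> str:
    """Format a float as a mixed fraction, rounded to a bounded denominator."""
    frac = Fraction(number).limit_denominator(max_denominator)
    sign = "-" if frac < 0 else ""
    frac = abs(frac)
    whole, remainder = divmod(frac.numerator, frac.denominator)
    if remainder == 0:
        return f"{sign}{whole}"
    if whole == 0:
        return f"{sign}{remainder}/{frac.denominator}"
    return f"{sign}{whole} {remainder}/{frac.denominator}"
```

With the cap in place, to_mixed(0.1) comes back as "1/10" rather than the 17-digit ratio, and to_mixed(2.375) gives "2 3/8".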

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord
Something like Poetry will keep your requirements in separate files - one will contain packages that are top-level required, and then a lockfile with those packages' deps.
