Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Rather than writing separate code for every possible replacement, you should aim to write one bit of code that works for everything.

A good place to start would be writing code that works out if there is any [text in brackets] in the string so far.

Then you can write code that expands the [text in brackets] based on what you've defined in your input dictionary. Note that this doesn't need any replacement-specific code - you're just snipping out one bit of text and putting a new one in its place.

Then you can repeat that whole process until there isn't any more [text in brackets] left.
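
Something like that might look like this - a rough sketch, with a made-up dictionary and sample string:
code:
import re

replacements = {"animal": "cat", "sound": "meow"}  # illustrative input dictionary

def expand(text, replacements):
    pattern = re.compile(r"\[([^\[\]]+)\]")  # matches one [text in brackets]
    while True:
        match = pattern.search(text)
        if match is None:          # no brackets left - we're done
            return text
        # snip out the bracketed bit and splice its replacement in
        text = text[:match.start()] + replacements[match.group(1)] + text[match.end():]

print(expand("the [animal] says [sound]", replacements))  # the cat says meow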

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The advantage being talked about is you can assign your format string to a variable and use it in several different places, instead of repeating it everywhere.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Sure. With format strings you can do something like:

code:
user_fmt = '<a href="/users/{id}">{name}</a>'

# later...

user_fmt.format(**thread.first_poster)
user_fmt.format(**thread.last_poster)
It's kind of marginal because you could always write a format_user function, but it is something you can do with .format() that you can't do directly with f-strings.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If I have to actually "parse" it mentally to figure out what it does, instead of it just being obvious at a glance, then yeah, it's too complicated.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If it's a simple cross-product mapping then sure, list comprehensions are great.

Once you start adding conditionals at multiple levels, that's when things get complex enough that you should consider being more explicit and writing a loop instead.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If you want an honest comparison then perhaps you should avoid writing the "for loop" version in a really stupid way.

Like maybe instead you could write it something like this:
code:

result = []
for i in range(10):
    if i*(i-1) >= 30:
        continue 
    result.append([val for j in range(15) if (val := i*j + j) < 50])

And then because you've avoided writing it in a stupid way, you can identify that your condition is monotonic - once it starts failing, it will keep failing, so you don't even need to check the rest of the elements:
code:

result = []
for i in range(10):
    if i*(i-1) >= 30:
        break 
    result.append([val for j in range(15) if (val := i*j + j) < 50])

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Sorry, but this structure:
code:
sub_result = some_completely_non_side_effecting_calculation()
if a_condition_that_could_easily_be_checked_before_doing_the_calculation():
  result.append(sub_result)
Is pretty objectively stupid. If you want to convincingly argue that your approach is superior, you should be comparing your approach to the best possible alternatives, not crazy ones that nobody would really write.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
random.choices has to start by iterating the entire list of weights, which of course will get slower the more items you have.

If you supply cumulative weights to random.choices, then it should perform a bit better.
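
Something like this (made-up data), doing the summing once up front with itertools.accumulate:
code:
import random
from itertools import accumulate

items = ["a", "b", "c", "d"]
weights = [5, 1, 3, 1]
cum_weights = list(accumulate(weights))  # computed once, reused for every call

# each call now just does a binary search instead of re-summing the weights
picks = random.choices(items, cum_weights=cum_weights, k=10)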

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Goodhart's law applies here. The moment you make code coverage an actual target, instead of just something you look at to figure out which parts of the codebase might need more testing, you're going to wind up with a whole bunch of pointless change-detector tests that exercise all the code paths without actually testing the behaviour you're interested in.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If you make your key function return a tuple instead of just a single value, the results will be sorted first by the first element of the tuple. If two list entries have the same first element, they're then sorted by the second element, and so on to the third if the second elements are also equal.
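
e.g. with made-up data:
code:
people = [("Smith", "Anna", 34), ("Jones", "Zoe", 29), ("Smith", "Bob", 29)]

# sort by surname first, then by first name when surnames are equal
people.sort(key=lambda p: (p[0], p[1]))
# [('Jones', 'Zoe', 29), ('Smith', 'Anna', 34), ('Smith', 'Bob', 29)]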

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Python seems fine for this. If I remember my Fighting Fantasy correctly, you're looking at maybe a couple of hundred sections per book? This is on the level where getting "good enough" performance means putting your algorithm together in an effective way, rather than writing it in the most high-performance language.

The big concern I have is stat checks - for example, fights could leave you with a whole bunch of different stamina values at the end of them, and if you go and construct an entirely separate graph of followup states and duplicate all the work going forwards for every single possible fight outcome, then your solution is likely to be too slow no matter what language you write it in.

But the thing about stat checks is that they don't change the shape of the graph at all, they just change the odds of each outcome. You could instead construct and simplify the graph first, reducing each possible path to the series of fights and stat checks it will require, and then go through and evaluate the actual probabilities based on the simplified graph.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Are you creating a program to play the gamebook, or are you creating one to solve the best path through it? Because while the structure you're going for seems adequate for allowing a human to play through the book, it's going to make things very difficult as far as efficiently solving it goes.

To solve the book efficiently, you're going to want to manipulate and simplify the section graph. You can't do that if the links between sections are just opaque functions - you want your code to be able to inspect those connections and merge them together. (For example, combining a chain of free choices between two-to-three options into a single free choice between 20 different outcomes; or combining a series of free and random choices into a single weighted random choice).

Instead of expressing each section as an opaque function, I'd aim to express it as a structured data element. Something like this:

code:
section:
{
  name: string // section name. Will just be the section number (e.g. "169") for most sections.
  effects: [effect] // list of all effects that happen as soon as you enter. e.g. gain or lose an item, adjust stats, enter combat, etc.
  edges: [edge] // list of all possible outgoing edges.
}

edge:
{
  target: string // which section this edge leads to
  requirements: [requirement] // list of everything needed to take this edge - which items, skills, etc.
  effects: [effect] // list of all effects that happen when you take this edge. e.g. adjust stats, lose an item, etc.
}
By encoding the book in a more structured way like this, you can have a function that looks at the current section, identifies what the user can do here, and gives them the appropriate choice of where to go - but you can also have a different set of functions that works on the same underlying data and manipulates the section graph in order to simplify it and make it solvable.
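
If you wanted that shape in actual Python rather than pseudo-schema, a rough sketch with dataclasses (all names are placeholders) could be:
code:
from dataclasses import dataclass, field

@dataclass
class Edge:
    target: str                                        # which section this edge leads to
    requirements: list = field(default_factory=list)   # items, skills, etc. needed to take it
    effects: list = field(default_factory=list)        # stat changes, item losses, etc.

@dataclass
class Section:
    name: str                                          # usually just the section number, e.g. "169"
    effects: list = field(default_factory=list)        # effects that happen on entering
    edges: list = field(default_factory=list)          # outgoing edges

book = {
    "169": Section(name="169", edges=[Edge(target="170"), Edge(target="23")]),
}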

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If you name your functions starting with an _, that's a convention to signify stuff that shouldn't be called from outside the module. It also means that it won't be imported when someone does from your_module import *.

It's still technically possible to call it, but it'll look obviously fucky to any experienced Python programmer.
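
e.g. (toy module):
code:
# your_module.py
def _helper():        # private by convention - the leading underscore
    return 42

def public_api():
    return _helper()

# elsewhere:
# from your_module import *   ->  public_api is imported, _helper is not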

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
I think it's pretty reasonable to call out that something might be significantly below-market - often the sort of new-to-the-industry person who's the focus of that ad doesn't have a good idea of what typical compensation looks like.

If it is actually good compensation for the sort of applicant being targeted, then it's not like it's hurting anybody if that applicant has a look around and realises what typical market rates are first.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Can you just do _property = property somewhere in the file before the property field is defined?
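
i.e. something like this - total guesswork about what your actual code looks like, assuming the problem is a class field literally named property:
code:
_property = property   # keep a handle on the builtin before the name gets shadowed

class Thing:
    property: str = "whatever"      # hypothetical field that shadows the builtin name

    @_property                      # the saved alias still works as a decorator
    def loud_property(self):
        return self.property.upper()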

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

Deadite posted:

So astype is the first part executed before the first "where" happens? I thought the first "where" statement would have filtered out the 'Missing' strings before the second where executes and applies the astypes

The order of operations might become clearer if you were to break out every subexpression and assign it to a variable, instead of packing it all into one line.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Does python not have an arraydeque of any kind? It shouldn't be O(n) just to read items by index in the middle.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

QuarkJets posted:

Assuming you're referring to Java's arraydeque, Python's equivalent is just called deque: https://docs.python.org/3/library/collections.html#collections.deque

quote:

Indexed access is O(1) at both ends but slows to O(n) in the middle.

This doesn't look like an arraydeque to me. It should have O(1) indexed access (as long as you're just peeking at elements and not adding/removing).

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Rust's VecDeque supports indexed access.

It's not actually particularly useful for the sort of things that you'd use a deque for - but O(n) access implies that it's some sort of slow linked data structure instead of a contiguous ring buffer, which is pretty bad.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Just what do you think content contains?

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
A very common way to handle streaming data is to accumulate it into a buffer, do what you need to with it, rinse and repeat when you get the next chunk of streaming data.

Plotly doesn't care how you get the data you want it to graph, nor should it care. Plugging the two blocks together is your job as a programmer.
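
Rough sketch of the buffer idea, with stand-ins for the actual streaming source and the Plotly redraw:
code:
import random

buffer = []

def read_chunk():
    # stand-in for however your streaming source actually delivers data
    return [random.random() for _ in range(5)]

def update_plot(data):
    # stand-in for whatever Plotly call redraws the figure from `data`
    print(f"plotting {len(data)} points")

for _ in range(3):
    buffer.extend(read_chunk())   # accumulate the new chunk into the buffer
    update_plot(buffer)           # replot everything received so far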

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The error messages are not very helpful, if you don't look at them.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Reading the linked pyce post, it literally just decrypts the code into memory when loading it. So absolutely trivial to bypass for anyone who knows what they're doing. I don't care enough to go look up what singularity containers are, but based on that precedent I'm just gonna assume they're equally bad and won't stop a skilled attacker.

If you're worried about your software being ripped off by the people you're selling it to, the right answer is to put something in the contract they're signing with you that says they can't do that. (You should be doing that - specifying what they can and can't do with the software - even if you're not worried about them ripping it off!). Then if they breach it you can sue the pants off them. If your current legal team is not up to that sort of thing, fire them and get a new one that actually knows what they're doing as far as contract law goes.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Trying to do this without any outside resources is some bullshit "do you know exactly how the Python % operator works" question.

Even once you know that, it's more of a bullshit math problem than a programming one.

Anyway the way to solve it is:
- Partition the input into two groups - one group that contains all the largest numbers (by absolute value), and the other group with the smallest numbers.
- Sort each group separately, in ascending order
- Fill the output array such that you take the lowest value from the large-number group whenever it will be multiplied directly, and the lowest value from the small-number group when it will be used in the % calculation.

Then there are a bunch of edge cases to cover:
- if there are any zeros, instead you just sort the array in ascending order and tweak it so that zero never gets used in a % calculation.
- if there are any ones (or negative ones), those go in the large-number partition. if there are so many ones that they'd spill over to the small-number partition regardless, ignore the partitions and just sort ascending.
- if there are an odd number of negative inputs, you actually want to multiply the smallest numbers directly (so that the final result is least negative).

Bad question imo.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Sometimes there are legitimate reasons to model your data that way. Don't contort your data into some totally bullshit shape just to avoid a circular import.

That said, this particular shape (Groups and Users, where Groups contain Users and a User can be in many Groups) probably shouldn't be modelled that way. One pretty good way to do it would be to have Groups containing Users, and then a third component that supports looking up all the Groups for a particular User.
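
Rough sketch of what that third component could look like (all names invented):
code:
from collections import defaultdict

class Group:
    def __init__(self, name):
        self.name = name
        self.users = set()               # Groups contain Users, no back-reference to Groups

class GroupIndex:
    # the separate component: look up all the Groups for a particular User
    def __init__(self):
        self._by_user = defaultdict(set)

    def add(self, group, user):
        group.users.add(user)
        self._by_user[user].add(group)

    def groups_for(self, user):
        return self._by_user[user]

index = GroupIndex()
admins = Group("admins")
index.add(admins, "alice")
print([g.name for g in index.groups_for("alice")])   # ['admins']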

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

QuarkJets posted:

A simple tree hierarchy is never a "totally bullshit" shape, whereas a circular hierarchy often is

If your data is modelled as a tree, being able to step from a child to its direct parent is incredibly useful, but requires a circular reference.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

QuarkJets posted:

I don't think that's true - for instance the Qt data model permits directly stepping to a parent (via a parent property that basically all Qt classes have), but the codebase doesn't tend to have circular references. In C++ this requires a generic class that all classes with parentage inherit from (in Qt that class is QObject), in Python you can either define a similar class that manages parent/child linkage or just not use type annotations and allow parent to be dynamically typed

Both of those are worse than just having the circular reference.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Can you define what "brittle" means to you and why you think it's bad?

It's totally okay to have a "package" of closely-related files that depend on each other. You shouldn't feel like you need to combine them all into a single file just because you think mutual dependencies feel icky.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Can you give an example of how having the return value properly typed as a List[Group] instead of a List[Any] makes your code "easier to accidentally break when you're later trying to maintain it"?

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
More relevantly for Java, changing from a raw field to a property getter is a source-breaking change, so you also need to go through and change every other file that touches that field.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
You seem to be expecting the chatbot to actually know things and I'm really not sure why you have that expectation?

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
the python book he's well-known for is not particularly good either.

you might even describe it as the unnecessarily hard way to learn python

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Attach a debugger and see what the worker thread is actually doing.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
"I want to use just the python standard library but not the standard library function that literally implements exactly the thing I'm trying to do" seems like a very strange constraint.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Essentially what you're doing is a reduction (functools.reduce), but you want to have a generator that outputs every intermediate result instead of just the final one. That's basically the purpose of accumulate - but even if you didn't know about it exactly, it seems reasonable to suppose that a stdlib function to do that would exist?

Would it feel better if you were using the arguments to explicitly specify the initial value and a combining function instead of just using the defaults?
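
e.g. with toy data, spelling both out:
code:
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5]

# every intermediate result of the reduction, not just the final one
running_totals = list(accumulate(values, operator.add, initial=0))
# [0, 3, 4, 8, 9, 14]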

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
If it's gotten significantly slower recently, the first thing I'd do is grab an older version of the script that predates the slowdown.

This does two things:
- Lets you confirm that the slowdown was caused by a code change (or tells you to look elsewhere if it turns out the old version has also mysteriously gotten slower!)
- Gives you a baseline for comparing profiles - if something takes a long but roughly equal amount of time in both profiles, then it might be a good candidate for general optimization, but it isn't actually the cause of the regression you're investigating. If something takes a long time with the new code but not with the old one, then that's where you want to focus.


Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Many languages support the concept of a RangeMap, which you can insert Ranges (essentially, pairs of a start value and an end value) into and then look up single values in. Would that be useful for what you're trying to do?
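
Python doesn't ship one in the standard library, but if your ranges don't overlap you can get the same effect with bisect - rough sketch, made-up data:
code:
import bisect

# non-overlapping [start, end) ranges, kept sorted by start value
ranges = [(0, 10, "low"), (10, 50, "medium"), (50, 100, "high")]
starts = [r[0] for r in ranges]

def lookup(value):
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0:
        start, end, label = ranges[i]
        if start <= value < end:
            return label
    return None

print(lookup(7))    # low
print(lookup(50))   # high
print(lookup(200))  # None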

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Even then you almost certainly want a synthetic primary key plus a separate unique constraint on the set of columns you think should be unique.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The things it solves are major issues that you run into when you're trying to operate a system in reality instead of as a homework assignment that will be thrown away after it's been graded.

Notably, you now have a realistic path for changing your constraint when (not if, when) it turns out that your initial domain modelling wasn't correct.

Also you're actually saving space rather than spending more of it, since other tables referencing your synthetic key are smaller than if they had to reference your entire composite key.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
One possibility is to pick a random direction and distance, instead of doing X and Y separately.

Another possibility is to draw Bezier curves with randomly generated control points.

You could bias your diffs in some direction - so that it looks generally random at the small scale, but slowly tracks across the plot when you look at it overall.
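
For example, the biased-diffs version could be as simple as this (constants picked arbitrarily):
code:
import random

x, y = 0.0, 0.0
points = [(x, y)]
for _ in range(200):
    # small random jitter, plus a gentle constant drift to the right
    x += random.uniform(-1, 1) + 0.3
    y += random.uniform(-1, 1)
    points.append((x, y))
# plot `points` however you like - wobbly up close, tracking rightwards overall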

--

Really there are lots of fun things you can do with this.
