NinpoEspiritoSanto
Oct 22, 2013




QuarkJets posted:

I don't have them on hand, I found numbers some time ago while trying to decide if a new project should use something other than pymysql, which had become standard in a lot of my projects. I landed on "the performance differences aren't significant enough to use something other than pymysql".

Anything that uses the C API should be extremely fast; simple but pure-Python implementations are almost as fast, and higher-level APIs like SQLAlchemy were measurably slower, but not by enough to matter for most applications.

(iirc, Connector actually offers both a pure python implementation and a C API wrapper, and you can deliberately use one or the other)

mysqlclient (MySQLdb for py3) uses the C lib and is usually quickest with CPython. Worth noting pymysql is screaming fast with PyPy, where C API drivers struggle due to lack of a cffi interface.


Data Graham
Dec 28, 2009

📈📊🍪😋



To draw the two threads of exception-handling-as-flow-control and database-manipulation together: one benefit of try-except is that it lets you leverage the "better to ask forgiveness than permission" (EAFP) principle.

In other words, when working with a database, one common pattern is:

code:
Check if an object exists in the database
If it exists,
    get(object)
else,
    insert(object)
If you do this with if-then flow control, you're subject to a race condition, because other threads or apps might change the state of the database in between the "check if object exists" part and the "get or insert" part.

But the try-except approach means you do like:

code:
try:
    insert(object)
except object.AlreadyExists:
    get(object)
That way it's an atomic operation.

Not every kind of logic benefits from this pattern, but this is one where it's pretty relevant.
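A minimal runnable sketch of that pattern, using sqlite3 from the standard library (the `users` table and column names are just for illustration — the post's `object.AlreadyExists` is pseudocode, so `sqlite3.IntegrityError` stands in for the "already exists" exception):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

def get_or_insert(conn, name):
    # EAFP: attempt the insert and let the database arbitrate; a concurrent
    # writer can't sneak in between a separate "check" and the insert
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return "inserted"
    except sqlite3.IntegrityError:
        # the row already exists; fetch it instead
        return conn.execute(
            "SELECT name FROM users WHERE name = ?", (name,)
        ).fetchone()[0]

print(get_or_insert(conn, "graham"))  # inserted
print(get_or_insert(conn, "graham"))  # graham
```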

Wallet
Jun 19, 2006

Data Graham posted:

To draw the two threads of exception-handling-as-flow-control and database-manipulation together, a benefit of try-except is as a way to leverage the "better to ask forgiveness than permission" principle.

Using try/except as flow control usually feels a bit off to me except in cases like this where there's a high degree of uncertainty about the state/contents of whatever is being operated on (a database, responses from an API, user inputs in some cases). I have to agree with Dominoes that using try/except to test items in a list of objects with unknown methods sounds a bit fishy to me, partially because all of the possible objects are probably well known and so this shouldn't really be necessary, and partially because passing around objects with different sets of methods in an undifferentiated list seems like a recipe for running into all kinds of other problems.

Whybird
Aug 2, 2009

Phaiston have long avoided the tightly competitive defence sector, but the IRDA Act 2052 has given us the freedom we need to bring out something really special.

https://team-robostar.itch.io/robostar


Nap Ghost

Wallet posted:

Using try/except as flow control usually feels a bit off to me except in cases like this where there's a high degree of uncertainty about the state/contents of whatever is being operated on (a database, responses from an API, user inputs in some cases). I have to agree with Dominoes that using try/except to test items in a list of objects with unknown methods sounds a bit fishy to me, partially because all of the possible objects are probably well known and so this shouldn't really be necessary, and partially because passing around objects with different sets of methods in an undifferentiated list seems like a recipe for running into all kinds of other problems.

OK, so a more concrete example of the kind of thing I'm thinking of would be something like

code:
class Character:
    ...

class Hostile_Character(Character):
    
    def on_enter_room(self, player):
        self.add_target(player)
        self.say("Prepare to die, %s!" % player.name())


class Player:

    ...

    def move_to_room(self, new_room):

        npcs_in_room = new_room.get_npcs()
        for npc in npcs_in_room:
            try:
                npc.on_enter_room(self)
            except AttributeError:
                pass
        
        ...

(this isn't actual production code, but hopefully it describes the kind of situation I'm envisioning.)

It feels like my alternatives are:
- check whether each NPC in the list has an on_enter_room method attached, and if so call it
- maintain a separate list of npcs and hostile_npcs in the room object, which I have to be sure is kept updated
- some sort of complicated thing with events and listeners

The more I think about it the more I think that the first option doesn't really violate EAFP, since it's really unlikely that in between me checking and then calling the function the pointer's going to switch to a new object with a different set of methods, but part of why I'm asking this is to get a clearer idea of how much of a sin each of my options is in the Python paradigm.
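For reference, that first alternative is close to a one-liner with getattr, so it isn't much heavier than the try/except version. A simplified sketch (the class names mirror the example above; move_to_room takes the NPC list directly just to keep it self-contained):

```python
class Character:
    pass

class HostileCharacter(Character):
    def on_enter_room(self, player):
        player.messages.append("Prepare to die, %s!" % player.name)

class Player:
    def __init__(self, name):
        self.name = name
        self.messages = []

    def move_to_room(self, npcs_in_room):
        for npc in npcs_in_room:
            # LBYL: look up the hook and only call it if it exists,
            # instead of catching AttributeError
            handler = getattr(npc, "on_enter_room", None)
            if callable(handler):
                handler(self)

player = Player("Whybird")
player.move_to_room([Character(), HostileCharacter()])
print(player.messages)  # ['Prepare to die, Whybird!']
```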

KICK BAMA KICK
Mar 2, 2009

Define that method on the parent as just pass and override it in the child classes as needed?
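Sketched out, with the class and method names borrowed from the example above:

```python
class Character:
    def on_enter_room(self, player):
        # default hook: do nothing; subclasses override as needed
        pass

class HostileCharacter(Character):
    def on_enter_room(self, player):
        # hostile behavior lives only in the subclass
        self.target = player

# the caller no longer needs try/except or hasattr checks
npcs = [Character(), HostileCharacter()]
player = object()
for npc in npcs:
    npc.on_enter_room(player)
```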

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Inheritance is a path forward there for sure, but I'd also consider combining it with some stricter use of structural typing. To me, catching AttributeError in the normal course of events is not great, even though it will work, because it speaks to the organization of the code - why could things that don't have that attribute actually be used there? "It's a third party library that sucks" is a good reason, but if you're writing it all yourself I'd consider either
- Requiring on_enter_room to be defined but having it do nothing if necessary
- Making on_enter_room a class attribute that contains a possibly-empty list of callables (and remember, bound methods are first class so you can put like self.bark in the list)

In general I also find it's really helpful to use type annotations and an external static analyzer like mypy, which can catch a lot of the weirdnesses that might otherwise make you use attributeerror while you're writing your code. It's really easy to set up.
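A sketch of the second suggestion — a class attribute holding a possibly-empty list of callables (all names here are illustrative, not from the original code):

```python
from typing import Callable, List

class Character:
    # hooks run when a player enters the room; empty by default
    on_enter_hooks: List[Callable] = []

    def on_enter_room(self, player) -> List[str]:
        # call every registered hook; an empty list is a clean no-op
        return [hook(self, player) for hook in self.on_enter_hooks]

def taunt(npc, player):
    return "Prepare to die, %s!" % player

class HostileCharacter(Character):
    # plain functions stored in a class-level list are not bound,
    # so each hook is called explicitly with (self, player)
    on_enter_hooks = [taunt]

print(Character().on_enter_room("bob"))        # []
print(HostileCharacter().on_enter_room("bob")) # ['Prepare to die, bob!']
```

Annotating `on_enter_hooks` as `List[Callable]` also gives mypy something to check against, per the suggestion above.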

Wallet
Jun 19, 2006

KICK BAMA KICK posted:

Define that method on the parent as just pass and override it in the child classes as needed?

Probably something like this. Have the Character or HostileCharacter (convention is for class names to use CapWords as a side note) or whatever implement an empty or generic method and then override it for relevant character types if required.

I would be more likely to go in the direction that Phobeste is suggesting (though probably with both a method and an attribute that lists the callables) rather than an empty method in this specific example, because it will be a lot cleaner going forward to be able to easily define the behavior of specific types of characters by specifying the set of methods they execute when they enter rooms.

Dominoes
Sep 20, 2007

Use a function or method that checks whether the hostile can enter the room.

eg:
Python code:
def can_enter(self, room: Room) -> bool:
    if room.is_locked:
        return False
    if self.is_too_drunk:
        return False
    if self.enemy.id in [h.id for h in room.inhabitants()]:
        return False

    return True

# ...

from enum import Enum, auto

class EntryStatus(Enum):
    SUCCESS = auto()
    FAIL = auto()

# ...


def move_to_room(self, new_room) -> EntryStatus:
        npcs_in_room = new_room.get_npcs()
        for npc in npcs_in_room:
            if not npc.can_enter(new_room):
                return EntryStatus.FAIL

        # Do things
        return EntryStatus.SUCCESS

Dominoes fucked around with this message at 17:10 on Jul 17, 2020

mr_package
Jun 13, 2000
I might use an attrs class (I use attrs a lot so it's become my hammer) and define a "hostile" bool attribute for all NPCs, assuming this isn't going to grow to an unmanageable number of fields. Which gets you something like:
code:
if npc.hostile: 
    npc.hostile_action()
Or as Wallet suggests, an "on_enter" list of callable functions that's just empty by default. But I think Dominoes is on the right track in the sense of attaching the attribute to the class, though you might define/check its state a little differently.

I actually like Phobeste's idea of defining hostile_action() in an NPC base class and just passing, and only overriding it with something for hostile NPCs.

Or maybe all NPCs get enter_room_action() function and friendlies maybe do something, hostiles do some other thing, some do nothing.

I would also say this is one of the times just using isinstance() is fine, and I would still use try/except:
code:
if isinstance(npc, HostileNPC):
    try:
        npc.hostile_action()  # maybe enter_action() instead, defined for all NPCs?
    except:
        pass  # optionally do something else or raise here if you want all HNPCs to have a hostile_action

mr_package fucked around with this message at 18:34 on Jul 17, 2020

QuarkJets
Sep 8, 2008

I disagree with using isinstance for this kind of thing. You already have total control over all of the classes so the inheritance design is the cleanest one and you won't even need any try/except blocks, which are really just useful when you don't already have control over all of the inputs.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I just discovered Zappa and made my first serverless web app.

I got a simple Django site up and running in one day with no servers, no building config files, nothin'. Used a template cloudformation config file and zappa script that I just pasted my values into and bam. Why the gently caress have I been mucking around with configuring EBS and all its picky nuances when serverless is apparently way way way easier and not to mention cheaper?

I already host all my static with S3 because I use CloudFront to gzip it, so there's basically nothing holding me back from migrating all my projects to serverless. I feel like I probably won't ever SSH into anything ever again.

CarForumPoster fucked around with this message at 02:45 on Jul 18, 2020

Bad Munki
Nov 4, 2008

We're all mad here.


Gateway/lambda/zappa has its own issues. They don’t manifest in most cases, but there are some critical shortcomings.

For instance, all responses from lambda->gateway are en bloc, which means you can’t stream the response as your lambda generates it.

Also the timing limitations are way more strict, so if your process needs to run for any real amount of time, too bad.

I realize I’m in the minority here, but man it’s still annoying. We have dual deployments of a particular package because of all this, one via zappa and one via beanstalk.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Bad Munki posted:

Gateway/lambda/zappa has its own issues. They don’t manifest in most cases, but there are some critical shortcomings.

For instance, all responses from lambda->gateway are en bloc, which means you can’t stream the response as your lambda generates it.

Also the timing limitations are way more strict, so if your process needs to run for any real amount of time, too bad.

I realize I’m in the minority here, but man it’s still annoying. We have dual deployments of a particular package because of all this, one via zappa and one via beanstalk.

My plan was to break up bulk or long running tasks with AWS step functions. Really long stuff I tend to use sagemaker for already.

Bad Munki
Nov 4, 2008

We're all mad here.


Yeah, in our case, we stream a single, continuous response to the client, which is created by hitting a secondary backend, which is NOT in our control, parsing those results, doing a bunch of transformations and calculations on the data, and then forwarding it on to the end client. Of course, the secondary backend has strict paging limitations and doesn’t allow parallelized results fetching. So we pull a couple thousand results at a time, do our magic to them, and then send that chunk off to the client, and repeat repeat repeat, but the client doesn’t know it’s chunked and just sees one looooooong rear end json or csv or whatever response.

We have a web UI that uses the thing, which self-limits to never request more than a couple thousand items; that one runs on the zappa deployment. But we also have a lot of users hit the service directly for batch and scrape jobs; those can produce tens or hundreds of thousands of results and can run for like an hour. Personally, I’ve run requests that processed and streamed results for a full 24 hours, pulling several million results.

And in all cases, it has to come back as a single monolithic file. Which is nice and convenient, I admit! But still. :negative:

Wallet
Jun 19, 2006

QuarkJets posted:

I disagree with using isinstance for this kind of thing. You already have total control over all of the classes so the inheritance design is the cleanest one and you won't even need any try/except blocks, which are really just useful when you don't already have control over all of the inputs.

Yeah, I'm not really following why you would do this when you could instead just have all npcs run an on_enter() method that executes any relevant callables if you've defined them; if the callables are defined in an attribute and the on_enter() method remains generic you can just put any universal fallback behavior you need in the method.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Bad Munki posted:

Yeah, in our case, we stream a single, continuous response to the client, which is created by hitting a secondary backend, which is NOT in our control, parsing those results, doing a bunch of transformations and calculations on the data, and then forwarding it on to the end client. Of course, the secondary backend has strict paging limitations and doesn’t allow parallelized results fetching. So we pull a couple thousand results at a time, do our magic to them, and then send that chunk off to the client, and repeat repeat repeat, but the client doesn’t know it’s chunked and just sees one looooooong rear end json or csv or whatever response.

We have a web UI that uses the thing, which self-limits to never request more than a couple thousand items; that one runs on the zappa deployment. But we also have a lot of users hit the service directly for batch and scrape jobs; those can produce tens or hundreds of thousands of results and can run for like an hour. Personally, I’ve run requests that processed and streamed results for a full 24 hours, pulling several million results.

And in all cases, it has to come back as a single monolithic file. Which is nice and convenient, I admit! But still. :negative:

Yea this is good to think about and I'll be doing something similar at times. I could def see doing batch jobs that are pulling 100kb-2MB of data from 1 million rows that need to come back as one monolithic thing and probably would be rate capped such that that'll take a while. Plan was to queue up a ton of rate limited lambdas but I'd still need some sort of semi-permanent storage for those jobs until the final report is done and error handling to detect the failures that occur along the way.

Sounds like I should plan for that in more detail than I have so far. Thanks for sharing.

Happiness Commando
Feb 1, 2002
$$ joy at gunpoint $$

I think this is a math question and not a Python question, but help me anyway. A couple years ago, I did the Programming for Everybody course from Coursera that I saw recommended in a bunch of places. Then I got a job that was all PowerShell, so I put Python down. I picked it up again, and now I'm working through Automate the Boring Stuff because I figured it was a fine place to start (again). I'm doing one of the exercises and getting unreasonable results. The prompt is

quote:

Write a program to find out how often a streak of six heads or a streak of six tails comes up in a randomly generated list of heads and tails. Your program breaks up the experiment into two parts: the first part generates a list of randomly selected 'heads' and 'tails' values, and the second part checks if there is a streak in it. Put all of this code in a loop that repeats the experiment 10,000 times so we can find out what percentage of the coin flips contains a streak of six heads or tails in a row. As a hint, the function call random.randint(0, 1) will return a 0 value 50% of the time and a 1 value the other 50% of the time.

My code below (which does an obviously wrong thing, reporting the cumulative sum of streaks as if it were per-experiment) returns a percent chance of around 2.9%. Math says it should be 1.56%. It's plausible, I suppose, that a pseudorandom generator would get me the 0.2% difference between 1.56% and half of my answer, but I'm off a ton, which makes me think that I'm evaluating the streak incorrectly. What am I missing?
code:
import random
streakNum = 0
for experimentNum in range(10000):
    #roll 1d2 100 times
    results = []
    #streakNum = 0
    for i in range(99):
        results.append(random.randint(0, 1))
        #streakNum = 0
    #check if there are 6 in a row
    for i in range(93):
        innerSum = 0
        for j in range(5):
            innerSum += results[i + j]
        if innerSum == 6 or innerSum == 0:
            streakNum += 1
    print('in experiment number ' + str(experimentNum) + 'there were ' + str(streakNum) + 'streaks' )

print('Chance of streak: %s%%' %(streakNum / 10000))

a foolish pianist
May 6, 2007

(bi)cyclic mutation

code:
In [1]: for i in range(5):
   ...:     print(i)
   ...:
0
1
2
3
4

In [2]:
So range(n) is exclusive of n. Among other issues, this is going to cause you to only check for streaks of length 5, not 6.

necrotic
Aug 2, 2005
I owe my brother big time for this!
If you get seven tails in a row should that count as two streaks, or just one? When counting that as only a single streak I'm getting ~1.5-1.6%.
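One way to count a run of seven as a single streak is to look at maximal runs rather than every sliding window, e.g. with itertools.groupby (function name is illustrative):

```python
from itertools import groupby

def count_streaks(flips, streak_len=6):
    # groupby yields one group per maximal run of identical values,
    # so a run of 7 tails counts as one streak, not two
    return sum(1 for _, run in groupby(flips) if len(list(run)) >= streak_len)

print(count_streaks([0] * 7 + [1] + [0] * 6))  # 2
print(count_streaks([1] * 12))                 # 1
```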

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
Your code always looks at the next term in the sequence, so any streak longer than the streak length n you are looking for is going to be counted as multiple streaks of length n.

Dominoes
Sep 20, 2007

Something's missing from the description.

"the first part generates a list of randomly selected 'heads' and 'tails' values"

How big's the list?

Edit: This is a poorly defined problem. I backwards-interpreted it from the given solution, assuming we're counting the number of streaks compared to the number of flips: 2⁻⁶. The base comes from the probability of getting your prev value, and exponent comes from the streak len.

Python code:
import random

head_streak = 0
tail_streak = 0
streaks = 0
N = 10_000
for i in range(N):
    if random.randint(0,1):
        head_streak += 1
        tail_streak = 0
    else:
        tail_streak += 1
        head_streak = 0

    if head_streak == 6 or tail_streak == 6:
        streaks += 1
        head_streak = 0
        tail_streak = 0
    
print(streaks / N)
If you up the number of tries, the result converges on the expected value.

Dominoes fucked around with this message at 01:09 on Jul 21, 2020

Happiness Commando
Feb 1, 2002
$$ joy at gunpoint $$


That's, uh, really simple. Thanks.

Dominoes
Sep 20, 2007

This is one of the lovely cases where you have an analytical solution available, so even more simply:

Python code:
print(2**-6)
edit: Here's one way it could have been worded:
"What percentage of sequential coin flips conclude a streak of exactly 6 heads or 6 tails? "

Or if they really want to force you into a particular approach to teach it:
"Simulate 10,000 coin flips. How many of those concluded a streak of exactly 6 heads or 6 tails? " "... Now that you've completed the simulation, verify it works by flipping a coin 10,000 times, and comparing your results."

Dominoes fucked around with this message at 02:37 on Jul 21, 2020

KICK BAMA KICK
Mar 2, 2009

Never seen this before!

Dominoes
Sep 20, 2007

The underscore? It's from PEP 515, and was introduced in Python 3.6. It's common in modern languages (but isn't exclusive to them! See Prior Art section of that PEP.). Rust's linter will complain if you write long numbers without it.

Dominoes fucked around with this message at 01:32 on Jul 21, 2020

QuarkJets
Sep 8, 2008

Happiness Commando posted:

That's, uh, really simple. Thanks.

It also fails the assignment, technically. Per the instructions, you need to store the generated heads/tails values in a list. But this is very easy to fix

e: Dominoes raises a good point about the problem being poorly written. Taking a very literal interpretation, it defines an experiment as "flip a coin N times, store those values in a list, then count how many times you have a streak in that list". It then says to conduct that experiment 10k times, in a for loop. If you want to very literally interpret the instructions, then do this. If you want to "find out what percentage of the coin flips contains a streak of six heads or tails in a row" then you can just flip a coin 10k times.

QuarkJets fucked around with this message at 02:15 on Jul 21, 2020
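A hedged sketch of that literal reading (the per-experiment list length is an assumption — the book leaves it ambiguous, and 100 flips per experiment is one common interpretation):

```python
import random

def has_streak(flips, streak_len=6):
    # does the list contain streak_len identical values in a row?
    for i in range(len(flips) - streak_len + 1):
        window = flips[i:i + streak_len]
        # all heads sums to streak_len; all tails sums to 0
        if sum(window) in (0, streak_len):
            return True
    return False

def run_experiments(n_experiments=10_000, n_flips=100):
    hits = 0
    for _ in range(n_experiments):
        flips = [random.randint(0, 1) for _ in range(n_flips)]
        if has_streak(flips):
            hits += 1
    return hits / n_experiments

print(run_experiments())  # typically around 0.8
```

Note how different this answer is from 1.56%: "what fraction of 100-flip lists contain a streak" and "what fraction of individual flips conclude a streak" are simply different questions, which is the ambiguity being complained about above.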

Dominoes
Sep 20, 2007

I picked my interpretation based on lucking into one that hit the given answer.

I originally assumed the list was 10,000 long, and you could run the experiment however many times you want to get decent accuracy. Spoiler: You'll almost always get a streak in 10,000 flips, so that answer came out to be close to 100%!

mr_package
Jun 13, 2000
My main app is based on attrs classes that hold/manipulate data. I read/write json, xml, csv, and other formats into/out of these classes. I don't want to have a bunch of save_json(), load_xml(), etc. as separate functions in one file (started to look pretty cluttered), so I put these load/save ops into their own modules in a sub-package named "io", with the modules named "json", "xml", and "csv".

This creates quite a bit of namespace collision, but my __init__.py files are all empty so I'm explicitly importing what I need where I need it, e.g. import mypackage.io.json so I can call mypackage.io.json.save(), so everything should work (and it does). But PyCharm's debugger threw a fit at these names and wouldn't even run until I changed "io" to something else, and incidentally refactor -> rename didn't actually fix the imports or the calls to these functions, so I had to do it all by hand.

Am I doing something wrong? What is the point of the module system and namespaces if they still conflict with the standard library? People actually are expected to know every STL module and ensure they don't conflict anywhere? Or is this technically a PyCharm bug but something I should kind of expect to see if I'm going to use "io" as a module name?

accipter
Sep 12, 2003
Would a relative import solve the issue? Instead of "import io", you would do "from . import io".

Wallet
Jun 19, 2006

mr_package posted:

People actually are expected to know every STL module and ensure they don't conflict anywhere?

The import system is trying to make it easy to import things without specifying where they are except in cases where being explicit is necessary. I'm not sure that's the best way for it to work, but that is how it works.

The solution here is what you did: rename it. You don't have to, but life will be a lot easier if you do.

Dominoes
Sep 20, 2007

How would you package a standalone/portable version of Python for Linux that includes the expected extras? The issue of cross-distro compatibility is not as tough as it seems at first. I've found I can get a 90% solution by having an older binary (compiled on CentOS 7) that works on things like CentOS, Red Hat, Fedora, etc., and a newer binary (compiled on Ubuntu 16.04) that works on most other distros newer than Ubuntu 16.04's release date.

On Windows, you can do this by installing using an official installer, then copying the relevant part of the python folder and zipping it. Doesn't seem like it should work, but it does! The result is a file much larger than the installer, but that is probably a necessary sacrifice.

On Linux, you can get a portable Python by compiling using the official instructions included in the source code... However, it's missing some extras that people expect it to have (I don't remember which ones, but I believe the list includes Tkinter, wheel, and some other things that will come up in a significant minority of cases).

Any ideas on dealing with this? You can't use the Windows approach, since the install is generally fragmented across multiple places. You'd think there would be a compiler flag for "include all", but this isn't the case. relevant GH thread.

This might have an easy solution... Or might not. I have a very low success rate with compiling and linking C code, so perhaps someone better at that might have a pointer.

Dominoes fucked around with this message at 00:57 on Jul 26, 2020

QuarkJets
Sep 8, 2008

Dominoes posted:

How would you package a standalone/portable version of Python for Linux that includes the expected extras? The issue of cross-distro compatibility is not as tough as it seems at first. I've found I can get a 90% solution by having an older binary (compiled on CentOS 7) that works on things like CentOS, Red Hat, Fedora, etc., and a newer binary (compiled on Ubuntu 16.04) that works on most other distros newer than Ubuntu 16.04's release date.

On Windows, you can do this by installing using an official installer, then copying the relevant part of the python folder and zipping it. Doesn't seem like it should work, but it does! The result is a file much larger than the installer, but that is probably a necessary sacrifice.

On Linux, you can get a portable Python by compiling using the official instructions included in the source code... However, it's missing some extras that people expect it to have (I don't remember which ones, but I believe the list includes Tkinter, wheel, and some other things that will come up in a significant minority of cases).

Any ideas on dealing with this? You can't use the Windows approach, since the install is generally fragmented across multiple places. You'd think there would be a compiler flag for "include all", but this isn't the case. relevant GH thread.

This might have an easy solution... Or might not. I have a very low success rate with compiling and linking C code, so perhaps someone better at that might have a pointer.

Package your installation into whatever package manager binary format that your preferred distro uses. For CentOS that means building an RPM. It's pretty easy to do, once you get over a small learning curve. You can establish whatever series of steps is necessary for the build process, or you can literally just package a full directory of preinstalled crap; rpm doesn't care. The RPM can contain as little or as much as you want.

If you use Anaconda, that installs everything to a common place, and the zip trick should work, but only if you unpack to the same install path. An RPM lets you standardize the install location.

QuarkJets
Sep 8, 2008

If you find that when you build Python from source that it's missing certain things, you may either A) not have the build dependencies for those things or B) not have the right configure options enabled to enable those things. For instance Tkinter lives on top of Tk, the configure step may be detecting that your build environment is unable to build Tkinter and just skips it. You've got to dig through those outputs and check the configuration flags to see what's going on

dirby
Sep 21, 2004


Helping goons with math
I haven't done much with Python before, but I've spent over a half hour googling and fighting and can't get this one step to work out nicely.

I have a very large file that has an instance (probably exactly one) of a long string (like "abcdefghijklmnopqrstuvwxyz") on a single line. I would like to replace it with a shorter string (like "a to z"). If there are multiple instances I'm fine replacing them all, and I'm also fine with replacing just one. I'm even fine with padding with spaces, something like "a to z                    ", if getting the string to the same length would help. My file is large enough that I can probably hold it in RAM, but I'd really rather not try to hold two copies of it open in their entirety (if I even can). Also, my file is utf-8 encoded with some weird characters that break convenient file-handling tools that don't let me specify utf-8.

(Also I'll want to do this sort of thing about 10,000 times, but if I can do it once in under a minute so that 10000 times takes under a week, that's fast enough.)

What package/approach should I be looking at?

dirby fucked around with this message at 22:34 on Jul 26, 2020

Dominoes
Sep 20, 2007

You don't need anything beyond the std lib for this. Start with open() in read mode, inside a with block. Encoding is specified using the encoding kwarg; note the default is platform-dependent, so pass encoding="utf-8" explicitly.

You could iterate through the lines using readline() and wouldn't have to open more than one line in memory at once. Execute a replace("abcd...", "a to z") on the line, and if you found one, replace it in the original file. (Again, I'm not sure the exact fn).

A more raw way would be streaming over the bytes. Convert your string to bytes (i.e. a sequence of integers) ahead of time, and run a continuous check of whether you've found a match. I don't think you'd need this approach, but it wouldn't be much more complicated. The max size in mem is effectively a single char, since you wouldn't need to store lines: You'd just store whether you're currently in a possible match or not.

Dominoes fucked around with this message at 23:37 on Jul 26, 2020
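A sketch of that byte-streaming idea, with one wrinkle the description glosses over: a match can straddle two reads, so the trick is to carry the last len(old) - 1 bytes of each chunk forward (the function name and sizes here are made up):

```python
import io

def stream_replace(src, dst, old: bytes, new: bytes, chunk_size=1 << 16):
    # carry the last len(old) - 1 bytes of each chunk into the next one,
    # since a match can be split across a chunk boundary
    keep = len(old) - 1
    carry = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            dst.write(carry)
            return
        buf = (carry + chunk).replace(old, new)
        if keep:
            carry = buf[-keep:]
            dst.write(buf[:-keep])
        else:
            carry = b""
            dst.write(buf)

# demo: replace a 26-byte string using 7-byte chunks, so every match
# is guaranteed to span several reads
src = io.BytesIO(b"x" * 40 + b"abcdefghijklmnopqrstuvwxyz" + b"y" * 40)
dst = io.BytesIO()
stream_replace(src, dst, b"abcdefghijklmnopqrstuvwxyz", b"a to z", chunk_size=7)
```

For real files, open both in binary mode (open(path, "rb") / open(out, "wb")) and encode the search string to UTF-8 up front; since an ASCII search string can never match inside a multi-byte UTF-8 character, the weird characters in the file are harmless at the byte level.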

lazerwolf
Dec 22, 2009

Orange and Black
Do you know the exact string your want to replace? If so, why not just use sed and not bother with python?

dirby
Sep 21, 2004


Helping goons with math
Thanks to both of you for taking the time.

Dominoes posted:

You could iterate through the lines using readlines() and wouldn't have to open more than one line in memory at once.
Maybe I guessed wrong about how lists work in memory. I thought if I don't put a byte limit on it, readlines() would hold a list of every line in memory.

quote:

replace it in the original file. (Again, I'm not sure the exact fn).
This is most of my problem. write doesn't write a line, it writes the whole file.

quote:

A more raw way would be streaming over the bytes. Convert your string to bytes (ie a list of integers) ahead of time, and run a continuous check of if you've found a match.
A quick skim of python's io stream documentation doesn't seem to offer something that stands out to handle the writing part of this.

lazerwolf posted:

Do you know the exact string your want to replace? If so, why not just use sed and not bother with python?
Making a list (for a script to run through) of all the long and short strings to replace is on the order of the size of the original file. But it may be worth doing that way if it's too hard to do in-place...

Dominoes
Sep 20, 2007

My bad, typo. I think readlines() would load the whole file in memory as a list of strings, and converting it to a string certainly would, but I think readline without the s would just load a line.

For writing the file, the approach you choose depends on what you're trying to do, but here's one:

Don't make a new file until you find a match - that way you're not going down that route if you don't have anything to replace. Then make a new file, restart your iteration from the beginning, and write each line (or each byte) to the file. When you get to the line to be replaced, write the replacement. Then continue your iteration, writing each line (or replaced line) to the file.

Again, the more general / memory-efficient approach is to work in bytes (i.e. ints instead of strs), but it's probably not required here.

Naive:
Python code:

with open(new_file, "w") as f_new:
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:
                break
            f_new.write(line.replace("asdfasfdz", "a to z"))
I don't know why finding the official docs for these fns/methods/types is such a struggle, but I'm striking out.

Dominoes fucked around with this message at 23:56 on Jul 26, 2020

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord

Dominoes posted:

I don't know why finding the official docs for these fns/methods/types is such a struggle, but I'm striking out.

https://docs.python.org/3.3/tutorial/inputoutput.html#methods-of-file-objects

Not sure why google returns 3.3 but just edit it for whatever version you're on.


Dominoes
Sep 20, 2007

That's the one Googs was sending me to, but it's a tutorial. I'm looking for a breakdown of each function/method available: their args, what they return, and a description. E.g. so you could see that readlines() returns a list of strings split on the \n char, loads the whole thing into mem at once, etc., while readline() returns a single string with the next line in the stream if it's there, and raises (what?) if you're at the end, etc.
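That per-method breakdown lives in the io module reference (https://docs.python.org/3/library/io.html) rather than the tutorial. To answer the parenthetical: readline() doesn't raise at end of file, it returns an empty string:

```python
import io

# io.StringIO behaves like a text file opened for reading
f = io.StringIO("first\nsecond\n")
print(repr(f.readline()))  # 'first\n'
print(repr(f.readline()))  # 'second\n'
print(repr(f.readline()))  # '' -- EOF is signalled by an empty string, not an exception
```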
