|
Csixtyfour posted:Would this work or do I need to keep working on it? Perfectly acceptable for a student IMO. If you wanted to continue poking at it, you could move your prompt into the input() call, and do input('Enter an integer: ') so you don't have to call print() as a separate statement. I would probably replace the "x" in your loop with an underscore, since you're not actually using it anywhere and some IDEs will complain that you didn't use x anywhere.
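A sketch of both suggestions together (the function split is mine, just to keep the loop testable; the interactive part is shown in comments):

```python
def repeat_greeting(n: int) -> list[str]:
    # "_" signals that the loop variable is intentionally unused
    return ["Hello!" for _ in range(n)]

# Interactive use would look like this, with the prompt living
# inside input() instead of a separate print() call:
#   count = int(input("Enter an integer: "))
#   for line in repeat_greeting(count):
#       print(line)
```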
|
# ¿ Feb 4, 2020 02:33 |
|
|
# ¿ Apr 29, 2024 07:07 |
|
Or with pytest, use the capsys fixture
|
# ¿ Sep 10, 2020 21:01 |
|
QuarkJets posted:I thought sorted already did that sorted() gives you the sorted keys, not key-value pairs. You can do something like this but I'm not aware of a native method on dict that would do this for you. Python code:
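A sketch of the idea: sorted() alone walks the keys, but sorting the items() pairs and handing that list of tuples back to dict() gets you a key-sorted dict:

```python
d = {"banana": 2, "apple": 1, "cherry": 3}

# sorted() on a dict iterates over the keys only...
keys_only = sorted(d)

# ...so sort the (key, value) pairs from items() and feed the
# resulting list of tuples back into dict()
sorted_d = dict(sorted(d.items()))
```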
|
# ¿ Sep 28, 2021 21:27 |
|
Neat, that's definitely nicer to look at than a dict comprehension. I guess I've never really thought about lists of tuples being transformed back to a dict like that, but it makes a lot of sense.
|
# ¿ Sep 28, 2021 23:38 |
|
I've never worked with Qt so I can't speak to the offload of work in QThreads or whatever. Hopefully this gets the caching idea across, and may get you thinking about how you can modify your program's structure to make your life a little easier in the future: Python code:
The first time I call for Pikachu using from_cache, it's 100-200ms to retrieve from the API. The next time I ask for a Pikachu, the data is already local so I'm just spending the time to access the dictionary of Pokemon objects (absurdly fast in comparison, at the cost of a little bit of RAM). Python code:
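A minimal sketch of the caching pattern, with a canned fetch_pokemon() standing in for the real API call so it runs offline:

```python
def fetch_pokemon(name: str) -> dict:
    # Stand-in for the real API call (e.g. a requests.get against PokeAPI);
    # returns canned data here so the sketch runs offline
    return {"name": name}

class Pokemon:
    _cache: dict = {}  # class-level cache, shared across all calls

    def __init__(self, name: str, data: dict):
        self.name = name
        self.data = data

    @classmethod
    def from_cache(cls, name: str) -> "Pokemon":
        if name not in cls._cache:        # miss: pay the API cost once
            cls._cache[name] = cls(name, fetch_pokemon(name))
        return cls._cache[name]           # hit: just a dict lookup
```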
code:
|
# ¿ Mar 25, 2022 07:00 |
|
D34THROW posted:Pokemon.from_cache returns a new instance of Pokemon (i.e. a constructor) if the Pokemon is not in the API, or the copy of the locally cached data. In other words, from_cache is a Pokemon factory? A classmethod is effectively a factory, really all it's doing is receiving the class as an argument, along with potentially other arguments, and then calling the constructor (or another classmethod that eventually calls the constructor). It's worth noting that it has access to class attributes, but not instance attributes: Python code:
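A sketch of a classmethod acting as a factory (Pokemon and from_name are illustrative names, not anything from a real library):

```python
class Pokemon:
    count = 0  # class attribute: classmethods CAN see this

    def __init__(self, name: str):
        self.name = name  # instance attribute: classmethods CANNOT see this

    @classmethod
    def from_name(cls, raw_name: str) -> "Pokemon":
        # Alternate constructor: receives the class itself as `cls`,
        # normalizes the input, then calls the real constructor.
        # Subclasses calling this get subclass instances for free.
        return cls(raw_name.strip().lower())
```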
E: since I was mocking this up pretty late at night, I didn't go into error handling, but in the case of a Pokemon not existing in the API, the code I posted above blows up. Were this code I was actually using, I'd be checking the status code of the response at a minimum, and raising a better exception (along the lines of a PokemonNotFoundError) and handling that error with something in the UI. In fact, it looks like that API only supports lowercase names, so doing a .lower() and .strip() on the name in the classmethod is probably a good idea too! nullfunction fucked around with this message at 19:13 on Mar 25, 2022 |
# ¿ Mar 25, 2022 19:01 |
|
Sure, that's a very common use case for classmethods, though without seeing the rest of your code, I don't see a reason you need two separate classes for that example. Python code:
|
# ¿ Mar 25, 2022 19:22 |
|
I don't think the fact that it's community-run or beta really answers the "what happens if the API is no longer reachable?" question, if anything, it underscores it. It's a question you should be asking yourself each time you interface with something outside your immediate control, though, even as a hobbyist. Answering those questions will give you natural boundaries in the code you write, hints on where to break things apart into more manageable chunks. Code that works and does what you intend it to do is an achievement at any level, and if you have no ambitions past hobbyist that's fair game too. Rather than shelving it, try adding a local file cache! Even if you're just saving the JSON you got from the webserver, it means you can still use it when your internet connection is down, and a refactor would do the code you posted some good, especially if you ever want to change anything about it in the future.
|
# ¿ Mar 26, 2022 03:00 |
|
Nobody likes startswith()? Python code:
|
# ¿ Aug 18, 2022 19:09 |
|
KICK BAMA KICK posted:
lower() is O(n), so yes, for large numbers of very long strings you may see a large benefit with this approach. I just learned that startswith can accept a tuple of strings, so you can skip the lower() and just Python code:
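Something along these lines (a sketch, assuming the tuple covers the casings you care about):

```python
words = ["Foobar", "foobar", "bartender", "Baz"]

# startswith() accepts a tuple of prefixes, so both casings can be
# covered without calling lower() on every string
matches = [w for w in words if w.startswith(("Foo", "foo"))]
```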
|
# ¿ Aug 18, 2022 23:09 |
|
QuarkJets posted:What's the actual performance difference between value = a_dict.get(key, default) and value = default if key not in a_dict else a_dict[key]? I'm guessing it's going to be negligible enough that I wouldn't let it affect my code at all. I'm an optimization blow-hard but even for me that's beyond what I'd consider doing. There is surely a more effective way to optimize the performance of the software than doing this kind of replacement. I was curious about this too. code:
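Here's roughly how you'd measure it with timeit (absolute numbers will vary by machine and by hit/miss ratio; the shape of the comparison is the point):

```python
import timeit

setup = "d = {i: i for i in range(1000)}; key = 500"

# Two equivalent lookups; both land somewhere in the tens of
# nanoseconds per call, i.e. negligible for almost any program
t_get = timeit.timeit("v = d.get(key, None)", setup=setup, number=100_000)
t_cond = timeit.timeit("v = None if key not in d else d[key]", setup=setup, number=100_000)

print(f"d.get:       {t_get:.4f}s")
print(f"conditional: {t_cond:.4f}s")
```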
|
# ¿ Oct 11, 2022 21:15 |
|
QuarkJets posted:And this gives performance that's a little worse, but still better than calling d.get directly??? Ultimately both are pulling work out of the loop, so yeah, I'd expect them to be faster than a call to d.get(). I do see that it's about 3ns faster (regardless of whether the dict is populated) than the operator approach's worst case on my machine, which surprised me a bit; still, the operator's best case wins handily. But QuarkJets posted:Just goes to show how important it is to profile before trying to optimize is the real takeaway here.
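For reference, "pulling work out of the loop" here means something like binding the method once up front (one interpretation, sketched):

```python
d = {i: i * i for i in range(1000)}

# Hoist the attribute lookup out of the loop: bind d.get once,
# so each iteration skips the repeated d.get attribute access
dget = d.get
total = 0
for key in range(2000):
    total += dget(key, 0)  # missing keys (1000..1999) fall back to 0
```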
|
# ¿ Oct 11, 2022 23:54 |
|
Dawncloack posted:I have a regex problem. You're missing the re.MULTILINE flag: Python code:
Obligatory "now you have two problems" and all that aside, I would rethink the way you've chosen to write your regex. Look into \s at the very least if you want to capture whitespace, using a .* or .+ when you actually want whitespace is guaranteed to wreck your day at some point in the future.
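A sketch of both points together: re.MULTILINE so ^ and $ anchor at each line, and \s+ to capture the whitespace explicitly instead of .*:

```python
import re

text = "alpha 1\nbeta 2\ngamma 3\n"

# Without re.MULTILINE, ^ only matches the start of the whole string;
# with it, ^ and $ match at every line boundary
pattern = re.compile(r"^(\w+)\s+(\d+)$", re.MULTILINE)
pairs = pattern.findall(text)
```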
|
# ¿ Nov 22, 2022 22:55 |
|
nullfunction posted:now you have two problems As a general rule, if you want to interrogate Python scripts in any meaningful way, you should look into the ast module. It's part of the standard library, though I'd wager someone's done the hard work of writing parsers for PHP and bash as well, so with a quick pip install you could take a similar approach for other files you may need to parse. Python code:
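A minimal ast sketch, collecting the imports from a script for example:

```python
import ast

source = '''
import os
from pathlib import Path

def main():
    print(Path(os.getcwd()))
'''

# Parse the source into a syntax tree, then walk every node and
# collect the module name from each import statement
tree = ast.parse(source)
imports = []
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        imports.extend(alias.name for alias in node.names)
    elif isinstance(node, ast.ImportFrom):
        imports.append(node.module)
```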
Python code:
code:
|
# ¿ Nov 22, 2022 23:40 |
|
Seventh Arrow posted:edit: oh wait, if I just convert the "=" to ":", will that get me a valid json? or is there a catch? You should set this line of thinking aside and step back a little bit. You've correctly identified that the data doesn't look like JSON. It's also not a format I recognize immediately other than to say it's structured and probably from the SOAP API as opposed to the newer REST API. Are you using the Python client for this API? If so you should have Python objects. How did you get to this format?
|
# ¿ Jan 5, 2023 21:13 |
|
DoctorTristan posted:I think those are python objects - specifically a list of BounceEvent instances. Presumably that’s a class defined by SalesForce and op got the output above by print() -ing the output while paused in a debugger. This is the key. I had figured you were print()ing the result, and there's a pretty important thing to call out here that will probably help you in many ways going forward, especially working with unfamiliar APIs. When you print() an object, all you're doing is asking that object for a string representation of itself. There's a default representation that gives you something like <__main__.Foo object at 0x0000deadbeef> if you choose not to implement your own __repr__() but it can be literally any string representation that the developer wants, including JSON or some weird string like you have here. Here's a small contrived example to hopefully illustrate: Python code:
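A small contrived example along those lines (Foo and Bar are made-up classes):

```python
class Foo:
    pass  # no __repr__: you get the default <...Foo object at 0x...>

class Bar:
    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        # Any string the developer likes -- JSON, SOAP-ish braces, whatever
        return f"(Bar){{ name = {self.name} }}"

print(repr(Foo()))
print(repr(Bar("pikachu")))
```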
Python code:
Unless there's a particularly mature and well-maintained third party API that does exactly what you need it to (simple-salesforce is great but doesn't cover everything!), I'd go with the official API. You'd need something really compelling to convince me that adding a third party dependency over the official dependency is the right move for this, especially if the APIs they expose are similar. Ultimately you're going to have to do the work of mapping the columns you want (object data) to the right format (pandas dataframe) so that you can use the to_sql call regardless, so you may as well do it against the official API. Seventh Arrow posted:As an aside, I've developed a severe aversion to saying, "this next task should be pretty easy." Congrats, you can call yourself a programmer now.
|
# ¿ Jan 6, 2023 01:36 |
|
I had a look at the SDK link you posted and it's the same one linked from the Salesforce page, so I guess you're using the only choice. I can speak to what you need to do conceptually, but I haven't used pandas; all I've done is have a quick look at what classmethods exist to create a dataframe. I know it's a super popular library so I'm certain you can google "how do I load a pandas dataframe" and get to the same pages I would see. Just don't take this as an optimized solution, it's something to get you started.

I gather that you're new to writing code in general, so I'll try to break it down as far as I can, but you should go through the exercise of actually implementing all of this if it's gonna be your job. You're gonna be doing stuff like this a lot, and frankly Python's a great language to do it in. Part of navigating a huge ecosystem like Python's is going to be learning to use some of the built-in tools to self-service when exploring unfamiliar objects, as documentation isn't always available or useful when it does exist.

I'm going to have to make some assumptions because I don't have access to Salesforce data, but let's take the repr from the BounceEvent you posted previously as a guide to what might exist on that object:

quote:(BounceEvent){

The structure of the "Client" field suggests that the ClientID is a separate object or something more complex that you'll need to figure out and map accordingly, but overall this looks fairly straightforward in terms of mapping, provided you can locate the properties to get these values out of the object. I can see that this is a single bounced email, so I know that each BounceEvent maps to a "row" in pandas. I know that pandas can accept a list of dicts (where each dict is a row, and each key in the dict is a column) into a dataframe, and you've indicated you want to use a dataframe to load the data into SQL, so let's design with that goal in mind... we need a list of dicts to feed into pandas.
pre:
python pulls data       python somehow gets              pandas pushes the data
from salesforce     ==> that weirdly-formatted data  ==> using to_sql to the
marketing cloud         into a dataframe                 client's Azure SQL DB
Python code:
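A hedged sketch of the mapping function; every property name here is a guess based on the repr, so swap in whatever dir() actually shows on your BounceEvent objects:

```python
from types import SimpleNamespace

def bounce_event_to_row(event) -> dict:
    # One BounceEvent -> one "row" dict; each key becomes a column.
    # All attribute names are hypothetical -- adjust to your objects.
    return {
        "ClientID": event.Client.ID,
        "SendID": event.SendID,
        "SubscriberKey": event.SubscriberKey,
        "EventDate": event.EventDate,
        "EventType": event.EventType,
    }

# Stand-in for a real BounceEvent so the sketch runs without Salesforce
fake_event = SimpleNamespace(
    Client=SimpleNamespace(ID=12345),
    SendID=678,
    SubscriberKey="abc@example.com",
    EventDate="2023-01-05",
    EventType="HardBounce",
)
rows = [bounce_event_to_row(fake_event)]
# pandas.DataFrame(rows) would accept this list of dicts directly,
# and df.to_sql(...) takes it from there
```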
Now that you have all this code in a convenient function, it's a good time to fire up the REPL, paste the function in, and grab an object to play around with: code:
Python code:
Noodling around with dir(), calling methods, reading properties, using type() to map out the object, taking notes of what calls you need to make to get all your data out... these are things you'll want to get good at, you'll probably do them a lot. They're also super useful in a pinch when debugging. At this point, you should have a function that you can feed a BounceEvent into, and get a dict out of that has everything you need for pandas. If you can get something loaded into a dataframe for one example BounceEvent, you've got a pretty good chance that at least one other BounceEvent will work too, so when exploring this way, I always try to start getting a single example of something working, then feeding it more data to try to validate that it works with more than just the one event you're testing. Python code:
A strikingly large portion of the programs you'll write follow the same basic pattern as what you're trying to do above, so much so that entire industries have grown up around it. Have a look at the wiki page for ETL as it may give you some starting points for stuff to research further, vocab, etc.
|
# ¿ Jan 6, 2023 21:11 |
|
StumblyWumbly posted:I have a Python memory management and speed question: I have a Python program where I'm going to be reading in a stream of data and displaying the data received over the last X seconds. Data outside that window can be junked, I'd like to optimize for speed first, and size second. I know how I'd do this in C, but I'm not sure about Python memory management. If you want to hold the last N items in a structure similar to a ring buffer without implementing your own, a deque is probably what you're after: https://docs.python.org/3/library/collections.html#collections.deque Deques can be given an optional maxlen to limit their size, but if you're expecting the stream volume to ebb and flow, you're probably better off implementing the cleanup of the tail end yourself and tying it to the time of the streamed event. In terms of performance, it's O(1) to access either end of the deque, O(n) in the middle, so if you're not doing a ton of random reads, or have a relatively small collection, a deque makes for a really nice choice.
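A sketch of both flavors: maxlen for a fixed-size ring buffer, and a hand-rolled time-based trim of the tail end:

```python
import time
from collections import deque

# Fixed-size ring buffer: a full deque with maxlen silently drops
# the oldest item when you append a new one
buf = deque(maxlen=3)
for sample in [1, 2, 3, 4, 5]:
    buf.append(sample)

# Time-based window: store (timestamp, value) pairs, oldest on the
# left, and junk anything that falls outside the window on each add
window = deque()

def add_sample(value, now=None, max_age=5.0):
    now = time.monotonic() if now is None else now
    window.append((now, value))
    while window and now - window[0][0] > max_age:
        window.popleft()  # data older than max_age seconds gets junked
```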
|
# ¿ Feb 11, 2023 21:52 |
|
Jabor posted:This doesn't look like an arraydeque to me. It should have O(1) indexed access (as long as you're just peeking at elements and not adding/removing). They are indeed different. I'm not aware of an arraydeque equivalent with fast access in the middle in Python's stdlib, though I'd be surprised if there weren't an implementation somewhere out there in the broader ecosystem. Cachetools is an option, provided an LRU cache with TTL on top of it is acceptable. If you have the necessary skills and need the raw performance, binding something faster to Python is always an option as well. I would start by throwing together a proof of concept in Python and profiling it before adding dependencies or binding another language, though.
|
# ¿ Feb 12, 2023 00:53 |
|
QuarkJets posted:
Best practice is sort of difficult to gauge here, because there are a dozen different ways to do something like this depending on the finer points of your requirements and what you're trying to optimize. Especially given the number of ways you can run a container on AWS alone, to say nothing of other techniques within AWS or other cloud providers.

As mentioned by a few others, if you just want a static site to display some images, S3 web hosting makes for a fantastic choice. Add a second bucket in a different region, replicate files there, throw CloudFront in with origin failover and it's suddenly globally-distributed and tolerant to a region failure -- pretty powerful for not a lot of work! There really isn't much Python involved here on the hosting side, other than maybe pushing files to S3 using boto3, generating your HTML file from a template, etc. It's nicely optimized for simplicity without sacrificing availability, and should be extremely cost-effective to run as well. It's also completely manual: if you want to add a new image to the 10 most recent, you have to regenerate your source files and reupload everything, there's no compute associated with this method.

If you wanted a solution that leans on Python more heavily as the brains, I would use AWS Lambda for something like this. I don't think I would ship it as a container, I'd probably just live with using 3.9 until AWS finally gets their poo poo together with 3.10 and 3.11 runtimes since it's a very simple task and probably doesn't require a lot of maintenance. You could use a Lambda function to generate the HTML on the fly but I'd probably create the skeleton as a raw HTML file that lives in S3 and use htmx to call the function that asks S3 for files, determines the 10 most recent, and returns them as a chunk of HTML without having to write any JS on the frontend.
If you had a suitable method of authenticating users, you could also use Lambda to return a signed URL that will let you upload files directly to the S3 bucket... no extra compute handling required to process a file upload, it's completely between the browser and S3 at that point!

To take this from a toy example into the real world, you'd probably want to think about:

- AuthN/AuthZ on your endpoint(s)
- Will end-users be rate-limited? Will results from the Lambda function need to be cached to keep from running into concurrency limitations if web traffic increases significantly? API Gateway can do rate limiting and caching for you but comes with a price and its own limitations around max request timeout. Adding HA means another API Gateway in another region with all the same backing Lambdas deployed there too.
- How often are we adding new images? Should we transition older images that don't need to be retrieved often to cheaper storage classes for archival, or should they be deleted after some time or falling out of the 10 most recent?
- Will image files stored in S3 be forced to conform to a certain naming spec, or will the original filename be used for storage? Is the S3 bucket allowed to grow unbounded? S3 doesn't have a way to query objects by their modified time, which means potentially lots of API calls to find the most recent objects if you don't prefix your objects with some specifier to help limit the number of queried objects. S3 bucket replication requires that object versioning be turned on; does your software correctly handle an object that has had several versions uploaded?
- If we need to maintain a record of historical images past the 10 most recent, should we store metadata about the objects in another place where we can do a faster lookup, then just reference the objects in S3? Is eventual consistency sufficient for this lookup? Lambda -> DynamoDB is a very common pattern, but is by no means the only option. If you want to use RDS, be sure you have a proxy between Lambda and your DB if you don't want to exhaust your connections when traffic spikes.
- As the number of moving parts involved grows, managing deployment complexity is going to be a bigger deal. Ideally you started building this out with your favorite infrastructure as code tool, but if not, you'll definitely want to -- Terraform, CDK, CFN, doesn't really matter. I like CDK because I can stay in Python while defining my infrastructure, but it's not perfect (none of them are).

You could probably come up with a dozen more items to think about that aren't here, and to be clear, there are plenty of ways to accomplish the above in the hosting environment of your choice. You could opt to run the whole thing (object retrieval and storage too) in Django or Flask or FastAPI or whatever if you wanted the experience of building it yourself and don't mind a little undifferentiated heavy lifting.
|
# ¿ Feb 19, 2023 00:07 |
|
Each one of your inputs should have a unique name attribute (often set to the same value as its ID) that you specified on the front end. When you POST data to an endpoint by submitting a form, your browser grabs all the stuff you put in the form and sends it to the server (starlette in this case) in a request. Each field in the form, visible or not, gets pushed up in that request as a key-value pair: the key is the name you gave your input, the value is whatever's been entered on the page. When you handle that request in starlette, it should give you all of the form data as a request parameter in your handler function. Each web framework is a little different, but that request object should contain everything that was submitted in the form somewhere within it. Your inputs will likely be stored in a dict, with each key matching the corresponding input's name on the front end.
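A stdlib illustration of what that key-value payload actually looks like on the wire (parse_qs here is just to demonstrate the format; starlette's request.form() does this decoding for you):

```python
from urllib.parse import parse_qs

# What the browser sends for a form with inputs named "title" and
# "tags": an application/x-www-form-urlencoded body of key=value pairs
body = "title=My+Listing&tags=python"

# parse_qs returns {key: [values]}; take the first value of each
form = {key: values[0] for key, values in parse_qs(body).items()}
```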
|
# ¿ Feb 20, 2023 01:43 |
|
Seventh Arrow posted:I haven't been slamming the site with requests, so I don't think they're blocking me. Thoughts? Ideas? The problem is how you've specified the classes on your find_all. What you've written asks for all <li> elements that have a class that matches that whole big long string of classes. The only place I see those classes is applied to the top-level HTML element, not to any of the <li> elements. Digging around in the DOM, I was able to find the <li>s that correspond to the listings, they look like this: code:
Python code:
|
# ¿ Feb 21, 2023 02:27 |
|
No mention of zip() as a way to grab items in order from a collection of iterables?Python code:
You can also use itertools.zip_longest() if you wanted to handle replacing missing grades with zeroes, for example: Python code:
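A sketch of both, with made-up names and grades:

```python
from itertools import zip_longest

names = ["Ann", "Ben", "Cal"]
grades = [90, 85]

# zip() stops at the shortest iterable...
pairs = list(zip(names, grades))

# ...while zip_longest() pads the shorter one with a fillvalue,
# e.g. a zero for a missing grade
padded = list(zip_longest(names, grades, fillvalue=0))
```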
|
# ¿ Feb 23, 2023 20:35 |
|
Pydantic is more about data validation, not really applicable to the question as I understand it. I'm unaware of any kind of generic declarative framework like that. The suggestion to use deepdiff is, in general, a good one -- I've used it in the past to root out differences in large dict structures and act upon those differences -- but the gap between "I want to declare how I want the world to look" and "now I have dictionaries that I can bounce against each other" is significant if you want to generalize things. If you can't bend one of the existing tools like Ansible to your needs, you're probably going to have to build something. If you're going to build something, think really hard about whether or not you truly need it to be so flexible, especially if it FISHMANPET posted:only gets executed a few hundred times ever.
|
# ¿ Mar 10, 2023 01:54 |
|
Falcon2001 posted:I honestly wonder if just making a fake version of the client is the right answer, since this is starting to get a little insane. Would something like responses help? Check into the registry / dynamic responses stuff. I've found it extremely well suited to mocking APIs in test clients.
|
# ¿ Mar 30, 2023 22:38 |
|
Dataclasses are cool and good and if you're accustomed to just throwing everything in a dict it's easy to get started and see immediate benefits. Pydantic is really helpful if you are getting your data from a less-than-trusted source and need to ensure that everything comes in exactly as you expect by using its validators. Naturally there's a cost to this extra processing, and whether or not this is acceptable will depend on your requirements and goals, but I can say that for all of my use cases the ergonomics of using Pydantic far outweighed the performance cost. I haven't had the opportunity to use the new 2.x series but I understand the core has been rewritten in Rust and is significantly faster than the 1.x series. If your data is of a predictable format or you need to stick to the stdlib, dataclasses will suffice. If your data is uncertain and you don't mind consuming an extra dependency, it's hard not to recommend Pydantic.
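A quick stdlib-only illustration of the trust gap dataclasses leave open (User is a made-up example) -- the annotations are documentation, not enforcement, which is exactly the gap Pydantic's validators fill:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

ok = User(name="sam", age=30)

# Dataclasses trust you: nothing stops wrong types at runtime
sketchy = User(name=123, age="not a number")
```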
|
# ¿ Aug 4, 2023 22:37 |
|
Hopefully you've found the locale module in the stdlib to do your output formatting and just missed the fact that locale.atof() exists. The name isn't intuitive, but it parses strings according to your locale settings, which you can change using setlocale(). https://docs.python.org/3/library/locale.html
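A sketch (assuming the de_DE.UTF-8 locale is installed, with a fallback so it still runs where it isn't):

```python
import locale

# atof() parses according to LC_NUMERIC; under a German locale the
# periods are grouping separators and the comma is the decimal point
try:
    locale.setlocale(locale.LC_NUMERIC, "de_DE.UTF-8")
    parsed = locale.atof("1.234.567,89")
except locale.Error:
    # de_DE not installed here: fall back to the default "C" locale
    locale.setlocale(locale.LC_NUMERIC, "C")
    parsed = locale.atof("1234567.89")
```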
|
# ¿ Aug 9, 2023 00:23 |
|
Pretty sure the v2 equivalent you're looking for is FieldValidationInfo, which is passed into each validated field like the values dict was in 1.x, it just has a slightly different shape.Python code:
|
# ¿ Sep 5, 2023 05:30 |
|
Yeah, fair. I was mucking about with the rest of the code to play with v2 a bit more and didn't include half of the things that would make this better. So far v2 seems good, though I need to spend some time with their automated upgrade tool.
|
# ¿ Sep 5, 2023 05:55 |
|
Armitag3 posted:from them import frustration as usual
|
# ¿ Sep 27, 2023 00:26 |
|
Generally I like the obnoxious itertools solution but as mentioned, yeah, doesn't quite meet the requirements laid out in the prompt:quote:Generate a Normally distributed random walk with a starting value of 0 as a Python list I'd combine the techniques to just accumulate from a generator. Python code:
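Something like this (random_walk is a made-up name; n counts the total points, including the starting 0):

```python
import random
from itertools import accumulate, chain

def random_walk(n: int, mu: float = 0.0, sigma: float = 1.0) -> list[float]:
    # Start at 0, then accumulate n - 1 normally distributed steps
    steps = (random.gauss(mu, sigma) for _ in range(n - 1))
    return list(accumulate(chain([0.0], steps)))
```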
nullfunction fucked around with this message at 21:39 on Nov 11, 2023 |
# ¿ Nov 11, 2023 21:37 |
|
Yeah, most of the typing stuff you'd likely reach for has been a part of the language for years now. Sometimes you'll have a dependency that requires a particular Python version, which informs the ceiling on features available to you. If what you produce is a module, you'll sometimes find that consumers of that module have some sibling dependency that is stuck on an older Python and limits your ability to use the features you want.

If you're looking for a baseline, here's what I tend to do for all greenfield code and all code being refactored on an existing system:

1. Every argument should have a type annotation, every function should have a return type annotation.
2. Lists and dicts should always be annotated with what they are expected to contain in arguments. If you declare one in a function body (assuming Python >= 3.9) it should be annotated if declared empty or if the type can't be inferred from the assignment expression. list isn't enough, list[str] is always better.
3. Complex or deeply-nested structures get turned into something more manageable, either by aliasing or dataclasses. Type aliasing has existed since 3.5, though you'll have to pull in typing.Dict and typing.List rather than use the types themselves in annotations if you're on something older than 3.9. Nobody wants to see a pile of methods expecting a list[dict[Union[int, str], Union[int, str, bool, float, dict[str, Union[str, int, float]]]]] or whatever.
4. Any is a smell, but a useful one.

There's no one answer when asking "how far should I go with types?" because the factors to consider aren't uniformly weighted in every organization and environment. If what you produce is a module consumed by other teams, effort spent adding type hints has a base payoff to your immediate team, and that payoff is further increased by making your consumers' lives easier.
If the codebase in question is a very simple service that will be thrown away after a certain time or get touched once every never, investing hours carefully defining the intricacies of a deeply-nested dictionary in a request payload may never pay off, and slapping in a dict[str, Any] may be the wiser choice because there are more impactful tasks at hand. If you expose a set of primitives for your consumers to use in complex ways, the benefits from typing are much greater than if your interface is a single function call in a single module.

If you're adding types to something that is established and has consumers that would be mad at you for breaking changes, I would be very careful about the temptation to refactor as you add types, because the act of adding types to untyped code will almost certainly expose problems or failures of abstraction that you didn't realize were lurking all along. I find it better to use Any as an indication that we know something is not right and can't do any better without breaking compatibility. It differentiates from Unknown (a signal that this portion of the codebase has not had a typing pass) and serves as a target for a future refactor.

Falcon2001 posted:anyone who uses another datetime format is a heretic and does not deserve to draw breath.
|
# ¿ Mar 23, 2024 00:23 |
|
|
Seventh Arrow posted:At first, I had something much simpler that just did the assigned task, but I figured that I would also add commenting, error checking, and unit testing because rumor has it that this is what professionals actually do. I've tested it and know it works but I'm wondering if it's a bit overboard? Feel free to roast.

I'll preface this with an acknowledgement that I'm not a pyspark toucher, so I'm not going to really focus on that.

From the point of view of someone who is reviewing your submission, I'm very happy to see comments, error checking, and unit tests! They give me additional insight into how you communicate information about your code and how you go about validating your designs. However, if the assignment was supposed to take you 4 hours and you turn in something that looks like it's had 40 put into it, that isn't necessarily a plus. Since you're offering it up for a roast, here are some things to consider:
It's clear that you've seen good code before and have some idea of what it should look like when trying to write it for yourself, but you're missing fundamentals and experience that will allow you to actually write it. To be clear this is a fine place to be for a junior. If this is for a junior role, submitted as-is it's a hire from me but there's a lot of headroom for another junior to impress me over this submission.
|
# ¿ Apr 27, 2024 00:36 |