|
I usually deploy to Heroku where you spin up dynos for multiple workers. No messing with systemd there so I can't help you with that. I don't think there'd be anything specific to RQ...it's just another process for systemd to manage.
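Since it is just another process, a minimal systemd unit sketch for an RQ worker would look something like this (every path, name, and the queue name here is a placeholder for your own setup):

```ini
[Unit]
Description=RQ worker (placeholder name)
After=network.target redis.service

[Service]
# Placeholder paths: point these at your virtualenv and project
WorkingDirectory=/srv/myapp
ExecStart=/srv/myapp/venv/bin/rq worker default
Restart=on-failure
User=myapp

[Install]
WantedBy=multi-user.target
```

Then `systemctl enable --now` the unit, and run more workers by copying or templating it.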
|
# ? Feb 21, 2018 22:02 |
|
Yeah migrating all this stuff to containers is on my to-do list for sure, but right now it's old school Linux admin style.
|
# ? Feb 21, 2018 22:07 |
|
Cingulate posted:You can try this stackexchange answer: Yeah, I saw that post and I think that I want to do the same thing that he wanted to do, but I saw that he said it's not possible in an update. Was just hoping that he was wrong and/or new updates make it possible but I guess not. Anyway on an unrelated topic, does anyone use Jupyter notebooks? I feel like Jupyter notebooks are one of these things that I've always heard about and people rave about it, like Docker, but I don't really "get" it so I'm having a hard time seeing use cases for them. But I'll admit that once I actually sat down and understood Docker sometime last year I went from "meh okay" to "holy poo poo, this is awesome" real fast. I'm hoping the same thing happens with Jupyter notebooks.
|
# ? Feb 22, 2018 15:33 |
|
Generally, if you want to do something in Python and it’s awkward, more often than not you want to do the wrong thing. Just try it with a function, like I said. Everyone and their moms uses notebooks. When I want to multiply two numbers and one of them has more than 3 digits, I open a notebook. That is, if you analyse data.
|
# ? Feb 22, 2018 18:12 |
|
Cingulate posted:Generally, if you want to do something in Python and it’s awkward, more often than not you want to do the wrong thing. Just try it with a function, like I said. About the notebooks: I guess I’m asking why would you do that? What I would do, if I really needed to do this, is just open up a new blank script in PyCharm, type in my multiplication and hit f5 or whatever the hotkey is to run the script. Like I said Jupyter Notebooks seems like a great tool but I just don’t get it. Most of the stuff I’m finding online remind me of when I didn’t get Docker: lots of people saying how great they are but nobody really “showing” how great they are.
|
# ? Feb 22, 2018 18:26 |
|
The basic math thing was hyperbole. Well, what is it that you need to do? Complex plotting things, with multiple open figures, are a great scenario for notebooks. If you frequently get back to the data itself, even better. In other contexts, other tools will be superior. It depends.
|
# ? Feb 22, 2018 18:30 |
|
I guess I’m asking a question that can’t really be answered so never mind, I’ll just keep using it a bit more and see for myself. I had to make some new plots today so I used a notebook for that. I found that using it in the browser wasn’t really nice because I’m just so used to PyCharm’s autocompletion for everything. PyCharm lets me attach a notebook to a running server too, so that worked better and it gave me autocomplete. But still, I coulda done the same thing with just a script. e: Whelp, I just searched for something and this link came up: https://stackoverflow.com/a/38192558 and that's actually really insightful for me. I didn't think of using notebooks like that. Boris Galerkin fucked around with this message at 19:20 on Feb 22, 2018 |
# ? Feb 22, 2018 19:14 |
|
Boris Galerkin posted:Anyway on an unrelated topic, does anyone use Jupyter notebooks? I feel like Jupyter notebooks are one of these things that I've always heard about and people rave about it, like Docker, but I don't really "get" it so I'm having a hard time seeing use cases for them. But I'll admit that once I actually sat down and understood Docker sometime last year I went from "meh okay" to "holy poo poo, this is awesome" real fast. I'm hoping the same thing happens with Jupyter notebooks. They're less good for most other coding things. There are obviously holdouts but if you work with data in Python, you're going to use notebooks.
|
# ? Feb 22, 2018 19:48 |
|
Boris Galerkin posted:About the notebooks: I guess I’m asking why would you do that? What I would do, if I really needed to do this, is just open up a new blank script in PyCharm, type in my multiplication and hit f5 or whatever the hotkey is to run the script. Like I said Jupyter Notebooks seems like a great tool but I just don’t get it. Most of the stuff I’m finding online remind me of when I didn’t get Docker: lots of people saying how great they are but nobody really “showing” how great they are. I think if you're already a PyCharm user then the only other reason to open a notebook is if you plan to share the results (not just code) with someone else.
|
# ? Feb 23, 2018 04:38 |
|
I'm on OS X. I installed Anaconda thinking it'd be good to keep all the things installed organised. I'm trying to install pygame and it's failing all over the shop. conda install pygame finds nothing, so I went and found https://anaconda.org/search?q=platform%3Aosx-64+pygame Tried each of those and it says...

quote:OBOW:~ obow$ conda install -c kne pygame

I can install it with pip3 but then that's not found in Conda. https://conda.io/docs/user-guide/tasks/manage-pkgs.html suggests using an env but...

quote:OBOW:~ obow$ source activate base
|
# ? Feb 25, 2018 18:03 |
|
After you activate, use "pip" instead of "pip3". Activating conda environments points pip to the right executable, but doesn't bother with pip3, which stays connected to your system Python.
|
# ? Feb 25, 2018 18:20 |
|
SurgicalOntologist posted:After you activate, use "pip" instead of "pip3". Activating conda environments points pip to the right executable, but doesn't bother with pip3, which stays connected to your system Python. Thank you! quote:OBOW:alien_invasion obow$ conda activate base
|
# ? Feb 25, 2018 18:28 |
|
Also, although it appears that python3 points to the right place, I would just use the plain python executable. Once you've activated an env, further disambiguation is not necessary. Executables like python2, python3, pip3, etc. are awkward workarounds for working with multiple pythons without environments. With environments you don't need to worry about that and should use the canonical version of everything, IMO.
|
# ? Feb 25, 2018 19:19 |
|
This is for pySpark, but the syntax should still be the same. I just realized that I know how to search for text, but not replace it. I need to look through a bunch of spreadsheets and replace any "NA" values with zeroes. So I thought about maybe looking for them with a regex:code:
|
# ? Feb 25, 2018 21:33 |
|
Actually, maybe map will do it...I just need to figure out the syntax. Maybe:code:
|
# ? Feb 25, 2018 22:08 |
|
re.sub? https://docs.python.org/3/library/re.html#re.sub
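For the regex route, re.sub with word-boundary anchors keeps the replacement from touching values that merely contain the letters NA (a minimal sketch on a made-up line):

```python
import re

line = "1001,NA,3.5,NANAIMO"

# \b anchors: replace NA-the-cell, but not NA inside a longer token
cleaned = re.sub(r"\bNA\b", "0", line)
print(cleaned)  # 1001,0,3.5,NANAIMO
```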
|
# ? Feb 25, 2018 23:46 |
|
map is for transforming a bunch of things (it maps values to other values, basically). So if you have a set of elements, but you want to change the NAs to 0s, your mapping function wants to output a 0 if it gets one of those, and just pass everything else through as-is (or map the value to itself, if you like) Python code:
that way the sequence of elements you get out is the same as what you put in - you've just replaced some of them with different values. What your code does is filter out all the non-NA lines (right?), leaving you with only a bunch of NAs, and then you turn those into a bunch of 0s instead - which isn't very useful! You want to keep all the elements, but use map to selectively change some as they pass through Using a regex find and replace might be a lot better anyway, I just wanted to point out what the functional stuff is about baka kaba fucked around with this message at 02:43 on Feb 26, 2018 |
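A minimal sketch of that pass-through mapping in plain Python (the same function shape works as the argument to an RDD's .map in pyspark):

```python
def na_to_zero(value):
    # map NA to 0, and everything else to itself
    return 0 if value == "NA" else value

row = ["1001", "NA", "3.5", "NA", "7"]
cleaned = list(map(na_to_zero, row))
print(cleaned)  # ['1001', 0, '3.5', 0, '7']
```

Every element goes in and comes back out, same length, same order; only the NAs change.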
# ? Feb 26, 2018 02:40 |
|
Yeah, thanks for that. I took a closer look and I might have to convert everything to a dataframe anyways. Also I noticed that out of the 10 csv files I need to combine, they all have columns that are set up differently than the others. This might take a lot of joins to get it to work.
|
# ? Feb 26, 2018 04:13 |
|
Ok so now I'm wondering if using join will get all of these csv files into a nice little pile. I need to combine multiple csv files into one object (a dataframe, I assume) but they all have mismatched columns, like so:

CSV A: store_location_key | product_key | collector_key | trans_dt | sales | units | trans_key
CSV B: collector_key | trans_dt | store_location_key | product_key | sales | units | trans_key
CSV C: collector_key | trans_dt | store_location_key | product_key | sales | units | trans_id

On top of that, I need these to match with two additional csv files that have a matching column:

Location CSV: store_location_key | region | province | city | postal_code | banner | store_num
Product CSV: product_key | sku | item_name | item_description | department | category

The data types are all consistent, i.e., sales is always float, store_location_key is always int, etc. Even if I convert each csv to a dataframe first, I'm not sure that a join would work (except for the last two) because of the way that the columns need to match up. Any ideas?
|
# ? Feb 27, 2018 01:36 |
|
What level of observation do you need the resultant data to be?
|
# ? Feb 27, 2018 01:59 |
|
Pretty detailed...this is the kind of analysis that I'll need to do on the data:
|
# ? Feb 27, 2018 02:21 |
|
Seventh Arrow posted:Pretty detailed...this is the kind of analysis that I'll need to do on the data: Phone posting so I could be missing something, but those first three files look to have the same columns. If that’s the case, then concatenating the files together would work. Then you’d want to do two merges for the files below. One on store location key and the other on product key. If you need help, I can post pseudo code for you here in a bit when I can get back to a laptop. I would do all of this in pandas btw.
|
# ? Feb 27, 2018 02:36 |
|
If you could post an example of what you had in mind, it would be greatly appreciated. I have some other things that I can work on, so no rush.
|
# ? Feb 27, 2018 02:38 |
|
Seventh Arrow posted:If you could post an example of what you had in mind, it would be greatly appreciated. I have some other things that I can work on, so no rush. Here's what I had in mind. code:
code:
|
# ? Feb 27, 2018 02:58 |
|
vikingstrike posted:Here's what I had in mind. That's great, thanks a lot. So I guess "concat" was what I was looking for when it comes to the similar csv files. Does it just automatically look at the column names and sort accordingly? Another bit of interest is the "frame.loc" line...so if I have multiple columns what would the format be like? Maybe something like: code:
|
# ? Feb 27, 2018 03:35 |
|
Seventh Arrow posted:That's great, thanks a lot. So I guess "concat" was what I was looking for when it comes to the similar csv files. Does it just automatically look at the column names and sort accordingly? It's aligning the DataFrames along the columns index, which in this case is their names. So it doesn't matter what their position is, it matters that they are labeled the same. The default parameter is axis=0, which stacks the DataFrames on top of each other, with the column names telling pandas which data goes where; you could set axis=1 instead and think of the same exercise side by side, aligned on the row indices. If one DataFrame has a column the others don't, pandas will create a new column, and assign missing values to the rows/pieces that didn't contain that variable. quote:Another bit of interest is the "frame.loc" line...so if I have multiple columns what would the format be like? Maybe something like: Whoops, there was a typo in my original code. It should read: code:
To give you an idea of how you can build this into more complex expressions. Wonder if for region Canada, for all stores with a location id over 1000, I want to set the missing values in column 'price' to 'GOON', you could do: code:
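A sketch of the concat-then-merge flow being described, with tiny made-up stand-ins for the CSVs (the column names follow the files above; the values are invented):

```python
import pandas as pd

# Two transaction tables whose columns are in different orders
a = pd.DataFrame({"store_location_key": [1], "product_key": [10], "sales": [5.0]})
b = pd.DataFrame({"sales": [7.5], "store_location_key": [2], "product_key": [11]})

# concat aligns on column *names*, so differing column order is fine.
# (A file with trans_id instead of trans_key would first need
#  .rename(columns={"trans_id": "trans_key"}).)
trans = pd.concat([a, b], ignore_index=True)

# Merge the lookup tables in on their key columns
location = pd.DataFrame({"store_location_key": [1, 2], "city": ["Toronto", "Ottawa"]})
product = pd.DataFrame({"product_key": [10, 11], "sku": ["A1", "B2"]})

full = trans.merge(location, on="store_location_key").merge(product, on="product_key")
```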
|
# ? Feb 27, 2018 05:08 |
|
Ok this is the last time I make a nuisance of myself with this dataframe/csv stuff (I hope). There are a few csv files in the set that do not take kindly to having the 'sales' column classified with a 'float' data type. This is the code that I'm using to convert the csv files to dataframes...it's a bit convoluted, but it strips the header and uses its data to form the schema:code:
It works A-OK with the first three csv files, so I'm thinking there has to be something within file_4 that doesn't conform to float specifics...maybe there's some extra whitespace or something (if you're really curious, you can see the file here). This column needs calculations done on it so I can't cheat and classify it as a string. So I guess I'm asking: Is there a way to scan the csv for the problem cell(s)? Or to maybe make the whole column conform to float? I saw on one webpage that you can normalize the data types with code:
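One way to hunt for the problem cells is a plain-Python scan before Spark ever sees the file: try float() on every 'sales' cell and report the rows that fail. Here against an inline stand-in for trans_fact_4.csv (if pandas is an option, pd.to_numeric(col, errors='coerce') is the shortcut that turns the bad cells into NaN):

```python
import csv
import io

# Inline stand-in for the real file; the blank cell is the kind of culprit to look for
raw = """store_location_key,sales
1,5.0
2,
3,7.25
"""

bad_rows = []
reader = csv.DictReader(io.StringIO(raw))
for line_no, row in enumerate(reader, start=2):  # line 1 is the header
    try:
        float(row["sales"])
    except ValueError:
        bad_rows.append((line_no, repr(row["sales"])))

print(bad_rows)  # the line numbers and raw cell values that won't parse as float
```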
|
# ? Feb 27, 2018 19:08 |
|
Can you show what the file looks like? I am pretty optimistic this can be solved with 3 lines of pandas.
|
# ? Feb 27, 2018 19:30 |
|
Is there a reason you're using pyspark? Everything you mention can be done with pandas directly. edit: pandas automatically recognizes sales as a float column. See below. code:
vikingstrike fucked around with this message at 19:48 on Feb 27, 2018 |
# ? Feb 27, 2018 19:45 |
|
Cingulate posted:Can you show what the file looks like? I am pretty optimistic this can be solved with 3 lines of pandas. Yes, I made a tiny little inconspicuous link in my post...here's the whole hog: http://www.vaughn-s.net/hadoop/trans_fact_4.csv vikingstrike posted:Is there a reason you're using pyspark? Everything you mention can be done with pandas directly. For this assignment, I have to do it in pyspark - however, pyspark just uses all the same syntax as vanilla python. I just tried your examples in Spark and got the same results. I guess the only question is whether I have to set up the dataframe using that convoluted setup that I used before, but I guess I don't!
|
# ? Feb 27, 2018 19:56 |
|
Seventh Arrow posted:Yes, I made a tiny little inconspicuous link in my post...here's the whole hog: If you need to create a DataFrame, and pyspark has a DataFrame creator function that gives you the desired output, I'm not sure why you'd try to roll your own. Turning CSVs into DataFrames is some of the most basic functionality of a library like this.
|
# ? Feb 27, 2018 20:06 |
|
vikingstrike posted:If you need to create a DataFrame, and pyspark has a DataFrame creator function that gives you the desired output, I'm not sure why you'd try to roll your own. Turning CSVs into DataFrames is some of the most basic functionality of a library like this. Well, I tried googling on my own to try to solve the problem myself and came across this page: https://www.nodalpoint.com/spark-data-frames-from-csv-files-handling-headers-column-types/ It was fascinating for me, but you're right...it's a lot of busywork for something that can be done in a more simple manner. This is actually kinda-sorta for a job interview. The teacher at the place where I study data engineering was privy to the "skill-testing exercises" that an employer uses and said that if I can solve them, he will try to get me a job interview. I feel bad that it's giving me so much trouble; I mean, data analysis isn't usually in an engineer's job description, but job descriptions in Big Data aren't an exact science right now. Anyways, I hope that I'm becoming a better programmer and thanks for the assist!
|
# ? Feb 27, 2018 20:12 |
|
I'm trying to look up what a movie professional's primary occupation is. Python code:
It's not too slow, but could it go faster? Edit: obvious suggestion would be kicking out all the porn movies. edit 2: Oh wow, I switched to df.groupby and now it's much faster. Cingulate fucked around with this message at 20:37 on Feb 28, 2018 |
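For the record, the groupby version of that kind of lookup might look like this (toy data; the column names are invented, not the real dataset's):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ann", "Ann", "Bob", "Bob"],
    "occupation": ["actor", "director", "actor", "writer", "actor"],
})

# One pass over the table instead of re-filtering df once per person:
# take the most common occupation within each name's group
primary = df.groupby("name")["occupation"].agg(lambda s: s.mode().iloc[0])
print(primary["Ann"])  # actor
```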
# ? Feb 28, 2018 20:05 |
|
In list logic, .remove will remove the first item in a given list that matches the query... How do you delete the last item in a list that matches a particular query? Because the best I can come up with is to reverse the list, apply the remove function, then reverse the list again. And that strikes me as terribly inefficient
|
# ? Mar 2, 2018 00:20 |
|
JVNO posted:In list logic, .remove will remove the first item in a given list that matches the query... You could iterate in reverse.
|
# ? Mar 2, 2018 00:26 |
|
JVNO posted:In list logic, .remove will remove the first item in a given list that matches the query... You can use del to remove an item from a list by index (del list[0]), which might help, depending on what you're actually trying to do.
|
# ? Mar 2, 2018 00:55 |
JVNO posted:In list logic, .remove will remove the first item in a given list that matches the query... I would do it like this: Use reversed() to iterate over the list in reverse (i.e., from back to front) without modifying the list. Find the index to remove, then use the del statement to remove that element of the list. Note that removing elements from the middle of a list is not a particularly efficient operation. Putting it together: Python code:
pre:
In [12]: a = [1, 2, 3, 2, 5, 2]

In [13]: delete_last_matching_inplace(a, 2)

In [14]: a
Out[14]: [1, 2, 3, 2, 5]

In [15]: delete_last_matching_inplace(a, 2)

In [16]: a
Out[16]: [1, 2, 3, 5]

In [17]: delete_last_matching_inplace(a, 2)

In [18]: a
Out[18]: [1, 3, 5]

In [19]: delete_last_matching_inplace(a, 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-dc87b4f3a28d> in <module>()
----> 1 delete_last_matching_inplace(a, 2)

<ipython-input-3-b7155902c6b2> in delete_last_matching_inplace(a, k)
      4             del a[-i]
      5             return
----> 6     raise ValueError(f'no element in {a} matches {k}')

ValueError: no element in [1, 3, 5] matches 2
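Spelled out, a helper with the behavior shown in that session (a sketch; enumerating reversed(a) from 1 means -i always indexes the current element from the end of the original list):

```python
def delete_last_matching_inplace(a, k):
    # walk back to front; -i is the i-th element from the end
    for i, x in enumerate(reversed(a), 1):
        if x == k:
            del a[-i]
            return
    raise ValueError(f'no element in {a} matches {k}')

a = [1, 2, 3, 2, 5, 2]
delete_last_matching_inplace(a, 2)
print(a)  # [1, 2, 3, 2, 5]
```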
|
|
# ? Mar 2, 2018 01:02 |
|
Probably worth benchmarking it (Python's lists are apparently arrays, so a reversed iterator should be fast?) but you could always just iterate over the array normally, assign the index to a variable whenever you find a match, and then at the end it'll be set to the index of the last matching element (or None) Array traversal should be fast either way, not sure about Python's implementation - reverse iteration is neater though (since it stops as early as possible)
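The forward-scan alternative described above, as a sketch: traverse once, remember the index of the most recent match, and you end up with the last one (or None):

```python
def last_index_of(seq, target):
    # one pass front to back; the final match wins
    found = None
    for i, x in enumerate(seq):
        if x == target:
            found = i
    return found

idx = last_index_of([1, 2, 3, 2, 5], 2)
print(idx)  # 3
```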
|
# ? Mar 2, 2018 03:04 |
|
Wow, great responses and super quick. Unfortunately the responses aren’t easily applied to my own program- and I decided instead to rebuild the program in a way that obviated the need for removal. For anyone curious, I needed a list generated that includes 20 of each of the following: NR L0 L2P L2T L4P L4T L8P L8T For a total of 160 items in the list. All of these stand for different experimental conditions, and are randomly presented, but some conditions are related. The rules are: L0 and NR can go anywhere in the list that doesn’t conflict with another rule. For n = L2P, n + 1 = L2T For n = L4P, n + 2 = L4T For n = L8P, n + 4 = L8T I’m phone posting now, but my new approach is to add the L(X)P items to the list at the start, shuffle the order, and use that as a seed for a pseudo-random procedural generator. The procedural generator will then populate the list with L(X)T items, using L0/NR items as filler when necessary. It’s a heck of a lot more complicated than I thought ought to be necessary (~150 lines of code), and is slower than my usual experiment list generator, but I’m ironing out a final couple bugs (usually missing L(X)P items) and it appears to work.
|
# ? Mar 2, 2018 18:53 |
|
Wallet posted:You can use del to remove an item from a list by index (del list[0]), which might help, depending on what you're actually trying to do. I like this method for pulling off the bottom of the list. code:
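A sketch of that pattern, consuming items off the front of a list with del (for long lists, collections.deque with popleft() avoids the O(n) shift that del list[0] triggers):

```python
jobs = ["build", "test", "deploy"]
done = []
while jobs:
    current = jobs[0]
    del jobs[0]  # drop the item we just pulled off
    done.append(current)

print(done)  # ['build', 'test', 'deploy']
```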
Da Mott Man fucked around with this message at 22:50 on Mar 2, 2018 |
# ? Mar 2, 2018 22:46 |