|
I want to play around with machine learning for fun. To start with, say I want to build and train a model to tell me if a picture is a (hot)dog or not a (hot)dog. How do I get started?
|
# ? Aug 13, 2018 06:57 |
|
|
Boris Galerkin posted: I want to play around with machine learning for fun. To start with, say I want to build and train a model to tell me if a picture is a (hot)dog or not a (hot)dog. How do I get started?
If you want to have a clue what you're doing, take Andrew Ng's machine learning class on Coursera; if you don't, install scikit-learn, read some of the docs, and go nuts.
|
# ? Aug 13, 2018 07:21 |
|
Hey everyone, looking for a little help figuring out why I can't get any of my IDEs to import "quandl" or "Quandl". I'm running Windows 10, 64-bit. I've installed a couple of other packages using this command: "py -m pip install [name]", and I use "py -m pip list" to see what is installed. I see that I have:
C:\WINDOWS\system32>py -m pip list
Package         Version
--------------- ---------
absl-py         0.3.0
astor           0.7.1
certifi         2018.4.16
chardet         3.0.4
gast            0.2.0
grpcio          1.14.1
idna            2.7
inflection      0.3.1
Markdown        2.6.11
more-itertools  4.3.0
numpy           1.14.5
pandas          0.23.4
pip             18.0
protobuf        3.6.0
python-dateutil 2.7.3
pytz            2018.5
Quandl          3.4.1
requests        2.19.1
scikit-learn    0.19.2
setuptools      28.8.0
six             1.11.0
sklearn         0.0
tensorboard     1.10.0
tensorflow      1.10.0
termcolor       1.1.0
urllib3         1.23
Werkzeug        0.14.1
wheel           0.31.1
When I go to my IDEs (I have MSV 2017, Anaconda, and the regular Python 3.6.4 shell) and attempt to import Quandl, I get:
In MSV using Python 3.6, 64- and 32-bit:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'Quandl'
In Anaconda, using Spyder 3.2.8:
Traceback (most recent call last):
  File "<ipython-input-2-9b0ae57416bb>", line 1, in <module>
    import Quandl
ModuleNotFoundError: No module named 'Quandl'
In the Python 3.6.4 shell:
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    import Quandl
ModuleNotFoundError: No module named 'Quandl'
I've uninstalled using the same command line as installing, except with "uninstall", and then reinstalled without any issues or flags raised in my admin command prompt, but still nothing. The other packages (sklearn, pandas, etc.) have all installed and import just fine.
I wasn't sure if this thread or the non-existent Python thread in Haus of Tech Support would be better. I've searched Google and it's the same answer over and over: pretty much just making sure the programmer has pip-installed the package correctly, or showing how to pip-install using command lines that don't work on my system (I would also love some info or reading material on why, btw).
No changes to my system. I'm stationed in Japan, but the computer is an "American" HP that I bought (just trying to provide as much info as possible in accordance with the Haus rules). Anyway, any help would be appreciated. Thanks!
|
# ? Aug 13, 2018 11:48 |
|
The import's lower-case for that module, despite the uppercase module name. You said you tried it, but your examples all use upper-case. I can't see what's wrong; try import quandl again.
|
# ? Aug 13, 2018 15:06 |
|
Hollow Talk posted: Have a look if pandas is good enough. Depending on how deep the nesting is, I usually tend to use nested dict comprehensions to flatten dictionaries when I want more control and/or need to ensure data quality in a more granular way. I find this particularly useful for large data, since it gives me more explicit control over chunking, and I can use iterators for lazy evaluation (pandas is not great when trying to parse a 90GB JSON file, for example).
Thanks for your responses. I will take a look at Airflow and Luigi once it starts getting complex enough. I'm also using Stitch so I could also change over to the Singer Tap methodology once it gets complicated enough to benefit from the Stitch backend, and integrate with their other integrations.
For the second point, could you tell me what you mean by using nested dict comprehensions to flatten dictionaries? Like a code example from Stack Overflow would be good. My searches haven't led to much. There are 3 levels of nesting I'm potentially looking at, but they are optional since I can get those attributes by calling another API.
|
# ? Aug 13, 2018 15:30 |
Mark Larson posted: Thanks for your responses. I will take a look at Airflow and Luigi once it starts getting complex enough. I'm also using Stitch so I could also change over to the Singer Tap methodology once it gets complicated enough to benefit from the Stitch backend, and integrate with their other integrations.
https://www.datacamp.com/community/tutorials/python-dictionary-comprehension should be an OK start.
|
|
# ? Aug 13, 2018 15:40 |
|
Mark Larson posted: Thanks for your responses. I will take a look at Airflow and Luigi once it starts getting complex enough. I'm also using Stitch so I could also change over to the Singer Tap methodology once it gets complicated enough to benefit from the Stitch backend, and integrate with their other integrations.
cinci zoo sniper posted: https://www.datacamp.com/community/tutorials/python-dictionary-comprehension should be an OK start.
This has a few good examples. To be fair, I don't think nested dictionary comprehensions are necessarily any more readable than nested for loops. This is an example from a (really rather) quick and dirty script I needed in order to extract and subsequently flatten the contents of a SPSS file:
code:
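The snippet from this post is not present here. As a stand-in (this is not the SPSS script described above), here is a minimal sketch of what flattening with a nested dict comprehension can look like, assuming exactly two levels of nesting. The record and its keys are made up for illustration, and the equivalent nested for loops are shown next to it for the readability comparison.

```python
# Hypothetical nested record; the keys are invented for this example.
record = {
    "user": {"id": 7, "name": "ann"},
    "order": {"id": 42, "total": 9.5},
}

# Nested dict comprehension: outer loop over sections, inner loop over fields.
flat = {
    f"{section}_{field}": value
    for section, fields in record.items()
    for field, value in fields.items()
}

# Equivalent nested for loops, for comparison.
flat_loop = {}
for section, fields in record.items():
    for field, value in fields.items():
        flat_loop[f"{section}_{field}"] = value

print(flat)
```

A third level of nesting would need another `for` clause (or a recursive helper), which is roughly where the loop version starts to read better.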
|
# ? Aug 13, 2018 22:16 |
Yeah, I prefer to write loops of customised “getter” functions for non-trivial JSON/XML parses, but I do calculations on them rather than ETL stuff, so that approach lends itself to disposing of intermediate objects lazily.
|
|
# ? Aug 14, 2018 05:33 |
|
BannedNewbie posted: If you want to have a clue what you're doing take Andrew Ng's machine learning class on coursera, if you don't install scikit learn, read some of the docs and go nuts.
This one? Just signed up to Coursera and this course says I can either pay or do it for free. Do I lose out on anything if I do it for free other than the certificate? (Never done one of these online courses before. Are the certificates even worth anything?)
|
# ? Aug 14, 2018 06:44 |
|
completion rate is something like 5% free, 50% paid. besides that, you don't get anything, really. the cert's worth bupkis
|
# ? Aug 14, 2018 06:49 |
Yeah the only point of certificates is to serve as a poor completion motivator. And yes, that is the right course and it is absolutely fantastic - one of the first recommendations I make to less experienced hires into my data science team.
|
|
# ? Aug 14, 2018 08:53 |
|
Boris Galerkin posted: This one? Just signed up to Coursera and this course says I can either pay or do it for free. Do I lose out on anything if I do it for free other than the certificate? (Never done one of this online courses before. Are the certificates even worth anything?)
It gives you a sense of ~~ pride and accomplishment ~~
I finally put up the money for a paid subscription after trying out a bunch of courses and never getting past the 2nd week. It's working: I'm halfway through one course and 80% done with another. It's $50 a month, which isn't bad if you do more than one course at a time. By the way, you can have Coursera automatically post your certificates on LinkedIn if you use that. You can also go through the material for free for a few months and then register for the course you want, to get the certificate, if you are that hard up.
Mark Larson fucked around with this message at 10:56 on Aug 14, 2018 |
# ? Aug 14, 2018 10:50 |
|
Thanks guys. No certificate for me it is.
|
# ? Aug 14, 2018 11:22 |
|
Dominoes posted: The import's lower-case for that module, despite the uppercase module name. You said you tried it, but your examples all use upper-case. I can't see what's wrong; try import quandl again.
I didn't include the error message for the lower-case "quandl" because it was the same, with the exception of the physical memory location address (I think it's the address, anyway). Still no luck.
|
# ? Aug 14, 2018 11:25 |
|
a forbidden love posted: Anyway, any help would be appreciated. Thanks!
Are you running python 2 and 3 side by side? You may have installed it in 2 and not 3.
|
# ? Aug 15, 2018 02:11 |
|
Peeny Cheez posted: Are you running python 2 and 3 side by side? You may have installed it in 2 and not 3.
Nope. Only 3. I only had the shell then I got anaconda, and nothing still, then I added the ie to msv and still nothing. Last shot would be a clean install all over again but I'm not sure how effective that would be since every other package installs and imports correctly.
|
# ? Aug 15, 2018 03:26 |
|
Can you do a help('modules') in whichever interpreter you want to use and see if Quandl is listed?
a forbidden love posted: I've searched Google and it's the same answer over and over, pretty much just making sure the programmer has pip-installed the package correctly or showing how to pip-install using command lines that don't work on my system. (which I would also love to get some info or reading material as to why, btw).
Which command line commands don't work for you?
Boris Galerkin fucked around with this message at 08:29 on Aug 15, 2018 |
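A narrower check than help('modules') is to ask the running interpreter directly whether it can find the module. The following is a minimal sketch using only the standard library; `where_is` is a hypothetical helper name, and the two spellings are tried because the pip listing shows "Quandl" while the import name is lower-case.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file this interpreter would import, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
for name in ("quandl", "Quandl"):
    print(name, "->", where_is(name))
```

Running the same lines in each shell or IDE should show quickly whether they all see the same installation.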
# ? Aug 15, 2018 08:27 |
|
Symbolic Butt posted: Alternatively it seems like a very turtle thing:
Perfect! I've never even touched turtle, though I have of course heard of it for a couple decades...
|
# ? Aug 15, 2018 20:39 |
|
I really want to like pipenv but I just can't because it feels like it's a buggy cryptic mess and nothing it tells me/outputs makes any sense at all. For example I just created a new pipenv with the following Pipfile: code:
code:
It was a fun experience but back to conda environments I go.
e: Meanwhile, jupyter was successfully installed in this pipenv and it works fine. What the hell does any of this even mean?
Boris Galerkin fucked around with this message at 08:56 on Aug 16, 2018 |
# ? Aug 16, 2018 08:52 |
|
Boris Galerkin posted: Can you do a help('modules') in whichever interpreter you want to use and see if Quandl is listed?
Of course, thanks for the assist.
code:
Boris Galerkin posted: Which command line commands don't work for you?
According to the Python documentation website (https://docs.python.org/3/installing/index.html#installing-index) I should be able to install using
python -m pip install SomePackage
But to get it to work on my shells I use:
py -m pip install somepackage
So I guess not a command per se. I don't know why it's different on my system, but works just fine for others using the same OS and IDE.
Here's a copy/paste of my command prompt install (well, you get what I mean), and as far as I can tell it looks like everything loaded just fine.
C:\WINDOWS\system32>py -m pip install quandl
Requirement already satisfied: quandl in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (3.4.1)
Requirement already satisfied: more-itertools in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (4.3.0)
Requirement already satisfied: inflection>=0.3.1 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (0.3.1)
Requirement already satisfied: requests>=2.7.0 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (2.19.1)
Requirement already satisfied: pandas>=0.14 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (0.23.4)
Requirement already satisfied: six in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (1.11.0)
Requirement already satisfied: numpy>=1.8 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (1.14.5)
Requirement already satisfied: python-dateutil in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from quandl) (2.7.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from requests>=2.7.0->quandl) (2018.4.16)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from requests>=2.7.0->quandl) (1.23)
Requirement already satisfied: idna<2.8,>=2.5 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from requests>=2.7.0->quandl) (2.7)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from requests>=2.7.0->quandl) (3.0.4)
Requirement already satisfied: pytz>=2011k in c:\program files (x86)\microsoft visual studio\shared\python36_64\lib\site-packages (from pandas>=0.14->quandl) (2018.5)
|
# ? Aug 16, 2018 12:26 |
|
And now I just found out that I also can't import "tflearn" (tensorflow). I guess Turing just doesn't want to let me learn about ML or AI.
Edit: It's also not listed in the "modules" prompt, but it does show up in the "list" prompt.
|
# ? Aug 16, 2018 12:51 |
|
Boris Galerkin posted: I really want to like pipenv but I just can't because it feels like it's a buggy cryptic mess and nothing it tells me/outputs makes any sense at all.
pipenv is kind of weird still, but note that it just gave you a warning, not an error. pipenv could be more helpful in explaining what happened here, but I think what it's telling you is that the sub-dependencies of your dependencies declare required versions of widgetsnbextension that are not resolvable. In other words:
1. one sub-dependency requires exactly version 3.3.1.
2. one sub-dependency requires 3.3.*
3. one sub-dependency requires 3.4.*
There is not a version you can install that meets all of the requirements at the same time. Pipenv defaults to installing the latest version and does so, and that's why things actually work. Libraries generally don't check the versions of their sub-dependencies at runtime, so the package that required 3.3.1 and the package that required 3.3.* go ahead and just do their imports and use the code, and you just happen to luck out that those packages don't use any APIs whose behavior changed between 3.3 and 3.4. Or maybe the behavior has changed, but in a subtle way that will not cause a crash but will cause errors in your data!
Anyway, pipenv can't fix this poo poo, but it could give better warning messages. Of note, your conda environments have the exact same problem, it's just that you don't even get a warning.
dependencies and dependency resolution is poo poo and we develop on a house of cards
Thermopyle fucked around with this message at 19:48 on Aug 16, 2018 |
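To make the "no version satisfies everything" point concrete, here is a toy sketch. It is not how pipenv resolves dependencies: it treats the three constraints as simple glob patterns rather than real version specifiers, and the list of available versions is invented for the example.

```python
from fnmatch import fnmatchcase

# The three constraints from the post above, treated as plain glob patterns.
constraints = ["3.3.1", "3.3.*", "3.4.*"]
# Hypothetical list of released versions.
available = ["3.3.0", "3.3.1", "3.4.0", "3.4.1"]


def satisfies_all(version, patterns):
    """True if the version string matches every pattern."""
    return all(fnmatchcase(version, pattern) for pattern in patterns)


compatible = [v for v in available if satisfies_all(v, constraints)]
without_last = [v for v in available if satisfies_all(v, constraints[:2])]

print(compatible)     # nothing matches both 3.3.* and 3.4.*
print(without_last)   # dropping the 3.4.* constraint leaves one candidate
```

With all three constraints the compatible list is empty, which is the situation the warning describes; a tool then has to either fail or pick something that violates at least one constraint.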
# ? Aug 16, 2018 19:46 |
|
Thermopyle posted: pipenv is kind of weird still, but note that it just gave you a warning, not an error.
I found on one of their closed GitHub issues that apparently the "*" version signifier doesn't mean wildcard as it does in every other usage in the world, but instead it means "latest version." I don't really have anything constructive to say about that so it's whatever.
Maybe I assumed too much, but I thought that specifying wildcards for every package would be translated to "I don't care which versions you use, just get me the latest compatible versions of every package I listed." Isn't that what requirements.txt does if you just list packages and not versions?
|
# ? Aug 17, 2018 08:08 |
|
I am not 100% positive about this, but it looks like you are having this issue because you are installing your packages into one environment and then trying to use them from a different environment; i.e., when you say you installed it from the command line you installed it into environment A, but your Anaconda IDE is using environment B, and your MVS IDE is using yet another environment C. I'm assuming this based on the paths in your post:
code:
code:
code:
This is just a guess cause I've never used the IDEs you use and I have no idea how Windows handles paths and such, but I wouldn't be surprised if each of your IDEs installed their own built-in/default python.exe interpreters and by default they are using those, which isn't the same one that you "use" when you just type "py blah blah" into your command prompt.
I would suggest looking into your IDE documentation and seeing if there's an option to specify which interpreter/environment to use. Someone else who uses Windows/those IDEs is gonna need to help you here. Maybe https://code.visualstudio.com/docs/python/environments helps you out?
a forbidden love posted: According to the python documentation website (https://docs.python.org/3/installing/index.html#installing-index) I should be able to install using
No idea. Maybe someone else who uses Windows can answer that.
Boris Galerkin fucked around with this message at 08:50 on Aug 17, 2018 |
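One way to test the "different environments" guess is to print where each interpreter lives and where it keeps its packages. A minimal sketch using only the standard library (`describe_interpreter` is a hypothetical helper name): run the same lines in the `py` prompt, in Spyder, and in the Visual Studio Python window, then compare the output.

```python
import sys
import sysconfig


def describe_interpreter():
    """Collect the paths that identify the running Python installation."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the executables differ, installing with that specific interpreter's own "-m pip install" (i.e. the full path shown as `executable`, followed by `-m pip install quandl`) puts the package where that interpreter will look for it.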
# ? Aug 17, 2018 08:48 |
|
Boris Galerkin posted: I found on one of their closed GitHub issues that apparently the "*" version signifier doesn't mean wildcard as it does in every other usage in the world, but instead it means "latest version." I don't really have anything constructive to say about that so it's whatever.
Give Poetry a try. It does everything pipenv does but without some of the dafter design decisions (like upgrading your packages by default).
|
# ? Aug 17, 2018 12:09 |
|
For those of you who use Python for scientific computing, do you use Spyder as your editor / development environment? If "yes", what changes/features would you like to see in it? If "no", what changes/features would convince you to use it? If you don't know what I'm talking about, check out Spyder here: https://github.com/spyder-ide/spyder . It's the default IDE for the Anaconda Python distribution. I may have the chance to work on enhancing Spyder with a hardcore python dev. I'm genuinely interested in serious replies, thanks!
|
# ? Aug 17, 2018 16:36 |
|
pmchem posted: For those of you who use Python for scientific computing, do you use Spyder as your editor / development environment?
Coming from an astrophysics perspective, I prefer to use Jupyter notebooks for a lot of my work (which runs as a local server you load in a browser tab). I use PyCharm generally for my standalone scripts and such, but I like using Jupyter for compiling and presenting my work since I can inline LaTeX blocks, plots, etc.
I've never heard of Spyder and I don't know enough to have an opinion on it yet, but I'll add it to the long list of new things to look into.
|
# ? Aug 17, 2018 17:34 |
|
BaronVonVaderham posted: Coming from an astrophysics perspective, I prefer to use Jupyter notebooks for a lot of my work (which runs as a local server you load in a browser tab). I use PyCharm generally for my standalone scripts and such, but I like using Jupyter for compiling and presenting my work since I can inline LaTeX blocks, plots, etc.
Not sure what your Python installation is like, but Spyder is installed by default if you're using the Anaconda Distribution: https://www.anaconda.com/download/
I'm very familiar with Jupyter notebooks, yeah. I use 'em sometimes.
|
# ? Aug 17, 2018 17:44 |
|
I'm trying to apply a function f(x,y) to each possible combination of two strings in a list.
Python code:
code:
Googling gets me vague and complicated answers, so I'm probably not going in the right direction?
|
# ? Aug 20, 2018 09:18 |
|
pretty easy with itertools.product, if and only if you don't need it in the dataframe (aka, you don't need it super fuckin fast). the keyword is cartesian product. for the dealio in pandas, see: https://stackoverflow.com/questions/13269890/cartesian-product-in-pandas
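For reference, a minimal sketch of the itertools.product route. The function `f` and the list `d` are hypothetical stand-ins (string concatenation and three short strings), since the original code is not shown.

```python
from itertools import product


def f(x, y):
    # Hypothetical stand-in for the real function.
    return x + y


d = ["a", "b", "c"]

# product(d, repeat=2) lazily yields every ordered pair, including (x, x).
results = {(x, y): f(x, y) for x, y in product(d, repeat=2)}

print(results[("a", "b")])
```

If the order of the pair does not matter and pairing a string with itself is not wanted, itertools.combinations(d, 2) yields each unordered pair once instead.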
|
# ? Aug 20, 2018 09:23 |
unpacked robinhood posted: I'm trying to apply a function f(x,y) to each possible combination of two strings in a list.
This is a great place for a generator or list comprehension.
IN
Python code:
pre:[['aa', 'ba', 'ca'], ['ab', 'bb', 'cb'], ['ac', 'bc', 'cc']]
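The input code for this post is missing. One comprehension that reproduces the output shown above, assuming `d = ['a', 'b', 'c']` and an `f` that concatenates its arguments (both assumptions, not taken from the post), would be:

```python
def f(x, y):
    # Assumed stand-in: concatenation matches the output shown above.
    return x + y


d = ["a", "b", "c"]

# List comprehension: one inner list per y, one entry per x.
table = [[f(x, y) for x in d] for y in d]

# Generator version: the same values, flattened and produced lazily.
pairs = (f(x, y) for y in d for x in d)

print(table)
```

The generator only computes each value when it is iterated over, which matters if `f` is expensive or the list is long.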
|
|
# ? Aug 20, 2018 14:39 |
|
you just reimplemented itertools.product but not w/ a generator
|
# ? Aug 20, 2018 17:10 |
|
unpacked robinhood posted: I'm trying to apply a function f(x,y) to each possible combination of two strings in a list.
Using your d, maybe something like this?
Python code:
|
# ? Aug 20, 2018 18:24 |
|
Thanks! I ended up using Eela6's suggestion, although I find Symbolic Butt's more readable. It looks like this:
Python code:
unpacked robinhood fucked around with this message at 18:40 on Aug 20, 2018 |
# ? Aug 20, 2018 18:37 |
|
Boris Galerkin posted: I found on one of their closed GitHub issues that apparently the "*" version signifier doesn't mean wildcard as it does in every other usage in the world, but instead it means "latest version." I don't really have anything constructive to say about that so it's whatever.
The problem isn't your requirements, it's the requirements of your requirements. Apparently, there does not exist a fully-compatible set of your requirements. This is just the first time you've found that out because pipenv is the first tool you've used that bothers to tell you.
|
# ? Aug 21, 2018 19:12 |
|
Loading data from an API to a database... the fastest way possible.
I'm trying to load data from an API; I get 100 objects per call. I'd like to load them all at once, with an INSERT statement that looks like this:
INSERT INTO t (col1, col2, col3) VALUES (1,2,3), (2,NULL,4), (4,5,6), (4,3,2) ... etc.
Now the trick is that not all columns are present in all items of the JSON response/dictionary. I was working around this by getting the keys and the values and then putting them into an INSERT statement, but this worked one row at a time and is unbearably slow. I kinda got the gist of what I need to do from here: https://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query
So, now I'm assigning all the columns to a list that I can pass to cur.execute(), but I need to make sure that if a value doesn't exist, it becomes NULL (or at least an empty slot between commas), so the number of columns is always the same. Can anyone help?
code:
Mark Larson fucked around with this message at 16:38 on Aug 22, 2018 |
# ? Aug 22, 2018 16:32 |
If you want it to just work, and not optimally fast, then create a column base dict (e.g. {'A': None, 'B': None}), and then loop over keys and values, updating a copy of that dict that you then throw into your insert.
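A minimal sketch of that idea, under some assumptions: the column names and items are made up, and sqlite3 from the standard library stands in for the real database so the example runs anywhere (the thread is about psycopg2, which is not used here). Missing keys become None, which the driver sends as NULL, so every row has the same number of values.

```python
import sqlite3

COLUMNS = ("col1", "col2", "col3")  # hypothetical column names


def normalise(item, columns=COLUMNS):
    """Return one value per column, with None where the key is absent."""
    return tuple(item.get(col) for col in columns)


# Hypothetical API response: some keys are missing from some items.
items = [
    {"col1": 1, "col2": 2, "col3": 3},
    {"col1": 2, "col3": 4},
    {"col1": 4, "col2": 5, "col3": 6},
]
rows = [normalise(item) for item in items]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1, col2, col3)")
placeholders = ", ".join("?" for _ in COLUMNS)
# One parameterised statement, executed for the whole batch of rows.
conn.executemany(
    f"INSERT INTO t ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows
)
conn.commit()

stored = conn.execute("SELECT col1, col2, col3 FROM t ORDER BY rowid").fetchall()
print(stored)
```

Using `item.get(col)` has the same effect as copying a base dict of None values and updating it, just without the intermediate dict.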
|
|
# ? Aug 22, 2018 16:51 |
|
cinci zoo sniper posted: If you want it to just work, and not optimally fast, then create a column base dict (e.g. {'A': None, 'B': None}), and the do loop over keys and values, updating copy of that dict that you then throw into your insert.
What would be the fast way then?
|
# ? Aug 22, 2018 18:58 |
|
slightly comedy option: make a csv, do a COPY, if your rdbms is chill with that
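A rough sketch of the CSV half of that option, using only the standard library. The column names and items are hypothetical. Missing keys are written as empty fields, which PostgreSQL's COPY in CSV format reads as NULL by default when they are unquoted.

```python
import csv
import io

COLUMNS = ["col1", "col2", "col3"]  # hypothetical column names


def to_csv_buffer(items, columns=COLUMNS):
    """Write the items to an in-memory CSV, leaving missing keys empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=columns,
        restval="",             # value used when a key is missing
        extrasaction="ignore",  # silently skip keys that are not columns
        lineterminator="\n",
    )
    writer.writerows(items)
    buf.seek(0)
    return buf


items = [{"col1": 1, "col2": 2, "col3": 3}, {"col1": 2, "col3": 4}]
buf = to_csv_buffer(items)
print(buf.getvalue())

# With psycopg2, the buffer could then be handed to COPY, for example:
# cur.copy_expert("COPY t (col1, col2, col3) FROM STDIN WITH (FORMAT csv)", buf)
```

The commented-out last line is the only database-specific part and is untested here; the table name `t` and the cursor `cur` are carried over from the question.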
|
# ? Aug 22, 2018 19:01 |
|
Mark Larson posted: What would be the fast way then?
Bulk inserts through SQLAlchemy ORM bulk suite, or some flavour of inserts through SQLAlchemy core.
|
|
# ? Aug 22, 2018 19:14 |