CarForumPoster
Jun 26, 2013

⚡POWER⚡
Is there a database/big data thread somewhere?




CarForumPoster fucked around with this message at 13:35 on Jul 16, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

infinite99 posted:

I'm trying to deal with some OCR output from images of tables and I'd like to be able to extract certain things out of these tables. I'm using the Azure service to do the OCR which gives me it's predicted text and bounding boxes for anything it scrapes out. So far the output is actually pretty good and I'm pretty impressed with it.

My issue now is actually getting the relevant parts of the output and throwing those into a structured database. It looks like the OCR reads things row by row and the JSON that it returns seems to reflect that so that's kind of helpful but without knowing what cells the information is sitting in, I can't reliably scrape out the information. The tables aren't always consistent either so that's another issue. I've done some image manipulation using OpenCV to figure out the cells of the table which gets me a bounding box and I can check where each portion of the OCR output falls within a box but it's not very consistent. Even trying to figure out the table headings is kind of an issue since they can exist at the top of the table or the bottom of the table. Any skewing in the image also messes up my calculations for the table as well but that might be something I can fix.

Here's some examples of what I'm working with:





The information I want to grab out is the Rev number, Date, and Description.

I feel like I'm way over-complicating this problem but I've been stuck for days trying to figure out a good solution to this so any help to point me in the right direction would be super appreciated!

hahaha I've done this exact task before with a mix of AutoCAD files, mylars, scanned drawings, etc. in the days before useful OCR. It loving sucked. Best of luck.

Here's how I'd go about it:
Scenario 1) If I had to do ~2000 drawings and they were in a security environment where this would be acceptable, I might have OpenCV auto-crop the revision block so you're not sending IP to who knows where. Save the cropped file with a relevant file name, then I'd just get someone on Amazon Mturk/fiverr/upwork to do it manually for like $50-$100. As an engineer I'm getting paid to deliver results, not write code. Save the results in a google sheet if it's an upworker, or if it's mturk just parse the resulting CSV. 2000 drawings would get done in prob 1 day using mturk because you have a large number of people doing them in parallel. HOWEVER, you would ALSO want to have people audit the results. Maybe some highly rated mturk people, maybe you, maybe an upworker. For quick and dirty projects in the past, I've just made a little HTML page with radio buttons where I glance at the work side by side to approve or disapprove it. I then have any disapproved ones automatically go back to mturk for reprocessing.

Mturk quality ain't great, but a task like this is VERY easy to set up and parallelize. If you go the mturk route, a good rule of thumb is >1000 completed HITs, English-speaking country required (make a list; if you want to go cheap, include Bangladesh, Philippines, etc.), and >98% approval on HITs. You should time yourself doing 10ish of the HITs and try to budget about $6-7/hr.
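To put rough numbers on that budgeting rule of thumb, here's a small sketch. The 20 seconds per drawing is a made-up figure for illustration (time yourself to get a real one), and the function names are hypothetical; only the $6-7/hr target and the 2000-drawing count come from the post.

```python
def reward_per_hit(seconds_per_hit: float, hourly_rate: float) -> float:
    """Pay per HIT so a worker at your pace earns roughly hourly_rate."""
    return hourly_rate * seconds_per_hit / 3600.0


def batch_cost(n_hits: int, seconds_per_hit: float, hourly_rate: float) -> float:
    """Total worker pay for a batch; platform fees and audit passes are extra."""
    return n_hits * reward_per_hit(seconds_per_hit, hourly_rate)


# Assumed timing: 20 s per drawing (hypothetical, measure your own).
low = batch_cost(2000, 20, 6.0)
high = batch_cost(2000, 20, 7.0)
print(f"2000 drawings: ${low:.2f} to ${high:.2f}")  # 2000 drawings: $66.67 to $77.78
```

Under that assumed timing the total lands inside the $50-$100 ballpark mentioned above; slower HITs or a second audit pass scale it up linearly.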

Scenario 2) If I had >>2,000 drawings to do, or a security environment where they'd say gently caress you to the above idea, I'd try to use OpenCV to find where to crop. I might try to manipulate the image to make any detected lines straight, since they might've gotten skewed in the scan, then feed the cropped and corrected image to an OCR service like Azure or Amazon Textract. Give Textract a try too; they advertise table extraction but I've not had stellar luck with it.

If, for some reason, I can't make OpenCV reliably find the rev block to bound and crop it, I might try an image recognition based approach where I use a NN to find the rev block, then crop the image based on the outputs of the NN.

Last option is to hire a few people overseas to do it with no cropping and make them sign NDAs.

CarForumPoster fucked around with this message at 13:49 on Jul 16, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

infinite99 posted:

I guess I just need to work on the process of figuring out the cells of the table through image manipulation to be able to build the table in memory and insert the text based on coordinates.

Thanks for the suggestions though!

If you need real scale of largely templated drawings then may I suggest: https://scale.com/document

This is a real problem that's worth a lot of money, so I wouldn't count on being able to roll your own to >>90% accuracy. That said, I have never had great luck with OpenCV unless I have a fairly consistent image dataset.
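For the "insert the text based on coordinates" step in the quote, one possible sketch is to drop each OCR word into whichever cell contains the center of its bounding box, which tolerates a little skew better than requiring the whole box to fit. Everything here is an assumption for illustration: the (x_min, y_min, x_max, y_max) box layout, the cell grid, and the sample words are invented, not the Azure output format.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
CellKey = Tuple[int, int]                # (row, col)


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def find_cell(word_box: Box, cells: Dict[CellKey, Box]) -> Optional[CellKey]:
    """Return the (row, col) of the cell containing the word's center point."""
    cx, cy = center(word_box)
    for key, (x0, y0, x1, y1) in cells.items():
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return key
    return None


def build_table(words: List[Tuple[str, Box]], cells: Dict[CellKey, Box]) -> Dict[CellKey, str]:
    """Group OCR words by cell and join each cell's words left to right."""
    grouped: Dict[CellKey, list] = {}
    for text, box in words:
        key = find_cell(box, cells)
        if key is not None:
            grouped.setdefault(key, []).append((box[0], text))
    return {key: " ".join(t for _, t in sorted(items)) for key, items in grouped.items()}


# Hypothetical 1-row, 3-column revision block: REV | DATE | DESCRIPTION
cells = {(0, 0): (0, 0, 50, 20), (0, 1): (50, 0, 150, 20), (0, 2): (150, 0, 400, 20)}
words = [("A", (10, 4, 20, 16)), ("2020-07-16", (60, 4, 140, 16)),
         ("ISSUE", (220, 4, 270, 16)), ("INITIAL", (160, 4, 215, 16))]
table = build_table(words, cells)
print(table)
```

Words whose center lands in no cell are silently dropped here; in practice you'd probably want to log those, since they are a decent signal that the cell detection went wrong on that drawing.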

CarForumPoster
Jun 26, 2013

⚡POWER⚡

fuckwolf posted:

Thank you both. After reading a bit more about Python and Django, it sounds like the right direction. I’ve started working through the Google Python course as well as dipping my toes in the DjangoGirls material. So cool that there are so many free resources for learning this stuff. Exciting!

You sound much like me 2 years ago. I agree that python is the right language. I use both Django and Dash (my suggestion below) regularly and have active web apps currently deployed in both. I definitely have less experience than Dominoes though.

When you get to the part where you want to actually deploy your Django app to be a real live web app for people to use on the internet, I highly recommend not learning about all the server config that tends to come with deploying python web apps to servers and instead taking a look at Zappa which makes using AWS' serverless architecture quite easy. EXCEPT if you decide to use Flask/Dash in which case Heroku makes it really really really easy to get a Flask/Dash web app up and working.

fuckwolf posted:

a website ... registered users ... easily add, edit, sort, and export data sets. ... replace Excel sheets ... features: create an e-mail notification when data falls outside of a certain range, display charts based on the selected data, and generate custom reports such as "Here's the trend for the last 30 days of brand X."

Except for the authentication of users, what you're describing is basically the perfect use case for the python based Dash

You can make a working, deployable-to-Heroku app, with as little as the following block of code (3 files):
code:
# file 1 - Procfile: This is your server spec.
web: gunicorn app:server

# file 2 - requirements.txt: This is your list of stuff to have heroku pip install
dash
dash-bootstrap-components
gunicorn
pandas #probably

# file 3 - app.py: This is your actual code. Below is all you need to have pretty bootstrap stuff available on a bare bones site.
import dash
import dash_bootstrap_components as dbc

app = dash.Dash(
    external_stylesheets=[dbc.themes.BOOTSTRAP]
)
server = app.server  # the Flask server that "gunicorn app:server" in the Procfile points at

app.layout = dbc.Alert(
    "Hello, Bootstrap!", className="m-5"
)

if __name__ == "__main__":
    app.run_server()  # local dev server; newer Dash releases spell this app.run()
The most compelling reason to use Dash instead of Django is that the server config is generally easier (IMO) and the CSS/HTML bits almost completely disappear. The whole thing is done in python and made pretty by dash-bootstrap-components: you learn one language, Python, and you're done.

For comparison's sake, a similar thing in Django would usually require defining and provisioning a database and creating HTML templates, and would be 6+ files before anything deploys successfully. That said, I love Django for 3 reasons: user authentication is good and baked in from the start, almost any problem you have with it has 4+ Stack Overflow answers about it, and Django REST framework makes it really easy to slap a REST API on top of an existing app. Django's learning curve is steeper than Dash's if you're just starting out, but there are real rewards to it if you have those use cases.

CarForumPoster fucked around with this message at 14:20 on Jul 18, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

fuckwolf posted:

refreshing my understanding of HTML and CSS. [...] Javascript and JQuery [...] Ruby [...] PHP and SQL

See reasons above but IMO you should not learn any of the things you suggested. You don't need to know SQL, CSS, JS, you need to google it every now and then. Example below.

quote:

1- handle adding to and retrieving from the database?
2- What do I need to know to generate e-mail notifications?
3- What about username and login stuff?
4- Does this all sound possible for someone who doesn't know what they're doing? Any general tips?

All of this is covered by a one page Dash app except the database. I'd suggest using Amazon RDS.

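1- For reading/writing an excel-like table from a sqlite db into python, whether django or dash, you can just do this:
code:
import sqlite3
import pandas as pd
conn = sqlite3.connect(database_path)
df = pd.read_sql("SELECT * FROM %s" % table_name, conn)
Put the connect and read_sql lines in your dash app, do whatever calculations, return the result. That easy. Writing goes the other way with df.to_sql(). Since the table name is pasted straight into the query string, only ever use a table_name that comes from your own code, not from user input.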

2- Everyone uses APIs for service-based stuff now. Doesn't matter the language. Strictly speaking, the answer is to use an email API like Mailgun. That said, it's hilariously easy to send IMs via webhooks to MS Teams or Slack, which my team greatly prefers. For emails in python via an API like Mailgun it'd be this. Yes, really that easy. One call.
code:
import requests

requests.post(
        "https://api.mailgun.net/v3/samples.mailgun.org/messages",
        auth=("api", "key-3ax6xnjp"),
        data={"from": "Excited User <excited@samples.mailgun.org>",
            "to": ["devs@mailgun.net"],
            "subject": "Hello",
            "text": "Testing some Mailgun awesomeness!"})
3- How many users? Django has the best baked-in auth. Dash has simple baked-in auth.
4- Yes, it's much easier than you're thinking. You just need to get started, and python is prob the way to do that.
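For the "notification when data falls outside of a certain range" feature from the quoted wish list, here's a minimal sketch of the checking half using only the standard library. The `readings` table, its columns, and the thresholds are all hypothetical stand-ins for the Excel sheet; the text it builds would be what you hand to the email or webhook call.

```python
import sqlite3


def out_of_range(conn, low, high):
    """Return (brand, value) rows whose value falls outside [low, high]."""
    query = "SELECT brand, value FROM readings WHERE value < ? OR value > ? ORDER BY id"
    return conn.execute(query, (low, high)).fetchall()


def alert_text(rows, low, high):
    """Build the body of a notification message, or None if nothing is off."""
    if not rows:
        return None
    lines = [f"{brand}: {value}" for brand, value in rows]
    return f"Values outside {low}-{high}:\n" + "\n".join(lines)


# Hypothetical table standing in for the Excel sheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, brand TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (brand, value) VALUES (?, ?)",
    [("X", 4.2), ("X", 9.7), ("Y", 5.0), ("Y", 0.3)],
)
rows = out_of_range(conn, 1.0, 9.0)
message = alert_text(rows, 1.0, 9.0)
print(message)
```

Unlike the table name in a query, the thresholds go in as `?` parameters, so they can safely come from user input.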

CarForumPoster fucked around with this message at 14:41 on Jul 18, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Walh Hara posted:

An unpopular opinion: while Python Dash is pretty cool, R Shiny is superior in every way except that it's written in R. But if you have to learn a language anyway and the app is very basic then in my opinion it's both easier to use and has a much nicer end result.

https://shiny.rstudio.com/

Superior (debatable) in every way except...

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Combat Pretzel posted:

So, machine learning? I just started looking into it to do some random things during my lunch time, that I typically stay at work. There's currently two options I'm looking at, Keras, which seems relatively straight forward to get a network model going and training, and Tensorflow, which I don't know what the gently caress is going on. Should I even bother with latter? I'm mainly looking into basic networks to predict stuff. Like training a model with data from our laboratory and then try to predict things (or possibly run it the other way, if possible, to predict what inputs I need for a certain result).

Highly highly suggest you watch the FastAI series of videos and not start by picking a framework.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dominoes posted:

Is SciKit learn old news?

No! In fact that's why I recommended the FastAI series. They have a machine learning video set which goes into more detail about getting your data ready for either ML or DL models but uses ML/sklearn for instruction, and a 2019 deep learning video set that goes into FastAI V1, built on pytorch. In particular FastAI makes transfer learning insanely easy, which will make your models both faster to train and way better (particularly for image recognition/CNNs and NLP).

If I've got tabular non-time series data I would def try both a random forest (sklearn) and a NN (FastAI aka pytorch on easy mode).


Through literally no other instruction but those two I was able to build an image recognition model that exceeded the quality of all published papers I could find using pictures I scraped off eBay. I'm not an expert in the field at all but I know a lot about this one domain and selling on eBay and I, a lay person and amateur, built a thing that was 97% accurate at picking from 20 categories of similar things. FastAI V1 for image recognition in particular is insanely useful because of its automated transformations that help your model generalize by tweaking the images slightly during each epoch.

CarForumPoster fucked around with this message at 21:13 on Jul 26, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

KillHour posted:

I use Lucidcharts and I like it.

Thanks for this suggestion. I needed to think out how I was going to lay out some systems and Lucidchart made it easy. IMO better than Visio.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

raminasi posted:

I'm sure individual language threads will also be helpful. People love answering newbie questions.

The Python thread is super friendly

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Rocko Bonaparte posted:

Does anybody else use Microsoft Project for small-team software development stuff? I thought I'd try it out for some estimation work and I'm getting frustrated with managing groups of tasks. I did my estimates by taking larger tasks and giving them breakdowns until the breakdowns were tangible. Now I have these sacks of tasks with a some dependencies on each other, but I can't seem to work at the level of the groups. Everything has to happen at the lowest task level. I'd like to just say this group depends on that group and these resources will work on the group without constantly expanding and contracting everything. Is this conceptually not a thing?

No, because waterfall development is built into the theory of operation of MS Project. It doesn't inherently require that you plan poo poo before you know what's going on, but unlike tools that are fundamentally agile, it sure does enable you to over-plan stuff that you don't know yet. It seems like the Venn diagram of your problem and the scenario I just described has decent overlap.

Solution: Use Jira Cloud FFS. It costs basically nothing, it's the universal tool, it's easier to use than Project, and it's hilariously cheaper than a seat of MS Project.

I don't even dislike Project...if you're gonna build a plane for the govt, there's some merit there.

But don't author software requirements in it...especially on a small team.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I think it was me that said JIRA originally and I would like to add that I was talking about completely unmodified JIRA Cloud integrated with GitHub/Bitbucket.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

AgentCow007 posted:

So I've started learning web development and I wanted to start accepting Stripe payments on some apps. Is it advisable to create some sort of business entity? These are pretty small crappy apps for now and I don't expect a ton of customers. Also, if I'm going to do a lot of smaller sites, does each one need to be a separate business or could I just make an overarching one like "Agentcow Services, LLC"?

I'm not a lawyer and this is not legal advice but I do work at a legal tech startup that does litigation and, occasionally, corporate formations.

If I was freelancing I'd definitely start a protective business entity, most likely an LLC. In my state it's cheap (wanna say $150), fast, easy, allows you to have employees if things start to go well, and has many protections for your personal assets. Be aware though, there's plenty you can do that risks "piercing the veil", and some laws hold the members/managers liable, for example for unpaid wages under the FLSA. You can google "piercing the veil" for more info about when your personal stuff might not be protected. In my state I'd need to file paperwork each year that takes about 15 minutes to keep it active.

Websites get sued pretty frequently where I live for ADA violations. You wouldn't want that website you built for some hotel 3 years ago to end up with you getting shook down for $10K.

CarForumPoster fucked around with this message at 17:46 on Sep 2, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I develop full time on windows unless it’s gonna end up on AWS lambda. It’s def dumb but inertia and all that.

If I could do it over I’d def start linux/mac.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Zoracle Zed posted:

90% of the time conda is shockingly good at handling even complex dependencies on Windows, but here's my "favorite" bug:

Yea this is my experience as well. It’s amazing except when you have some very complex thing that wouldn’t work with pip install at all.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Josh Lyman posted:

I wanted to follow up on this and say that NihilCredo's suggestion of using ls -R > filelist.txt on Linux and gci -R > filelist.txt in Windows has worked really well to generate the directory listings. However, diff (and its Powershell counterpart compare-object) is a pain to use and gci produces an inconsistent number of whitespaces which results in false positives for a change to a directory.

Is there a better way, either through Linux command line or a Windows program, that can compare the filelist.txt files for changes? So for example, this would only spit out file1a:
code:
# cat oldfilelist.txt
file1
file2
file3

# cat newfilelist.txt
file1
file1a
file2
file3
I feel like this would be a common task when tracking changes to source code--maybe an editor is my best bet?

Changes to source code are tracked with git.
If you just need to check if there have been changes, you can check the file hashes. In PowerShell it's Get-FileHash c:\test.txt. I'm sure it's easy in python as well.

If you need versioning, can you simply put them in OneDrive/Sharepoint, Google Drive, etc. which will automatically save versions for you?
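Both halves of this are short in python with just the standard library. A sketch, with hypothetical helper names: the first part compares two listings after collapsing whitespace (so padding differences don't show up as changes), and the second hashes a file the way Get-FileHash does (SHA-256).

```python
import hashlib
import tempfile
from pathlib import Path


def normalized_lines(text: str) -> set:
    """Collapse runs of whitespace so padding differences don't count as changes."""
    return {" ".join(line.split()) for line in text.splitlines() if line.strip()}


def added_and_removed(old_text: str, new_text: str):
    """Return (lines only in new, lines only in old), each sorted."""
    old, new = normalized_lines(old_text), normalized_lines(new_text)
    return sorted(new - old), sorted(old - new)


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


old_listing = "file1\nfile2\nfile3\n"
new_listing = "file1\nfile1a\nfile2  \nfile3\n"
added, removed = added_and_removed(old_listing, new_listing)
print(added, removed)  # ['file1a'] []

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "test.txt"
    sample.write_bytes(b"hello")
    print(sha256_of(sample))
```

The set comparison treats every line independently, so it works best when each line of the listing is a full path; with per-directory listings, two files with the same name in different folders would look identical.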

CarForumPoster
Jun 26, 2013

⚡POWER⚡

leper khan posted:

I'm getting :redflag: vibes about them not giving you a machine.

This. Are you international or something? A decent laptop on eBay is like $600 and lets them lock that poo poo down to prevent IP loss if it gets stolen or they fire you. It lets them have some working hour records by which to insulate themselves from liability.

I use an eBay search something like this. I’ll usually upgrade 8GB memory to 16GB though

https://www.ebay.com/sch/i.html?Pro...en%252E&_sop=15

CarForumPoster fucked around with this message at 04:58 on Nov 19, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡

KillHour posted:

Unless you modify the collection within the loop! :viggo:

Agree for a list, but it is extremely common and necessary to loop over a pandas dataframe, modifying the elements in the rows.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

KillHour posted:

Modifying the elements in a loop is extremely common. Modifying the collection by adding or removing elements is not, and is the source of many bugs.

Yea agree that modifying the number of things you’re looping over within the loop is a big red flag in code. Our new to python friend might not know the difference yet though.
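For anyone who hasn't been bitten by this yet, here's a small plain-list illustration of the difference (the numbers are arbitrary):

```python
numbers = [1, 2, 2, 3, 4]

# Buggy: removing from the list you are iterating over shifts later items
# left, so the loop skips the element right after each removal.
buggy = list(numbers)
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)

# Safe: build a new list instead of mutating the one being looped over.
safe = [n for n in numbers if n % 2 != 0]

# Fine: changing element values in place, the length never changes.
doubled = list(numbers)
for i, n in enumerate(doubled):
    doubled[i] = n * 2

print(buggy, safe, doubled)  # [1, 2, 3] [1, 3] [2, 4, 4, 6, 8]
```

Note the even number that survives in `buggy`: no error is raised, the loop just quietly skips an item, which is what makes this kind of bug hard to spot.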

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Quote posted:

I'm coming to the end of a training course on Python 3. I have really enjoyed it, but now as the end looms I worry that I won't use it and will lose what I've learned.

I don't have a project in mind. I don't have anything that I want to build right now.

My question is, is there a sort of project list--or a collection of exercises-- designed to put what I've learned to the test? Like, exercises for the sake of understanding?

Pick a project. What do you like to do? What do you wish existed?

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I haven’t used puppeteer but I see you’re missing libgobject-2.0.so.0


I had similar problems getting Firefox w/selenium to run on AWS Lambda, which I never solved with FF because I found an already working chrome package. All of the attempts to yum install dependencies for that package to make FF work brought me down a rabbit hole.

If possible I strongly suggest finding someone who has made puppeteer work in a similar environment. Especially if serverless.

By the way I was using docker for my serverless app testing and making a long sleep function in my code let me remote into the container to better diagnose issues.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
It’s not super fast AFAIK but I always end up using a Python package called fuzzywuzzy for Levenshtein distance/fewest changed characters type fuzzy logic matching.
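To show what "Levenshtein distance" is actually measuring, here's a from-scratch stdlib sketch. It is not fuzzywuzzy's API or implementation, just the textbook dynamic-programming version, with a hypothetical `best_match` helper on top.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def best_match(query: str, choices):
    """Return the choice with the smallest edit distance to query (case-insensitive)."""
    return min(choices, key=lambda choice: levenshtein(query.lower(), choice.lower()))


print(levenshtein("kitten", "sitting"))  # 3
print(best_match("Pyhton", ["Python", "Cython", "Jython"]))  # Python
```

This runs in time proportional to len(a) * len(b) per pair, which is part of why this kind of matching isn't fast once you compare against thousands of candidates.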

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Fair warning, I've been drinking

melon cat posted:

Python beginner question. I've noticed that a lot of Python tutorials (like this one which explains how to make a blog with Python) use terminal to install stuff and run commands. This is easy on a Mac since you can run Terminal easily on that IS. But I have no idea how this is done in Windows 10. And every resource I've looked at always shits on Windows 10 for its broken cmd prompt.

Am I better off learning and using and learning Python on Ubuntu? Because I really don't mind dual booting Windows + Ubuntu. I'm just getting kind of irritated that most tutorials assume you're on a Mac/Linux and I think I've spent way too much time trying to figure out how to get these Terminal commands working on Windows.

Hot take: I exclusively dev on Win 10. WSL is a fool's errand. You need Anaconda for a while and to use conda envs if you're developing for general purpose stuff. When you need to deploy poo poo to the web like flask or django you'll wanna run docker and maybe switch to virtualenv, though I highly recommend skipping figuring out docker and just adopting AWS SAM. gently caress servers and gently caress configuring them. If you wanna put django on the internet using Win10, use Zappa at first, and when you need to config poo poo beyond that use AWS SAM to deploy to AWS Lambda. Here's how to get a django site on AWS Lambda in 15 minutes for free: https://romandc.com/zappa-django-guide/

hbag posted:

...now to figure out which cookie I need. none of these seem to really be standing out, but I might just have a smooth brain.

Here's a lazy rear end, slow-but-faster-than-figuring-this-bullshit-out way: Get whatever site you want with selenium and log in the old fashioned way. When you wanna pass logins between sessions just use selenium's get_cookies and add_cookie methods. Couldn't be easier. I use this when I want to scrape a search result in parallel, doing something like: 1) Log in to the website. 2) Search for the thing, getting the cookies from the logged-in session. 3) Pass the cookies from 2) to multiple parallel lambda functions.
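A sketch of the hand-off part of that flow, runnable without a browser. In Selenium's Python bindings `driver.get_cookies()` returns a list of dicts with `name` and `value` keys (plus extras like `domain`), and `driver.add_cookie()` takes one such dict. The cookie values, the payload shape, and the helper names below are made up for illustration.

```python
import json


def cookies_to_header(cookies) -> str:
    """Turn a list of cookie dicts (each with 'name' and 'value') into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def make_payloads(cookies, search_pages):
    """One JSON payload per parallel worker, each carrying the same session cookies."""
    return [json.dumps({"page": page, "cookies": cookies}) for page in search_pages]


# In the real flow this list would come from driver.get_cookies() after logging in.
# The values below are made up.
session_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

payloads = make_payloads(session_cookies, [1, 2, 3])
print(cookies_to_header(session_cookies))  # sessionid=abc123; csrftoken=xyz789

# Inside each worker: restore the session before requesting the page.
worker_cookies = json.loads(payloads[0])["cookies"]
# for cookie in worker_cookies:
#     driver.add_cookie(cookie)  # the browser must already be on the cookie's domain
```

The header form is handy if a worker only needs plain HTTP requests rather than a full browser; the JSON form is what you'd pass when each worker spins up its own selenium session.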

CarForumPoster fucked around with this message at 06:07 on Feb 20, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

pokeyman posted:

How so?

(I have no experience with it whatsoever and haven't done Windows dev for years, so I'm inclined to believe you.)

If you're using Windows I'd wager everything you need is available on Windows. I say this having deployed several python ML models, web scrapers and apps (all Django, Flask, or some code running on AWS Lambda or EBS). All dev'd on Win10. You might not be able to run the midnight build of PyTorch/FastAI, but usually the last stable release of any package will do if you're on Windows.

If you're developing a project that needs to be deployed and you want to test in a prod-like environment, using WSL is dumb because it doesn't emulate your prod environment. It's just regular Linux. That's the entire purpose of docker: to replicate your prod environment as exactly as possible. So just use Docker Desktop and its handy CLI if you need to interact with it rather than trying to figure out how to SSH in.

So I arrive at the conclusion that the probability of anyone actually benefitting from WSL for the purposes of python development specifically is almost nothing and I say this as someone who has WSL2 Ubuntu installed. WSL has its place, but prob not the right choice for python development if your preferred OS is Windows.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

hbag posted:

how could i delete everything in a string EXCEPT what my regex matches?
the pattern im using right now excludes all the other LINES, sure, but i only want the string itself that matches, not the entire line

In Python I believe you just use re.search or re.findall, which are part of the std lib. I’d imagine most other languages have analogous things.

This wouldn’t delete everything else of course, it’d simply extract whatever the pattern matches.
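A quick illustration of both calls. The sample text and the pattern are invented for the example; swap in your own.

```python
import re

text = """GET /index.html 200
GET /missing 404
POST /login 302"""

# Hypothetical pattern: a slash, a word, and an optional ".ext".
pattern = r"/\w+(?:\.\w+)?"

# re.search returns the first match object (or None); .group() is the matched text.
first = re.search(pattern, text)
print(first.group())  # /index.html

# re.findall returns every non-overlapping match as a list of strings.
paths = re.findall(pattern, text)
print(paths)  # ['/index.html', '/missing', '/login']

# "Everything except the matches deleted" is then just the matches joined back up.
only_matches = "\n".join(paths)
```

One gotcha: if the pattern contains capturing groups, `re.findall` returns the groups instead of the whole match, which is why the optional extension above uses a non-capturing `(?:...)` group.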

CarForumPoster
Jun 26, 2013

⚡POWER⚡

dupersaurus posted:

Baby's first machine learning question:

I've made a camera to watch my bird feeder, and now I want to get some automatic identification going on. I grabbed this pre-trained model for Tensor Flow, and it works great on birds, but it's also pulling birds out of thin air from no-bird images. I haven't yet dug into the evaluating prediction confidence (if that's even a thing), but it got me thinking about training for null cases. The model has a label "background", and I'm wondering: what's going to happen if I add some training using my own pictures, including feeder-only shots classified as "background"? Would it help, or is it going to confuse things since the feeder is always going to dominate the picture?

Because birds are part of ImageNet it's gonna be super duper easy to pull them out of a webcam feed. I'd go at this like others are suggesting, with a potential multilabel classification.

I find fastai extremely good for getting something up and running in a day or two to see if this will be a hard problem or not. They have tutorials and make it super easy to use pretrained models.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

dupersaurus posted:

This project’s been one leeroy jenkins after another, so yeah why stop now

I’ll check fastai out too. Though I got something out of tensor flow, it feels a bit too obtuse for starting from scratch.

FastAI's whole deal is to get subject matter experts using ML and DL without having to know much about it to make something useful. Especially for image recognition, they make it trivially easy to use pretrained models. The Hobby Coin Project (TM) in my red text was actually built w/fastai in a jupyter notebook which started with ImageNet.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hughmoris posted:

I've been trying to teach myself the basics of git and contributing to open source. I found some typos in a popular repo, submitted my first PR, and they merged it. I'm now listed as a contributer. :3:

I'm assuming I can put this on my resume now?

Yes but don’t overstate your contribution and be very upfront and clear if asked. If someone asks you what it is answer clearly and specifically that you fixed typos.
I had a candidate who changed some print statements around when I looked at his contribs. When I asked him to describe the contributions he made, knowing they were totally insignificant to the function of those repos, he tried to dodge it. When he wouldn’t answer clearly I moved on but it was a major reason for not proceeding to next steps as his answer struck me as dishonest. If he’d been totally upfront him having made those minor contribs would’ve been a feather in his cap.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Empress Brosephine posted:

ooo that's a good idea using Python to do it. Thanks!

Maybe I should learn one day what language VS extensiosn are programmed in and just make one for myself...all it needs to do is scan the .html for "class=" or "Class=" or whatever typing of it and then dump what follows until the next line break!

Thanks!

Tangentially related, but I've used regex101.com dozens of times so far for regex building stuff. Pretty useful site because you can easily paste in test cases.
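For the class-scanning idea in the quote, a regex along these lines is one way to do it in python. This is a sketch: the pattern and the sample HTML are my own, and a regex will get fooled by things like commented-out markup, so an actual HTML parser is the sturdier route for anything important.

```python
import re

# Matches class="..." or class='...', any capitalization of "class".
# (?<![\w-]) keeps attributes such as data-class from matching.
CLASS_ATTR = re.compile(r"""(?<![\w-])class\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def class_names(html: str):
    """Return the distinct class names in html, in first-seen order."""
    seen = []
    for _quote, value in CLASS_ATTR.findall(html):
        for name in value.split():
            if name not in seen:
                seen.append(name)
    return seen


sample = '<div class="card m-5"><p Class=\'lead\'>hi</p><span data-class="x" class="card"></span></div>'
print(class_names(sample))  # ['card', 'm-5', 'lead']
```

Pasting `sample` and the pattern into regex101 (with the Python flavor selected) is a quick way to see which part of each tag every group captures.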

CarForumPoster
Jun 26, 2013

⚡POWER⚡
FWIW 50+ times over ~4 years using regexs to accomplish something and I still don’t understand them well enough to write one from scratch. I just always Google->stack overflow->regex101 if needed-> test in code.

It’s a crutch.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

General_Failure posted:

Besides complexity, not much I guess. Serious question though. Besides printing text, how hard is it to get visual feedback from a notebook?

You can do tons of stuff inline in a jupyter notebook, including interactive apps (called jupyter widgets), rendering plots, and even running flask/dash apps.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

PIZZA.BAT posted:

I looked around for an AI thread and couldn't find one so sorry if this should be somewhere else. I'm trying to get the groundwork set up for a personal project where I'll be teaching an AI how to play a game by having it play against itself however many millions of times it takes to learn. Most of the genetic libraries I've come across necessitate me feeding in some sort of fitness model so that it can breed the evolutionary model itself which won't work for my use case because the game itself isn't solved / isn't something that can be broken down to a simple formula.

Is there a library out there that will let me configure how the initial tree should look, ie: here's the 150 inputs at the base of the tree and what the outcome should look like, now generate me a million random trees with a depth of anywhere between 50-100, and just give them to me to evaluate myself? At that point I can evaluate their fitness myself and breed/mutate the winning pool and repeat the cycle.

Does such a library even exist?

Why is feeding it an initial decision tree a requirement? Is there a way to score the movements, actions, outcomes in a game? Are you trying to make a deterministic set of paths or can you just have a black box of a model that optimizes for whatever your scoring function is?

Also, Monte Carlo tree search is kinda like what you're describing.

This guy has game AI videos that are meant to be edutainment but they can give you a nice sampling of some methods: https://www.youtube.com/watch?v=D5xX6nRWDko
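To give a feel for the "score moves by playing the game out" idea without a hand-written fitness formula, here's a sketch of flat Monte Carlo move evaluation. To be clear, this is the much simpler cousin of full Monte Carlo tree search (no tree, no selection policy), and the toy game (take 1-3 stones, last stone wins) is just a stand-in for whatever game you're actually modeling.

```python
import random


def legal_moves(pile: int):
    """Toy game: take 1-3 stones; whoever takes the last stone wins."""
    return [k for k in (1, 2, 3) if k <= pile]


def random_playout(pile: int, rng: random.Random) -> bool:
    """Play random moves to the end. True if the player to move at the start wins."""
    mover_is_start_player = True
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return mover_is_start_player
        mover_is_start_player = not mover_is_start_player


def best_move(pile: int, playouts: int = 2000, seed: int = 0) -> int:
    """Score each move by the win rate of the random playouts that follow it."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            scores[move] = 1.0  # taking the last stone wins outright
            continue
        # After our move the opponent is to move, so we win when they lose.
        wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
        scores[move] = wins / playouts
    return max(scores, key=scores.get)


print(best_move(3))  # 3
print(best_move(5))
```

The only game knowledge in here is the rules and who won at the end; the scoring falls out of the playouts. From 5 stones the estimate should favor taking 1 (leaving 4), since random play from 4 wins least often for the player who has to move.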

CarForumPoster fucked around with this message at 21:19 on Jun 11, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Kuule hain nussivan posted:

I've wanted to get into contributing to open source projects for a while, but it always turns into a very daunting task. Both finding worthwhile projects and then figuring out how to contribute to them.

Any tips? Or would there be any interest in starting a goon group to get things done together?

I have not done this beyond extremely minor things so I can't provide that much help, but the little I have done has come from finding a project I actually use and care about which has open issues on GitHub. Usually I pop into their slack or whatever chat channel, or send an email to the owners, to ask if they accept PRs if I fork and make the fixes.

Is there some friction there? If you mean contributing new features, I'd try starting with doing some maintenance/bug fixes since usually people don't want to do that and everyone wants to make new features.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I really like dash and would gladly answer any questions you want. Super easy to deploy to heroku. If a Python graphing library like plotly or cytoscape can make the plot you want, you can do the whole thing in Python.

Is that the case here? If no it starts to be more complicated.

EDIT: I reread your post and you said you can do it using seaborn. In that case you can probably use dash and you might check out plotly because it’s pretty but seaborn works too. Plotly has the advantage of being integrated in dash with dcc.Graph()

EDIT2: https://plotly.com/python/animations/

CarForumPoster fucked around with this message at 14:30 on Jun 23, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

foutre posted:

That's exactly what I was wondering, thanks! It sounds like Dash would be a good fit. The one more complicated thing is the map itself - basically, it's a static image of the map with little dots and bullet tracers and stuff drawn on it. It sounds like it might be complicated to do with just the plotly animations, but it looks like there's a dash version of canvas, which is what I've used in React so that seems promising. I'm honestly kind of mad I didn't realize there was a version of this where I could be using Python instead of JS all along...

If you're up for it I've got some very basic questions:

Are there any go-to resources you'd suggest looking at for learning Dash? I'm comfortable with Python in general, but (as you can tell) very new to Dash.

Is the Dash framework pretty similar to React? Ie, model-view-controller decomposition and whatnot?

How do you handle getting data in Dash? I.e., in React I basically use Mongoose and Express; is there an equivalent set of things I should be looking at for Dash?

E: more generally re the above, it looks like a lot of the examples load in data beforehand rather than ie asynchronously getting data from a server. Is that a design choice that's necessary, or just kind of because that's the form data happens to be in? Ie, you could do it either way, but usually don't have to.

Thanks again for the help!

I haven't used React, although Dash uses React to render the UI (i.e. the webpage you interact with).

The thing I like about Dash vs., say, Django is that you don't really need to think about any of the things you just listed. You can make that PUBG simulator without knowing what MVC is. Just use a callback to run the Python code you'd normally write for a regular Python program.

For example: if you have the data sitting on the server already, you can load it when the app starts (for example using pandas, like pd.read_csv() or pd.read_sql()). Callbacks run async by default in current Dash IIRC. So to make an app like the PUBG simulator you linked, where it starts with empty data and then updates on a button press, and let's say the match data is requested from a REST API, you'd just make the button's callback use requests to hit the API.


IDK if this code works, I haven't tested it or thought about it much, but it should be the right concept:
code:
### This code renders a scatter plot using a callback loading data first from a dataframe then, after a button is pressed, from an API. 

# Your imports and stuff
import dash
import dash_html_components as html
import dash_core_components as dcc
import pandas as pd
import plotly.express as px
import requests
from dash.dependencies import Input, Output  # needed by the callback decorator below

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

### Higher up you have a graph with some ID and a button
app.layout = html.Div([
    html.Button('Submit', id='btn-id-name', n_clicks=0),
    dcc.Graph(figure={}, id="graph-figure")
    # Start with an empty figure. The callback below fills it in with some
    # init data on page load, before the button has been pressed.
])


# Down lower you'll have your callback.
@app.callback(Output('graph-figure', 'figure'),
              [Input('btn-id-name', 'n_clicks')])
def some_func_name(n_clicks):
    if n_clicks: # If you've clicked the button. 
        headers = {
            "Content-Type": "application/json",
        }
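        response = requests.get(
            'https://api.dickbutt.com/api/v1/?param=bad_posting',
            # requests wants a (username, password) tuple or an auth object here,
            # not a bare string. These are placeholder credentials.
            auth=("user", "ABC123"), headers=headers)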

        # Do some stuff to the response here, make your fig
        new_fig = px.scatter(
            pd.DataFrame(response.json()),
            x="sepal_width",
            y="sepal_length"
        )
        return new_fig
    else: # If you have NOT clicked the button. 
        # Read in some initial data set if you want some starter thing.
        # Every callback fires once when the page first loads, so this branch runs before any click.
        df = pd.read_csv("init_data.csv")
        # Make some initial figure
        fig = px.scatter(df, x="sepal_width", y="sepal_length")
        return fig
    

if __name__ == '__main__':
    app.run_server(debug=True)
I didn't use dash bootstrap components, but you should because they're easy and make stuff prettier.

CarForumPoster fucked around with this message at 22:41 on Jun 23, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

foutre posted:

Oh wow, that's so much simpler. Thanks, looks like Dash will be perfect for this, very much on board for it just handling a ton of that stuff for me.

No prob! I'm excited to see what you make. I never really conceived of using Plotly for playback of data vis in that way, so I'm glad you asked the question.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dark Mage posted:

Have you considered using D3.js for drawing the actual map?

If he ends up sticking with Python, he can render that map as a scatter plot over an image: https://plotly.com/python/images/

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dark Mage posted:

It seems like a lot of work to use Python for a front-end visualization tool. D3 is written in JavaScript, so that eliminates the need to use multiple languages to do one thing.

If he's using Dash, Dash is all Python. You don't even write HTML. What you're prescribing uses multiple languages to do one thing. I posted an example app in the last post of the previous page that renders a scatter plot using data from a CSV or an API. After that, all he'd need is to add a background image for the map, a callback for the "play" function, and probably a storage method like dcc.Store() to share data between callbacks if needed.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dark Mage posted:


Your Python-based solution is certainly an interesting route, and I'd be curious to see how foutre builds around it.

I was curious to try so here's an actual prototype that works and puts moving scatter plots over a pubg map image and has playback. Took about 20 minutes.

code:
### This code renders a scatter plot w/animation using a callback loading data from a dataframe

# Your imports and stuff
import dash
import dash_html_components as html
import dash_core_components as dcc
import pandas as pd
import plotly.express as px
import requests
from dash.dependencies import Input, Output, State

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

### Higher up you have a graph with some ID and a button. This is kinda like your HTML template.

app.layout = html.Div([
    html.Button('Submit', id='btn-id-name', n_clicks=0),
    dcc.Graph(figure={}, id="graph-figure")
])


# Down lower you'll have your callback. This updates your graph.
# Every callback also fires once when the page first loads, so I include an "else" for the button not being clicked yet.
@app.callback(Output('graph-figure', 'figure'),
              [Input('btn-id-name', 'n_clicks')])
def some_func_name(n_clicks):
    if n_clicks:  # If you've clicked the button.
        # Could do a request or load a CSV here, I'm using their example data.

        df = px.data.gapminder()
        fig = px.scatter(df, x="gdpPercap", y="lifeExp",
                         animation_frame="year", animation_group="country",
                         size="pop", color="continent", hover_name="country",
                         log_x=True, size_max=55,
                         range_x=[100, 100000],
                         range_y=[25, 90])

        fig.add_layout_image(
            dict(
                source="https://i.ibb.co/cthV7vc/L7ENJGA.jpg",
                xref="paper",
                yref="paper",
                x=0,
                y=1,
                xanchor="left",
                yanchor="top",
                sizex=1,
                sizey=1,
                sizing="contain",
                opacity=1,
                layer="below")
        )
        fig.update_layout(template="plotly_white")

        return fig
    else:  # If you have NOT clicked the button.
        fig = px.scatter(pd.DataFrame([['A', 1], ['B', 2], ['C', 3]],
                                      columns=["X", "Y"]),
                         x="X",
                         y="Y")
        fig.add_layout_image(
            dict(
                source="https://i.ibb.co/cthV7vc/L7ENJGA.jpg",
                xref="paper",
                yref="paper",
                x=0,
                y=1,
                xanchor="left",
                yanchor="top",
                sizex=1,
                sizey=1,
                sizing="stretch",
                opacity=1,
                layer="below")
        )
        fig.update_layout(template="plotly_white")
        return fig

if __name__ == '__main__':
    app.run_server(debug=True)

CarForumPoster fucked around with this message at 01:44 on Jun 25, 2021

CarForumPoster
Jun 26, 2013

⚡POWER⚡

no hay camino posted:

Does anyone have a recommendation for an intro to Machine Learning course? I'm currently taking the coursera Machine Learning course "taught" by Andrew Ng, but it's from 2012 and I can't even read the mathematical notation sometimes because the LaTeX is broken (I can report it but still wow, you want me to take your course seriously?). I do want to get a good mathematical grasp of machine learning beyond just knowing how to use libraries - maybe I might need a refresher on my multivariable calculus and differential equations.

Fast.ai is what got me into machine learning and deep learning. The first few lectures are technician level, then it goes into the concepts deeply and quickly. It's taught by an absolute legend in the field too: former president of Kaggle, dude has started and sold multiple ML startups, and been part of many publications despite not being an academic.

If you want specifically machine learning here's that: (Playlist)
https://www.youtube.com/watch?v=CzdWqFTmn0Y

If you want deep learning they've done a few iterations of their deep learning course broken up into a beginner and advanced set. (Playlist)
https://www.youtube.com/watch?v=_QUEXsHfsA0

Again, expect to start at a technician level and be diving into the fundamentals by lecture 5 or 6.

CarForumPoster fucked around with this message at 23:55 on Jul 30, 2021
