CarForumPoster
Jun 26, 2013

⚡POWER⚡
Hey, I am trying to run my first Elastic Beanstalk (EB) instance with a Flask/Dash app that runs fine locally. I am on Windows and everything is bitching when I try to use the AWS CLI, so I am setting it up through the web browser. This is an app I downloaded from GitHub as a zip and then uploaded to EB through the browser.

code:
[Mon Feb 04 19:32:21.434707 2019] [mpm_prefork:notice] [pid 4594] AH00163: Apache/2.4.37 (Amazon) mod_wsgi/3.5 Python/3.6.7 configured -- resuming normal operations
[Mon Feb 04 19:32:21.434727 2019] [core:notice] [pid 4594] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Mon Feb 04 19:32:24.209122 2019] [:error] [pid 4599] [remote 127.0.0.1:0] mod_wsgi (pid=4599): Target WSGI script '/opt/python/current/app/app-folder-name/application.py' cannot be loaded as Python module.
[Mon Feb 04 19:32:24.209181 2019] [:error] [pid 4599] [remote 127.0.0.1:0] mod_wsgi (pid=4599): Exception occurred processing WSGI script '/opt/python/current/app/app-folder-name/application.py'.
[Mon Feb 04 19:32:24.209391 2019] [:error] [pid 4599] [remote 127.0.0.1:0] Traceback (most recent call last):
[Mon Feb 04 19:32:24.209413 2019] [:error] [pid 4599] [remote 127.0.0.1:0]   File "/opt/python/current/app/app-folder-name/application.py", line 1, in <module>
----->   [Mon Feb 04 19:32:24.209417 2019] [:error] [pid 4599] [remote 127.0.0.1:0]     import dash
But there's a requirements.txt file that I thought would install automatically, and it says to install dash:

code:
-----> dash==0.35.2
dash-core-components==0.42.1
dash-html-components==0.13.5
dash-renderer==0.16.2
decorator==4.3.0
Am I able to force it to install requirements.txt somehow?

EDIT: Posting about it made AWS EB CLI work. Gonna try to ssh in now.
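One thing worth ruling out in this situation: the AWS docs for the EB Python platform describe installing dependencies from a requirements.txt at the *root* of the source bundle, and GitHub's "Download ZIP" wraps everything in a top-level folder (which would fit the app-folder-name path in the log above). A minimal sketch, assuming that's the cause, that lists which expected files are missing from a zip's root before you upload it. The function name and the file list are my own, not anything from AWS.

```python
import zipfile

# Files this sketch assumes should sit at the top level of the source bundle
EXPECTED_AT_ROOT = ("application.py", "requirements.txt")


def missing_from_root(bundle, expected=EXPECTED_AT_ROOT):
    """Return the expected file names that are NOT at the root of a zip bundle.

    `bundle` can be a path or a file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(bundle) as zf:
        # Root-level entries have no "/" in their archive name
        root_files = {name for name in zf.namelist() if "/" not in name}
    return [name for name in expected if name not in root_files]
```

Run it as `missing_from_root("my-app.zip")` on the zip you're about to upload; an empty list means both files are at the root.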

CarForumPoster fucked around with this message at 21:19 on Feb 4, 2019


CarForumPoster
Jun 26, 2013

⚡POWER⚡

Docjowles posted:

This isn't really anything related to AWS. The requirements.txt file doesn't just magically do anything on its own. You need to do something like "pip install -r requirements.txt" first to actually download and install the dependencies. Then your app should work.

I thought EB did that automagically when I uploaded my app. Apparently not. I'm SSH'd in as ec2-user, but ls/dir shows no files. I know my requirements.txt is in

???Somewhere???/opt/python/current/app/app-folder-name/

...any idea where that is?


EDIT: Posting about it made me not a dumbass and I installed it.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Ramrod Hotshot posted:

Our databases at work are hosted on AWS,

I've got two lists of names of companies.
a lot of them may have a partial match.
For example: Acme Holdings Inc. vs 12949 ACME INC##.

I know a little bit of coding in Python but not much.

How big is your list? If it's on the order of ~100K rows, fuzzy string matching might be a better solution, i.e. scoring the similarity of one string to another string. The fuzzywuzzy Python package does that trivially in a couple of lines of code; then you can rank the matches by similarity.

To get an idea of how that works:

https://datascience.stackexchange.com/questions/12575/similarity-between-two-words posted:

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
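To make that definition concrete, here's a small pure-Python sketch of Levenshtein distance using the standard dynamic-programming approach, keeping only two rows of the table at a time. This is not fuzzywuzzy's code, just the same idea written out.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    # Distance is symmetric, so keep the shorter string in the inner loop
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion).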


NLTK (also Python) has some (IMO harder to use and understand) tools that do very similar things, such as edit_distance.

Machine learning isn't magic and requires input data to model on. Fuzzywuzzy could help generate that input data if you need to do this on 100,000,000 rows and need something more robust (and perhaps a predictive model).

TL;DR: try fuzzywuzzy first, loading your DB into memory using pandas.

FWIW I have an extremely similar problem, where two different web scraper data sources report slight variations of the same name, e.g. "SOMETHING AWFUL INC" and "Something Awful", and I use fuzzywuzzy with a minimum score to resolve that. Your two example strings are very different by Levenshtein distance, and as someone who has used crowdsourced data labeling for this problem, I'd argue that most humans would NOT classify those as similar names.
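A rough, stdlib-only sketch of the "minimum score" approach. It uses difflib.SequenceMatcher's ratio() as a stand-in for fuzzywuzzy's scoring (fuzzywuzzy reports 0-100; this is 0.0-1.0), and the normalization rules (lowercasing, dropping punctuation and a hypothetical list of legal suffixes like "inc") are my assumptions to tune for your data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of suffixes to ignore when comparing company names
SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """0.0-1.0 similarity of two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, min_score=0.8):
    """Return the most similar candidate, or None if nothing reaches min_score."""
    scored = [(similarity(name, c), c) for c in candidates]
    if not scored:
        return None
    score, candidate = max(scored)
    return candidate if score >= min_score else None
```

With these rules "SOMETHING AWFUL INC" resolves to "Something Awful", while "12949 ACME INC##" vs "Acme Holdings Inc." stays well under 0.8 and returns None, which matches the point above about those two names.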


EDIT:
If you want to know whether any substring (e.g. "Acme") is located within any other string, that's actually pretty easy and fast. You could split the string on spaces, then check whether each of the string parts is located in the column of names you're checking against.

Something like (pulled out of my rear end)
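code:
import sqlite3

import pandas as pd

conn = sqlite3.connect("your_database.db")  # placeholder: use your own DB connection
df = pd.read_sql("SELECT * FROM companies", conn)  # placeholder query
s = df["company_names_to_check_against"]  # Series of names to check substrings against
for index, row in df.iterrows():
    match_string = row["company"].split()  # split() avoids empty parts from double spaces
    for substring in match_string:
        # str.contains checks whether the substring appears anywhere in each name;
        # isin() would only match names exactly equal to the substring
        if s.str.contains(substring, case=False, regex=False, na=False).any():
            pass  # TODO: a match was found, do something with that fact
        else:
            pass  # TODO: do something else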
Could also check with fuzzywuzzy first, then check whether there's a partial string match, and then do something if it both meets a threshold and has a partial string match.
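A sketch of that combined rule, again with difflib's ratio() swapped in for fuzzywuzzy. The 0.6 threshold and the "share at least one whole word" test for a partial match are arbitrary assumptions to tune.

```python
from difflib import SequenceMatcher


def is_probable_match(a: str, b: str, threshold: float = 0.6) -> bool:
    """True only if the names are similar enough AND share at least one word."""
    a_clean, b_clean = a.lower().strip(), b.lower().strip()
    score = SequenceMatcher(None, a_clean, b_clean).ratio()
    shared_words = set(a_clean.split()) & set(b_clean.split())
    return score >= threshold and bool(shared_words)
```

Requiring both conditions filters out pairs that only share a common word like "acme" but are otherwise very different strings.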

CarForumPoster fucked around with this message at 17:08 on Feb 11, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I have a DB hosted in RDS that I want to put an API in front of to serve content via a flask app served by elastic beanstalk. I may potentially give ~10 users access to the API as well.

Does AWS have some hilariously easy way to make a REST or similar JSON-serving API before I get started on a new Django (DRF) or maybe FastAPI project? This won't get a ton of traffic; I'd mostly be using it to generate reports server-side that are served by a small Flask app.
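For scale, the "serve some JSON from a SQL DB" part is small even without a framework. A stdlib-only sketch, assuming a hypothetical `reports` table, with sqlite3 standing in for the RDS connection (in practice you'd use your DB's driver, and DRF/FastAPI would add auth, pagination, etc.). It's a plain WSGI callable, the same interface Flask and Django apps expose to mod_wsgi on Elastic Beanstalk.

```python
import json
import sqlite3


def make_app(conn: sqlite3.Connection):
    """Build a WSGI app that serves GET /reports as a JSON list of rows."""
    conn.row_factory = sqlite3.Row  # note: changes the passed-in connection so rows convert to dicts

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/reports":
            rows = conn.execute("SELECT id, name, total FROM reports ORDER BY id").fetchall()
            status, payload = "200 OK", [dict(row) for row in rows]
        else:
            status, payload = "404 Not Found", {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

To try it locally: `from wsgiref.simple_server import make_server` and then `make_server("", 8000, make_app(conn)).serve_forever()`.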

CarForumPoster fucked around with this message at 12:50 on Jul 7, 2020

CarForumPoster
Jun 26, 2013

⚡POWER⚡
The obvious one is a conventional web-app-type site on Lambda/EB with a regular ol' SQL DB on RDS.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I asked a silly question in the wrong thread. Does anyone print from their AWS-deployed code to a local printer? E.g. to run reports.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Thanks Ants posted:

I posted this in the printers thread, not sure if you saw it

https://www.printnode.com/en

I missed it somehow (I asked this same question a while back) and this looks to be exactly what I want. Thanks so much!

CarForumPoster
Jun 26, 2013

⚡POWER⚡

my bitter bi rival posted:

Had a CodeDeploy fail this morning at the Download bundle step with no error. It took 70 minutes to fail and caused our ASG to choke in the meantime. Never seen this before. Opened a ticket with support, but has anyone seen anything like this before? Event Details is likely totally blank too.



Is it coming from Bitbucket? They had an outage today.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

necrobobsledder posted:

really odd or misleading error messages that can happen and send you into an alternate dimension [...] Similar happens with S3 object and bucket policies.

Hear, hear! My first bit of advice to my team on diagnosing weird AWS errors that aren't obvious is: Do you have permissions to access what you want? Yes. How do you know? Did you test it? Yes. Then escalate to support.

1 in 5 issues gets escalated.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Just-In-Timeberlake posted:

I'm trying to access a Lambda .netcore function via ALB vs the API Gateway (API gateway is the way it's currently accessed, and works) because API Gateway has a max timeout of 30 seconds, and there are times it will take longer, so ALB seems the way to get around this.

I'm not super knowledgeable about AWS, but why would you use ALB to get around the Lambda function's run time exceeding the API Gateway timeout?

When I've had this problem, instead of returning what I want from my Lambda function, I return an ID that can be queried for the completed result, which I store somewhere.
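A minimal sketch of that "return an ID, fetch the result later" pattern. A thread pool and an in-memory dict stand in for wherever you actually run the work and store results; in AWS that might be a separate invocation writing to a table or bucket, which is an assumption not shown here.

```python
from __future__ import annotations

import uuid
from concurrent.futures import Future, ThreadPoolExecutor


class JobStore:
    """Start long-running work, hand back an ID, and let callers poll for the result."""

    def __init__(self, max_workers: int = 4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, fn, *args, **kwargs) -> str:
        """Start fn in the background and return a job ID right away."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._executor.submit(fn, *args, **kwargs)
        return job_id

    def get(self, job_id: str) -> dict:
        """Non-blocking status check, i.e. what a 'get result by ID' endpoint would return."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"status": "unknown"}
        if not future.done():
            return {"status": "pending"}
        if future.exception() is not None:
            return {"status": "failed", "error": str(future.exception())}
        return {"status": "done", "result": future.result()}

    def wait(self, job_id: str, timeout: float | None = None):
        """Block until the job finishes (handy for tests); re-raises the job's error."""
        return self._jobs[job_id].result(timeout=timeout)
```

The first request returns only the job ID quickly, so it never hits the gateway timeout; the client then polls `get()` until the status is "done".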

CarForumPoster
Jun 26, 2013

⚡POWER⚡

ledge posted:

Have you run the reachability analyzer? That worked for me when I was having trouble with network nonsense earlier today.

Not the OP, but I didn't know about this; thanks for posting it!

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Welding for money is a lot less "make cool beer sculpture" and a lot more "repair/install this pipe, girder, etc." by number of welders employed. IMO welding sewer pipe, conduit, pressure vessels, etc. is boring and hot. Same opinion about sitting at a sheet metal brake. Welding and machining stuff for fun is awesome, but I quit CNC machining to touch computers and wouldn't go back. I totally plan to have my own machine shop with a nice MIG and TIG setup if I ever sell my business or get rich enough to have that just sitting around. That said, two of my acquaintances are welders and really like it.

pissinthewind: If you want, check out the resume/interviewing thread in BFC. Lots of hiring managers help people with career questions as well as resumes. To answer your question directly: no, AWS is not in any way "played out". It seems like understanding and setting up systems to power internet services isn't of interest to you, or you'd probably know that already. If you're not interested in that subject, it's likely not a good career choice.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cage-free egghead posted:

What are your guys' opinions on the AWS Resume Challenge for someone not quite done with school, has a bunch of years of IT support, the CCP, Cloud+, other certs?

I'd not heard of this. Assuming that it is this: https://cloudresumechallenge.dev/docs/the-challenge/aws/

My thoughts are:
Why?
-and-
For what job?

If the why is "so that I know that I can do this", yeah, this seems like a decent entry-level project. If a new-grad Engineer 1 had this as a project on GitHub with a README explaining what it is and does, it'd count as a useful project with me.

If the why is "This website is how I will share my resume. This and only this.", that's a poo poo answer; do not do that.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cage-free egghead posted:

I'm trying to get out of end user support and into any sort of cloud stuff. Getting close to finishing the Cloud Computer bachelors at WGU and came across this little challenge as a way to get my feet wet with some stuff, plus have a talking point for interviews and such.

Yes, seems pretty good for that.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hed posted:

I'm dumping files to S3 and on a schedule need to take all the new ones and convert them to custom avro and make avro files for every N files.

I have a Python function to bottle up and convert; is there a more elegant way to do this than s3 sync to a computer with a real file system and pushing it back? I've used Kinesis Firehose on ingest but don't see anything that could accomplish what I want.

You could schedule an AWS Lambda function (e.g. with a cron or rate expression) to do it if the 15-minute timeout isn't an issue in your application.

It sounds like you don't want to trigger on each new file in the S3 bucket, but if you did, AWS lets you trigger a Lambda when a file is added to S3 pretty easily.
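A rough sketch covering both options. `keys_from_s3_event` reads the bucket/key pairs out of the S3 event notification shape a triggered Lambda receives (object keys arrive URL-encoded); `batches` groups new keys into chunks of N for the scheduled version. Listing the bucket, tracking which files were already processed, and writing the Avro files are left out, and `batch_size` is a placeholder.

```python
from __future__ import annotations

from urllib.parse import unquote_plus


def keys_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded (spaces arrive as "+")
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def batches(keys, batch_size: int) -> list[list[str]]:
    """Split keys into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    keys = sorted(keys)  # deterministic order, e.g. by timestamped file name
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
```

The last batch can be shorter than N; whether to convert it now or leave it for the next scheduled run is up to you.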

CarForumPoster
Jun 26, 2013

⚡POWER⚡
TIL I can get a desktop/GUI out of an AWS Lambda-based Docker container image. This makes diagnosing why some web scrapers are having issues much easier, and maybe someone in this thread needs to know this.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hed posted:

Interesting. Are you RDPing or shoving X down the pipe (lol that latency would be funnn)?

I access it through a browser with vncserver and noVNC. I’m only doing this locally. I can post the Dockerfile bits tomorrow if anyone’s interested. One annoying thing, though: I have to use an intermediate site in a browser to copy/paste.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hughmoris posted:

Does anyone use AWS Step Functions for anything, and if so, can you speak a little to your thoughts on it?

On the surface it sounds cool but I'm trying to think of a fun little personal project to solve with it.

I use them to limit the parallelization of my web scrapers and to do some of the business logic for the steps.

E.g. scrape website -> are there leads? -> enrich leads -> assign salesperson -> add to CRM
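A sketch of what that flow might look like as an Amazon States Language definition, built as a Python dict so it can be dumped to JSON. The state names and Lambda ARNs are hypothetical placeholders, and the Choice state assumes the scrape step returns a `lead_count` field. The Map state's MaxConcurrency is one way to cap how many scrapers run at once.

```python
import json

# Hypothetical ARN prefix: substitute your own Lambda functions
FN = "arn:aws:lambda:us-east-1:123456789012:function:"


def task(function_name, next_state=None):
    """A Task state that invokes a Lambda; ends the branch if next_state is None."""
    state = {"Type": "Task", "Resource": FN + function_name}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition(max_parallel_scrapes=2):
    per_site = {
        "ProcessorConfig": {"Mode": "INLINE"},
        "StartAt": "ScrapeSite",
        "States": {
            "ScrapeSite": task("scrape-site", "AreThereLeads"),
            "AreThereLeads": {
                "Type": "Choice",
                # Assumes the scrape step returns {"lead_count": <int>, ...}
                "Choices": [{"Variable": "$.lead_count", "NumericGreaterThan": 0,
                             "Next": "EnrichLeads"}],
                "Default": "NoLeads",
            },
            "EnrichLeads": task("enrich-leads", "AssignSalesperson"),
            "AssignSalesperson": task("assign-salesperson", "AddToCrm"),
            "AddToCrm": task("add-to-crm"),
            "NoLeads": {"Type": "Succeed"},
        },
    }
    return {
        "Comment": "scrape website -> are there leads? -> enrich -> assign -> add to CRM",
        "StartAt": "ForEachSite",
        "States": {
            "ForEachSite": {
                "Type": "Map",
                "ItemsPath": "$.sites",
                "MaxConcurrency": max_parallel_scrapes,  # cap parallel scrapers
                "ItemProcessor": per_site,
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The printed JSON is what you'd paste into the Step Functions console or reference from your deployment template.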

CarForumPoster
Jun 26, 2013

⚡POWER⚡
To get Selenium + Chrome going in a Lambda-based Docker container, check out: https://github.com/umihico/docker-selenium-lambda
I use the shell scripts from gui-docker here: https://github.com/bandi13/gui-docker to get everything running.

Then, to get a desktop GUI out of the local Docker container and install some helpful things like a text editor, you can:
code:
# Print the base image's OS version (handy for debugging)
RUN grep PRETTY_NAME /etc/os-release
RUN yum update -y
RUN yum install -y amazon-linux-extras
# PYTHON=python2 makes amazon-linux-extras run under Python 2
RUN PYTHON=python2 amazon-linux-extras install mate-desktop1.x -y
RUN PYTHON=python2 amazon-linux-extras install epel -y
# Make MATE the default desktop session
RUN bash -c 'echo PREFERRED=/usr/bin/mate-session > /etc/sysconfig/desktop'
# VNC server plus some helpful tools (terminal, git, editors)
RUN yum install tigervnc-server xterm git net-tools wget gedit nano -y
# Throwaway VNC password: for LOCAL testing only
RUN printf "123456\n123456\n\n" | vncpasswd
ENV VNC_PASSWD=123456
# Only accept VNC connections from localhost
RUN mkdir /etc/tigervnc
RUN bash -c 'echo localhost > /etc/tigervnc/vncserver-config-mandatory'
RUN cp /lib/systemd/system/vncserver@.service /etc/systemd/system/vncserver@.service
# noVNC + websockify let you view the desktop from a web browser
RUN git clone --branch v1.3.0 --single-branch https://github.com/novnc/noVNC.git /opt/noVNC
RUN git clone --branch v0.10.0 --single-branch https://github.com/novnc/websockify.git /opt/noVNC/utils/websockify
RUN ln -s /opt/noVNC/vnc.html /opt/noVNC/index.html
# Add in a health status: healthy while the Xtigervnc process is running
HEALTHCHECK --start-period=10s CMD pidof -x Xtigervnc > /dev/null || exit 1

# entrypoint for local testing
ENTRYPOINT ["/var/task/container_startup.sh"]
This has made finding some oddball errors from selenium MUCH easier.

Obvs this is a thing to be run locally only.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Why are you guys leaving? I've always heard turnover there is stupid high.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Scrapez posted:

Does AWS not pay well or just typical have to move to a different company to get a big bump in $$?

Yeah, same question.

AWS took my 3rd year CS student intern with a good GPA from a no name school at $33/hr. That's top of the pay scale and then some around here.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

ledge posted:

e.g. you get a "File can't be found" message if the eol is set to CRLF instead of just LF in your configuration files.

Hahahaha
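If you get bitten by that, a small sketch for normalizing line endings in config files before building or deploying. Which files to run it on is up to you, and the function names are my own.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Convert CRLF (and stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path) -> bool:
    """Rewrite a file with LF endings; return True if anything changed."""
    p = Path(path)
    original = p.read_bytes()
    converted = to_lf(original)
    if converted != original:
        p.write_bytes(converted)
        return True
    return False
```

Working on bytes avoids Python's own newline translation getting in the way when reading and writing text.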

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I’ve not used Aurora, but BigQuery has been great for my rarely used use case. I need to query these govt data sets totaling about 2TB across 50ish tables. It ends up costing me like $2 to do it, but compared to the business value and the fact that I only need to do it a few times a month, it’s an absolute steal, especially given its features.

Need to run a regex against UPPER()? No problem. User-defined functions? Yep.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

crazypenguin posted:

It will likely work fine. I’ve built plenty of internal apps on lambda.

You haven't defined what acceptable performance means, but a Python cold start should be short, well under a second. (idk about Flask, but the answer is just: try it)

Once a Lambda instance is warmed, it sticks around for a while and serves quickly.

Just schedule a keep-warm function for every 4 minutes or so.
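A sketch of the handler side of a keep-warm setup: a scheduled EventBridge rule (e.g. a `rate(4 minutes)` schedule) invokes the function, and the handler returns immediately on those pings. Scheduled events arrive with `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; `handle_request` is a hypothetical stand-in for your real app logic or WSGI adapter.

```python
def is_keep_warm_ping(event) -> bool:
    """True for EventBridge scheduled-rule events, which we treat as warm-up pings."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handle_request(event):
    """Hypothetical placeholder for the real work."""
    return {"statusCode": 200, "body": "hello"}


def lambda_handler(event, context):
    if is_keep_warm_ping(event):
        # Do nothing expensive: the point is just to keep this instance warm
        return {"warmed": True}
    return handle_request(event)
```

One ping keeps roughly one instance warm, so bursts of concurrent requests can still hit cold starts.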

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Falcon2001 posted:

In this case, the problem would be that it's a service that needs to work very quickly when we need it to - basically for realtime response stuff, otherwise I'd agree that's a pretty good approach.

For the others, it sounds like the idea is at least sane enough to get up to the 'testing' phase. 'Acceptable performance' mostly meant 'Is the latency going to be high enough that a user would find the delay irritating' and it doesn't sound like there's a significant problem here.

I just tested a Django site served by Lambda that’s got good backend features (DRF, some analytics) and a cheap RDS instance behind it, but almost no content on the page I loaded. GTmetrix gave it a 100% A (hosted in us-east-1, tested from Vancouver). So Lambda will not be what makes it slow.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
If you’re looking for a practice management system (PMS) for a law firm, I'd suggest Clio as well. We were on Practice Panther, evaluated 6 others, and migrated to Clio. Clio's API is pretty easy to build on top of as well.

There are CRMs/[ ] management systems for every industry that have had millions invested into their development, many of which have APIs so you can extend them.

EDIT: To this day, the easiest-to-develop-and-deploy dashboard for extending a CRM/productivity tool that I’ve seen is Plotly's Dash. Deploy to hobby-tier Heroku for $7 and push with git. Dead simple. The only downside is that the free version only has HTTP basic auth.
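For context on what "HTTP basic auth" means here: the browser sends `Authorization: Basic <base64(user:password)>` on every request and the server just checks it. A stdlib sketch of that check, with a hypothetical credentials dict (dash-auth handles this for you). Since the password only travels base64-encoded, this should only ever be served over HTTPS.

```python
import base64
import binascii
import hmac

# Hypothetical credentials; load real ones from config/secrets, not source code
USERS = {"alice": "correct horse battery staple"}


def check_basic_auth(header_value, users=USERS) -> bool:
    """Validate an 'Authorization: Basic ...' header value against users."""
    if not header_value or not header_value.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header_value[len("Basic "):], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, sep, password = decoded.partition(":")
    if not sep or username not in users:
        return False
    # Constant-time comparison to avoid leaking information via timing
    return hmac.compare_digest(password.encode(), users[username].encode())
```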

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Falcon2001 posted:

On the other hand, is there anything worse than 'The solution that we picked without consulting you doesn't do $VeryImportantThing, so your new process is going to be 50% manual and 50% in this new system"

That's why you pick Clio and then roll your own things on top of it using its API.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Agrikk posted:

At the risk of starting something unhealthy:

Can we use lunchtime learnings instead of brown bag? Micro aggressions are a thing and I like to do my part to minimize them.

To avoid a derail, I’d be happy to talk more over PMs if you want.

Agree microaggressions are a thing, but it’s sure hard to see how this is one of them.

I think "lunch and learn" is a better term simply by being less confusing if you’ve never heard the term "brown bag" for learning during lunch at work.

It’s really hard to see how the phrase brown bag as it applies to lunchtime learnings could have racial connotations even with the knowledge of a brown bag test apparently existing.

CarForumPoster fucked around with this message at 12:54 on Jul 17, 2022

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Thanks Ants posted:

They mean that you can't just smash a "like" button

Thanks, I also didn't get it.

Also, thank god we don't have a like button. We have :10bux: accountability and a population that's mostly between 25 and 45. Buy civility and Nissan GTRs since 1999!

CarForumPoster
Jun 26, 2013

⚡POWER⚡

StumblyWumbly posted:

I work on IoT devices, and we have a system developed a few years ago where we upload recordings to AWS, process the recordings on upload, and store the recording characteristics. Right now, this processing happens in an AWS Lambda running Python. This part of the system has been growing and growing, and occasionally, very rarely, run to the point where we just don't have the Lambda resources to handle some process. But, the main issue right now is that we use SciPy, and at the time we did our development AWS had a default lambda layer with SciPy. It looks like they've stopped updating that, so we're stuck with an old version of SciPy running on Python 3.7.

It seems like we should just make our own Lambda container image for this running whatever Python and SciPy/other libraries we want, as discussed here. is that a good solution? It seems like a bit off the beaten path for Lambdas so I'm worried support won't be great. I could run it on Fargate, but it seems like that might end up being a larger architectural change for the system, especially since our lambdas are not involved that often.

I could try building or buying our own SciPy lambda layer, but given how we use this I worry that layer size will become an issue some time soon.

You can deploy a Docker container image of up to 10GB extremely easily using the SAM CLI. It's very fast to set up. If you've got existing configuration, for example extra secure gateways or EFS mounts, those go in the template.yaml. You might need to do some googling to understand how to implement some of your current config, but it's pretty easy.

This was a huge issue in 2018, and it has basically been completely solved. FYI, you can now have more than 512MB in the /tmp directory too, and more memory (which also grants you more vCPU cores).

For example, I have a ~5GB container that processes video with ffmpeg as well as runs Selenium and Chrome for web scraping. It has 3GB of memory and routinely writes files that might be a few hundred MB.

Got another that has a full fastai install (pandas, SciPy, PyTorch and scikit-learn IIRC... multiple GB of dependencies).
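Since that kind of setup writes multi-hundred-MB files, a small sketch of checking free space in the temp directory before a big write rather than failing partway. It uses `tempfile.gettempdir()`, which should resolve to /tmp on Lambda; the 20% headroom is an arbitrary assumption, and how much space you actually get is whatever ephemeral storage you configured.

```python
import shutil
import tempfile


def has_room_for(needed_bytes: int, directory=None, headroom: float = 0.2) -> bool:
    """True if `directory` (default: the temp dir) has needed_bytes plus headroom free."""
    directory = directory or tempfile.gettempdir()
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * (1 + headroom)
```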

CarForumPoster fucked around with this message at 22:45 on Jul 26, 2022

CarForumPoster
Jun 26, 2013

⚡POWER⚡

StumblyWumbly posted:

Awesome, thanks! Are you running those in Lambdas or Fargate?

Lambdas

And in the case of the web scrapers, I have a step function that does some business logic along the way, calling different functions in the same Lambda. This helps me do all my ETL, which can easily exceed 15 minutes on a heavy day. It also handles failure notifications and whatnot, and it allows me to limit the concurrency of the scrapers, though that is a double-edged sword when it comes to unhandled errors.
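A sketch of "different functions in the same Lambda": one handler that routes on a field the state machine passes in. The `step` field name and the two step functions are hypothetical placeholders; each Task in the state machine would pass input like `{"step": "enrich", "payload": ...}`.

```python
def scrape(payload):
    """Hypothetical placeholder step: pretend we scraped one lead."""
    return {"leads": [{"name": "Example Co"}]}


def enrich(payload):
    """Hypothetical placeholder step: mark each incoming lead as enriched."""
    return {"leads": [dict(lead, enriched=True) for lead in payload.get("leads", [])]}


# Map of step name -> function; all steps ship in the same image/package
STEPS = {"scrape": scrape, "enrich": enrich}


def lambda_handler(event, context):
    step = event.get("step")
    if step not in STEPS:
        # Raising fails the Task, so the state machine's Retry/Catch rules can handle it
        raise ValueError(f"unknown step: {step!r}")
    return STEPS[step](event.get("payload", {}))
```

Each step still gets its own 15-minute limit, while the state machine strings them together for longer jobs.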

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hughmoris posted:

Building a small Step Function State Machine sounds fun, I'll have to look into that. And yes, I can see how data lineage could turn into an absolute nightmare at business scale when you start having functions moving files around the environment.

I use Step Functions. Building them is meh, but they are quite useful.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Docjowles posted:

I would put this into a Lambda or an ECS task and have EventBridge run that. Rather than having a whole EC2 instance just to run one command once a month

Same, pass it to Lambda. If what's needed is a series of functions that take longer than the 15-minute timeout, use Step Functions to run subfunctions of the same code. If it sounds tricky, find a YouTube video on Step Functions; they're pretty nice for "I want to do something daily/monthly but it takes an hour for all the steps". If you're over the size limit of a Lambda, you can use Docker container images of up to 10GB.

If you use the AWS SAM CLI to set up this workflow, you can set up scheduling in the template.yaml file so it all "just works".


CarForumPoster
Jun 26, 2013

⚡POWER⚡

lazerwolf posted:

Is it a good practice to use container images for lambda functions? Seems to be the easiest way to handle dependencies. Are there any obvious downsides?

I use the AWS SAM CLI to build and deploy exactly this. I end up having to yum install a bunch of packages in the container to get Chrome and ChromeDriver running on Amazon Linux 2. It’d be harder or impossible to do this with zip files and layers.

CarForumPoster fucked around with this message at 02:41 on Mar 24, 2024
