Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


nielsm posted:

Do those documents really have multiple root elements? If so, that's definitely a problem, and you'd have to wrap it in a new (fake) root element around the entire document minus XML declarations to make that part valid.

That case was an example of two different problem data files but now that you mention it, yes!

:suicide:

If I were higher on the totem pole I would have pushed back and said "This data freaking sucks and we can't do anything with it without more money and time" but c'est la vie.
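
For what it's worth, nielsm's fake-root idea can be sketched in a few lines of Python. This is only a rough sketch: the `wrapper` element name and the sample records are made up for illustration, and it assumes the files contain nothing else that is illegal mid-document (a DOCTYPE, for instance).

```python
import re
import xml.etree.ElementTree as ET


def parse_multi_root(text, wrapper="wrapper"):
    """Parse text with several top-level elements by adding a fake root."""
    # Drop every XML declaration; one is only legal at the very start of a document.
    body = re.sub(r"<\?xml\s[^?]*\?>", "", text)
    return ET.fromstring(f"<{wrapper}>{body}</{wrapper}>")


# Made-up sample: two documents concatenated into one file.
sample = """<?xml version="1.0"?>
<record><id>1</id></record>
<?xml version="1.0"?>
<record><id>2</id></record>"""

root = parse_multi_root(sample)
print([rec.findtext("id") for rec in root])  # ['1', '2']
```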


Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


New question only this time it's future proofing rather than an imminent problem. One document I got that they don't want me to work through--yet--looks like this (abstracted):
XML code:
<document>
  <val_1>foo</val_1>
  <val_2>bar</val_2>
  <val_3>baz</val_3>
  <!--etc to unknown limit-->
</document>
What XPath expression would I use to get all the val_X nodes in one go? Last I checked a * in XPath was just for namespace wildcards, but can it be used like *s on shells, like /document/val_* or something?

csammis
Aug 26, 2003

Mental Institution
XPath supports string functions like starts-with and contains. You'd want something like //*[starts-with(name(), 'val_')]

http://zvon.org/xxl/XPathTutorial/Output/example8.html
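
If the processing ends up in Python, note that the standard library's ElementTree only implements a subset of XPath and has no starts-with(), so a sketch like the one below filters on the tag name instead. The `<other>` element is added purely to show that non-matching nodes are skipped. A full XPath 1.0 engine such as lxml should accept the expression above as written.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<document>
  <val_1>foo</val_1>
  <val_2>bar</val_2>
  <val_3>baz</val_3>
  <other>skip me</other>
</document>
""")

# Direct children only; use doc.iter() instead to search all descendants like //*.
vals = [el for el in doc if el.tag.startswith("val_")]
print([(el.tag, el.text) for el in vals])
```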

Ciaphas
Nov 20, 2005

> BEWARE, COWARD :ovr:


Brilliant, thanks.

gariig
Dec 31, 2004
Beaten into submission by my fiance
Pillbug
I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

gariig posted:

I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

Find a web framework that's easy to set up and use that? Django for example. It's stupid overkill but it gets the job done, plus then you know how to use a web framework if you need to make websites.

Or if overkill offends you, learn to write CGI scripts; it doesn't sound like what you want is all that complicated.

The Fool
Oct 16, 2003


gariig posted:

I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save them to disk based on the URL route, and serve them back via GET. So if someone POSTs to http://server/upload/foo/bar/baz.txt it will save to /<uploaddir>/foo/bar/baz.txt, and http://server/files/foo/bar/baz.txt will return the file. SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.

I feel like node.js + express would be able to do this without too much additional work.
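
To give a sense of how little code "coding something up" might take, here is a rough sketch using only Python's standard library rather than node.js. Everything named here is an assumption for illustration: the `uploaddir` storage root, the placeholder credentials, the `cert.pem`/`key.pem` paths, and the choice to treat the raw POST body as the file contents (no multipart parsing). It is not hardened for real use.

```python
import base64
import os
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlsplit

UPLOAD_DIR = os.path.abspath("uploaddir")  # hypothetical storage root
USERNAME, PASSWORD = "user", "secret"      # placeholder credentials


def resolve(url_path, prefix):
    """Map <prefix>foo/bar.txt to UPLOAD_DIR/foo/bar.txt, or None if invalid."""
    path = unquote(urlsplit(url_path).path)
    if not path.startswith(prefix):
        return None
    target = os.path.abspath(os.path.join(UPLOAD_DIR, path[len(prefix):]))
    # Reject the root itself and anything that escapes it (e.g. via "..").
    if target == UPLOAD_DIR or os.path.commonpath([UPLOAD_DIR, target]) != UPLOAD_DIR:
        return None
    return target


class FileHandler(BaseHTTPRequestHandler):
    def authorized(self):
        """Check HTTP Basic Auth; send a 401 challenge on failure."""
        token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
        if self.headers.get("Authorization") == "Basic " + token:
            return True
        self.send_response(401)
        self.send_header("WWW-Authenticate", 'Basic realm="files"')
        self.end_headers()
        return False

    def do_POST(self):
        if not self.authorized():
            return
        target = resolve(self.path, "/upload/")
        if target is None:
            self.send_error(400)
            return
        os.makedirs(os.path.dirname(target), exist_ok=True)
        length = int(self.headers.get("Content-Length", 0))
        with open(target, "wb") as f:
            f.write(self.rfile.read(length))  # raw body is the file content
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        if not self.authorized():
            return
        target = resolve(self.path, "/files/")
        if target is None or not os.path.isfile(target):
            self.send_error(404)
            return
        with open(target, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8443, certfile="cert.pem", keyfile="key.pem"):
    """Run the server over TLS; the certificate paths are placeholders."""
    httpd = HTTPServer(("", port), FileHandler)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile, keyfile)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()
```

Calling `serve()` starts it. The `resolve` helper is the part worth getting right in any implementation, since the save location comes straight from the URL.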

PRADA SLUT
Mar 14, 2006

Inexperienced,
heartless,
but even so
I can't think of where to put this, but maybe someone here has an idea.

I'm looking for software that will take an image, pick out all the QR codes in that image, and scan them, then either save or export the data in a way that they can be "handed off" to another app. The QR codes just contain random letters and numbers, and I'm looking at about 10-50 codes per image.

Optimally it could be run automatically, where it reads a directory or whatever for images and scans them as they come in.

Commercial software is fine.

csammis
Aug 26, 2003

Mental Institution
At my last job our ECM platform did this exact thing. We relied on a few third parties to do various OCR tasks...QR codes went through LEADTOOLS if I recall correctly but there are several heavy-lifting OCR engines that can do it. ABBYY FineReader is another.

If you want to look for software that does the job the keywords are "barcode recognition"

Colonel J
Jan 3, 2008
I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth?

Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them, or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero?

This would be in c++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks!

Colonel J fucked around with this message at 20:26 on Aug 13, 2016

pmchem
Jan 22, 2010


Colonel J posted:

If there's a good guide on this stuff I'd like to read it. Thanks!

https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

ultrafilter
Aug 23, 2007

It's okay if you have any questions.



This is much easier to read as a PDF.

sarehu
Apr 20, 2007

(call/cc call/cc)

Colonel J posted:

I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth?

Floats or doubles are numbers of the form 1.011010011... * 2^n, with either 23 or 52 bits of precision after the binary point. (The digit before it is always 1 for normalized values.) Basic math operations round the exact result to the nearest representable value.

Colonel J posted:

Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them, or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero?

If that's useful, and it can be, you might as well subtract out something much closer than that, like their mean or one of the values. You'd want to use doubles because single-precision floats aren't precise enough: near 47, adjacent floats are 2^-18 (about 0.0000038) apart, which is coarser than the 0.000003 gap between your two values.
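
That calculation can be checked with a short Python sketch. Python's own floats are doubles, so the `to_float32` helper (a name made up here) round-trips a value through `struct` to emulate single precision. `math.ulp` needs Python 3.9 or later.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a, b = 47.675432, 47.675435

# Spacing between adjacent doubles near 47: 2**-47, about 7.1e-15.
print(math.ulp(a))
# Spacing between adjacent single-precision floats in [32, 64): 2**-18, about 3.8e-06.
print(2.0 ** -18)

# In double precision the difference comes out very close to 3e-06.
print(b - a)
# In single precision it can only be a whole number of 2**-18 steps.
print(to_float32(b) - to_float32(a))
```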

Colonel J posted:

This would be in c++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks!

Same thing. You could imagine starting with doubles and subtracting a local offset so that single-precision floats are good enough so that you can get better GPU performance for whatever it is you're doing, but I'd save that for when you're desperate.
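
A sketch of that local-offset idea, again emulating single precision in Python via `struct`. The `offset` value is an arbitrary nearby origin picked for illustration, and the subtraction is assumed to happen in double precision before anything is handed to the GPU.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


a, b = 47.675432, 47.675435
offset = 47.675  # hypothetical local origin, subtracted while still in double

# Converting first loses the detail: the result is a multiple of 2**-18.
naive = to_float32(b) - to_float32(a)
# Subtracting the offset first leaves small values that floats represent finely.
shifted = to_float32(b - offset) - to_float32(a - offset)

print(naive)
print(shifted)
```

The shifted values are around 0.0004, where adjacent single-precision floats are about 3e-11 apart, so the 0.000003 difference survives the conversion.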

Generally the time you need to worry is when accumulating a whole bunch of math operations or when doing a few risky ones. I think your limiting factor here is GPS accuracy -- let's say you're accurate to a meter. Then if you take the distance between two points, that could be off by up to 2 meters. Double precision is enough to represent lat/long values to 5 nanometers. Your numbers, for example, could represent true locations that differ somewhere between 0.000002 and 0.000004, depending which way they got rounded.

sarehu fucked around with this message at 22:40 on Aug 13, 2016

Eela6
May 25, 2007
Shredded Hen
Numerical analysis is a tricky subject. You're already ahead of the curve by thinking about it.

As mentioned above, generally speaking, numerical error accumulates over a series of steps.

A common source of 'dangerous' error occurs when subtracting two values that are very close to each other.

If you have a ~ b and you run into a calculation like f(a, b, c) = ca - cb, the computed result f*(a, b, c) might be 0 even though the true value isn't. This is bad!
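
A contrived Python illustration of that failure, with values chosen so that a and b differ by less than double precision can resolve. `fractions.Fraction` is used only to get the exact answer for comparison.

```python
from fractions import Fraction


def f(a, b, c):
    return c * a - c * b


# Exact arithmetic: 1e17 * (1 + 1e-17) - 1e17 * 1 is exactly 1.
exact = f(1 + Fraction(1, 10**17), Fraction(1), Fraction(10**17))

# Double precision: 1.0 + 1e-17 rounds back to 1.0, so the difference vanishes.
approx = f(1.0 + 1e-17, 1.0, 1e17)

print(exact)   # 1
print(approx)  # 0.0
```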

My advice is to play around with 'worst case' scenarios in your implementation. Try it with single, double, and extended precision.

If you want to learn more, Numerical Analysis by Burden and Faires is a well respected introductory textbook on the subject.

Edit: just to be clear, I am by no means an expert. Just took a few courses on numerical analysis in undergrad. That PDF linked above seems like a good reference!

Eela6 fucked around with this message at 05:24 on Aug 15, 2016

peramene
Oct 13, 2015

by Fluffdaddy
sorry to interrupt - "Small Excel Questions" has been archived fwiw

csammis
Aug 26, 2003

Mental Institution
OP updated to reflect that. I left the link to the archived thread but if someone wants to start a new one just PM me and I'll add it.

LOOK I AM A TURTLE
May 22, 2003

"I'm actually a tortoise."
Grimey Drawer
There's been a "new" one for more than seven years: http://forums.somethingawful.com/showthread.php?threadid=3132163

csammis
Aug 26, 2003

Mental Institution
Okay, OP updated. If anyone has any more updates they think I should make just let me know. I don't go searching for new megathreads on a regular basis (clearly)

denzelcurrypower
Jan 28, 2011
Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Ornithology posted:

Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

Huge topic so there's a huge amount of material out there.

You have broader topics like reverse engineering software, disassembling programs, using debuggers, packet tracers

Then you have books that detail exploits (that have since been patched) for Linux, Windows, Browsers, Phones...

Then you've got books on things like social engineering, phishing, the 'soft' side of security

Reading those books kind of just gives you a little more info than a script kiddie. To really 'learn' you have to have a deep understanding of your particular tools and platforms, which you'll gain by really digging into things yourself and understanding how all the underlying tech works (encryption, networking, how the OS works with memory and processes...)

Edit: enter 'reverse engineering' into Amazon and look at the results; the top-rated books in those categories are pretty safe bets

Stinky_Pete
Aug 16, 2015

Stinkier than your average bear
Lipstick Apathy

Ornithology posted:

Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.

hackthissite.com offers a good hands-on introduction. You may also want to install Kali Linux, which is a pentesting distro with a bunch of tools to play with and learn about

But yeah security is a tough field and relies on a detailed understanding of memory on the stack and all sorts of soul-crushing (for most) stuff, so pick up a few books on the subject and get ready to look at a lot of assembly, probably.

Here's a sample syllabus I found https://www.cis.upenn.edu/~cse331/

Linear Zoetrope
Nov 28, 2011

A hero must cook
I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

moctopus
Nov 28, 2005

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

Stinky_Pete
Aug 16, 2015

Stinkier than your average bear
Lipstick Apathy

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

What would the API do?

Linear Zoetrope
Nov 28, 2011

A hero must cook

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.

moctopus
Nov 28, 2005

Jsor posted:

Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.

Thanks!

Stinky_Pete posted:

What would the API do?


I know I was being terribly vague, but I was going to start with the data and figure something out around it.

I guess I got to "big heap of data that I can query" and didn't think further.

I'd like it to be not made up data though.

Stinky_Pete
Aug 16, 2015

Stinkier than your average bear
Lipstick Apathy

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

The general answer is they don't, and it's hard and takes many many man-months of work to figure out an exploit for a given application

Here's a paper

https://www.sans.org/reading-room/whitepapers/securecode/buffer-overflow-attack-mechanism-method-prevention-386

Stinky_Pete
Aug 16, 2015

Stinkier than your average bear
Lipstick Apathy

moctopus posted:

Thanks!



I know I was being terribly vague, but I was going to start with the data and figure something out around it.

I guess I got to "big heap of data that I can query" and didn't think further.

I'd like it to be not made up data though.

I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though.

Edit: Actually, check out the BLS. They have tons of data that could be made easier to access in code

Stinky_Pete fucked around with this message at 23:30 on Aug 17, 2016

moctopus
Nov 28, 2005

Stinky_Pete posted:

I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though.

That was what I was planning on doing anyway so this is looking pretty good.

Thanks again guys!

Peristalsis
Apr 5, 2004
Move along.

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

You should work for my new boss.

Boss: "We need a new program!"
Me: "To do what?"
Boss: "It should be written in Python, and use key-value pairs!"
Me: "Is there a spec, or some kind of use case?"
Boss: "This is agile development, the requirements come later!"
Me: "Okay, I'll start writing something. I'll have moctopus bring some data."

moctopus
Nov 28, 2005

Peristalsis posted:

You should work for my new boss.

Boss: "We need a new program!"
Me: "To do what?"
Boss: "It should be written in Python, and use key-value pairs!"
Me: "Is there a spec, or some kind of use case?"
Boss: "This is agile development, the requirements come later!"
Me: "Okay, I'll start writing something. I'll have moctopus bring some data."

I have found locations of toilets in Moscow and pornhub comments.

ExcessBLarg!
Sep 1, 2001

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. ... On a remote program handling multiple connections allocating memory in unpredictable addresses?
In the classical buffer overflow scenario the buffer is located on the call stack and, historically, the call stack address is deterministic. Even with multiple threads, many programs maintain a thread pool of fixed size so the process is still deterministic. This is why address space layout randomization (ASLR) was/is such a big deal, as it made the location of call stacks non-deterministic and much more difficult to exploit.

Jsor posted:

If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards.
By the way, this is called a NOP slide and 1 MB worth is hardly absurd.

JawKnee
Mar 24, 2007





You'll take the ride to leave this town along that yellow line

moctopus posted:

I have found locations of toilets in Moscow and pornhub comments.

how many toilets are there in the comments?

Stinky_Pete
Aug 16, 2015

Stinkier than your average bear
Lipstick Apathy

JawKnee posted:

how many toilets are there in the comments?

there should be an API function for that

JawnV6
Jul 4, 2004

So hot ...

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work.
Some practical exercises can be found at: https://microcorruption.com/

Jsor posted:

programs generally execute pretty consistently
You're wrong about this, but I'm not sure how.

fritz
Jul 26, 2003

moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

How about : http://www.planetary.org/blogs/guest-blogs/2016/hubble-series-1-how-to-find-hubble-data.html

dupersaurus
Aug 1, 2012

Futurism was an art movement where dudes were all 'CARS ARE COOL AND THE PAST IS FOR CHUMPS. LET'S DRAW SOME CARS.'

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

On phone so can't really link but Computerphile on YouTube did a video on it recently that explained it well.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


moctopus posted:

Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.

If you want really large data sets, Amazon's Large Data Sets Repository is right up your alley.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Jsor posted:

I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it.

By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.

Jsor posted:

If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).

E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.

This is known as a heap spray and was what was used to defeat ASLR before W^X came into common use.

W^X stops people from executing memory that is also mapped writable, so you can't just jump to any writable memory.

The latest technique, return-oriented programming (ROP), is to use short "instruction, ret" sequences already present in the original compiled code -- these are called "gadgets", and you chain them together by writing their addresses onto the stack. This allows you to chain together a bootstrap program that writes the data you want, maps a page as executable and not writable, and jumps to it. There are programs to generate ROP chains for you now, even.

You might think this would be unrealistic, but reminder that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block.
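
To illustrate the "even if it's part of a larger data block" point, here is a toy Python sketch that just looks for those four opcode bytes in a byte string. The `find_ret_bytes` name and the sample bytes are made up for illustration. It does no real disassembly, so it only shows where a return could be decoded if execution started at that offset.

```python
# The four x86 return opcodes mentioned above.
RET_OPCODES = {0xC2: "ret imm16", 0xC3: "ret", 0xCA: "retf imm16", 0xCB: "retf"}


def find_ret_bytes(code):
    """Return (offset, mnemonic) for every byte that would decode as a return."""
    return [(i, RET_OPCODES[b]) for i, b in enumerate(code) if b in RET_OPCODES]


# Encodes "mov eax, 0xC3" followed by "nop": no return was intended here,
# but the immediate operand contains the byte C3.
code = bytes([0xB8, 0xC3, 0x00, 0x00, 0x00, 0x90])
print(find_ret_bytes(code))  # [(1, 'ret')]
```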


JawnV6
Jul 4, 2004

So hot ...

Suspicious Dish posted:

You might think this would be unrealistic, but reminder that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block.
https://twitter.com/_drew_davidson/status/743451922657587200

Suspicious Dish posted:

By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.
Fun to think about new features reducing ASLR's efficacy: Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector. Craft a page that looks like a target page, with a guess in place of the randomized portion, then write to it and see if it had been deduped.
