|
nielsm posted:Do those documents really have multiple root elements? If so, that's definitely a problem, and you'd have to wrap it in a new (fake) root element around the entire document minus XML declarations to make that part valid. That case was an example of two different problem data files but now that you mention it, yes! If I were higher on the totem pole I would have pushed back and said "This data freaking sucks and we can't do anything with it without more money and time" but c'est la vie.
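The wrapping nielsm describes can be sketched in a few lines of Python. This is only an illustration, assuming the file holds plain sibling elements plus one or more XML declarations; `parse_multi_root`, the `fake_root` wrapper name, and the sample records are all made up here, and a DOCTYPE or other prolog content would need extra handling.

```python
import re
import xml.etree.ElementTree as ET

# Matches XML declarations such as <?xml version="1.0"?> (assumed to be the
# only prolog content in the problem files).
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")

def parse_multi_root(text: str, wrapper: str = "fake_root") -> ET.Element:
    """Strip XML declarations and wrap everything else in one fake root."""
    body = XML_DECL.sub("", text)
    return ET.fromstring(f"<{wrapper}>{body}</{wrapper}>")

# Hypothetical input with two top-level elements, which a parser would reject.
sample = '<?xml version="1.0"?>\n<record id="1"/>\n<record id="2"/>\n'
root = parse_multi_root(sample)
print(root.tag, [child.get("id") for child in root])
```

The original top-level elements end up as children of the wrapper, so any later queries have to account for the extra level.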
|
# ? Aug 10, 2016 18:35 |
|
|
# ? May 15, 2024 03:11 |
|
New question only this time it's future proofing rather than an imminent problem. One document I got that they don't want me to work through--yet--looks like this (abstracted):XML code:
|
# ? Aug 10, 2016 18:39 |
|
XPath supports string functions like starts-with and contains. You'd want something like //*[starts-with(name(), 'val_')] http://zvon.org/xxl/XPathTutorial/Output/example8.html
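A rough Python sketch of the same idea, in case it helps. The standard library's ElementTree only implements a subset of XPath and does not accept functions like starts-with, so this version filters on the tag name in Python instead; a full XPath 1.0 engine (lxml's `xpath()`, for instance) should take the expression above as written. The sample document and the `elements_with_prefix` helper are invented for illustration, since the original XML wasn't shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the val_ prefix comes from the thread.
doc = ET.fromstring("""
<row>
  <id>7</id>
  <val_a>1</val_a>
  <val_b>2</val_b>
  <note><val_c>3</val_c></note>
</row>
""")

def elements_with_prefix(root, prefix):
    """Return every element (at any depth) whose tag starts with prefix."""
    return [el for el in root.iter() if el.tag.startswith(prefix)]

matches = elements_with_prefix(doc, "val_")
print([(el.tag, el.text) for el in matches])
```

Namespaced documents would need the `{uri}` part of the tag stripped before the prefix check.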
|
# ? Aug 10, 2016 18:48 |
|
Brilliant, thanks.
|
# ? Aug 10, 2016 18:58 |
|
I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save to disk based on the URL route, and host them as a GET command. So someone sends http://server/upload/foo/bar/baz.txt it will save to /<uploaddir>/foo/bar/baz.txt and http://server/files/foo/bar/baz.txt will return the file. Having SSL and HTTP Basic Auth are needed as well. I could code something up, but I have to imagine this is a solved problem, except I can't find anything besides HFS.
|
# ? Aug 11, 2016 18:58 |
|
gariig posted:I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save to disk based on the URL route, and host them as a GET command. So someone sends http://server/upload/foo/bar/baz.txt it will save to
Find a web framework that's easy to set up and use that? Django for example. It's stupid overkill but it gets the job done, plus then you know how to use a web framework if you need to make websites. Or if overkill offends you, learn to write CGI scripts; it doesn't sound like what you want is all that complicated.
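To give a sense of how little routing is involved, here is a minimal sketch using only Python's standard library rather than a framework. It is an assumption-laden illustration, not a finished server: `UPLOAD_DIR`, `resolve`, `FileHandler`, and `make_server` are made-up names, SSL and HTTP Basic Auth (both required in the question) are left out, and query strings and percent-encoded paths are not handled.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Optional

UPLOAD_DIR = Path("uploads")  # hypothetical upload directory

def resolve(url_path: str, prefix: str, base: Path = UPLOAD_DIR) -> Optional[Path]:
    """Map /<prefix>/foo/bar.txt to base/foo/bar.txt, or None if not allowed."""
    head = f"/{prefix}/"
    if not url_path.startswith(head):
        return None
    target = (base / url_path[len(head):]).resolve()
    # Reject anything that escapes the upload directory (e.g. via "..").
    if base.resolve() not in target.parents:
        return None
    return target

class FileHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # POST /upload/<path> stores the request body under UPLOAD_DIR/<path>.
        target = resolve(self.path, "upload")
        if target is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(self.rfile.read(length))
        self.send_response(201)
        self.end_headers()

    def do_GET(self):
        # GET /files/<path> returns the file saved by the matching upload.
        target = resolve(self.path, "files")
        if target is None or not target.is_file():
            self.send_error(404)
            return
        data = target.read_bytes()
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), FileHandler)
```

Calling `make_server().serve_forever()` would start it. Anything exposed beyond localhost would still need the TLS and authentication pieces, which is where an existing framework or a reverse proxy starts to look attractive.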
|
# ? Aug 11, 2016 20:36 |
|
gariig posted:I'm looking for a simple HTTP server that I can run that will accept HTTP POSTs of files, save to disk based on the URL route, and host them as a GET command. So someone sends http://server/upload/foo/bar/baz.txt it will save to
I feel like node.js + express would be able to do this without too much additional work.
|
# ? Aug 11, 2016 20:45 |
|
I can't think of where to put this, but maybe someone here has an idea. I'm looking for software that will take an image, pick out all the QR codes in that image, and scan them, then either save or export the data in a way that they can be "handed off" to another app. The QR codes just contain random letters and numbers, and I'm looking at about 10-50 codes per image. Optimally it could be run automatically, where it reads a directory or whatever for images and scans them as they come in. Commercial software is fine.
|
# ? Aug 12, 2016 15:16 |
|
At my last job our ECM platform did this exact thing. We relied on a few third parties to do various OCR tasks...QR codes went through LEADTOOLS if I recall correctly but there are several heavy-lifting OCR engines that can do it. ABBYY FineReader is another. If you want to look for software that does the job the keywords are "barcode recognition"
|
# ? Aug 12, 2016 15:28 |
|
I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth? Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero? This would be in C++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks! Colonel J fucked around with this message at 20:26 on Aug 13, 2016 |
# ? Aug 13, 2016 20:19 |
|
Colonel J posted:If there's a good guide on this stuff I'd like to read it. Thanks! https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
|
# ? Aug 13, 2016 20:31 |
|
This is much easier to read as a PDF.
|
# ? Aug 13, 2016 20:50 |
|
Colonel J posted:I'm wondering about floating point math. I know that floats can lose precision, but I'm unsure exactly how it happens. Can I do precise math on numbers a thousandth apart? How about a millionth?
Floats or doubles are numbers of the form 1.011010011... * 2^n, with either 23 or 52 bits of precision after the binary point. (The digit before it is always 1.) Basic math operations round the exact result to the nearest representable value.
Colonel J posted:Like let's say I'm working with topological data - GPS points of two things close by can be (e.g.) 47.675432 and 47.675435. Can I reliably work with them or should I transform them somehow, e.g. by subtracting 47 from them first, bringing them closer to zero?
If that's useful, and it can be, you might as well subtract out something much closer than that, like their mean or one of the values. You'd want to use doubles because single-precision floats aren't precise enough here: near 47 they are spaced 2^-18 (about 0.0000038) apart, which is coarser than the 0.000003 difference in your example.
Colonel J posted:This would be in C++, but how about on the GPU in a shader? I tried searching a bit but I'm unsure what to look for. If there's a good guide on this stuff I'd like to read it. Thanks!
Same thing. You could imagine starting with doubles and subtracting a local offset so that single-precision floats are good enough, to get better GPU performance for whatever it is you're doing, but I'd save that for when you're desperate. Generally the time you need to worry is when accumulating a whole bunch of math operations or when doing a few risky ones. I think your limiting factor here is GPS accuracy -- let's say you're accurate to a meter. Then if you take the distance between two points, that could be off by up to 2 meters. Double precision is enough to represent lat/long values to 5 nanometers. Your numbers, for example, could represent true locations that differ by somewhere between 0.000002 and 0.000004, depending on which way they got rounded.
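The spacing argument can be checked numerically. A small sketch, assuming IEEE 754 single and double precision (which is what Python's `struct` "f" format and its own floats use on ordinary platforms); `to_float32` and `spacing` are helper names invented for this example, and the two coordinates are the ones from the question.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def spacing(x: float, fraction_bits: int) -> float:
    """Gap between adjacent representable values in x's binade."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - fraction_bits)

a, b = 47.675432, 47.675435

single_gap = spacing(a, 23)   # floats: 23 bits after the binary point
double_gap = spacing(a, 52)   # doubles: 52 bits after the binary point

diff_double = b - a                          # subtract in double precision
diff_single = to_float32(b) - to_float32(a)  # round to float first, then subtract
diff_offset = to_float32(b - a)              # subtract in double, then store as float

print("float spacing near 47: ", single_gap)
print("double spacing near 47:", double_gap)
print("difference in doubles: ", diff_double)
print("difference in floats:  ", diff_single)
print("offset stored as float:", diff_offset)
```

The float-first difference can only come out as a whole number of float gaps, so it lands on zero or one gap rather than on 0.000003, while the offset computed in doubles survives being stored as a float. That is the "subtract a local offset, then use floats" trick in miniature.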
sarehu fucked around with this message at 22:40 on Aug 13, 2016 |
# ? Aug 13, 2016 22:28 |
Numerical analysis is a tricky subject. You're already ahead of the curve by thinking about it. As mentioned above, generally speaking, numerical error accumulates over a series of steps. A common source of 'dangerous' error is subtracting two things that are very close to each other. If you have a ~ b and you run into a calculation like f(a, b, c) = ca - cb, you might get a computed result f*(a, b, c) = 0 even though the exact answer is nonzero. This is bad! My advice is to play around with 'worst case' scenarios in your implementation. Try it with single, double, and extended precision. If you want to learn more, Numerical Analysis by Burden and Faires is a well respected introductory textbook on the subject. Edit: just to be clear, I am by no means an expert. Just took a few courses on numerical analysis in undergrad. That PDF linked above seems like a good reference! Eela6 fucked around with this message at 05:24 on Aug 15, 2016 |
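Here is one concrete way the f(a, b, c) = ca - cb case can go to zero, as a sketch in Python doubles. The specific values are chosen for illustration only: the damage is already done when a is formed, because 1e-16 is smaller than half the spacing of doubles near 1, so a rounds to exactly b. `Fraction` is used to get the exact answer for the stored inputs.

```python
from fractions import Fraction

def f(a: float, b: float, c: float) -> float:
    """The calculation from the post: c*a - c*b in floating point."""
    return c * a - c * b

tiny = 1e-16
a = 1.0 + tiny   # rounds to exactly 1.0, so the information in tiny is lost
b = 1.0
c = 1e16

computed = f(a, b, c)

# Exact rational arithmetic on the same starting values, for comparison.
exact = Fraction(c) * ((Fraction(1) + Fraction(tiny)) - Fraction(b))

print("a == b:  ", a == b)
print("computed:", computed)
print("exact:   ", float(exact))
```

Rearranging so the small quantity is never added to a large one (here, working with `tiny` directly and computing `c * tiny`) avoids the problem, which is the same idea as subtracting a local offset from the GPS values.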
|
# ? Aug 15, 2016 05:20 |
|
sorry to interrupt - "Small Excel Questions" has been archived fwiw
|
# ? Aug 16, 2016 06:25 |
|
OP updated to reflect that. I left the link to the archived thread but if someone wants to start a new one just PM me and I'll add it.
|
# ? Aug 16, 2016 14:50 |
|
There's been a "new" one for more than seven years: http://forums.somethingawful.com/showthread.php?threadid=3132163
|
# ? Aug 16, 2016 14:55 |
|
Okay, OP updated. If anyone has any more updates they think I should make just let me know. I don't go searching for new megathreads on a regular basis (clearly)
|
# ? Aug 16, 2016 15:13 |
Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features.
|
|
# ? Aug 17, 2016 18:33 |
|
Ornithology posted:Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features. Huge topic, so there's a huge amount of material out there. You have broader topics like reverse engineering software, disassembling programs, using debuggers, and packet tracers. Then you have books that detail exploits (that have since been patched) for Linux, Windows, browsers, phones... Then you've got books on things like social engineering and phishing, the 'soft' side of security. Reading those books kind of just gives you a little more info than a script kiddie. To really 'learn' you have to have a deep understanding of your particular tools and platforms, which you'll gain by really digging into things yourself and understanding how all the underlying tech works (encryption, networking, how the OS works with memory and processes...). Edit: enter 'reverse engineering' into Amazon and look at the results; the top-rated books in those categories are pretty safe bets.
|
# ? Aug 17, 2016 19:06 |
|
Ornithology posted:Anyone know of a good way to get into ethical hacking or learning about security in general? It's hard to determine what resources are good. I've been coding in Java for about a year but haven't gotten into any security features. hackthissite.com offers a good hands-on introduction. You may also want to install Kali Linux, which is a pentesting distro with a bunch of tools to play with and learn about. But yeah, security is a tough field and relies on a detailed understanding of memory on the stack and all sorts of soul-crushing (for most) stuff, so pick up a few books on the subject and get ready to look at a lot of assembly, probably. Here's a sample syllabus I found: https://www.cis.upenn.edu/~cse331/
|
# ? Aug 17, 2016 22:55 |
|
I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to). E: This is specifically on execution of arbitrary code via overruns, getting programs to just print out a dump of the values of data you want is much easier to understand for me.
|
# ? Aug 17, 2016 23:06 |
|
Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data.
|
# ? Aug 17, 2016 23:10 |
|
moctopus posted:Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data. What would the API do?
|
# ? Aug 17, 2016 23:12 |
|
moctopus posted:Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data. Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.
|
# ? Aug 17, 2016 23:13 |
|
Jsor posted:Going to have to be more specific here. Like, you can get reams of data for machine learning in the UCI repository, though a lot of times you need to massage it since they're not always consistently formatted. You can also find a lot of massive image datasets just by searching neural net challenges, even if you don't want to actually do any machine learning/NN stuff those can be useful. Synthetic user data (e.g. fake bank records or UN/Password combos or whatever) I'm sure exist, but I don't know them off the top of my head.
Thanks!
Stinky_Pete posted:What would the API do?
I know I was being terribly vague, but I was going to start with the data and figure something out around it. I guess I got to "big heap of data that I can query" and didn't think further. I'd like it to be not made up data though.
|
# ? Aug 17, 2016 23:18 |
|
Jsor posted:I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to). The general answer is they don't, and it's hard and takes many, many man-months of work to figure out an exploit for a given application. Here's a paper: https://www.sans.org/reading-room/whitepapers/securecode/buffer-overflow-attack-mechanism-method-prevention-386
|
# ? Aug 17, 2016 23:22 |
|
moctopus posted:Thanks! I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though. Edit: Actually, check out the BLS. They have tons of data that could be made easier to access in code Stinky_Pete fucked around with this message at 23:30 on Aug 17, 2016 |
# ? Aug 17, 2016 23:24 |
|
Stinky_Pete posted:I'd recommend something in the UCI repo that jsor posted. You'll have to set up your own database and populate it with the data if you want to query it, though. That was what I was planning on doing anyway so this is looking pretty good. Thanks again guys!
|
# ? Aug 17, 2016 23:26 |
|
moctopus posted:Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data. You should work for my new boss. Boss: "We need a new program!" Me: "To do what?" Boss: "It should be written in Python, and use key-value pairs!" Me: "Is there a spec, or some kind of use case?" Boss: "This is agile development, the requirements come later!" Me: "Okay, I'll start writing something. I'll have moctopus bring some data."
|
# ? Aug 18, 2016 00:31 |
|
Peristalsis posted:You should work for my new boss. I have found locations of toilets in Moscow and pornhub comments.
|
# ? Aug 18, 2016 00:31 |
|
Jsor posted:I still don't understand how buffer overflows that allow the user to execute arbitrary code work. ... On a remote program handling multiple connections allocating memory in unpredictable addresses? Jsor posted:If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards.
|
# ? Aug 18, 2016 00:40 |
|
moctopus posted:I have found locations of toilets in Moscow and pornhub comments. how many toilets are there in the comments?
|
# ? Aug 18, 2016 00:41 |
|
JawKnee posted:how many toilets are there in the comments? there should be an API function for that
|
# ? Aug 18, 2016 02:06 |
|
Jsor posted:I still don't understand how buffer overflows that allow the user to execute arbitrary code work. Jsor posted:programs generally execute pretty consistently
|
# ? Aug 18, 2016 02:29 |
|
moctopus posted:Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data. How about : http://www.planetary.org/blogs/guest-blogs/2016/hubble-series-1-how-to-find-hubble-data.html
|
# ? Aug 18, 2016 03:45 |
|
Jsor posted:I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it. If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to). On phone so can't really link but Computerphile on YouTube did a video on it recently that explained it well.
|
# ? Aug 18, 2016 03:52 |
|
moctopus posted:Does anyone know where I can get large sets of data? I feel like making an API around something, but I don't have any interesting data. If you want really large data sets, Amazon's Large Data Sets Repository is right up your alley.
|
# ? Aug 18, 2016 03:56 |
|
Jsor posted:I still don't understand how buffer overflows that allow the user to execute arbitrary code work. I mean, the concept is straightforward, but I have no idea how I'd be able to consistently target the instruction memory I'd want to overwrite. If you actually have a copy of the program sitting around, sure, you can run a debugger on it, and since programs generally execute pretty consistently you can target the overrun. On a remote program handling multiple connections allocating memory in unpredictable addresses? No clue how people manage it.
By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.
Jsor posted:If I had to guess you just inject the first <N> bytes with your malicious code and then just put in an absurd (i.e. megabyte sized or more) number of repeating "unconditionally jump to <start of malicious block>" statements afterwards. (Though if that's the case I'm not sure how you'd get the address of the start of the block it needs to jump to).
This is known as a heap spray and was what was used to defeat ASLR before W^X came into common use. W^X stops people from executing memory that is also mapped writable, so you can't just jump to any writable memory. The latest technique, return-oriented programming (ROP), is to use every set of "instruction, ret" in the original compiled code -- these are called "gadgets", and you chain them together by pushing their addresses onto the stack. This allows you to chain together a bootstrap program that writes the data you want, maps a page as executable and not writable, and jumps to it. There are programs to generate ROP chains for you now, even. You might think this would be unrealistic, but remember that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block.
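The "even if it's part of a larger data block" point is easy to see by just scanning bytes. A toy sketch in Python: the opcode table covers the four x86 return encodings named above, while `find_ret_bytes` and the seven-byte sample are invented for illustration and come from no real program.

```python
# The four x86 return opcodes mentioned in the post.
RET_OPCODES = {0xC2: "ret imm16", 0xC3: "ret", 0xCA: "retf imm16", 0xCB: "retf"}

def find_ret_bytes(blob: bytes):
    """List (offset, mnemonic) for every byte that would decode as a return."""
    return [(i, RET_OPCODES[b]) for i, b in enumerate(blob) if b in RET_OPCODES]

# Hand-assembled sample:
#   B8 C3 00 00 00   mov eax, 0xC3   (the C3 here is just immediate data)
#   5D               pop ebp
#   C3               ret
blob = bytes.fromhex("b8c3000000" "5d" "c3")

for offset, name in find_ret_bytes(blob):
    print(f"offset {offset}: {name}")
```

Both the real ret at the end and the C3 sitting inside the mov's immediate show up, because the scan has no notion of instruction boundaries. That is why a compiled binary contains far more candidate return bytes than the compiler ever emitted as actual returns.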
|
# ? Aug 18, 2016 04:11 |
|
|
Suspicious Dish posted:You might think this would be unrealistic, but reminder that in x86 you can jump to unaligned instructions, so you basically need to find any instruction followed by "C2", "C3", "CA" or "CB", even if it's part of a larger data block. Suspicious Dish posted:By the time people were exploiting remote programs, ASLR and W^X were in common practice. Before that, you could pretty much guarantee where your data would be loaded in memory statically.
|
# ? Aug 18, 2016 05:06 |