ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

TheEffect posted:

If I change the "from" e-mail address to my boss's e-mail then it sends from his account without me knowing his password or anything like that. This seems to be a major security risk. How does this work?

Yep, this is how email works. Sender addresses are essentially unchecked; anybody can send email appearing to be from anybody else.

There are systems (such as SPF) which can ensure that at least the mail originated from a machine which is "supposed" to be able to send email for a given domain, and you can then have policies in place on all such machines (such as SMTP auth) which ensure that passwords are required to send email or whatever, but that all has to be configured specifically on a per-site basis. You should probably talk to your Exchange admin if you're concerned that Exchange isn't properly authenticating outgoing emails.
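For what it's worth, an SPF policy is just a TXT record published in the sending domain's DNS; receiving servers look it up and decide what to do with mail from unlisted machines. A minimal example (illustrative placeholder values, not a recommendation for any real domain):

```text
example.com.  IN  TXT  "v=spf1 mx ip4:192.0.2.0/24 -all"
```

This says mail claiming to be from example.com should only originate from the domain's MX hosts or from 192.0.2.0/24, and everything else should hard-fail - but it only matters if the receiving server actually checks and enforces it.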


ShoulderDaemon

LeftistMuslimObama posted:

Before I run off in the wrong direction, can someone tell me whether my approach is wrong?
You should probably pursue a depth-first search strategy rather than breadth-first. There's no reason to exhaustively produce a complete game tree; just try to find a single winning path as quickly as possible.

Also,

LeftistMuslimObama posted:

To prove the "no unwinnable deals" thing, I could also program it to play over every permutation of 52 cards dealt on a table and return whether it met an unwinnable game or not.
The number of permutations of 52 cards is 52! = 80658175170943878571660636856403766975289505440883277824000000000000. That's more than 2^225. If you could find solutions to a billion games per second, it would take you more than 2555957927300773166932059125670344664600395660872775 years to make it through all those permutations. I wish you the best of luck.
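If you want to check that arithmetic yourself, Python's arbitrary-precision integers make it a couple of lines:

```python
import math

# Number of orderings of a 52-card deck.
perms = math.factorial(52)
print(perms)           # 80658175170943878571660636856403766975289505440883277824000000000000
print(perms > 2**225)  # True

# Years needed at a billion games per second (Julian year = 31,557,600 s).
print(perms // (10**9 * 31_557_600))
```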

Proving that a nontrivial solitaire game is universally winnable typically depends on analytic methods rather than brute force; for card games, you might begin by removing the symmetry caused by suits or whatever, identifying all the ways that a player can "get stuck", and then trying to prove that there exists a strategy which avoids all possible ways of getting stuck and never loops back to an earlier game state - such a strategy is always "making progress", and if you make progress consistently then you eventually win.

There are unwinnable FreeCell deals, but they are rare. AFAIK, there also does not exist a single strategy which is guaranteed to solve all winnable FreeCell games without backtracking.

ShoulderDaemon

GrumpyDoctor posted:

I really want to know the solution that isn't a thicket of conditional logic.

Volmarias posted:

Also perhaps not limiting it to 6 coins and 3 moves.

In principle, you can always get a solution for N coins in ceil(log2(N)) experiments by recursive division.

code:
divideList :: [a] -> ([a], [a], [a])
divideList = divideList' [] []
  where
    divideList' as bs [] = (as, bs, [])
    divideList' as bs [x] = (as, bs, [x])
    divideList' as bs (a:b:xs) = divideList' (a:as) (b:bs) xs

experiment :: Show a => [a] -> [a] -> IO ()
experiment [] _ = putStrLn "Impossible situation"
experiment [a] _ = putStrLn $ "The oddly-weighted coin is " ++ show a
experiment [a,b] [] = putStrLn "Not enough information"
experiment [a,b] xs@(x:_) = do
  putStr $ "Is " ++ show [a] ++ " the same weight as " ++ show [x] ++ "? (y/n) "
  c <- getLine
  case c of
    "y" -> experiment [b] (a:xs)
    "n" -> experiment [a] (b:xs)
    _   -> experiment [a,b] xs
experiment l [] = do
  let (l', xs, r) = divideList l
  let (as, bs, r') = divideList l'
  putStr $ "Is " ++ show as ++ " the same weight as " ++ show bs ++ "? (y/n) "
  c <- getLine
  case c of
    "y" -> experiment (r++r'++xs) (as++bs)
    "n" -> experiment (as++bs) (r++r'++xs)
    _   -> experiment l []
experiment l xs = do
  let (as, bs, r) = divideList l
  let xs' = take (length as) xs
  putStr $ "Is " ++ show as ++ " the same weight as " ++ show xs' ++ "? (y/n) "
  c <- getLine
  case c of
    "y" -> experiment (bs++r) (as++xs)
    "n" -> experiment as (bs++r++xs)
    _   -> experiment l xs

main :: IO ()
main = experiment [1, 2, 3, 4, 5, 6] []
divideList is a helper function which divides a list into two equal parts, and a remainder which is either the empty list or a single-element list.

experiment is a recursive function which generates experiments. It takes two parameters: a list of coins which includes the oddly-weighted coin, and a list of coins which does not include the oddly-weighted coin.

If the list of possibly-oddly-weighted coins is empty, then we're in an impossible situation. If the list of possibly-oddly-weighted coins is a single-element list, then we know the answer.

If we have two coins of unknown weight, and no coins that we know are the correct weight, then we don't have enough information to solve the problem (all we can do is verify that the two coins are, in fact, different weights).

If we have two coins of unknown weight, and at least one coin that we know is the correct weight, then we pick one of the unknown coins and weigh it against the known one. If they are the same, then the other one of the unknowns is the oddly-weighted one.

If we have more than two coins of unknown weight, and no coins of known weight, then we divide the coins in half, then divide one of the halves again. This gives us three sets: A (1/4 of the original set), B (1/4 of the original set), and R (the remainder). If A has the same weight as B, then the oddly weighted one is in R. If A differs from B, then either A or B has the oddly-weighted one.

If we have more than two coins of unknown weight, and some coins of known weight, then we divide the unknown coins in half, pick one of the halves, and weigh it against the same number of coins from the known-good set. (By construction, if we have a known-good set at all, it holds at least half the total coins, rounded down, which is more than the at most one quarter of the total coins we are weighing against it, so we will always have enough known-good coins to do this.)

This algorithm works for 6 coins in 3 experiments, and it scales well, but it's not perfect; more division rules are needed to get the optimal answer. For example, this algorithm fails on three coins by entering an endless loop trying to weigh the empty list against itself. It also gives a suboptimal path for 7 coins, sometimes needing 4 experiments when only 3 should suffice. Doing it properly gets a little messier, but the general idea - recursive divide-and-conquer, refining the set of possibly-oddly-weighted coins by half each time - is fairly simple.

ShoulderDaemon

JawnV6 posted:

Hate conditional logic? Smuggle the decisionmaking through flow control instead!

Yeah, unless specifically asked for a recursive, general-for-all-N solution, I don't think I would ever produce the recursive formulation of this problem. It's vastly simpler to just write out the conditionals. There are a bunch of nontrivial cases in the small sets, and it's much harder to verify that the recursive version really does work within the constrained number of experiments allowed.

ShoulderDaemon

GrumpyDoctor posted:

But my Haskell isn't good enough - can this determine the direction of the mid-weighted coin in the given number of experiments? That's part of the original problem. (Obviously it's just one more experiment, so the big-O complexity doesn't change.)

My algorithm doesn't attempt to determine that information. It's sort of conflated; on some paths it will have performed the right experiment to learn that information, but in other paths it won't and would need an additional experiment.

Edit: To be more specific, you would probably elect to extract that information after the fact; once you know the identity of the mis-weighted coin, you look through past experiments to find any experiment which involved that coin. That experiment will tell you the direction of the weight difference. It's possible with 6 coins to get an answer without ever actually weighing the mis-weighted coin (e.g. sets [1,2] and [3,4] are equal, coins 1 and 5 are equal, so the mis-weighted coin must be 6, which has never been weighed), but I think, without examining it too closely, that you will always have a "left over" experiment available that you can use. I'm not really interested in rewriting my algorithm to verify that, though.

ShoulderDaemon fucked around with this message at 20:41 on Nov 19, 2015

ShoulderDaemon

LP0 ON FIRE posted:

mysql/php/encryption advice needed!

I'm going to be brutally honest. You're obviously confused, and speaking as a professional who works with crypto, I strongly get the impression that you have no idea what you're doing.

If this is a personal project, that's fine, and I can give you some advice on how to experiment and start to get a handle on this stuff. This advice probably won't involve solving your current problem, because I think you need to experiment with something simpler to begin with, and frankly your problem is crazy and doesn't seem to make sense on a fundamental level.

If this is a professional project, then I suspect you are way beyond your expertise and need to just hire a contractor who knows this stuff. You won't get it right by yourself. I'm not trying to be hurtful here, but there's a lot of fiddly little details involved in this sort of thing that without the right experience you're incredibly likely to get wrong and never notice until someone is suing your company or stealing all your customers' credit cards. If you're in the Pacific Northwest, I can give you some recommendations of contractors who will help you. If you're not, then someone else here can probably point you to a reliable contractor.

ShoulderDaemon

LP0 ON FIRE posted:

I don't want to take "safe risks", but all that information is individually encrypted, and the IVs will be stored on a separate server.

The fact that you think storing IVs separately helps is part of the problem here; storing IVs on a separate server adds approximately zero security, but you don't understand enough about the process to know that. Having the information "individually encrypted", depending on how you did it, might be less secure than encrypting it all in a single batch. There are a lot of little details that you have to pay very close attention to if you want to do this right, and I don't think we can talk you through it over an Internet forum.

I'm all for experimentation in order to learn, but the setup you've described is already bordering on needlessly-complex-and-probably-broken, and if you want to experiment you should begin with something simple, not involving any real data.

TooMuchAbstraction posted:

Hey, we're talking encryption and security! That's awesome. I want to implement a basic client/server API to allow our client programs to send notifications to the server. The only big trick here is that only authorized clients should be able to use this service, and the details of the information they're sending should be kept private (i.e. not sent in the clear).

As nielsm said, use TLS with client certificates.

ShoulderDaemon

LP0 ON FIRE posted:

I'm really interested to know why storing IV's on a separate server, especially if it's a vault adds almost no security.

Here's a message that I'm going to encrypt:
code:
$ echo "This is a test message that I am going to encrypt, and then decrypt with the wrong IV." > message.clear
$ xxd message.clear
00000000: 5468 6973 2069 7320 6120 7465 7374 206d  This is a test m
00000010: 6573 7361 6765 2074 6861 7420 4920 616d  essage that I am
00000020: 2067 6f69 6e67 2074 6f20 656e 6372 7970   going to encryp
00000030: 742c 2061 6e64 2074 6865 6e20 6465 6372  t, and then decr
00000040: 7970 7420 7769 7468 2074 6865 2077 726f  ypt with the wro
00000050: 6e67 2049 562e 0a                        ng IV..
OK, let's encrypt it with some key and IV:
code:
$ openssl enc -e -aes-128-cbc -nosalt -iv 0123456789abcdef -k "some sort of passphrase" -in message.clear -out message.crypt
$ xxd message.crypt
00000000: 36cc 0431 7047 f913 f1ea 8d96 f84d 6eb0  6..1pG.......Mn.
00000010: ce0c 30f6 804e fe80 1129 f881 2c38 32d4  ..0..N...)..,82.
00000020: 2bca a0d2 fe86 3457 3154 ae8b 94e6 c58b  +.....4W1T......
00000030: a730 3b5f b317 3e9d 042e f1c0 63e6 c97c  .0;_..>.....c..|
00000040: 4c82 9199 cbd0 ea54 3085 64bd c5d4 e445  L......T0.d....E
00000050: ef1d 12d0 8efe e71a 7d28 936f 6046 0ac9  ........}(.o`F..
And now let's decrypt it, but whoops, I forgot the IV and had to guess:
code:
$ openssl enc -d -aes-128-cbc -nosalt -iv 0000000000000000 -k "some sort of passphrase" -in message.crypt -out message.decrypt
$ xxd message.decrypt
00000000: 554b 2c14 a9c2 becf 6120 7465 7374 206d  UK,.....a test m
00000010: 6573 7361 6765 2074 6861 7420 4920 616d  essage that I am
00000020: 2067 6f69 6e67 2074 6f20 656e 6372 7970   going to encryp
00000030: 742c 2061 6e64 2074 6865 6e20 6465 6372  t, and then decr
00000040: 7970 7420 7769 7468 2074 6865 2077 726f  ypt with the wro
00000050: 6e67 2049 562e 0a                        ng IV..
Well, at least the first 8 bytes of it were safe. Of course, now that I have the rest of the message, I've got a much better chance of guessing those.

IVs are simply not intended to be secret. They are effectively part of the encrypted message; analysis of crypto algorithms assumes that any attacker with access to a message also has access to the IV. Often, they are not generated in a hard to predict fashion; typically only uniqueness matters, so the software generating them might not care to keep IVs from different messages uncorrelated. Keeping them secret doesn't help, it just makes your life harder, and encourages you to falsely believe that you are more secure than you actually are.
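To see why the IV only ever touches the first block, here's a toy CBC sketch in Python. The "block cipher" is just XOR with the key - completely insecure, and purely illustrative of the CBC chaining structure, not of real AES - but the decryption algebra is the same: each plaintext block depends only on the previous ciphertext block, so a wrong IV garbles block one and nothing else.

```python
BLOCK = 4  # toy block size in bytes

def enc_block(key, block):
    # Toy "block cipher": XOR with the key. NOT secure; stands in for AES here.
    return bytes(b ^ k for b, k in zip(block, key))

dec_block = enc_block  # XOR is its own inverse

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, msg):
    out, prev = [], iv
    for i in range(0, len(msg), BLOCK):
        c = enc_block(key, xor(prev, msg[i:i + BLOCK]))  # C_i = E(P_i xor C_{i-1})
        out.append(c)
        prev = c
    return out

def cbc_decrypt(key, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(prev, dec_block(key, c)))  # P_i = D(C_i) xor C_{i-1}
        prev = c
    return b"".join(out)

key, iv = b"\x01\x02\x03\x04", b"\x09\x09\x09\x09"
msg = b"attack at dawn!!"                  # 16 bytes = 4 blocks
ct = cbc_encrypt(key, iv, msg)

print(cbc_decrypt(key, iv, ct) == msg)     # True: right IV, full recovery
wrong = cbc_decrypt(key, b"\x00" * 4, ct)
print(wrong[BLOCK:] == msg[BLOCK:])        # True: wrong IV only garbles block 1
```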

LP0 ON FIRE posted:

I read from lots of sources and told by others you must do this. If you were to store the IVs on the same database, on the same server, it would seem to me that would serve no purpose as an attacker would have all the IVs and encrypted values available to them right there if they broke into a database, and all they would need to guess right is one key to decrypt everything assuming all the keys are the same and not stored on the database.

As I showed, you can trivially decrypt all but the first block of a message with just the key when you're using CBC and don't have the IV. If your messages are e.g. email addresses, then the first part of a person's email address is likely to correlate with their real name or user name, so if I was only missing the first 8 bytes of each address, then for a large number of messages I would expect to be able to completely recover the address. IVs being secret just doesn't help you here in any particularly meaningful sense.

LP0 ON FIRE posted:

Making your information more secure by assigning every encrypted value the same IV also makes zero sense to me. If someone had to just guess one IV vs one for every entry, it kind of makes the former seem more dangerous. I don't know, maybe I'm wrong, and I'm really interested to hear why! Thanks!

Oh, you absolutely can't ever let yourself re-use the same IV. That would lead to all sorts of other attacks. You should just store the IV as part of the encrypted messages, and use a different IV for every message.
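One of those attacks is easy to show concretely with a stream/counter-style construction. This Python sketch uses a made-up keystream function (again, standing in for a real counter-mode cipher, not an implementation of one): if two messages are encrypted under the same key and the same IV, the keystreams cancel, and an eavesdropper gets the XOR of the two plaintexts for free.

```python
def keystream(key, iv, n):
    # Toy deterministic keystream from (key, iv). NOT a real cipher.
    x = key ^ iv
    out = []
    for _ in range(n):
        x = (x * 5 + 3) % 256  # arbitrary byte-wide mixing
        out.append(x)
    return bytes(out)

def ctr_encrypt(key, iv, msg):
    # Stream encryption: ciphertext = plaintext XOR keystream.
    return bytes(m ^ k for m, k in zip(msg, keystream(key, iv, len(msg))))

key, iv = 42, 7
p1, p2 = b"HELLO", b"WORLD"
c1 = ctr_encrypt(key, iv, p1)
c2 = ctr_encrypt(key, iv, p2)   # same key AND same IV: the mistake

xor_ct = bytes(a ^ b for a, b in zip(c1, c2))
xor_pt = bytes(a ^ b for a, b in zip(p1, p2))
print(xor_ct == xor_pt)  # True: keystream cancels, plaintext structure leaks
```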

Look, this poo poo is hard to do right. If it's interesting to you, you should learn it; the world needs more cryptographers. But you shouldn't make your "I'm going to wade into encryption technologies" project involve real data. Hire a contractor who knows their poo poo, and ask if you can watch over their shoulder and ask dumb questions. Get familiar with the tools and algorithms one at a time, so you know what each step of the process is for and how to do it in isolation. Start developing an intuition for how protocols fit together and where weak spots are likely to appear. Look at some real protocols like the X.509 family of messages, or OpenPGP, or TLS. Note how the same sorts of designs keep reappearing over and over, because they are well-understood and well-studied. Most importantly, build relationships in the community, so that in the rare case where you find yourself actually needing a new protocol, you can get it reviewed by people who aren't you, because nobody ever sees the problems in their own protocols.

Just, please, don't encrypt anything that you or your customers care about with a protocol that you designed by yourself. Especially not if it's your first time playing with cryptography.

ShoulderDaemon fucked around with this message at 21:58 on Nov 24, 2015

ShoulderDaemon

Illusive gently caress Man posted:

While we're on the subject, I was playing with an idea / writing some code a while ago. If I'm encrypting (with say, aes128-gcm) a bunch of immutable butts, and randomly generating the key for each butt / never re-encrypting with that buttkey again, is it fine to use a zero IV? That was my assumption at the start, but then I started thinking about birthday poo poo / the chances of randomly choosing the same key twice with a large number of butts, and the consequences of that. Never got around to running the numbers.

Use a unique IV for every message, even if you aren't reusing keys.

There's a fair amount of "healthy paranoia" in crypto. We don't currently know of any serious issues with AES, but we have good reason to suspect that when some start to appear, they will (at least at first) be fairly narrow attacks; blocks with particular structures will be more vulnerable to cryptanalysis. Using unique, and ideally random, IVs gives us some hope that any such structures will be randomized within our message corpus, which makes them unlikely and hard to find, and thus probably increases our resistance to future attacks long enough for us to get wind of AES being likely-to-be-compromised-soon and allow a migration plan. Otherwise we might get unlucky and discover that all of our messages begin with easy-to-break blocks, which means that all of our keys may become suddenly vulnerable to attack.

GCM and related counter mode ciphers are particularly noteworthy in this regard because the actual encryption is being done on the counter, which is initialized by the IV, and if I had to pick any single block as being "most likely to cause problems" or "most likely to have some well-known acceleration structure for breaking" it'd be the all-zeroes block. Random IVs serve to make attackers' lives harder by giving them as little room for acceleration structures as possible.

ShoulderDaemon

Suspicious Dish posted:

I have a dumb question about compilers. Are register files any faster / slower than L1? Wikipedia is saying that a Link Register (like in ARM / PPC) is faster than having it in main memory, but I imagine that Intel would be smart enough to store the return address or various areas around ESP in L1, because it makes all the sense in the world to.

Which makes me wonder if there's a different between L1 / a register in practice. #whoa

Registers are, to a first approximation, about 10 times faster than an L1 cache hit. That said, modern OOO processors are very good at hiding the latency of cache hits, so as long as the load can be scheduled reasonably far ahead of any data dependency, it's probably going to be indistinguishable from having everything in registers. Especially on the instruction side; essentially every processor is going to have a call/return predictor at the front end that keeps 8 or so levels of return addresses in a local register file and predicts those about as well as if they were direct jumps.

ShoulderDaemon

rjmccall posted:

Do you have details for this?

I know intimate details about one implementation wherein the decoder treats memory addresses of the form esp+k for small k as a register name, which is renamed as you'd expect. There's a spare store-combining buffer kept around that is used to commit writes to those registers to the L1 at retirement, and some logic to detect unexpected clobbers to that range (which typically happen either by writing to one of those addresses via another form of addressing, or because a different thread stole the cache line). It's... messy, and I can only share details because in practice it turned out to be a bad idea.

Suspicious Dish posted:

Ah, yeah, that makes more sense to me. Would it be valid to not actually ever write the address to physical memory? Because I doubt anything on the rest of your bus is actually going to look at the return address or CPU stack.

This depends on what you mean by "physical memory". You must have started a commit to the L1 by retirement, because otherwise you don't have any hope of maintaining coherence in the unlikely case that another thread does access those addresses. That said, highly-volatile thread-local stuff like stacks will tend to stay resident on a L1 or L2, and is reasonably likely to avoid being propagated out to DRAM.

Suspicious Dish posted:

Also, a lot of this stuff isn't very well documented (it's the part that makes the computer go fast, not the part that's behavior you should write to). Do you know of any documentation that talks about what modern predictors would do? Or does it mostly tend to be through-the-grape-vine, told-at-holiday-parties kind of material?

You may enjoy reading Agner Fog's microarchitecture manuals, which he reverse-engineered using direct timing measurements and his own understanding of processor design. Obviously, I couldn't possibly comment on how closely he managed to infer the design of any particular microarchitecture.

ShoulderDaemon
For what it's worth, while there are a lot of ways that different processor models get customized, nobody is ever going to remove the old-school 80-bit FPU compatibility from modern large x86 processors. There are a few reasons.

First, it's really small. You just don't save much space by removing it from the layout. Second, and relatedly, it's not typically adjacent to any interesting scalable structures on the layout like caches, so even if you take it out it's really hard to use that space for something useful. You'd have to redesign the whole layout to make use of that space, and that would probably compromise performance of something else and require a whole new set of masks, so it'd be monumentally expensive. Third, it doesn't take any power unless you're using it. When you aren't generating FP operations, it's gated completely off. Fourth, you can't "just remove it" - you'd either have to add in microcode to emulate it, or you'd have to add a whole new set of fault flows to handle no-longer-supported instructions. Either way, that's a lot of validation effort that would be very easy to have problems in, all for a really marginal theoretical benefit for a very small number of customers.

The only place you might see this kind of decision is in the extreme-space-constrained embedded market, where you're contemplating things like "building an in-order core" and "not having any hardware floating point whatsoever". And while that market does exist, it's incredibly special-purposed and not really comparable to the kinds of model customization that the likes of Amazon and Google will demand on the modern large cores.

ShoulderDaemon

Jabor posted:

Is this actually a ton of effort? My (perhaps rather naive) understanding was that the processor already has some logic that decodes illegal instructions as "break into supervisor mode and jump to the Invalid Opcode trap handler". So your FPU-less model would simply follow the same "I have no idea what this opcode is" process when it sees x87 instructions.

It's not hard, per se, but it is real effort and cost. You can't just casually remove microop types on a whim.

First of all, the decoder is a highly-optimized non-trivial state machine. You can change it - new instructions are added every generation - but when you build a new decoder you do need to actually test that it still decodes everything correctly, which means you're going to be spending some time walking through a validation suite with your design.

Secondly, the machine is microcoded, and what you are removing is not just the x87 instructions, but the x87 microops. This means that you need to audit all of your instruction flows and find all the places where ostensibly-non-x87 instructions are using the same hardware to handle edge cases or just as a support mechanism. Then you need to decide if you're going to remove those instructions too, or if you're going to rewrite those flows. If you go the rewriting route, that's a lot more testing, and probably some significant performance and power drifts.

And if you present this to any validation team, they're going to say the following to you: "Why do you want us to test a whole new decoder variant, just to remove a piece of hardware that's less than a third the size of the decoder?"

I am not kidding about that. The decoder is like 3% of the core. The 80-bit scalar FPU hardware is less than 1%. It's so small that it's just going to get squeezed in between gaps left by routing the superscalar execution hardware.

Now, all this starts to look different when you start looking at really small cores. For example, Intel Quark (aka Lakemont) is targeting a super-small embedded market, like functional devices the size of an SD card. Lakemont can be configured without x87, because in that case it might actually be a reasonable tradeoff - x87 is comparatively larger on an extremely small core, and the decoder and ISA are much simpler, so there's less validation effort associated with removing it. But that's really far away from what modern big cores look like - Lakemont is in-order, single-core, single-threaded, non-superscalar, 32-bit, and doesn't usually include things like "a level 2 cache". As soon as you start adding any of that stuff back in, the x87 compatibility starts to look like it might as well be free, so everybody includes it.

ShoulderDaemon

Argyle Gargoyle posted:

I really don't know. I guess you're hinting that <iostream> should come from hangman.h but this must be flying over my head.
Doesn't that still give the main access to <iostream> twice when test.cpp and hangman.cpp are compiled together (as per 'g++ test.cpp hangman.cpp -o hangman')?
#include "hangman.h" at the top of hangman.cpp will just copy-paste the contents of hangman.h there.

hangman.h was provided to us for this assignment and we were only told to write the functions playHangman and guessLetter in the file hangman.cpp, so I believe that adding to the header file might not be the intended way to solve this, if the problem I stated above were otherwise resolved.

OK, so there's a few things I think will help you here:

First, test.cpp and hangman.cpp are not compiled together. They are unrelated, and do not affect each other. They are separate things. It just so happens that after being compiled, the results are getting put together to result in a complete program. But nothing you change in one will affect how the other is compiled.

Second, you can include <iostream> as many times as you want and it will act exactly the same as only including it once. System headers have guards that prevent them from misbehaving if included multiple times.

Third, there is no difference between including something in a .cpp file, and including something in a .h file which is included by that .cpp file. You should include headers in whatever files use things from those headers. You should put includes in header files if any use of that header would require also using the included header, because it makes it a little easier to avoid stupid mistakes.

Finally, this is something that's a lot easier to understand if you see what the preprocessor is actually doing. I'd strongly recommend making an appointment with your TA or professor and asking them to show you some examples of what the preprocessor does with a few simple files, independently of any of the C++ aspects of your assignment. You can play with the preprocessor on your own with "g++ -E", but I suspect you'll get a lot more out of a short guided example.

ShoulderDaemon

Gravity Pike posted:

A self-signed cert will add encryption, but not authentication. It'll stop people on the LAN from sniffing passwords, which isn't nothing.

SSL/TLS actually supports a mode without any certificate at all specifically for this reason - encryption without authentication is still better than nothing at all. Sadly, literally every web browser explicitly disables this mode. I often wish that they allowed it and just didn't show the lock icon at all or something.

ShoulderDaemon

Nippashish posted:

What bizzaro rationale could they possibly have for disabling this?

In short, there's a downgrade attack possible on human operators.

If you support unauthenticated traffic on the same kind of connection as authenticated traffic, then when you try to connect to a server that advertises authentication, an adversary could MITM the connection, remove the authentication by reencryption, and present you with modified data. The browser would, presumably, correctly mark the data it was seeing as "insecure", but the (somewhat legitimate) argument is that browsers aren't really the things making policy decisions, humans are, and human operators seem to be really bad on the whole at making good decisions involving whether or not to trust websites.

Fundamentally, this is also why modern browsers have been getting more and more annoying about viewing websites with self-signed certificates - it's the same problem. If you make the warning about a self-signed certificate not very annoying, then a large population of users will just ignore the warning entirely, allowing MITM attacks which strip all meaningful security by simply replacing a legitimate certificate with a self-signed certificate owned by the adversary.

Note that this problem doesn't occur in other protocols which use SSL/TLS such as SMTP. In SMTP you can upgrade connections to encrypted/authenticated after the connection has been established by sending a "STARTTLS" command - an adversarial MITM could trivially intercept this command and simply respond "sorry not supported" to whichever peer sent it. But policy decisions involving SMTP such as "don't send email to a connection I haven't authenticated" are performed by software which is much more difficult to trick than a human, so it doesn't matter. If your SMTP connection has been MITMed such that it can no longer be authenticated, then your software will know that and will make the correct decision according to your email policy. And sure enough, lots of SMTP daemons and clients support SSL/TLS encryption without authentication, and treat it exactly the correct way (as if it were not meaningfully protected).

"Fixing" this for the web is probably impossible at this point; general consensus seems to be that you just can't trust random human operators to look at anything and make a meaningful security decision. The closest thing to a "right way" to fix this that I can think of would be incredibly draconian policies whose ship has long since sailed like "web browsers shouldn't ever submit forms to non-authenticated servers" which would obviously break a very large portion of the existing web. The web is based on a very generic protocol which has no way to identify "sensitive data" and can only make the most trivial kinds of policy decisions, and the interface that websites use to show data is extremely freeform and variable. When you're confronted with typical users being extremely aggressive about bypassing warnings without reading them as a matter of course, this makes a fairly untenable situation, and the browser makers have wound up in this losing war where they are trying to remove enough dangerous features to prevent the most obvious traps from routinely working.

I think if I were designing a web browser now I'd try something like accepting encryption without authentication and self-signed certificates without blinking, but treating them as if there were no security layer. I'd entirely forbid rendering pages with mixed authenticated/non-authenticated content. And as soon as a user started to type into a form on a page in non-authenticated mode, I'd beep and freeze input for 3 seconds and replace the entire page during that time with a giant emoji of a burglar or something. If the user keeps typing after that, then whatever, they've been warned that someone is gonna steal whatever they're trying to type.

And I can already think of at least one way around that. If an adversary MITMed a bank's website, and provided a login page to the user with an "onscreen keyboard" made using JavaScript and some text that said "Please use the provided keyboard for security" I bet 90+% of users would not only go right ahead and give that page their PIN, they'd feel good about doing so. Hell, one of the banks I used in the past had exactly that setup for some godforsaken reason. The only even marginally feasible mechanic I can think of to bypass that would be disabling JavaScript for non-authenticated content, and that ship won't sail.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Suspicious Dish posted:

It's not. It's actually worse than plaintext, in that it might give users some form of false security.

Eh, I think this is just the downgrade issue I discussed above. I agree that you shouldn't advertise encryption without authentication to the user - it should appear exactly the same as a completely unsecured connection. Given that, I would always prefer to use encryption when possible; it prevents a large number of attacks which are not MITMs, which is to say, it helps. As attacks go, MITMs are a minority. It's problematic to say we should increase our vulnerability surface in order to make it more obvious to human operators, most of whom will not understand either way.

In the case of HTTP that's not good enough because of the concerns you'd get from introducing downgrades into HTTPS, but that's because the web ecosystem is sorta crazy, as I wrote above. It's the same reason HTTPS is a separate port and protocol from HTTP instead of using a STARTTLS-like construct like every other modern protocol does; what little security model we have for web traffic now irrevocably depends on being able to draw a hard line between the secure and insecure web, and anything that breaches that is going to cause users to do the wrong thing. We've engineered ourselves into a situation where untrained human operators are required to make continuous security judgements, and allowing an https:// link to point to an unauthenticated resource would severely hamper what little ability they have to do so. I don't think there's an easy way out.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Suspicious Dish posted:

What attacks are you potentially talking about? Passive snooping by network operators that can't afford trivial MITMs?

For large networks, the idea of running MITM on every SSL connection becomes prohibitively expensive very quickly. MITM is cheap for directed attacks - those targeting either a small number of users or a small number of sites. Encryption without authentication does a really good job of protecting against, for example, wide-scale keyword searches on all network traffic by an ISP, or long-term collection of all user behavior, or capturing of sensitive data which is accidentally transmitted down the wrong channel (e.g. "whoops I had the wrong window focused when I pasted my account info, better delete that facebook post") or other more opportunistic attacks. Many of these attacks are really inexpensive compared to wide-scale MITM, and can be much harder to detect.

Suspicious Dish posted:

I would say that unauthenticated HTTPS is also dangerous because it allows people like fankey be able to check off the "uses HTTPS" checkbox and think that they're secure.

This is absolutely a legitimate concern, especially in the context of the web. Web security is garbage because it requires everyone involved to understand what's going on, and encryption without authentication would be an easy trap unless it came with bigger warning bells than we can easily deploy. FWIW, I think this is equally true of self-signed certificates, and I think the two should be treated identically - not advertised as secure in any way, not allowed to be mixed with other content, some way of forcing users to acknowledge that anything they transmit is likely to be compromised.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Rocko Bonaparte posted:

Are there still any major websites using CGI for the pages? I am trying to contrast Django with the entirety of CGI, and it looks partisan to have ZERO examples to counterbalance even just Django--let alone MVC frameworks.

Edit: OMG I think Paypal still uses CGI. :stare:

CGI is just a protocol for a webserver to communicate with a webapp. You can't tell from looking at URLs or headers if CGI is being used. You can't compare it to something like Django, except insofar as Django might be talking to the webserver using CGI under the hood. Nowadays CGI is fairly rare because it's kind of slow, but it's not unheard of.

Back in the day, a lot of webservers used the convention that URLs ending in .cgi corresponded to a single-file webapp of the same name that used the CGI protocol. This convention is completely arbitrary and you should neither assume that URLs looking like that actually use CGI, nor that all CGI webapps must look like that.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Rocko Bonaparte posted:

I know comparing MVC and CGI is apples-to-oranges, so I will further clarify: as opposed to using a framework with the MVC pattern, how many big places are still vomiting all their code between piles of HTML tags?

I would be surprised if websites which violated the MVC pattern at least a little weren't an outright majority, especially among large businesses. Honestly, even "uses a well-defined framework" feels like a pretty high bar.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

TooMuchAbstraction posted:

Yeah, Fergus Mac Roich said what I would've said -- compare the memory used by the variables. The code you described has the variables containing copies of the same data, but those copies will be in different locations in memory. If they were in the same location in memory, then the two variables would be aliases of each other.

I...can't think of any languages that let you make a variable that's a straight alias of another variable. There has to be some kind of explicit memory shenanigans going on. That is, you can say "I have variable X stored at location Y, and I can write to Y to change the value of X", but you can't say "I have variable X stored at Y, and I have variable Z also stored at Y". You can easily have X and Z whose values are both Y, but those values aren't both stored at Y.

I think I'm doing a bad job of explaining this. :negative:

C++ references?

code:

int x = 3;
int &y = x;
++y; // Changes x
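For contrast, here's a small Python illustration of the distinction being drawn above: Python won't let you alias a variable either, but two names can be bound to the same mutable object, which is the closest it gets (this is my own example, not from the thread):

```python
x = [3]       # one list object
y = x         # y is a second name for the *same* object, not a copy
y[0] += 1     # mutating through y is visible through x
assert x == [4]

y = [99]      # rebinding y does NOT touch x -- names aren't aliases
assert x == [4]
```

Unlike the C++ reference, rebinding `y` breaks the association, because the names were never stored at the same location; they just pointed at the same value.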

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

peepsalot posted:

When using a purely functional language where "variables" are not re-assignable, is there any sort of approach to creating a "running sum" type of function, that takes a list as input and returns another list, where each element's value depends on the previous one, and does this in a non-retarded way (that means it should take no more than O(n) time)?

so for example: running_sum([1,2,3,4,5,6,7,8,9,10])
would return: [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]

The language in question is OpenSCAD, but I don't expect a large number of people to be familiar with it, so I'm wondering about a more general solution.

The normal thing to do is recursion and pass your state as an argument. Haskell:

code:
running_sum :: [Int] -> [Int]
running_sum xs = running_sum_helper 0 xs

running_sum_helper :: Int -> [Int] -> [Int]
running_sum_helper sum (x:xs) = (sum + x) : running_sum_helper (sum + x) xs
running_sum_helper _ [] = []
code:
*Main> running_sum [1,2,3,4,5,6,7,8,9,10]
[1,3,6,10,15,21,28,36,45,55]
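For anyone following along in another language: the same one-pass scan is a library one-liner in Python via itertools.accumulate (and the Haskell helper above could equally be written as scanl1 (+)):

```python
from itertools import accumulate

def running_sum(xs):
    # accumulate yields the prefix sums in a single O(n) pass,
    # without mutating or rebinding anything.
    return list(accumulate(xs))

print(running_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
# [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```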

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

baka kaba posted:

People are saying it's normal to do it this way, but I'm not sure if that's basically going "eh it's a network thing it shouldn't trigger a result for a while, this is an easy race", or if there's some formal universal guarantee in the language that it won't be started until the function ends, so it's really an idiom that keeps configuring things simple and safely takes care of initiating the thing afterwards. It feels like it's the former, but maybe there's some clever stuff happening instead?

JavaScript is single threaded and has no preemption whatsoever. Your thread will never be interrupted by an event. The event queue will not under any circumstances start a handler while some other code is still running.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Thermopyle posted:

Why was CGI set up to pass a bunch of stuff via environment variables but take POST/PUT via stdin? Seems like it wouldve been easier to just send it all one way or the other.

No particular reason I need to know this, just popped into my head this morning...

Environment variables need to be defined before the CGI program starts, but stdin can be connected directly to the socket before the client has finished sending potentially large file-upload data. It also avoids needing to buffer that large data in RAM, if the CGI program only needs to stream it.
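As a sketch of what that split looks like from the CGI program's side (the function and names here are mine for illustration, not part of any framework): metadata arrives up front in the environment, while the body can be streamed off stdin incrementally.

```python
import os
import sys

def handle_cgi_request(environ=os.environ, stdin=sys.stdin.buffer):
    """Minimal sketch of the CGI calling convention: request metadata
    (method, content length, etc.) arrives in environment variables,
    while the request body -- e.g. a large POST upload -- is read
    from stdin and never has to be held in memory all at once."""
    method = environ.get("REQUEST_METHOD", "GET")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else b""
    return method, body
```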

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

peepsalot posted:

Not exactly, because those have an unnecessary constraint about ordering (being for binary search trees), which I don't care about. And they don't result in a perfect balance.

I think I figured out a way to do it though, just keeping count of children for each node, and traversing to branches with the lesser value when inserting a single node. To merge two trees, I'll just deconstruct the smaller of the two and insert its nodes individually.
It might not be optimally efficient in memory or time complexity, but should result in perfect balance at least.

Do you really not care about ordering at all? Then just use a vector. The root node is at index zero, the subtree on its left is the first half of the remaining entries, and the subtree on its right is the second half of the remaining entries. It is, by definition, as close to perfectly balanced as it is possible to be. To combine two trees, just concatenate them -- note that this will dramatically change the logical positions of nodes within the trees being combined.

That said, you have no good way to do an actual lookup operation on this thing, because you aren't maintaining any sort of ordering on the structure. But you get insertion in constant time, and concatenation in linear time.
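A quick Python sketch of the index arithmetic (names are made up for illustration): for the subtree occupying slice [lo, hi) of the vector, the root sits at lo and the remainder splits in half between the two children, which keeps the depth at O(log n) by construction.

```python
def children(lo, hi):
    """For the subtree stored in slice [lo, hi) of the vector, the
    root is at index lo; the remaining hi-lo-1 entries split in half.
    Returns the (lo, hi) slice bounds of the left and right subtrees."""
    rest = hi - lo - 1
    mid = lo + 1 + rest // 2
    return (lo + 1, mid), (mid, hi)

def depth(lo, hi):
    """Depth of the implicit tree over slice [lo, hi)."""
    if lo >= hi:
        return 0
    left, right = children(lo, hi)
    return 1 + max(depth(*left), depth(*right))
```

Insertion is an append, and "merging" two trees really is just list concatenation, since the tree shape is recomputed from the indices.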

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Hughmoris posted:

Regex wizards, I call upon thee.

I have a text file full of questions/choices. My ultimate goal is to split it up and spit it in to a CSV using Python 3.8. I'm having trouble creating a capture group that will capture a multiline chunk.

https://regex101.com/r/zoprFw/1

In that example, I'd ideally have two capture groups. One for each question with accompanying choices. Any guidance is appreciated!

code:
"(^  \d.*\n((?!  \d).*\n?)+)+"gm

Assuming the whitespace before the questions is consistent. Otherwise it's harder.
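A Python version of the same idea, using made-up sample text in the shape described (two-space-indented numbered questions, choices indented deeper), and dropping the outer `+` so that each question becomes its own match:

```python
import re

# Hypothetical sample text, not the actual file from the post.
text = (
    "  1. What color is the sky?\n"
    "     a) blue\n"
    "     b) green\n"
    "  2. What is 2+2?\n"
    "     a) 4\n"
    "     b) 5\n"
)

# A question line starts with two spaces and a digit; everything up to
# the next such line belongs to that question.
pattern = re.compile(r"^  \d.*\n(?:(?!  \d).*\n?)+", re.MULTILINE)
chunks = [m.group(0) for m in pattern.finditer(text)]
```

Each element of `chunks` is one question plus its choices, ready to be split into CSV fields.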


ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

VagueRant posted:

Trying to write a Cron schedule expression that triggers on the last Sunday of March (at 00:50).

If I do:
50 00 25-31 3 SUN
it apparently treats the date range as an OR and therefore covers EVERY Sunday in March.

The pre-existing system this plugs into will only take a raw cron expression, I can't do anything fancy and dynamic that you could do in a terminal. Am I right in saying this is impossible?

This is the stupidest aspect of cron.

Per the manpage:
code:
Note: The day of a command's execution can be specified by two fields
— day of month, and day of week. If both fields are restricted (i.e.,
aren't *), the command will be run when either field matches the current
time. For example, “30 4 1,15 * 5” would cause a command to be run at
4:30 am on the 1st and 15th of each month, plus every Friday. One can,
however, achieve the desired result by adding a test to the command
(see the last example in EXAMPLE CRON FILE below).

EXAMPLE CRON FILE

#Execute early the next morning following the first
#Thursday of each month
57 2 * * 5 case $(date +\%d) in 0[2-8]) echo "After 1st Thursday"; esac
I have no idea why it works that way instead of being a logical AND like every other field in a crontab. Everyone gets caught by this; it's never what people expect or want.
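Applying the manpage's workaround to the last-Sunday-of-March question: an untested sketch (the command path is a placeholder; remember that % must be escaped as \% inside a crontab, and date +%u prints 7 for Sunday):

```
# Runs at 00:50 on every day 25-31 of March; the date test then
# filters to Sunday only, which on 25-31 is the last Sunday.
50 0 25-31 3 * [ "$(date +\%u)" -eq 7 ] && /path/to/command
```

The day-of-week field stays `*` so the OR behaviour never kicks in, and the weekday restriction moves into the command itself.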
