darkforce898
Sep 11, 2007

more falafel please posted:

"Embedded ARM device" could mean a lot of things, is there an OS with file permissions? Is the program running at elevated privilege with respect to the user? Because my first thought is to use chmod to make the file readable only by the user that the program runs as.

Unfortunately I will not have control over the OS and the root user is accessible.

I know that security through obscurity is a losing battle against someone with enough time and knowledge, but I cannot think of another way to do this. How do licensing systems know when a license is valid?

Private Speech posted:

Pretty sure he's asking about a DRM implementation, so simply blocking other processes from accessing a plaintext file may not work terribly well for a number of reasons. Doesn't seem anywhere as secure as what he's proposing in the first place, might be a reasonable extra to include though.

Anyway the described system sounds more than secure enough by embedded standards. No one has so far designed an unbreakable DRM scheme where the hardware is in customer hands, not even the likes of Sony, Nintendo and Microsoft throwing hundreds of millions and custom ASICs at it, so you probably won't design one either.

e: Or if it's an especially valuable high-volume item do what everyone else does and make a security dongle, or hire a team and spend a lot of money on it. But that's very much a cost-benefit problem at that point.

Yeah, I know it won't be unbreakable, just secure enough to stop people from changing values and from finding the secret keys with entry level tools.

Is anyone very familiar with the C API of OpenSSL? Is it possible to feed a password into the key/IV generation not as a plain char *? Can I stream it in from multiple sources? At some point the password will be entirely in memory, and I would like to avoid that. I generate it from multiple sources (secret key, runtime-generated hash, user API key), but if someone just attaches gdb they will be able to see it right at the moment it gets sent into OpenSSL.


darkforce898 fucked around with this message at 01:17 on May 17, 2020


Private Speech
Mar 30, 2011



I wrote a longish answer before changing my mind, but the gist of it was that a) I haven't used the OpenSSL API myself, sorry, and b) how do you prevent someone from just no-op cracking your thing if they're savvy enough to use gdb to find the license key?

Either way, having played around with reverse-engineering in IDA myself, the answer mostly seems to be obfuscation and "executable packing", i.e. encrypting the executable itself - that might be what you want to look up. I haven't implemented it either though, sorry.

Hopefully someone else can help better.

Private Speech fucked around with this message at 02:21 on May 17, 2020

nielsm
Jun 1, 2009



Make a license key file that's an extract of 4 KB of the compiled code (in memory) at some known offset, scrambled with some kind of fixed key. Descramble the license file and compare it to your code in memory at several locations. If it matches a specific location, then you have some particular feature set enabled.
I'm not a cryptographer nor a cracker, so it would probably still be trivial to break for someone intent on it.

For some actual high-value product, make a hardware dongle that implements some necessary algorithms for the program and acts as a coprocessor; the program can't function without the dongle because parts of the code are simply missing.

Xarn
Jun 26, 2015
Okay, I promised an effort post on C++ random number generation.

Summary: It is shite, using the standard library will bite you in the rear end at some point, and using external libraries is annoying because lol C++ tooling. Use PCG if you can.

:siren: Warning: massive wall of text incoming. :siren:

------------------------------------------------------------------------------------

Well then, let's take a closer look. First, there is the old-school C way:
C++ code:
// Seeding:
std::srand(std::time(nullptr));

// Generate 1'000 random numbers in range 0-100'000
for (size_t _ = 0; _ < 1'000; ++_) {
    std::cout << rand() % 100'000 << '\n';
}
Seems easy enough, but runs into some problems.

1. The resulting numbers will be biased towards low numbers because of the modulus.
2. When I run this on my machine, I get 0 generated numbers larger than 50k. In fact, no matter how many times I generate a number, it never gets above 40k. Or 35k...

The reason for 2. is that rand on my machine only outputs numbers up to 32767. It is possible to check this with the RAND_MAX macro, and work around it by concatenating the results of multiple calls to rand, but that is getting pretty complex.


You can also fix 1. by using a smarter way to go from rand's output to your desired range (see rejection sampling), but this is definitely getting into non-trivial code territory. I just want to generate random numbers, dammit!

Oh and rand does not have to be thread-safe, it is impl-defined which other functions call it, and the outputs differ between platforms...


All in all, rand is terrible and so C++11 standardized <random>. At first glance, it seems awesome. There is a range of different random number engines you can use for generating randomness, a range of different distributions you can slap on top to get the distribution you want (you want Student's t-distribution? Sure, it is here), and there is even a utility class that generates actually random numbers for you - std::random_device - which you can use to seed the random number engines.

Sounds great right? It would be, if every single part of it was not fundamentally broken. Oh and the usability is not great, but not that terrible either :shrug::

C++ code:
// Seeding. This is wrong, but simple.
std::mt19937 rng(std::random_device{}());

// Avoid constructing distribution all the time
std::uniform_int_distribution<> dist(0, 100'000);

// Generate 1'000 random numbers in range 0-100'000
for (size_t _ = 0; _ < 1'000; ++_) {
    std::cout << dist(rng) << '\n';
}
(I will get to why this example is wrong later.)

------------------------------------------------------------------------------------

The first problem with the snippet above is that the behaviour of std::uniform_int_distribution is left to the implementation. This means that it is not only not portable across platforms, it is not even guaranteed to be portable across different versions of the same stdlib. In other words, I hope you weren't planning to use the output for something like procedural generation, because it doesn't work for that...

But wait, the design of <random> is decomposed and pluggable, so you can just ("just") write your own distribution and keep using the rest, right? Well, no.

------------------------------------------------------------------------------------

The second problem with the snippet above is that std::random_device is badly specified and inscrutable. In theory, it should serve as an abstraction over some external entropy source. In practice, an implementation is allowed to use any deterministic random number engine, e.g. a Mersenne Twister, and you can absolutely run into that. Also in theory, the std::random_device::entropy member function exists to let you detect this case.

Given the thesis of this post, are you surprised if I tell you that it is also broken? This time, the blame is shared between the standard and the implementations. The full signature is double entropy() const noexcept. The problem is in the return type. The standard provides a mathematical definition of entropy (thanks guys, I am sure the stdlib maintainers couldn't look that up), but no guidance on how to count the entropy of external randomness sources, nor on expected return values.

This in turn means that each implementation does its own thing. The only sane one is MSVC. Their random_device is an extremely thin wrapper over a kernel function that returns cryptographically secure bytes, so their random_device::entropy always returns 32 (max entropy for unsigned int), and is inlined in the header for constant propagation*.

In order of sanity, the next implementation is libc++, which just returns 0. This is, of course, completely useless and invalid given that it has 4 different strongly-random backends and picks one of them based on compile time configuration, but at least it is obviously useless. There is some value in that.

The least sane implementation is libstdc++. The reason it is least sane is that it can either return 0 (even for configurations where the backing external source of randomness is cryptographically secure), or, if configured to use /dev/urandom, attempt to query the kernel for an estimate of how much entropy there is. This is not only useless, but dangerous, because it is not obviously useless. The underlying problem is TOCTOU - if you first check whether there is enough randomness**, and only then ask for the randomness, then by the time you ask, the randomness could've been depleted.


But wait, maybe we can hardcode which stdlib versions + configurations have a sane random_device, reimplement our own distribution and still use the random number engines?



* But you still cannot use it for compile-time checking.
** I do not subscribe to depleteable randomness after initialization.

------------------------------------------------------------------------------------

The random number engines almost work. But something almost working means it is broken :v:.

Let's go back to this line of the original C++ example:

C++ code:
std::mt19937 rng(std::random_device{}());
It seeds a specific version of the Mersenne Twister with an unsigned int's worth of random data. Let's assume 4 bytes of random data, to make this easier. The internal state of mt19937 is 2496 (624 * 4) bytes. What this means is that for every state we can seed the rng into, there are roughly 2^19905 states we cannot (the engine holds 19937 bits of state, and we supplied only 32).

This has some fun implications, like the fact that this code

C++ code:
int main() {
    std::mt19937 urbg(std::random_device{}());
    std::cout << urbg() << '\n';
}
will never print 7*. And if it prints 3046098682, then you can figure out the original seed in about 10 minutes.


Once again, in theory the standard provides tools to work around this. The tool is called SeedSequence, and the stdlib provides an implementation, std::seed_seq. Once again, once you put this into practice, it breaks down.

seed_seq is basically a wrapper over a vector that you can give a bunch of randomness to, and that a random number engine can then extract stretched randomness from. You can use it like this:

C++ code:
auto rd_dev = std::random_device{};
std::seed_seq seq{rd_dev(), rd_dev(), rd_dev(), rd_dev()};
std::mt19937 urbg(seq);
This time we initialized our rng with 16 (4 * 4) bytes of randomness. Progress! There are just 2 problems with this.

1. There is no way to know how much randomness you need to feed into a SeedSequence to fully initialize a random number engine T.
2. std::seed_seq is very precisely specified by the standard. The implementation forced by the standard is not a bijection.

Fun thing about 1. is that std::mersenne_twister_engine happens to provide a member that you can query** for how much random input it requires for full initialization, but no other random number engine does. The MT thing is just an accident of standardization :v:.

But 2. means that even if you somehow fix this, you still cannot use std::seed_seq for initialization, because it generates the same results for different inputs. Here is one of the examples:

C++ code:
#include <array>
#include <iostream>
#include <random>

int main() {
    std::seed_seq seq1({0xf5e5b5c0, 0xdcb8e4b1}),
                  seq2({0xd34295df, 0xba15c4d0});

    std::array<uint32_t, 2> arr1, arr2;
    seq1.generate(arr1.begin(), arr1.end());
    seq2.generate(arr2.begin(), arr2.end());

    std::cout << (arr1 == arr2) << '\n';
}
(Godbolt: https://godbolt.org/z/-SCJCI)

So yeah, you have to write your own SeedSequence to seed random number engines properly, and you have to somehow keep track of how much seeding needs to happen.

* In fact, about 30% of all possible 4-byte unsigned int values cannot be printed.
** after you do some math on it.

------------------------------------------------------------------------------------

So, to recap:

1. If you need cross-platform reproducibility (not that rare), you cannot use distributions from std::
2. If you need actual randomness, you have to implement it yourself, or hardcode a list of platforms + configurations where you can use std::random_device
3. To properly seed the standard-provided random number engines, you have to write your own SeedSequence and then hardcode the size of the RNEs you actually use.


After doing the above, it is simpler to just write your own everything, or use a 3rd party library.

Xarn
Jun 26, 2015
That looked way smaller in preview.

Jeffrey of YOSPOS
Dec 22, 2005


Xarn posted:

The implementation forced by the standard is not a bijection.
Okay this is where my eyes bugged out. I'm so sorry you had to learn this the hard way. Thank you for writing this.

Xarn
Jun 26, 2015
I actually started getting mad while writing all this poo poo out.

Eezee
Apr 3, 2011


Jeffrey of YOSPOS posted:

Okay this is where my eyes bugged out. I'm so sorry you had to learn this the hard way. Thank you for writing this.


Does that matter though? As long as it's an injection, wouldn't it be fine for the purpose it's supposed to fulfill?

Edit: Eh, I guess you can input an arbitrary number of arguments into the seed_seq initialisation, which would mean it can't be injective over all inputs. Is it injective for a reasonable number of arguments though?

Eezee fucked around with this message at 16:41 on May 17, 2020

Jabor
Jul 16, 2010


Eezee posted:

Does that matter though? As long as it's an injection, wouldn't it be fine for the purpose it's supposed to fulfill?

If you have a prng with 64 bits of state, and you attempt to provide it 64 bits of initial entropy with std::seed_seq, you end up with less than 64 bits of entropy in your initial state.

That doesn't seem particularly "fine for purpose" to me.

Xarn
Jun 26, 2015

Eezee posted:

Does that matter though? As long as it's an injection, wouldn't it be fine for the purpose it's supposed to fulfill?

Edit: Eh, I guess you can input an arbitrary number of arguments into the seed_seq initialisation, which would mean it can't be injective over all inputs. Is it injective for a reasonable number of arguments though?

If I remember my English function classifications properly, injection would suffice as long as you assume that entropy-in-seed-seq <= entropy-needed-by-rng.

It is not injective either, see the example.

Qwertycoatl
Dec 31, 2008

darkforce898 posted:

Are there any resources that can help me implement encryption and decryption of files and communication from a client to a server?

I'm writing a program in C that will run on embedded arm devices and has a configuration file that enables and disables certain features. How do I secure the configuration so that a user cannot just change it by hand?

Right now it has a secret key in the binary that is obfuscated that I am going to combine with an API key. I know that putting the secret key in the binary isn't going to stop someone from running IDA Pro, but it will stop someone from running strings on it. And then running aes256cbc on the file to read and write.

Is there a better way to do this? I don't want to reinvent the wheel. Also, is there a better resource for OpenSSL api docs than just stack overflow and the wiki?

I think the ARM TrustZone stuff is what you want to be using if you want it to be Actually Secure, but you may not have access to it on your system, and it's probably very complicated (I've never used it) and also probably needs an expert to get it right.

Xerophyte
Mar 17, 2008

Specifically, a bijection is a function that is both injective (distinct inputs produce distinct outputs) and surjective (every value in the output range gets hit). For a function from a finite set to another finite set of the same cardinality, such as a function from 32 bit integers to 32 bit integers, these are all the same thing, and seed_seq is none of them.

Far as I can tell the objective of seed_seq was never to preserve entropy, it's to produce a roughly unbiased seed state of specified length from biased data of some shorter length. The algorithm specified is the "standard" Mersenne Twister state initializer, taken from the original paper (well, from this improvement from 2007 by one of the authors' PhD students) and used in most implementations. How useful that is may be determined by the reader.

[E:] I should probably add that the reason seed_seq is specified in algorithmic detail is that if it wasn't, then the Mersenne Twister engine would be implementation defined. It would be nice if it used a seeding algorithm that preserves the entropy of the input state, but to my knowledge it's quite hard to create such an algorithm without also preserving any bias in the input state. Not really my field though, maybe there are much better solutions out there now than there were in 2007ish.


Likely this is my bias showing, but I'm not very bothered by the loss of entropy. None of the algorithms in <random> are intended for cryptography that I'm aware of: they're all various low-state generators that can be easily reversed, made to be fast while having decent distribution qualities. It makes sense to me that something like seed_seq, which is effectively an implementation detail for the Mersenne Twister engine, is likewise not intended for cryptographic purposes.

I'm a lot more upset by the fact that the distributions aren't portable, since that was a property I thought was guaranteed. I guess it's not the end of the world to write my own bits-to-distribution transform, gods know I've done it enough times before, but bleh at having to when C++ finally bothered to make a set of RNGs with well-defined platform invariant behavior.

Xerophyte fucked around with this message at 17:43 on May 17, 2020

Dominoes
Sep 20, 2007

Bros, just think about it. Most of the software you actually use is written in C/++. Not necessarily the weird stuff corporations use internally, or that hides behind web servers. But the stuff you use. Windows, Linux, your microwave, your car, chrome, firefox, every game you play, MS office, your profession-of-choice software that costs more than your car, the open-source alternative that's hit or miss, libre office etc etc etc. Communities in other languages proudly cherry pick a few popular examples written in their lang, and highlight them on marketing pages... the rest is here.

Dominoes fucked around with this message at 17:38 on May 17, 2020

Xarn
Jun 26, 2015

Xerophyte posted:

It makes sense to me that something like seed_seq, which is effectively an implementation detail for the Mersenne Twister engine,

Don't expose it as the way to provide non-trivial seed sizes then? :v:


Xerophyte posted:

is likewise not intended for cryptographic purposes.

It is also unfit for statistical purposes.


----------------

Really, the question is, if you operate with URBG concept everywhere, why are you trying to create something that takes biased data and magics out more-but-unbiased data?

Xerophyte
Mar 17, 2008


Xarn posted:

Don't expose it as the way to provide non-trivial seed sizes then? :v:

It is also unfit for statistical purposes.

Far as I am aware it's exposed because it's pretty common for users to want to write something like
code:
  std::vector<cool_512bit_prng> rngs;
  for (int i = 0; i < 10; ++i) {
    cool_512bit_prng rng;
    rng.seed(i);
    rngs.push_back(rng);
  }
and expect the 10 pRNGs to be independent and produce data that is "as good" as if they'd fully specified all 512 bits of state to some truly random bit values. To do that you need a magic unbiasing function when seeding, at least for a Mersenne Twister, and std::random decided to standardize on the one used in most Mersenne Twister implementations. They exposed it so anyone else can use the same function, since my understanding is that it's pretty commonly used for other RNGs with the same problem. I don't really know enough about the subject to say if there was a better option, but it doesn't immediately strike me as a terrible choice. What's a better algorithm to use?

I'm not entirely sure I follow you on the second, when would entropy preservation matter for statistical purposes? Modern numerical computing and Monte Carlo simulations usually use no-entropy fixed sequences for speed and reproducibility, I don't think a pRNG dropping some entropy would faze anyone as long as the output was well-distributed.

taqueso
Mar 8, 2004



Xerophyte posted:

Modern numerical computing and Monte Carlo simulations usually use no-entropy fixed sequences for speed and reproducibility
That sounds interesting if you have a link to any further reading.

Falcorum
Oct 21, 2010

Xarn posted:

After doing the above, it is simpler to just write your own everything, or use 3rd party.


The good news is this won't happen with std::audio, std::network, or std::graphics.

Xerophyte
Mar 17, 2008


taqueso posted:

That sounds interesting if you have a link to any further reading.

Not sure I have a good single source and I'm only truly familiar with the graphics applications. There's a nice article from SIGGRAPH 2013 called Quasi-Monte Carlo Image Synthesis in a Nutshell which covers most of that side of things, but unfortunately I can't find a preprint. Wikipedia has an article on Low-discrepancy sequences which is more general. It gets a little too technical but at least covers the terms. Someone should someday let the math community know that the universal intro-to-everything encyclopedia isn't really the best place to worry about precise technical correctness, but I digress.


Short version is: using "true" random input for things like function sampling to compute means, do numerical integration and so on is generally not desirable. First, your samples will not typically be equally distributed and you will have clusters which reduces convergence. Second, you will not get the same sequence twice in a row.

The sample distribution impacts the speed of integration: you will randomly oversample some areas of the domain and undersample others. By using a fixed sequence you can construct it in such a way that by taking the first N points you always get a set that's evenly distributed. Constructing such a sequence with good statistical properties is computationally quite hard, especially for higher dimensions, so it's pretty common to pre-compute a few sequences and then rotate, mirror and otherwise scramble them if you need variants.

For Monte Carlo integration and mean computations specifically, a low-discrepancy sequence can converge to the mean with an error proportional to log(N)²/N -- theoretically with a better power on the log, practically not so much for the sequences I remember seeing papers for -- as opposed to 1/sqrt(N) for random samples.

The lack of reproducibility is more of a practical issue. For testing purposes it's practical to have your program produce a consistent output so you know when you've accidentally changed it. Likewise, it's often a good thing if different users get the same bias in their results to make comparisons easier. You can of course get a consistent output with a pRNG too, but then by definition you don't have any entropy there either.


I'm sure that working with quasi Monte Carlo integration is biasing me to a "but who cares?" response to the various STL entropy preservation issues; I use RNGs in a field where their entropy is immaterial and I'm doing the dumb human thing where I assume my little world is universal. I know it's important in crypto as part of making it hard to compute an RNG's state from its output, but since I don't care about crypto I stopped worrying there. I'm genuinely curious as to where else it's important.

taqueso
Mar 8, 2004



Thanks, I'll see if I can't dig up that paper.

Xerophyte
Mar 17, 2008

Actually, since PBRT version 3 is freely available -- better online than in book form now really -- you should just read their chapter on sampling if curious about the graphics side of it. See: http://www.pbr-book.org/3ed-2018/Sampling_and_Reconstruction/Stratified_Sampling.html

Gniwu
Dec 18, 2002

Xarn posted:

Okay, I promised an effort post on C++ random number generation.

Summary: It is shite, using the standard library will bite you in the rear end at some point, and using external libraries is annoying because lol C++ tooling. Use PCG if you can.

As the person whose newbie-problem provoked your effortpost in the first place, you have my gratitude! Unfortunately, most of the technical aspects of what you wrote are beyond me (which I was warned about when I asked for a better rng solution than 'srand' several pages back!), but the message I am taking away from your words is the same one that many posters in this thread seem to be repeating over and over: C++ is extremely clunky and frustrating. I'm curious why that seems to be the prevailing attitude among highly experienced C++ users, yet it remains the dominant programming language to this day. Is there really nothing better around for general purpose applications? Or is that all because of inertia, since the language has been (sort of) with us since the 1970s?

UraniumAnchor
May 21, 2006


Gimmick Account posted:

As the person whose newbie-problem provoked your effortpost in the first place, you have my gratitude! Unfortunately, most of the technical aspects of what you wrote are beyond me (which I was warned about when I asked for a better rng solution than 'srand' several pages back!), but the message I am taking away from your words is the same one that many posters in this thread seem to be repeating over and over: C++ is extremely clunky and frustrating. I'm curious why that seems to be the prevailing attitude among highly experienced C++ users, yet it remains the dominant programming language to this day. Is there really nothing better around for general purpose applications? Or is that all because of inertia, since the language has been (sort of) with us since the 1970s?

To paraphrase another quote: "C++ is the worst form of programming, except for all the others."

It doesn't really super excel at any one class of problem but it does lots of things very well, and it's widely supported. If what you need isn't in the standard library, then there's almost always a well supported open-source solution.

Certainly a lot of the worst bits are at least indirectly caused by needing to support a little bit of everything and/or legacy code, as is the case with anything that has a significant ecosystem. It's much harder to remove or "fix" (i.e. break backwards compatibility) something than it is to add it.

Foxfire_
Nov 8, 2010

Xarn posted:

This has some fun implications, like the fact that this code

C++ code:
int main() {
    std::mt19937 urbg(std::random_device{}());
    std::cout << urbg() << '\n';
}
will never print 7*. And if it prints 3046098682, then you can figure out the original seed in about 10 minutes.


Being able to recover the seed doesn't make sense as an objection. It's a Mersenne Twister. You can predict all past and future values from any handful of outputs, regardless of how it's initialized. None of the std::random generators are suitable for cryptography.

If the generator had a required state -> (output, next state) function (which would be nice!), it would be useful to be able to set the initial state explicitly so you could generate the same sequences on different platforms, but without that you don't lose much. If the generator is already an unknown function, having the initial state be generated from a standardized N bits -> 19937 bits function instead of an implementation-defined one wouldn't help you get predictable outputs. And if you're initializing from 2^32 initial seeds, it's not particularly surprising that you can't get all 2^19937 possible generators out.

darkforce898 posted:

Are there any resources that can help me implement encryption and decryption of files and communication from a client to a server?

I'm writing a program in C that will run on embedded arm devices and has a configuration file that enables and disables certain features. How do I secure the configuration so that a user cannot just change it by hand?

Right now it has a secret key in the binary that is obfuscated that I am going to combine with an API key. I know that putting the secret key in the binary isn't going to stop someone from running IDA Pro, but it will stop someone from running strings on it. And then running aes256cbc on the file to read and write.

Is there a better way to do this? I don't want to reinvent the wheel. Also, is there a better resource for OpenSSL api docs than just stack overflow and the wiki?

What exactly are you trying to protect and what does the attacker have access to? If they can freely read and modify your program, there's nothing actually secure you can do and you're restricted to making it as annoying as possible.

That config file thing doesn't sound that annoying to break, depending on how you implemented it. Set up a breakpoint at the point where the file is opened, then trace forward from that until the decryption is done and see what it looks like. Then just patch the binary to set up the decrypted data and jump to that point instead of ever touching the file at all.

Ways to make it more annoying:
- Keep as much of the program code encrypted/compressed at any given point as possible. Make it so the attacker can only see a little window of code at any given time and has to gather up those windows to see the whole thing
- Recheck things at multiple points in time and at multiple places in memory. Make it so they need to patch many things in a self-consistent way to make it pass
- Don't fail immediately when you detect the program is inconsistent with itself. If check #1 passed but check #2 fails, wait 10mins, then fail so the attacker has a slower iteration cycle.
- Mix in information that is unique to the system it's running on. It's easier to figure out some constant license key than something like SHA1(constant key|device SN)
- Don't use strings for your config file magic constants. Pick some other constants you're already using in the program and give those values extra meaning.

If you need actual security, you'll need hardware support though.

Dominoes
Sep 20, 2007

Gimmick Account posted:

the message I am taking away from your words is the same one that many posters in this thread seem to be repeating over and over: C++ is extremely clunky and frustrating. I'm curious why that seems to be the prevailing attitude among highly experienced C++ users, yet it remains the dominant programming language to this day.
Unfortunately, most languages seem to have problems near the level of C/C++'s. I'm curious specifically why there have been relatively few attempts at making performance-comparable languages.

- Python: Slow, tough to make executables, bad tooling
- Javascript etc: A clusterfuck, not as fast as C
- Ruby: Not that active, plus issues similar to Python's, not as fast as C
- Haskell: Inflexible functional styling isn't natural for many problems, not as fast as C
- Java: Verbose, forced OOP, not as fast as C
- Go: Missing multiple important features, not as fast as C
- Rust: Might be the unicorn we need(???), new and unpopular

etc

I'm curious how Jai will turn out. I hope it doesn't get too pigeon-holed into games or friction-reduction. Overall, it seems to welcome high-level concepts and quality of life improvements, while sacrificing neither performance nor sharp edges. Nim also seems promising.

Dominoes fucked around with this message at 00:10 on May 18, 2020

Dominoes
Sep 20, 2007

-delete

Xerophyte
Mar 17, 2008

This space intentionally left blank
The short answer as to why C++ is clunky and frustrating is that it's because computers are clunky and frustrating. Programming languages try to hide this fact, but if you want to have direct access to the low-level functionality of the machine then it's hard to keep the frustrating clunkiness of the machine from bleeding through.

C++ goes even further in attempting to support both very low-level functionality and very high-level abstractions at the same time, while also being mostly agnostic to the platform it's running on. Combining all that is very, very hard, which is why sane languages don't try. C++ is (hopefully!) not the best attempt that could be, but it has (arguably!) been the best attempt we've had for the last 30 years or so.

Well, the attempt with the most accumulated inertia, at least.

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Dominoes posted:

I'm curious how Jai will turn out. I hope it doesn't get too pigeon-holed into games or friction-reduction. Overall, it seems to welcome high-level concepts and quality of life improvements, while sacrificing neither performance nor sharp edges. Nim also seems promising.
I looked at it in the past and I don't feel like it's really solving much aside from having a proper reflection system, and some of the things it loses are pretty bad. "Game programmers aren't afraid of manual memory management" so no RAII and no refcounting? OK. It has function polymorphism, but doesn't have generic/template types.

The one thing it has that I think should just be in C++ is opt-out default initialization. The value of an uninitialized scalar in C++ is undefined anyway, so changing that to default zero-fill shouldn't (in theory) be a compatibility break, and the opt-out still provides a way to avoid expensive memory fills when that matters.


Xerophyte posted:

The short answer as to why C++ is clunky and frustrating is that it's because computers are clunky and frustrating. Programming languages try to hide this fact, but if you want to have direct access to the low level functionality of the machine then it's hard to keep the frustrating clunkyness of the machine from bleeding though.
A lot of why it's clunky is because of legacy jank in the language design, and how it expresses concepts rather than the concepts it has to express. The current focus on build times exists partly because C++ has multiple features that affect the processing of later code, and it allows stupid things like type definitions being duplicated, or even differing, across TUs. As a result, concatenating different .cpp files in different orders is not guaranteed to produce the same program, or even to compile at all, a problem some other languages don't have.

I don't think it's actually that hard to design a better language. C# 2.0 with manual memory management would be better than C++ for most use cases.

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!

OneEightHundred posted:

I don't think it's actually that hard to design a better language. C# 2.0 with manual memory management would be better than C++ for most use cases.
Conceptually, but then you have to roll in all the issues. For example, C#'s ordered map doesn't expose a way to 'find the nearest value' like std::map's lower_bound, so if you want that you either have to use something horribly inefficient (convert to another structure) or make your own red-black tree (or use a third-party one).

The standard libraries have their flaws, but they've had a long time to evolve and work out the kinks, and there's a lot of them - getting all that in a new language wouldn't be easy.

Xarn
Jun 26, 2015
The standard library does not work out kinks, because that would be a backwards compatibility break. C++ just happens to be lucky that its basic standard library was conceived by Stepanov.

-----------------------------

Anyway, I hard disagree with this

Xerophyte posted:

The short answer as to why C++ is clunky and frustrating is that it's because computers are clunky and frustrating.

an absurd amount of the clunkiness of C++ is either self-inflicted (std::addressof, casts to void in generic libraries, unicorn initialization, ADL, etc, etc), or caused by backwards compatibility with C, where they were self-inflicted (zero-terminated strings, overflow behaviour, uninitialized-by-default). There is definitely some intrinsic complexity to computers and hardware, but C++ used to punt on it pretty hard -- how long did it take until C++ acknowledged threads, or cache lines, or alignment, or ...

nielsm
Jun 1, 2009



Speaking of the standard library, would a change of the STL to use modules have allowed making non-backwards-compatible breaks, like removing the std::vector<bool> specialization? Or could they allow versioning the library in a way that would permit that kind of change?

Xarn
Jun 26, 2015
Back to randomness now :v:

Xerophyte posted:

Far as I am aware it's exposed because it's pretty common for users to want to write something like
code:
  std::vector<cool_512bit_prng> rngs;
  for (int i = 0; i < 10; ++i) {
    cool_512bit_prng rng;
    rng.seed(i);
    rngs.push_back(rng);
  }
and expect the 10 pRNGs to be independent and produce data that is "as good" as if they'd fully specified all 512 bits of state to some truly random bit values. To do that you need a magic unbiasing function when seeding, at least for a Mersenne, and std::random decided to standardize on the one used in most Mersenne Twister implementations. They exposed it so anyone else can use the same function, since my understanding is that it's pretty commonly used for other RNGs with the same problem. I don't really know enough about the subject to say if there was a better option, but it doesn't immediately strike me as a terrible choice. What's a better algorithm to use?

I'm not entirely sure I follow you on the second: when would entropy preservation matter for statistical purposes? Modern numerical computing and Monte Carlo simulations usually use no-entropy fixed sequences for speed and reproducibility; I don't think a pRNG dropping some entropy would faze anyone as long as the output was well-distributed.

So the question here is: do the users in your example expect to get the same numbers across runs, or do they just want a bunch of PRNGs? Because for the second, in a world where I got to make some changes to the <random> APIs, the code to get you full seeding looks like this

C++ code:
std::vector<cool_512bit_prng> rngs;
for (int i = 0; i < 10; ++i) {
    rngs.emplace_back(std::random_device{});
}
As to the other thing, one of my problems with seeding MT with a 32-bit seed is that you cannot get roughly 35% of possible outputs as the first generated number. If you interpose seed_seq between the 32-bit seed and the MT, you get the same result... there is no 32-bit seed that, after being passed through seed_seq, gets you 5 as the first result. Or 7, 8, 9, 11, 12, 13, 14, 16, ...


Foxfire_ posted:

Being able to recover the seed doesn't make sense as an objection. It's a Mersenne Twister. You can predict all past and future values from any handful of outputs, regardless of how it's initialized. None of the std::random generators are suitable for cryptography.

Right, Mersenne Twister outputs its internal state directly, so if you can watch it for a bit, you know what state it is in. It should take more than 1 observation though.

Foxfire_ posted:

If the generator had a required state -> (output,next state) function (which would be nice!), it would be useful to be able to explicitly set the initial state explicitly so you could generate the same sequences on different platforms, but without that you don't lose anything much. If the generator is already an unknown function,

Generators are perfectly reproducible across platforms. Distributions aren't.


Falcorum posted:

The good news is this won't happen with std::audio, std::network, or std::graphics.

:shuckyes: (well, to be fair I am reasonably sure graphics aren't happening ever)

Xarn
Jun 26, 2015

nielsm posted:

Speaking of standard library, would a change of STL to use modules have allowed making non-backwards compatible breaks, like removing the std::vector<bool> specialization? Or can they allow versioning of the library in a way that could allow that kind of changes?

Of course not.

Xerophyte
Mar 17, 2008

This space intentionally left blank

Xarn posted:

So the question here is, do the users in your example expect to get the same numbers across runs, or do they just want bunch of PRNGs? Because for the second, in world where I got to make some changes to <random> APIs, the code to get you full seeding looks like this

C++ code:
std::vector<cool_512bit_prng> rngs;
for (int i = 0; i < 10; ++i) {
    rngs.emplace_back(std::random_device{});
}
As to the other thing, one of my problems with seeding MT with 32 bit seed is that you cannot get roughly 35% of possible outputs as the first generated number. If you interpose seed_seq between the 32bit seed and the MT, you get the same result... there is no 32 bit seed that, after being passed through seed_seq, gets you 5 as the first result. Or 7, 8, 9, 11, 12, 13, 14, 16, ...

In the use cases I'm more familiar with, initializing an RNG from random_device would not be desirable. I still can't think of any use case outside of crypto where you'd want to introduce it, to be honest. The objective is to get N separate PRNGs that produce a consistent sequence of bits each yet are not correlated in any way in spite of being initialized from a set of seed numbers [0, N) much smaller than the set of RNG states.

In typical Monte Carlo simulation and statistical analysis use cases I expect you want your generated sequence to be consistently reproducible while maintaining good statistical qualities. One of the good statistical qualities is certainly outputting the entire range, so not including all initial values is a flaw. The flaw appears because the seed_seq algorithm was designed to provide an MT that yields an equidistributed sequence in [0, 1)^d for higher dimensions d on average than the algorithm used in the original MT paper (which was just an LFSR, I think), without a significant cost in speed when initializing the RNG.

The highest dimension of equidistribution is pretty often used as a shorthand to rank PRNGs but it's definitely not the end-all be-all one true measure. I'd definitely say it's more informative about the quality of the RNG than looking at the possible initial values, though.

I'm not really sure what the current gold standard is for determining RNG quality. On the crypto side of things you have a bunch of test suites like TestU01 and the NIST Test Suite, but they're not designed to measure quality outside of the crypto-specific space where you do care about things like preserving input entropy etc.

Xerophyte fucked around with this message at 10:01 on May 19, 2020

Zopotantor
Feb 24, 2013

...und ist er drin dann lassen wir ihn niemals wieder raus...

Xarn posted:

how long did it take until C++ acknowledged threads, or cache lines, or alignment, or ...

Arguably, that’s the fault of standardization and having many stakeholders from vastly different backgrounds. When the standard has to allow implementation on anything from an embedded ARM to an IBM mainframe, you’re going to get something that sort of works everywhere, but which probably doesn't have the particular features that you would like.
Also, you could program multithreaded C++ in 1996 if you had a BeBox, before the standard even existed.

Subjunctive
Sep 12, 2006

✨sparkle and shine✨

Zopotantor posted:

Also, you could program multithreaded C++ in 1996 if you had a BeBox, before the standard even existed.

Or if you were a Rogue Wave licensee! They were the STL before the STL in a bunch of ways.

Dominoes
Sep 20, 2007

Does anyone know how to return multiple values in embedded C++? It seems like Tuples aren't working, or I'm messing something up:

C++ code:
tuple<double, double> f() {
    return make_tuple(1., 2.)
}
Bash code:
C:\.../Anyleaf.h:42:5: error: 'tuple' does not name a type

     tuple<double, double> calibrate(CalSlot slot, double pH);

     ^~~~~

exit status 1
Error compiling for 

csammis
Aug 26, 2003

Mental Institution
Is there any reason you can't just use reference parameters?


edit: did you mean std::tuple ?

OneEightHundred
Feb 28, 2008

Soon, we will be unstoppable!

Zopotantor posted:

Arguably, that’s the fault of standardization and having many stakeholders from vastly different backgrounds. When the standard has to allow implementation on anything from an embedded ARM to an IBM mainframe, you’re going to get something that sort of works everywhere, but which probably doesn't have the particular features that you would like.
Also, you could program multithreaded C++ in 1996 if you had a BeBox, before the standard even existed.
It's largely just because of the infrequency of updating the standard, and in some cases it's forgivable because of how recently they became important. C++03 was ratified before the Core 2 Duo shipped.

I think some more glaring omissions are things like std::filesystem only just arriving in C++17, because, you know, opening and reading/writing files was important enough to make it into the ANSI C standard, but three decades later it turns out scanning a directory's contents might be important too! (And it still has nothing for asynchronous IO.)

OneEightHundred fucked around with this message at 02:56 on May 20, 2020

Dominoes
Sep 20, 2007

csammis posted:

Is there any reason you can't just use reference parameters?


edit: did you mean std::tuple ?

code:
C:\...Anyleaf.h:42:10: error: 'tuple' in namespace 'std' does not name a template type

     std::tuple<double, double> calibrate(CalSlot slot, double pH);

          ^~~~~

exit status 1
I went with Tuple since it was the first thing that came up in Google results, and it's the idiomatic way of doing this in most langs. I'll look at ref params. I tried an array too, but it seems like you can't return one, unless you get clever with pointers.

Dominoes fucked around with this message at 03:02 on May 20, 2020


Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Did you actually include the <tuple> header?
