|
This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?
|
# ? Dec 5, 2009 07:05 |
|
|
# ? May 16, 2024 18:47 |
|
the talent deficit posted:This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

Dense sets can be checked for equality in constant time. Consequently, if you are dealing with disjoint unions of dense sets, you can reduce to O(n) in the number of dense subsets with the right representation. More practically, it's often possible to derive quick lemmas about what kinds of modifications are possible, and check only restricted subsets for equality.

If your representation is the naive sorted-list representation of a set, and you have no prior information to base your test on, then the proof that testing for equality is O(n) is rather trivial; the complexity of list equality is typically covered in any introductory complexity class in a CS program.
|
# ? Dec 5, 2009 07:36 |
|
the talent deficit posted:This is probably more a question for 'Ask General Programming Questions Worth Their Own Doctoral Thesis' but any recommendations on where to start researching ways to compare two sets for equality? (Assuming you can cache whatever information you want as the sets are constructed). Failing that, a proof that you can't do better than iterating over each member of each set?

Maybe I'm misunderstanding the question but I'm not sure I see the problem? If the sets are sorted that's trivially O(n). If one is sorted it's trivially O(log(n)). If neither is sorted, you can easily get nlog(n) by sorting/comparing. In any situation you can get O(n) with hashing.
|
# ? Dec 5, 2009 07:40 |
|
That Turkey Story posted:If one is sorted it's trivially O(log(n)). Surely you mean O(n log(n)). O(log(n)) is the cost of a single lookup for a sorted-list-set, and in any case it would be somewhat absurd for the cost to compare equality of two sorted lists to be higher than the cost to compare a sorted and an unsorted list.
|
# ? Dec 5, 2009 08:01 |
|
Woops yeah nlog(n).
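To make the corrected complexities concrete, here's a throwaway Python sketch of the three strategies mentioned (already-sorted inputs, sort-then-compare, and hashing); it's illustrative, not tuned:

```python
# Sketch of the three set-equality strategies from the posts above.

def equal_sorted(a, b):
    """Both inputs already sorted: a single O(n) pass with early exit."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

def equal_by_sorting(a, b):
    """Neither sorted: O(n log n) to sort each side, then the O(n) pass."""
    return equal_sorted(sorted(a), sorted(b))

def equal_by_hashing(a, b):
    """Expected O(n): build a hash set from each side and compare."""
    return len(a) == len(b) and set(a) == set(b)
```

The early exit in the first function is the "detect inequalities earlier rather than later" point from upthread: a mismatch near the front bails out immediately.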
|
# ? Dec 5, 2009 08:11 |
|
Use an associative cryptographic hash function and a balanced binary search tree -- stick hashes at every node describing the subtree beneath that node.
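A rough sketch of that idea, assuming SHA-256 and a balance-by-median build (my choices, not necessarily the poster's): each node carries the hash of its entire subtree, so two trees built the same way from equal sets compare in O(1) at the root, and a mismatch can be localized by descending only into children whose hashes differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Hash of the whole subtree: left hash + this key's hash + right hash.
        lh = left.hash if left else b""
        rh = right.hash if right else b""
        self.hash = h(lh + h(repr(key).encode()) + rh)

def build(sorted_keys):
    """Balanced tree from sorted keys, so equal sets yield identical shapes
    and therefore identical root hashes."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))
```

The catch, which comes up later in the thread, is that the root hash depends on the tree's shape, so both sides have to agree on exactly how the tree is built.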
|
# ? Dec 5, 2009 21:55 |
|
Let's say we invented some sort of software-based intelligence... If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking?
|
# ? Dec 7, 2009 19:22 |
|
I was going to type up some smartass answer but if you're actually serious about this as a topic, it's worth its own thread.
|
# ? Dec 7, 2009 19:26 |
|
Triple Tech posted:Let's say we invented some sort of software-based intelligence... Or neither. As Ars Technica put it when talking about the supercomputer allegedly as powerful as a cat brain a few weeks back: quote:In a nutshell, when a simulation of a complex phenomenon (brains, weather systems) reaches a certain level of fidelity, it becomes just as difficult to figure out what's actually going on in the model—how it's organized, or how it will respond to a set of inputs—as it is to answer the same questions about a live version of the phenomenon that the simulation is modeling. So building a highly accurate simulation of a complex, nondeterministic system doesn't mean that you'll immediately understand how that system works—it just means that instead of having one thing you don't understand (at whatever level of abstraction), you now have two things you don't understand: the real system, and a simulation of the system that has all of the complexities of the original plus an additional layer of complexity associated with the model's implementation in hardware and software.
|
# ? Dec 7, 2009 19:36 |
|
quote:If we decompiled said software, would it lead to some weird algorithm we've never seen before that unlocks the mysteries of life? Or would it be pretty normal looking? INSUFFICIENT DATA FOR MEANINGFUL ANSWER.
|
# ? Dec 7, 2009 19:50 |
|
Triple Tech posted:Let's say we invented some sort of software-based intelligence... Read GEB.
|
# ? Dec 9, 2009 01:52 |
|
Another OCaml question: I'm trying to reimplement in OCaml the software (well, the interesting bits) I wrote in C++ for my senior thesis, in order to learn OCaml and hopefully use it or another functional language in my graduate work. Currently, I'm just trying to write a program which echoes a dataset that's been piped into it from the terminal. E.g. code:
Here is the (relevant and closed under itself) code currently: code:
code:
code:
In addition, why does the OCaml interactive interpreter blow so hard, as well as many other interactive interpreters? I can't move around with the arrows, it doesn't have a command history, etc... I mean, it's hard for me to find it useful when I have to backspace out 80 characters because I made a spelling mistake. I'm left with copying and pasting into the terminal, or loading the module when I start the interpreter, or using a #use statement. Just ranting, really. Interactive mode is really nice to have.

Also, is there a way I can make my pathetic head and tail functions better/safer? I thought about something like if <> [] then some_list_expr else [] but I think that would cause more headaches than needed to get rid of a compiler warning, which should not have been taken lightly in the first place.

EDIT: Ah! Perhaps in x::y, where y is some list of objects, x must be another object and not another list of objects? I need list concatenation?

EDIT2: Got that one! I needed the @ operator. Now I'm getting a reference to undefined global `Str'. Need to use or load something, most likely. Maybe not, because I'm compiling natively? I try "ocamlc file.ml str.cma -o file" as well as "ocamlc file.ml Str.cma -o file" and both give me "Reference to undefined global `Str'". I'm linking something wrong. Ah, I got it! str.cma needs to come before file.ml. Makes sense. Correct code for completeness: code:
causticfluids fucked around with this message at 06:40 on Dec 9, 2009 |
# ? Dec 9, 2009 05:50 |
|
causticfluids posted:In addition, why does the ocaml interactive interpreter blow so hard, as well as many other interactive interpreters? Run them inside rlwrap or inside an Emacs buffer.
|
# ? Dec 9, 2009 06:23 |
|
shrughes posted:Run them inside rlwrap or inside an Emacs buffer. Ah, wow, thanks for the great tip on rlwrap. I am surprised I've never encountered it before. I should have thought about readline.
|
# ? Dec 9, 2009 06:37 |
|
ShoulderDaemon posted:Do you have any more context? There are ways you can design your set representation to make inequalities likely to be detected earlier rather than later in the most common cases for your problem, and there are representations that depend on some degree of observable sharing which can change your decision algorithm from O(n) in the number of elements in the list to O(n) in the number of modifications made to the two lists since the last time you confirmed they were equal, although in practice few people care to use these except for rare cases such as infinite sets.

I was criminally vague initially. Sorry. This is what I really want: I've got two systems, completely distinct from one another. They're organized as sets of nodes containing various state. Right now, we're using a tree representation (Merkle trees) to calculate a single hash of the state of each system that we can compare trivially. However, this restricts us to systems with identical internal organization. If any state is moved from one node to another, or the number of nodes changes, the hashes now represent the same state, but are distinct and not comparable. I know there are O(n) algorithms to calculate this state, but O(n) is, unfortunately, too much overhead (as n is very large when discussing the entire set). What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.
|
# ? Dec 9, 2009 09:49 |
|
Why would someone want an in-memory database? My assumption here is that the data is important and not disposable. I get that moving databases to RAM makes it tons faster, but doesn't that make the data unsafe in case of a power outage or crash? How do these in-memory databases address this issue? Or are there really use cases where people don't care about the longevity of the data? If they did care, how and when would the data be written to disk?
|
# ? Dec 9, 2009 16:47 |
|
Triple Tech posted:Why would someone want an in-memory database? I think the fastest possible database server would use an in-memory database that did full daily backups and kept an incremental change log for recovery purposes. It's also pretty easy to copy data from database to database - 2 statements per table in MySQL to make an exact copy (as far as I know, at least). If you wanted to manipulate a large dataset as fast as possible in order to generate a report, you could copy it to an in-memory database, do whatever you want quickly and discard the database.
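As a toy illustration of that copy-to-memory-and-crunch pattern (using SQLite rather than MySQL, simply because it's in the Python stdlib):

```python
import sqlite3

def copy_to_memory(disk_path: str) -> sqlite3.Connection:
    """Clone an on-disk SQLite database into a fresh in-memory one; query
    the copy at RAM speed, then discard it when the report is done."""
    mem = sqlite3.connect(":memory:")
    disk = sqlite3.connect(disk_path)
    disk.backup(mem)  # Connection.backup is available since Python 3.7
    disk.close()
    return mem
```

The on-disk copy remains the durable source of truth, which is one answer to the crash-safety question above: the in-memory copy is deliberately disposable.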
|
# ? Dec 9, 2009 17:16 |
|
Triple Tech posted:Why would someone want an in-memory database? http://en.wikipedia.org/wiki/In-memory_database
|
# ? Dec 9, 2009 17:17 |
|
the talent deficit posted:What I really need is a hash function such that hash(A, hash(B, hash(C, hash(D, E)))) = hash(E, hash(D, hash(C, hash(B, A)))) (for any permutation of A, B, C, D, E). I know these exist, but I'm unable to find anything on practical implementations.

Assuming you can't have duplicate data elements, you can trivially do the Zobrist thing. For each distinct data element, compute a high-entropy* hash for just that element. Normally you'd simply choose a distinct random number, but because you have multiple systems that you probably don't want to have to synchronize the random numbers between, you may find it simpler to just use a cryptographic hash of the data here. To hash a database with data elements A, B, and C, simply take the XOR of the element hashes. XOR is Abelian, which is the property you want.

*High entropy is not strictly the property you want here, which is really just orthogonality, but it's much simpler to deal with because it's not a global property and tends to result in orthogonal sets.

ShoulderDaemon fucked around with this message at 18:00 on Dec 9, 2009 |
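A minimal sketch of that XOR construction, using SHA-256 as the per-element hash (the repr-based serialization is just an illustrative choice):

```python
import hashlib
from functools import reduce

def element_hash(x) -> int:
    """Per-element 'random number', derived from a cryptographic hash so
    that independent systems agree without sharing a table of randoms."""
    return int.from_bytes(hashlib.sha256(repr(x).encode()).digest(), "big")

def set_hash(elements) -> int:
    # XOR is commutative and associative, so any permutation of the
    # elements produces the same hash.
    return reduce(lambda acc, x: acc ^ element_hash(x), elements, 0)
```

Because XOR is its own inverse, adding or removing an element updates a running hash with a single XOR, so each node can maintain its hash in O(1) per modification. The no-duplicates caveat above is real: a repeated element XORs itself away.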
# ? Dec 9, 2009 17:57 |
How do you ask a senior developer about the formatting of their code without offending them? It's so hard to read when it's 5,000 lines of: code:
|
|
# ? Dec 10, 2009 00:05 |
|
fletcher posted:How do you ask a senior developer about the formatting of their code without offending them? This is what I hate about where I work. Emphasis on senior developer. He's generated three jobs because of the lovely work he does. When push comes to shove, we can't confront him about the code he makes because it will hurt his "feelings".
|
# ? Dec 10, 2009 00:53 |
|
Get everyone on the team to agree on a set of coding standards and then refactor the whole code base with some automated tool.
|
# ? Dec 10, 2009 03:15 |
|
MEAT TREAT posted:Get everyone on the team to agree on a set of coding standards
|
# ? Dec 10, 2009 05:05 |
|
fletcher posted:How do you ask a senior developer about the formatting of their code without offending them? Auto-format as you come across them? As long as everyone agrees on the format auto-format generates, you're golden
|
# ? Dec 10, 2009 09:21 |
|
KARMA! posted:Auto-format as you come across them? You should do this. Do not discuss it with the other developers if there is no current code formatting standard, you just need to get your formatting style into as many pieces of code as possible. If approached, and only if approached, tell them you've heard Google uses similar standards to the ones you've implemented in a few of their groups, and you were just about to bring up the issue for discussion anyhow.
|
# ? Dec 10, 2009 09:33 |
|
Janitor question since I had to janitate today... If I have Cron -> Process A -> rsync running, so I can't see the output of Process A or rsync, is there a way I can get that STDOUT/STDERR to my terminal after it has launched?
|
# ? Dec 12, 2009 01:27 |
|
Triple Tech posted:Janitor question since I had to janitate today... Redirect it to a file then tail that file?
|
# ? Dec 12, 2009 02:34 |
|
You can redirect a process that's already running?
|
# ? Dec 12, 2009 03:41 |
|
Triple Tech posted:Janitor question since I had to janitate today... Cheap answer: wait until cron emails the output to you, then cat it to your terminal. It's very nontrivial to redirect output from an already-started process. The easiest approach is to ptrace the process, reopen the relevant file descriptor to point somewhere else, then resume its execution; note that this may gently caress with any buffers the program is maintaining internally.
|
# ? Dec 12, 2009 04:10 |
|
I've started hacking on Gnome / Nautilus. I've been writing a patch for the god-awful way it handles list view column widths (not very intelligently). For the longest time, I've used gedit, and it's worked great for most things I've written, up to about 5 files of ~200 lines or so. It can actually be extended pretty well through Python and the hundred-odd plugins. But now I'm dealing with something which has thousands of lines of code across hundreds of files. How does one usually approach such a project? What editor / IDE do you use? With a general lack of documentation about which header certain prototypes come from, how does one figure this poo poo out without reading through 10,000 lines of C? Is there some special trick to tackling these bigger codebases, or is it still pretty much "use your editor, get good at git/svn/whatev, read what docs there are, don't be stupid, effort."
|
# ? Dec 12, 2009 07:25 |
|
ctags and cscope are both useful programs for large unfamiliar codebases in C. ctags is designed to be used by a frontend provided by your editor, cscope has its own interface.
|
# ? Dec 12, 2009 07:35 |
|
Triple Tech posted:You can redirect a process that's already running? Missed that bit of your post; I thought you had control over starting/stopping the process. As mentioned, redirecting process output while running is unfun at best.
|
# ? Dec 12, 2009 09:51 |
|
The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database... The problem was an rsync that was taking too long. Usually it's supposed to be done almost instantly, but someone dropped a multi-gigabyte file into the sync directory and we took that as one of the servers dying. So we had to kill our main process, slice out the code it got hung up on, and run it from the console to stare at it. So. Ghetto.
|
# ? Dec 12, 2009 16:56 |
|
Triple Tech posted:The real answer is we need a more mature logging framework... Probably STDOUT multiplexed to a file and a database... tee(1)
|
# ? Dec 12, 2009 23:50 |
|
Otto Skorzeny posted:tee(1) I've only seen tee work for file descriptors, not completely different output media like e-mail and database. I was thinking more log4j/log4perl. I find operating a systems group by babysitting e-mails to be an extremely ghetto/inefficient solution.
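For the log4j/log4perl style of multiplexing, Python's stdlib logging does the same thing out of the box: one logger, multiple handlers. A minimal sketch (logger name and format are made up; a database sink would just be another logging.Handler subclass added the same way):

```python
import logging

def make_logger(logfile: str) -> logging.Logger:
    """One logger, two sinks: the terminal and a file."""
    logger = logging.getLogger("sync")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # Every record is dispatched to all attached handlers.
    for handler in (logging.StreamHandler(), logging.FileHandler(logfile)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Unlike babysitting cron e-mails, this gives you the output live on the console and durably in the file at the same time.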
|
# ? Dec 13, 2009 05:08 |
|
tee works with filenames and with stdout; if it worked on file descriptors you could tee to a socket (I suppose on Plan 9 you could do that, because it uses 9P rather than Berkeley sockets)
|
# ? Dec 13, 2009 05:39 |
|
Otto Skorzeny posted:tee works with filenames and with stdout, if it worked on file descriptors you could tee to a socket (I suppose on plan9 you could do that because it uses 9p rather than berkeley sockets) You can use, for example, /proc/self/fd/5 in Linux. You can do stuff with mkfifo, cat, and nc too.
|
# ? Dec 13, 2009 06:55 |
|
batch question: is it possible to make labels and GOTO commands work within a FOR loop? I'm trying to have a loop perform a task 1-3 times, and after each time this task runs I want to run a timer and check that it has finished properly, repeating until it has. However, when I run the script the FOR loop's token (%%a) is lost the first time it encounters a GOTO command, so the contents of the loop will only run once: code:
|
# ? Dec 15, 2009 17:29 |
|
BizarroAzrael posted:.bat ohgod So what reasons are there for not using a proper scripting language like Python to do this? It seems kinda... counterproductive.
|
# ? Dec 15, 2009 18:39 |
|
|
Or if not Python, how about PowerShell or VBScript? All of your questions are about batch file functionality that you don't seem to be totally clear on and can't find documentation about (this is because batch files suck). I would imagine this would be one hell of a lot easier and faster in a more usual scripting language.
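For what it's worth, the retry-then-poll structure the batch script seems to be going for is only a few lines in Python; `task` and `finished` here are placeholders for whatever the script actually invokes and checks:

```python
import time

def run_with_retry(task, finished, attempts=3, poll_seconds=1.0, timeout=30.0):
    """Run `task` up to `attempts` times; after each run, poll `finished()`
    until it reports success or the per-attempt timeout expires.
    Returns the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        task()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if finished():
                return attempt
            time.sleep(poll_seconds)
    raise RuntimeError("task never finished after %d attempts" % attempts)
```

No labels or GOTO needed; the loop variable survives because there's nothing to clobber it.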
|
# ? Dec 15, 2009 21:58 |