Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
For what it's worth, if you're dead-set on reading a single character at a time, you should use fgetc, not fread. Analogously, if you're dead-set on undergoing open-heart surgery unaesthetized, you should at least get good and drunk first.

Adbot
ADBOT LOVES YOU

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Mola Yam posted:

I'm basically a programming retard, and while I've managed to cobble this Excel VBA together from various tutorials, it doesn't work, and I don't even know enough about what I'm doing wrong to know what to google for to try to fix it.

I don't know the Excel API well enough to know if you're doing something obviously wrong there, but if you're using this function as, well, a function, you'll want
code:
Return State
at the end.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Viper2026 posted:

Would that be something like this...?

Applicative order: evaluate arguments, then apply
procedure to values

Normal order: substitute argument expressions for
corresponding parameters in body of procedure
definition, then evaluate body

Proper "lazy" evaluation (as it appears in e.g. Haskell) is a little more subtle than that, but yeah, you get the general idea.
code:
if
is a special form in LISP/Scheme that can't be duplicated by a simple user-defined function, precisely because it only evaluates one of its second and third arguments, and which one depends on the result of the first. I believe it can be duplicated with a macro, though.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Conceivably, a compiler could compile a string switch into an int switch on a perfect hash, where each case verifies the match. I don't know of any compilers that actually do it, though.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

slovach posted:

This is probably stupid, but:

Should local variables be avoided on a function that's called an upwards of 100 times a second?

Generally, no. In a compiled language, local variables are mostly an abstraction for expressing how data flows through a function, and using "extra" variables will often make absolutely no difference in the final code. Interpreters don't necessarily have the flexibility to get away with that, so you might pay for using "too many" locals with a little extra memory overhead. Regardless, it is almost guaranteed to not cause a substantial performance impact.

If you are absolutely dying for performance, and you have done everything else in your power, and you are absolutely certain that a function is a hotspot, and you magically know that a variable is not going to be allocated into a register, and you are writing a non-reentrant function in non-PIC code on the right sort of architecture, then it can sometimes improve performance to declare a local variable static rather than auto. But you should probably just forget I ever said that.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Also, 100 times a second is not very often.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Avenging Dentist posted:

Sure it will. In most languages, local variables get pushed onto the call stack, so you'll be dealing with a (very) small but measurable cost to memory. Chances are you're not going to be losing much (if anything) in the way of speed, though.

But that's exactly what I'm talking about : in an optimizing compiler, local variables are mostly just an abstraction for describing how data flows through a function. If your extra variable actually alters the mechanics of the function such that more data is suddenly flowing around, then sure, you're increasing register pressure, and you might force things onto the stack. But I'm assuming we're talking about equivalently-written functions, except that one of them is naming its temporaries or its constants or something.

That is, if your optimizer doesn't compile this:
code:
int five = 5;
int a = foo();
float b = bar(a);
return baz(b, five);
into the same result as this:
code:
return baz(bar(foo()), 5);
then you really, really need a new optimizer.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Okay, sure, yeah. I was definitely just thinking about POD types. In my defense, very few languages have stack-allocatable types with opaque construction/copying semantics.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tripwire posted:

Should I be using malloc to declare strings in the parent function? Or, should I just declare a pointer in the outer function and have the child malloc its own string, passing it to the parent by reference?

1) I assume your HTTP responses can get large. You seem to have figured this out already, but to be clear: all the advice about stack-allocated char arrays really only applies to fairly small, preferably fixed-size arrays, and for anything larger you really want to be using malloc().

2) Dealing with strings in C involves a lot of obnoxious grunt work with char arrays, and your life will be massively easier if you can use a C++ string class.

3) If you have to use pure C, then you should feel free to use malloc in whatever way is most convenient. However, there's a general rule you should always keep in mind in any language with manually-manged memory: every time you allocate more memory, you should always have a plan for how you're going to free it. If it's not immediately obvious from the code, you might even want to add a comment, like:

code:
/* belongs to the depot_t, freed when depot is resized or destroyed */
depot->missile_bay = malloc(n * sizeof(missile_t))
(note that I didn't need to cast the result of malloc because void* is implicitly convertible to any pointer type and vice-versa).

Applying this rule, there's nothing intrinsically wrong with returning a reference to malloced memory from a function; it just means that the caller has to take responsibility for freeing it (unless you're doing something very tricky). If you can keep that straight, though, there's no reason to avoid it. You do need to avoid things like mixing allocated and unallocated memory; for example, this is bad, because the caller can never know if it's safe to free the result:

code:
  char* copy_string(const char* str) {
    if (int len = strlen(str)) { /* might be a C++-ism, I can never remember */
      char* copy = malloc(len);
      memcpy(copy, str, len);
      return copy;
    }else return "";
  }

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Boz0r posted:

Does anyone know if there is some sort of cool algorithm for cutting up a random convex/concave polygon into triangles in a smart way, or if it's impossible to do automatically?

If you know that a closed polygon is convex, you can divide it into triangles by just picking a point and drawing segments to all the non-adjacent points. Concave polygons will be a major problem; depending on your application, though, you might be able to use negative polygons to some effect. What are you trying to do?

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tripwire posted:

In PCRE I was capturing the URI with this pattern:
(/(?:[^\r\n/ ]+(?:/[^\r\n/ ]+)*)?)

It wants a slash first, optionally followed by one or more valid characters, followed by slash with some more characters which can appear many times or not at all.
How do I codify that as a regular expression which doesn't use nested backreferences? I want to apply the * operator (zero or many times) on the nested substring but I don't want to capture the nested substring because that will fuckup later captures I have to do.

Do you absolutely have to do this with regular expressions? It's pretty straightforward with strchr and strrchr.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tripwire posted:

Well it doesn't absolutely have to be done with regular expressions, but how would I validate the URI without them? I guess I could make a big wack of if statements that do roughly the same thing, it just seems kludgy.

That regexp is validating exactly three things: that there aren't any CR or LF characters in the URI (which is probably guaranteed by your parser anyway), that the URI starts with a slash, and that all the components are non-empty. You'll need to do better validation than that anyway (e.g. to invalidate URIs like /mysite/../../../etc/passwd), and that validation will need to look at individual components, so....

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tripwire posted:

There is no parser, I'm using a big regex to parse the whole thing. The whole perl compatible regexp looks like this:
(POST|GET|HEAD|LINK|PUT|DELETE) (/(?:[^\r\n/ ]+(?:/[^\r\n/ ]+)*)?) HTTP/(1.[10])\r\n((?:[^\r\n]*\r\n)*)\r\n

Incidentally, RFC 2616 says you should ignore initial CRLFs in HTTP requests.

Anyway, HTTP is a line-based protocol, and you have to keep reading the header until you get to an empty line, so I assume you're just looping on a read-and-append until this regexp matches — in which case you'll never actually detect failure anyway.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

tripwire posted:

I know for a fact that I've completely hosed up on dynamically allocating; I thought I had it figured out but I was getting hard crashes from double freeing/corruption so I just removed every offending free() until it stopped complaining. If someone can tell me when/how to allocate in the context of that program without causing corruption I'd really appreciate it.

code:
current_request  = xmalloc(sizeof(struct full_request));
current_response = xmalloc(sizeof(struct full_response));
char * output = xmalloc(sizeof(char)*160);
This has nothing to do with your malloc problems, but: you don't need to use malloc here, since you're just creating temporaries for the lifetime of a single function call. Just declare these on the stack.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Jo posted:

Is there a name for this kind of problem? I'm really at a loss for anything other than exhaustive search. :gonk:

I'd call it a "maximal intersection" problem.

Jo posted:

The elements are not ordered (rather, cannot be ordered because order is meaningless).

You can always impose an ordering on a set (n.b. please do not bring up the axiom of choice). Unless O(n log k) (k being the size of the largest set) is too much, I'd just do the standard sort-intersection, except of course you don't actually need to materialize the intersections.

EDIT: I left the algorithm deliberately vague just in case this is a homework assignment. If you need that clarified, I'd be happy to.

rjmccall fucked around with this message at 06:33 on Oct 12, 2008

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Jo posted:

I was hoping to avoid sorting, as the function I have to determine if an object is in a set is featurePoint.equals( fp2 ). equals does a rough evaluation and returns true if the right number of attributes match up.

Ugh, yeah, structures with deep equality semantics are just really difficult to work with. I can't see how you can improve substantially over the brute-force algorithm without further assumptions, though.

EDIT: Oh, that sounds cool. Can you explain a little (or even just give me a pointer) about what an image feature is and what it means to evaluate it? Maybe I can help you with an ordering.

rjmccall fucked around with this message at 07:16 on Oct 12, 2008

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Stephen posted:

This is why I was hoping there was something built-in that I could use, since I always miss this kind of poo poo. Fortunately I never have to verify anything other than http: or https:

URI schemes have to match [-+.a-zA-Z0-9]+, and colon is not a valid hostname character ([-.A-Za-z0-9]+, ignoring internationalization), so if you're willing to require absolute URLs, you can just prepend http:// if the URL doesn't match ^[-+.a-zA-Z0-9]+:.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Avenging Dentist posted:

Good job, you just broke URLs for everything with an explicit port.

Hey, I knew I was forgetting some valid use for colons.

Yeah, probably just go with a heuristic, or demand more user intelligence.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

ShoulderDaemon posted:

If you have char buffer[] = "Hello, World!" and you want a char * to just the second word, you can use char *world = &buffer[7] or, more directly, char *world = buffer + 7.

This, and if you need a copy for some reason you can use strcpy (or memcpy and an explicit terminator if it's a non-terminal substring).

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Avenging Dentist posted:

Just type Ctrl+D for EOF.

(Probably doesn't work on Windows, but 1) I don't want to doublecheck, 2) gently caress Windows anyway)

IIRC, it's Ctrl+Z on Windows, although who the gently caress knows what that does to stdin in a Windows terminal.

Nahrix, there are two problems you'll have when using a terminal. The first is that terminal input is line-buffered by default, so you won't get your 'character at a time' until the user hits enter. The second is that, yes, EOF is a little weird on a stream attached to a terminal. That said, I don't really see the difficulty; just invent an 'end' command, check for it in the loop, and break if you see it?

rjmccall fucked around with this message at 07:21 on Nov 7, 2008

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Vanadium posted:

You will probably want to use filter, as in filter ((/=) 4 . fst) yourList.

Haskell lets you write (/= 4), and it does what you think.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Vanadium posted:

How is ((/= 4) . fst) better than ((/=) 4 . fst)?

Well, the grouping is more obvious, but you're right, it doesn't make a difference in this case.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Bonus posted:

So, does anyone have any hints or suggestions for making this something better than O(n2)?

You can create immutable data structures in non-functional languages. The language might not guarantee that it's immutable, but it's easy enough to just. not. mutate it.

Also, it doesn't change the asymptotic complexity, but in your restricted setting there's a pretty obvious encoding for paths requiring only N-1 bits (where N is the length in segments of the main roads).

PT6A: the problem setup is more restricted than the general problem over graphs, so it's worth considering whether we can improve over the general case. In this case, of course, the answer is "not really", although we should get better constant factors.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Bonus posted:

Well that's what I'm doing here, instead of mutating the list with the path, I'm copying it. It's just that when you have a lazy language, you usually get persistence for free, so you don't incur time penalties for making a copy.

But that's what I mean. In a functional language, you would use a data structure that doesn't require a copy, and you can use that exact same data structure in Python. My Python might be rusty, but:

code:
class Cons:
  def __init__(self,hd,tl):
    self.head = hd
    self.tail = tl
    if tl:
      self.cost = hd.cost + tl.cost
    else:
      self.cost = hd.cost

def shortest_path():
  left = Cons(left_segments[0],None)
  right = Cons(right_segments[0],None)
  
  i = 0
  while i < n - 1:
    cross = crosses[i]
    if (left.cost + cross.cost < right.cost):
      right = Cons(cross, left)
    elif (right.cost + cross.cost < left.cost):
      left = Cons(cross, right)

    i = i + 1
    left = Cons(left_segments[i], left)
    right = Cons(right_segments[i], right)
  
  # whatever post-processing you need to do here

rjmccall fucked around with this message at 23:38 on Nov 25, 2008

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Bonus posted:

I'm going to also use a trie for storing words of the alphabet and searching through them, etc. What's the best way to store the edges in a node?

The general rule with tries is that you should use a fixed-size array unless the character set is prohibitively large. 256 is not prohibitively large, but (say) the Unicode set is. Even if the "flat" character set is prohibitively large, you should consider splitting individual characters into sequences of smaller sets, e.g. with Unicode by branching on the bytes of a UTF-8 encoding.

Bonus posted:

If I use an array that I increase and decrease in size, I save space but I have to traverse the edges to find the next node.

If there's a clear pre-processing phase, you could keep them in sorted order and do a binary search. But again, this isn't worthwhile unless you can't use fixed-size arrays.

Bonus posted:

If I just use an array of 256 fields (one for each character in ASCII), I can just index them to get the next node but 256 fields times 8 for each (char, node pointer) pair is 2Kb.

1. Why would you need to store the character in each entry?
2. Another advantage of fixed-size arrays (if you're using C/C++) is that you can keep the array inline in each node, which should help space usage a little (and significantly reduce your running time).
3. The only reason to support 256 branches at each node is if you want to treat strings as opaque binary data, which is reasonable if they're 1) actually binary data or 2) arbitrary Unicode text. Otherwise, you should figure out what assumptions are valid about your data, then go hog-wild: notably, case-insensitive English words only need 26 branches per node (plus a few if you allow compounds / hyphenated compounds / contractions).

Bonus posted:

I could also use a hash table but I don't know if that's worth it.

This is a decent option if your character set is very large (or infinite).

rjmccall fucked around with this message at 23:04 on Dec 7, 2008

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

PT6A posted:

Stupid question about QuickSort: I understand why most of the algorithm works, but I'm confused as to swapping the pivot element at the end with whatever the last unsorted element is. If the pivot is greater than all other elements being sorted, wouldn't this result in the algorithm giving an incorrect result? Yet none of the pseudocode implementations I've seen check to make sure this is not the case.

I assume you're talking about the last step in the in-place partition phase; Wikipedia gives representative pseudocode for this:
code:
function partition(array, left, right, pivotIndex)
     pivotValue := array[pivotIndex]
     swap array[pivotIndex] and array[right] // Move pivot to end
     storeIndex := left
     for i  from  left to right &#8722; 1
         if array[i] &#8804; pivotValue 
             swap array[i] and array[storeIndex]
             storeIndex := storeIndex + 1
     swap array[storeIndex] and array[right] // Move pivot to its final place
     return storeIndex
The check you suggest is unnecessary because storeIndex will be exactly equal to right in that case, so the swap will be a no-op.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Sleeper354 posted:

I have a question related to threading. I'm taking a class where its a pretty big chunk. I want to know the difference between the following:

Condition Variable
Lock
Semaphore

From my understanding (my most recent one from observing) is that a condition variable is just that a variable, a Lock is a condition variable, and a Semaphore is a combination of the previous two. However, this definition has changed several times. I understand how they work, but I'm having trouble distinguishing them.

Locks, in their most basic form, support two operations: acquire and release. Acquiring a lock prevents other threads from acquiring the lock; hence, if different parts of the program follow the protocol of always acquiring a lock before continuing, the lock guarantees that only one thread can execute any of those parts at once. When you're done with the lock, you have to explicitly release it to permit other threads to acquire it again. This is also known as a mutual-exclusion lock, hence "mutex".

Locks may or may not be: blocking (if you request a lock, your thread stops executing until the lock is available), recursive (a single thread can safely acquire the lock multiple times), multi-mode (the lock can be held in more than one way; some ways may have weaker exclusion guarantees), transferable (from thread to thread), fair (a thread requesting the lock is guaranteed to eventually acquire the lock).

Semaphores are essentially just atomic counters; they can be implemented very efficiently on most hardware. The classic operations are increment (V, which always succeeds immediately) and decrement (P, which blocks until the value is positive before decrementing). Semaphores are directly useful for throttling the number of threads using some resource; this, in turn, means you can use them to implement locks.

Semaphores may or may not be: fair (blocking threads are guaranteed to be woken), bounded (there's a meaningful upper limit on the count: zero-one semaphores are a special case, and are essentially non-recursive mutexes).

Condition variables are basically a way to wait for something to happen. Threads wait on the condition, which causes them to block until another thread signals (a.k.a. notifies) the condition, which then wakes a single waiting thread.

Condition variables may or may not be: broadcasting (they support a "signal all" operation which wakes all blocked threads), "lossless" (i.e. they accumulate extra signals, which are then automatically consumed by threads that try to wait), fair (i.e. threads get woken in the order they waited), Hoare-style (the woken thread begins to run immediately; very powerful on cooperatively-multitasked systems), associated with a lock (which must be held when waiting or notifying, is automatically released during waiting, and is automatically re-acquired after wake-up).

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Rocko Bonaparte posted:

I have some MySQL database questions I feel kind of bad putting in its own thread since I imagine they're still beginners questions, and I'm not sure if it's really something for SH/SC. Maybe if there was a megathread for that stuff, I'd be there.

SQL/databases megathread

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Ensign Expendable posted:

Could someone explain dynamic programming to me? I understand the whole concept of reusing previous calculations, and the "largest continuous sum in an array" example everyone seems to give makes sense, but none of the larger problems do.

Dynamic programming is very straightforward. First off, it only applies to problems where the solution to a problem can be expressed in terms of the solutions of slightly smaller problems. You then solve the problem "bottom up": resolve the smallest problems first, remember their answers, and then keep using those to answer the next-smallest problems. So the key is to find some way to state the solution to the whole problem in terms of the solutions to smaller problems.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

NightGyr posted:

This is for an in-depth game, so there may be a few hundred or thousand objects in the game state, and turns may come in hours apart, so it would be silly to keep it running while waiting.

It sounds like, in the worst case, your server will be marshaling a few hundred megabytes every few minutes. This is not the sort of scale that's worth optimizing for; just do the easiest thing until it really starts biting you. If/when you do decide to get fancy, I would suggest using a memory-sensitive cache of game state; Python has weak references, which should help a lot with that.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
Well, let's start with lowering the constant factors of your current algorithm.
  • You're passing a lot of unused parameters. Don't.
  • I'm almost positive that none of the current JavaScript implementations ever do CSE on property lookups (including array access). You should really be doing this yourself on expressions like, e.g., rows[i].potChildRows, which can be hoisted out of a hot loop.
  • Arrays have a push method for appending values to the end of the array, and it really should be faster than what you're doing.

Okay, now, the algorithm. Suppose we're solving WxH. Let R(W) be the number of unique rows of width W, and N(W) be the expected number of potential neighbors for a row of width W. Then your algorithm has these running time recurrences:
pre:
  T_W(W,H) = T_findAllRows(W) + T_findMatches(W) + T_findUniqueTables(H,W)
  T_findAllRows(W) = 1 + T_findAllRows(W-2) + T_findAllRows(W-3)
  T_findMatches(W) = R(W) * R(W)
  T_findUniqueTables(H,W) = R(W) * T_findTables(H,W)
  T_findTables(H,W) = N(W) * T_findTables(H-1,W)
T_findAllRows is O(2^W). Use dynamic programming; you should be able to get it down to θ(R(W)*W) (you can count them in θ(W)).

T_findMatches is θ(R(W)^2); that's asymptotically optimal, I think, but you can cut the constant factor roughly in half by taking advantage of the fact that matchesBelow is symmetric and never reflexive.

T_findTables is θ(N(W)^H). Again, there's a dynamic programming algorithm which can drop this to O(H*R(W)*N(W)).

Finding the dynamic programming algorithms is really the challenge of the problem, so have fun. EDIT: or AD can post one of them, whatever.

rjmccall fucked around with this message at 06:58 on Jan 5, 2009

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

ToastedZergling posted:

Wow, thank you, I never have seen code written out like that or thought to think about my code like that. What does theta mean exactly? Recommend any books / articles on optimization / dynamic programming?

This is all just informal reasoning about asymptotic running times. Wikipedia is not the worst imaginable reference for all this, but you'd probably be happiest taking a real course in algorithms from somewhere.

θ(f) is just the set of things that perform asymptotically the same as f. O(f) is really just an upper bound; e.g. I described T_findTabes as O(2^W) because I'm not really sure what it is, but all the others were &theta(something).

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Avenging Dentist posted:

Just to be completely nit-picky, you should probably use Θ instead of θ. Not that it matters a ton in this situation, since little-theta wouldn't really make sense (at least not in the sense that Θ is Ο and &Omega).

Hmm, good point. I've been using θ for ages, but you're right, it's a Θ.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

TSDK posted:

Also the casting and pseudo-optimisation of turning one divide into two multiplies is a bit dumb. For quick and dirty, I generally use something like:
code:
clock_t start = clock();
// STUFF
clock_t stop = clock();
printf( "Time = %f ms\n", (float)( stop - start ) / ( CLOCKS_PER_SEC / 1000 ) );

The division-to-multiplication optimization is not a bad one, although it would obviously be better to turn one division into one multiplication instead of two. If CLOCKS_PER_SEC is a compile-time constant (likely), your compiler should turn this into the optimal thing, but it probably wouldn't if it isn't.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
One thing I want to make clear: the sort of math that makes you a better general programmer (logic and discrete mathematics) is very different in both subject and style from the standard (American) high school math progression (trigonometry and calculus). Lots of people enjoy the former who hated the latter, and not just because the latter was coincident with high school.

3D games programming is heavy on linear algebra, which is sometimes kindof partly covered in calculus sequences.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
If you're working with string representations of poker hands, odds are very good that it's all 7-bit ASCII and the following will work just fine:

code:
int[] set = new int[128];
foreach (char c in str) set[c]++;
EDIT: but you'll probably be much happier in both the short and long term if you parse the string into a collection of cards.

rjmccall fucked around with this message at 10:34 on Jan 13, 2009

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Deadpan Science posted:

I have a question that' not really about programming, but it's something I've been wondering about. How do most companies keep their employees from stealing their programs and selling them? It seems like this would be a big problem, especially for small software development businesses that don't have the money to hire a lawyer to constantly file suit against former employees.

Any software company that writes shrink-wrappable software — i.e. software that can be sold prepackaged to businesses or consumers — is going to be large enough to keep a lawyer on retainer. Smaller companies usually write software that's targeted to individual clients, which is inherently difficult to sell black-market, unless the original client was the government and you're committing treason. You could at best steal your company's framework code and use it to satisfy contracts, but you'd be surprised at how inbred local programming circles can be, and if you're caught once....

EDIT: and of course you'd never voluntarily use it yourself.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
It looks like PAL is written against (some version of) the Linux kernel; syscalls on Solaris are not going to follow the same protocols as syscalls on Linux. You will need to either (1) find a Linux system to test these examples on or (2) learn how to convert Linux syscalls into Solaris ones.

I assume this code listing is for main. The code you've listed seems to be passing the second argument directly to open as the filename argument; that doesn't type-check, as the second argument should be a char**, and open wants a char* there. I think if you followed that instruction with movl 4(%ebx), %ebx then you might get the right result, aside from the whole syscall-convention mess.

Alternatively, this might be for start; I'm not sure what the initial memory layout is for start on Linux.

rjmccall fucked around with this message at 07:05 on Jan 21, 2009

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
This may not lessen your confusion, but why are you expecting %eax to contain errno? I'm honestly not certain where errno is going to be, but %eax should have the return value from open, i.e. either a file descriptor or -1. That said, 2 doesn't sound right; I would expect 3.

Your best reference for this would be to disassemble libc, look for _open, and see what it does.

EDIT: I scanned the OpenSolaris source, and it does look as if all system calls are made with the syscall number in %eax and the arguments on the stack. Can you post the code you tried with that?

rjmccall fucked around with this message at 11:21 on Jan 21, 2009

Adbot
ADBOT LOVES YOU

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
First off, you need that to be $0440 (or 0444, whatever), it's octal. Second, I think you just want pushl %ebx (or for that matter, pushl 4(%ebx) instead of the line above). Third, you're right, it looks like you need the return address to be the top of the stack; good luck on that one.

EDIT: or hey, maybe you don't need that third one; but I bet you used to and there's some crazy binary-compatibility going on.

EDIT 2: Ignore that bit about binary compatibility. It's obviously just a gap left for the convenience of writing syscall C stubs; i.e. you can implement open as simply:

code:
  movl $5, %eax
  lcall $0x27, $0
  ret

rjmccall fucked around with this message at 12:17 on Jan 21, 2009

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply