Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
TheresaJayne
Jul 1, 2011

carry on then posted:

as if there's any prestige in working for a company outside the apple/google/facebook triangle anyway

I know someone who quit google and now works for linkedin

Adbot
ADBOT LOVES YOU

Volmarias
Dec 31, 2002

EMAIL... THE INTERNET... SEARCH ENGINES...

TheresaJayne posted:

I know someone who quit google and now works for linkedin

Congratulate him on his work anniversary.

Every hour.

carry on then
Jul 10, 2010

by VideoGames

(and can't post for 10 years!)

TheresaJayne posted:

I know someone who quit google and now works for linkedin

How do they like working in the backwaters of the software industry?

TheresaJayne
Jul 1, 2011

carry on then posted:

How do they like working in the backwaters of the software industry?

They have had plenty of good jobs, they enjoy what they are doing.

They worked at MySql, Google, blizzard, and now LinkedIn.

feedmegin
Jul 30, 2008

carry on then posted:

How do they like working in the backwaters of the software industry?

Id like it just fine for a big enough salary :getin:

Linear Zoetrope
Nov 28, 2011

A hero must cook
Why is an inverted index called "inverted"?

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Jsor posted:

Why is an inverted index called "inverted"?

I had to look up what an inverted index is, but okay. Generally in mappings you map from some kind of "address" or key or whatever to the data that is associated with that key. In an inverted index you map from the data to the key instead. It's inverted because it's backwards (compared to the normal approach), which is what inverted means.

Bognar
Aug 4, 2011

I am the queen of France
Hot Rope Guy
In the context of full text indexes, a forward index answers the question "Given a document, what words appear in that document?". An inverted index answers the opposite question "Given a word, what documents does that word appear in?". In this case, the document is the key and the words are the data.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Bognar posted:

In the context of full text indexes, a forward index answers the question "Given a document, what words appear in that document?". An inverted index answers the opposite question "Given a word, what documents does that word appear in?". In this case, the document is the key and the words are the data.

That feels wrong to me (not saying it is wrong, though). What use is an index of words where the key is documents? You'd just open the document and search it then. Your example of inverted index is exactly what I'd think of as an actual index. It is analogous to the index of an actual book.

Bognar
Aug 4, 2011

I am the queen of France
Hot Rope Guy

LeftistMuslimObama posted:

That feels wrong to me (not saying it is wrong, though). What use is an index of words where the key is documents? You'd just open the document and search it then. Your example of inverted index is exactly what I'd think of as an actual index. It is analogous to the index of an actual book.

One could argue that a book's Table of Contents is an index keyed by documents with words as data. Or, I suppose more appropriately as the corollary to a book index, the page is the document keyed by its page number with words as data. A book index inverts that forward index so that the key is the word and the page number is the data. Dunno who picked the naming scheme, but the index of a book is a classic example of an inverted index.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm trying to move beyond simple, short scripts and create my first real project. I want to create a barebones replica of PLEX, or XMBC. I want to scan my local media library, go out and scrape IMDB for information for the found files (season, episode title, episode description, cast, rating etc...) and store that information. Then, maybe present that information in a fancy HTML layout and be able to play the files in a browser.

I really don't have any experience with storing and retrieving data. I'm thinking I don't need a full fledge relational database since I'm not doing anything too intensive. Is there an alternative data storage type I should look at for small projects like this, or is now as good as time as ever to explore sqlite?

raminasi
Jan 25, 2005

a last drink with no ice
You could see if your language has any good serialization support, and just use that. You won't be able to do anything remotely database-y in an efficient way, but you might not have to think too hard about storing the data.

aerique
Jul 16, 2008

Hughmoris posted:

Is there an alternative data storage type I should look at for small projects like this, or is now as good as time as ever to explore sqlite?

SQLite is nice, but since you're chewing off quite a lot already[1] for a first project I'd say go for flat files (which do add their own problems) or like the above commenter said: check out serialization support so you can just read and write objects from disk.

You could also just check out SQLite before you start on this project. That might be a better approach.

[1] but not an unrealistic amount, it's not like first time gamedevs making an MMO

FieryBalrog
Apr 7, 2010
Grimey Drawer
SQLite is really easy to use.

Steve French
Sep 8, 2003

LeftistMuslimObama posted:

That feels wrong to me (not saying it is wrong, though). What use is an index of words where the key is documents? You'd just open the document and search it then. Your example of inverted index is exactly what I'd think of as an actual index. It is analogous to the index of an actual book.

How do you think you find the contents of a document given the ID of a document? Using an index. You get the ID of a document given (part of) the contents of a document with an inverted index.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Steve French posted:

How do you think you find the contents of a document given the ID of a document? Using an index. You get the ID of a document given (part of) the contents of a document with an inverted index.

I guess I'm being too literal, because this definition, to me, sounds like a directory in a file system is an index, for example. I'd always learned of indices as telling me the names/locations of the document (or analogous thing) keyed on contents.

csammis
Aug 26, 2003

Mental Institution

LeftistMuslimObama posted:

I guess I'm being too literal, because this definition, to me, sounds like a directory in a file system is an index, for example. I'd always learned of indices as telling me the names/locations of the document (or analogous thing) keyed on contents.

I think that's too specific a way of thinking. Indexes in the broadest sense just tell you the location of a object based on a key (like a page in a book when given the key of a topic's name). When you're indexing relational database tables and you're using normal database indexes the object in question is a row in the database and the key would be the table's primary key. In a forward index the object is the document's contents and the key is the document's ID or name or whatever. An inverted index uses the tokens / contents of a document as multiple keys pointing at its ID.

So yeah you could think of the fully qualified path of a file being a key to finding that file, but not an index itself.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

csammis posted:

I think that's too specific a way of thinking. Indexes in the broadest sense just tell you the location of a object based on a key (like a page in a book when given the key of a topic's name). When you're indexing relational database tables and you're using normal database indexes the object in question is a row in the database and the key would be the table's primary key. In a forward index the object is the document's contents and the key is the document's ID or name or whatever. An inverted index uses the tokens / contents of a document as multiple keys pointing at its ID.

So yeah you could think of the fully qualified path of a file being a key to finding that file, but not an index itself.

Would it be accurate to consider the full filesystem on disk to be an index, under this definition? In this case, being file contents indexed by document IDs?

csammis
Aug 26, 2003

Mental Institution

LeftistMuslimObama posted:

Would it be accurate to consider the full filesystem on disk to be an index, under this definition? In this case, being file contents indexed by document IDs?

There are some similarities in the particulars of implementation - many types of RDBMS indexes are built as b-trees - but in my opinion they're not that similar in concept. When a developer is talking about indexes they're almost certainly talking about indexes like they would about dictionaries. Given a key get this value, never mind how it's actually implemented because in the majority of cases it doesn't matter. SELECT * FROM DOCUMENTS WHERE ID = 1 does a lookup in a primary key index for the value 1 and returns the row found there. I don't think most developers think of filesystems like dictionaries. They're reasoned about like trees because they are trees* and they're accessed like trees. If someone was talking to me about filesystem indexes I'd think they were talking about search engines like Spotlight (which is an index of a filesystem but not the filesystem itself).




* Not literally every filesystem is a tree or a directed graph. Every so often someone proposes a filesystem that isn't hierarchical in structure and are modeled on more database-y concepts. WinFS was Microsoft's shot at such an idea but the project was cancelled. If someone says "filesystem" it's a pretty safe bet they're talking about a tree-based model until they smugly correct you.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe
On the other hand, as an abstraction you can think of the full path to a file being a key you can use to find that file. I've certainly had programs in the past where I mapped file paths to file objects.

csammis
Aug 26, 2003

Mental Institution
Right, like I said in my previous post :v:


And this is a great username / post combo

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

LeftistMuslimObama posted:

Would it be accurate to consider the full filesystem on disk to be an index, under this definition? In this case, being file contents indexed by document IDs?

I would consider a key requirement for something to be an index that you can regenerate it from the thing being indexed. This is not the case for a filesystem, as the files themselves do not store their names or locations. If you had some stupid filesystem where you could open files by name by iterating over every file and checking if it matches somehow, then the filename -> inode mapping would be an index.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

csammis posted:

There are some similarities in the particulars of implementation - many types of RDBMS indexes are built as b-trees - but in my opinion they're not that similar in concept. When a developer is talking about indexes they're almost certainly talking about indexes like they would about dictionaries. Given a key get this value, never mind how it's actually implemented because in the majority of cases it doesn't matter. SELECT * FROM DOCUMENTS WHERE ID = 1 does a lookup in a primary key index for the value 1 and returns the row found there. I don't think most developers think of filesystems like dictionaries. They're reasoned about like trees because they are trees* and they're accessed like trees. If someone was talking to me about filesystem indexes I'd think they were talking about search engines like Spotlight (which is an index of a filesystem but not the filesystem itself).




* Not literally every filesystem is a tree or a directed graph. Every so often someone proposes a filesystem that isn't hierarchical in structure and are modeled on more database-y concepts. WinFS was Microsoft's shot at such an idea but the project was cancelled. If someone says "filesystem" it's a pretty safe bet they're talking about a tree-based model until they smugly correct you.

OK, so an index of a file system by document name would be something like "given a document name, tell me where in the file system it is", and a reverse index would be "given a word, tell me all the documents that contain this word", and you could use the two together to get the locations of all documents that contain a given word?

Dr Monkeysee
Oct 11, 2002

just a fox like a hundred thousand others
Nap Ghost
I feel like you're looking for some platonic ideal of these terms. The details of an index is context-specific. An index just means "given THIS I can look up THAT". It might be primary ids to table rows, file paths to file contents, urls to web pages, whatever.

In a context where an index means "given THIS give me THAT" an inverted index just means "given THAT give me THIS". What THIS and THAT are is entirely up to what it is you're trying to do.

csammis
Aug 26, 2003

Mental Institution
I typed up almost exactly what Dr Monkeysee said before I saw it was posted

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

Dr Monkeysee posted:

In a context where an index means "given THIS give me THAT" an inverted index just means "given THAT give me THIS". What THIS and THAT are is entirely up to what it is you're trying to do.

OK, this makes sense. I was just hung up on how "reverse index" was initially defined in the conversation, because that type of "reverse index" is what we just call an "index" in our software (that is, given a value for a field, give me all the records that have that value in that field).

MeruFM
Jul 27, 2010
Is it worth making a single-column index on a column that has very few possible states, like a app-level managed enum?

I know indexes give a bigger boost on larger cardinalities but what's the threshold?
When doing a query like "where state = 'A'" and there's only A-E, does indexing speed that up at all or only worth it if you do a lot of read queries on it?

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


The performance boost from an index depends more on the size of the table and how it's laid out on disk than the number of values that it can take. Imagine that you have a table with a billion entries, and only five of them are A. If you don't have an index and your table isn't arranged according to the column you're interested in, then you have to scan the whole thing in order to do your query. If you have an index, you can just get those five entries with a lookup.

Sedro
Dec 31, 2008

ultrafilter posted:

The performance boost from an index depends more on the size of the table and how it's laid out on disk than the number of values that it can take. Imagine that you have a table with a billion entries, and only five of them are A. If you don't have an index and your table isn't arranged according to the column you're interested in, then you have to scan the whole thing in order to do your query. If you have an index, you can just get those five entries with a lookup.
And in that case you're better off creating a functional index on just the A's

22 Eargesplitten
Oct 10, 2010



I'm trying to work out a proof by induction of an algorithm I wrote. It's a shortest path algorithm that works a lot like a binary search tree, except it can have a lot more than two branches. It finds a path (that doesn't loop back), compares it to the length of the path found on the next branch, keeps the shortest, and keeps going like that until it gets all the way back up to the top. I know it works by experimentation, but I'm struggling to write a good induction statement. Here's what I've got:

Base: The shortest path from A->B is the shortest path between A and B.

Induction: The shortest path between any two points will also contain the shortest path between any two points on the path.

I'm also not really sure how to prove that. I know it's true, it makes sense, but I don't know how to prove it.

ultrafilter
Aug 23, 2007

It's okay if you have any questions.


You're gonna need to give a slightly more detailed description of the algorithm than that, preferably in pseudocode.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer
Start by trying to define your base and inductive cases in more concrete terms. You've described intuitively why your algorithm works, but your proof by induction needs actual concrete definitions to work.

In your base case, what you're trying to show is that the shortest path between Pi and Pi+1 is the edge directly linking them. Your recursive case is that for any node Pi and Pk, the shortest path between them is Pi and Pk-1 plus the path between Pk-1 and Pk.

The first case is trivially true. If you can prove the second case is true for some arbitrary pair of points and then also prove that its truth for one case implies its truth for every successive case, you've got your proof.

Dr. Stab
Sep 12, 2010
👨🏻‍⚕️🩺🔪🙀😱🙀

22 Eargesplitten posted:

Induction: The shortest path between any two points will also contain the shortest path between any two points on the path.


You want to prove that? Because that's easy.

Say you have a shortest path from A to C that passes through some arbitrary point B. Assume the path used from B to C is not shortest. That is, there exists some shorter path from B to C. Then we can construct a path from A to C through B using our shortest path from B to C that is shorter than the given shortest path from A to C, a contradiction.

JawKnee
Mar 24, 2007





You'll take the ride to leave this town along that yellow line

Dr. Stab posted:

You want to prove that? Because that's easy.

Say you have a shortest path from A to C that passes through some arbitrary point B. Assume the path used from B to C is not shortest. That is, there exists some shorter path from B to C. Then we can construct a path from A to C through B using our shortest path from B to C that is shorter than the given shortest path from A to C, a contradiction.

This is not a proof by induction though, unless I'm severely missing something

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

22 Eargesplitten posted:

I'm trying to work out a proof by induction of an algorithm I wrote. It's a shortest path algorithm that works a lot like a binary search tree, except it can have a lot more than two branches. It finds a path (that doesn't loop back), compares it to the length of the path found on the next branch, keeps the shortest, and keeps going like that until it gets all the way back up to the top.

So, enumerating all possible simple paths using depth-first search and taking the minimum? Well, whatever, you didn't claim it was a good algorithm.

You're going to have trouble stating an inductive principle that follows the recursion in your algorithm because your recursive calls don't work on strictly smaller problems because of the loop check. This is a general problem with graph algorithms. Usually, proofs on graphs have to build up something about the graph inductively, e.g. by inducting on the size of the graph or the set of edges.

22 Eargesplitten
Oct 10, 2010



Yeah it's an algorithm I wrote for a sophomore level class. I've already seen ways I could simplify it somewhat but nothing that would reduce the order from O((|V-1|)!).

So what you're saying is I should be trying to prove that all paths consist of individual edges, and what's true for edge AB will be true for edge BC on path ABC?

LeftistMuslimObama posted:

Start by trying to define your base and inductive cases in more concrete terms. You've described intuitively why your algorithm works, but your proof by induction needs actual concrete definitions to work.

In your base case, what you're trying to show is that the shortest path between Pi and Pi+1 is the edge directly linking them. Your recursive case is that for any node Pi and Pk, the shortest path between them is Pi and Pk-1 plus the path between Pk-1 and Pk.

The first case is trivially true. If you can prove the second case is true for some arbitrary pair of points and then also prove that its truth for one case implies its truth for every successive case, you've got your proof.

That makes a lot of sense as well. Now I just need to figure out how to prove it.

E: Okay, so this is what I've got.

P is the shortest path between the subscripted points. I was using S, but one of the pages I was reading used that for sum, and it seems like that's probably a standard notation for a sum function where sigma isn't available.

Pi,k = Pi,k-1 + Pk-1,k.

Base: P1,2 = P1,1 + P1,2. There is 0 distance between the same point, so P1,2 = 0 + P1,2. Therefore, P1,2 = P1,2.

Proof: Pi,k+1 = Pi,k + Pk,k+1. Subtract Pi,k from both sides, you get Pi,k+1 - Pi,k = Pi,k - Pi,k + Pi,k+1. Reduce, Pk,k+1 = Pk,k+1.

Does that seem right?

22 Eargesplitten fucked around with this message at 20:23 on Nov 8, 2015

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
It is really unclear what you're trying to prove there. I mean, that's the definition of path weight, yes.

You're trying to prove the correctness of your algorithm, right? So you're trying to prove that it produces a minimal path, which is to say, that there does not exist a path with lower weight. What I said last night might have been misleading — there is a sense in which your recursive calls work on a smaller problem, because of the overall effect of the ccle check. That will let you induct, if you can find it.

22 Eargesplitten
Oct 10, 2010



So I have to prove that my algorithm finds a correct path? It's basically establishing a path and then counting the distance as it returns. I wrote it up as pseudocode.

code:
int FindShortest
{
   While the current node is not the destination node
   {
     Mark this node visited
     For every unvisited node
     {
        FindShortest on the new node
        Add the returned distance and the distance between the current node and unvisited node (the distances are stored in an array).
        Compare that distance with the previous shortest distance
        If the returned distance is shorter, that's now the new shortest distance.
      }
      Return the shortest distance
   }
   If it was the destination node, return 0
}
So my base case is that the two nodes are the same, distance = 0. That has to be the shortest possible route, because it's not possible to have negative distances. Then the 0 gets returned, and has the distance between the returned node and the current node added to it. Then that distance is compared with the other distances, and the shortest is returned. I've been staring at this post for 20 minutes, and I'm not sure what I need to prove. Comparing the distances to find the shortest distance will return the shortest distance. Sending the shortest distance to be compared with other short distances will return the shortest distance. But I don't see how to put it mathematically.

22 Eargesplitten fucked around with this message at 00:51 on Nov 9, 2015

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

22 Eargesplitten posted:

So I have to prove that my algorithm finds a correct path?

That's certainly what "a proof [...] of an algorithm" means, as you put it in your first post, and it's what we've all been assuming you're trying to prove.

I get the algorithm you're trying to describe, and there definitely is a correct algorithm like that. Like I said before, it's just a depth-first search to enumerate all paths. But for what it's worth, your pseudo-code is imprecise about something important (what happens if there are no edges from the current node to an unvisited node?) and slightly wrong about how it tracks visitation.

It sounds like you're either not sure what to induct over, or not sure how to invoke induction. The latter is easy: you just say "here I have a call to FindShortest on <a problem that's smaller in some way>, and by the inductive hypothesis I know that it gives me a correct answer to that problem, thus when I use that result, ...". But the trick is that it does need to be a smaller problem in some definable way that you can legitimately induct over, and you need to be conscious of the implicit "parameters" to your function, like this ability to mark nodes as visited that appears out of nowhere in the middle of your pseudo-code.

Adbot
ADBOT LOVES YOU

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Sedro posted:

And in that case you're better off creating a functional index on just the A's

How does that help in that case? I've never seriously used the two databases that support functional indexes, so this is a new thing to me.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply