csammis
Aug 26, 2003

Mental Institution

Sedro posted:

And in that case you're better off creating a functional index on just the A's

That presumes the table with a billion rows is only ever being queried for the 'A's. Without knowing a larger set of use cases for the index it's probably not a good idea to just blanket recommend functional-on-a-subset indexing.

MeruFM
Jul 27, 2010

Munkeymon posted:

How does that help in that case? I've never seriously used the two databases that support functional indexes, so this is a new thing to me.

Since it's only indexing values of the function, it's only useful for subsets of that function but as a result much smaller. Not sure about its effect on insertion speed

re original answer, thanks for the info. I was unsure if enum columns naturally just sort so a binary search would have been good enough to grab all 5 in a theoretical 500k lines or whatever.

Tea Bone
Feb 18, 2011

I'm going for gasps.
I'm sure this must have a simple solution but I just can't seem to wrap my head around how to phrase the query.

I have a database for a booking system with columns 'start_date' and 'end_date' (stored as unix timestamps). On the front end of the software the user selects a month and the booked days in that month are pulled from the database. At the moment it looks like this "SELECT * from bookings where [1st of the month] <= start_date AND [last of month] >= end_date" Which kind of works but we get a problem if the booking crosses a month, for example if the start date is 30th of October and the end date is 2nd of November, if we're looking at November, obviously the start_date falls outside the query.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe

Tea Bone posted:

I'm sure this must have a simple solution but I just can't seem to wrap my head around how to phrase the query.

I have a database for a booking system with columns 'start_date' and 'end_date' (stored as unix timestamps). On the front end of the software the user selects a month and the booked days in that month are pulled from the database. At the moment it looks like this "SELECT * from bookings where [1st of the month] <= start_date AND [last of month] >= end_date" Which kind of works but we get a problem if the booking crosses a month, for example if the start date is 30th of October and the end date is 2nd of November, if we're looking at November, obviously the start_date falls outside the query.

So the user wants to see all days in a given month that are booked, and you may have bookings that start before the month or end after it but still have booked days during it? You need to do a basic interval intersection test. You have two ranges: [A1, B1] is the month you're looking at, and [A2, B2] are the start and end dates of a given booking. The easiest way to do this kind of test is to check for the situations where the two intervals don't intersect, which are fairly easy to identify. If B2 < A1, then the booking ends before the month starts. If A2 > B1, then the booking starts after the month ends. Any other combination results in an intersection -- go ahead and imagine the possible arrangements of A2 and B2 with respect to A1 and B1. There aren't that many possibilities.
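To make the intersection test above concrete, here is a minimal Python sketch. The function name `overlaps`, the helper `ts`, and the sample bookings are made up for illustration; only the two "don't intersect" conditions come from the post. Negating them gives a WHERE clause of the form `start_date <= [last of month] AND end_date >= [1st of the month]`, which also covers a booking that spans the whole month.

```python
from datetime import datetime, timezone


def overlaps(month_start, month_end, booking_start, booking_end):
    """Return True when [booking_start, booking_end] intersects [month_start, month_end]."""
    # The ranges miss each other only if the booking ends before the month
    # starts (B2 < A1) or starts after the month ends (A2 > B1).
    ends_before = booking_end < month_start
    starts_after = booking_start > month_end
    return not (ends_before or starts_after)


def ts(year, month, day):
    """Hypothetical helper: unix timestamp for midnight UTC on the given day."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Illustrative data only: the month being viewed and three sample bookings.
november = (ts(2015, 11, 1), ts(2015, 11, 30))
bookings = {
    "oct30_to_nov2": (ts(2015, 10, 30), ts(2015, 11, 2)),
    "oct_to_dec": (ts(2015, 10, 15), ts(2015, 12, 5)),
    "all_october": (ts(2015, 10, 1), ts(2015, 10, 20)),
}

hits = [name for name, (start, end) in bookings.items()
        if overlaps(*november, start, end)]
print(hits)
```

The first two bookings are reported as hits and the October-only one is not. Working out which individual days inside the month are booked still has to happen after the rows are fetched, by clamping each booking to the month's boundaries.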

22 Eargesplitten
Oct 10, 2010



rjmccall posted:

That's certainly what "a proof [...] of an algorithm" means, as you put it in your first post, and it's what we've all been assuming you're trying to prove.

I get the algorithm you're trying to describe, and there definitely is a correct algorithm like that. Like I said before, it's just a depth-first search to enumerate all paths. But for what it's worth, your pseudo-code is imprecise about something important (what happens if there are no edges from the current node to an unvisited node?) and slightly wrong about how it tracks visitation.

It sounds like you're either not sure what to induct over, or not sure how to invoke induction. The latter is easy: you just say "here I have a call to FindShortest on <a problem that's smaller in some way>, and by the inductive hypothesis I know that it gives me a correct answer to that problem, thus when I use that result, ...". But the trick is that it does need to be a smaller problem in some definable way that you can legitimately induct over, and you need to be conscious of the implicit "parameters" to your function, like this ability to mark nodes as visited that appears out of nowhere in the middle of your pseudo-code.

Sorry, I always seem to either make pseudocode almost exactly code or too abstracted. My problem is I'm not sure what to induct on. I think my base case is going to be that if the start is the end, the distance will be zero, and zero has to be the shortest. But I'm not sure where to go from there.

Is it possible to use multiple equations for one algorithm's proof? In class we always proved very simple algorithms.

I'm not asking for you to tell me what I need to do, I get that you're trying to have me figure it out, and I appreciate that. I just feel like I'm missing something important.

To my shame, everything is adjacent to everything. I need to rewrite the poo poo out of this program. I see a ton of ways to make it quicker / easier to read.

I guess I didn't put in a case for handling there being no unused nodes because of that. Eventually, it's going to hit the correct node before it runs out.

Steve French
Sep 8, 2003

Tea Bone posted:

I'm sure this must have a simple solution but I just can't seem to wrap my head around how to phrase the query.

I have a database for a booking system with columns 'start_date' and 'end_date' (stored as unix timestamps). On the front end of the software the user selects a month and the booked days in that month are pulled from the database. At the moment it looks like this "SELECT * from bookings where [1st of the month] <= start_date AND [last of month] >= end_date" Which kind of works but we get a problem if the booking crosses a month, for example if the start date is 30th of October and the end date is 2nd of November, if we're looking at November, obviously the start_date falls outside the query.

Have fun with the case where you're looking at dates in November and there exists a booking from October to December.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
If this code is actually in use, and you have a complete graph so the factorial explosion is real, then you should really, really just implement Dijkstra's or some other non-exponential shortest path algorithm. But let's try to work out the proof of this one for now.

You can't induct over the length of the minimal path. One way to see that that can't possibly work is to mentally flip the induction around: if you start with the base case, and apply the induction step iteratively, can you get to all possible graphs, or do you just get some arbitrary, unknowable subset of graphs?

I think the problem you're having is that you keep thinking that you have to prove the solution correct, and so you can prove something by inducting on the structure of the solution. That's not how it works. You have to prove that your algorithm works on an arbitrary graph, which means you have to induct somehow on the graph you start with. That lines up well with the fact that it's a recursive algorithm; if you can show that the recursive call works on a strict sub-problem, then induction will tell you that it works correctly on the sub-problem, and you can use that fact together with the local behavior of your algorithm to show that it works correctly on the larger problem. So what is it about your recursive calls that makes them operate on a strict sub-problem?

22 Eargesplitten
Oct 10, 2010



I'm not sure what you mean by strict sub-problem. Do you mean it's a smaller part of the whole problem because it calls the same function, but with a graph one node smaller?

If I understand you right, you're saying that I can't prove the shortest paths starting from the destination, because we don't know what the other paths hold. Is that correct? That was the conclusion I was coming to.

Should I be assuming that the recursive call brings back the correct distance, and then finding the smallest distance based off of those assumed correct numbers and the calculations in the next step?

E: this isn't in use, this was just a class project. The more I look at it, the less I like it, but this was one of my most ambitious projects. I'm having library conflicts on VS 2015, so I might just rewrite it completely.

22 Eargesplitten fucked around with this message at 21:03 on Nov 9, 2015

nielsm
Jun 1, 2009



For the recursive case you have to prove that:
1. It makes the problem space smaller in a meaningful way (i.e. fewer nodes and/or edges)
2. That the reduction in problem space will eventually cause any graph to reduce to the base case
3. That the recursive step chooses the correct solution to the sub-problem

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

nielsm posted:

For the recursive case you have to prove that:
1. It makes the problem space smaller in a meaningful way (i.e. fewer nodes and/or edges)
2. That the reduction in problem space will eventually cause any graph to reduce to the base case
3. That the recursive step chooses the correct solution to the sub-problem

Addendum: when proving the recursive call, it is generally acceptable to prove "assuming that the previous/next (depending on how you model it in your head) call returned the correct solution, the current recursive call returns the correct solution". Combined with proving the base case, you can then prove that each recursive call is necessarily correct based on the results of the previous one, with the most previous one being the base case.

So, the biggie here is that it doesn't sound like your algorithm does number 1 in nielsm's list.

22 Eargesplitten
Oct 10, 2010



It ignores more and more of the nodes as it goes on, which seems similar. I'm going to see if I can change it for each instance to delete them and fill in with the one previously at the end, though. Like the problem in the newbie thread OP.

I'll see if I can figure it out from here. Can I use multiple equations for different parts of the algorithm, or do they have to be unified?

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

22 Eargesplitten posted:

It ignores more and more of the nodes as it goes on, which seems similar. I'm going to see if I can change it for each instance to delete them and fill in with the one previously at the end, though. Like the problem in the newbie thread OP.

I'll see if I can figure it out from here. Can I use multiple equations for different parts of the algorithm, or do they have to be unified?

Your equations should represent what the algorithm is doing. You should have one for your base case and one for your recursive case. If you can't define a single equation for all of your recursive calls, how do you induct on it?

nielsm
Jun 1, 2009



22 Eargesplitten posted:

I'll see if I can figure it out from here. Can I use multiple equations for different parts of the algorithm, or do they have to be unified?

I'm mostly familiar with equations being used for time or space complexity proofs, not correctness proofs. Sure use equations if they make expressing the proof simpler, but overall I'd expect the proof to be textual.

The MUMPSorceress
Jan 6, 2012


^SHTPSTS

Gary’s Answer

nielsm posted:

I'm mostly familiar with equations being used for time or space complexity proofs, not correctness proofs. Sure use equations if they make expressing the proof simpler, but overall I'd expect the proof to be textual.

The way I learned it, you should try to express your recursive case as an equation so you can substitute the smaller case into the larger one to prove that the smaller case being correct necessitates the larger case being correct. They're not strict equations, but more representing what's happening in the algorithm using variables with subscripts that represent what step they attained their values at.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe
That kind of formalism works amazingly for really simple proofs, but is very awkward to maintain for a proof of any sophistication without the help of an actual proof environment, which has other drawbacks. Also, the predicate here is not really an equation.

Anyway, if you get the core insight that the visitation set means you're effectively recurring on a graph with one fewer node in it, you can just argue (convincingly) that the set has that effect, and then proceed to make the rest of the argument as if the graph were actually smaller; you don't need to literally copy the graph every time just to make the proof easier.
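As a rough illustration of that insight, here is a Python sketch of an exhaustive depth-first shortest-path search. It is not the original poster's code, which never appears in the thread; the name `find_shortest`, the dict-of-dicts graph layout, and the sample weights are all assumptions, and edge weights are assumed non-negative so that the zero base case really is the shortest. The visited set is passed explicitly, so each recursive call sees a graph with one more node excluded, and that shrinking quantity is what the induction runs over.

```python
import math


def find_shortest(graph, current, goal, visited=frozenset()):
    """Length of the shortest simple path from current to goal, or math.inf if none."""
    # Base case: we are already at the goal, so the distance is zero.
    if current == goal:
        return 0
    # Excluding `current` is the "one fewer node" step: every recursive call
    # below works on a strictly smaller set of usable nodes.
    visited = visited | {current}
    best = math.inf
    for neighbor, weight in graph[current].items():
        if neighbor in visited:
            continue
        # Inductive hypothesis: the call on the smaller problem is correct.
        best = min(best, weight + find_shortest(graph, neighbor, goal, visited))
    # If no unvisited neighbor exists, best stays math.inf (no path this way).
    return best


# Illustrative graph only: node -> {neighbor: edge weight}.
graph = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
    "d": {},
}
print(find_shortest(graph, "a", "c"))
print(find_shortest(graph, "a", "d"))
```

This also handles the case raised earlier in the thread, where there is no edge from the current node to an unvisited node: the loop body never runs and the call returns infinity. On a complete graph it is still factorial time, so it is a sketch for the proof and not a replacement for Dijkstra's.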

22 Eargesplitten
Oct 10, 2010



Okay. In that case, I will just do a written proof, I find that a hell of a lot easier. I thought that wasn't considered to be a real proof.

Tea Bone
Feb 18, 2011

I'm going for gasps.

TooMuchAbstraction posted:

So the user wants to see all days in a given month that are booked, and you may have bookings that start before the month or end after it but still have booked days during it? You need to do a basic interval intersection test. You have two ranges: [A1, B1] is the month you're looking at, and [A2, B2] are the start and end dates of a given booking. The easiest way to do this kind of test is to check for the situations where the two intervals don't intersect, which are fairly easy to identify. If B2 < A1, then the booking ends before the month starts. If A2 > B1, then the booking starts after the month ends. Any other combination results in an intersection -- go ahead and imagine the possible arrangements of A2 and B2 with respect to A1 and B1. There aren't that many possibilities.

Great, thanks! This is one of those things we take for granted how easy it is for a human to work out.

Steve French posted:

Have fun with the case where you're looking at dates in November and there exists a booking from October to December.

Yeah I know. There shouldn't be that many cases where this happens but there's a possibility I guess.

LP0 ON FIRE
Jan 25, 2006

beep boop
I have one table missing from the sidebar on phpMyAdmin. How do I get it back? I can still access the table if I click on the database and go to the list of tables.

meatbag
Apr 2, 2007
Clapping Larry
Does anybody know of any good books to teach me a better programming mindset? I just returned to university to take a bachelors in informatics, and I think it's really interesting. We're learning Java in the introductory class, going to learn C in two semesters.

My problem is that I am coming from a political science background, and I don't really know much math. My programming assignments tend to work, just be really convoluted and impractical. It's a weird request, but I feel like I can't quite grasp how a good program should be constructed.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
It's not a strange request, we've all been there. I'm not aware of any material for this -- it just took me a lot of practice and thinking. If you have specific questions about your thinking or your code, sometimes it can help to ask how we would solve it.

TooMuchAbstraction
Oct 14, 2012

I spent four years making
Waves of Steel
Hell yes I'm going to turn my avatar into an ad for it.
Fun Shoe
Honestly this isn't something that I think most people explicitly train in. It's just a matter of a) practice, b) working with well-written codebases (as an example), c) working with shittily-written codebases (as a counter-example), and d) getting advice from others (who hopefully know better than you do).

And I think the counter-examples are at least as important as the examples. Having to debug code written by someone who uses single-letter variable names (for things other than loop iterators/coordinates) is a great way to teach you how important descriptive variable names are. Having to manually trace where some library call comes from because the previous person abused namespaces / did "import *" / etc. will teach you how important proper namespacing is. Extending well-written code is obviously, vastly easier than extending badly-written code.

"Fortunately", if you ever get a job in software development, bad examples are extremely easy to come by...

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

meatbag posted:

Does anybody know of any good books to teach me a better programming mindset? I just returned to university to take a bachelors in informatics, and I think it's really interesting. We're learning Java in the introductory class, going to learn C in two semesters.

My problem is that I am coming from a political science background, and I don't really know much math. My programming assignments tend to work, just be really convoluted and impractical. It's a weird request, but I feel like I can't quite grasp how a good program should be constructed.

Find books on programming patterns and things like Code Complete. Helping you name variables and functions, eliminating redundant code, and logically grouping/ordering things is a huge improvement.

There's also the 'read other peoples code', but if you don't know what you're looking for it can be kind of pointless. Maybe find someone who can help point stuff out.

Cryolite
Oct 2, 2006
sodium aluminum fluoride
How do people generally commit stuff to GitHub without including sensitive information like API keys that could be a big problem if they become public?

For some of my projects I've been stuffing sensitive keys/passwords into a "keys.json" file that I specify in my .gitignore, and then reading data out of that file when the application starts up.

Is there a better way of doing this, or some industry-preferred method? It seems pretty ad-hoc.

nielsm
Jun 1, 2009



Cryolite posted:

How do people generally commit stuff to GitHub without including sensitive information like API keys that could be a big problem if they become public?

For some of my projects I've been stuffing sensitive keys/passwords into a "keys.json" file that I specify in my .gitignore, and then reading data out of that file when the application starts up.

Is there a better way of doing this, or some industry-preferred method? It seems pretty ad-hoc.

Have the software look for a configuration file. Include a template configuration file, under a different name, in the repository. Add the name of the actual config file, the user/developer should use, to .gitignore.

Peristalsis
Apr 5, 2004
Move along.

meatbag posted:

Does anybody know of any good books to teach me a better programming mindset? I just returned to university to take a bachelors in informatics, and I think it's really interesting. We're learning Java in the introductory class, going to learn C in two semesters.

My problem is that I am coming from a political science background, and I don't really know much math. My programming assignments tend to work, just be really convoluted and impractical. It's a weird request, but I feel like I can't quite grasp how a good program should be constructed.

In addition to what others said, keep two more things in mind:
1) The "right" approach depends a lot on the project and who is (and will be) working on it. For example, over-engineering a small, simple app with every design pattern you can shoehorn into it is not a good thing. Also, you want your code to be maintainable by whoever comes after you, and if those people are almost certain to be idiots, you'll be doing them a favor if you avoid clever tricks and subtle optimizations. Likewise, computationally intensive apps may have to value optimization above code elegance.
2) Your tastes and preferences will evolve during your career. Even if you make a perfectly valid set of choices for an app 3 years into your first job, you might find that you take a completely different approach if you do the exact same project 15 years later. Neither is necessarily right or wrong, but the more cool things you read about, the more of them you'll want to try out, and different ones appeal to you more at different times.

For now, I suggest trying to get things done in your programs as simply as possible. Don't worry about optimizing performance (as long as the programs are usable), just worry about not making stuff harder than it needs to be - that's challenging enough. If your instructors don't go over actual code in class, find ways to compare what you do on your assignments with others, or go over your program structures with a TA. Something that looks like an esoteric trick in lecture becomes pretty easy to remember and internalize when you realize it could have saved you 12 hours of coding and testing the more roundabout route you took. Ultimately, it's just a matter of experience to be able to determine when going down a chain of annoying detours and indirections is due to a bad approach, and when it's necessary because of the complexity of the program.

And most of the iconic software development books that people talk about are going to be pretty hard to digest when you don't have a good number of classes (and maybe some real-world experience) under your belt. Don't let that discourage you, just understand that reading the Gang of Four book on design patterns is going to be a pretty frustrating experience until you have the background to understand the structures they describe (which can be pretty abstract), and the experience (i.e. large projects that failed horribly due to lack of a coherent overall structure) to appreciate why design patterns are useful.

Sex Bumbo
Aug 14, 2004
E:nm

Sex Bumbo fucked around with this message at 19:16 on Nov 12, 2015

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Cryolite posted:

How do people generally commit stuff to GitHub without including sensitive information like API keys that could be a big problem if they become public?

For some of my projects I've been stuffing sensitive keys/passwords into a "keys.json" file that I specify in my .gitignore, and then reading data out of that file when the application starts up.

Is there a better way of doing this, or some industry-preferred method? It seems pretty ad-hoc.

That's a fine way. Another is to read your stuff from environment variables.
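A minimal sketch of combining the two approaches mentioned so far, assuming Python: check an environment variable first, then fall back to the git-ignored "keys.json" file. The function name `load_secret`, the `API_KEY` name, and the flat JSON layout are hypothetical, not something from the thread.

```python
import json
import os
from pathlib import Path


def load_secret(name, config_path="keys.json", environ=os.environ):
    """Return a secret from the environment, else from an untracked JSON file."""
    # Environment variables take priority, so a deployment can override the file.
    if name in environ:
        return environ[name]
    # Fall back to the local file that is listed in .gitignore.
    path = Path(config_path)
    if path.exists():
        with path.open() as handle:
            keys = json.load(handle)
        if name in keys:
            return keys[name]
    raise KeyError(f"secret {name!r} not found in environment or {config_path}")


# Usage with an explicit mapping standing in for the real environment.
api_key = load_secret("API_KEY", environ={"API_KEY": "example-value"})
print(api_key)
```

A template such as "keys.example.json" with dummy values can be committed alongside this so new developers know which keys to fill in.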

JawnV6
Jul 4, 2004

So hot ...
I haven't used it, but this purports to address that issue: https://code.sealedabstract.com/drewcrawford/FISA

hooah
Feb 6, 2006
WTF?
I've got a Matlab function that I want to test for a range of input parameters. I'd like to store the 2-dimensional result as elements in a multi-dimensional array. That is, if there are three parameters for the function, I create a 3-dimensional array, each element of which is a 2-dimensional array. Is there a way to do that, or will I have to add a couple dimensions to my storage array and then try to figure that out on the other end?

meatbag
Apr 2, 2007
Clapping Larry
Thanks for the advice guys :)

gariig
Dec 31, 2004
Beaten into submission by my fiance
Pillbug

meatbag posted:

Thanks for the advice guys :)

If you have time I highly suggest redoing older assignments but not the one you just did. Also, don't look at your old code. Start from scratch and see if you come up with a better way of doing the assignment. If you can find someone to talk to, do that.

For books I liked Pragmatic Programmer but it's hard to suggest when you are just getting started. It's more for someone with a little experience under their belt. I'd stick it in your Amazon wishlist and come back to it later. For now, write lots and lots of code. Try working ahead in your coursework or find another book about the languages you are using or will be using.

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Are there some tutorials that explain how to do high-availability stuff? Not at the 'cloud' level but how you would manually implement it.

Imagine this, a file uploading service, with a web front end that runs on 2 servers in case one gets busy or goes down. You upload a file and then it gives you the URL you can then access the file from. I'm imagining a load balancer handling these front end servers.

Now behind the scenes, whichever of the 2 servers gets the uploaded file, writes it out to say 3 file servers, and stores it in some redundant pattern. Perhaps splitting it up into chunks so that if one of the servers goes down you can read it from the other two.

All the while there's a database that tracks what files are where etc.

What is all that poo poo called and how do I learn more about it?

Also, what's the best way to spin up a couple VM's to play around with this instead of buying a bunch of cheap VPS's from some crappy provider on LowEndBox?

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Cryolite posted:

How do people generally commit stuff to GitHub without including sensitive information like API keys that could be a big problem if they become public?

For some of my projects I've been stuffing sensitive keys/passwords into a "keys.json" file that I specify in my .gitignore, and then reading data out of that file when the application starts up.

Is there a better way of doing this, or some industry-preferred method? It seems pretty ad-hoc.

This sounds like a problem for Consul or Zookeeper. Maybe it's time to hire some devops?

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Cryolite posted:

How do people generally commit stuff to GitHub without including sensitive information like API keys that could be a big problem if they become public?

For some of my projects I've been stuffing sensitive keys/passwords into a "keys.json" file that I specify in my .gitignore, and then reading data out of that file when the application starts up.

Is there a better way of doing this, or some industry-preferred method? It seems pretty ad-hoc.

encrypt it: https://github.com/mozilla/sops

nielsm
Jun 1, 2009



Bob Morales posted:

What is all that poo poo called and how do I learn more about it?

Sounds mostly like traditional Distributed Systems stuff, you should be able to find at least 20 years worth of literature on it.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Bob Morales posted:

Are there some tutorials that explain how to do high-availability stuff? Not at the 'cloud' level but how you would manually implement it.

Imagine this, a file uploading service, with a web front end that runs on 2 servers in case one gets busy or goes down. You upload a file and then it gives you the URL you can then access the file from. I'm imagining a load balancer handling these front end servers.

Now behind the scenes, whichever of the 2 servers gets the uploaded file, writes it out to say 3 file servers, and stores it in some redundant pattern. Perhaps splitting it up into chunks so that if one of the servers goes down you can read it from the other two.

All the while there's a database that tracks what files are where etc.

What is all that poo poo called and how do I learn more about it?

Also, what's the best way to spin up a couple VM's to play around with this instead of buying a bunch of cheap VPS's from some crappy provider on LowEndBox?

We've reinvented this a bunch and given it a bunch of different names. At one point it was called "distributed systems". Then we started calling it "service-oriented-architecture" for some reason.

You can be as high-tech or low-tech about this as you want. The extremely low-tech solution would be to install HAProxy or nginx or some tool that lets you load balance requests between two other systems.

Your white-label MegaUpload clone would talk to some distributed file server to store data. Again, you could go as dumb as NFS + RAID, or you could go with OpenStack Swift or a CDN as your object store.

If you want your applications to restart even when they're down, then you need something which does "cluster orchestration", which is an extremely fancy and vague term for "keeping more than two computers working together". Mesos seems to be a popular tool for this, but CoreOS has their Fleet, Docker has their Docker-Swarm, Google has Kubernetes, and AWS has whatever product that nobody uses.

Reminder that a company here usually starts in some position and then wants to capture all of the market. Docker, CoreOS, Hashicorp, Mesosphere are all major players in the "cluster space", and Google, Rackspace and Red Hat are trying their damnedest. I have no idea who's winning. Nobody does.

And all of them get swamped by the occasional OSS dump from Netflix, Airbnb, Pinterest, Twitter or LinkedIn.

It's now my job to do "cloud poo poo" and trying to find what's going on is nearly impossible. We at work decided on Mesos/Marathon for scheduling/deployment, Ansible for the initial provision of the Mesos slaves, HAProxy/marathon-bridge (but maybe Synapse eventually! Who knows? We might want to run stuff outside of Marathon at some point!) for service discovery, and then AWS/EC2 for running the drat thing.

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Bob Morales posted:

Are there some tutorials that explain how to do high-availability stuff? Not at the 'cloud' level but how you would manually implement it.

yes, but you're asking two different questions here

quote:

Imagine this, a file uploading service, with a web front end that runs on 2 servers in case one gets busy or goes down. You upload a file and then it gives you the URL you can then access the file from. I'm imagining a load balancer handling these front end servers.

this is easy

you stand up multiple stateless web front ends behind something like HAProxy or AWS ELB. where i work all our services run on either the jvm or the erlang vm and are trivial to jail so we don't use docker or any other containers. we just use AWS autoscale groups and AWS codedeploy for provisioning. some autoscale groups contain a single application, some contain more. each application gets its own AWS ELB. this kind of stuff is usually called 'devops' these days

quote:

Now behind the scenes, whichever of the 2 servers gets the uploaded file, writes it out to say 3 file servers, and stores it in some redundant pattern. Perhaps splitting it up into chunks so that if one of the servers goes down you can read it from the other two.

All the while there's a database that tracks what files are where etc.

What is all that poo poo called and how do I learn more about it?

this is hard

first you have to learn about consistency and availability. most people have heard of the CAP theorem (you can only get two of consistency, availability and partition tolerance, and you have to pick partition tolerance) but almost no one can define consistency or availability as they are used in the CAP theorem. consistency as used in CAP means a very specific type of consistency where there is one and only one possible history of operations (writes and reads) that all participants (servers and clients) share. availability means that if a server has not failed (and usually that means failed as in stopped completely) any client must be able to get a reply to any request and the server isn't allowed to reply 'try later' or 'sorry not right now'. if you relax these constraints, you can cheat CAP

systems like cassandra, zookeeper, postgres, s3, et cetera all make different tradeoffs. cassandra provides consistent transactions with availability, zookeeper provides consistency with an option to forgo it when availability is important and stale data is tolerable. postgres has a dial you can turn from full availability to full consistency depending on needs at the moment. s3 is available with only vague promises of eventual consistency

once you learn about consistency and availability (this is a really good starting point) and you figure out what you need and what you can actually get implementation of solutions usually comes down to grabbing the best fit off the shelf, or, if you're ambitious, reading a bunch of research papers by peter bailis, leslie lamport, fb schneider, marc shapiro and eric brewer. chris meiklejohn has a fantastic list here

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Also, even though it's not done yet, I've actually gotten quite a lot of mileage out of Designing Data-Intensive Applications.

zfleeman
Mar 12, 2014

I wonder how you spell Tabasco.
The web design/development thread must be on DB3, or something, so I'm going to ask my potentially asinine question here.

I run a headless server in my basement that hosts my low-traffic websites/low-traffic every other project I work on. It's dinky and held together with paperclips and duct tape, but it has worked for me for the most part for many years. Well, I'm kind of fed up with random outages and router+AT&T modem issues, so I'm looking to migrate a lot of the things I do on my server to some hosting service. My roommate suggested that I use a LAMP instance with AWS, and even though that EC2 service is free for 12 months, it kind of sounds like overkill for what I need it to do: host a couple simple websites I have built with HTML and PHP. I also need to upload four JPEGs a minute via FTP for my security camera.

I'm not super well-versed on what AWS can provide me. Most, if not all of my knowledge in this realm is self-taught without any formal training, so forgive me if I'm being incredibly ignorant. I think AWS would get the job done for what I need it to do, but if there is a cheaper long-term solution to my very simple hosting needs, I'm all ears.

TLDR: I need a cheap/low-maintenance host. Any suggestions?

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



zfleeman posted:

TLDR: I need a cheap/low-maintenance host. Any suggestions?

Check in http://forums.somethingawful.com/showthread.php?threadid=3289126 ?

You'd probably be fine with a cheap shared host.
