MononcQc
May 29, 2007

Shaggar posted:

actually, out of curiosity how are you collecting their logs right now? do you hijack common logging libs and/or provide them an official heroku log api?

Stdout (or whatever) data is piped to individual syslog services, which then push it to the router. Our router basically aggregates various standard syslog transports from different components in the system (the app itself, routing logs, DB logs, etc.) and re-routes them to user-configured endpoints. These can be the command line 'heroku' tool (lets you see recent lines and tail them live) or an API call, and you can ask the router to redirect the log streams to whatever endpoints you want (TCP Syslog, Syslog over HTTPS if you ask for it), in any combination whatsoever.
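
If it helps to picture the TCP-Syslog side, here's a rough sketch in Python (not our actual code; the drain config and app id are made up) of framing RFC 5424-style lines with RFC 6587 octet counting and relaying them to whatever TCP endpoints a user configured:

code:

import socket

# hypothetical per-app drain config; the real thing is user-configured through the API
DRAINS = {
    "app-1234": [("logs.example.com", 6514), ("10.0.0.5", 601)],
}

def frame(msg):
    # RFC 6587 octet counting: "<length> <message>" so receivers can split batched frames
    return str(len(msg)).encode() + b" " + msg

def relay(app_id, lines):
    # one connection per app so buffers aren't shared between users
    for host, port in DRAINS.get(app_id, []):
        with socket.create_connection((host, port), timeout=5) as sock:
            for line in lines:
                sock.sendall(frame(line))

relay("app-1234", [b"<134>1 2013-09-17T19:19:00Z host app - - - hello from the router"])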

Shaggar
Apr 26, 2006
ah, so you're looking for a way to handle the redirect to a custom endpoint and what format and protocol that should look like?

Shaggar
Apr 26, 2006
if that's the case i think sticking w/ syslog is fine cause everything has a way to parse syslog. if you want to cleanup the stuff before it you're looking at a project to develop a common logging library for all hosted languages

MononcQc
May 29, 2007

Shaggar posted:

ah, so you're looking for a way to handle the redirect to a custom endpoint and what format and protocol that should look like?

Pretty much. Multiple endpoints, in fact. You could have 4 TCP-Syslog endpoints that go to different services and an HTTP one to your own special sauce thingy, while still using the CLI tools to read them live. Or you could have none and just shut it all down.

The HTTPS pipe was added for specific users who felt their logs were somewhat sensitive and didn't feel safe sending them over plain TCP across different data centers. TCP/TLS syslog could have been used instead, but at this point I have no idea what went into that decision, as it happened way before I joined.

Shaggar posted:

if that's the case i think sticking w/ syslog is fine cause everything has a way to parse syslog. if you want to cleanup the stuff before it you're looking at a project to develop a common logging library for all hosted languages

Yeah, afaik they went with syslog specifically because everything uses and supports it.

Notorious b.s.d.
Jan 25, 2003

by Reene

MononcQc posted:

It's hairy for the custom logging protocol to be built on HTTP, but it shares some things for practical reasons. Syslog messages are individually stateless, and HTTP shares that. Syslog APIs are explicitly client-server, because that's how logs are produced and consumed, and HTTP shares that.

These considerations haven't been chucked out for our bizarro logging meta-protocol.

Interoperation is a given problem because nobody else is using it except the consumers who ask for such features. At best, the format is trivial and uses HTTP as a wrapper, and optionally attaches some metadata (that can safely be ignored). By default we push for the RFC-standard TCP-Syslog implementation, though, and the bizarro HTTP(s) one is only given on-demand if the need arises.

if this is the case, where did your gigantic insane list of bullet points come from?

because the system you just described is way simpler and probably inadequate to the requirements you specified earlier

fritz
Jul 26, 2003

tef posted:

still, if your dataset fits in memory or a single disk then hadoop isn't likely going to be cost effective

otoh let's get a whole bunch of machines, install hdfs, then a job manager to run over a dataset in parallel, which each step reading from and serializing to disk


yeah if you can fit that poo poo into memory you load it up and run w/ it

Notorious b.s.d.
Jan 25, 2003

by Reene

fritz posted:

yeah if you can fit that poo poo into memory you load it up and run w/ it

Don't underestimate what fits in memory.


You can get an x86 box with 3.0 TB of RAM now. That box is 3 TB of RAM + 40 cores inside 5 rack units. Pretty soon it will be 3TB + 48 or 64 cores.

MononcQc
May 29, 2007

Notorious b.s.d. posted:

if this is the case, where did your gigantic insane list of bullet points come from?

because the system you just described is way simpler and probably inadequate to the requirements you specified earlier

These can be user demands or requirements depending on their apps and use cases.

The system I just described uses multiple protocols to support all or most of the bullet points from my list, on demand, depending on what each customer needs. You can use whatever. Here's the breakdown of how these 2-3 transport formats manage to cover them:

quote:

maintain long-lived connections (reconnecting is always slow)

This is supported both by HTTP(s) (1.1 has keepalive by default) and TCP by design.

allow for multiple sessions/connections between any two endpoints (i.e. a service provider should be able to receive data on behalf of many different users from the same node)

This is supported by TCP on a per-connection basis, and can be done with HTTP too, given the information is log-specific. Still, we open a connection per app so that their buffers are not shared in software.

allow processing timeouts (infinite timeouts are a bad thing)

This can be done both ways, with any kind of timeout on send while waiting for an ack or a response. Because RFC 5424 syslog also works over UDP, we can time out client-side only (the router, in this case) and declare a loss; if we're really unsure and it could have a bad effect, we kill the connection and start again.

support control flow mechanisms (we can drop messages, but must be able to know about it to report the losses)

Any synchronous protocol can do it. This makes UDP syslog a non-starter for our use case though, because without that synchronous behaviour, we can't know if a message was dropped and report on it.

allowing for retries/retransmissions is a good plus, but not vital (we turn ambiguous case in reported losses)

TCP syslog doesn't do this, because individual messages in flight at a disconnection are lost. HTTP(s) lets us do it because you have a request/response pattern and you can add metadata documented specifically to flag it. TCP/TLS wouldn't be able to do it either. The retry feature is useful for low-traffic log channels that carry important information you do not want to drop, but also do not want duplicated; we can work towards an 'at most once' mechanism for that.

batching is useful given the nature of logs (throughput > low latency)

TCP-Syslog supports batching multiple messages over a single frame if they allow it. HTTP(s) lets you use Content-Length to send a single bigger request while still supporting retransmission (there's a rough sketch of this after the list).

optionally encrypt data for safe transmission over unsecured networks

HTTPS does it. TCP/TLS would.

be supported by as many platforms and languages as possible (who gives a poo poo about a thing nobody can use)

HTTPS does it very well, and you will not have a problem with any middleman. That's arguable for TCP/TLS when it comes to middlemen, but app support should be there.

be conceptually simple enough, for the same reasons as above

It is somewhat simple: it's syslog over HTTP(s). Otherwise, TCP-Syslog is a well-known standard.

the protocol should be extensible, but at this point this is just a plus

Headers are extensible. They're not the best at it, but they can do an okay job in a pinch. TCP-Syslog isn't extensible.

can unambiguously represent string data -- encoding can be agreed upon if standard, though

Either defined by an RFC for TCP-Syslog, or agreed upon through content-type/charset values in HTTP(s).

compression is a plus, but not necessary

Supported by HTTP(s)

be CPU-efficient so that a single user doesn't make CPUs spin 100%
be memory-efficient for the same reason
be bandwidth-efficient, for the same reason

TCP Syslog is efficient enough. HTTP is arguable; it depends on your IO pipeline and compression. HTTPS isn't the best due to crypto. If we can manage to send most data over TCP-Syslog, we are good to go. Which we do.
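
Since I promised a sketch above: this is roughly what the HTTP(s) batching + retry behaviour looks like, in plain Python stdlib. Nothing here is our actual implementation; the endpoint, media type, and frame-id header are invented for the example. The frame id is what would let the receiving end drop duplicates if we retry after an ambiguous failure:

code:

import http.client
import uuid

def send_batch(host, path, messages, timeout=5):
    # one HTTP/1.1 connection carries the whole batch (and the retry);
    # a real router would keep it alive across batches
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    body = b"".join(m if m.endswith(b"\n") else m + b"\n" for m in messages)
    headers = {
        "Content-Type": "application/logs+txt",   # invented media type
        "X-Frame-Id": str(uuid.uuid4()),          # invented header; lets the receiver dedupe retries
    }
    for attempt in (1, 2):                        # at most one retry, then report the batch as lost
        try:
            conn.request("POST", path, body=body, headers=headers)
            resp = conn.getresponse()
            resp.read()
            if resp.status < 300:
                return True
        except (OSError, http.client.HTTPException):
            conn.close()
            conn = http.client.HTTPSConnection(host, timeout=timeout)
    return False                                  # counted and reported as a loss upstream

# send_batch("drain.example.com", "/logs", [b"<134>1 ... line one", b"<134>1 ... line two"])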

So depending on whether you really want security, or things like retries before dropping messages for lower-level apps, you can ask for and use the specific protocol that supports that feature best.

You don't implement all of these in a single protocol; you give a choice and let individual users make decisions based on what they need, once they feel dissatisfied with the standard option (TCP-Syslog).

I have since been told by this thread that giving users choices is a dumb thing, that I should be ashamed and should have worked on the problems better, and that the list of requirements above was a horror, despite us being able to do an acceptable job of dealing with them in production.

MononcQc fucked around with this message at 19:19 on Sep 17, 2013

fritz
Jul 26, 2003

Notorious b.s.d. posted:

Don't underestimate what fits in memory.


You can get an x86 box with 3.0 TB of RAM now. That box is 3 TB of RAM + 40 cores inside 5 rack units. Pretty soon it will be 3TB + 48 or 64 cores.

our 'big machine' at work has 32gb ram and a 1tb drive.

havelock
Jan 20, 2004

IGNORE ME
Soiled Meat

Shaggar posted:

representation as in json vs xml is not what we're talking about. we're talking about representation of Person v1 vs representation of Person v2. those are located at different services. using accept to route them is retarded. using urls to route them is not.

Not a fan of this av change, but nicely done whoever is responsible.

Accept is the exact way that a client says 'hey i can understand v1 of a person in json, or a v2 of a person in xml, or just gimmie back some random text if that's the best we can do'.

Why does the client have to know or care where a given implementation version of a service lives?
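
Concretely, the request side is just an Accept header with q-values; something like this (the vendor media types are made up, using the requests library for brevity):

code:

import requests

# made-up vendor media types: prefer person v2 as xml, fall back to v1 json, then anything
headers = {
    "Accept": "application/vnd.example.person.v2+xml, "
              "application/vnd.example.person.v1+json;q=0.8, "
              "*/*;q=0.1"
}
resp = requests.get("https://api.example.com/people/42", headers=headers)
print(resp.status_code, resp.headers.get("Content-Type"))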

Shaggar
Apr 26, 2006
because only certain versions of the service understand person v1 or person v2. assuming that there's one service that understands everything is stupid.

custom version routing using accept is retarded. urls are the correct way to do it.

Brain Candy
May 18, 2006

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Brain Candy posted:

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?


Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

lmao

Zlodo
Nov 25, 2006

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Notorious b.s.d.
Jan 25, 2003

by Reene

fritz posted:

our 'big machine' at work has 32gb ram and a 1tb drive.

let's do the math here:
  • weekly cost of a software engineer, fully loaded:
    $2730.00
    ($91k average salary, * 1.5 for office space/bennies/taxes, 2 weeks vacation)

  • weekly cost of a server w/ 768 GB of RAM, 20 cores, and RHEL unlimited guests
    $146.25
    ($23,419 list, 48 month financing)

if they can't afford 5% of your salary for development environments, how loving broken is their accounting? how many other things in the organization are this dysfunctional?

if they value you so little they won't pay for even one virtualization server, it's very probable they just don't know how much your team costs. time to find a job where you can deliver real business value
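
if anyone wants to check the arithmetic (the only assumption here is that the $146.25/week figure folds some financing interest into the $23,419 list price):

code:

# fully loaded engineer: $91k salary * 1.5 overhead, over 50 working weeks
eng_week = 91_000 * 1.5 / 50            # 2730.0

# server: $146.25/week over 48 months (~208 weeks) implies roughly $30.4k financed total
srv_week = 146.25
financed_total = srv_week * 48 * 52 / 12

print(eng_week, round(financed_total), f"{srv_week / eng_week:.1%}")   # 2730.0 30420 5.4%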

Notorious b.s.d. fucked around with this message at 21:02 on Sep 17, 2013

Notorious b.s.d.
Jan 25, 2003

by Reene
incidentally i used 48 month financing because that's what dell defaults to, i would have to actually apply for a loan to get a more realistic 36 month quote

on the other hand, you can probably get that server for 60% of list if you buy more than 1 or 2 of them. it probably balances out in the end

Nomnom Cookie
Aug 30, 2009



that calculation is why in memory DBs are the future. RAM is so fuckin cheap

Notorious b.s.d.
Jan 25, 2003

by Reene

Nomnom Cookie posted:

that calculation is why in memory DBs are the future. RAM is so fuckin cheap

i don't know anyone whose database fits in 768 GB of RAM.

but it *is* why SQL databases are still awesome in the age of "big data." 768 GB is more than enough cache to have a useful working set out of your 3 TB database

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Notorious b.s.d. posted:

i don't know anyone whose database fits in 768 GB of RAM.

but it *is* why SQL databases are still awesome in the age of "big data." 768 GB is more than enough cache to have a useful working set out of your 3 TB database

IIRC microsoft calculated that the working set of something like 90% of their "big data" calc was under 32GB

Shaggar
Apr 26, 2006
the next version of sql server has hekaton, which lets you put all ur poo poo in memory all the time and apparently uses some lockless system that still guarantees integrity. sounds like black magic, but we'll see.

JewKiller 3000
Nov 28, 2006

by Lowtax

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Nice!

MononcQc
May 29, 2007

Brain Candy posted:

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?

for a second I thought I actually had received a new avatar here

MrMoo
Sep 14, 2000

Nomnom Cookie posted:

semantic versioning for apis

/v2.4.5/boners/69/stroke
/v2.4.x/boners/69/stroke
/v2.x.x/boners/69/stroke

so x is interpreted as "latest"

gently caress me.

/v2.4.5/boners/69/stroke
/v2.4/boners/69/stroke
/v2/boners/89/stroke
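
i.e. treat a shortened version as "newest release under that prefix". A toy sketch of the resolution (the deployed version list is made up):

code:

AVAILABLE = ["2.3.6", "2.4.4", "2.4.5", "3.0.1"]   # hypothetical deployed API versions

def resolve(prefix):
    # "2.4" matches 2.4.*, "2" matches 2.*; pick the newest match
    matches = [v for v in AVAILABLE if v == prefix or v.startswith(prefix + ".")]
    if not matches:
        raise LookupError("no API version matching " + prefix)
    return max(matches, key=lambda v: tuple(int(p) for p in v.split(".")))

assert resolve("2.4.5") == "2.4.5"
assert resolve("2.4") == "2.4.5"
assert resolve("2") == "2.4.5"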

MononcQc
May 29, 2007

the best thing is dealing with combined versions. Client v2.3.7 deals with API v1.12.2, but so does v2.3.6 or something :toot:

suffix
Jul 27, 2013

Wheeee!
a good use for content negotiating is serving webp images to browsers that support them, and jpeg to browsers who don't.
users want the image, not a file, and browsers can use any of the image formats they support interchangeably.
(well, mostly. when Facebook tried it, users got confused because they were saving thumbnails and getting webp files. drat those implicit requirements!)

i don't like using content negotiation for api versioning because

1. your special snowflake protocol probably doesn't need its own mime type. if you use a generic type like text/xml or application/json you can read it with browsers or generic adapters.

2. most formats you would be using (form-data, json, xml, protocol buffers, thrift) are extensible enough to support backwards compatibility by adding optional fields, or in the worst case, an explicit version field. any upgrade severe enough to require a format change is probably going to change your URI layout as well.
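
e.g. with json the optional-field thing is about as boring as it gets (field names made up):

code:

import json

v1_doc = '{"name": "bob"}'
v2_doc = '{"name": "bob", "nickname": "bobby"}'   # v2 added an optional field

def v1_read_person(doc):
    # a v1 client only knows about "name"; the extra v2 field is silently ignored
    return json.loads(doc)["name"]

def v2_read_person(doc):
    # a v2 client treats the new field as optional, so old payloads still parse
    p = json.loads(doc)
    return p["name"], p.get("nickname")

print(v1_read_person(v2_doc))   # bob
print(v2_read_person(v1_doc))   # ('bob', None)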

Nomnom Cookie
Aug 30, 2009



MrMoo posted:

gently caress me.

/v2.4.5/boners/69/stroke
/v2.4/boners/69/stroke
/v2/boners/89/stroke

nope make the wildcards explicit

MononcQc
May 29, 2007

make the api purposefully return bad results or behave wrong as a built-in chaos monkey so that users learn to be fault tolerant

Nomnom Cookie
Aug 30, 2009



im pretty sure the storage people at work are already doing that

havelock
Jan 20, 2004

IGNORE ME
Soiled Meat

suffix posted:

a good use for content negotiating is serving webp images to browsers that support them, and jpeg to browsers who don't.
users want the image, not a file, and browsers can use any of the image formats they support interchangeably.
(well, mostly. when Facebook tried it, users got confused because they were saving thumbnails and getting webp files. drat those implicit requirements!)

i don't like using content negotiation for api versioning because

1. your special snowflake protocol probably doesn't need its own mime type. if you use a generic type like text/xml or application/json you can read it with browsers or generic adapters.

2. most formats you would be using (form-data, json, xml, protocol buffers, thrift) are extensible enough to support backwards compatibility by adding optional fields, or in the worst case, an explicit version field. any upgrade severe enough to require a format change is probably going to change your URI layout as well.

Regarding #1, just support both. I've usually had application/json or application/xml just return the latest version of the resource in json or xml respectively. Then everything works nicely in the browser and real clients who depend on specific versions can use your custom media type with the embedded version info in it.

The rels for any links you embed in your resources will likely be vendor specific anyway (and they definitely aren't part of the application/json media type), so that seems to fit better with a custom media type.
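
A bare-bones sketch of that dispatch (media types and fields made up; real routing would sit in whatever framework you use): generic application/json gets the newest representation, while the versioned vendor types get exactly what they pinned:

code:

# made-up vendor media types; application/json always maps to the newest one
RENDERERS = {
    "application/vnd.example.person.v1+json": lambda p: {"name": p["name"]},
    "application/vnd.example.person.v2+json": lambda p: {"name": p["name"], "nickname": p.get("nickname")},
}
LATEST = "application/vnd.example.person.v2+json"

def render_person(person, accept):
    # generic types (or */*) get the latest version; pinned clients get what they asked for
    for media_type in (part.split(";")[0].strip() for part in accept.split(",")):
        if media_type in ("application/json", "*/*"):
            return LATEST, RENDERERS[LATEST](person)
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](person)
    return LATEST, RENDERERS[LATEST](person)   # fall back instead of a 406 so browsers stay happy

print(render_person({"name": "bob", "nickname": "bobby"}, "application/vnd.example.person.v1+json"))
print(render_person({"name": "bob", "nickname": "bobby"}, "application/json"))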

fritz
Jul 26, 2003

Notorious b.s.d. posted:

let's do the math here:
  • weekly cost of a software engineer, fully loaded:
    $2730.00
    ($91k average salary, * 1.5 for office space/bennies/taxes, 2 weeks vacation)

  • weekly cost of a server w/ 768 GB of RAM, 20 cores, and RHEL unlimited guests
    $146.25
    ($23,419 list, 48 month financing)

if they can't afford 5% of your salary for development environments, how loving broken is their accounting? how many other things in the organization are this dysfunctional?

if they value you so little they won't pay for even one virtualization server, it's very probable they just don't know how much your team costs. time to find a job where you can deliver real business value

I had to agitate for months to get that machine; prior to that I was doing everything on an 8gb MacBook pro from 2011. just a couple weeks ago I discovered that they had several 64+gb servers in some hosting facility somewhere that had been sitting idle for literal years; right now I think they're stacked in a back room.

my stock was supposed to vest today. we'll see how much they value me when I give two weeks' notice tomorrow.

fritz
Jul 26, 2003

fritz posted:

I had to agitate for months to get that machine; prior to that I was doing everything on an 8gb MacBook pro from 2011. just a couple weeks ago I discovered that they had several 64+gb servers in some hosting facility somewhere that had been sitting idle for literal years; right now I think they're stacked in a back room.

my stock was supposed to vest today. we'll see how much they value me when I give two weeks' notice tomorrow.

our cto is selling us as a big data company, right now people are all agitated because we're bringing a new thing online that'll come to like four (4) terabytes per year.

JewKiller 3000
Nov 28, 2006

by Lowtax
jesus loving christ can we get past http urls vs headers and talk about actual programming languages again

here let me start: php is not a programming language, it's a set of functions that you can call, but probably should not

minidracula
Dec 22, 2007

boo woo boo
OK, I skimmed through most of the HTTP chat for the past few pages until I got to this one and just had to post, furiously swiping my thumb up the screen of my phone, so please excuse even less reading than usual, but:

1. I think lots of things should use BEEP.

2. Use a real man's protocol: PCIe Gen 3.0 x16. DMA bitch.

If and/or how serious I am about either of these is left as an exercise to the reader.

tef
May 30, 2004

-> some l-system crap ->
use php

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
tripletef

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
i should have revived cheesebot and made an account for it :/

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope
i have to poast once again that i hate how every programming related course is so loving concerned about performance when discussing things like patterns.

strategy pattern pros: you can change the implementation on the fly, cons: you have to call a function from a virtual function table :ohdear: (which is what they mean by slightly decreased performance)
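
for the record, the entire terrifying "performance cost" they're warning about is one dynamic dispatch. toy python version (nothing to do with whatever the course actually used):

code:

class ShippingStrategy:
    def cost(self, weight_kg):
        raise NotImplementedError

class Ground(ShippingStrategy):
    def cost(self, weight_kg):
        return 5.0 + 1.2 * weight_kg

class Express(ShippingStrategy):
    def cost(self, weight_kg):
        return 15.0 + 2.5 * weight_kg

class Order:
    def __init__(self, strategy):
        self.strategy = strategy          # swappable at runtime, which is the whole point

    def shipping(self, weight_kg):
        # the dreaded "slightly decreased performance": one indirect call
        return self.strategy.cost(weight_kg)

order = Order(Ground())
print(order.shipping(3))    # 8.6
order.strategy = Express()  # change the implementation on the fly
print(order.shipping(3))    # 22.5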

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope
lol, wheany, so performance doesn't matter, well perhaps you have heard of a little thing called the travelling salesman problem. check-mate

minidracula
Dec 22, 2007

boo woo boo
Actual PL inquiry: has anyone here used Julia for anything (on Windows)? Does it run/install?

I'm hoping it's a better situation than trying the current stable Rust builds on Windows but I haven't tried yet.

Also, people interested in (simple, 2D) games programming should check out and play around with strlen's Lobster.
