MononcQc
May 29, 2007

Shaggar posted:

actually, out of curiosity how are you collecting their logs right now? do you hijack common logging libs and/or provide them an official heroku log api?

Stdout (or whatever) data is piped to individual syslog services, which then push it to the router. Our router basically aggregates various standard syslog transports from different components in the system (the app itself, routing logs, DB logs, etc.) and re-routes them to user-configured endpoints. These can be the command line 'heroku' tool (lets you see recent lines and tail them live) or an API call, and you can ask the router to redirect the log streams to whatever endpoints you want (TCP Syslog, Syslog over HTTPS if you ask for it), in any combination whatsoever.
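
If it helps to picture the TCP-Syslog side, here's a rough sketch in Python (not our actual code; the drain config and app id are made up) of framing RFC 5424-style lines with RFC 6587 octet counting and relaying them to whatever TCP endpoints a user configured:

code:

import socket

# hypothetical per-app drain config; the real thing is user-configured through the API
DRAINS = {
    "app-1234": [("logs.example.com", 6514), ("10.0.0.5", 601)],
}

def frame(msg):
    # RFC 6587 octet counting: "<length> <message>" so receivers can split batched frames
    return str(len(msg)).encode() + b" " + msg

def relay(app_id, lines):
    # one connection per app so buffers aren't shared between users
    for host, port in DRAINS.get(app_id, []):
        with socket.create_connection((host, port), timeout=5) as sock:
            for line in lines:
                sock.sendall(frame(line))

relay("app-1234", [b"<134>1 2013-09-17T19:19:00Z host app - - - hello from the router"])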

Shaggar
Apr 26, 2006
ah, so you're looking for a way to handle the redirect to a custom endpoint and what format and protocol that should look like?

Shaggar
Apr 26, 2006
if that's the case i think sticking w/ syslog is fine cause everything has a way to parse syslog. if you want to cleanup the stuff before it you're looking at a project to develop a common logging library for all hosted languages

MononcQc
May 29, 2007

Shaggar posted:

ah, so you're looking for a way to handle the redirect to a custom endpoint and what format and protocol that should look like?

Pretty much. Multiple endpoints, in fact. You could have 4 TCP-Syslog endpoints that go to different services and an HTTP one to your own special sauce thingy, while still using the CLI tools to read them live. Or you could have none and just shut it all down.

The HTTPS pipe was added for specific users who felt their logs were somewhat sensitive and didn't feel safe sending them over plain TCP across different data centers. TCP/TLS syslog could have been used instead, but at this point I have no idea what went into that decision, as it happened way before I joined.

Shaggar posted:

if that's the case i think sticking w/ syslog is fine cause everything has a way to parse syslog. if you want to cleanup the stuff before it you're looking at a project to develop a common logging library for all hosted languages

Yeah, afaik they went with syslog specifically because everything uses and supports it.

Notorious b.s.d.
Jan 25, 2003

by Reene

MononcQc posted:

It's hairy for the custom logging protocol to be built on HTTP, but it shares some things for practical reasons. Syslog messages are individually stateless, and HTTP shares that. Syslog APIs are explicitly client-server, because that's how logs are produced and consumed, and HTTP shares that.

These considerations haven't been chucked out for our bizarro logging meta-protocol.

Interoperation is a given problem because nobody else is using it except the consumers who ask for such features. At best, the format is trivial and uses HTTP as a wrapper, and optionally attaches some metadata (that can safely be ignored). By default we push for the RFC-standard TCP-Syslog implementation, though, and the bizarro HTTP(s) one is only given on-demand if the need arises.

if this is the case, where did your gigantic insane list of bullet points come from?

because the system you just described is way simpler and probably inadequate to the requirements you specified earlier

fritz
Jul 26, 2003

tef posted:

still, if your dataset fits in memory or a single disk then hadoop isn't likely going to be cost effective

otoh let's get a whole bunch of machines, install hdfs, then a job manager to run over a dataset in parallel, which each step reading from and serializing to disk


yeah if you can fit that poo poo into memory you load it up and run w/ it

Notorious b.s.d.
Jan 25, 2003

by Reene

fritz posted:

yeah if you can fit that poo poo into memory you load it up and run w/ it

Don't underestimate what fits in memory.


You can get an x86 box with 3.0 TB of RAM now. That box is 3 TB of RAM + 40 cores inside 5 rack units. Pretty soon it will be 3TB + 48 or 64 cores.

MononcQc
May 29, 2007

Notorious b.s.d. posted:

if this is the case, where did your gigantic insane list of bullet points come from?

because the system you just described is way simpler and probably inadequate to the requirements you specified earlier

These can be user demands or requirements depending on their apps and use cases.

The system I just described uses multiple protocols to support all or most of the bullet points from my list, on demand, depending on what each customer needs. You can use whatever. Here's the breakdown of how these 2-3 transport formats manage to cover them:

quote:

maintain long-lived connections (reconnecting is always slow)

This is supported both by HTTP(s) (1.1 has keepalive by default) and TCP by design.

allow for multiple sessions/connections between any two endpoints (i.e. a service provider should be able to receive data on behalf of many different users from the same node)

This is supported by TCP on a per-connection basis, and can be done with HTTP too, given the information is log-specific. Still, we open a connection per app so that their buffers are not shared in software.

allow processing timeouts (infinite timeouts are a bad thing)

This can be done both ways, with any kind of timeout on send while waiting for an ack or a response. Because RFC 5424 syslog also works over UDP, we can time out client-side only (the router, in this case) and declare a loss; if we're really unsure and it could have a bad effect, we kill the connection and start again.

support control flow mechanisms (we can drop messages, but must be able to know about it to report the losses)

Any synchronous protocol can do it. This makes UDP syslog a non-starter for our use case though, because without that synchronous behaviour, we can't know if a message was dropped and report on it.

allowing for retries/retransmissions is a good plus, but not vital (we turn ambiguous case in reported losses)

TCP syslog doesn't do this, because individual messages in flight at a disconnection are lost. HTTP(s) lets us do it because you have a request/response pattern and you can add metadata documented specifically to flag it. TCP/TLS wouldn't be able to do it either. The retry feature is useful for low-traffic log channels that carry important information you do not want to drop, but also do not want duplicated; we can work towards an 'at most once' mechanism for that.

batching is useful given the nature of logs (throughput > low latency)

TCP-Syslog supports batching multiple messages over a single frame if they allow it. HTTP(s) lets you use Content-Length to send a single bigger request while still supporting retransmission (there's a rough sketch of this after the list).

optionally encrypt data for safe transmission over unsecured networks

HTTPS does it. TCP/TLS would.

be supported by as many platforms and languages as possible (who gives a poo poo about a thing nobody can use)

HTTPS does it very well, and you will not have a problem with any middleman. That's arguable for TCP/TLS when it comes to middlemen, but app support should be there.

be conceptually simple enough, for the same reasons as above

It is somewhat simple: it's syslog over HTTP(s). Otherwise, TCP-Syslog is a well-known standard.

the protocol should be extensible, but at this point this is just a plus

Headers are extensible. They're not the best at it, but they can do an okay job in a pinch. TCP-Syslog isn't extensible.

can unambiguously represent string data -- encoding can be agreed upon if standard, though

Either defined by an RFC for TCP-Syslog, or agreed upon through content-type/charset values in HTTP(s).

compression is a plus, but not necessary

Supported by HTTP(s)

be CPU-efficient so that a single user doesn't make CPUs spin 100%
be memory-efficient for the same reason
be bandwidth-efficient, for the same reason

TCP Syslog is efficient enough. HTTP is arguable; it depends on your IO pipeline and compression. HTTPS isn't the best due to crypto. If we can manage to send most data over TCP-Syslog, we are good to go. Which we do.
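
Since I promised a sketch above: this is roughly what the HTTP(s) batching + retry behaviour looks like, in plain Python stdlib. Nothing here is our actual implementation; the endpoint, media type, and frame-id header are invented for the example. The frame id is what would let the receiving end drop duplicates if we retry after an ambiguous failure:

code:

import http.client
import uuid

def send_batch(host, path, messages, timeout=5):
    # one HTTP/1.1 connection carries the whole batch (and the retry);
    # a real router would keep it alive across batches
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    body = b"".join(m if m.endswith(b"\n") else m + b"\n" for m in messages)
    headers = {
        "Content-Type": "application/logs+txt",   # invented media type
        "X-Frame-Id": str(uuid.uuid4()),          # invented header; lets the receiver dedupe retries
    }
    for attempt in (1, 2):                        # at most one retry, then report the batch as lost
        try:
            conn.request("POST", path, body=body, headers=headers)
            resp = conn.getresponse()
            resp.read()
            if resp.status < 300:
                return True
        except (OSError, http.client.HTTPException):
            conn.close()
            conn = http.client.HTTPSConnection(host, timeout=timeout)
    return False                                  # counted and reported as a loss upstream

# send_batch("drain.example.com", "/logs", [b"<134>1 ... line one", b"<134>1 ... line two"])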

So depending on whether you really want security, or things like retries before dropping messages for lower-level apps, you can ask for and use the specific protocol that supports that feature best.

You don't implement all of these in a single protocol; you give a choice and let individual users make decisions based on what they need, once they feel dissatisfied with the standard option (TCP-Syslog).

I have since been told by this thread that giving users choices is a dumb thing, that I should be ashamed and should have worked on the problems better, and that the list of requirements above was a horror, despite us being able to do an acceptable job of dealing with them in production.

MononcQc fucked around with this message at 19:19 on Sep 17, 2013

fritz
Jul 26, 2003

Notorious b.s.d. posted:

Don't underestimate what fits in memory.


You can get an x86 box with 3.0 TB of RAM now. That box is 3 TB of RAM + 40 cores inside 5 rack units. Pretty soon it will be 3TB + 48 or 64 cores.

our 'big machine' at work has 32gb ram and a 1tb drive.

havelock
Jan 20, 2004

IGNORE ME
Soiled Meat

Shaggar posted:

representation as in json vs xml is not what we're talking about. we're talking about representation of Person v1 vs representation of Person v2. those are located at different services. using accept to route them is retarded. using urls to route them is not.

Not a fan of this av change, but nicely done whoever is responsible.

Accept is the exact way that a client says 'hey i can understand v1 of a person in json, or a v2 of a person in xml, or just gimmie back some random text if that's the best we can do'.

Why does the client have to know or care where a given implementation version of a service lives?
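
Concretely, the request side is just an Accept header with q-values; something like this (the vendor media types are made up, using the requests library for brevity):

code:

import requests

# made-up vendor media types: prefer person v2 as xml, fall back to v1 json, then anything
headers = {
    "Accept": "application/vnd.example.person.v2+xml, "
              "application/vnd.example.person.v1+json;q=0.8, "
              "*/*;q=0.1"
}
resp = requests.get("https://api.example.com/people/42", headers=headers)
print(resp.status_code, resp.headers.get("Content-Type"))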

Shaggar
Apr 26, 2006
because only certain versions of the service understand person v1 or person v2. assuming that there's one service that understands everything is stupid.

custom version routing using accept is retarded. urls are the correct way to do it.

Brain Candy
May 18, 2006

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Brain Candy posted:

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?


Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

lmao

Zlodo
Nov 25, 2006

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Notorious b.s.d.
Jan 25, 2003

by Reene

fritz posted:

our 'big machine' at work has 32gb ram and a 1tb drive.

let's do the math here:
  • weekly cost of a software engineer, fully loaded:
    $2730.00
    ($91k average salary, * 1.5 for office space/bennies/taxes, 2 weeks vacation)

  • weekly cost of a server w/ 768 GB of RAM, 20 cores, and RHEL unlimited guests
    $146.25
    ($23,419 list, 48 month financing)

if they can't afford 5% of your salary for development environments, how loving broken is their accounting? how many other things in the organization are this dysfunctional?

if they value you so little they won't pay for even one virtualization server, it's very probable they just don't know how much your team costs. time to find a job where you can deliver real business value
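
if anyone wants to check the arithmetic (the only assumption here is that the $146.25/week figure folds some financing interest into the $23,419 list price):

code:

# fully loaded engineer: $91k salary * 1.5 overhead, over 50 working weeks
eng_week = 91_000 * 1.5 / 50            # 2730.0

# server: $146.25/week over 48 months (~208 weeks) implies roughly $30.4k financed total
srv_week = 146.25
financed_total = srv_week * 48 * 52 / 12

print(eng_week, round(financed_total), f"{srv_week / eng_week:.1%}")   # 2730.0 30420 5.4%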

Notorious b.s.d. fucked around with this message at 21:02 on Sep 17, 2013

Notorious b.s.d.
Jan 25, 2003

by Reene
incidentally i used 48 month financing because that's what dell defaults to, i would have to actually apply for a loan to get a more realistic 36 month quote

on the other hand, you can probably get that server for 60% of list if you buy more than 1 or 2 of them. it probably balances out in the end

Nomnom Cookie
Aug 30, 2009



that calculation is why in memory DBs are the future. RAM is so fuckin cheap

Notorious b.s.d.
Jan 25, 2003

by Reene

Nomnom Cookie posted:

that calculation is why in memory DBs are the future. RAM is so fuckin cheap

i don't know anyone whose database fits in 768 GB of RAM.

but it *is* why SQL databases are still awesome in the age of "big data." 768 GB is more than enough cache to have a useful working set out of your 3 TB database

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Notorious b.s.d. posted:

i don't know anyone whose database fits in 768 GB of RAM.

but it *is* why SQL databases are still awesome in the age of "big data." 768 GB is more than enough cache to have a useful working set out of your 3 TB database

IIRC microsoft calculated that the working set of something like 90% of their "big data" calc was under 32GB

Shaggar
Apr 26, 2006
the next version of sql server has hekaton, which lets you put all ur poo poo in memory all the time and apparently uses some lockless system that still guarantees integrity. sounds like black magic, but we'll see.

JewKiller 3000
Nov 28, 2006

by Lowtax

Malcolm XML posted:

Didn't you get the headers, they negotiated a new version of posting but kept the names the same

Nice!

MononcQc
May 29, 2007

Brain Candy posted:

:psypop: tef is shaggar and mononcQc is fishmech. wtf is happening in this thread!?

for a second I thought I actually had received a new avatar here

MrMoo
Sep 14, 2000

Nomnom Cookie posted:

semantic versioning for apis

/v2.4.5/boners/69/stroke
/v2.4.x/boners/69/stroke
/v2.x.x/boners/69/stroke

so x is interpreted as "latest"

gently caress me.

/v2.4.5/boners/69/stroke
/v2.4/boners/69/stroke
/v2/boners/89/stroke
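
i.e. treat a shortened version as "newest release under that prefix". A toy sketch of the resolution (the deployed version list is made up):

code:

AVAILABLE = ["2.3.6", "2.4.4", "2.4.5", "3.0.1"]   # hypothetical deployed API versions

def resolve(prefix):
    # "2.4" matches 2.4.*, "2" matches 2.*; pick the newest match
    matches = [v for v in AVAILABLE if v == prefix or v.startswith(prefix + ".")]
    if not matches:
        raise LookupError("no API version matching " + prefix)
    return max(matches, key=lambda v: tuple(int(p) for p in v.split(".")))

assert resolve("2.4.5") == "2.4.5"
assert resolve("2.4") == "2.4.5"
assert resolve("2") == "2.4.5"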

MononcQc
May 29, 2007

the best thing is dealing with combined versions. Client v2.3.7 deals with API v1.12.2, but so does v2.3.6 or something :toot:

suffix
Jul 27, 2013

Wheeee!
a good use for content negotiating is serving webp images to browsers that support them, and jpeg to browsers who don't.
users want the image, not a file, and browsers can use any of the image formats they support interchangeably.
(well, mostly. when Facebook tried it, users got confused because they were saving thumbnails and getting webp files. drat those implicit requirements!)

i don't like using content negotiation for api versioning because

1. your special snowflake protocol probably doesn't need its own mime type. if you use a generic type like text/xml or application/json you can read it with browsers or generic adapters.

2. most formats you would be using (form-data, json, xml, protocol buffers, thrift) are extensible enough to support backwards compatibility by adding optional fields, or in the worst case, an explicit version field. any upgrade severe enough to require a format change is probably going to change your URI layout as well.
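
e.g. with json the optional-field thing is about as boring as it gets (field names made up):

code:

import json

v1_doc = '{"name": "bob"}'
v2_doc = '{"name": "bob", "nickname": "bobby"}'   # v2 added an optional field

def v1_read_person(doc):
    # a v1 client only knows about "name"; the extra v2 field is silently ignored
    return json.loads(doc)["name"]

def v2_read_person(doc):
    # a v2 client treats the new field as optional, so old payloads still parse
    p = json.loads(doc)
    return p["name"], p.get("nickname")

print(v1_read_person(v2_doc))   # bob
print(v2_read_person(v1_doc))   # ('bob', None)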

Nomnom Cookie
Aug 30, 2009



MrMoo posted:

gently caress me.

/v2.4.5/boners/69/stroke
/v2.4/boners/69/stroke
/v2/boners/89/stroke

nope make the wildcards explicit

MononcQc
May 29, 2007

make the api purposefully return bad results or behave wrong as a built-in chaos monkey so that users learn to be fault tolerant

Nomnom Cookie
Aug 30, 2009



im pretty sure the storage people at work are already doing that

havelock
Jan 20, 2004

IGNORE ME
Soiled Meat

suffix posted:

a good use for content negotiating is serving webp images to browsers that support them, and jpeg to browsers who don't.
users want the image, not a file, and browsers can use any of the image formats they support interchangeably.
(well, mostly. when Facebook tried it, users got confused because they were saving thumbnails and getting webp files. drat those implicit requirements!)

i don't like using content negotiation for api versioning because

1. your special snowflake protocol probably doesn't need its own mime type. if you use a generic type like text/xml or application/json you can read it with browsers or generic adapters.

2. most formats you would be using (form-data, json, xml, protocol buffers, thrift) are extensible enough to support backwards compatibility by adding optional fields, or in the worst case, an explicit version field. any upgrade severe enough to require a format change is probably going to change your URI layout as well.

Regarding #1, just support both. I've usually had application/json or application/xml just return the latest version of the resource in json or xml respectively. Then everything works nicely in the browser and real clients who depend on specific versions can use your custom media type with the embedded version info in it.

The rels for any links you embed in your resources will likely be vendor specific anyway (and they definitely aren't part of the application/json media type), so that seems to fit better with a custom media type.
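
A bare-bones sketch of that dispatch (media types and fields made up; real routing would sit in whatever framework you use): generic application/json gets the newest representation, while the versioned vendor types get exactly what they pinned:

code:

# made-up vendor media types; application/json always maps to the newest one
RENDERERS = {
    "application/vnd.example.person.v1+json": lambda p: {"name": p["name"]},
    "application/vnd.example.person.v2+json": lambda p: {"name": p["name"], "nickname": p.get("nickname")},
}
LATEST = "application/vnd.example.person.v2+json"

def render_person(person, accept):
    # generic types (or */*) get the latest version; pinned clients get what they asked for
    for media_type in (part.split(";")[0].strip() for part in accept.split(",")):
        if media_type in ("application/json", "*/*"):
            return LATEST, RENDERERS[LATEST](person)
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](person)
    return LATEST, RENDERERS[LATEST](person)   # fall back instead of a 406 so browsers stay happy

print(render_person({"name": "bob", "nickname": "bobby"}, "application/vnd.example.person.v1+json"))
print(render_person({"name": "bob", "nickname": "bobby"}, "application/json"))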

fritz
Jul 26, 2003

Notorious b.s.d. posted:

let's do the math here:
  • weekly cost of a software engineer, fully loaded:
    $2730.00
    ($91k average salary, * 1.5 for office space/bennies/taxes, 2 weeks vacation)

  • weekly cost of a server w/ 768 GB of RAM, 20 cores, and RHEL unlimited guests
    $146.25
    ($23,419 list, 48 month financing)

if they can't afford 5% of your salary for development environments, how loving broken is their accounting? how many other things in the organization are this dysfunctional?

if they value you so little they won't pay for even one virtualization server, it's very probable they just don't know how much your team costs. time to find a job where you can deliver real business value

I had to agitate for months to get that machine; prior to that I was doing everything on an 8gb MacBook pro from 2011. just a couple weeks ago I discovered that they had several 64+gb servers in some hosting facility somewhere that had been sitting idle for literal years; right now I think they're stacked in a back room.

my stock was supposed to vest today. we'll see how much they value me when I give two weeks' notice tomorrow.

fritz
Jul 26, 2003

fritz posted:

I had to agitate for months to get that machine; prior to that I was doing everything on an 8gb MacBook pro from 2011. just a couple weeks ago I discovered that they had several 64+gb servers in some hosting facility somewhere that had been sitting idle for literal years; right now I think they're stacked in a back room.

my stock was supposed to vest today. we'll see how much they value me when I give two weeks' notice tomorrow.

our cto is selling us as a big data company, right now people are all agitated because we're bringing a new thing online that'll come to like four (4) terabytes per year.

JewKiller 3000
Nov 28, 2006

by Lowtax
jesus loving christ can we get past http urls vs headers and talk about actual programming languages again

here let me start: php is not a programming language, it's a set of functions that you can call, but probably should not

minidracula
Dec 22, 2007

boo woo boo
OK, I skimmed through most of the HTTP chat for the past few pages until I got to this one and just had to post, furiously swiping my thumb up the screen of my phone, so please excuse even less reading than usual, but:

1. I think lots of things should use BEEP.

2. Use a real man's protocol: PCIe Gen 3.0 x16. DMA bitch.

If and/or how serious I am about either of these is left as an exercise to the reader.

tef
May 30, 2004

-> some l-system crap ->
use php

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
tripletef

Blotto Skorzany
Nov 7, 2008

He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip
i should have revived cheesebot and made an account for it :/

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope
i have to poast once again that i hate how every programming related course is so loving concerned about performance when discussing things like patterns.

strategy pattern pros: you can change the implementation on the fly, cons: you have to call a function from a virtual function table :ohdear: (which is what they mean by slightly decreased performance)
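
for the record, the entire terrifying "performance cost" they're warning about is one dynamic dispatch. toy python version (nothing to do with whatever the course actually used):

code:

class ShippingStrategy:
    def cost(self, weight_kg):
        raise NotImplementedError

class Ground(ShippingStrategy):
    def cost(self, weight_kg):
        return 5.0 + 1.2 * weight_kg

class Express(ShippingStrategy):
    def cost(self, weight_kg):
        return 15.0 + 2.5 * weight_kg

class Order:
    def __init__(self, strategy):
        self.strategy = strategy          # swappable at runtime, which is the whole point

    def shipping(self, weight_kg):
        # the dreaded "slightly decreased performance": one indirect call
        return self.strategy.cost(weight_kg)

order = Order(Ground())
print(order.shipping(3))    # 8.6
order.strategy = Express()  # change the implementation on the fly
print(order.shipping(3))    # 22.5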

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope
lol, wheany, so performance doesn't matter, well perhaps you have heard of a little thing called the travelling salesman problem. check-mate

minidracula
Dec 22, 2007

boo woo boo
Actual PL inquiry: has anyone here used Julia for anything (on Windows)? Does it run/install?

I'm hoping it's a better situation than trying the current stable Rust builds on Windows but I haven't tried yet.

Also, people interested in (simple, 2D) games programming should check out and play around with strlen's Lobster.
