tef
May 30, 2004

-> some l-system crap ->

redleader posted:

it seems to be a lesson you need to learn through bitter operational experience

that's operations

quote:

some internet person writing about it is unlikely to convince you that queues aren't the be all and end all

you say this but i have literally stopped someone writing a message broker in rust apparently

quote:

- especially with the hype that comes with eg kafka

kafka is a weird one because, unlike many of its contemporaries, it's built with the assumption of running in a cluster, and of topics being partitioned


meanwhile: there's like 20 data platform startups

all writing their own component of some sort of stack, made from the other companies' apache software projects

tef
May 30, 2004

-> some l-system crap ->

Cocoa Crispies posted:

yeah almost as if they worked somewhere that outgrew a lot of practices that work at small and medium scale

I WISH

tef
May 30, 2004

-> some l-system crap ->

Mr SuperAwesome posted:

i want a thing that lets me
- pull all repos
- check out ticket branch for a repo if exists else use master
- merge branch A into my branch for all repos

it's called a monorepo

tef
May 30, 2004

-> some l-system crap ->
https://redfin.engineering/well-never-know-whether-monorepos-are-better-2c08ab9324c0

like someone went out and changed a team and came to exactly zero conclusions

because they assume the cost of a monorepo/multirepo is uniform across the team

if you're working on a single component, multirepos are like a monorepo, except you don't have to deal with other people as much, and you don't have to spend your time rebasing things like a chump

on the other hand if you're making lots of cross cutting changes, or doing deployment or operations, a monorepo is far far easier to handle as you keep a global snapshot around

but mostly the git monorepo / microrepo debate boils down to what you hate more: git's lack of subtree checkouts or git submodules

tef
May 30, 2004

-> some l-system crap ->
flat multirepos: this is great

multirepos that have subtle dependency chains: oh god kill me already

tef
May 30, 2004

-> some l-system crap ->

Shaggar posted:

bugs is right. logic belongs in the controller. logic in the model is asking for pain.

how do you feel about stored procedures shaggar

tef
May 30, 2004

-> some l-system crap ->

go 2 considered harmful

tef
May 30, 2004

-> some l-system crap ->

MALE SHOEGAZE posted:

so, i'm working on speeding up my memcached clone and I'm currently messing with the way access to the backing store is synchronized. I'm currently testing two different methods:

1) There's a mutex around the backing LRU cache (both reads and writes need the mutex, because the LRU cache uses a linked hash map and getting items rearranges the data). Each request needs to wait for the lock. This takes about 25 micros per request.

2) I've got a single worker with unsynchronized access to the store, running in a loop. When a request comes in, I push the work onto a dequeue. The worker spins until work gets added to the queue, it pops the work off the queue, does the work (adding/removing/getting items from the cache), and then sends a response via a channel. Despite all of the ceremony, this method seems to take about 11-13 micros per request. It's faster!

I guess my question is: what's the best way to handle this (assuming i'm building a memcached-like key value store, with performance being the emphasis)? Memcached uses slabs and I'm still reading up on understanding how slabs would be used for caching.

also sorry this question is stupid, i smoked too much before posting it

it depends

1) will work out, but you can scale it out by partitioning your tree into ranges and having one lock per range

2) this will work but honestly how much faster is this than a single thread and a select loop

if performance is your emphasis, well, you want to be able to do cheap reads

something like read copy update might work better

a) partition hash keyspace into N tables
b) put an insert/delete lock around each hash table, with a version number
c) use some clever read locks and do compare-and-swap for updates

write protocol:
- obtain writer lock
- compare and swap in new data which points to old data
- increment version number
- wait for old readers to wrap up
- trash old data

read protocol:
- register as reader at current version number,
- do lookup
- read older data if new record is above current version
- unregister
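
fwiw, a rough go sketch of just the partitioned-lock half of this (names are made up, it's a plain rwmutex per shard with no version numbers, and it punts on the lru list, which is the bit that forces reads to take a write lock in the first place):

code:
package cache

import (
	"hash/fnv"
	"sync"
)

// partitionedMap: the keyspace is split into N independent hash tables,
// each guarded by its own lock, so writers only contend within one shard.
type partitionedMap struct {
	shards []shard
}

type shard struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func newPartitionedMap(n int) *partitionedMap {
	p := &partitionedMap{shards: make([]shard, n)}
	for i := range p.shards {
		p.shards[i].m = make(map[string][]byte)
	}
	return p
}

// shardFor picks a partition by hashing the key.
func (p *partitionedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &p.shards[h.Sum32()%uint32(len(p.shards))]
}

func (p *partitionedMap) Get(key string) ([]byte, bool) {
	s := p.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (p *partitionedMap) Set(key string, val []byte) {
	s := p.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func (p *partitionedMap) Delete(key string) {
	s := p.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, key)
}
the version-number / rcu half of the protocol above is what lets you drop the read lock entirely; this bit just stops writers stepping on each other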

tef
May 30, 2004

-> some l-system crap ->

Ploft-shell crab posted:

this is essentially just mvcc right?

kinda

read copy update is a concurrency pattern that keeps old values around (i.e. old readers stall writers, and writers somewhat block new readers, iirc)
but there isn't ordering across partitions/writers

mvcc is similar in that you have multiple values, and you're doing concurrency control, but that normally comes with a larger transactional protocol,
and synchronising with the write ahead log

and depending on the isolation level, you can get some ordering

tef
May 30, 2004

-> some l-system crap ->

Xarn posted:

I am assuming that reads are waaaaay more common than updates, so I'd just dispense with the ceremony and do QSBR RCU hash table. Basically free reads, writer stall can be configured depending on your contention and worst case is that there will be slightly old data (seeking ordered versions of data in cache is foolish errand anyway).

this is probably simpler than partitioning but it's kinda one of those 'everything is a nail' solutions for me.

i have a hazy memory of rcu specifics and felt lazy enough, also i love epochs over checkpoints/qsbr

i mention the writer lock/insert lock as i kinda assume they've wrapped a normal hash table somehow in golang

the biggest problem with concurrent hash tables is creates and deletes so that's kinda why i was suggesting partitioned locks

so you don't end up in weird fuckups of writers contesting or reading uninitialized data

MALE SHOEGAZE posted:

thanks friend, ill give this a shot along with using slab storage for the partitions. sounds like fun!

please be sure and write performance tests

- cache warm up time (writes under no reads)
- read time under low writes/churn
- read time under high churn/writes
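
go's built-in benchmarks get you most of the way there. a rough sketch of those three cases (assuming a cache with Get/Set like the sketch earlier; the names, sizes and churn ratios are all made up):

code:
package cache

import (
	"fmt"
	"testing"
)

// cache warm up time: writes only, no readers.
func BenchmarkWarmUp(b *testing.B) {
	c := newPartitionedMap(16)
	val := make([]byte, 1024)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c.Set(fmt.Sprintf("key-%d", i), val)
	}
}

// read time under churn: most goroutines read, some overwrite keys.
// tweak writeEvery to move between the low-churn and high-churn cases.
func benchmarkReadsWithChurn(b *testing.B, writeEvery int) {
	c := newPartitionedMap(16)
	val := make([]byte, 1024)
	for i := 0; i < 100_000; i++ {
		c.Set(fmt.Sprintf("key-%d", i), val)
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			key := fmt.Sprintf("key-%d", i%100_000)
			if i%writeEvery == 0 {
				c.Set(key, val)
			} else {
				c.Get(key)
			}
			i++
		}
	})
}

func BenchmarkReadsLowChurn(b *testing.B)  { benchmarkReadsWithChurn(b, 1000) }
func BenchmarkReadsHighChurn(b *testing.B) { benchmarkReadsWithChurn(b, 4) }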

like, you have a lot of leeway to go hog wild

there's fun things to do, and you're already sorta doing them: instead of a lock, you can have a single process that takes requests, and vice versa

you can even do things like replacing the channels with a ring buffer, and maybe cheating a bit,

i.e instead of having threads that take a read/write lock, or a single thread that does all read/writes

you can have a ring buffer instead of a channel

network threads write to the front of the buffer, and the lru side reads from the tail
lru threads: reader threads use an atomic integer to race through incoming lookups, but spin when they hit a write operation; a single writer thread races ahead of the read operations
and the single writer thread can handle eviction, etc.
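
the full scheme (many producers, readers racing on an atomic counter, one writer) is fiddly to get right, but here's the single-producer/single-consumer ring buffer piece in go as a sketch of the shape (names invented, and a real multi-producer version needs more care):

code:
package cache

import (
	"runtime"
	"sync/atomic"
)

// op is whatever the lru side needs to process; the fields are made up.
type op struct {
	isWrite bool
	key     string
	value   []byte
}

// ring is a single-producer/single-consumer ring buffer: only the network
// side calls Push, only the lru side calls Pop. size must be a power of two.
type ring struct {
	buf  []op
	mask uint64
	head atomic.Uint64 // next slot to pop, advanced only by the consumer
	tail atomic.Uint64 // next slot to push, advanced only by the producer
}

func newRing(size uint64) *ring {
	return &ring{buf: make([]op, size), mask: size - 1}
}

// Push spins while the ring is full, then publishes the op.
func (r *ring) Push(o op) {
	t := r.tail.Load()
	for t-r.head.Load() == uint64(len(r.buf)) {
		runtime.Gosched() // full: wait for the consumer to catch up
	}
	r.buf[t&r.mask] = o
	r.tail.Store(t + 1) // slot is written before the index moves
}

// Pop spins while the ring is empty, then claims the next op.
func (r *ring) Pop() op {
	h := r.head.Load()
	for h == r.tail.Load() {
		runtime.Gosched() // empty: wait for the producer
	}
	o := r.buf[h&r.mask]
	r.head.Store(h + 1)
	return o
}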

but tbh it's probably needless optimization, we didn't find 1 redis per core went faster than 1 redis and gently caress the other cores

tef
May 30, 2004

-> some l-system crap ->
it realllly depends where your costs are

if serializing/deserializing is expensive, then yeah, doing that in a thread pool will be cheap

but you could likely write something that uses a single main thread that does the network i/o and all read requests,

and fire off the create/update requests in background routines (leaving tombstones for deletes), and have the main thread tag in when it's safe to evict old records
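
a very rough go shape of that split, leaning on sync.Map instead of doing the 'main thread tags in to evict' bookkeeping by hand (so it dodges the interesting part); every name here is invented for illustration:

code:
package cache

import "sync"

// tombstone marks a deleted key until the main loop sweeps it.
var tombstone = []byte(nil)

type store struct {
	m      sync.Map    // safe for the main goroutine to read while writers store
	writes chan func() // create/update/delete work handed to the background
}

func newStore() *store {
	s := &store{writes: make(chan func(), 1024)}
	go func() {
		for w := range s.writes {
			w()
		}
	}()
	return s
}

// Get runs on the main goroutine, alongside the network i/o.
func (s *store) Get(key string) ([]byte, bool) {
	v, ok := s.m.Load(key)
	if !ok || v.([]byte) == nil {
		return nil, false // missing, or deleted but not yet swept
	}
	return v.([]byte), true
}

// Set and Delete just enqueue work; the background goroutine applies it.
func (s *store) Set(key string, val []byte) {
	s.writes <- func() { s.m.Store(key, val) }
}

func (s *store) Delete(key string) {
	s.writes <- func() { s.m.Store(key, tombstone) } // leave a tombstone
}

// Sweep is called from the main loop when it decides it's safe to evict.
func (s *store) Sweep() {
	s.m.Range(func(k, v any) bool {
		if v.([]byte) == nil {
			s.m.Delete(k)
		}
		return true
	})
}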

tef
May 30, 2004

-> some l-system crap ->

Radix LSD :2bong:

to people who haven't clicked: this also visualizes the relative displacement of the elements during the sort, separately from their value/position in the list

tef
May 30, 2004

-> some l-system crap ->
ed is the standard editor

tef
May 30, 2004

-> some l-system crap ->

MALE SHOEGAZE posted:

at my startup our architecture is microservices and a message queue.

each microservice has its own mongo database, because this reduces coupling. allowing the services to access the same database would increase coupling, so if one service needs access to data in another service, the originating service will just dump the entire collection into the pipeline, and the consuming service will write out all of the entries into its own database.

whenever an entity in the database is updated, the responsible service will emit an update event, and dump the entire object into the pipeline. consuming services will then write it to their own db, taking extreme care to update any and all data associations (a traditional DB would of course enforce these relationships for you, but it's no loss because keeping data in sync is a totally trivial problem compared to coupling, which is the worst problem).

the architects of this system did not concern themselves with concurrency, because data races are trivial compared to coupling. we've simply forced each consumer to run on a single thread, because concurrency issues are difficult to solve and we have more important problems such as reducing the amount of coupling in our system.

naturally, this system contains json entities that can be over 1mb compressed. if a user updates a single field on one of these entities, the entire 1mb model will get dumped into the queue. if they update the model twice (or 100 times) it will get dumped into the queue each time. this in no way overwhelms the system.

a few months back, i introduced an RPC mechanism so that we could at least make synchronous calls between services in response to user events. today my lead informed me that we're going to deprecate the RPC system because it increases coupling.

this is how you architect a system with 12 microservices that cannot handle more than 4 or 5 concurrent users before falling over. Fortunately, since everything is so decoupled, the system at least maintains the appearance of working.

bet you things are 'reusable' too

tef
May 30, 2004

-> some l-system crap ->

Ciaphas posted:

speaking as a top tier scrublord what's the alternative if you control (indeed, are writing) both the producer and consumer

i first tried direct sockets and that works but i have been told it is a major no no

no-one ever got fired for using grpc

tef
May 30, 2004

-> some l-system crap ->

Ciaphas posted:

speaking as a top tier scrublord what's the alternative if you control (indeed, are writing) both the producer and consumer

i first tried direct sockets and that works but i have been told it is a major no no

it literally depends on the protocol

the talent deficit posted:

message queues should only be used when you have zero control over the producers (or you give zero fucks about the consumer). if you're inserting a message queue between your own producer and consumers you are a top tier scrublord

aka 'pubsub is about isolating, not integrating'. i'm sure i wrote something about this in this thread already

tef
May 30, 2004

-> some l-system crap ->

Sapozhnik posted:

i mean rpc is just a catch-all term for any protocol where one party sends requests for which the other party always generates one response. it's too general a concept to be either good or bad in its own right.

rpc with some sort of stateful context to the conversation is bad (see: corba, dcom, etc). whatever crazy conway's law bullshit shoegaze's company has going on is bad.

i dunno sometimes stateful contexts can work if you're using like a session id to represent it, like within a cookie

tef
May 30, 2004

-> some l-system crap ->

MALE SHOEGAZE posted:

at my startup our architecture is microservices and a message queue.

each microservice has its own mongo database, because this reduces coupling. allowing the services to access the same database would increase coupling, so if one service needs access to data in another service, the originating service will just dump the entire collection into the pipeline, and the consuming service will write out all of the entries into its own database.

whenever an entity in the database is updated, the responsible service will emit an update event, and dump the entire object into the pipeline. consuming services will then write it to their own db, taking extreme care to update any and all data associations (a traditional DB would of course enforce these relationships for you, but it's no loss because keeping data in sync is a totally trivial problem compared to coupling, which is the worst problem).

the architects of this system did not concern themselves with concurrency, because data races are trivial compared to coupling. we've simply forced each consumer to run on a single thread, because concurrency issues are difficult to solve and we have more important problems such as reducing the amount of coupling in our system.

naturally, this system contains json entities that can be over 1mb compressed. if a user updates a single field on one of these entities, the entire 1mb model will get dumped into the queue. if they update the model twice (or 100 times) it will get dumped into the queue each time. this in no way overwhelms the system.

a few months back, i introduced an RPC mechanism so that we could at least make synchronous calls between services in response to user events. today my lead informed me that we're going to deprecate the RPC system because it increases coupling.

this is how you architect a system with 12 microservices that cannot handle more than 4 or 5 concurrent users before falling over. Fortunately, since everything is so decoupled, the system at least maintains the appearance of working.

needs more containers and to be ported to kubernetes by next month tia

tef
May 30, 2004

-> some l-system crap ->

Wheany posted:

is there some kind of an universal sentence tokenization library that will spit out a stream of words from any text, written in any script?

i know doing that correctly is Really Hard, but is there something that produces stable results, even if incorrect? i want to try some bayesian classification on tweets as an idiot side project, and it would be nice if it's not hardcoded to support only languages that separate words with whitespace.

nltk is probably your best bet

quote:

is there a word for what i'm looking for? (a unicorn lol)

segmentation ?

https://en.wikipedia.org/wiki/Text_segmentation#Word_segmentation

tef
May 30, 2004

-> some l-system crap ->

eschaton posted:

someone needs to combine message queues with document databases with blockchains

I bet someone like tef or MononcQc could come up with a combined pitch for such a terrible piece of software that could easily raise $texas in angel & VC investment, followed by a nine figgy buyout by MS or Oracle or CA or someone

um, neither the implementation nor the technical skills matter; it's whether you know the right people to ask for money

then it goes a little something like

slide 1. the problem

a bunch of stuff on one side is talking to a bunch of stuff, but it is a mess, oh no

and not everything talks to everything else

slide 2. the dream

now they all talk to one point, represented as a cloud

slide 3. the pitch

same as slide 2, but the startup logo is on the cloud, and you explain why you have some magic ingredient that your competitors don't have, and maybe some whacky sub-plot that won't work long term but might attract a bunch of press and attention your way

and the buyouts aren't that much unless you're a high-status company in the first place; companies will pay more for a user base, maybe because adoption is harder than hiring


so like

slide 1: people need to get from place to place but take all of these different things
slide 2: what if they could take one thing for every journey, ad-hoc
slide 3: uber logo

slide 1: developers struggle to put software into production, and there is a lot of amazon lock in
slide 2: what if they could use one tool to handle ci, cd and ensure local and production environments run the same, and maybe not use amazon
slide 3: docker logo

you aren't selling investors a technology, you're selling them a market you plan to control

you sell the technology to tech journalists

tef
May 30, 2004

-> some l-system crap ->

MALE SHOEGAZE posted:

hey i'm presenting on all this mongo -> postgres stuff I've been doing. One of the questions I know I'll get is "why postgres instead of mysql." The answer to this is json support (since we're migrating from mongo this is really critical), but are there any articles I can read/link to on why postrgres is almost always better than mysql? I can't just say "everyone on yospos/the internet says so" but to be honest I don't know that much about databases.

postgres is what people use to test other databases for correctness, namely sqlite

postgres has rich data types: jsonb, and a whole bunch of lovely indexes

and postgres has transactions to the point of transactional schema changes

previously:

postgres had to be vacuumed manually
autovacuum locked the poo poo out of the tables
mysql had a lot more in the way of replication and other tooling

this is less true now, specifically the first two: 9.3 onwards, and 10 especially are looking to be very nice for vacuuming (major locking improvements)

but mysql's tooling for multi-primary replication is still way ahead


mysql is more like a very good b-tree with poor sql support and reasonable tooling

iirc, mysql does in-place updates of values, meanwhile postgres is more mvcc: it keeps the old values around until they're not needed

so

for very high volume writes of often ephemeral data, mysql is a pretty good choice

but, postgres is just a much better default

tef
May 30, 2004

-> some l-system crap ->
just remember to use utf8mb4 aka 'no i really mean utf-8', and to turn off the swedish collation default, and you should be fine with mysql

tef
May 30, 2004

-> some l-system crap ->

gonadic io posted:

So if mongo is the snap chat of databases, is mysql the twitter?

tef
May 30, 2004

-> some l-system crap ->

Shinku ABOOKEN posted:

i was once told that if data isnt sensitive just mark it as deleted without actually deleting it and move on. is this good advice?

reasonable

also, if data is sensitive, you delete the key, because it's encrypted, right?

tef
May 30, 2004

-> some l-system crap ->

redleader posted:

guessing keys isn't a thing. either you have access controls in place to prevent someone from looking up record (n + 1) and it doesn't matter, or you don't and you're a clown and you have other, worse vulnerabilities to worry about

lol it's totally a thing

tef
May 30, 2004

-> some l-system crap ->

Luigi Thirty posted:

nope I’m on my way to being a girl 🌈

:toot:

tef
May 30, 2004

-> some l-system crap ->
i've lost count of how many goons came out this year tbh

tef
May 30, 2004

-> some l-system crap ->

Slurps Mad Rips posted:

how did that one post about this go? something like

"the three endings to posting on SA are

1) writing articles for vice
2) coming out as trans
3) sending women you don't like pictures of dog tits over twitter"

or something like that.

i did some sourcing for sarah jeong once so technically

tef
May 30, 2004

-> some l-system crap ->

cinci zoo sniper posted:

even in cases like these below?
code:
try:
    bar = foo["x"]
except TypeError:
    print("baz")
code:
if isinstance(foo, dict):
    bar = foo["x"]
else:
    print("baz")

bar = foo.get('x', 'baz')

tef
May 30, 2004

-> some l-system crap ->
if you're doing list['x'] what

tef
May 30, 2004

-> some l-system crap ->

Fututor Magnus posted:

good programmers, how did you become able to program without googling solutions to problems but instead coming up with the solution yourself?

no idea, i google the poo poo out of things. also, most practical solutions have come from iteration in a team

quote:

how important are book-learning and "real" programming experience respectively?

depends on the code you're writing

tef
May 30, 2004

-> some l-system crap ->
while we're at it, how do i get a job thanks

tef
May 30, 2004

-> some l-system crap ->
i guess i should get a linked in

tef
May 30, 2004

-> some l-system crap ->

eschaton posted:

also tef what do you want to do?

pay rent

tef
May 30, 2004

-> some l-system crap ->

jony neuemonic posted:

e/n incoming :toot:

how do i get better at greenfield development?

practice

it's more about avoiding the pitfalls than getting things right first time

the worst code is always an extensible mess that requires changing things all over the code base to support some need that never materialised


quote:

i think i'm pretty good at working with existing stuff and i haven't had much trouble walking into some pretty gnarly projects and getting a handle on things, but whenever i get asked to build something from scratch i panic and deliver (imo) pretty bad work.

welcome to the other end of the sewer

quote:

it's real demoralizing because i don't want to do bad work, and it makes what should be a good opportunity (no more legacy code!) into a huge source of stress.

bad code gets fixed, good code rots, but lovely code lasts forever

tef
May 30, 2004

-> some l-system crap ->
the best advice i could give would end up rephrasing this effort post i made in this or the other thread, ages ago

http://programmingisterrible.com/post/139222674273/write-code-that-is-easy-to-delete-not-easy-to

but really, it's still the same old stuff i keep banging on about
