  • Locked thread
Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

uncurable mlady posted:

im the cloud windows server that is ready to roll in 30 seconds after power on including domain join and application start

ur not real?????


crap nerd
May 24, 2008

Luigi Thirty posted:

ugh how the gently caress does glVertexAttribPointer work

i got an NSOpenGLView to display in my window and clear to black at least but how to polygon :mad:

im looking through the last bit of opengl i wrote a while ago wishing i had commented every drat line of it cos goddamn writing the opengl interface is a pain

after you've bound your vertex array and enabled your attribute array (glEnableVertexAttribArray) you can use glVertexAttribPointer to specify the data layout in that array

it takes six arguments: the first is the index of the attribute you're defining (the one you just enabled), the second is the number of components (like 3 for a vec3), and the third is the type (prob GL_FLOAT). the fourth says whether to normalize integer values, and the last two are the stride and the byte offset into the buffer, both of which you can leave as zero if your data is tightly packed

Arcsech
Aug 5, 2008

feedmegin posted:

Embedded? What is 'operating system'

if you're running embedded without an os you're using c or c++ anyway so c#'s entire tier of language is already out

NihilCredo
Jun 6, 2011

press down your anger in every possible way:
that one thing will defame you more than many virtues will commend you

Arcsech posted:

if you're running embedded without an os you're using c or c++ anyway so c#'s entire tier of language is already out

serious question: can you take the haskell compiler's "c--" output (https://en.wikipedia.org/wiki/C--) and run it on most (or any, really) embedded chip?

Sapozhnik
Jan 2, 2005

Nap Ghost
serious answer: no, because an "embedded" chip has anywhere from 256 bytes to maybe 128KB of SRAM at the absolute most. also like, maybe 128KB of program ROM.

you're not going to run a compacting gc in that sort of footprint.

even using malloc() at these scales is almost always a bad idea.

Luigi Thirty
Apr 30, 2006

Emergency confection port.

My Atari hardware has 4KB of RAM and 128KB of ROM

good luck running anything in that

MononcQc
May 29, 2007

Cheekio posted:

What sort of programming do you do where data structures matter? I've been working in high level languages for my entire career so I've only run into algorithms as a bottleneck when it was literally a problem of stupidly nesting loops.

currently I'm just toying with them and brushing up, but mostly, specific data structures become useful when you find specific bottlenecks or access patterns in your system that are less than optimal but need to be better.

Treaps themselves are not specifically better than any balanced tree, but they show a nifty way to get similar results by using random weights.

The thing I'm looking at is figuring out whether treaps can be used the way splay trees are (hot keys are moved closer to the root) but with the ability to override how close to the root a tree node should be. This could let you keep key/val access and ordering (in case you want ranges or min/max values) while giving hot nodes a spot closer to the root of the tree, so operations on them are faster on average even if the worst case is still a probabilistic O(log n).

I have no use for it right now, but the possibilities are cool. Something like keeping data about, say, servers behind given IPs quickly available, operating over ranges of addresses, and moving hot servers nearer to the root so that you get good perf without compromising on the rest (or even seeking by load by using the tree as a max-heap)
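The "hot keys float toward the root" idea can be sketched in Go. To be clear, this is a toy, not MononcQc's actual design: the node layout and the bump policy (+1 priority per hit, rotating a node upward whenever it overtakes its parent) are assumptions made up for illustration.

```go
package main

import "fmt"

// node is a treap node: BST-ordered by key, heap-ordered by priority.
// Higher priority sits closer to the root, so hot keys are cheap to reach.
type node struct {
	key, priority int
	left, right   *node
}

func rotateRight(n *node) *node {
	l := n.left
	n.left, l.right = l.right, n
	return l
}

func rotateLeft(n *node) *node {
	r := n.right
	n.right, r.left = r.left, n
	return r
}

// insert adds key with the given priority, rotating on the way back up to
// restore the heap property.
func insert(n *node, key, priority int) *node {
	if n == nil {
		return &node{key: key, priority: priority}
	}
	if key < n.key {
		n.left = insert(n.left, key, priority)
		if n.left.priority > n.priority {
			n = rotateRight(n)
		}
	} else {
		n.right = insert(n.right, key, priority)
		if n.right.priority > n.priority {
			n = rotateLeft(n)
		}
	}
	return n
}

// get looks key up and bumps its priority (hypothetical policy: +1 per hit),
// rotating the node toward the root when it overtakes its parent.
func get(n *node, key int) (*node, bool) {
	if n == nil {
		return nil, false
	}
	if key == n.key {
		n.priority++
		return n, true
	}
	var found bool
	if key < n.key {
		n.left, found = get(n.left, key)
		if n.left != nil && n.left.priority > n.priority {
			n = rotateRight(n)
		}
	} else {
		n.right, found = get(n.right, key)
		if n.right != nil && n.right.priority > n.priority {
			n = rotateLeft(n)
		}
	}
	return n, found
}

func main() {
	var root *node
	for i, k := range []int{10, 20, 30} {
		root = insert(root, k, i) // low static priorities
	}
	for i := 0; i < 5; i++ {
		root, _ = get(root, 10) // hammer key 10
	}
	fmt.Println(root.key) // → 10: the hammered key floated to the root
}
```

The real question from the post (whether this beats three separate structures, and what decay to apply to cold nodes) is exactly what this sketch doesn't answer; it only shows the mechanism.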

MononcQc fucked around with this message at 23:35 on Jul 30, 2017

Brain Candy
May 18, 2006

the 'free' desire path monitoring is v. nice, but aren't you compromising on the rest? check out those linear worst cases

Brain Candy
May 18, 2006

isn't the Lesson of Quicksort to always check the worst case and that when you think it won't happen to you, you are wrong

akadajet
Sep 14, 2003

Luigi Thirty posted:

My Atari hardware has 4KB of RAM and 128KB of ROM

good luck running anything in that
Is "My Atari hardware" the new "my girlfriend"?

Because that would rule

MononcQc
May 29, 2007

Brain Candy posted:

the 'free' desire path monitoring is v. nice, but aren't you compromising on the rest? check out those linear worst cases

yeah the key to keeping it working is ensuring a good overall distribution. Fully randomized treaps (as originally intended) don't guarantee O(log n) but mathematically come pretty close to it no matter what. The question is whether there's a metric or metric modification that could keep good overall tree balance while letting entries bubble up (say, +1% to the weight when picked, but -0.01% when on the path but not selected), and also whether it would still be faster than just maintaining 3 distinct data structures, one per use case.

I'd need to play with it to see how it goes, but so far I haven't taken the time for it.

Luigi Thirty
Apr 30, 2006

Emergency confection port.

akadajet posted:

Is "My Atari hardware" the new "my girlfriend"?

Because that would rule

yeah atari system 1 is my girlfriend. amiga is my boyfriend. i'm poly-play

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder
so, i'm working on speeding up my memcached clone and I'm currently messing with the way access to the backing store is synchronized. I'm currently testing two different methods:

1) There's a mutex around the backing LRU cache (both reads and writes need the mutex, because the LRU cache uses a linked hash map and getting items rearranges the data). Each request needs to wait for the lock. This takes about 25 micros per request.

2) I've got a single worker with unsynchronized access to the store, running in a loop. When a request comes in, I push the work onto a deque. The worker spins until work gets added to the queue, then pops the work off, does the work (adding/removing/getting items from the cache), and sends a response via a channel. Despite all of the ceremony, this method seems to take about 11-13 micros per request. It's faster!

I guess my question is: what's the best way to handle this (assuming i'm building a memcached-like key value store, with performance being the emphasis)? Memcached uses slabs and I'm still reading up on understanding how slabs would be used for caching.

also sorry this question is stupid, i smoked too much before posting it
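Method 2 above can be sketched roughly like this in Go (assuming the clone is in Go, as later posts suggest). All the names here (req, runWorker, entry) are invented for the sketch; the point is that only the worker goroutine ever touches the map and the recency list, so no mutex is needed.

```go
package main

import (
	"container/list"
	"fmt"
)

// req is one operation against the store; reply carries the result back.
type req struct {
	op    string // "get" or "set"
	key   string
	val   string
	reply chan string
}

type entry struct{ key, val string }

// runWorker owns the LRU state exclusively: every operation arrives
// serialized over the channel, so reads can rearrange the list freely.
func runWorker(requests chan req) {
	const capacity = 2
	items := map[string]*list.Element{}
	order := list.New() // front = most recently used
	for r := range requests {
		switch r.op {
		case "set":
			if e, ok := items[r.key]; ok {
				e.Value.(*entry).val = r.val
				order.MoveToFront(e)
			} else {
				if order.Len() >= capacity { // evict least recently used
					old := order.Back()
					delete(items, old.Value.(*entry).key)
					order.Remove(old)
				}
				items[r.key] = order.PushFront(&entry{r.key, r.val})
			}
			r.reply <- "OK"
		case "get":
			if e, ok := items[r.key]; ok {
				order.MoveToFront(e) // a get rearranges too, as in the post
				r.reply <- e.Value.(*entry).val
			} else {
				r.reply <- ""
			}
		}
	}
}

func main() {
	requests := make(chan req)
	go runWorker(requests)
	do := func(op, k, v string) string {
		reply := make(chan string)
		requests <- req{op, k, v, reply}
		return <-reply
	}
	do("set", "a", "1")
	do("set", "b", "2")
	do("get", "a", "")  // touch "a", making "b" least recently used
	do("set", "c", "3") // capacity 2: evicts "b"
	fmt.Println(do("get", "b", "") == "") // → true ("b" was evicted)
	fmt.Println(do("get", "a", ""))       // → 1
}
```

The channel round-trip is the "ceremony" the post mentions; it wins over the mutex version because the lock is never contended and the cache state stays in one core's cache.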

DONT THREAD ON ME fucked around with this message at 01:00 on Jul 31, 2017

jony neuemonic
Nov 13, 2009

the worst part of c# is that every ms framework pre-mvc was bad, and it's harder than it should be to find anyone not still running their business on a disgusting mountain of web forms.

well that and whatever the hell is going on with .net core.

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

MALE SHOEGAZE posted:

so, i'm working on speeding up my memcached clone and I'm currently messing with the way access to the backing store is synchronized. I'm currently testing two different methods:

1) There's a mutex around the backing LRU cache (both reads and writes need the mutex, because the LRU cache uses a linked hash map and getting items rearranges the data). Each request needs to wait for the lock. This takes about 25 micros per request.

2) I've got a single worker with unsynchronized access to the store, running in a loop. When a request comes in, I push the work onto a deque. The worker spins until work gets added to the queue, then pops the work off, does the work (adding/removing/getting items from the cache), and sends a response via a channel. Despite all of the ceremony, this method seems to take about 11-13 micros per request. It's faster!

I guess my question is: what's the best way to handle this (assuming i'm building a memcached-like key value store, with performance being the emphasis)? Memcached uses slabs and I'm still reading up on understanding how slabs would be used for caching.

also sorry this question is stupid, i smoked too much before posting it

Depending on how important it is to you that the LRU property holds, you might be able to improve performance by queuing up the rearrangement operations to be performed during writes/updates. That way your reads can stay concurrent and you can do the non-essential rearrangements when you're holding the lock anyway.

This might open up new cans of worms to solve, however (eg starvation of your threads trying to write)

my homie dhall fucked around with this message at 03:36 on Jul 31, 2017

tef
May 30, 2004

-> some l-system crap ->

MALE SHOEGAZE posted:

so, i'm working on speeding up my memcached clone and I'm currently messing with the way access to the backing store is synchronized. I'm currently testing two different methods:

1) There's a mutex around the backing LRU cache (both reads and writes need the mutex, because the LRU cache uses a linked hash map and getting items rearranges the data). Each request needs to wait for the lock. This takes about 25 micros per request.

2) I've got a single worker with unsynchronized access to the store, running in a loop. When a request comes in, I push the work onto a deque. The worker spins until work gets added to the queue, then pops the work off, does the work (adding/removing/getting items from the cache), and sends a response via a channel. Despite all of the ceremony, this method seems to take about 11-13 micros per request. It's faster!

I guess my question is: what's the best way to handle this (assuming i'm building a memcached-like key value store, with performance being the emphasis)? Memcached uses slabs and I'm still reading up on understanding how slabs would be used for caching.

also sorry this question is stupid, i smoked too much before posting it

it depends

1) will work out, but you can scale it out by partitioning your tree into ranges and having one lock per range

2) this will work but honestly how much faster is this than a single thread and a select loop

if performance is your emphasis, well, you want to be able to do cheap reads

something like read copy update might work better

a) partition hash keyspace into N tables
b) put an insert/delete lock around each hash table, with a version number
c) use some clever read locks and do compare-and-swap for updates

write protocol:
- obtain writer lock
- compare and swap in new data which points to old data
- increment version number
- wait for old readers to wrap up
- trash old data

read protocol:
- register as reader at current version number,
- do lookup
- read older data if new record is above current version
- unregister
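The write/read protocol above can be approximated in Go with a copy-on-write snapshot: readers do a single atomic pointer load, writers serialize on a lock, copy, and publish. This sketch simplifies the protocol in two labeled ways: Go's GC stands in for "wait for old readers to wrap up / trash old data" (old snapshots are reclaimed once the last reader drops its pointer), and it uses a plain store under the writer lock instead of a compare-and-swap. Needs Go 1.19+ for the generic atomic types; the names are invented.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rcuMap: readers load the current snapshot with no lock at all; writers
// serialize on a mutex, copy the map, and atomically publish a new version.
type rcuMap struct {
	mu      sync.Mutex // writer lock
	version atomic.Uint64
	snap    atomic.Pointer[map[string]string]
}

func newRcuMap() *rcuMap {
	m := &rcuMap{}
	empty := map[string]string{}
	m.snap.Store(&empty)
	return m
}

// Get is a single atomic pointer load. No reader registration is needed
// here: a reader that loaded the old snapshot simply keeps reading old data,
// which matches "read older data if new record is above current version".
func (m *rcuMap) Get(k string) (string, bool) {
	v, ok := (*m.snap.Load())[k]
	return v, ok
}

// Set copies the current snapshot, mutates the copy, publishes it, and
// bumps the version number.
func (m *rcuMap) Set(k, v string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	old := *m.snap.Load()
	next := make(map[string]string, len(old)+1)
	for key, val := range old {
		next[key] = val
	}
	next[k] = v
	m.snap.Store(&next)
	m.version.Add(1)
}

func main() {
	m := newRcuMap()
	m.Set("host", "10.0.0.1")
	m.Set("host", "10.0.0.2")
	v, _ := m.Get("host")
	fmt.Println(v, m.version.Load()) // → 10.0.0.2 2
}
```

Copying the whole map per write is obviously only sane under one of tef's partitions (copy one small shard, not the world); without GC you'd need the explicit reader registration from the protocol.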

Jabor
Jul 16, 2010

#1 Loser at SpaceChem

MALE SHOEGAZE posted:

so, i'm working on speeding up my memcached clone and I'm currently messing with the way access to the backing store is synchronized. I'm currently testing two different methods:

1) There's a mutex around the backing LRU cache (both reads and writes need the mutex, because the LRU cache uses a linked hash map and getting items rearranges the data). Each request needs to wait for the lock. This takes about 25 micros per request.

2) I've got a single worker with unsynchronized access to the store, running in a loop. When a request comes in, I push the work onto a deque. The worker spins until work gets added to the queue, then pops the work off, does the work (adding/removing/getting items from the cache), and sends a response via a channel. Despite all of the ceremony, this method seems to take about 11-13 micros per request. It's faster!

I guess my question is: what's the best way to handle this (assuming i'm building a memcached-like key value store, with performance being the emphasis)? Memcached uses slabs and I'm still reading up on understanding how slabs would be used for caching.

also sorry this question is stupid, i smoked too much before posting it

3) use a concurrent cache implementation that has a better locking strategy than "lock the whole cache for every read"
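The most common "better locking strategy" is lock striping: split the keyspace into shards, each with its own RWMutex, so readers of different shards never contend at all. A minimal sketch (the names are mine, not from any particular library); note a strict global LRU doesn't shard for free, so the usual compromise is a per-shard LRU, memcached-style.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const nShards = 16

// Each shard guards one slice of the keyspace with its own RWMutex:
// readers share the lock, writers take it exclusively, and contention
// drops by roughly a factor of nShards versus one big mutex.
type shard struct {
	mu sync.RWMutex
	m  map[string]string
}

type shardedCache struct{ shards [nShards]*shard }

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i] = &shard{m: map[string]string{}}
	}
	return c
}

// shardFor hashes the key to pick its shard.
func (c *shardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%nShards]
}

func (c *shardedCache) Get(key string) (string, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (c *shardedCache) Set(key, val string) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func main() {
	c := newShardedCache()
	c.Set("alpha", "1")
	v, ok := c.Get("alpha")
	fmt.Println(v, ok) // → 1 true
}
```

This is also tef's suggestion (1) in miniature: partition the structure and take one lock per partition.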

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

tef posted:

write protocol:
- obtain writer lock
- compare and swap in new data which points to old data
- increment version number
- wait for old readers to wrap up
- trash old data

read protocol:
- register as reader at current version number,
- do lookup
- read older data if new record is above current version
- unregister

this is essentially just mvcc right?

tef
May 30, 2004

-> some l-system crap ->

Ploft-shell crab posted:

this is essentially just mvcc right?

kinda

read copy update is a concurrency pattern that keeps old values around (i.e old readers stall writers, writers (somewhat block) new readers iirc)
but there isn't ordering across partitions/writers

mvcc is similar in that you have multiple values, and you're doing concurrency control, but that normally comes with a larger transactional protocol,
and synchronising with the write ahead log

and depending on the isolation level, you can get some ordering

Luigi Thirty
Apr 30, 2006

Emergency confection port.

crap nerd posted:

im looking through the last bit of opengl i wrote a while ago wishing i had commented every drat line of it cos goddamn writing the opengl interface is a pain

after you've bound your vertex array and enabled your attribute array (glEnableVertexAttribArray) you can use glVertexAttribPointer to specify the data layout in that array

it takes six arguments: the first is the index of the attribute you're defining (the one you just enabled), the second is the number of components (like 3 for a vec3), and the third is the type (prob GL_FLOAT). the fourth says whether to normalize integer values, and the last two are the stride and the byte offset into the buffer, both of which you can leave as zero if your data is tightly packed

yeah gently caress opengl, i rewrote the drawing code to render to a CVPixelBuffer instead of just an array of bytes and it's a zillion times faster. thanks stebe for providing such wonders

Wheany
Mar 17, 2006

Spinyahahahahahahahahahahahaha!

Doctor Rope

akadajet posted:

Is "My Atari hardware" the new "my girlfriend"?

Because that would rule

i have an atari, but it's in canada.

Xarn
Jun 26, 2015

tef posted:

it depends

1) will work out, but you can scale it out by partitioning your tree into ranges and having one lock per range

2) this will work but honestly how much faster is this than a single thread and a select loop

if performance is your emphasis, well, you want to be able to do cheap reads

something like read copy update might work better

a) partition hash keyspace into N tables
b) put an insert/delete lock around each hash table, with a version number
c) use some clever read locks and do compare-and-swap for updates

write protocol:
- obtain writer lock
- compare and swap in new data which points to old data
- increment version number
- wait for old readers to wrap up
- trash old data

read protocol:
- register as reader at current version number,
- do lookup
- read older data if new record is above current version
- unregister

I am assuming that reads are waaaaay more common than updates, so I'd just dispense with the ceremony and do a QSBR RCU hash table. Basically free reads; writer stall can be configured depending on your contention, and the worst case is that there will be slightly stale data (seeking ordered versions of data in a cache is a foolish errand anyway).

DONT THREAD ON ME
Oct 1, 2002

by Nyc_Tattoo
Floss Finder

tef posted:

it depends

1) will work out, but you can scale it out by partitioning your tree into ranges and having one lock per range

2) this will work but honestly how much faster is this than a single thread and a select loop

if performance is your emphasis, well, you want to be able to do cheap reads

something like read copy update might work better

a) partition hash keyspace into N tables
b) put an insert/delete lock around each hash table, with a version number
c) use some clever read locks and do compare-and-swap for updates

write protocol:
- obtain writer lock
- compare and swap in new data which points to old data
- increment version number
- wait for old readers to wrap up
- trash old data

read protocol:
- register as reader at current version number,
- do lookup
- read older data if new record is above current version
- unregister

thanks friend, ill give this a shot along with using slab storage for the partitions. sounds like fun!

Fergus Mac Roich
Nov 5, 2008

Soiled Meat

crap nerd posted:

im looking through the last bit of opengl i wrote a while ago wishing i had commented every drat line of it cos goddamn writing the opengl interface is a pain

after you've bound your vertex array and enabled your attribute array (glEnableVertexAttribArray) you can use glVertexAttribPointer to specify the data layout in that array

it takes six arguments: the first is the index of the attribute you're defining (the one you just enabled), the second is the number of components (like 3 for a vec3), and the third is the type (prob GL_FLOAT). the fourth says whether to normalize integer values, and the last two are the stride and the byte offset into the buffer, both of which you can leave as zero if your data is tightly packed

I believe you need the stride parameter to put your texture coordinates in the buffer with your vertices

Idk if that's a weird thing to do but it's how I learned it

I also found it really hard to keep the opengl api calls straight, kinda curious to poke at Vulkan but I also don't want to find out that I'm too dumb to use it

Fergus Mac Roich fucked around with this message at 13:25 on Jul 31, 2017

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

cinci zoo sniper posted:

"no actually all you need is spermacs with this section of curated plugins passed down the generations"

making everything a web app is bad, even, or in some cases especially, if it's electron or nw.js

I'm renaming my bespoke spacemacs setup to spermacs just so you know

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

AWWNAW posted:

mono is poo poo. the only alternative to .net's slow stubborn demise is .net core and they're loving that up something awful too. they waited about 5 years too long to even start on cross platform and now it's way too late. once anyone gets their .net poo poo running on Linux they're gonna realize there are better options

when .net core first came out i followed the "getting started" thing from MS on OS X and it just spewed out a ton of exceptions during the build.

so then i used the official docker image, and it got one step further but also just dumped a ton of exceptions.

if your build system is that loving janky why would i ever bother?

FlapYoJacks
Feb 12, 2009

Arcsech posted:

if you're running embedded without an os you're using c or c++ anyway so c#'s entire tier of language is already out

If you are running embedded without an OS your company/you are poo poo/poor and can't spend an extra dollar on a processor that can run Linux.

Sapozhnik
Jan 2, 2005

Nap Ghost
Yeah or maybe your power budget isn't as high as tens of milliamps, gently caress face

cinci zoo sniper
Mar 15, 2013




ratbert90 posted:

If you are running embedded without an OS your company/you are poo poo/poor and can't spend an extra dollar on a processor that can run Linux.
computing on satellites solved, boom

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

ratbert90 posted:

If you are running embedded without an OS your company/you are poo poo/poor and can't spend an extra dollar on a processor that can run Linux.

yeah let me run a full OS on something that is basically just a glorified PID controller

FlapYoJacks
Feb 12, 2009

Sapozhnik posted:

Yeah or maybe your power budget isn't as high as tens of milliamps, gently caress face

lol fine, but if your power budget is above 50mA there's no reason not to run an actual processor.


CRIP EATIN BREAD posted:

yeah let me run a full OS on something that is basically just a glorified PID controller


Hell yeah, IOT PID controller!

Sapozhnik
Jan 2, 2005

Nap Ghost
i mean yeah linux rulez and all but realistically you need DRAM for that, and that makes your layout work harder and your PCB a whole heck of a lot more expensive right there.

you can do a lot with bare metal firmware.

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat
this is a nice design and all, but wouldn't you be happier bit-banging uart in a kernel thread?

VikingofRock
Aug 24, 2008




CRIP EATIN BREAD posted:

this is a nice design and all, but wouldn't you be happier bit-banging uart in a kernel thread?

I'd read that thread

tef
May 30, 2004

-> some l-system crap ->

Xarn posted:

I am assuming that reads are waaaaay more common than updates, so I'd just dispense with the ceremony and do a QSBR RCU hash table. Basically free reads; writer stall can be configured depending on your contention, and the worst case is that there will be slightly stale data (seeking ordered versions of data in a cache is a foolish errand anyway).

this is probably simpler than partitioning but it's kinda one of those 'everything is a nail' solutions for me.

i have a hazy memory of rcu specifics and felt lazy enough, also i love epochs over checkpoints/qsbr

i mention the writer lock/insert lock as i kinda assume they've wrapped a normal hash table somehow in golang

the biggest problem with concurrent hash tables is creates and deletes so that's kinda why i was suggesting partitioned locks

so you don't end up in weird fuckups of writers contesting or reading uninitialized data

MALE SHOEGAZE posted:

thanks friend, ill give this a shot along with using slab storage for the partitions. sounds like fun!

please be sure and write performance tests

- cache warm up time (writes under no reads)
- read time under low writes/churn
- read time under high churn/writes

like, you have a lot of leeway to go hog wild

there's fun things to do, and you're already sorta doing them: instead of a lock, you can have a single process that takes requests and vice versa

you can even do things like replacing the channels with a ring buffer, and maybe cheating a bit,

i.e instead of having threads that take a read/write lock, or a single thread that does all read/writes

you can have a ring buffer instead of a channel

network threads write to the front of the buffer and read from the tail
lru threads: readers use an atomic integer to race through incoming lookups but spin when they hit a write operation; a single writer thread races ahead of the read operations
and the single writer thread can handle eviction, etc.

but tbh it's probably needless optimization, we didn't find 1 redis per core went faster than 1 redis and gently caress the other cores
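The ring-buffer variant above can be reduced to the single-producer/single-consumer case, which is the easy one: with exactly one writer for each counter, two atomics are all the synchronization needed. The multi-reader "race through lookups with an atomic integer" version needs considerably more care. A sketch, with invented names:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const size = 8 // power of two, so masking replaces the modulo

// spscRing: the network goroutine pushes at head, the store goroutine pops
// at tail. Each counter has exactly one writer, so plain atomic loads and
// stores are enough; no lock, no channel.
type spscRing struct {
	buf        [size]int
	head, tail atomic.Uint64
}

// Push returns false when the ring is full (the consumer has fallen behind).
func (r *spscRing) Push(v int) bool {
	h, t := r.head.Load(), r.tail.Load()
	if h-t == size {
		return false
	}
	r.buf[h&(size-1)] = v
	r.head.Store(h + 1) // publish only after the slot is written
	return true
}

// Pop returns false when the ring is empty.
func (r *spscRing) Pop() (int, bool) {
	h, t := r.head.Load(), r.tail.Load()
	if h == t {
		return 0, false
	}
	v := r.buf[t&(size-1)]
	r.tail.Store(t + 1)
	return v, true
}

func main() {
	var r spscRing
	r.Push(1)
	r.Push(2)
	a, _ := r.Pop()
	b, _ := r.Pop()
	fmt.Println(a, b) // → 1 2
}
```

Which is also the "probably needless optimization" caveat in action: a channel already gives you this shape with far less room for ordering bugs.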

tef
May 30, 2004

-> some l-system crap ->
it realllly depends where your costs are

if serializing/deserializing is expensive, then yeah, doing that in a thread pool will be cheap

but you could likely write something that uses a single main thread that does the network i/o and all read requests,

and fire off the create/update requests in background routines (leaving tombstones for deletes), and have the main thread tag in when it's safe to evict old records

eschaton
Mar 7, 2007

Don't you just hate when you wind up in a store with people who are in a socioeconomic class that is pretty obviously about two levels lower than your own?

CRIP EATIN BREAD posted:

yeah let me run a full OS on something that is basically just a glorified PID controller

well yeah, how else are you going to use node.js to implement it

gotta get close to the metal you know

CRIP EATIN BREAD
Jun 24, 2002

Hey stop worrying bout my acting bitch, and worry about your WACK ass music. In the mean time... Eat a hot bowl of Dicks! Ice T



Soiled Meat

eschaton posted:

well yeah, how else are you going to use node.js to implement it

gotta get close to the metal you know

hey, today i learned you can write postgres functions in javascript (uses v8 as the js engine), so anything's possible!

redleader
Aug 18, 2005

Engage according to operational parameters
json parsing in hipster weblang "elm" is so lovely that there's a $29 ebook available that goes through all the available options

you'd think json handling in a language designed for the web would be just a smidge better thought out and integrated


redleader
Aug 18, 2005

Engage according to operational parameters
lmao, here's one of its chapters: "How Can I Decode A JSON Object With More Than 9 Fields?"

truly cutting edge stuff
