Dr. Erlang Love or: How I Learned to Stop Worrying and Love Crashing

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Dr. Erlang Love or: How I Learned to Stop Worrying and Love Crashing

«‹›5 »

Paul MaudDib: May 3, 2006; TEAM NVIDIA:
FORUM POLICE

Way back when I took a course in Prolog, and I've always wanted to pick Erlang up to have a more practical use for that stuff.

I threw 5.3 into a file and compiled it, and it always kills the interpreter after around 130-140k recursions. Am I running into some kind of memory limit or infinite recursion protection for the interpreter or something? It should be tail recursive, so I'm not sure why it's crashing.

code:

-module(loop).
-export([loop/1]).

loop(N) ->
    io:format("~w~n", [N]),
    loop(N+1).

code:

>loop:loop(0).
0
...
139983
Killed

# ? Jul 22, 2013 23:45

Adbot: ADBOT LOVES YOU

# ? Apr 27, 2024 17:51

MononcQc: May 29, 2007

Paul MaudDib posted:

Way back when I took a course in Prolog, and I've always wanted to pick Erlang up to have a more practical use for that stuff.

I threw 5.3 into a file and compiled it, and it always kills the interpreter after around 130-140k recursions. Am I running into some kind of memory limit or infinite recursion protection for the interpreter or something? It should be tail recursive, so I'm not sure why it's crashing.
code:
-module(loop).
-export([loop/1]).

loop(N) ->
    io:format("~w~n", [N]),
    loop(N+1).
code:
>loop:loop(0).
0
...
139983
Killed

I've just tried this snippet. It works fine (went well over 10 times your problem) on R15 and R16 versions of the VM.

I'm pretty dumbfounded that something like that would kill your node. How did you get Erlang installed? Do you have any code being loaded in a .erlang file in your home directory?

# ? Jul 27, 2013 14:26

MononcQc: May 29, 2007

I stole this entry from the PL thread in YOSPOS where I posted it yesterday, and it fits fairly well in here.

For the last couple of weeks, I've been playing a game of optimizing and hunting down all kinds of bottlenecks, memory hogs, processes with too many messages, analyzing crash dumps and whatever om some of our systems so that we'd finally get rid of pretty much 99% of non-actionable alerts over some clusters.

It's the kind of poo poo you end up to do on any drat software stack every once in a while once it gets to be under load for a good while in production, and is fairly unavoidable as an activity. What changes, though, is how you do it.

One thing I wanted to do, for example, is figure what connections (out of more than 20,000) end up taking most of the bandwidth on the server at any given time, either on input or output, to be able to characterize the extremes of the load we may receive. Doing this in many languages or systems would generally require to wrap whatever socket operations are being done in a counter of some sort if it's not done already, and then poll or log the results somewhere. If your application is mostly IO bound over the network, logging that data for arbitrary levels of precision can have a rather damaging effect on the quality of service.

Erlang has this nice thing where stats are automatically accumulated for all socket activity, in advance. Every Erlang socket is called a port, and they're a language construct that wraps a socket, a file descriptor, or whatever else in something that looks like an Erlang process to the rest of the language. The statistics for any of them are available by calling a function called inet:getstat/1 on any given port. Moreover, this function can be combined with some other calls to list all the ports around.

By doing this, you can get the data for all the ports in a program from the Erlang shell, ask for how much data they've seen, make a list and sort it up. You could also wait for a few milliseconds, make a second list, make a difference between both, and get a snapshot of how what ports saw the most data at any interval of time. I ended up writing such functions and putting them in a small library called recon.

I wrote both functions in a short while, loaded the module and could find the biggest culprits in a matter of seconds by calling recon:inet_window(Metric, Number, IntervalToMeasure):

code:

(node@some-ip.ec2.internal)1> recon:inet_window(oct, 10, 1000).
[{#Port<0.336707>,736420,[{recv_oct,731950},{send_oct,4470}]},
 {#Port<0.336796>,627169,[{recv_oct,624338},{send_oct,2831}]},
 {#Port<0.336783>,298342,[{recv_oct,297150},{send_oct,1192}]},
 {#Port<0.336803>,270491,[{recv_oct,268703},{send_oct,1788}]},
 {#Port<0.336819>,177074,[{recv_oct,590276},{send_oct,1341}]},
 {#Port<0.336840>,168324,[{recv_oct,165791},{send_oct,2533}]},
 {#Port<0.336782>,136014,[{recv_oct,446228},{send_oct,1341}]},
 {#Port<0.336801>,123242,[{recv_oct,744971},{send_oct,2533}]},
 {#Port<0.336748>,109884,[{recv_oct,966680},{send_oct,4023}]},
 {#Port<0.336736>,98852,[{recv_oct,208764},{send_oct,1043}]}]

For this very second, the ports above had given me the most traffic, and I can see whatever was the input or the output of each, in bytes. The search can also be done by output or input only. So we found the busiest sockets at a time. Big deal -- we have no idea who owns them. To figure that out, I can use the erlang:process_info/2 function along with some port introspection to figure out what Erlang process owns which socket. I ended up wrapping some more of that functionality in a call:

code:

(node@ip.ec2.internal)2> [{Pid,recon:info(Pid)} || {Port,_,_} <- recon:inet_window(oct, 3, 1000),
(node@ip.ec2.internal)2>                           {_,Pid} <- [erlang:port_info(Port,connected)]].
[{<0.2630.513>,
  [{meta,[{registered_name,...},
          {dictionary,[{'$ancestors', ...},
                       {'$initial_call', {tcp_proxy,init,1}}]},
          {group_leader,<0.100.0>},
          {status,runnable}]},
   {signals,[{links,[<0.30588.0>,#Port<0.33985065>]},
             {monitors,[]},
             {monitored_by,[]},
             {trap_exit,false}]},
   {location,[{initial_call, ...},
              {current_stacktrace,[...]}]},
   {memory,[{memory,109344},
            {message_queue_len,0},
            {heap_size,6772},
            {total_heap_size,13544},
            {garbage_collection,[...]}]},
   {work,[{reductions,2833819}]}]},
 {<0.14755.56>,
  [{meta,[...]},
   {signals,[...]},
   {location,[...]},
   {memory,[...]},
   {work,[...]}]},
 {<0.30316.55>,
  [{meta,[...]},
   {signals,[...]},
   {location,[...]},
   {memory,[...]},
   {work,[...]}]}]

That's an ugly dump, but I can know, for example, that one of our TCP proxies is the one having the most network IO going through it at the time, and that its pid is <0.2630.513>. I can call a function and get that process' entire internal state (if stack traces and whatnot weren't enough already):

code:

(node@ip.ec2.internal)3> recon:get_state("<0.2630.513>").
{state,#Port<0.33985065>,
        {buf,<<"618 <133>1 2013-08-12T13:58:03.6 buffer data"...>>,
             295},
        {{11,12,13,14},59900},
        {1375,868792,992970},
        {my_own_worker,{state,{re_pattern,1,0, <<69,82,67,...>>},
                               {{dict,...}}},
                                34028236692093846346337460743176821146,
                                {1375,810971,924209}}}}}

Where in this case, the dict contains info letting me know what interface that proxy is listening one (which I can inspect by finding whatever is in the buffer, too, or the sharding information of the worker). I could use this information to dynamically change their buffer size, disable some call, give them a special shard, or whatever.

The fun thing about these inspection functions to make time windows and whatnot is that they're usable for anything else. So if you're looking for processes leaking memory, you can either look at them in the absolute or over a sliding time window:

code:

(node@ip.ec2.internal)4> recon:proc_count(memory, 5).
[{<0.121.0>,162095008,
  [my_stats,
   {current_function,{io,wait_io_mon_reply,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 ...}]
(node@ip.ec2.internal)5> recon:proc_window(memory, 5, 5000).
[{<0.12493.0>,688584,
  [{current_function,{gen_fsm,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 ...}]

Which lets me see both what processes hold all the memory right now (the stats process, which holds a shitload of counters), but also what kinds of processes are allocating it the most as we speak (some FSM doing actual work), to see where the churn and work is actually going. I could search by other attributes, such as 'reductions' (CPU used), message_queue_len (mailbox sizes, to identify points of contention), stack of their size, heap, etc. I can force garbage collection on a process, see attribute changes, and spot memory leaks that way, if I want to. I can even do it over the entire node and find which processes leak the most memory of a certain type in general, if at all.

Oh yeah, and all of this can be done remotely over entire clusters to find the worst everywhere:

code:

(node@ip.ec2.internal)6> recon:rpc(fun() -> recon:proc_count(memory, 1) end).
[[{<0.121.0>,144982376, [my_stats, ...]}],
 [{<8348.121.0>,162094936, [my_stats, ...]}],
 [{<8350.121.0>,208774192, [my_stats, ...]}],
 ...]

Sweet, it looks like the biggest memory consumer is the same everywhere (my_stats)! Sounds like we forgot to clear some inactive counters and moving to a lazy scheme will allow to reduce good chunks of unused memory. Took just 2 minutes to run it and find what could be the leak source.

The best thing about all of that is that it's non-blocking, read-only, and generally safe to run on any number of production nodes remotely without impacting quality of service at all.

I don't know how many other languages can give you that kind of run-time introspection, but I know I feel like poo poo every time I need to go back to "reproducing poo poo locally" or "debugging via logging or printf". poo poo is much harder and requires an ungodly amount of redeploys, compared to just digging into a running system for whatever information you need, even across a cluster to give you more data points. Of course if you had a decent stats/graphing/reporting system in place already, and that all the data you need is in there, that's even better, but you're likely not going to get the same level of granularity.

So far I'm pretty happy with Recon as a library and I'm trying to inject it into more work projects :toot:

^{_{this post brought to you by your local department of Erlang propaganda.}}

# ? Aug 13, 2013 12:30

Workaday Wizard: Oct 23, 2009; by Pragmatica

MononcQc posted:

...Cool poo poo...

Post this in your blog.

E: Pressed post too soon:

What projects do people use Erlang in? By that I mean, what happened that made you go "I need Erlang for this!"?

Workaday Wizard fucked around with this message at 16:25 on Aug 13, 2013

# ? Aug 13, 2013 16:22

Cocoa Crispies: Jul 20, 2001; Vehicular Manslaughter!; Pillbug

Shinku ABOOKEN posted:

Post this in your blog.

E: Pressed post too soon:

What projects do people use Erlang in? By that I mean, what happened that made you go "I need Erlang for this!"?

From Justin Sheehy, CTO at Basho:

quote:

We use the Erlang/OTP programming language in building our products here at Basho. We made that choice consciously, believing that it would be a tradeoff � significant benefits balanced by a handful of costs. I am often asked if we would make the same choice all over again. To answer that question I need to address the tradeoff we thought we were making.

The single most compelling reason to choose Erlang was the attribute for which it is best known: extremely high availability. The original design goal for Erlang was to enable rapid development of highly robust concurrent systems that �run forever.� The poster child of its success (outside Riak, of course) is the AXD 301 ATM switch, which reportedly delivers at or better than �nine nines� (99.9999999%) of uptime to customers. Since when we set out to build a database for applications requiring extremely high availability, Erlang was a natural fit.

We knew that Erlang�s supervisor concept, enabling a �let it crash� program designed for resilience, would be a big help for making systems that handle unforeseen errors gracefully. We knew that lightweight processes and a �many-small-heaps� approach to garbage collection would make it easier to build systems not suffering from unpredictable pauses in production. Those features paid off exactly as expected, and helped us a great deal. Many other features that we didn�t understand the full importance of at the time (such as the ability to inspect and modify a live system at run-time with almost no planning or cost) have also helped us greatly in making systems that our users and customers trust with their most critical data.

It turns out that our assessment of the key trade-off � a more limited pool of talented engineers � is, in practice, not a problem for a company like Basho. We need to hire great software developers, and we tend to look for ones with particular skills in areas like databases and/or distributed systems. If someone is a skilled programmer in relatively arcane disciplines like those, then the ability to learn a new programming language will not be daunting. While it�s theoretically a nice bonus for someone to bring knowledge of all the tools we use, we�ve hired a significant number of engineers that had no prior Erlang experience and they�ve worked out well.

This same purported drawback is a benefit in some ways. By not just looking for �X Engineers� (where X is Java, Erlang, or anything else), we make a statement both about our own technology decision-making process and the expected levels of interesting work at Basho. To help me work on my house, I�d rather have someone who self-identifies as an �expert carpenter� or �expert plumber,� not �expert hammer wielder,� even in the cases where most of the job might involve that tool. We expect developers at Basho to exercise deep, broad interests and expertise, and for them to do highly creative work. When we mention Erlang and the other thoughtful decisions we made in building our products, they value the roadmap and leadership.

I had an entertaining and ironic conversation about this recently with a manager at a large database company. He explained to me that we had clearly made the wrong choice, and that we should have chosen Java (like his team) in order to expand the recruiting pool. Then, without breaking a stride, he asked if I could send any candidates his way, to fill his gaps in finding talented people.

We continue to grow and to bring on great new engineers.

That�s not to say that there are no downsides. Any language, runtime, and community will bring with it different constraints and freedoms, making some tasks easier and others less so. We�ve done some work over the years to participate in the highly supportive Erlang community. But the big organizational weakness that so many people thought would come with the choice? It�s simply not a problem.

That lesson, combined with the ongoing technical advantages we enjoy because of Erlang, makes it easy to answer the question:

Yes, we would absolutely choose Erlang today.

# ? Aug 13, 2013 17:21

Rapsey: Sep 29, 2005

Shinku ABOOKEN posted:

Post this in your blog.

E: Pressed post too soon:

What projects do people use Erlang in? By that I mean, what happened that made you go "I need Erlang for this!"?

Does it run as a server? Erlang is probably the best choice. It is a language built for writing servers.

# ? Aug 13, 2013 19:53

MononcQc: May 29, 2007

Shinku ABOOKEN posted:

Post this in your blog.

E: Pressed post too soon:

What projects do people use Erlang in? By that I mean, what happened that made you go "I need Erlang for this!"?

Maybe I'll add it to my blog, reword it and whatnot. It's been a while since I've been truly excited about using one of my libs for myself, rather than doing it then moving on.

I've used Erlang for:

Chat apps, because lots of users and message passing
Real Time Bidding software, because low-latency requirements (soft real time), massive levels of concurrency, and constant system overload (which Erlang rules at)
I'm currently using it at Heroku, as part of the routing team, both for HTTP routing and log routing.

Most use cases I've made of Erlang had a few common points in terms of massive concurrency, a lot of time spent over heavy load, strict time constraints, requiring to be always up (shutting down to upgrade means losing money/user data). I've been served well so far.

Cocoa Crispies posted:

From Justin Sheehy, CTO at Basho:

quote:
I had an entertaining and ironic conversation about this recently with a manager at a large database company. He explained to me that we had clearly made the wrong choice, and that we should have chosen Java (like his team) in order to expand the recruiting pool. Then, without breaking a stride, he asked if I could send any candidates his way, to fill his gaps in finding talented people.

^ this is my favorite bit of the whole thing.

E: fixed a link to the wrong blog post

MononcQc fucked around with this message at 03:38 on Aug 14, 2013

# ? Aug 14, 2013 03:17

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

As I've (slowly, mea culpa) read LYSE and discussions/papers here and in the PL thread, I've started coming around to the view that there is a lot of overlap between building classical distributed systems and building industrial embedded systems (stuff with hard reliability requirements, eg. SIFs). I'm surprised there isn't more on the web relating the two, but then again I am a crazy dude that thinks there is a fundamental equivalence between statistical signal processing and classical frequency-domain signal processing v :shobon:

# ? Aug 14, 2013 03:30

Workaday Wizard: Oct 23, 2009; by Pragmatica

Thanks for the replies.

MononcQc posted:

[*] Real Time Bidding software, because low-latency requirements (soft real time), massive levels of concurrency, and constant system overload (which Erlang rules at)

I don't know why I was under the impression that Erlang couldn't do low latency stuff.

Glad to hear it does.

# ? Aug 14, 2013 03:31

MononcQc: May 29, 2007

Otto Skorzeny posted:

As I've (slowly, mea culpa) read LYSE and discussions/papers here and in the PL thread, I've started coming around to the view that there is a lot of overlap between building classical distributed systems and building industrial embedded systems (stuff with hard reliability requirements, eg. SIFs). I'm surprised there isn't more on the web relating the two, but then again I am a crazy dude that thinks there is a fundamental equivalence between statistical signal processing and classical frequency-domain signal processing vv

I'd like to hear about that more.

FWIW, there's work to make Erlang more available for bigger 'embedded' platforms (http://www.erlang-embedded.com/).

There's also a German guy I know who uses Erlang in hard real-time systems for the automotive industry by using it in RTEMS (a real time OS). The gist of his thing is that he writes the hard-real time components in C, C++, or even Ada, and gives them a priority higher than Erlang. Because they're usually smaller core components, he can then run everything that is soft-real or lower priority on the same embedded OS with a lower priority, within the Erlang VM.

There's been other interesting work in the topic, trying to make the Erlang VM work for hard real time, but I haven't heard about it in a long while and it's frankly outside of my area of expertise.

# ? Aug 14, 2013 03:47

Rapsey: Sep 29, 2005

Shinku ABOOKEN posted:

Thanks for the replies.

I don't know why I was under the impression that Erlang couldn't do low latency stuff.

Glad to hear it does.

Goldman Sachs uses erlang in their high frequency trading platform.

# ? Aug 14, 2013 04:47

Sang-: Nov 2, 2007

Rapsey posted:

Goldman Sachs uses erlang in their high frequency trading platform.

With that and a bunch of different languages, for different HFT platforms :p

# ? Aug 14, 2013 13:02

tef: May 30, 2004; -> some l-system crap ->

Shinku ABOOKEN posted:

What projects do people use Erlang in? By that I mean, what happened that made you go "I need Erlang for this!"?

I wanted to use Erlang in my last project, a distributed web crawler. Unfortunately, Erlang wasn't on the cards.

So I poorly reimplemented process supervision in python :shrek:

# ? Aug 15, 2013 16:37

The Insect Court: Nov 22, 2012; by FactsAreUseless

Any opinions on Elixir? Is it performant/stable enough to be worth learning in addition to/in place of Erlang at this point?

For reference, Elixir's a new-ish language implemented on top of the Erlang VM, with a Ruby-ish syntax.

# ? Aug 20, 2013 05:15

MononcQc: May 29, 2007

The Insect Court posted:

Any opinions on Elixir? Is it performant/stable enough to be worth learning in addition to/in place of Erlang at this point?

For reference, Elixir's a new-ish language implemented on top of the Erlang VM, with a Ruby-ish syntax.

From last page:

MononcQc posted:

Elixir is not adding too much to Erlang, IMO. Its biggest contributions are in macros, multiple modules per file, and the ability to have contracts, but otherwise most of its features will be a variation of something available in Erlang through the BEAM VM and Core Erlang (the intermediary language many can compile to, if they don't just generate an Erlang abstract parse tree), and its weaknesses will likely be a similar variation.

Then there's also the different syntax.

I think it's a nice attempt at a new language, and possibly the best alternative language on BEAM (though LFE is definitely nice too), but it doesn't offer much for people who already know Erlang outside of a change in a few semantics and the features I mentioned above. My hope for it is that it becomes a honeypot to Ruby fanboys who want Ruby and its do notation everywhere so they stop bothering the Erlang regulars about it, and those that really want to will be able to jump to Erlang from there.

Honeypot is a bit of a strong word, given Elixir can stand on its own and has its own tiny community that's still at the very flexible stage where they can modify the language as they go for what they like, but it's somewhat appropriate because at this point you still need to understand Erlang to be efficient with Elixir.

They got their first book out very lately http://pragprog.com/book/elixir/programming-elixir, so it might be the first sign of the language taking off. You'll probably still need to know a bit of Erlang to feel at home with Elixir, but I believe this is happening less and less.

I still stand by that position. I'm interested to see how Elixir grows and how the community goes about it.

I've privately held the theory for a while that Erlang is a 'different' language and that different syntax helps people drop the baggage they usually carry around (the same way it's obvious when a C programmer programs C in C++, it's obvious when a OO programmer programs OO in Erlang). I'm very eager to see how people who adopt Elixir for its friendlier syntax will deal with the [relatively] more surprising semantics of the language.

# ? Aug 20, 2013 06:14

minidracula: Dec 22, 2007; boo woo boo

MononcQc posted:

words about Elixir

So Elixir is Reia 2.0 (but with different people?)

# ? Aug 20, 2013 08:15

MononcQc: May 29, 2007

Elixir is nicer than Reia. Reia was an attempt to make an entirely new language with new semantics -- object-oriented with each object being its own process.

Compared to that, Elixir keeps its semantics closer to Erlang (not OO, and processes are used similarly as Erlang, not as Reia did it).

# ? Aug 20, 2013 08:47

minidracula: Dec 22, 2007; boo woo boo

MononcQc posted:

Elixir is nicer than Reia. Reia was an attempt to make an entirely new language with new semantics -- object-oriented with each object being its own process.

Compared to that, Elixir keeps its semantics closer to Erlang (not OO, and processes are used similarly as Erlang, not as Reia did it).

Fair enough. I was being too glib; it's nice to hear Elixir isn't trying to replace Erlang per se.

The object-as-process model is pretty much the actor model though; I wonder what might have come out of the Reia folks looking at something like E and still targeting a Ruby-ish syntax...

# ? Aug 21, 2013 02:21

MononcQc: May 29, 2007

Erlang is sometimes said to be object-oriented in the original meaning of it (each process acts as an object communicating through message passing), but you'll hit a wall if that's how you approach things. Erlang's processes are meant as a way to separate individual components to provide fault-tolerance; not to compose them and have them interacting on a level as low as function calls all the time. Representing a list or a tree node as a process is useless, while they could very well be objects in any OO language.

Erlang's processes are a way to provide fault-tolerance first. This can be tolerance to some weird hardware failure, a programmer error, corrupted data, etc. That an OO-like system emerges from it is purely accidental.

We can think of it as "OO done right" if we want, but using it in practice as if it were truly OO will likely lead to a shitload of unwarranted friction that would have been avoided by using a functional style over data structures, and keeping processes as isolated small programs that can talk to each other with messages.

That being said, Reia eventually died off and got abandoned after its author tried to get ruby blocks into Erlang and being told 'no' by members of the community, most notably by the well-informed Richard O'Keefe^{[1][2][3][4][5]}. This discussion, and the appearance of Elixir, prompted Tony Arcieri to declare that Erlang is a ghetto and he left the community to work on Celluloid, which tries to bring Erlang to Ruby, rather than his former approach of bringing Ruby to Erlang.

# ? Aug 21, 2013 06:04

Cocoa Crispies: Jul 20, 2001; Vehicular Manslaughter!; Pillbug

quote:

However, the problem with Erlang's fun syntax is, well, it isn't fun.

Ton'y a great guy and a good friend of mine but let's just say keyboards aren't the only buttons he's good at pressing.

# ? Aug 21, 2013 21:57

Mniot: May 22, 2003; Not the one you know

MononcQc posted:

Sorry for the thread necromancy, but I was wondering if anyone in here planned to be around for the Erlang Factory Lite in September, in New York City?

I'm not a frequent poster to the forums, but I'll be there and it would be fun to say hi.

I've only just started learning Erlang, and your book was highly recommended by my coworkers.

# ? Aug 29, 2013 19:21

Cocoa Crispies: Jul 20, 2001; Vehicular Manslaughter!; Pillbug

Mniot posted:

I'm not a frequent poster to the forums, but I'll be there and it would be fun to say hi.

I've only just started learning Erlang, and your book was highly recommended by my coworkers.

Did you have coworkers at Erlang DC?

# ? Aug 30, 2013 04:17

MononcQc: May 29, 2007

Mniot posted:

I'm not a frequent poster to the forums, but I'll be there and it would be fun to say hi.

I've only just started learning Erlang, and your book was highly recommended by my coworkers.

Nice! Good to hear.

---

Oh and I can't believe I forgot to post this here, but I'm giving a free webcast for O'Reilly on Tuesday on Modern Server Application Design with Erlang for a high-level tour of building Erlang apps and how that compares to traditional things, then some general design ideals to keep in mind when using Erlang for that.

I hope it's gonna be good, although I'm still working on it as I type this.

# ? Aug 30, 2013 13:06

Workaday Wizard: Oct 23, 2009; by Pragmatica

MononcQc posted:

Nice! Good to hear.

---

Oh and I can't believe I forgot to post this here, but I'm giving a free webcast for O'Reilly on Tuesday on Modern Server Application Design with Erlang for a high-level tour of building Erlang apps and how that compares to traditional things, then some general design ideals to keep in mind when using Erlang for that.

I hope it's gonna be good, although I'm still working on it as I type this.

Does O'Reilly archive webcasts?

I can't attend this one

# ? Aug 30, 2013 13:41

Blotto Skorzany: Nov 7, 2008; He's a PSoC, loose and runnin'
came the whisper from each lip
And he's here to do some business with
the bad ADC on his chip
bad ADC on his chiiiiip

MononcQc posted:

I'd like to hear about that more.

FWIW, there's work to make Erlang more available for bigger 'embedded' platforms (http://www.erlang-embedded.com/).

There's also a German guy I know who uses Erlang in hard real-time systems for the automotive industry by using it in RTEMS (a real time OS). The gist of his thing is that he writes the hard-real time components in C, C++, or even Ada, and gives them a priority higher than Erlang. Because they're usually smaller core components, he can then run everything that is soft-real or lower priority on the same embedded OS with a lower priority, within the Erlang VM.

There's been other interesting work in the topic, trying to make the Erlang VM work for hard real time, but I haven't heard about it in a long while and it's frankly outside of my area of expertise.

It's been a while, but I didn't forget this :3:

I've been real busy, but in my spare mental cycles I've been thinking more about it and to cut a long story short I think the similarity I saw boils down to a) some systems, especially SIFs, being actual distributed embedded systems of the AP variety even though they aren't normally described that way and b) unreliable communication even in systems that don't look "distributed", where it is impossible to discern whether there is excessive latency on a bus or whether the chip you're trying to talk to is transiently failing or really broken in the "magic smoke is gone" way. I will try and elaborate on this Saturday or Sunday

# ? Aug 30, 2013 14:00

Mniot: May 22, 2003; Not the one you know

Cocoa Crispies posted:

Did you have coworkers at Erlang DC?

Unlikely; they're mostly in California. I'm in the Boston area, myself. I know they went to the last SF conference.

# ? Aug 30, 2013 14:47

MononcQc: May 29, 2007

Shinku ABOOKEN posted:

Does O'Reilly archive webcasts?

I can't attend this one

Other talks that were free seem to have been made public (there's a couple of Haskell ones), so in the best of cases, it should be archived and made available. I don't know the details though.

# ? Aug 30, 2013 15:11

more like dICK: Feb 15, 2010; This is inevitable.

I'm looking to parse HTML and RSS feeds. It looks like mochiweb_html is the goto html parser, but is there anything more standalone that doesn't require bringing in mochiweb? For the RSS, is there a specific RSS library out there, or should I just stick to xmerl?

# ? Sep 6, 2013 12:37

MononcQc: May 29, 2007

leper khan posted:

This is what I do, and it's worked out really well for me. I'd still be interested in something cleaner if someone finds something though.

The "cleanest" way I can think of is if you end up using OTP releases, which basically mean you take your entire node, and crystallize it by declaring what applications it should contain, along with a few more settings regarding the kind of runtime you want. They're the canonical way to ship an Erlang system, even though a lot of people and companies (those I worked at included) don't do it that way.

If you're using reltool, you can specify custom filters about what part of applications to include or not. I have examples in the cookbook part of my book for reltool, and just today, Riak started using this method to avoid including Mnesia's include files in its project.

I put "cleanest" in quotes, because there's a significant overhead to use Reltool in terms of what you need to know just to ship something. Rebar can actually wrap around it and newer releases of Erlang should contain a self-executable to do it.

Relx is a new build tool for releases that is far easier to use, but is also far less powerful than reltool. So for your use case you might need reltool, but I'd recommend playing around with relx to get accustomed to releases and how they work.

# ? Sep 20, 2013 00:33

Cocoa Crispies: Jul 20, 2001; Vehicular Manslaughter!; Pillbug

Not Erlang per se but is anyone else going to Ricon 2013 in SF this week?

# ? Oct 28, 2013 11:38

MononcQc: May 29, 2007

Cocoa Crispies posted:

Not Erlang per se but is anyone else going to Ricon 2013 in SF this week?

I won't be there (Toronto awaits me instead), but many of my coworkers will be there.

I need to go to there next year :(

# ? Oct 28, 2013 23:45

MononcQc: May 29, 2007

So I posted this stuff in plenty of places already, but I had forgotten about this thread where this is where it makes more sense. Here's the blog post: https://blog.heroku.com/archives/2013/11/7/logplex-down-the-rabbit-hole

And here's the relevant stuff about how Erlang's memory works:

quote:

The amount returned by erlang:memory/0-1 is the amount of memory actively allocated, where Erlang terms are laid in memory; this amount does not represent the amount of memory that the OS has given to the virtual machine (and Linux doesn't actually reserve memory pages until they are used by the VM). To understand where memory goes, one must first understand the many allocators being used:

temp_alloc: does temporary allocations for short use cases (such as data living within a single C function call).

eheap_alloc: heap data, used for things such as the Erlang processes' heaps.

binary_alloc: the allocator used for reference counted binaries (what their 'global heap' is).

ets_alloc: ETS tables store their data in an isolated part of memory that isn't garbage collected, but allocated and deallocated as long as terms are being stored in tables.

driver_alloc: used to store driver data in particular, which doesn't keep drivers that generate Erlang terms from using other allocators. The driver data allocated here contains locks/mutexes, options, Erlang ports, etc.

sl_alloc: short-lived memory blocks will be stored there, and include items such as some of the VM's scheduling information or small buffers used for some data types' handling.

ll_alloc: long-lived allocations will be in there. Examples include Erlang code itself and the atom table, which stay there.

fix_alloc: allocator used for frequently used fixed-size blocks of memory. One example of data used there is the internal processes' C struct, used internally by the VM.

std_alloc: catch-all allocator for whatever didn't fit the previous categories. The process registry for named process is there.

The entire list of where given data types live can be found in the source.

By default, there will be one instance of each allocator per scheduler (and you should have one scheduler per core), plus one instance to be used by linked-in drivers using async threads. This ends up giving you a structure a bit like the drawing above, but split it in N parts at each leaf.

Each of these sub-allocators will request memory from mseg_alloc and sys_alloc depending on the use case, and in two possible ways. The first way is to act as a multiblock carrier (mbcs), which will fetch chunks of memory that will be used for many Erlang terms at once. For each mbc, the VM will set aside a given amount of memory (~8MB by default in our case, which can be configured by tweaking VM options), and each term allocated will be free to go look into the many multiblock carriers to find some decent space in which to reside.

Whenever the item to be allocated is greater than the single block carrier threshold (sbct), the allocator switches this allocation into a single block carrier (sbcs). A single block carrier will request memory directly from mseg_alloc for the first 'mmsbc' entries, and then switch over to sys_alloc and store the term there until it's deallocated.

So looking at something such as the binary allocator, we may end up with something similar to:

Whenever a multiblock carrier (or the first 'mmsbc' single block carriers) can be reclaimed, mseg_alloc will try to keep it in memory for a while so that the next allocation spike that hits your VM can use pre-allocated memory rather than needing to ask the system for more each time.

When we call erlang:memory(total), what we get isn't the sum of all the memory set aside for all these carriers and whatever mseg_alloc has set aside for future calls, but what actually is being used for Erlang terms (the filled blocks in the drawings above). This information, at least, explained that variations between what the OS reports and what the VM internally reports are to be expected. Now we needed to know why our nodes had such a variation, and whether it really was from a leak.

Fortunately, the Erlang VM allows us to get all of the allocator information by calling:
code:
[{{A, N}, Data} || A <- [temp_alloc, eheap_alloc, binary_alloc, ets_alloc,
                          driver_alloc, sl_alloc, ll_alloc, fix_alloc, std_alloc],
                   {instance, N, Data} <- erlang:system_info({allocator,Allocator})]
The call isn't pretty and the data is worse. In that entire data dump, you will retrieve the data for all allocators, for all kinds of blocks, sizes, and metrics of what to use. I will not dive into the details of each part; instead, refer to the functions I have put inside the recon library that will perform the diagnostics outlined in the next sections of this article.

To figure out whether the Logplex nodes were leaking memory, I had to check that all allocated blocks of memory summed up to something roughly equal to the memory reported by the OS. The function that performs this duty in recon is recon_alloc:memory(allocated). The function will also report what is being actively used (recon_alloc:memory(used)) and the ratio between them (recon_alloc:memory(usage)).

Fortunately for Logplex (and me), the memory allocated matched the memory reported by the OS. This meant that all the memory the program made use of came from Erlang's own term allocators, and that the leak came from C code directly was unlikely.

The next suspected culprit was memory fragmentation. To check out this idea, you can compare the amount of memory consumed by actively allocated blocks in every allocator to the amount of memory attributed to carriers, which can be done by calling recon_alloc:fragmentation(current) for the current values, and recon_alloc:fragmentation(max) for the peak usage.

By looking at the data dumps for these functions (or a similar one), Lukas figured out that binary allocators were our biggest problem. The carrier sizes were large, and their utilization was impressively low: from 3% in the worst case to 24% in the best case. In normal situations, you would expect utilization to be well above 50%. On the other hand, when he looked at the peak usage for these allocators, binary allocators were all above 90% usage.

Lukas drew a conclusion that turned out to match our memory graphs. Whenever the Logplex nodes have a huge spike in binary memory (which correlates with spikes in input, given that we deal with binary data for most of our operations), a bunch of carriers get allocated, giving something like this:

Then, when memory gets deallocated, some remnants are kept in Logplex buffers here and there, leading to a much lower rate of utilization, looking similar to this:

The result is a bunch of nearly empty blocks that cannot be freed. The Erlang VM will never do defragmentation, and that memory keeps being hogged by binary data that may take a long time to go away; the data may be buffered for hours or even days, depending on the drain. The next time there is a usage spike, the nodes might need to allocate more into ETS tables or into the eheap_alloc allocator, and most of that memory is no longer free because of all the nearly empty binary blocks.

Fixing this problem is the hard part. You need to know the kind of load your system is under and the kind of memory allocation patterns you have. For example, I knew that 99% of our binaries will be smaller or equal to 10kb, because that's a hard cap we put on line length for log messages. You then need to know the different memory allocation strategies of the Erlang virtual machine:

Best fit (bf)

Address order best fit (aobf)

Address order first fit (aoff)

Address order first fit carrier best fit (aoffcbf)

Address order first fit carrier address order best fit (aoffcaobf)

Good fit (gf)

A fit (af)

For best fit (bf), the VM builds a balanced binary tree of all the free blocks' sizes, and will try to find the smallest one that will accommodate the piece of data and allocate it there. In the drawing above, having a piece of data that requires three blocks would likely end in area 3.

Address order best fit (aobf) will work similarly, but the tree instead is based on the addresses of the blocks. So the VM will look for the smallest block available that can accommodate the data, but if many of the same size exist, it will favor picking one that has a lower address. If I have a piece of data that requires three blocks, I'll still likely end up in area 3, but if I need two blocks, this strategy will favor the first mbcs in the diagram above with area 1 (instead of area 5). This could make the VM have a tendency to favor the same carriers for many allocations.

Address order first fit (aoff) will favor the address order for its search, and as soon as a block fits, aoff uses it. Where aobf and bf would both have picked area 3 to allocate four blocks, this one will get area 2 as a first priority given its address is lowest. In the diagram below, if we were to allocate four blocks, we'd favor block 1 to block 3 because its address is lower, whereas bf would have picked either 3 or 4, and aobf would have picked 3.

Address order first fit carrier best fit (aoffcbf) is a strategy that will first favor a carrier that can accommodate the size and then look for the best fit within that one. So if we were to allocate two blocks in the diagram above, bf and aobf would both favor block 5, aoff would pick block 1. aoffcbf would pick area 2, because the first mbcs can accommodate it fine, and area 2 fits it better than area 1.

Address order first fit carrier address order best fit (aoffcaobf) will be similar to aoffcbf, but if multiple areas within a carrier have the same size, it will favor the one with the smallest address between the two rather than leaving it unspecified.

Good fit (gf) is a different kind of allocator; it will try to work like best fit (bf), but will only search for a limited amount of time. If it doesn't find a perfect fit there and then, it will pick the best one encountered so far. The value is configurable through the mbsd VM argument.

A fit (af), finally, is an allocator behavior for temporary data that looks for a single existing memory block, and if the data can fit, af uses it. If the data can't fit, af allocates a new one.

Each of these strategies can be applied individually to every kind of allocator, so that the heap allocator and the binary allocator do not necessarily share the same strategy.

Hopefully someone else than me finds this stuff super interesting :toot:

_{(images are from my S3 account at work -- no leeching here)}

# ? Nov 8, 2013 15:00

Posting Principle: Dec 10, 2011; by Ralp

Anyone going to the Toronto Erlang Factory Lite?

# ? Nov 8, 2013 16:25

Adbot: ADBOT LOVES YOU

# ? Apr 27, 2024 17:51

MononcQc: May 29, 2007

Yeah, I'm speaking there: http://www.erlang-factory.com/conference/Toronto2013/speakers/FredHebert

# ? Nov 8, 2013 16:50

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Dr. Erlang Love or: How I Learned to Stop Worrying and Love Crashing

«‹›5 »

code:

code:

Paul MaudDib posted:

code:

code:

code:

code:

code:

code:

code:

MononcQc posted:

Shinku ABOOKEN posted:

quote:

Shinku ABOOKEN posted:

Shinku ABOOKEN posted:

Cocoa Crispies posted:

quote:

MononcQc posted:

Otto Skorzeny posted:

Shinku ABOOKEN posted:

Rapsey posted:

Shinku ABOOKEN posted:

The Insect Court posted:

MononcQc posted:

MononcQc posted:

MononcQc posted:

quote:

MononcQc posted:

Mniot posted:

Mniot posted:

MononcQc posted:

MononcQc posted:

Cocoa Crispies posted:

Shinku ABOOKEN posted:

more like dICK posted:

more like dICK posted:

leper khan posted:

Cocoa Crispies posted:

quote:

code: