Saturday, April 20, 2013

Acme as an editor

On using Acme as a day-to-day text editor

I've been using the Acme text editor from Plan9Port as my standard text editor for about 9 months now. Before that, I used Emacs and Vim quite a lot. I never really got the hang of either Sublime Text or TextMate. The latter because I couldn't run it on all operating systems, the former because it was too new to bother with.
With Acme, you sacrifice almost everything. There is no configuration file. This is a plus. I have spent way too much time messing with configuration, where I should have been messing with adaptation. The acme defaults are designed to be sensible and to be easy to work with. The standard font choice works well, and even though it is not antialiased by default, I tend to like the font nonetheless.
Other sacrifices are syntax highlighting, automatic indentation, specific language support and so on. But what you gain is that the editor always works and there are no upgrades to bother you while you are working. The editor is built to be a window manager of sorts and you use it as a hub to connect other software together. This hub becomes the main focus of everything you are doing.
What pleases me about the acme editor is that it is simple. You can learn most things it can do in a week and then the power stems from combination of those simple things. It is very much a Zen-style editor with few things going on. You will have to like that choices have been made for you and that you have to adapt to those. But with this editor I spend much more time working on code bases than trying to get my editor to behave.

Setting up acme

  • Grab Plan 9 from User Space and ./INSTALL it somewhere. This gives you the basic environment, but it does require some help to get running.
  • Grab Plan 9 setup, which is my collection of small shell-script tools for manipulating files.
  • I have some changes to $HOME/.profile:
    export BROWSER='chromium'
    PLAN9=/home/jlouis/P/plan9
    PATH=$PATH:$PLAN9/bin
    
    # Plumb files instead of starting new editor.
    EDITOR=E
    unset FCEDIT VISUAL
    
    # Get rid of backspace characters in Unix man output.
    PAGER=nobs
    
    # Default font for Plan 9 programs.
    font=$PLAN9/font/lucsans/euro.8.font
    
    # Equivalent variables for rc(1).
    home=$HOME
    prompt="$H=;          "
    user=$USER
    
    export \
            BROWSER\
            ⋯
    
            PLAN9\
            ⋯
            font\
            home\
            prompt\
            user
    
  • Acme is started once through the acme-start.rc script. This also starts the plumber service.
  • $HOME/lib/plumbing is linked so I get some plumbing rules in addition to the default rules, primarily for quick access to GitHub stuff.

On mouse usage

Acme requires a good mouse to be really effective. Find a gaming mouse with good DPI resolution and then proceed to configure it so it has acceleration and sensitivity settings that you like. You shouldn't need to move the mouse too much, yet the movement should be precise. Some mice have microprocessors in them which smooth movement so when you sweep a line, it is easier to stay on the line. It all depends on what mouse you have.
In a modal editor like vim, you are usually either in command mode for cursor movement or in insert mode for entering text. Acme can be understood the same way: either you have a hand on your mouse and are doing commands, or you are inserting text into the buffer at some place. Note that in a system like acme, you can do a lot of tasks on the mouse alone. You can double-click next to a " character to select the whole string or to select pairs (), [] or {}. Clicking at the start of a line selects the whole line. And so on. Since you can also select the \n character, you can easily move around large textual parts of the code. Cut, copy and paste are also on the mouse alone. So most of the (common) things you do in the vim command mode are done with the mouse.
You do have access to a command language, which comes from the sam(1) editor. Learning how that language works helps a lot. I do a lot of my surgery on files by writing commands that change the contents of a selection.
Is the mouse more efficient than the keyboard? Hell yes! The more complex an editing task is, the more effective the mouse is. And for most other simple things, the speed is about the same as the keyboard movement for me.

Working with acme

The key concept of acme is that you use it as the main entry point for all work you do. One of my screens is full-screen acme, and it usually runs two major windows in acme, at the least: scratch and win. The latter is a standard shell so you can open files and operate on commands. I either open files by doing lc and then 3-click them or by executing B <filename>. Remember that ^F completes filenames.
The scratch is a file I continually write containing helper commands, small snippets, GH references and so on. I usually have a global one, and one per project I am working on. Usual contents:
  • Urls to important web pages. 3-click them and chromium takes you there
  • Github issues
  • Git branch names enclosed in brackets: [jl/fix-eqc-test/37]
  • Notes of importance, thoughts.
  • Complex commands with notes on their usage so they can be copied in and used quickly
If I am working on a branch, there are usually commands helpful to that branch in the scratch buffer. If there is a gg foo command you can just Snarf it and then use the Send command in the shell window to fire it off in the source code.
I usually keep done things in the scratch buffer as well for documentation. Every 2-3 months I then move it to a file scratch.$(date +%Y%m%d) to remember that.
I make heavy use of the fact that acme has a neat way to enter unicode directly, so there is a lot of correct punctuation and a lot of things you won't usually see in ASCII. The editor uses a variable-width font by default, which is really good when reading and scratching stuff down. I also use the Font command when I need a fixed-width font.
Acme is a visual-spatial editor environment. By default, it doesn't hide information from you. At work, it is not uncommon that I have over 50 open buffers on a big 27 inch Mac display. You can do that with acme easily and since you have spatiality, you can also remember where you put a window and get back to it easily. I usually run 4 columns:
  • One with documentation and scratch pads.
  • One which contains the main code I am working on right now.
  • One with shells, erlang shells and code mixed.
  • A narrow strip containing directory output to quickly get at a specific file. This strip also holds windows with error output.
On smaller screens, I run a 2 column setup which is the default one.

Working with Erlang in acme

Most of what I do is Erlang. I sometimes work in different languages, but the operation is roughly the same.
  • A shell is used to run rebar compile. I add [rebar compile] to the tag and then click at the right of [ to select it. A 2-click now recompiles. Other typical things in the tag could be make or make test and so on.
  • The gg command is a shorthand for git grep -n. Need a specific thing? I gg it and then the filename comes up with a line number. 3-clicking it is understood by the plumber to open that file on that line.
  • I tend to avoid using tools which can follow code paths. Mainly because if you need a tool, then chances are that the code itself is quite convoluted and nasty
  • I have a window which runs an erlang console for the project I am working on. I often dynamically load code into the erlang node and test it out. It is rare that I reboot the node unless I am doing startup-specific coding.
  • Documentation: Edit , <erl -man lists in a dummy window for the purpose
  • I often search for code. :/^keyfind will search for keyfind, but at the start of the line. I keep such a line around in the tag for searches.
  • The Edit , d command clears a window by selecting all contents and then deleting it.
  • I often utilize the shell commands: <date +%Y-%m-%d inserts the current date into the buffer for instance. Selecting text and sending it through |sort will sort export lists and atom tables.
  • Each project is written to a dump file with Dump acme.projectname. This way, you can easily get back with Load acme.projectname which restores your current window layout and more.
  • I use the shell a lot when I write code. In practice I see the UNIX system as the IDE and then I use acme to access that IDE. It works wonders.
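
The `|sort` item above works because the editor hands the selected text to an external program on standard input and replaces the selection with whatever the program prints. A rough sketch of that round trip in Python follows. The function name `pipe_selection` is made up for illustration, this is not how acme is implemented, and the locale is pinned to C here only to make the sort order predictable:

```python
import os
import subprocess


def pipe_selection(selection, command):
    """Feed `selection` to `command` on stdin and return its stdout.

    Hypothetical helper mimicking the effect of acme's |cmd on a selection.
    """
    env = {**os.environ, "LC_ALL": "C"}  # assumption: fixed locale for a stable order
    result = subprocess.run(
        command,
        input=selection,
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout


# An unsorted export list, one entry per line.
exports = "stop/0\ninit/1\nstart/1\n"
sorted_exports = pipe_selection(exports, ["sort"])
print(sorted_exports, end="")
```

Any filter that reads stdin and writes stdout fits the same shape, which is why ordinary UNIX tools are enough to extend the editor.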

Wednesday, February 20, 2013

A review of Fred Hebert's ``Learn You Some Erlang for Great Good!''



A disclaimer, to start it all off, in the interest of fairness and honesty: I got a review copy of the book from No Starch Press, and was asked to review it.

I remember a couple of years ago, on IRC (Internet Relay Chat), that Fred had started writing a series of articles on how Erlang worked. It all began in the small, with him writing about the language constructs, the data types, how to use the language and so on. Every time he wrote a chapter, he would seek out the help of the channel for proofreading the parts and for suggestions on how to improve it. I like Fred's work because he made a good introduction to the language and I had a place to point whenever somebody wanted to learn about Erlang. Even better, it was a free resource, straight on the web, and you could just send a link to somebody about a complicated part.

I am not sure how much Fred really envisioned it as a book. He started off from the idea of "Learn You a Haskell for Great Good!", which is a book written in the same light style. But as time passed, he kept writing and he kept adding more chapters to the book. It was also nice when Fred had covered another subject, since it eased the transition for new Erlang programmers. We had another source we could point to when people wanted to consider those aspects of the language.

Since then, Fred has written chapters on concurrency, on distribution and on general program design in Erlang. His style has always been to mix up fact and fun, and he avoided falling into the trap of writing a book full of boring tirades on the language. There is a reader group for such a book as well, but we already have that covered by a language definition and a short tutorial. Many people have enjoyed his book.

And now, now you can get the book in dead wood as well! No Starch Press made this possible and published all of Fred's amazing work, so you can go buy a book to read everywhere. It is an excellent introduction to Erlang and you can go to the web and look for the first chapter or two in order to get a feel for his writing style. If it is one you like, you should definitely consider buying the book. While the style may be too slow for a seasoned functional programmer, it is the right fit if you have yet to experience functional programming. I know that many people still prefer a book when they are reading. The reasons for this are many, but reading off a low resolution screen doesn't have the same feel to it as printed text. A side-effect is that the book has cleaned up all of the writing and made it clearer.

The major selling point of this book is that it contains it all. You will be exposed to sequential, concurrent and distributed Erlang. Fred also covers some parts which are often skipped in other books, like the dialyzer, mnesia and ETS tables. He also covers most of the testing tools like EUnit and Common Test. And in addition, Fred explains how OTP works and how to structure your OTP applications. Few other books offer that level of depth. Many of the chapters cover things which I consider to be essential to efficient Erlang programming in the large: especially the dialyzer, releases, testing and ETS are important for the professional Erlang programmer.

If you want to know it all, LYSE is a great book to buy. And you can buy it here:

http://learnyousomeerlang.com

Saturday, January 12, 2013

How Erlang does scheduling


In this, I describe why Erlang is different from most other language runtimes. I also describe why it often forgoes throughput for lower latency.

TL;DR - Erlang is different from most other language runtimes in that it targets different values. This describes why it often seems to perform worse if you have few processes, but well if you have many.

From time to time the question of Erlang scheduling gets asked by different people. While this is an abridged version of the real thing, it can act as a way to describe how Erlang operates its processes. Do note that I am taking Erlang R15 as the base point here. If you are a reader from the future, things might have changed quite a lot—though it is usually fair to assume things only got better, in Erlang and other systems.

Toward the operating system, Erlang usually has a thread per core you have in the machine. Each of these threads runs what is known as a scheduler. This is to make sure all cores of the machine can potentially do work for the Erlang system. The cores may be bound to schedulers, through the +sbt flag, which means the schedulers will not "jump around" between cores. It only works on modern operating systems, so OSX can't do it, naturally. It means that the Erlang system knows about processor layout and associated affinities which is important due to caches, migration times and so on. Often the +sbt flag can speed up your system. And at times by quite a lot.

The +A flag defines a number of async threads for the async thread pool. This pool can be used by drivers to run a blocking operation, such that the schedulers can still do useful work while one of the pool threads is blocked. Most notably the thread pool is used by the file driver to speed up file I/O - but not network I/O.

While the above describes a rough layout towards the OS kernel, we still need to address the concept of an Erlang (userland) process. When you call spawn(fun worker/0) a new process is constructed, by allocating its process control block in userland. This usually amounts to some 600+ bytes and it varies from 32 to 64 bit architectures. Runnable processes are placed in the run-queue of a scheduler and will thus be run later when they get a time-slice.

Before diving into a single scheduler, I want to describe a little bit about how migration works. Every once in a while, processes are migrated between schedulers according to a quite intricate process. The aim of the heuristic is to balance load over multiple schedulers so all cores get utilized fully. But the algorithm also considers if there is enough work to warrant starting up new schedulers. If not, it is better to keep the scheduler turned off as this means the thread has nothing to do. And in turn this means the core can enter power save mode and get turned off. Yes, Erlang conserves power if possible. Schedulers can also work-steal if they are out of work. For the details of this, see [1].

IMPORTANT: In R15, schedulers are started and stopped in a "lagged" fashion. What this means is that Erlang/OTP recognizes that starting a scheduler or stopping one is rather expensive so it only does this if really needed. Suppose there is no work for a scheduler. Rather than immediately putting it to sleep, it will spin for a little while in the hope that work arrives soon. If work arrives, it can be handled immediately with low latency. On the other hand, this means you cannot use tools like top(1) or the OS kernel to measure how efficiently your system is executing. You must use the internal calls in the Erlang system. Many people were incorrectly assuming that R15 was worse than R14 for exactly this reason.

Each scheduler runs two types of jobs: process jobs and port jobs. These are run with priorities like in an operating system kernel and are subject to the same worries and heuristics. You can flag processes to be high-priority, low-priority and so on. A process job executes a process for a little while. A port job considers ports. To the uninformed, a "port" in Erlang is a mechanism for communicating with the outside world. Files, network sockets, pipes to other programs are all ports. Programmers can add "port drivers" to the Erlang system in order to support new types of ports, but that does require writing C code. One scheduler will also run polling on network sockets to read in new data from those.

Both processes and ports have a "reduction budget" of 2000 reductions. Any operation in the system costs reductions. This includes function calls in loops, calling built-in functions (BIFs), garbage collecting heaps of that process[n1], storing/reading from ETS, and sending messages (the size of the recipient's mailbox counts: large mailboxes are more expensive to send to). This is quite pervasive, by the way. The Erlang regular expression library has been modified and instrumented even though it is written in C code. So when you have a long-running regular expression, it will be charged reductions and preempted several times while it runs. Ports as well! Doing I/O on a port costs reductions, sending distributed messages has a cost, and so on. Much time has been spent to ensure that any kind of progress in the system has a reduction cost[n2].

In effect, this is what makes me say that Erlang is one of a few languages that actually does preemptive multitasking and gets soft-realtime right. Also it values low latency over raw throughput, which is not common in programming language runtimes.

To be precise, preemption[2] means that the scheduler can force a task off execution. Everything based on cooperation cannot do this: Python Twisted, Node.js, Lwt (OCaml) and so on. But more interestingly, neither Go (golang.org) nor Haskell (GHC) is fully preemptive. Go only switches context on communication, so a tight loop can hog a core. GHC switches upon memory allocation (which admittedly is a very common occurrence in Haskell programs). The problem in these systems is that hogging a core for a while (one might imagine doing an array-operation in both languages) will affect the latency of the system.

This leads to soft-realtime[3] which means that the system will degrade if we fail to meet a timing deadline. Say we have 100 processes on our run-queue. The first one is doing an array-operation which takes 50ms. Now, in Go or Haskell/GHC[n3] this means that tasks 2-100 will take at least 50ms. In Erlang, on the other hand, task 1 would get 2000 reductions, which is sub 1ms. Then it would be put in the back of the queue and tasks 2-100 would be allowed to run. Naturally this means that all tasks are given a fair share.
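
The example above can be played through with a toy single-core run queue. This is only a sketch of the idea, not the Erlang scheduler: time is counted in reductions, the 200,000-reduction cost of the long task and the 500-reduction cost of the short ones are made-up figures, and `first_run_times` is a hypothetical helper name.

```python
from collections import deque

REDUCTION_BUDGET = 2000  # per-slice budget named in the text


def first_run_times(costs, budget=None):
    """Return, per task, the clock value (in reductions) when it first runs.

    costs  -- total reductions each task needs, in run-queue order
    budget -- reductions allowed per slice; None means run to completion
    """
    queue = deque(enumerate(costs))
    clock = 0
    first_run = {}
    while queue:
        task, remaining = queue.popleft()
        first_run.setdefault(task, clock)
        used = remaining if budget is None else min(remaining, budget)
        clock += used
        if remaining > used:
            # Budget exhausted: preempt the task and requeue it at the back.
            queue.append((task, remaining - used))
    return [first_run[task] for task in range(len(costs))]


# Task 0 is the long array operation; the other 99 tasks are short.
# Both cost figures are assumptions chosen for illustration.
costs = [200_000] + [500] * 99

run_to_completion = first_run_times(costs)
preemptive = first_run_times(costs, REDUCTION_BUDGET)

print("without preemption, second task starts at", run_to_completion[1])
print("with a 2000-reduction budget, second task starts at", preemptive[1])
print("with a 2000-reduction budget, last task starts at", preemptive[99])
```

Without a budget, every other task waits for the whole long task. With the budget, the long task gives up the core after 2000 reductions and the last task in the queue has started long before the long task finishes.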

Erlang is meticulously built around ensuring low-latency soft-realtime properties. The reduction count of 2000 is quite low and forces many small context switches. It is quite expensive to break up long-running BIFs so they can be preempted mid-computation. But this also ensures an Erlang system tends to degrade in a graceful manner when loaded with more work. It also means that for a company like Ericsson, where low latency matters, there is no other alternative out there. You can't magically take another throughput-oriented language and obtain low latency. You will have to work for it. And if low latency matters to you, then frankly not picking Erlang is in many cases an odd choice.

[1] "Characterizing the Scalability of Erlang VM on Many-core Processors" http://kth.diva-portal.org/smash/record.jsf?searchId=2&pid=diva2:392243
[2] http://en.wikipedia.org/wiki/Preemption_(computing)
[3] http://en.wikipedia.org/wiki/Real-time_computing

[n1] Process heaps are per-process so one process can't affect the GC time of other processes too much.
[n2] This section is also why one must beware of long-running NIFs. They do not per default preempt, nor do they bump the reduction counter. So they can introduce latency in your system.
[n3] Imagine a single core here, multicore sort of "absorbs" this problem up to core-count, but the problem still persists.

(Smaller edits made to the document at Mon 14th Jan 2013)

Friday, December 28, 2012

Hacking the brains of other people with API design.


In UNIX there is a specific error number which can be returned from system calls. This error, EAGAIN, is used by the OS kernel whenever it is in a complex state in which it is deemed too hard to produce a proper answer for the userland application. The solution is almost a non-solution: you punt the context back to the user program and ask that it goes again and retries the operation. Then the kernel gets rid of the complex state and the next time the program enters the kernel, we can be in another state without the trouble.

Here is an interesting psychological point: we can use our code to condition another person's brain to cook up a specific program that serves our purpose. That is, we can design our protocols such that they force the user to adopt certain behaviour in his programs. One such trick is deliberate fault injection.

Say you are serving requests through an HTTP server. Usually, people would imagine that 200 OK is what should be returned always on successful requests, but I beg to differ. Sometimes, say for 1 in 1000 requests, we deliberately fail the request. We return a 503 Service Unavailable back to the user. This conditions the user to write error-handling code for this request early on. You can't use the service properly without handling this error, since it occurs too often. You can even add a "Retry-After" header and have him go immediately again.
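
To make the idea concrete, here is a small sketch with the HTTP layer stripped away. The names `handle` and `call_with_retry`, the tuple-shaped response and the limit of five attempts are all assumptions made for illustration; only the 1/1000 rate, the 503 status and the Retry-After header come from the description above.

```python
import random
import time

FAILURE_RATE = 1 / 1000  # the rate suggested in the text


def handle(request, rng=random):
    """Hypothetical server handler returning (status, headers, body)."""
    if rng.random() < FAILURE_RATE:
        # Deliberately injected fault: tell the client to retry right away.
        return 503, {"Retry-After": "0"}, b""
    return 200, {}, b"ok: " + request


def call_with_retry(request, handler=handle, max_attempts=5):
    """Client side: the error handling the injected faults force you to write."""
    for _ in range(max_attempts):
        status, headers, body = handler(request)
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected status {status}")
        # Honour Retry-After (in seconds) before going again.
        time.sleep(int(headers.get("Retry-After", "0")))
    raise RuntimeError("service unavailable after retries")


print(call_with_retry(b"hello"))
```

A client written like this does not care whether a 503 was injected on purpose or caused by a real outage, which is the whole point of the exercise.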

This deliberate fault injection has many good uses.

  • First, it forces users of your service to adopt a more biological and fault tolerant approach to computing. Given enough of this kind of conditioning, programmers will automatically begin adding error-handling code to their requests, because otherwise it may not work.
  • Second, it gives you options in case of accidents: say your system is suddenly hit by an emergency which elevates the error rate to 10%. This has no effect, since your users are already able to handle the situation.
  • Third, you can break conflicts by rejecting one or both requests.
  • Fourth, you can solve some distribution problems by failing the request and having the client retry.
  • Fifth, simple round-robin load balancing is now useful. If you hit an overloaded server, you just return 503 and the client will retry another server.


I have a hunch that Amazon Web Services uses this trick. Against S3, I've seen an error rate suspiciously close to 1/500. It could be their own way of implementing a chaos monkey and then conditioning all their users to write code in a specific way with it.

The trick is also applicable in a lot of other contexts. Almost every protocol has some point where you can deliberately inject faults in order to make other clients behave correctly. It is very useful in testing as well. Use QuickCheck to randomly generate requests and let a certain amount be totally wrong. These wrong requests must then be rejected by the system. Otherwise something is wrong with it.

More generally, this is an example of computer programs being both formal and chaotic at the same time. One can definitely find interesting properties of biological processes to copy into computer systems. While it is nice to be able to prove that your program is correct, the real world is filled with bad code, faulty systems, breaking network switches and so on. Having a reaction to this by having your system be robust to smaller errors is definitely going to be needed. Especially in the longer run, where programs will become even more complex and communicate even more with other systems; other systems over which you have no direct control.

You can see fault-injection as a type of mutation. The programs coping with the mutation are the programs which should survive in the longer run.

Consider hacking the brain of your fellow programmers. And force them to write robust programs by conditioning their minds into doing so.

Thanks to DeadZen for proof-reading and comments.

Tuesday, December 04, 2012

Some Epigrams


I am no Alan Jay Perlis, nor am I really worthy.

  • Function parameters fornicate. If you have 7, they will quickly breed to 14.
  • Any "new" idea which a person thinks about has a 98% chance of having been researched better and more deeply before 1980. Thus most new ideas aren't.
  • Age rule for the young: If a concept is older than you and still is alive you must understand it. If it is in hibernation it may come back again. If there is no trace of it - some bozo is about to reinvent it.
  • Dynamic typing is a special case of Static typing.
  • Beware the scourge of boolean blindness.
  • Prefer persistence over ephemerality.
  • The program which can be formally reasoned about is usually the shortest, the correct and the fastest.
  • "We will fix it later" - later never occurs.
  • Project success is inversely proportional to project size.
  • Code not written sometimes has emergent behaviour in the system. Either by not having bugs or by executing invisible code infinitely fast in zero seconds.
  • Your portfolio of closed source projects doesn't exist.
  • Version control or doom.
  • Around the year 1999 the number of programmers increased 100-fold. The skill level didn't.
  • Program state is contagious. Avoid like the plague.
  • Business logic is a logic. Inconsistent logic?

  • 0.01: The factor of human beings who can program
  • 0.001: The factor of human beings who can program concurrently
  • 0.0001: The factor of human beings who can program distributively



  • If your benchmark shows your code an order of magnitude faster than the established way, you are correct. For the wrong problem.
  • Debugging systems top-down is like peeling the onion inside-out.
  • A disk travelling on the back of army ants has excellent throughput but miserable latency. So have many Node.js systems.
  • Beware of the arithmetic mean. It is a statistic, and usually a lie.
  • Often, speed comes with a sacrifice of flexibility on the altar of complexity.
  • Sometimes correctness trumps speed. Sometimes it is the other way around.
  • Optimal may be exponentially more expensive to compute than the 99th percentile approximation.



  • The programmer is more important than the programming language
  • Programming languages without formal semantics are akin to a dumping ground. The pearls are few and far between.
  • The brain is more important than the optimizing compiler
  • The tools necessary for programs of a million lines of code are different than those for 1000 lines.
  • Specializing in old tools contains the danger of ending as an extinct dinosaur.
  • Like introduction of 'null', Object Oriented Programming is a grave mistake.
  • The string is heaven because it can encode anything. The string is hell because it can encode anything.



  • Idempotence is your key to network programming.
  • Protocol design is your key to network programming.
  • Sun RPC is usually not the solution. Corollary: HTTP requests neither.
  • Your protocol must have static parts for structure and dynamic parts for extension.
  • Only trust systems you have control over and where you can change the behaviour.
  • If a non-programmer specifies a distributed system, they always violate the CAP theorem.
  • In a distributed system, the important part is the messages. What happens inside a given node is uninteresting. Especially what programming language it is written in.
  • A distributed system can have more failure scenarios than you can handle. Trying is doom.
  • The internet has a failure rate floor. If your system has a failure rate underneath it, you are error-free to the customer.
  • If your system is doing a million $100 requests a year, a failure rate of 10 requests per year is not worth fixing.
  • If your system employs FIFO queues, latency can build up. Bufferbloat is not only in TCP.
  • Beware the system overload situation. It is easier to reject requests than handle them. You need back-pressure to inform.


Tuesday, November 27, 2012

Is the keyboard or the mouse faster for text input?


A very common thing that crops up now and then is the question in the title. What is fastest for editing text, the keyboard or the mouse? The often-quoted answer is an older set of "Ask Tog" articles[1a, 1b, 1c]. They come up again and again in these discussions and then the keyboardists battle it out against the mouse-zealots.

Since I have been working in most of the "grand" editors out there, Emacs and vi(m) for years, I do have something to say about this subject I think. Currently, I am writing this blog post, and most of my coding in the acme(1)-editor[2]. Acme is often seen as being an editor which is very mouse-centered, but there is more to the game than just being a mouse editor.

First of all, what keyboard shortcuts does acme(1) understand? It understands 5 commands in total. Let ^ stand for the control character. Then it understands ^A and ^E which move the cursor to the start and end of the line respectively. It understands ^H which deletes the character before the cursor (backspace) and ^W which kills a whole word. Finally it understands ^U which deletes from the cursor to the start of the line. The very reason for supporting these shortcuts is that they are very deeply rooted in UNIX. A lot of systems understand these commands and when entering text on end, these commands are very nice to have available. I guess I am a boring typist because when I see I have written a word incorrectly, I often just kill the whole word and type it again. The shortcut ^W is a nice quickly typed command on the left hand of a QWERTY style keyboard.

Secondly, and I think this is a very important point, acme(1) has a command language stemming from the sam(1) editor. It may be that the mouse is often used, but if you are to change every occurrence of 'foo' into 'bar' you just execute the command "Edit , s/foo/bar/g". This is almost like in vi. I don't think anybody would dispute that for a large piece of text this is faster than manually going through and editing the text. The reason is that we are programming the editor. We are writing a program which carries out the menial task for us. And the cognitive overhead of doing so is smaller than being the change-monkey. In the command the comma is a shorthand for "all of the file's lines". What if we only wanted the change on the 2nd paragraph of the text? In acme(1) you can just select that text and then execute "Edit s/foo/bar/g", which narrows the editing to the selection only. As you go from "program" to "specific" editing, the mouse and the spatial user interface make it faster and faster.
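
The difference between the two commands can be mimicked in a few lines of Python. This is only an analogy: `edit_substitute` is a made-up name, the selection is modelled as a pair of character offsets, and Python's regular expressions are not the same dialect as the ones in sam(1).

```python
import re


def edit_substitute(buffer, pattern, replacement, selection=None):
    """Replace every match of `pattern`, either in the whole buffer or
    only inside `selection`, a (start, end) pair of character offsets."""
    if selection is None:
        start, end = 0, len(buffer)  # like "Edit , s/.../.../g"
    else:
        start, end = selection  # like "Edit s/.../.../g" on a selection
    changed = re.sub(pattern, replacement, buffer[start:end])
    return buffer[:start] + changed + buffer[end:]


text = "foo one\n\nfoo two\n\nfoo three\n"

# Whole file: every 'foo' becomes 'bar'.
everywhere = edit_substitute(text, "foo", "bar")

# Second paragraph only: "select" it by offsets, then run the same edit.
start = text.index("foo two")
second_only = edit_substitute(text, "foo", "bar", (start, start + len("foo two")))

print(everywhere)
print(second_only)
```

The edit itself is identical in both calls. Only the range it applies to changes, and in acme that range is whatever you swept with the mouse.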

The [1c] reference has a task which is trying to prove a point. A piece of text needs the execution of, essentially "Edit s/\|/e/g", replacing every '|' with an 'e'. The program above is clearly the fastest way to do it for large texts. And you don't even have to think about that program when you know the editor. But the time it takes to find each letter and replace it is subject to the cognitive overhead the article talks about. It adds up when you are doing lots of these small edits all day.

For editing source code, a peculiar thing happens. I often grab the mouse and then I more or less stay on the mouse. Note that acme has the usual ability to cut-and-paste text on the mouse alone. You don't need the keyboard for this. It means that you can do a lot of text surgery with the mouse alone. Since you can select the end-of-line codepoint, you can easily reorder lines, including indentation. Often, renaming variables happens on the mouse alone. Also, there are some tricks that the mouse has hidden. Clicking twice right next to a parenthesis '(' selects all text up to the matching ')'. The same with quotes. It allows you to quickly cut out parts of your structured code and replace them with other code.

Then there is text search. When writing large bodies of programs, you will often end up searching for text more than editing text. The quest is that of something you need to find. Since the mouse in acme(1) has search on a right click by design, most text can be clicked to find the next specimen you need to consider. A more complex invocation goes through the "plumber", which understands the context of the text being operated upon. On a right click, a line like "src/pqueue.erl:21:" is understood as "open the file src/pqueue.erl and go to line 21". Combine this with a command like "git grep -n foo" in a shell window and you can quickly find what you are looking for. I often use the shell as my search tool and then click on my target line. You can even ask grep to provide context to find the right spot to edit.
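The "file:line:" convention is easy to recognise mechanically, which is why so many tools can feed it. The sketch below is only an illustration of that idea in Python; it is not how the plumber is implemented, and the ADDRESS pattern and parse_address name are made up for the example.

```python
import re

# Hypothetical pattern for addresses such as "src/pqueue.erl:21:".
ADDRESS = re.compile(r"^(?P<path>[^\s:]+):(?P<line>\d+):")

def parse_address(text):
    """Return (path, line) if text starts with a file:line: address, else None."""
    match = ADDRESS.match(text)
    if match is None:
        return None
    return match.group("path"), int(match.group("line"))

# A line in the shape that "git grep -n" prints.
hit = parse_address("src/pqueue.erl:21:    insert(Key, Value, Queue) ->")
miss = parse_address("no address here")
```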

Good editors can be programmed, and a mouse-centered editor is no exception. Apart from the sam(1) built-in command language, you can also write external unix programs to pipe text through. I have a helper for Erlang terms, called erlfmt, which will reindent any piece of Erlang nicely. I have the same for JSON structures since they are often hard to read.
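As an illustration of how small such a helper can be, here is a sketch of a JSON reformatter in Python. It is not the helper mentioned above; the function name reformat_json and the choice to return unparseable input untouched are assumptions of this example.

```python
import json

def reformat_json(text):
    """Return text re-indented as JSON; hand it back untouched if it does not parse."""
    try:
        value = json.loads(text)
    except ValueError:
        # Never destroy a selection just because it was not valid JSON.
        return text
    return json.dumps(value, indent=2, sort_keys=True) + "\n"

pretty = reformat_json('{"b":1,"a":[1,2]}')
untouched = reformat_json("not json")
```

Wrapped in a script that reads standard input and writes standard output, a function like this can be run on the selected text with acme's pipe syntax, for instance "|jsonfmt" if the script were installed under that (hypothetical) name.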

The thing that makes acme(1) work, though, stems from an old idea by Niklaus Wirth and Jürg Gutknecht[3]: the Oberon operating system. In this operating system, the graphical user interface is a TUI, or textual user interface, in which spatiality plays a big role. Not unlike the modern tiling window managers, the system lays out windows next to each other so that they never overlap. But unlike the tiling window managers, the interface is purely textual. You can change the menu bars by writing another piece of text there if you want. The same is present in acme(1). You often end up changing your environment into how you want it to look. Since you can "Dump" and "Load" your current environment, each project often ends up with a setup file that holds the configuration for that particular environment. I essentially have one for each project I am working on. In many Erlang projects, there is a shell window where the menu (called the tag in acme(1)-speak) is extended with the command 'make'. This makes it easy to rebuild the project. And errors are reported as "src/file.erl:LINE:" like above, making error correction painless and fast.

The key is that to make the mouse efficient, you need to build the environment around the mouse. That is, your system must support the mouse directly and make it possible to carry out many things with the mouse alone. It is rather sad to see that most modern editing environments shun so effective an editing tool and remove it totally from the entering of text. But perhaps the new touch-style interfaces will change that again? Currently their problem seems to be that mobile phones and tablets are not self-hosting: we are not programming them via themselves. That probably has to happen before good programming user interfaces using touch become a possibility. I must admit, though, that the idea of actually touching the 'make' button you wrote down there yourself is alluring.


[1a] http://www.asktog.com/TOI/toi06KeyboardVMouse1.html
[1b] http://www.asktog.com/TOI/toi22KeyboardVMouse2.html
[1c] http://www.asktog.com/SunWorldColumns/S02KeyboardVMouse3.html
[2] Acme is part of the plan9 port: http://swtch.com/plan9port/
[3] Note that the original Oberon Native System is living on in the Bluebottle/AOS/A2 system today, see http://en.wikipedia.org/wiki/Oberon_(operating_system) and http://en.wikipedia.org/wiki/Bluebottle_OS

Monday, October 29, 2012

Ramblings on the thesis of Bjarne Däcker


The following were the initial research requirements for Erlang when they set out to investigate a new language for telecom[0] (link at the bottom). They are listed in the thesis written by Bjarne Däcker, and I think it would be fun to scribble down my thoughts on the different requirements. My view may very well differ from the original views, since I came into the world of Erlang pretty late.

Handling of a very large number of concurrent activities

In a telecom system, or in an internet webserver, many things happen concurrently with each other. While one person is initiating a call, another person may be talking on a line while a third caller is trying to set up a conference call between 4 parties. This requires you to be able to operate many things concurrently with each other.

In a webserver, it is the same thing. While you are taking in new GET requests, somebody is doing a POST somewhere while another client is getting data through a Server-Sent-Event channel[1].

Note that this is not a requirement for parallelism at all. The only requirement is that we can easily describe such concurrent activities. We don't care if it executes on a single core at all.

Actions to be performed at a certain point in time or within a certain time

For a telecom system, this is quite important. You must be able to handle timing quite precisely. In principle, you would like to have hard real-time, but in practice soft real-time is often enough.

But note: this means that you will prefer low latency over system throughput. It is more important that the system begins responding within due time than that it can deliver gigabytes of throughput. Often, latency and throughput are at odds with one another. Getting latency down can hurt throughput and vice versa.

It also means that your system must focus on being able to run many timers at once and handle all of them precisely. You may be woken up later than the 200ms you specified, but not before.
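The "later, but never earlier" guarantee can be sketched with a priority queue of deadlines. This is a toy model in Python with an explicit clock passed in by the caller, not a description of how the Erlang runtime implements timers; the TimerQueue class and the timer names are invented for the example.

```python
import heapq

class TimerQueue:
    """Keeps many timers and fires each one only once its deadline has passed."""

    def __init__(self):
        self._heap = []      # entries are (deadline_ms, sequence, name)
        self._sequence = 0   # tie-breaker so equal deadlines fire in insertion order

    def start(self, now_ms, delay_ms, name):
        heapq.heappush(self._heap, (now_ms + delay_ms, self._sequence, name))
        self._sequence += 1

    def expire(self, now_ms):
        """Return the names of all timers whose deadline is at or before now_ms."""
        fired = []
        while self._heap and self._heap[0][0] <= now_ms:
            fired.append(heapq.heappop(self._heap)[2])
        return fired

timers = TimerQueue()
timers.start(0, 200, "call-setup")
timers.start(0, 50, "keepalive")
early = timers.expire(199)   # the 200 ms timer must not fire yet
late = timers.expire(230)    # the scheduler got around to it 30 ms late
```

At 199 ms only the 50 ms timer has fired. The 200 ms timer fires at the first check after its deadline, which may be late but is never early.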

Systems distributed over several computers

This is a requirement for robustness of the system. The interesting thing to note here is that there are two large categories of distributed systems: shared-nothing (SN) systems and those that are not. While it is highly desirable to have an SN system, this is not always easy to achieve. The problem occurs as soon as you need to share state between the parts of the system. Many developers try to avoid systems that share state, for good reasons. But for certain problems, you cannot avoid sharing data. This is where a language with seamless distribution shines.

Sharing information is very important in a telecom system. A configuration change must eventually be distributed to all end points. If one node goes down, another node must be able to keep on operating. So a telecom system must share some information quickly and cannot be made as an entirely shared-nothing architecture.

There are other areas where you need to track state, preferably across machines: Computer Game servers, Instant Messaging systems, and Databases are a few such examples. Do also note that every shared-nothing system eventually has a place which shares state. It can be a database deep in the backend which handles multiple requests. It can be a memcached instance. Or a file on disk, even. In any case, few systems share no state.

Where seamless distribution really rocks is when you need in-memory objects of state. If the disk turns out to be too slow, you need to materialize the thing you are operating on in memory and then periodically checkpoint the state to persistent storage. This is the case where it becomes too expensive to take a request, load the state from disk, change and manipulate the state and then store it back to disk.
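A minimal sketch of that pattern in Python, assuming a single process and a JSON file as the persistent storage: state is changed in memory on every request and only written out every few updates. The class name CheckpointedCounter and the "every" parameter are made up for the example.

```python
import json
import os
import tempfile

class CheckpointedCounter:
    """Counters held in memory, written to disk only every `every` updates."""

    def __init__(self, path, every=100):
        self.path = path
        self.every = every
        self.state = {}
        self.dirty = 0  # updates since the last checkpoint
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def bump(self, key):
        # The hot path touches memory only.
        self.state[key] = self.state.get(key, 0) + 1
        self.dirty += 1
        if self.dirty >= self.every:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file and rename it, so a crash in the middle
        # of a write leaves the previous checkpoint intact.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.path)
        self.dirty = 0
```

The trade-off is visible in the code: anything changed since the last checkpoint is lost if the process dies, which is the price paid for not going to disk on every request.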

Interaction with hardware

In telecom, there are certain operations which are impossible to achieve in software. Part of the 3G protocol is to recalculate optimal mobile-phone-to-mast configurations once every millisecond. This makes it impossible to do in software on general-purpose chips. You need to handle it with FPGAs or even purpose-built chips.

Back in the day, when Erlang was first developed, the problem was probably the need to handle ATM switching hardware from the software layer. It also suggests that efficient handling of binary protocol data is important.

Very large software systems

Of course, what constitutes very large is subject to change over the years. But it does yield some thoughts on how to construct a language. In very large software projects, you will have many programmers working on the same code base. They must be able to use each other's code easily. It must also be possible to evolve the code in one end of the system without affecting other ends.

Compile speed is important. A recompile can't take too long in this setup. Also, it must be easy to construct interfaces that other programmers can use. Note that a major part is to battle change-over-time in the software, where certain parts of the code get manipulated over a period of years. It creates its own slew of problems since code must still fit together.

Another important point when programming-in-the-large is that you need a way to split up a program into packages and pieces. Otherwise, you can't really manage the complexity. You need a way to take different pieces, describe their dependencies and then assemble them into a working system. Preferably, you also want to be able to seamlessly upgrade one part of the software while keeping other parts constant. This suggests that you must be prepared to replace a package at some point in time, without needing to go back and change other parts of the software.

Complex functionality such as feature interaction

This requirement ties in with the shared-nothing discussion above. In certain systems, like telecom and computer game servers, the different features of the system will interact in intricate ways. You can't use a database for storing this, since the changes must be kept in main memory. Otherwise it is too slow. In other words, it is important that the language allows you to write elaborate and complex solutions to problems where different parts of the system interact in non-trivial ways.

This requirement is very far from the typical web server, where there is only a single interaction chain. A client will talk to a database. Most of the other things happen to be mere glue facilitating this main requirement.

Continuous operation for many years

Telecom systems are expected to have long lifetimes. The systems are expected to run for many years without being stopped for maintenance. Hence you need to handle continuous operation of the system. If a fault occurs, you must be able to inspect the fault while the system is running. You can't stop it and have a look at the stopped system. Furthermore, the concurrency constraints mean that you can't really halt the system, since other parts of the system will continue to operate normally.

It also means that there has to be an upgrade path going forward. When Erlang was designed, it was not clear what kind of system architecture there would be in the future. There were MIPS, Digital Alpha, x86, HP PA-RISC, Sun SPARC, PowerPC and so on. And there were as many different software platforms: OS/2, Windows, UNIX in different incarnations, VxWorks, QNX, NeXTSTEP and so on. This may have been the deciding factor in making Erlang run on a virtual machine, where ease of portability is more important than execution speed or hardware utilization.

Software maintenance (reconfiguration, etc.) without stopping the system

This is a requirement in internet networking equipment as well as in telecom systems. You can't stop a router when you decide to reconfigure it. Also, it means that configuration is not always a static thing you can keep in a configuration file. Some of the configuration may be dynamic in nature and be configured as you go along. Probably, this decision was what led to the incorporation of the mnesia database into Erlang.

It also means that you need to introspect and upgrade the software while it is running. You can't stop operation just to bring the system up again. Luckily, on the internet, we can often get away with some kind of service interruption, if done correctly. In a shared-nothing architecture, we can often roll servers one at a time and thus upgrade the service without anyone noticing. We can do database upgrades by rewriting client code so it can operate on multiple different schemas at a time, and then go and upgrade the schema. In schemaless databases, we can even upgrade the database schema lazily, in a read-repair fashion, as we are reading old records.
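The lazy, read-repair style of schema upgrade can be sketched as follows. This is a Python illustration with a plain dict standing in for the schemaless database; the "version" field, the record layouts and the function names are all invented for the example.

```python
CURRENT_VERSION = 2

def upgrade(record):
    """Return record in the newest layout. Field names here are made up."""
    version = record.get("version", 1)
    if version == 1:
        # Hypothetical change: version 1 stored one "name" string,
        # version 2 stores "first" and "last" separately.
        first, _, last = record["name"].partition(" ")
        record = {"version": 2, "first": first, "last": last}
    return record

def read(store, key):
    """Read a record, upgrading and writing it back if it is in an old layout."""
    record = store[key]
    if record.get("version", 1) != CURRENT_VERSION:
        record = upgrade(record)
        store[key] = record  # read-repair: the old layout disappears on first access
    return record

store = {
    "u1": {"name": "Alice Example"},                             # old layout
    "u2": {"version": 2, "first": "Bob", "last": "Example"},     # already current
}
user = read(store, "u1")
```

Records that are never read stay in the old layout, so the client code has to keep understanding every version that may still be lying around.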

Games like Guild Wars 2 employ rolling upgrades by running two versions of the software on the same machine. See for instance the Blue/Green deployment idea described by Martin Fowler, et al[2]. The idea is that when they upgrade the game, they begin adding new players to the new version while keeping the old version running until the last player leaves the server. Of course they can hint to the player to reconnect when the population becomes low. It does mean, however, that the players can decide when they want to reconnect. If they are in the middle of something important in the game, they can wait a bit.

But there are important things to think about here. How, for instance, do you upgrade the state of the player from the old version to the new one?

Stringent quality and reliability requirements

There are certain decisions in Erlang which support these requirements. First, the language uses garbage collection, which eliminates many bugs pertaining to memory management right away. Note that the way garbage collection is handled in Erlang means that GC pauses are usually extremely short and thus rarely a problem for latency.

Second, the language is very functional. Only a few parts operate in an imperative way, amongst those the messaging primitives and the ETS tables. The effect is the elimination of a lot of state-bugs in the code. These are often problematic in many imperative languages.

Another decision is that integers are not bounded in size by default. There are no exceptional cases, and no overflow/underflow bugs can occur. A measurement showed that quite a lot of bugs in code bases are due to these errors. And the price of correcting faults in large systems tends to be high due to the vast amounts of QA needed. By handling such cases in the virtual machine, you eliminate the cost of fixing these bugs altogether.
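Python integers happen to be unbounded as well, so the difference is easy to demonstrate there. The sketch below simulates what a signed 32-bit integer would do with the same addition; wrap_int32 is a helper written for this example.

```python
def wrap_int32(value):
    """Interpret value the way a signed 32-bit register would (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

INT32_MAX = 2**31 - 1
bounded = wrap_int32(INT32_MAX + 1)   # wraps around to a negative number
unbounded = INT32_MAX + 1             # unbounded integers just grow
```

With bounded integers the counter silently becomes -2147483648. With unbounded integers it becomes 2147483648 and the program carries on.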

The language prefers operating on functional structure in programs. This means your programs have few variables used for indexing into structure and you operate with maps and folds over large general structures. It also means your code flow avoids complex if-then-else-mazes but has a single generic flow in them which processes data.
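As a small illustration of that style, here is a map and two folds over a list of records, written in Python instead of Erlang. The call records and the add_call function are invented for the example.

```python
from functools import reduce

calls = [
    {"caller": "alice", "seconds": 120},
    {"caller": "bob", "seconds": 30},
    {"caller": "alice", "seconds": 45},
]

def add_call(totals, call):
    """Fold step: return a new dict of totals with this call added."""
    updated = dict(totals)  # copy, so the previous accumulator is left alone
    updated[call["caller"]] = updated.get(call["caller"], 0) + call["seconds"]
    return updated

# A map extracts one field from every record; no index variable is needed.
durations = list(map(lambda call: call["seconds"], calls))
# A fold collapses the list into a single value, starting from 0.
total_seconds = reduce(lambda acc, seconds: acc + seconds, durations, 0)
# The same fold shape builds a richer result, starting from an empty dict.
per_caller = reduce(add_call, calls, {})
```

There is no loop counter to get wrong and no branching on position in the list; the only thing that varies between the two folds is the step function.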

Finally, programs are written in a certain style, OTP, which means that a lot of patterns are covered once and for all. As soon as you see an OTP-compliant system, you instinctively know how to absorb its inner workings. It helps quite a lot when you need to understand a system. OTP also encourages splitting up your systems into multiple process contexts. This means that each part is easier to understand. You only need to understand the part itself and the process contexts it communicates with. Often, this limits the complexity of the system, since you can get away with analyzing only a subset of the whole.

OTP also encourages you to think in terms of system protocols. To an Erlang programmer, an API is often a protocol which describes how you must communicate with a subsystem. It is different from usual library APIs in the sense that it is not always just function calls. It may be asynchronous messages that flow back and forth. That is, the protocol may specify that you send certain messages and you will get certain, often different, messages delivered to your mailbox. Erlang terms are symbolic, so you have very good ways to describe the contents of a message.

Fault tolerance both to hardware failures and software errors

Note the emphasis that you must be fault tolerant to hardware failure as well as software failure. In certain situations, the hardware breaks down partially, but can still operate on degraded service. If a link is faulty, or you cannot use a given telephony channel, then you may be able to route around the given problem.

In my opinion, this is one of the places where Erlang fares best. In a highly distributed system, you have to sacrifice some failure scenarios. The reason is that handling all of them is too complex and takes too long. Some failure scenarios are even impossible to handle at all, and you are forced to aim differently.

A system cannot be free of errors in hardware or software. The thing under your control is the error rate. Even a highly consistent single-machine system may break down. It means that the error rate can never be 0, just as in the distributed case. Everything you did not account for is a fault, and the system must be built to tolerate those. This is a fairly complex thing to handle, and Erlang is built with a toolbox allowing you to handle the nastier errors of the lot.

In practice, you are lucky on the internet. There is a noise floor for errors. Suppose your system fails 1 in a million requests. Now suppose that a user uses your service a million times. On average, the poor guy should see one service disruption. But what if his ISP has a failure rate of 10 in a million? This is the noise floor in effect. People will just retry the request, and if you can then give service, you are relatively safe.
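The arithmetic behind the noise floor can be spelled out in a few lines of Python. The rates are the ones used above; treating failures as independent of each other is an assumption of this sketch, and the function names are made up.

```python
def expected_failures(rate, requests):
    """Average number of failed requests at a given per-request failure rate."""
    return rate * requests

def at_least_one_failure(rate, requests):
    """Chance of seeing one or more failures, assuming independent requests."""
    return 1.0 - (1.0 - rate) ** requests

def failure_after_retry(rate):
    """Chance that a request and one retry both fail, assuming independence."""
    return rate * rate

REQUESTS = 1_000_000
service = expected_failures(1e-6, REQUESTS)    # your system: 1 in a million
isp = expected_failures(10e-6, REQUESTS)       # the ISP: 10 in a million
seen_any = at_least_one_failure(1e-6, REQUESTS)
```

Over a million requests the service contributes about one failure and the ISP about ten, so most of the disruptions the user sees are not yours. A single retry pushes the service's own failure chance per request from one in a million to roughly one in a million million.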

[0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.1957
[1] http://www.w3.org/TR/eventsource/
[2] http://martinfowler.com/bliki/BlueGreenDeployment.html
