SandStorm

Note: This might be quite obsolete by now.

As you are reading this, you are probably scratching your head and thinking,
"SandStorm... That sounds familiar from somewhere...". After trying to
activate some of those grey cells (they don't tend to work at 3AM, especially
after a whole day of coding and bug-hunting), you finally find out why. "Ah,
HailStorm! That's where it's coming from!".

Then you start to wonder what SandStorm is. Perhaps it is the code name for
the next stage in Microsoft's world domination plans. Perhaps it is just
another name for HailStorm. Or perhaps this is all a bad dream, and you fell
asleep while coding (that's what happens when you are too lazy to make a cup
of coffee).

All of these conclusions are wrong, of course (perhaps the last one isn't,
though), and to prevent brain overload, I will explain what SandStorm is
all about.

First, its name is mostly for marketing reasons. It has nothing to do with
HailStorm; the only property they have in common is the general aim (world
domination, of course!).

Now that you know what SandStorm isn't, you can sit down, relax, and read
what SandStorm is.

In short, SandStorm provides a loose framework for creating scalable,
complex distributed applications, using XML-RPC for communication.

So what does SandStorm consist of?

SandStorm can be divided into three parts:

A) Application API
B) Library implementation
C) Standard components.

Application API
===============

SandStorm's application API is pretty minimal. It consists of the registry
API and the component API.

The registry, a component in itself, contains a list of all of the known
components (except the registry itself), as well as their locations, or URLs.

Component registration is provided through an XML-RPC interface, of course,
and can be done either by the component itself or by a "3rd party" utility.

The name of the component in the registry marks the method namespace it
occupies, so a component named "my.namespace" means that all the methods
related to that component start with "my.namespace".
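
To make the namespace rule concrete, here is a minimal sketch of a registry
that maps component names to URLs and works out which component owns a given
method name. It is only an illustration: the class name, the method names
(register, locate) and the URLs are made up for this example, and it keeps
everything in memory instead of talking XML-RPC.

```python
class Registry:
    """Toy in-memory registry: component namespace -> component URL."""

    def __init__(self):
        self._components = {}

    def register(self, namespace, url):
        # Hypothetical registration call; the real registry is reached
        # through XML-RPC, which this sketch leaves out.
        self._components[namespace] = url
        return True

    def locate(self, method_name):
        """Return (namespace, url) of the component owning method_name."""
        best = None
        for namespace in self._components:
            # A component owns every method that starts with "<namespace>.".
            if method_name.startswith(namespace + "."):
                # Prefer the longest matching namespace.
                if best is None or len(namespace) > len(best):
                    best = namespace
        if best is None:
            raise KeyError("no component owns " + method_name)
        return best, self._components[best]


registry = Registry()
registry.register("my.namespace", "http://localhost:8080/RPC2")
registry.register("active.cache", "http://localhost:8081/RPC2")
owner = registry.locate("my.namespace.myMethod")
```

Here owner names the "my.namespace" component and its URL, which is all a
client needs in order to send the actual XML-RPC call to the right place.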

Each component, in addition to the methods in its "private" namespace, must
also conform to Eric Kidd's introspection API, as well as to a set of
standard component methods, which live in the "active.component" namespace.
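
As a rough sketch of what a component with a "private" namespace plus
introspection looks like, the following uses Python's standard xmlrpc
package rather than SandStorm's own library. It assumes that the
introspection API meant here is the usual system.listMethods /
system.methodHelp / system.methodSignature trio. The "active.component"
methods are left out because this document does not name them, and the
call() helper is just a convenience invented for the example.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher


def my_method(name, action):
    """Return a sentence built from a name and an action."""
    return name + " Is " + action


dispatcher = SimpleXMLRPCDispatcher(allow_none=False, encoding=None)
# Adds system.listMethods, system.methodHelp and system.methodSignature.
dispatcher.register_introspection_functions()
# The dotted name places the method inside the component's namespace.
dispatcher.register_function(my_method, "my.namespace.myMethod")


def call(method, *params):
    """Round-trip one XML-RPC call through the dispatcher, no network."""
    request = xmlrpc.client.dumps(params, methodname=method).encode("utf-8")
    # _marshaled_dispatch is what the stdlib HTTP and CGI handlers call.
    response = dispatcher._marshaled_dispatch(request)
    (result,), _ = xmlrpc.client.loads(response)
    return result


methods = call("system.listMethods")
help_text = call("system.methodHelp", "my.namespace.myMethod")
sentence = call("my.namespace.myMethod", "idan", "coding")
```

The list in methods contains both the system.* methods and
my.namespace.myMethod, which is the kind of information an API browser
relies on.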

Library implementation
======================

While the SandStorm spec may be used directly with XML-RPC, it is much
better to have some kind of abstraction layer between the client and
SandStorm, and one that also simplifies the creation of SandStorm components.

The first abstraction layer was built in PHP, on top of Useful Inc.'s
XML-RPC implementation.

Useful's implementation itself was heavily patched and modified (part of the
patch was accepted into the next version of Useful's XML-RPC), and a nice
abstraction layer was put on top of it.

Then a wrapper was added, simplifying access to the registry and the
components, with a server wrapper following closely.

Since then, a Python library has been created on top of a lightly patched
xmlrpclib from PythonWare and Eric Kidd's introspection implementation.

A Perl client/server library was also created, using Eric Kidd's registry
and the Frontier::RPC2 module, which is not included in the package because
it has too many dependencies and is too large (although there is a guy
working on an alternative to Frontier::RPC2).

There is also Ruby client code, although the XML-RPC library itself is not
included.

Last but not least, there is of course support for SandML, an XML
programming dialect designed as a glue language between XML-RPC
services, but that will be discussed later:)


Standard components
===================

This is where all the fun starts:)

Having a registry and support libraries is nice, but useless on its own. So
SandStorm comes with quite a rich collection of components. Most of them are
written as CGIs; a minority are written as stand-alone components, and
those mostly have CGI counterparts.

I will not describe all of them, and the included API browser can quickly
teach you their API, but I will describe the more exotic and interesting
components.

SandStorm::Cache
================

Namespace: active.cache
Implementation: CGI, Stand-alone

Description:

This component was designed to be used by dynamic content generators. One of
the obstacles of dynamic content is that generating it can have a
significant overhead.

Common sense suggests that the solution is to cache it. But cache
management is hard, and because of the nature of CGIs, it is usually done by
storing the content on disk.

SandStorm::Cache provides a simple API for managing your cache by
hash->value pairs, no matter which language you use.

There are two implementations of the API: a simple CGI-based one, which
stores the cache on disk, and a high-performance stand-alone one, which
stores the cache in memory.

It is usually suggested to use the CGI implementation only for testing
purposes. The stand-alone component has low latency, and does things like
flushing unused entries (using a TTL) and compressing large entries.

It also provides an alternative API, which uses server-assigned id numbers
instead of client-given hashes. The hash API is in fact a wrapper on top of
the id-based API, so the id-based one can be slightly faster.
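
The layering described above (a hash API wrapped around an id-based API,
with TTLs and compression for large entries) can be sketched roughly as
follows. This is not the real active.cache API: all method names, the 1024
byte compression threshold and the lazy expiry on read are assumptions made
for the sake of the example.

```python
import time
import zlib


class Cache:
    """Toy in-memory cache: id-based core with a hash API on top."""

    def __init__(self, clock=time.monotonic, compress_over=1024):
        self._clock = clock
        self._compress_over = compress_over  # bytes; assumed threshold
        self._entries = {}  # id -> (expires_at, is_compressed, payload)
        self._ids = {}      # client-given hash -> server-assigned id
        self._next_id = 1

    # --- id-based API ---------------------------------------------------
    def put(self, value, ttl):
        data = value.encode("utf-8")
        compressed = len(data) > self._compress_over
        if compressed:
            data = zlib.compress(data)
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = (self._clock() + ttl, compressed, data)
        return entry_id

    def get_by_id(self, entry_id):
        entry = self._entries.get(entry_id)
        if entry is None:
            return None
        expires_at, compressed, data = entry
        if self._clock() >= expires_at:
            # Expired: flush the entry lazily, on access.
            del self._entries[entry_id]
            return None
        if compressed:
            data = zlib.decompress(data)
        return data.decode("utf-8")

    # --- hash API, a thin wrapper over the id-based one -------------------
    def set(self, key, value, ttl):
        old_id = self._ids.get(key)
        if old_id is not None:
            self._entries.pop(old_id, None)
        self._ids[key] = self.put(value, ttl)
        return True

    def get(self, key):
        entry_id = self._ids.get(key)
        if entry_id is None:
            return None
        value = self.get_by_id(entry_id)
        if value is None:
            del self._ids[key]
        return value


now = [0.0]  # fake clock so the TTL can be demonstrated without sleeping
cache = Cache(clock=lambda: now[0])
cache.set("front-page", "<html>...</html>", ttl=60)
fresh = cache.get("front-page")
now[0] = 61.0
stale = cache.get("front-page")
```

The extra dictionary lookup in get() is the small cost the hash API pays
over get_by_id(), which is why the id-based calls can be slightly faster.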

SandStorm::Mirror
=================

Namespace: active.mirror
Implementation: CGI, Stand-alone

Description:

This component provides a file transfer service (optionally deleting the
source) for mirroring parts of your site. Basically, its API is pretty
simple: the name of the file and the target URI, which in theory can be
anything, although only FTP is currently supported.

As in the case of SandStorm::Cache, there are once again two
implementations, which conform to the same API but behave in different
ways.

The CGI implementation is synchronous, that is, the method will return only
after the file transfer is completed.

The stand-alone implementation, on the other hand, is async. It has a file
transfer queue, and a fixed number of threads serving it. When a file
transfer request is issued, it is placed on the queue, and the method
returns true.
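
The queue-plus-worker-threads design can be sketched in a few lines of
Python. The class name, the mirror() and wait() methods, and the pluggable
transfer function are all hypothetical; a real component would do the FTP
upload where this sketch merely records the request.

```python
import queue
import threading


class AsyncMirror:
    """Toy async mirror: a transfer queue served by a fixed thread pool."""

    def __init__(self, transfer, workers=2):
        self._transfer = transfer  # callable(filename, target_uri)
        self._queue = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._serve, daemon=True).start()

    def mirror(self, filename, target_uri):
        # Enqueue the request and return at once; the transfer runs later.
        self._queue.put((filename, target_uri))
        return True

    def wait(self):
        # Only for the demo: block until every queued transfer is handled.
        self._queue.join()

    def _serve(self):
        while True:
            filename, target_uri = self._queue.get()
            try:
                self._transfer(filename, target_uri)
            except Exception:
                pass  # a real component would log or report the failure
            finally:
                self._queue.task_done()


done = []
done_lock = threading.Lock()


def fake_transfer(filename, target_uri):
    # Stand-in for the real FTP upload.
    with done_lock:
        done.append((filename, target_uri))


mirror = AsyncMirror(fake_transfer, workers=2)
accepted = mirror.mirror("index.html", "ftp://mirror.example.org/pub/index.html")
mirror.wait()
```

Note that accepted is true as soon as the request is queued, so the caller
learns nothing about whether the transfer itself succeeded.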

The advantage of the CGI implementation is that you know when a transfer has
completed; the stand-alone implementation, on the other hand, provides
scalability.

It is possible that in the future the stand-alone implementation will have
an API for "record keeping", that is, things will be async, but can be
traced by file transfer id number.

SandStorm::Public
=================

Namespace: none
Implementation: Stand-alone

Description:

One of SandStorm's assumptions (and in fact that of XML-RPC as a whole) is
that security and authentication should be provided by HTTP. It's a good
assumption, since it simplifies things, but the problem is that XML-RPC
libraries rarely support HTTP auth and/or SSL.

Now, since SandStorm mostly provides the back-end, this is not so critical,
as back-ends are usually behind a firewall.

But what if you wish to give your clients restricted access to the
application API after all? For example, you may wish to provide read-only
access to SandStorm::Cache, or your newswire component.

That's where SandStorm::Public steps in. 

Basically, it acts as an XML-RPC proxy for all other components (clients do
not need the registry at all). But it also provides an ACL mechanism for
controlling who gets in and who doesn't.

The ACL is largely inspired by IPChains. It is a table of rules, where each
rule is a collection of matches.

A match has a selector string, for example "auth: user pass". If the
selector matches, a policy exception is raised. If the policy is defined by
the ACL interpreter, execution finishes. If it isn't, the rule table is
searched for a matching rule, and when one is found, that rule is executed.

The ACL rules are described in XML, with another high-level XML dialect for
a more "component-oriented" approach.
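
To give a feel for a rule table made of matches, here is a very loose
sketch in the IPChains spirit: rules are tried in order, a rule applies
when all of its matches hold, and a default policy is used when nothing
applies. It does not reproduce the policy-exception mechanics described
above or the XML rule format. The "method:" selector, the policy names and
the active.cache.get / active.cache.set method names are invented for the
example; only the "auth: user pass" selector comes from this document.

```python
def parse_selector(selector):
    """Split a selector such as "auth: user pass" into (kind, value)."""
    kind, _, value = selector.partition(":")
    return kind.strip(), value.strip()


def rule_applies(matches, request):
    """A rule applies only if every one of its matches holds."""
    for selector in matches:
        kind, value = parse_selector(selector)
        if request.get(kind) != value:
            return False
    return True


def decide(rules, request, default="deny"):
    """First applicable rule wins; otherwise fall back to the default."""
    for matches, policy in rules:
        if rule_applies(matches, request):
            return policy
    return default


# Hypothetical rule table: read-only access to the cache for one user.
rules = [
    (["auth: user pass", "method: active.cache.get"], "accept"),
    (["method: active.cache.set"], "deny"),
]

allowed = decide(rules, {"auth": "user pass", "method": "active.cache.get"})
refused = decide(rules, {"auth": "guest guest", "method": "active.cache.get"})
```

With these rules, the first request is accepted, while the second one
matches no rule and falls through to the default policy.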

SandStorm::SandML
=================

Namespace: none
Implementation: Stand-alone

Description:

This one, like SandStorm::Public, does not provide services directly.
Instead, it provides a run-time environment for SandML.

SandML can be defined as the "native" language of SandStorm. It allows
creating very simple components, or gluing together a few simple components
into a big complex one.

A simple SandML document may look like this:

<sml id="my.namespace">
  <method id="myMethod" desc="Just a test">
    <params>
      <param id="name" type="string"/>
      <param id="action" type="string"/>
    </params>

    <def id="toret" type="string">
      <var id="name"/>
      <var type="string"> Is </var>
      <var id="action"/>
    </def>

    <use component="my.logger"/>
    <call id="logAction">
      <var id="toret"/>
    </call>

    <return>
      <var id="toret"/>
    </return>
  </method>
</sml>

This might seem at first more cryptic than average Perl code, but it
basically describes a component with the "my.namespace" namespace. It has
one method, "myMethod", which accepts two parameters, a name and an action.

When calling:

my.namespace.myMethod("idan","coding")

the string "idan Is coding" is returned, and another method,
"my.logger.logAction", receives the string as well.

There are also some other directives in the language that I do not have the
time to describe in this document.
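
For readers who find the XML hard to follow, here is roughly what the
myMethod example above amounts to when written as plain Python. This is
only a reading aid, not how SandML is executed: log_action is a local
stand-in for the remote my.logger.logAction method, and the sketch assumes
that the whitespace around " Is " is kept as written.

```python
logged = []


def log_action(message):
    """Stand-in for my.logger.logAction; it just records the message."""
    logged.append(message)
    return True


def my_method(name, action):
    # <def id="toret">: concatenate name, the literal " Is ", and action.
    toret = name + " Is " + action
    # <use component="my.logger"/> followed by <call id="logAction">.
    log_action(toret)
    # <return>: hand the same string back to the caller.
    return toret


result = my_method("idan", "coding")
```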

So what does SandStorm::SandML do with this document? First, it is
semi-compiled into what is called DCode, which, without going into technical
explanations, is a sort of mini-VM, with a minimal set of 7 instructions.

After compiling to DCode, SandStorm::SandML registers all SandML components,
and serves as a bridge between the client and the DCode VM.

It should be noted that SandStorm::SandML is currently a separate package,
and can be described as alpha software. While the SandML->DCode translation
is very clean, the DCode VM probably has quite a lot of bugs, and doesn't
yet handle errors well.


Summary
=======

This article/document is a brief introduction to what SandStorm stands for
and what it offers.

My next article will describe how to install SandStorm and how to write a
simple client/server in Python. It will probably take a bit of time, as I
first have to finish SandStorm 0.62, but I believe it will be worth the
wait:)