


What does eternity look like?

Code-wise, eternity seems to look like a graph. At least, this is what the ANGEL APPLICATION looks like in Python:


Fixing urlparse: Make the simple easy, keep the complex solvable

In my previous post, I presented netaddress, an RFC 3986-compliant (I believe) URI parser (and all the shenanigans that come with it, such as numerical IP addresses). Now, while it's good to know that that's available, it has made parsing simple URIs (the most common case) more complicated than it needs to be, because it exposes most of the complexity inherent in URIs. But this is yet another place where parser combinators really shine. Say I want to parse URIs of the simplified form $(scheme)://$(host)$(path); then this is all you need to do:

from rfc3986 import scheme, reg_name, path_abempty
from pyparsing import Literal
host = reg_name.setResultsName("host")
path = path_abempty.setResultsName("path")
URI = scheme + Literal("://") + host + path

And now you've got yourself a validating parser for your reduced grammar. Nice, no? I've added this as an extra module ("notQuiteURI") to netaddress, so you can use it like this:

>>> from netaddress import notQuiteURI 
>>> uri = notQuiteURI.URI.parseString("http://host.name.com/path/to/resource")
>>> uri.scheme
'http'
>>> uri.host
'host.name.com'
>>> uri.path
(['/', 'path', '/', 'to', '/', 'resource'], {})
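
Because the reduced grammar is a plain pyparsing expression, malformed input is not silently mis-parsed; instead, parseString raises pyparsing's ParseException. A minimal sketch of using that for validation (the helper below is illustrative and not part of netaddress; note also that parseString only anchors at the start of the string, so rejecting trailing junk would additionally require pyparsing's StringEnd):

from pyparsing import ParseException
from netaddress import notQuiteURI

def parse_or_none(s):
    # Return the parse result, or None if s does not match the reduced grammar
    # (e.g. if the "://" separator is missing). Illustrative helper only.
    try:
        return notQuiteURI.URI.parseString(s)
    except ParseException:
        return None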

Update: netaddress is now available through the Python cheese shop. If you're interested, you should be able to install it by simply typing:

$ easy_install netaddress

Fixing urlparse: More on pyparsing and introducing netaddress

This is the last in a series of three posts (1, 2) discussing issues with Python's urlparse module. Here, I intend to provide a solution.

In the last post, I was talking about parser combinators and parsec in particular, mentioning pyparsing towards the end. The angel-app being a Python application, parsec, while cool, is of no immediate use. pyparsing, on the other hand, provides parsec-like functionality for Python. Consider this excerpt from the RFC 3986-compliant URI parser that I'm about to present in this post (please ignore, as usual, the blog's spurious formatting):


dec_octet = Combine(Or([
        Literal("25") + ZeroToFive,            # 250 - 255
        Literal("2") + ZeroToFour + Digit,     # 200 - 249
        Literal("1") + repeat(Digit, 2),       # 100 - 199
        OneToNine + Digit,                     # 10 - 99
        Digit                                  # 0 - 9
        ]))
IPv4address = Group(repeat(dec_octet + Literal("."), 3) + dec_octet)
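
The excerpt leans on a handful of small building blocks (Digit, ZeroToFour, ZeroToFive, OneToNine and repeat) that are defined elsewhere in netaddress. Their actual definitions may differ, but in pyparsing they could look roughly like this sketch (assumed definitions, for illustration only):

from pyparsing import And, Word, nums

Digit      = Word(nums, exact=1)        # a single character 0-9
ZeroToFour = Word("01234", exact=1)
ZeroToFive = Word("012345", exact=1)
OneToNine  = Word("123456789", exact=1)

def repeat(expr, n):
    # Match exactly n consecutive occurrences of expr.
    return And([expr.copy() for _ in range(n)])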

And now:

>>> from netaddress import IPv4address 
[snipped warning message]
>>> IPv4address.parseString("127.0.0.1")
([(['127', '.', '0', '.', '0', '.', '1'], {})], {})
>>> IPv4address.parseString("350.0.0.1")
Traceback (most recent call last):
File "", line 1, in ?
[snip]
egg/pyparsing.py", line 1244, in parseImpl
raise exc
pyparsing.ParseException: Expected "." (at char 2), (line:1, col:3)

Anyhow, what I mean to say is this: we now have a validating URI parser. Apart from the bugs that are still to be expected for a piece of code at this early stage, it should be RFC 3986 compliant. You can get either the Python package or a tarball of the darcs repository (unfortunately my zope account chokes on the "_darcs" directory name, so I'm still looking for a good way to host the darcs repository).


This is how one would use it:

>>> from netaddress import URI
>>> uri = URI.parseString("http://localhost:6221/foo/bar")
>>> uri.port
'6221'
>>> uri.host
'localhost'
>>> uri.scheme
'http'

Or, in the case of a more complex parse:

>>> uri = URI.parseString("http://vincent@localhost:6221/foo/bar")
>>> uri.asDict().keys()
['scheme', 'hier_part']
>>> uri.hier_part.path_abempty
(['/', 'foo', '/', 'bar'], {})
>>> uri.hier_part.authority.userinfo
'vincent'
>>> uri.hier_part.authority.port
'6221'

Hope you find this useful.


Fixing urlparse: A case for Parsec and pyparsing

In a previous post, I described issues with parsing and validating URLs using the functionality provided by Python's stdlib. Let me just restate clearly that all messages exchanged by angel-app nodes must be validated for the system to work properly. So what to do? First of all, I was of course not the first person to notice the module's shortcomings. However, I was surprised at the answers that popped up: it seems like no one was interested in actually coming up with a validating parser (perhaps even just for a subset of the complete URI syntax); instead, people focussed on fixing specific cases where the parser would fail -- in essence adding new features, rather than putting the whole system on a solid basis. Suggestions go so far as to propose a new URI parsing module. However, the proposed new module is again based on the premise that the input represents a valid URI; the behavior in the case of invalid input is again left undefined. WTF? Have these people never looked beyond string.split() and regexes?


Dudes, writing a VALIDATING PARSER is NOT THAT HARD if you have a reasonable grammar and good libs. Why do people keep pretending that it is? Sure, you might be afraid of having to fire up lex, yacc and antlr, and for good reason. But with sufficiently dynamic languages that's usually unnecessary, provided you have a parser combinator library handy.


The key idea behind parser combinators is that you write your parser in a bottom-up fashion, in just the same way that you would define your grammar: you write a parser for a small part of the grammar, then combine these partial parsers into a complex whole. The canonical example in this context is Haskell's parsec library. Let's start out with a simple restricted URI grammar:

module RestrictedURI where

import Text.ParserCombinators.Parsec

data URI = URI {
      host :: [String],
      port :: Int,
      path :: [String]
    } deriving (Eq, Show, Read)

schemeP = string "http" <?> "scheme"
schemeSepP = string "://" <?> "scheme separator"

hostPartP = many lower <?> "part of a host name"
hostNameP = sepBy hostPartP (string ".") <?> "host name"

pathSegmentP = sepEndBy1 (many1 alphaNum) (string "/") <?> "multiple path segments"
pathP = do {
    root <- string "/" <?> "absolute path required";
    segments <- pathSegmentP;
    return (root:segments)
    } <?> "an absolute path, optionally terminated by a /"

restrictedURIP :: Parser URI
restrictedURIP =
    do {
    ignored <- schemeP;
    ignored <- schemeSepP;
    h <- hostNameP;
    p <- pathP;
    return (URI h 80 p)
    } <?> "a subset of the full URI grammar"


parseURI :: String -> (Either ParseError URI)
parseURI = parse restrictedURIP ""


(You should forgive the blog for inserting break tags all over the place.) But just to illustrate:

vincent$ ghci 
GHCi, version 6.8.1: http://www.haskell.org/ghc/ :? for help
Loading package base ... linking ... done.
Prelude> :l restrictedURI
[1 of 1] Compiling RestrictedURI ( restrictedURI.hs, interpreted )
Ok, modules loaded: RestrictedURI.
*RestrictedURI> parseURI "http://localhost.com/foo/bar"
Loading package parsec-2.1.0.0 ... linking ... done.
Right (URI {host = ["localhost","com"], port = 80, path = ["/","foo","bar"]})

Plus, we get composability, validation and error messages essentially for free:

*RestrictedURI> parseURI "http://localhost2.com/foo/bar" 
Left (line 1, column 17): unexpected "2" expecting lowercase letter,
"." or an absolute path, optionally terminated by a /

Now consider the following excerpt from Haskell's Network.URI.

--  RFC3986, section 3.1  
uscheme :: URIParser String
uscheme =
    do { s <- oneThenMany alphaChar (satisfy isSchemeChar)
       ; char ':'
       ; return $ s++":"
       }

(Again, please forgive the blog for eating my code; you can also get it from the Haskell web site.) And compare that to the ABNF found in the corresponding section of the RFC:

scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Note how the complete URI grammar specification in the RFC is barely a page long. So yeah, implementing this grammar is a significant amount of work (of course you could always choose to support just a well-defined subset), but if you have a good parser combinator library, it's just a few hours of mechanically transforming the ABNF into your parser grammar. You can even watch the Simpsons while doing it (I did). In the case of Network.URI, this boils down to a line count of 1278, with about half of the lines being comments or empty lines. Not only that, but given the complete grammar specification, it's super easy to formulate a modified grammar.


As it turns out, Python has a library quite like parsec: it's called pyparsing, and I'll bore you with it in my next (and last) post on this topic.
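
As a tiny preview (a sketch of my own, not code from Network.URI or netaddress): the scheme rule quoted above transcribes almost one-to-one into a pyparsing expression.

from pyparsing import Word, alphas, alphanums

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
scheme = Word(alphas, alphanums + "+-.")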

Think you can Trust Python's stdlib? Think again.

It's been a while since I blogged about Ken Thompson's Reflections on Trusting Trust. And this week I was bitten hard by its moral:

The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect.

The task seemed simple enough. We had been passing around links between clones in a URL-like format of the form ${host}:${port}/${path}, with a small custom parser (an ugly hack) for parsing and unparsing these things. As we adapted the code to support IPv6, it turned out that in many cases (i.e. unless the nodename field was configured), raw IPv6 addresses would be passed around, and the parser would of course choke on that. Fair enough, I thought, time to use the established standards and

import urlparse 

Now this is supposed to split the URI into parts corresponding to scheme, host, path etc., like so:

>>> urlparse.urlparse("http://foo.com/bar") 
('http', 'foo.com', '/bar', '', '', '')

Of course, most nodes still had the old clone links lying around, and I was surprised at the parse results for these entries:

>>> urlparse.urlparse("foo.com:6221/bar") 
('foo.com', '', '6221/bar', '', '', '')

Hmm. OK. Let's look at the internals of that parser, and vi urlparse.py:

def urlsplit(url, scheme='', allow_fragments=1):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>

[snip]

    (e.g. netloc is a single string) and we don't expand % escapes."""
    key = url, scheme, allow_fragments
    cached = _parse_cache.get(key, None)
    if cached:
        return cached
    if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
        clear_cache()
    netloc = query = fragment = ''
    i = url.find(':')
    if i > 0:
        if url[:i] == 'http': # optimize the common case
            scheme = url[:i].lower()
            url = url[i+1:]
            if url[:2] == '//':
                netloc, url = _splitnetloc(url, 2)

[snip]

        else:
            scheme, url = url[:i].lower(), url[i+1:]

[snip]

    return tuple

(Why do blogs always _INSIST_ on fucking up source code? But we're kind of on topic, so maybe this fits.) Anyhow, we have a fancy caching scheme, but the parser itself consists of a bunch of if statements and url.split() calls. Talk about premature optimization. More than that, one should think that language implementors know a thing or two about parsers...

Consider: the parser is written in such a way that the result is predictable if and only if the input string represents a valid URL. But how do you find out whether a string is indeed a URL? The answer is easy: you use a parser. In other words, the urlparse module is in most cases useless, because unless you have sufficient control over the input (unlikely for networking apps), the parse result is essentially undefined.
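
To make that concrete, here is a small sketch (the specific tuples returned don't matter for the argument): urlparse returns some 6-tuple for any string you hand it, so the result alone tells you nothing about whether the input was a valid URL.

import urlparse

# None of these raise an exception; each call just returns *some* 6-tuple.
candidates = ["http://foo.com/bar",    # a URL
              "foo.com:6221/bar",      # the old clone-link format from above
              "complete gibberish"]    # not a URL at all

for candidate in candidates:
    print urlparse.urlparse(candidate)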

However, the urlparse module is not only "useless", it is in fact dangerous: by using it on untrusted input, the behaviour of your app is by implication also essentially undefined (how do you handle an undefined result?). Now consider the following quick google code search. I don't suppose that any of the following names rings a bell with you: Zope, Plone, twisted, Turbogears, mailman, django, chandler, bittorrent. Surely all of these software packages have carefully reviewed all of their uses of urlparse, and properly identify and handle all cases where an arbitrary result may be returned... Script kiddies, REJOICE!


ANGEL APPLICATION is getting dressed


Thanks to extensive recent efforts by etoy.POL, the M∞ ANGEL APPLICATION is about to become usable by mere mortals.

He has been working pretty hard to expose the most essential functionality via a convenient GUI, to be released soon. A recent addition that I'm particularly fond of is an embedded Python shell that allows interaction with the guts of a running instance:






ANGEL APPLICATION 0.2.0


We are happy to announce the immediate availability of ANGEL APPLICATION version 0.2.0.


This update is mainly a major rework of the underlying networking code. For more in-depth information about what has been done by the development crew, see this blog post from Vincent. Also, more information is available on the M∞ ANGEL-APPLICATION Developer Wiki.

One important thing to note: if you are upgrading from an older version, you will have to purge/empty your local repository once before being able to help safeguard MISSION ETERNITY data forever. This can be done with a single mouse-click in the File menu -> "Purge repository".


Grab your copy now and become an ANGEL in the global social memory network!


CALL for testing ANGEL APPLICATION release candidate 0.2.0rc1

Dear all,

the ANGEL APPLICATION source code has reached a point which we think warrants a new public release for m221e ANGELS.

To make sure things go well, we kindly ask that each etoy.AGENT running MAC OS X or a Unix-ish operating system downloads the RELEASE CANDIDATE of the software, which is available at

http://angelapp.missioneternity.org/index.py/Download

All we ask is that you start it and check the following things:

- does it crash?
- does the "p2p process" run continuously?
- do all the icons and images show up correctly?

If you encounter problems, you can do the following:

- purge the repository via the new File menu command and see if the problem persists
- remove all previous data like so:

    rm -rf ~/.angel-app
    rm ~/.angelrc

and see if the problem persists
- report the operating system version
- for Mac users, consider copy-pasting output from Console.app (it shows the angel-app logs)

For a list of changes, I suggest looking at agent Vincent's blog post at:
http://www.etoy.com/blog/archive/2007/10/27/angel-application-approaching-beta.html

It would be nice to get feedback (also positive ;-) ) during the weekend.

thank you!


ANGEL APPLICATION - approaching beta

We're highly pleased with the progress we have been making lately: the next release of the ANGEL APPLICATION is to be expected on one of the coming weekends (obviously, it's ready when it's ready; we're largely Debian nerds after all). The obligatory screenie (looks haven't changed much, tho'):




Major changes include:

  • a completely revamped security model: we have abandoned our previously mixed pull/push model in favor of a purely pull model. This greatly simplifies the code and increases security by disallowing any modification of data on the clients by remote agents (with one tiny, optional exception). However, this required
  • NAT traversal support, which we implemented by adding optional support for NAT traversal via teredo/miredo. This in turn required
  • (optional) support for IPv6 in the twisted matrix library, our primary infrastructure library. The extension is available as a (limited, but self-contained) add-on module from our subversion repository.
  • To support transparent addressing in the face of a schizophrenic internet infrastructure, agent.POL has implemented a dynamic DNS service that supports IPv6 (note e.g. the clone located at vincent.dyn.kraeutler.net, IPv6 required). He's currently offering that as a free service on majimoto.net. We plan to integrate it more tightly into the angel-app as time and resources permit.
  • A revamped configuration subsystem.
  • Improved GUI support.
  • An extensive code cleanup, resulting in a reasonably clean object model and a rather thorough unit test harness, while actually reducing the size of the code base.


I'm currently in the process of stress-testing the system by letting POL's home machine back up my holiday pictures (again, IPv6 support required). Things are looking good so far ;-) Stay tuned, or grab the latest snapshot from svn.


Making nerds happy

A real peer-to-peer application needs a way for the end-user to tweak everything, right? So here we go: the ANGEL APPLICATION has just received a first mockup of a preferences dialogue:



I think especially our CEO is going to love the new experience ;-)

Please note that these changes are only in the source-code repository right now and not yet available in a "download version". The same goes for other enhancements and design changes that are actively being worked on. Check out the Subversion commit feed via RSS.
