Labeling the Twisted Bits

Rene Dudfield asked Glyph to “label the shit bits” out of Twisted and let the stable API stand out. Well, having to pick a Python networking framework for my project, I did just that a year ago.

And there was nothing left standing.

Living in a house of cards

Here are the five main issues with the Twisted library:

  • Defered are slow to run and complicated to apply.
  • The I/O core is nothing more than asyncore mis-refactored, with a critical couple gone missing from the framework.
  • The base API is so impractical that you can’t develop an HTTP/1.1 peer with it, not even in five years and probably never.
  • The idea of integrating every GUI in a network peer library is a huge patch on the absence of a web peer to host network user interfaces.
  • Twisted support for threading is useless for all applications that demand synchronization of instances’ method and state.

The result stands, but in a precarious equilibrium, like a house of cards.

Here is how, in detail, with links to the problematic sources and their alternative solutions in Allegra.

Never is better than right now

Twisted Defered is a good illustration of how you can undermine future development when defining core interfaces. Have a look at the sources of:

/twisted/internet/defer.py

and ask yourself (amongst other things) why defered comes with a hard-wired branch for success and failure? There are many error conditions and handling them all should be left to the continuation, where it belongs.
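To make the "hard-wired branch" concrete, here is a minimal sketch of a two-branch callback chain. The names `TwoBranch`, `add` and `fire` are hypothetical and chosen for illustration; this is not the code in defer.py, only an assumption about the general shape of the pattern: at every step the value goes down exactly one of two paths.

```python
class TwoBranch:
    """Toy two-branch callback chain (hypothetical, not Twisted's defer.py)."""

    def __init__(self):
        self._pairs = []  # list of (on_success, on_failure) handlers

    def add(self, on_success, on_failure=None):
        self._pairs.append((on_success, on_failure))
        return self

    def fire(self, value):
        # Each step routes the current value down one of exactly two branches,
        # chosen by whether that value is an exception instance.
        for on_success, on_failure in self._pairs:
            handler = on_failure if isinstance(value, Exception) else on_success
            if handler is None:
                continue  # no handler on this branch: pass the value along
            try:
                value = handler(value)
            except Exception as error:
                value = error  # switch to the failure branch for the next step
        return value
```

A third kind of outcome has no branch of its own in such a chain: it has to be encoded as either a success value or an exception.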

The interface is wrong and the implementation is as bad.

As I explained previously, Twisted's implementation of asynchronous continuations and time events is complicated, made of sophistication developed on simplisms.

Allegra’s finalization provides a more complex implementation to decouple "when to continue" from "what to continue" and "how to continue it".
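The decoupling can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Allegra's actual source: the names `Finalization`, `finalized` and `run_finalized` are hypothetical here, and the sketch relies on CPython's reference counting to call `__del__` as soon as the last reference is dropped.

```python
import collections

finalized = collections.deque()  # continuations of instances already collected


class Finalization:
    """Hypothetical sketch: an object whose continuation runs after collection."""

    continuation = None  # "what to continue": a callable set by the application

    def __del__(self):
        # "When to continue": CPython calls this once the last reference is gone.
        if self.continuation is not None:
            finalized.append(self.continuation)


def run_finalized():
    """Drain the queue from the event loop ("how to continue")."""
    results = []
    while finalized:
        results.append(finalized.popleft()())
    return results
```

Nothing happens while any part of the program still holds the instance; dropping the last reference is the event.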

Complex is better than complicated

Even a quick look at the maze of its intertwingled sources should convince anyone that there is a problem with this obviously twisted implementation of asynchronous socket I/O.

If you have the patience to unravel those:

/twisted/internet/selectreactor.py
/twisted/internet/abstract.py
/twisted/internet/protocol.py

then you are in for a nasty surprise.

There I found a broken reimplementation of what comes with Python’s standard library: a map of channel objects wrapping socket file descriptors; a loop around calls to select; producer and consumer interfaces; etc.

Names have been changed, implementations have been twisted and an elegant design has been refactored to death.

The couple between events and dispatchers implemented in asyncore is gone, and all its benefits were lost too.

Twisted protocol developers cannot leverage the original readable and writable interfaces to set the state of their channels in the event loop independently from the protocol and transport implementation.
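For readers who have not seen that design, here is a minimal sketch of the readable/writable idea written against `select` directly. The `Channel` class and `poll_once` function are hypothetical names for illustration, not asyncore's or Allegra's code: each channel states whether it wants read or write events, and the loop asks once per iteration, just before polling.

```python
import select
import socket


class Channel:
    """Hypothetical channel in the style of an asyncore dispatcher."""

    def __init__(self, sock):
        self.sock = sock
        self.outgoing = b""  # bytes waiting to be sent
        self.incoming = b""  # bytes received so far
        self.want_read = True

    def fileno(self):
        return self.sock.fileno()

    def readable(self):
        return self.want_read

    def writable(self):
        return len(self.outgoing) > 0

    def handle_read(self):
        self.incoming += self.sock.recv(4096)

    def handle_write(self):
        sent = self.sock.send(self.outgoing)
        self.outgoing = self.outgoing[sent:]


def poll_once(channels, timeout=1.0):
    """Ask every channel for its state once, then dispatch the I/O events."""
    readers = [c for c in channels if c.readable()]
    writers = [c for c in channels if c.writable()]
    if not readers and not writers:
        return
    ready_r, ready_w, _ = select.select(readers, writers, [], timeout)
    for channel in ready_r:
        channel.handle_read()
    for channel in ready_w:
        channel.handle_write()
```

A channel with nothing to send is simply left out of the write set, so the protocol code never has to pause or resume the transport explicitly.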

No gain, just pain

Also, Twisted developers cannot develop a consistent API for buffered I/O, pipelining support, a practical producer interface and a convenient collector sink. Twisting asyncore.py broke asynchat.py, bringing no gain to application developers. Just pain.

You can see the effects in the implementation of the simplest stream protocols that come with Twisted:

/twisted/protocols/basic.py

The whole damn thing is so complicated to apply that in five years nobody bothered to produce an implementation of netstrings that is lean on buffering (like Allegra’s async_net).
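As a reminder of how small the format is, a netstring is the payload length in decimal, a colon, the payload and a trailing comma. The sketch below is a minimal incremental decoder that keeps only the incomplete tail between chunks; the function names are hypothetical and this is not the code of async_net or of basic.py.

```python
def netstring_encode(payload):
    """Frame one byte string as <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstrings_decode(chunks):
    """Yield payloads from an iterable of byte chunks of arbitrary size."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while True:
            colon = buffer.find(b":")
            if colon < 0:
                break  # length prefix still incomplete
            length = int(buffer[:colon])
            end = colon + 1 + length
            if len(buffer) <= end:
                break  # payload or trailing comma not received yet
            if buffer[end:end + 1] != b",":
                raise ValueError("netstring missing trailing comma")
            yield buffer[colon + 1:end]
            buffer = buffer[end + 1:]  # keep only the unparsed tail
```

Because it is a generator over chunks, it can be fed from whatever hands over the received bytes, without knowing about sockets or events.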

It is no wonder either that no one came up with a working HTTP/1.1 client or server in the same interval of time. Apparently it is too hard to develop such complex application protocols on Twisted's simplistic LineReceiver and the pile of, er, complications that sits beneath.
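Chunked transfer coding is one of the pieces HTTP/1.1 adds over HTTP/1.0. On the sending side it can be sketched as a wrapper around any iterable of byte strings, independent of the framework underneath. The name `chunked` is hypothetical, and the sketch assumes no chunk extensions and no trailers.

```python
def chunked(bodies):
    """Wrap an iterable of byte strings in HTTP/1.1 chunked transfer coding."""
    for body in bodies:
        if body:  # a zero-length chunk would terminate the stream early
            yield b"%x\r\n" % len(body) + body + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk, no trailers
```

Written this way the encoding is one more producer in a stack of producers, which is the kind of encapsulation at stake here.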

A lot more of the same …

Because something as useful and apparently simple as chunked encoding for HTTP could not be developed with Twisted, the framework expanded "downward", hooking every possible GUI and implementing every possible transport layer, even the most exotic.

These could be useful for 0.1% of network application developers: the ones who can tolerate the performance of something like PTCP implemented in Python and who can support the burden of a user interface for each kind of GUI. And who can cope with I/O events everywhere.

But what about the 99.9% who want to pipeline web requests or parse a deeply encapsulated XML document as it is collected from the network? Without worrying at all about I/O events.

What about a network-aware and cross-platform web user interface?

And what about threading?

Tangled in threads

Most system calls and libraries exhibit synchronous interfaces. BSDDB, OpenSSL and countless other good pieces of software should be threaded. Alas, Twisted's support for threading is of the same poor quality as the rest of the library:

/twisted/python/threadpool.py

Here you have a single thread pool for the whole application, with no practical way to synchronize instance methods and the burden of yet another Defered for each threaded call.

Yet, the implementation is so complicated that a function like:

def isInIOThread():
    return ioThread == getThreadID()

was required in the infuckingcredibly twisted module

/twisted/python/threadable.py

to make sure not to confuse when a process runs in the asynchronous event loop and when it runs in another thread.

There is a simpler, faster and leaner way not to get tangled in threads and it has been available in Medusa for almost ten years. See select_trigger and thread_loop and make your own judgement.
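The general idea behind such a trigger can be sketched as follows. This is a minimal illustration assuming a socket pair as the wake-up channel; the `Trigger` class is hypothetical and is not Medusa's select_trigger. Worker threads queue a callback and write one byte, and the select loop wakes up and runs the callbacks in the I/O thread.

```python
import queue
import select
import socket
import threading


class Trigger:
    """Hypothetical wake-up channel: worker threads queue callbacks for the loop."""

    def __init__(self):
        self._reader, self._writer = socket.socketpair()
        self._pending = queue.Queue()

    def fileno(self):
        return self._reader.fileno()

    def __call__(self, callback):
        # Called from any thread: queue the callback, then wake the select loop.
        self._pending.put(callback)
        self._writer.send(b"x")

    def handle_read(self):
        # Called from the I/O thread only: drain wake-up bytes, run callbacks.
        self._reader.recv(4096)
        results = []
        while True:
            try:
                results.append(self._pending.get_nowait()())
            except queue.Empty:
                return results

    def close(self):
        self._reader.close()
        self._writer.close()
```

Since every callback runs inside the loop's own thread, the code it calls never needs to ask which thread it is in.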

Things don’t get wrong

They start wrong.

To get an API right you must stick to the best way possible: follow the Zen of your tools and trade; be a dedicated craftsman; measure twice, cut once; and above all be humble, don’t try to reinvent that wheel.

Improving the best one available is hard enough.

To develop innovative network peer applications I needed a stable and sound framework. A rock on which to build, a powerful toolbox to develop complex applications with less code. If possible, one that is actively maintained and developed.

Unfortunately, Twisted has been taking the Python community in the wrong direction for network peer development, undermining support for the one obvious way to do it right.

Too much damage has already been done. Too many intelligent people have come to defend bad sources. If you want "Python on Peers" to have its day, it may be time to review Allegra.

If you are using Twisted, now is better than never.

This entry was posted on Friday, May 19th, 2006 at 9:59 pm and is filed under Uncategorized, Python.

19 Responses to “Labeling the Twisted Bits”

  1. L. Daniel Burr Says:

    Please, stop slinging mud. It is fine that you are in love with Medusa, and fine that you love your own code. It is not fine that you write inflammatory nonsense in a crass attempt to generate some notoriety for your project.

    Try building a community of your own, rather than engaging in these undisguised attempts at provocation.

  2. Allen Says:

    Sigh. Laurent, it’s not enough to keep repeating how bad Twisted is. You need to come up with some examples of how the problems you perceive actually affect anything. You provide absolutely zero evidence for your “five main issues”, and one of them is a flat-out falsehood: twisted.web2 _does_ implement HTTP 1.1. You don’t understand Deferreds (two ‘r’s). You fail to distinguish between “could not be developed” and “was not developed”. I could just as easily argue that Allegra is totally useless for VoIP applications since nobody has yet written a SIP stack with it. As for “synchronized instance methods”, the problems with this approach have been well-known for many years in the Java world; it’s not going to get any better as time goes by.

  3. Laurent Szyster Says:

    >It is not fine that you write inflammatory nonsense in a crass attempt to generate some notoriety for your project.

    I don’t believe that “inflammatory nonsense” brings notoriety to Allegra.

    Documenting its module does. Explaining what Finalization does. Putting code under peer review does. Describing how Inversion of Control can be applied for network peer programming will.

    >Try building a community of your own, rather than engaging in these undisguised attempts at provocation.

    They are provocations, indeed.

    They may be inflammatory for some people.

    But if you think they are “nonsense”, please tell me where my comparison is wrong?

    Thanks for reading,

  4. Laurent Szyster Says:

    >twisted.web2 _does_ implement HTTP 1.1.

    Kind of. Chunked-encoding is coupled to the rest of the HTTP channel implemented. What if the content must be parsed? Or decoded. Can you encapsulate further? Not without adding as much code as is needed to move from HTTP/1.0 to a minimal HTTP/1.1.

    Look at all those chunk-encoding related state and lines of code introduced in the implementation of HTTP. It may take as much if not more to get GZIP compression or MULTIPART decoding into that web server. If you can manage to keep it all stable.

    >You provide absolutely zero evidence for your “five main issues”, and one of them is a flat-out falsehood: twisted.web2 _does_ implement HTTP 1.1.

    Can you make an argumented refutation of my claim, citing code and stuff? Tell me where precisely I’m wrong and why. Please.

    Thank you

  5. Allen Says:

    > Can you make an argumented refutation of my claim, citing code and stuff? Tell me where precisely I’m wrong and why. Please.

    No, you miss my point — *you* are the one making these claims. You are responsible for “citing code and stuff”.

    Let’s start with the first:

    “Defered are slow to run and complicated to apply.”

    What evidence is there for this?

    “ask yourself (amongst other things) why defered comes with a hard-wired branch for success and failure?”

    Precisely the same reason Python has “return” and “raise”.

    “The interface is wrong and the implemention is as bad.”

    What evidence is there for this?

  6. Laurent Szyster Says:

    >No, you miss my point — *you* are the one making these claims. You are responsible for “citing code and stuff”.

    I did. Have you followed the links?

    >”Defered are slow to run and complicated to apply.” What evidence is there for this?

    For each Defered: an object instantiation, plus a few attributes, plus a list. And then the callbacks, of course. And their loops. And there are defered all over, even in the stream protocols, go figure.

    For each finalization attributed: a deque.append, plus a single callback, in a single loop for finalized. The CPython VM does the rest: “fire” an event and call the __del__ () method.

    And there are fewer finalized instances used, only at the useful articulations where an instance was already required: request, channels, select_trigger, synchronized components, etc.

    I actually did benchmark finalizations at around 70,000 per second on a 1.7GHz PC, instantiating a million of them, chained in a thousand continuations.

    http://laurentszyster.be/blog/to-the-ravening-hordes/

    If you try this with defered, I’m curious to run your test code.

    >Precisely the same reason Python has “return” and “raise”.

    First, it’s a bad comparison: “success” is the continuation bound to an internal success-state of the defered, “return” is a value followed by a continuation set externally from the function.

    And if you need exceptions, is it not simpler to just raise one? You can do that in a finalization, actually exiting the continuation it may be part of.

    The purpose of defered success and failure is to implement a common pattern: the two way branch. Too bad when there are three. Redundant when there is only one option: success and continuation.

    This is a specific problem of defered as a basic construct for composition of asynchronous continuations. The “failure” option is not needed in many cases where we have a simple pipe, the branch is too short for anything else than “success” or “failure”.

    >>The interface is wrong and the implemention is as bad.”
    > What evidence is there for this?

    All of the above.

    If you want to dispel a myth, do a benchmark yourself.

  7. Andrew Says:

    “The purpose of defered success and failure is to implement a common pattern: the two way branch.”

    No, it’s as Allen already said: precisely the same reason Python has “return” and “raise”. Plenty has been written about why Deferreds are designed the way they are, you don’t need to make up reasons.

    It’s clear that you misunderstand much of the design of Twisted, and have no intent of trying to understand it except to criticise.

    I’ve seen no sign that you’ve actually attempted to write anything with Twisted, or examined code of actual applications happily using Twisted. If you want to really impress people about Allegra vs. Twisted, if that really matters so much to you, rather than abstract arguments about how the code is factored, *take a real application written with Twisted and rewrite it to use Allegra*, and blog about how much better you think the Allegra version is. Give a concrete, real-world example for once.

    In several entries now you persist in claiming the lack of a great HTTP 1.1 implementation built on Twisted means Twisted is flawed, rather than the real explanation that no-one cares enough to make it better. No amount of rebuttal from the people involved influences you on this point.

    And I have to wonder: if you’re so happy with Allegra, why bother blogging about Twisted at all?

  8. Nicola Larosa Says:

    This misguided attempt to piggyback on Twisted’s popularity is making an ass of yourself. It’s just sad.

  9. Steve Says:

    I am afraid I agree with the other commenters here. If you have to knock twisted to justify your own code then you should stop trying to justify your own code.

    If it doesn’t stand alone then it has no right to exist. At least twisted has justified its existence by proving its utility. What’s your excuse for continuing to pollute Planet Python with your diatribes?

  10. Glyph Lefkowitz Says:

    Laurent,

    You mentioned the number 70,000 in one benchmark, so I will do you one better: I can instantiate, run, and finalize well over 100,000 Deferreds per second on this 1.5GHz laptop, even while downstepped on battery power to 600 MHz.

    This is not using any optimizations, such as James Knight’s C version of Deferred, which I have heard rumored is 3x as fast.

    While Deferreds can be a bottleneck in extremely performance-intensive applications, they are generally more-efficient than ad-hoc abstractions that try to do the same thing.

  11. mark Says:

    These posts do provide one useful service though. I know you don’t understand enough about the issues (or are totally focused on the wrong things) for me to bother trying allegra and you have gotten an awful lot of useful information from the Twisted guys.

  12. Laurent Szyster Says:

    To Steve:

    >What’s your excuse for continuing to pollute Planet Python with your diatribes?

    Peer review. I’m reviewing Twisted.

    And that comes with harsh criticisms, because it’s bad in many ways, nobody’s perfect. And some more perfectible than others. There are ugly bugs in probably every software that has not been peer reviewed (unless you are as talented as DJ Bernstein of course). Even our Python BDFL made big mistakes.

    Don’t try to defend the project as if it were a sacred cow.

    There is dissent and it is rational, argued and factual. I did my first part of the job, reviewing Twisted sources. If you don’t want to review Allegra sources, can you defend the ones I criticize?

    To Glyph:

    Python instantiation is reputed to weigh 50 times more than the C/C++ equivalent. No wonder the figures are so close (but I would love to see your test code). But 70,000 Finalizations per second includes the time it took to instantiate the objects (9.846692 seconds for a million instances) *and* finalize them (4.090665).

    Finalization alone is possibly faster than 250,000 per second on a 1.7GHz processor, provided that you attribute it to an existing instance, something that would have been instantiated anyway, with or without continuation. And that’s the case in Allegra.

    Because the main difference between defered and finalization is that new instances of Defered are created all over Twisted. In Allegra, Finalizations may be attributed to instances that should have been instantiated anyway, like requests, channels, thread loops, synchronized components, etc …

    You can instantiate Finalizations, Branch or Join instances to program complex asynchronous continuations between those instances. But you won’t have to in many places.

    And if you do and it becomes a burden, it is possible to specialize them for your application purposes. Since each can be a lot more complex than a boolean branch, you will instantiate fewer of them, making applications of finalization faster.

    Anything on the four other twisted bits? I read your comments on the second one, that missing couple and its wide effect in Twisted.

    What about encapsulable and reusable protocols? What about the “downard” slope toward GUI and funky transport layers. What about the absence of a web peer? What about threads and support for synchronization of instance methods?

    If you want to provide application developers with a network component object model that can plug in the existing *synchronous* API, it must support the common network interface (the web) and have a no-brainer synchronization interface. With the simplest implementation possible: this is Python, not C, each line may weigh on performance a 100 times more.

    Regards,

  13. Allen Short Says:

    >If you want to provide application developers with a network component
    >object model that can plug in the existing *synchronous* API

    This may be one of your fundamental misunderstandings:

    We don’t. Ever.

    This is not what Twisted is for. Interfacing with legacy synchronous libraries is one thing, and Twisted has a pretty good record of doing that. Enabling people to write applications that makes bad assumptions about concurrency is another, and there’s no support, present or future, for doing that.

  14. Laurent Szyster Says:

    To Allen:

    >>If you want to provide application developers with a network component
    >object model that can plug in the existing *synchronous* API
    >
    >We don’t. Ever.
    >
    >This is not what Twisted is for. Interfacing with legacy synchronous libraries is one thing, and Twisted has a pretty good record of doing that.

    So which way is it? “Never” or “pretty good record of doing that”?

    You can’t have it both ways, can you?

    >Enabling people to write applications that makes bad assumptions about concurrency is another, and there’s no support, present or future, for doing that.

    Not bad assumptions.

    Stable assumptions. For whole parts of the library - actually for anything that relies on the async_net and async_chat implementations of readable and writable - there is each time only one assumption to make and it is safe.

    In handle_read () the protocols and transport are both readable.

    There is no doubt that a readability condition has been agreed concurrently, evaluated at the same time, from the different state of buffers and the channel’s collector stack. And it is certain that there is data to be read from the socket. The dispatcher is ready to read, the socket is ready to be read.

    Idem for handle_write () or any other event handlers dispatched by

    handle_read_event ()
    handle_write_event ()

    There is no uncertainty about the readability or writability.

    When a write event is handled, the channel socket *is* writable and the channel protocol is ready to produce output. When the current producer is stalled, if the output_fifo is empty, the channel will not be writable, excluded from the list of sockets polled.

    That *when* is an implicit but safe assumption to make, at all times.

    Without a single line of code.

    So in Allegra there is no need to call pauseProducing () or startReading () after having tested the I/O state of the FileDescriptor … at all steps in both transport and protocols.

    Just push producers, let them stall, set collectors, let them collect. And if your protocol does stall, let it just resume buffer collection, nothing more.

    If you use async_net, the same assumption holds, but you are expected to make do without stallable producers, just iterables of 8-bit byte strings. This is an alternative for newer, faster, multiplexed and chunked protocols.

    In any case, the I/O state is always tested only once, for each run of the loop. Exactly before the sockets are polled for I/O events, not in between.

    And probably with a smaller effect on performance under load than that mumbo-jumbo of readability/writability tests that litter Twisted.

    Regards,

  15. tds Says:

    If Allegra is so amazing why not publish it under a usable license such as MIT or BSD? GPL is not usable for most people.
    Use same license as twisted and we will see which one will get a broader acceptance.

  16. Laurent Szyster Says:

    To tds:

    >GPL is not usable for most people.

    You mean all those people that use GNU/Linux?

    Regards,

  17. tds Says:

    It is possible to write closed source application on Linux. Because it is the base system and not a framework or library.
    But if I use Allegra it is not possible to write a closed source application.
    So the only choice is twisted. :-)

  18. oli Says:

    yeah, GPL is really useless for a framework and library. LGPL is much better or just release it as public domain (like web.py)

  19. Laurent Szyster Says:

    To tds, Oli and other GPL-averts

    When people start to dismiss a library because of its licence, it’s a sure sign that they don’t have much else to say about its sources.

    Anyway, let’s make that licencing issue clear.

    There are three ways to go with the GNU General Public Licence:

    1. If you want to write free software for a greater good using Allegra sources, the GPL will suit your needs perfectly. That’s the GNU way.

    2. If you want to make a buck installing or distributing Allegra’s applications, you’re free to do so as long as you comply with the GPL. That’s the Linux distro way.

    3. If you want to use Allegra sources to write commercial applications to make a profit, buy a commercial licence. That’s the MySQL way.

    Regards,