Python Interview 1 - The Pythonic Way

From Masterminds of Programming, Chapter 2.

Pythonis amodern, general-purpose, high-level language developed by Guido van Rossum as a result of his work with the ABC programming language. Python’s philosophy is pragmatic; its users often speak of the Zen of Python, strongly preferring a single obviousway to accomplish any task. Ports exist for VMs such as Microsoft’s CLR and the JVM, but the primary implementation is CPython, still developed by van Rossumandother volunteers, who just released Python 3.0, abackward-incompatible rethinking of parts of the language and its core libraries.

The Pythonic Way

What differences are there between developing a programming language and developing a “common” software project?

Guido van Rossum : More than with most software projects, your most important users are programmers themselves. This gives a language project a high level of “meta” content. In the dependency tree of software projects, programming languages are pretty much at the bottom—everything else depends on one or more languages. This also makes it hard to change a language—an incompatible change affects so many dependents that it’s usually just not feasible. In other words, all mistakes, once released, are cast in stone. The ultimate example of this is probably C++, which is burdened with compatibility requirements that effectively require code written maybe 20 years ago to be still valid.

How do you debug a language?

Guido : You don’t. Language design is one area where agile development methodologies just don’t make sense—until the language is stable, few people want to use it, and you won’t find the bugs in the language definition until you have so many users that it’s too late to change things. Of course there’s plenty in the implementation that can be debugged like any old program, but the language design itself pretty much requires careful design up front, because the cost of bugs is so exorbitant.

How do you decide when a feature should go in a library as an extension or when it needs to have support from the core language?

Guido : Historically, I’ve had a pretty good answer for that. One thing I noticed very early on was that everybody wants their favorite feature added to the language, and most people are relatively inexperienced about language design. Everybody is always proposing “let’s add this to the language,” “let’s have a statement that does X.” In many cases, the answer is, “Well, you can already do X or something almost like X by writing these two or three lines of code, and it’s not all that difficult.” You can use a dictionary, or you can combine a list and a tuple and a regular expression, or write a little metaclass—all of those things. I may even have had the original version of this answer from Linus, who seems to have a similar philosophy.

Telling people you can already do that and here is how is a first line of defense. The second thing is, “Well, that’s a useful thing and we can probably write or you can probably write your own module or class, and encapsulate that particular bit of abstraction.” Then the next line of defense is, “OK, this looks so interesting and useful that we’ll actually accept it as a new addition to the standard library, and it’s going to be pure Python.” And then, finally, there are things that just aren’t easy to do in pure Python and we’ll suggest or recommend how to turn them into a C extension. The C extensions are the last line of defense before we have to admit, “Well, yeah, this is so useful and you really cannot do this, so we’ll have to change the language.”

There are other criteria that determine whether it makes more sense to add something to the language or it makes more sense to add something to the library, because if it has to do with the semantics of namespaces or that kind of stuff, there’s really nothing you can do besides changing the language. On the other hand, the extension mechanism was made powerful enough that there is an amazing amount of stuff you can do from C code that extends the library and possibly even adds new built-in functionality without actually changing the language. The parser doesn’t change. The parse tree doesn’t change. The documentation for the language doesn’t change. All your tools still work, and yet you have added new functionality to your system.

I suppose there are probably features that you’ve looked at that you couldn’t implement in Python other than by changing the language, but you probably rejected them. What criteria do you use to say this is something that’s Pythonic, this is something that’s not Pythonic?

Guido : That’s much harder. That is probably, in many cases, more a matter of a gut feeling than anything else. People use the word Pythonic and “that is Pythonic” a lot, but nobody can give you a watertight definition of what it means for something to be Pythonic or un-Pythonic.

You have the “Zen of Python,” but beyond that?

Guido : That requires a lot of interpretation, like every good holy book. When I see a good or a bad proposal, I can tell if it is a good or bad proposal, but it’s really hard to write a set of rules that will help someone else to distinguish good language change proposals from bad change proposals.

Sounds almost like it’s a matter of taste as much as anything.

Guido : Well, the first thing is always try to say “no,” and see if they go away or find a way to get their itch scratched without changing the language. It’s remarkable how often that works. That’s more of a operational definition of “it’s not necessary to change the language.”

If you keep the language constant, people will still find a way to do what they need to do. Beyond that it’s often a matter of use cases coming from different areas where there is nothing application-specific. If something was really cool for the Web, that would not make it a good feature to add to the language. If something was really good for writing shorter functions or writing classes that are more maintainable, that might be a good thing to add to the language. It really needs to transcend application domains in general, and make things simpler or more elegant.

When you change the language, you affect everyone. There’s no feature that you can hide so well that most people don’t need to know about. Sooner or later, people will encounter code written by someone else that uses it, or they’ll encounter some obscure corner case where they have to learn about it because things don’t work the way they expected.

Often elegance is also in the eye of the beholder. We had a recent discussion on one of the Python lists where people were arguing forcefully that using dollar instead of self-dot was much more elegant. I think their definition of elegance was number of keystrokes.

There’s an argument to make for parsimony there, but very much in the context of personal taste.

Guido : and simplicity and generality all are things that, to a large extent, depend on personal taste, because what seems to cover a larger area for me may not cover enough for someone else, and vice versa.

How did the Python Enhancement Proposal (PEP) process come about?

Guido : That’s a very interesting historical tidbit. I think it was mostly started and champi- oned by Barry Warsaw, one of the core developers. He and I started working together in ‘95, and I think around 2000, he came up with the suggestion that we needed more of a formal process around language changes.

I tend to be slow in these things. I mean I wasn’t the person who discovered that we really needed a mailing list. I wasn’t the person who discovered that the mailing list got unwieldy and we needed a newsgroup. I wasn’t the person to propose that we needed a website. I was also not the person to propose that we needed a process for discussing and inventing language changes, and making sure to avoid the occasional mistake where things had been proposed and quickly accepted without thinking through all of the consequences.

At the time between 1995 and 2000, Barry, myself, and a few other core developers, Fred Drake, Ken Manheimer for a while, were all at CNRI, and one of the things that CNRI did was organize the IETF meetings. CNRI had this little branch that eventually split off that was a conference organizing bureau, and their only customer was the IETF. They later also did the Python conferences for a while, actually. Because of that it was a pretty easy boondoggle to attend IETF meetings even if they weren’t local. I certainly got a taste of the IETF process with its RFCs and its meeting groups and stages, and Barry also got a taste of that. When he proposed to do something similar for Python, that was an easy argument to make. We consciously decided that we wouldn’t make it quite as heavy-handed as the IETF RFCs had become by then, because Internet standards, at least some of them, affect way more industries and people and software than a Python change, but we definitely modeled it after that. Barry is a genius at coming up with good names, so I am pretty sure that PEP was his idea.

We were one of the first open source projects at the time to have something like this, and it’s been relatively widely copied. The Tcl/Tk community basically changed the title and used exactly the same defining document and process, and other projects have done similar things.

Do you find that adding a little bit of formalism really helps crystallize the design decisions around Python enhancements?

Guido : I think it became necessary as the community grew and I wasn’t necessarily able to judge every proposal on its value by itself. It has really been helpful for me to let other people argue over various details, and then come with relatively clear-cut conclusions.

Do they lead to a consensus where someone can ask you to weigh in on a single particular crystallized set of expectations and proposals?

Guido : Yes. It often works in a way where I initially give a PEP a thumb’s up in the sense that I say, “It looks like we have a problem here. Let’s see if someone figures out what the right solution is.” Often they come out with a bunch of clear conclusions on how the problem should be solved and also a bunch of open issues. Sometimes my gut feelings can help close the open issues. I’m very active in the PEP process when it’s an area that I’m excited about—if we had to add a new loop control statement, I wouldn’t want that to be designed by other people. Sometimes I stay relatively far away from it like database APIs.

What creates the need for a new major version?

Guido : It depends on your definition of major. In Python, we generally consider releases like 2.4, 2.5, and 2.6 “major” events, which only happen every 18–24 months. These are the only occasions where we can introduce new features. Long ago, releases were done at the whim of the developers (me, in particular). Early this decade, however, the users requested some predictability—they objected against features being added or changed in “minor” revisions (e.g., 1.5.2 added major features compared to 1.5.1), and they wished the major releases to be supported for a certain minimum amount of time (18 months). So now we have more or less time-based major releases: we plan the series of dates leading up to a major release (e.g., when alpha and beta versions and release candidates are issued) long in advance, based on things like release manager availability, and we urge the developers to get their changes in well in advance of the final release date.

Features selected for addition to releases are generally agreed upon by the core developers, after (sometimes long) discussions on the merits of the feature and its precise specification. This is the PEP process: Python Enhancement Proposal, a document-base process not unlike the IETF’s RFC process or the Java world’s JSR process, except that we aren’t quite as formal, as we have a much smaller community of developers. In case of prolonged disagreement (either on the merits of a feature or on specific details), I may end up breaking a tie; my tie-breaking algorithm is mostly intuitive, since by the time it is invoked, rational argument has long gone out of the window.

The most contentious discussions are typically about user-visible language features; library additions are usually easy (as they don’t harm users who don’t care), and internal improvements are not really considered features, although they are constrained by pretty stringent backward compatibility at the C API level.

Since the developers are typically the most vocal users, I can’t really tell whether features are proposed by users or by developers—in general, developers propose features based on needs they perceived among the users they know. If a user proposes a new feature, it is rarely a success, since without a thorough understanding of the implementation (and of language design and implementation in general) it is nearly impossible to properly pro pose a new feature. We like to ask users to explain their problems without having a spe cific solution in mind, and then the developers will propose solutions and discuss the merits of different alternatives with the users.

There’s also the concept of a radically major or breakthrough version, like 3.0. Historically, 1.0 was evolutionarily close to 0.9, and 2.0 was also a relatively small step from 1.6. From now on, with the much larger user base, such versions are rare indeed, and provide the only occasion for being truly incompatible with previous versions. Major versions are made backward compatible with previous major versions with a specific mechanism available for deprecating features slated for removal.

How did you choose to handle numbers as arbitrary precision integers (with all the cool advantages you get) instead of the old (and super common) approach to pass it to the hardware?

Guido : I originally inherited this idea from Python’s predecessor, ABC. ABC used arbitrary precision rationals, but I didn’t like the rationals that much, so I switched to integers; for reals, Python uses the standard floating-point representation supported by the hardware (and so did ABC, with some prodding).

Originally Python had two types of integers: the customary 32-bit variety (“int”) and a separate arbitrary precision variety (“long”). Many languages do this, but the arbitrary precision variety is relegated to a library, like Bignum in Java and Perl, or GNU MP for C. In Python, the two have (nearly) always lived side-by-side in the core language, and users had to choose which one to use by appending an “L” to a number to select the long variety. Gradually this was considered an annoyance; in Python 2.2, we introduced automatic conversion to long when the mathematically correct result of an operation on ints could not be represented as an int (for example, 2**100).

Previously, this would raise an OverflowError exception. There was once a time where the result would silently be truncated, but I changed it to raising an exception before ever letting others use the language. In early 1990, I wasted an afternoon debugging a short demo program I’d written implementing an algorithm that made non-obvious use of very large integers. Such debugging sessions are seminal experiences.

However, there were still certain cases where the two number types behaved slightly different; for example, printing an int in hexadecimal or octal format would produce an unsigned outcome (e.g., –1 would be printed as FFFFFFFF), while doing the same on the mathematically equal long would produce a signed outcome (–1, in this case). In Python 3.0, we’re taking the radical step of supporting only a single integer type; we’re calling it int, but the implementation is largely that of the old long type.

Why do you call it a radical step?

Guido: Mostly because it’s a big deviation from current practice in Python. There was a lot of discussion about this, and people proposed various alternatives where two (or more) representations would be used internally, but completely or mostly hidden from end users (but not from C extension writers). That might perform a bit better, but in the end it was already a massive amount of work, and having two representations internally would just increase the effort of getting it right, and make interfacing to it from C code even hairier. We are now hoping that the performance hit is minor and that we can improve performance with other techniques like caching.

How did you adopt the “there should be one—and preferably only one—obvious way to do it” philosophy?

Guido: This was probably subconscious at first. When Tim Peters wrote the “Zen of Python” (from which you quote), he made explicit a lot of rules that I had been applying without being aware of them. That said, this particular rule (while often violated, with my consent) comes straight from the general desire for elegance in mathematics and computer science. ABC’s authors also applied it, in their desire for a small number of orthogonal types or concepts. The idea of orthogonality is lifted straight from mathematics, where it refers to the very definition of having one way (or one true way) to express something. For example, the XYZ coordinates of any point in 3D space are uniquely determined, once you’ve picked an origin and three basis vectors.

I also like to think that I’m doing most users a favor by not requiring them to choose between similar alternatives. You can contrast this with Java, where if you need a listlike data structure, the standard library offers many versions (a linked list, or an array list, and others), or C, where you have to decide how to implement your own list data type.

What is your take on static versus dynamic typing?

Guido: I wish I could say something simple like “static typing bad, dynamic typing good,” but it isn’t always that simple. There are different approaches to dynamic typing, from Lisp to Python, and different approaches to static typing, from C++ to Haskell. Languages like C++ and Java probably give static typing a bad name because they require you to tell the compiler the same thing several times over. Languages like Haskell and ML, however, use type inferencing, which is quite different, and has some of the same benefits as dynamic typing, such as more concise expression of ideas in code. However the functional paradigm seems to be hard to use on its own—things like I/O or GUI interaction don’t fit well into that mold, and typically are solved with the help of a bridge to a more traditional language, like C, for example.

In some situations the verbosity of Java is considered a plus; it has enabled the creation of powerful code-browsing tools that can answer questions like “where is this variable changed?” or “who calls this method?” Dynamic languages make answering such questions harder, because it’s often hard to find out the type of a method argument without analyzing every path through the entire codebase. I’m not sure how functional languages like Haskell support such tools; it could well be that you’d have to use essentially the same technique as for dynamic languages, since that’s what type inferencing does anyway—in my limited understanding!

Are we moving toward hybrid typing?

Guido: I expect there’s a lot to say for some kind of hybrid. I’ve noticed that most large systems written in a statically typed language actually contain a significant subset that is essentially dynamically typed. For example, GUI widget sets and database APIs for Java often feel like they are fighting the static typing every step of the way, moving most correctness checks to runtime.

A hybrid language with functional and dynamic aspects might be quite interesting. I should add that despite Python’s support for some functional tools like map( ) and lambda, Python does not have a functional-language subset: there is no type inferencing, and no opportunity for parallellization.

Why did you choose to support multiple paradigms?

Guido: I didn’t really; Python supports procedural programming, to some extent, and OO. These two aren’t so different, and Python’s procedural style is still strongly influenced by objects (since the fundamental data types are all objects). Python supports a tiny bit of functional programming—but it doesn’t resemble any real functional language, and it never will. Functional languages are all about doing as much as possible at compile time— the “functional” aspect means that the compiler can optimize things under a very strong guarantee that there are no side effects, unless explicitly declared. Python is about having the simplest, dumbest compiler imaginable, and the official runtime semantics actively discourage cleverness in the compiler like parallelizing loops or turning recursion into loops.

Python probably has the reputation of supporting functional programming based on the inclusion of lambda, map, filter, and reduce in the language, but in my eyes these are just syntactic sugar, and not the fundamental building blocks that they are in functional languages. The more fundamental property that Python shares with Lisp (not a functional language either!) is that functions are first-class objects, and can be passed around like any other object. This, combined with nested scopes and a generally Lisp-like approach to function state, makes it possible to easily implement concepts that superficially resemble concepts from functional languages, like currying, map, and reduce. The primitive operations that are necessary to implement those concepts are built in Python, where in functional languages, those concepts are the primitive operations. You can write reduce( ) in a few lines of Python. Not so in a functional language.

When you created the language, did you consider the type of programmers it might have attracted?

Guido: Yes, but I probably didn’t have enough imagination. I was thinking of professional programmers in a Unix or Unix-like environment. Early versions of the Python tutorial used a slogan something like “Python bridges the gap between C and shell programming,” because that was where I was myself, and the people immediately around me. It never occurred to me that Python would be a good language to embed in applications until people started asking about that.

The fact that it was useful for teaching first principles of programming in a middle school or college setting or for self-teaching was merely a lucky coincidence, enabled by the many ABC features that I kept—ABC was aimed specifically at teaching programming to nonprogrammers.

How do you balance the different needs of a language that should be easy to learn for novices versus a language that should be powerful enough for experienced programmers to do useful things? Is that a false dichotomy?

Guido: Balance is the word. There are some well-known traps to avoid, like stuff that is thought to help novices but annoys experts, and stuff that experts need but confuses novices. There’s plenty enough space in between to keep both sides happy. Another strategy is to have ways for experts to do advanced things that novices will never encounter—for example, the language supports metaclasses, but there’s no reason for novices to know about them.