Category Archives: Python

[Video 168] Joris Van den Bossche: Introduction to Pandas

Pandas is a Python library for reading and manipulating structured data, and it is quickly becoming a standard among data scientists who need to clean, analyze, and otherwise work with data. Indeed, that’s what many data scientists (and other analysts) spend a great deal of time doing: they take dirty data sets and clean them. Then, after cleaning, the data has to be manipulated. Finally, the results need to be displayed for others to see and use. Pandas makes all of these tasks not only fairly easy but also efficient, thanks in no small part to its use of NumPy arrays.

In this talk by researcher Joris Van den Bossche, we’re introduced to Pandas, learning about its functionality but also where and how to use it. If you’re a data scientist, or experimenting with such manipulations, then this talk will help you to understand Pandas from the perspective of someone who uses it every day.
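To give a flavor of the clean-then-manipulate workflow described above, here is a minimal sketch. It is not taken from the talk: the tiny CSV, the column names (city, temp_c), and the function name clean_and_summarize are all made up for illustration, and it assumes pandas is installed.

```python
import io

import pandas as pd

# Hypothetical "dirty" input: inconsistent labels and a missing value.
RAW_CSV = """city,temp_c
Paris,21.5
paris ,
Ghent,18.0
Ghent,19.0
"""


def clean_and_summarize(csv_text):
    """Read CSV text, clean it, and return the mean temperature per city."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Clean: normalize the city labels and drop rows with no measurement.
    df["city"] = df["city"].str.strip().str.title()
    df = df.dropna(subset=["temp_c"])
    # Manipulate: aggregate the remaining rows by city.
    return df.groupby("city")["temp_c"].mean()


summary = clean_and_summarize(RAW_CSV)
# Display: printing the resulting Series shows one row per city.
print(summary)
```

The same three steps (read, clean, aggregate) scale to much larger files, since the column operations are vectorized rather than written as Python loops.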

Slides for the talk are at http://www.slideshare.net/PoleSystematicParisRegion/track-13-joris-van-den-bossche.

Andrew T. Baker: Demystifying Docker

Docker is sort of like a virtual machine, but not exactly. It lets you install applications more easily, and is extremely popular — but it’s hard for people to describe what it is, what it does, and why people are going ga-ga over it. In this talk, Andrew T. Baker introduces Docker to a Python audience. (So for example, he describes the Python-related Docker installations, and what extras they provide.) He explains how Docker simplifies application configuration and rollout, how it is different from (and similar to) other virtualization technologies, and where it’s going in the near future.

Ian Ozsvald: Cleaning Confused Collections of Characters

The world is a messy place, and trying to make sense of it can be quite demanding for a program, or for the programmer writing that program. If you’re trying to make sense of documents, such as Word files or PDFs, then it’s particularly difficult to extract useful meaning. Adjacent words in the final output might not really be adjacent in the file, character encodings might not be set correctly, and poorly standardized things such as measurements and dates can also cause trouble. For this reason, any sort of serious data analysis starts with cleaning up the data source, turning it into something that can be handled reasonably. In this talk, Ian Ozsvald describes some of the Python programming techniques he employs in his job to clean data, so that he can then manipulate and work with it.
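As a rough illustration of the kind of cleanup involved (not code from the talk), the sketch below handles two of the problems mentioned: an encoding that may not be what it claims to be, and stray or inconsistent characters. The function name clean_text and the Latin-1 fallback are assumptions chosen for the example; real data often needs a smarter encoding guess.

```python
import re
import unicodedata


def clean_text(raw: bytes) -> str:
    """Decode bytes of uncertain encoding and normalize the result."""
    # Try UTF-8 first; fall back to Latin-1, which accepts any byte sequence.
    # (The fallback choice is an assumption made for this example.)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")
    # NFKC folds compatibility characters, e.g. ligatures and
    # non-breaking spaces, into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, clean_text("caf\u00e9  au\u00a0lait\n".encode("utf-8")) yields the plain string "café au lait", with the doubled space, the non-breaking space, and the trailing newline all tidied away.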

Daniel Rocco: Pushy Postgres and Python

It’s common for applications to wait for data to arrive in a database. What’s a good way to do that? One is to poll the database every so often, checking to see if our data has arrived. But there’s another way — maybe the database can tell us when something has happened, and we can be informed asynchronously. It turns out that PostgreSQL supports just this sort of notification, using the NOTIFY and LISTEN commands. Moreover, if you’re using Python, you can easily subscribe to PostgreSQL’s notification channels, and wait for PostgreSQL to inform your Python program that new data has come in.  This talk, by Daniel Rocco, shows you how a combination of PostgreSQL and Python makes it quite easy to use these asynchronous notifications.
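The LISTEN side of this can be sketched roughly as follows. This is not code from the talk: it assumes the third-party psycopg2 driver, and the channel name new_data, the function names, and the connection string are placeholders. The pattern of waiting on the connection with select() and then calling poll() follows psycopg2’s documented approach to asynchronous notifications.

```python
import select


def drain_notifications(conn):
    """Return pending (channel, payload) pairs from a psycopg2 connection."""
    conn.poll()  # lets the driver read whatever the server has sent
    received = []
    while conn.notifies:
        note = conn.notifies.pop(0)
        received.append((note.channel, note.payload))
    return received


def listen_forever(dsn, channel="new_data", timeout=5.0):
    """Block until PostgreSQL sends a NOTIFY on `channel`, then print it.

    `dsn` and `channel` are placeholders for this example.
    """
    import psycopg2  # third-party; imported here so the helper above stays usable without it
    from psycopg2 import sql

    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # notifications are delivered outside transactions
    with conn.cursor() as cur:
        cur.execute(sql.SQL("LISTEN {}").format(sql.Identifier(channel)))

    while True:
        # Sleep until the connection's socket is readable, or until timeout.
        readable, _, _ = select.select([conn], [], [], timeout)
        if not readable:
            continue  # nothing arrived; wait again instead of polling the table
        for chan, payload in drain_notifications(conn):
            print(f"notification on {chan}: {payload}")
```

On the database side, something like NOTIFY new_data, 'row 42 inserted'; (often issued from a trigger) would wake the loop up. Calling listen_forever("dbname=test") would start listening, assuming such a database exists.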

Bob Ippolito: What can Python learn from Haskell?

Python is an established, widely used programming language. Haskell, for all of its innovations, is still a fairly niche language and, unlike Python, it’s also compiled and functional. Nevertheless, in this talk, Bob Ippolito tells Python developers that Haskell has a lot to teach Python, particularly in the area of type checking as a way to ensure that programs won’t encounter surprising runtime errors.
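One way to picture the idea in Python terms is with optional type annotations, loosely in the spirit of Haskell’s Maybe type. This is only a sketch, not an example from the talk; the function names are invented. Python does not enforce these annotations at run time, but an external checker such as mypy can use them to flag code that forgets the None case.

```python
from typing import Dict, Optional


def find_age(ages: Dict[str, int], name: str) -> Optional[int]:
    """Return the age if known; the return type admits that it may be None."""
    return ages.get(name)


def describe(ages: Dict[str, int], name: str) -> str:
    age = find_age(ages, name)
    # Without this check, a type checker can report that `age + 1` may
    # operate on None, before the program ever runs.
    if age is None:
        return f"{name}: unknown"
    return f"{name}: {age + 1} next year"
```

Here describe({"ann": 30}, "ann") takes the happy path, while describe({}, "bob") takes the None branch instead of raising a TypeError.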

Raymond Hettinger: Transforming Code into Beautiful, Idiomatic Python

It used to be hard to get a program to work.  Nowadays, however, it’s easy to get a program to work — but does it work well? Is it maintainable? Python has long emphasized the need for not only working code, but for maintainable, easy to read, and idiomatic code.  In this talk, Raymond Hettinger describes a number of ways in which working (but non-standard) Python code can be turned into something that looks, feels, and acts like the Python that we’re encouraged to write.
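To make the contrast concrete, here is a small before-and-after sketch. The examples are illustrative rather than lifted from the talk, and the function names and sample lists are made up; each pair of functions returns the same result, but the second version leans on built-in tools instead of manual bookkeeping.

```python
from collections import defaultdict

names = ["raymond", "rachel", "matthew"]
colors = ["red", "green", "blue", "green"]


def pairs_by_index(names, colors):
    """Working, but index juggling obscures the intent."""
    result = []
    n = min(len(names), len(colors))
    for i in range(n):
        result.append((names[i], colors[i]))
    return result


def pairs_idiomatic(names, colors):
    """zip() pairs the items up and stops at the shorter input."""
    return list(zip(names, colors))


def count_manual(items):
    """Working, but the membership test is boilerplate."""
    counts = {}
    for item in items:
        if item not in counts:
            counts[item] = 0
        counts[item] += 1
    return counts


def count_idiomatic(items):
    """defaultdict(int) supplies the missing zero automatically."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)
```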

Nick Coghlan: Nobody Expects the Python Packaging Authority

If you have written a Python program, then you have almost certainly used Python packages. A large number of packages are distributed via PyPI, the Python Package Index. The easiest way to download, install, and use packages from PyPI is “pip,” which is standard in all of the most recent versions of Python. It turns out that behind PyPI and pip is a group of developers known as the “Python Packaging Authority” (PyPA). In this talk, PyPA member and Python core developer Nick Coghlan describes what the PyPA does, and how it tries to solve problems that Python developers face now and in the future.

Glen Jarvis: Ansible Hands-On Training

Ansible is a provisioning system, similar to the Ruby-based Chef and Puppet, which has been gaining popularity in devops. It’s written in Python, which means that if you want to use Ansible, you’ll need to know at least some (but not much) Python. In this talk, Glen Jarvis introduces Ansible, demonstrating how it works and how to use it for allocating and configuring servers.

Sarah Mount: Message-passing concurrency for Python

Many Python developers, or developers new to Python, want to know how they can best handle concurrency, typically using threads, but sometimes using processes. This is a legitimate question, and often leads to disappointment when they hear about the GIL and related restrictions. However, threads aren’t the only way to handle concurrency in Python; we can learn a great deal from other paradigms and programming languages. In this talk, Sarah Mount introduces several of these ideas, and particularly message passing, and considers how and why we might wish to use them in Python.
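A bare-bones version of message passing can be sketched with nothing but the standard library. This is not the approach or library presented in the talk; it simply shows the core idea that the worker shares no state with its caller and communicates only through queues. The names worker, square_all, and SENTINEL are invented for the example.

```python
import queue
import threading

SENTINEL = None  # special message meaning "no more work"


def worker(inbox, outbox):
    """Receive numbers, send back their squares, stop on the sentinel."""
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            break
        outbox.put(msg * msg)


def square_all(numbers):
    """Send each number to a worker thread and collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(SENTINEL)
    thread.join()
    # A single worker replies in the order it received the messages.
    return [outbox.get() for _ in numbers]
```

Because the two sides only exchange messages, the same shape carries over to processes (for example with multiprocessing.Queue) with few changes.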

Raymond Hettinger: Python’s Class Development Toolkit

Python is an object-oriented language, meaning that nearly everything in the language is an object. You can (and are encouraged to) create your own classes. But what are the best ways to create classes?  And what tools does Python provide for us to create classes as easily and well as possible?  In this talk, Raymond Hettinger shows us how to create Python classes, starting with the basics and working up to testing and user feedback. Even if you have written many Python classes before, you’re likely to learn something from this talk.
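As a taste of the tools Python offers for building classes, here is a short sketch showing a regular method, a property, an alternative constructor, and a static helper. The Circle class and its method names are written for this illustration and should not be read as the exact code from the talk.

```python
import math


class Circle:
    """A circle described by its radius."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        """Regular method: works on the instance's own data."""
        return math.pi * self.radius ** 2

    @property
    def diameter(self):
        """Property: computed on access, but reads like an attribute."""
        return self.radius * 2

    @classmethod
    def from_diameter(cls, diameter):
        """Alternative constructor; using cls keeps it subclass-friendly."""
        return cls(diameter / 2)

    @staticmethod
    def angle_to_grade(angle):
        """Static helper: related to the class, but needs no instance."""
        return math.tan(math.radians(angle)) * 100
```

With this in place, Circle.from_diameter(4) builds a circle of radius 2.0, and its diameter attribute gives back 4.0 without the caller having to invoke a method.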