One of Python’s big growth areas over the last few years has been data science: Tools such as Jupyter (aka “IPython Notebook”), along with NumPy and Pandas, have made Python one of the main languages that people use when doing data science. But how does a data scientist use Jupyter? How does a data-science firm use it to promote collaboration and exploration? And where are notebooks less preferable than traditional Python modules? In this talk, Brian Lange provides a high-level overview of where and how Jupyter notebooks are used in his data-science firm, discussing their relative advantages and disadvantages in consulting and training.
In machine learning, we train a computer to find correlations in our data — correlations that we might not find ourselves, or that would be too complex and time-consuming for a human to find. When you engage in a machine-learning project, you need to ensure that your data is reliable, choose an appropriate model, and then validate that model. It turns out that if you use Python, doing this sort of analysis is fairly straightforward. In this talk, Tennessee Leeuwenburg provides a tutorial in machine learning using Python, talking about models, data, neural networks, and the appropriateness (or not) of certain kinds of models to particular situations.
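To give a flavor of that workflow, here is a minimal scikit-learn sketch (the dataset and model are illustrative choices, not necessarily those from the talk): train on one slice of the data, then check the model against a held-out slice.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Split the data so the model can be checked on examples it never saw
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))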
One of the great advances in Python over the last few years has been the maturation of generators — starting as an alternative way to create an iterator, then as the basis for coroutines, and now as the basis for asynchronous programming. But just what are coroutines, and how do they allow us to write asynchronous code? In this talk, A. Jesse Jiryu Davis walks us through the creation of a simple HTTP client, first without coroutines, then with them, and then using asynchronous techniques. If you’ve often wondered how these additions to Python might be useful in your work, or how the asynchronous additions to Python 3 are used, this talk should be of great interest to you.
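As a taste of the end point of that progression, here is a minimal sketch of coroutine-based concurrency with asyncio (Python 3.7+ syntax; the HTTP-client details from the talk are omitted):

    import asyncio

    async def greet(name, delay):
        # 'await' suspends this coroutine, letting others run in the meantime
        await asyncio.sleep(delay)
        print("Hello,", name)

    async def main():
        # Two coroutines make progress concurrently on a single thread
        await asyncio.gather(greet("world", 1), greet("coroutines", 1))

    asyncio.run(main())

Both greetings arrive after roughly one second, not two, because the coroutines wait in parallel rather than in sequence.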
So, you’ve got a lot of data. You might even say “big data.” You want to analyze it, and so you turn to Python and the SciPy stack. But you might also want to benefit from a relational database, such as PostgreSQL — either because the data is already in there, or because it’ll be useful to take advantage of some of PostgreSQL’s features. In this talk, Josh Berkus shows us how we can use PL/Python, running inside of the database — thus avoiding the need to transfer data from the database to Python. If you’re doing data science, using Python, or using PostgreSQL, then this talk will show you how to combine them into a fast, flexible, open-source toolchain for doing great data science.
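For a sense of what this looks like, here is a minimal sketch of a PL/Python function (it assumes the plpython3u extension has been installed, and the “sales” table and “price” column are hypothetical):

    -- Assumes: CREATE EXTENSION plpython3u;
    -- The "sales" table and its "price" column are hypothetical.
    CREATE FUNCTION mean_price() RETURNS float AS $$
        rows = plpy.execute("SELECT price FROM sales")
        prices = [row["price"] for row in rows]
        return sum(prices) / len(prices) if prices else None
    $$ LANGUAGE plpython3u;

Calling SELECT mean_price(); then runs the Python body inside the server, right next to the data.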
IPython Notebook (aka Jupyter) is a well-known, Python-based system for working with and collaborating on data science. But sometimes you don’t want people to work on a data-science project so much as to review certain aspects of the data. In other words, you want to create a small application that lets people review and play with limited aspects of the data. As Andrew Campbell explains in this talk, it’s now possible to build widgets from IPython and use them to create useful and interesting applications — streamlining and speeding up the process even more than before.
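For example, here is a minimal sketch using the ipywidgets package (the function is a stand-in; in a real notebook it might filter or plot your data):

    from ipywidgets import interact

    # A stand-in computation; in practice this might query a DataFrame
    def square(x):
        return x * x

    # Renders a slider in the notebook; moving it re-invokes square()
    interact(square, x=(0, 10))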
If you’re doing data science, then the odds are good that you’re using Python. And if you’re using Python to do data science, then you’re probably using matplotlib to create visualizations — charts, plots, and graphs that help you make sense of the data you have collected and analyzed. In this talk, matplotlib lead developer Michael Droettboom introduces matplotlib, and shows how it can be used to create amazing charts, plots, and graphs in a variety of styles.
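Even before watching, you can get a feel for the library with a minimal sketch like this (the data is made up for illustration):

    import matplotlib.pyplot as plt

    # Hypothetical data, just to give the plot something to show
    xs = list(range(10))
    ys = [x ** 2 for x in xs]

    plt.plot(xs, ys, label="x squared")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend()
    plt.show()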
When we program on a Unix machine, we often use a terminal window. But what does a terminal program do? What is it trying to emulate, what features does it support, and how can you take advantage of it when you’re working? Why does it work the way it does, anyway? And with a bit of Python and an understanding of what’s happening behind the scenes, what sorts of interesting things can we do? In this talk, Thomas Ballinger answers all of these questions, mixing his descriptions with extensive live-coding demos.
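As a small taste of poking at the terminal from Python (this sketch is mine, not the talk’s), you can ask the emulator for its size and send it an ANSI color escape:

    import shutil
    import sys

    # Ask the terminal emulator how big it is (falls back to 80x24)
    size = shutil.get_terminal_size()
    print(f"{size.columns} columns x {size.lines} rows")

    # ANSI escape sequences, which most terminal emulators honor
    sys.stdout.write("\x1b[31mred text\x1b[0m\n")  # red, then reset attributes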
We keep hearing about big data and data science. But how can these disciplines be used with real-world cases? What sort of data is appropriate for data science? And how can we apply Python’s many data-analysis libraries to these problems? In this talk, Robert Layton shows how he was able to predict sports winners using two of the most popular Python data-analysis libraries, Pandas and scikit-learn.
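The general shape of such an analysis, sketched with hypothetical data (the CSV file and column names below are stand-ins, not the talk’s actual features), looks something like this:

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # "games.csv" and its columns are hypothetical stand-ins for real results
    games = pd.read_csv("games.csv")
    X = games[["home_win_streak", "visitor_win_streak"]]
    y = games["home_team_won"]

    # Cross-validation checks the classifier against games it hasn't seen
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
    print("Mean accuracy:", scores.mean())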
So, you want to download information from the Web? Great — but if the data isn’t available via an API, then you’re going to need to scrape it. That means retrieving the HTML, parsing it, and turning it into data you can really use. A popular way to do so in Python is Scrapy, an open-source framework for crawling and downloading data. In this talk, Karthik Ananth introduces Scrapy, and demonstrates why it’s a powerful tool for creating your own crawlers, whether for widespread scraping or for specific, single-use projects.
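A minimal Scrapy spider shows the shape of the framework (the spider’s name and target URL here are illustrative):

    import scrapy

    class QuoteSpider(scrapy.Spider):
        # The name and start URL are illustrative; point them at your target
        name = "quotes"
        start_urls = ["http://quotes.toscrape.com/"]

        def parse(self, response):
            # Pull each quote's text out of the page with a CSS selector
            for quote in response.css("div.quote"):
                yield {"text": quote.css("span.text::text").get()}

Saved as quotes_spider.py, it can be run with scrapy runspider quotes_spider.py.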
Functional programming is hot. So are Python and data science. Put them together and you have an almost unbeatable combination — functional programming techniques applied in Python for data-science work. In this talk, Joel Grus introduces Python’s basic functional techniques and libraries, and then demonstrates such methods as k-means clustering for grouping data. If you have always been interested in using Python to analyze data, or have always wanted to push your functional-programming knowledge to the limits, this talk will most certainly interest you.
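If you want a taste of the functional style before watching, here is a tiny map/filter/reduce pipeline (the data is made up):

    from functools import reduce

    data = [1, 2, 3, 4, 5]  # hypothetical values

    # Keep the even numbers, square them, then sum the squares
    evens = filter(lambda n: n % 2 == 0, data)
    squares = map(lambda n: n * n, evens)
    total = reduce(lambda acc, n: acc + n, squares, 0)
    print(total)  # 4 + 16 = 20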