One of the growth areas for Python over the last few years has been data science: tools such as Jupyter (aka “IPython Notebook”), along with NumPy and Pandas, have made Python one of the main languages that people use when doing data science. But how does a data scientist use Jupyter? How does a data-science firm use it to promote collaboration and exploration? And where are notebooks less preferable than traditional Python modules? In this talk, Brian Lange provides a high-level overview of where and how Jupyter notebooks are used in his data-science firm, discussing their relative advantages and disadvantages in consulting and training.
In machine learning, we train a computer to find correlations in our data: correlations that we might not find ourselves, or that would be too complex and time-consuming for a human to find. When you engage in a machine-learning project, you need to ensure that your data is reliable, and then choose an appropriate model. Then you need to check your model. It turns out that if you use Python, doing this sort of analysis is fairly straightforward. In this talk, Tenn Leeuwenburg provides a tutorial in machine learning using Python, talking about models, data, neural networks, and the appropriateness (or not) of certain kinds of models to particular situations.
We’re used to hearing about data science being used in a variety of fields, such as politics, law enforcement, and medicine. But it’s also being used in journalism, to change and improve the ways in which the news is reported. What does it mean to be a data scientist at a newspaper, and how is it changing the practice of journalism? In this talk, Chris Wiggins describes how the New York Times is using data science in its work, for journalism and for improving its business.
Apache Spark has taken the data-science world by storm, offering a new way to process and analyze large quantities of data. Spark provides interfaces in a number of popular languages, such as Java, Scala, and Python, making it possible to perform large-scale data analysis in relatively short periods of time. Indeed, Spark’s claim to fame is that it can do very fast analysis of very large quantities of data. In this talk, Spark inventor Matei Zaharia introduces the technology, describes how it compares and interacts with other technologies, and provides examples of how to use Spark to answer questions about large-scale data sets.
ClojureScript is getting lots of attention as an elegant way to write client-side programs. But what sorts of things can you really do with it? What sorts of applications are made possible by ClojureScript? In this talk, Chandu Tennety describes an application he wrote to analyze and visualize bird migration. He describes how he read the data, interfaced with other libraries (e.g., D3), and even stored the data using Datomic.
So, you’ve got a lot of data. You might even say “big data.” You want to analyze it, and so you turn to Python and the SciPy stack. But you might also want to benefit from a relational database, such as PostgreSQL, either because the data is already in there, or because it’ll be useful to take advantage of some of PostgreSQL’s features. In this talk, Josh Berkus shows us how we can use PL/Python, running inside of the database, thus avoiding the need to transfer data from the database to Python. If you’re doing data science, using Python, or using PostgreSQL, then this talk will show you how to combine these tools into a fast and flexible open-source toolkit that helps you to do great data science.
IPython Notebook (aka Jupyter) is a well-known, Python-based system for working with and collaborating on data science. But sometimes you don’t want to have people work on a data-science project, so much as be able to review certain aspects of that data. In other words, you want to create a small application that lets people review and play with limited aspects of the data. As Andrew Campbell explains in this talk, it’s now possible to use IPython widgets to build useful and interesting applications, streamlining and speeding up the process even more than before.
If you’re doing data science, then the odds are good that you’re using Python. And if you’re using Python to do data science, then you’re probably using matplotlib to create visualizations: charts, plots, and graphs that help you to make sense of the data you have collected and analyzed. In this talk, matplotlib lead developer Michael Droettboom introduces matplotlib, and shows how it can be used to create amazing charts, plots, and graphs in a variety of styles.
We keep hearing about big data and data science. But how can these disciplines be used with real-world cases? What sort of data is appropriate for data science? And how can we apply Python’s many data-analysis libraries to these problems? In this talk, Robert Layton shows how he was able to predict sports winners using two of the most popular Python data-analysis libraries, Pandas and scikit-learn.
Functional programming is hot. So are Python and data science. Putting them together makes for an almost unbeatable combination: using functional programming techniques in Python for data-science work. In this talk, Joel Grus introduces Python’s basic functional techniques and libraries, and then covers such methods as k-clusters for grouping data. If you have always been interested in how to use Python to analyze data, and/or have always wanted to push your functional programming knowledge to the limits, this talk will most certainly interest you.