Category Archives: Data science

[Video 288] Hillary Green-Lerman: The Science of Doubt — Creating Good Controls for Data Science Experiments

Data science is all about working with large data sets. Data scientists often perform experiments in order to find interesting and useful correlations. But it’s easy to perform analyses that aren’t accurate, or that lead you to draw inaccurate conclusions. How can you be sure that your analysis is robust? Use techniques that embrace your doubts, and allow you to demonstrate to yourself and others that the correlations you’re seeing really exist. In this talk, Hillary Green-Lerman introduces this problem, and then describes solutions which allow you to confidently describe the conclusions you’re drawing from your data-science experiment.

[Video 271] Rachel Rakov: Using Python for Sarcasm Detection in Speech

When you speak with someone else, are you ever sarcastic? Of course you are. How do you know if someone is being sarcastic when they speak? And is there a way that we can detect that sarcasm automatically? It turns out that the answer is “yes” — we can detect sarcasm automatically, to a larger degree than you might believe. Amazingly enough, this can be done using a lot of common, open-source tools written in Python. In this talk, Rachel Rakov describes her research into the detection of sarcasm, and describes the ways in which she is using Python tools to conduct it.

[Video 246] Soren Macbeth: Data Science in Clojure

Data science is a growing field, in size and importance. Data scientists use a variety of tools and languages to accomplish their goals — including, according to Soren Macbeth, Clojure. In this talk, Macbeth describes how Clojure can be used for data science, what libraries are available for Clojure developers, and why Clojure is a good choice for developers looking to join the ranks of data scientists.

 

[Video 239] Matthias Bussonnier, Jonathan Frederic, and Thomas Kluyver: Jupyter Advanced Topics Tutorial

If you have been using the Jupyter (aka IPython) notebook for your work in Python, then you’re in good company; many developers (including me) now use it instead of the text-based interactive Python shell. Just using Jupyter has dramatically improved my productivity. However, it turns out that Jupyter, like many open-source projects, is highly customizable. In this talk, Matthias Bussonnier, Jonathan Frederic, and Thomas Kluyver show us how we can customize Jupyter Notebook, turning it into an environment tailored to our needs.

[Video 237] Michael Stonebraker: Turing Lecture, 2015

If you use databases, then you almost certainly should be grateful to Michael Stonebraker, who has been researching, creating, and advancing databases for many decades. Stonebraker was awarded this year’s Turing Award (the top prize in computer science) by the Association for Computing Machinery (ACM), recognizing his work. In this lecture, Stonebraker gives us a survey of database history and technology, as well as where databases are headed. Whether you are a fan of SQL or NoSQL, if you use databases, you should listen to this talk.

[Video 226] Cathy O’Neil: Keynote, Yale Day of Data

Data science doesn’t exist in a vacuum. Data scientists work with companies, governments, non-profits, universities, newspapers, and other organizations. Each organization has its own priorities, and has its own take on data science — why to use it, and what to do with the results it produces. In this talk, Cathy O’Neil describes the different types of motivations and pressures data scientists encounter, and what it’ll take (hint: communication, connections, and community) for high-quality data science to be increasingly ubiquitous and useful.

[Video 209] Jeremy Howard: The Data Science Revolution

We hear a lot about “big data” and “machine learning.” But why are these topics interesting? What sorts of advantages can they give us? In this talk, Jeremy Howard describes the ways in which machine learning and data analysis have already changed the worlds of business and medicine, and ways in which we can expect to see our lives change in the future. If you’ve wondered why people are getting excited about “big data,” this is a high-level, entertaining explanation.

[Video 202] Edmund Jackson: Clojure Data Science

Data science is a collection of disciplines that allow us to look through large collections of data in order to discover and understand the patterns hidden within. Clojure has emerged as an excellent language for doing data science, because of its interoperability with the JVM, along with its clean and functional syntax. In this talk, Edmund Jackson demonstrates why you would want to use Clojure for data science, and how you could go about doing so.

[Video 191] Mark Madsen: Following Google — or Don’t follow the followers, follow the leader

Computers have brought the issues of user interface (UI/UX) to the forefront — but designers and engineers have been considering these issues for thousands of years, across many different technologies. Creating and improving technologies is never a matter of right vs. wrong, but rather a matter of balancing the trade-offs. In this talk, Mark Madsen describes a number of the trade-offs that engineers have made throughout history, and then tells us how this perspective applies to SQL vs. NoSQL and big data analytics. Even if you’re not working with big data, this talk is a fun and clever history of various data storage and retrieval mechanisms, starting with clay tablets.

[Video 187] Kyle Kastner: Machine Learning 101

“Machine learning” sounds like the basis for a new (and possibly bad) science fiction movie. But in fact, it’s the idea that we can ask the computer to identify patterns in large data sets, patterns that we would otherwise miss. Now, machine learning isn’t new, but it has become increasingly important, given the rise of big data, and the need for businesses to understand their customers better. Python has become an increasingly popular tool for implementing machine learning, thanks in part to scikit-learn, a Python package built on top of NumPy and SciPy. In this talk, Kyle Kastner introduces scikit-learn, and describes what we can do with it, as well as how to do so.