Technology
Science

Data Skeptic

Updated 5 days ago

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

iTunes Ratings

418 Ratings
Rating counts, presumably from 5 stars down to 1 star: 284, 82, 22, 20, 10

All things Microsoft

By almennur - Dec 10 2019
The recent episodes suspiciously feature only people from Microsoft who tout Azure.

Could be better

By LizRr78 - Nov 27 2019
Would rate higher, but the “banter” drives me up a wall.


Data Skeptic

Latest release on Feb 22, 2020

The Best Episodes Ranked Using User Listens

Updated by OwlTail 5 days ago

Rank #1: Machine Learning Done Wrong

Cheng-tao Chu (@chengtao_chu) joins us this week to discuss his perspective on common mistakes and pitfalls in machine learning. This episode is filled with sage advice for beginners and intermediate users of machine learning, and possibly some good reminders for experts as well. Our discussion parallels his recent blog post, Machine Learning Done Wrong.

Cheng-tao Chu is an entrepreneur who has worked at many well-known Silicon Valley companies. His paper Map-Reduce for Machine Learning on Multicore is the basis for Apache Mahout. His most recent endeavor has just emerged from stealth, so please check out OneInterview.io.

Apr 01 2016

25mins

Rank #2: Advertising Attribution with Nathan Janos

A conversation with Convertro's Nathan Janos about methodologies used to help advertisers understand the effect each of their marketing efforts (print, SEM, display, skywriting, etc.) has on their overall return.

Jun 06 2014

1hr 16mins

Rank #3: [MINI] Backpropagation

Backpropagation is a common algorithm for training a neural network. It works by computing the gradient of the overall error with respect to each weight, and using stochastic gradient descent to iteratively fine-tune the weights of the network. In this episode, we compare this concept to finding a location on a map, marble maze games, and golf.
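
As a rough illustration of the idea described above (not code from the episode), here is a minimal Python sketch of backpropagation on a tiny network with one sigmoid hidden unit. The network shape, starting weights, learning rate, and training pairs are all made-up assumptions, and the updates cycle through the examples in order rather than sampling them at random.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x):
    """One input -> one sigmoid hidden unit -> one linear output."""
    h = sigmoid(w["w1"] * x + w["b1"])
    return h, w["w2"] * h + w["b2"]

def error(w, x, y):
    """Squared error for a single example."""
    return 0.5 * (forward(w, x)[1] - y) ** 2

def backprop(w, x, y):
    """Gradient of the error with respect to each weight, via the chain rule."""
    h, y_hat = forward(w, x)
    d_out = y_hat - y                           # dE/dy_hat
    d_hidden = d_out * w["w2"] * h * (1.0 - h)  # pushed back through the sigmoid
    return {"w1": d_hidden * x, "b1": d_hidden, "w2": d_out * h, "b2": d_out}

def train(w, data, steps=5000, lr=0.1):
    """Gradient descent with one example per update (hypothetical settings)."""
    w = dict(w)
    for step in range(steps):
        x, y = data[step % len(data)]
        grads = backprop(w, x, y)
        for name in w:
            w[name] -= lr * grads[name]
    return w

data = [(0.0, 0.0), (1.0, 1.0)]  # made-up training pairs
start = {"w1": 0.5, "b1": 0.0, "w2": -0.5, "b2": 0.0}
trained = train(start, data)
before = sum(error(start, x, y) for x, y in data)
after = sum(error(trained, x, y) for x, y in data)
print(f"total error before: {before:.4f}, after: {after:.4f}")
```

Each update nudges every weight a small step against its own gradient, which is the "fine-tuning" the description refers to.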

Apr 07 2017

15mins

Rank #4: Being Bayesian

This episode explores the root concept of what it is to be Bayesian: describing knowledge of a system probabilistically, having an appropriate prior probability, knowing how to weigh new evidence, and following Bayes's rule to compute the revised distribution.

We present this concept in a few different contexts but primarily focus on how our bird Yoshi sends signals about her food preferences.

Like many animals, Yoshi is a complex creature whose preferences cannot easily be summarized by a straightforward utility function the way they might be in a textbook reinforcement learning problem. Her preferences are sequential, conditional, and evolving. We may not always know what our bird is thinking, but we have some good indicators that give us clues.
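
To make the Bayes's rule step above concrete, here is a small Python sketch of updating a prior after observing a signal. The hypotheses, prior, and likelihood values are invented purely for illustration and are not taken from the episode.

```python
def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, then normalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}

# Hypothetical prior belief about what the bird wants.
prior = {"wants_seeds": 0.5, "wants_fruit": 0.5}

# Hypothetical P(observed signal | hypothesis) for one kind of signal.
signal_likelihood = {"wants_seeds": 0.8, "wants_fruit": 0.2}

posterior = bayes_update(prior, signal_likelihood)
# Seeing the same signal again uses the previous posterior as the new prior.
posterior_again = bayes_update(posterior, signal_likelihood)
print(posterior, posterior_again)
```

The second call shows how evidence accumulates: yesterday's posterior becomes today's prior.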

Oct 26 2018

24mins

Rank #5: [MINI] The Girlfriend Equation

Economist Peter Backus put forward "The Girlfriend Equation" while working on his PhD - a probabilistic model attempting to estimate the likelihood of him finding a girlfriend. In this mini episode we explore the soundness of his model and also share some stories about how Linhda and Kyle met.

Nov 28 2014

16mins

Rank #6: [MINI] Recurrent Neural Networks

RNNs are a class of deep learning models designed to capture sequential behavior. An RNN trains a set of weights that are applied not just to new input but also to the previous state of the neural network. This directed cycle allows the training phase to find solutions which rely on the state at a previous time, thus giving the network a form of memory. RNNs have been used effectively in language analysis, translation, speech recognition, and many other tasks.
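
The "memory" described above can be sketched with a single recurrent unit in Python. The weights here are fixed, hypothetical values rather than trained ones, so this only illustrates how the previous state feeds into the next one.

```python
import math

def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, b=0.0):
    """New state depends on the current input and the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run(sequence, h0=0.0):
    """Feed a sequence through the unit, collecting the state at each step."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states

# Both sequences end with the same input, but their histories differ.
states_a = run([1.0, 0.0, 1.0])
states_b = run([0.0, 0.0, 1.0])
print(states_a[-1], states_b[-1])
```

The final states differ even though the last input is identical, because the earlier inputs are still reflected in the carried-over state.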

Aug 18 2017

17mins

Rank #7: [MINI] The T-Test

The t-test is this week's mini-episode topic. The t-test is a statistical testing procedure used to determine whether the means of two datasets differ by a statistically significant amount. We discuss how a wine manufacturer might apply a t-test to determine if the sweetness, acidity, or some other property of two separate grapevines might differ in a statistically meaningful way.
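
For readers who want to try the wine example, here is a minimal Python sketch that computes the test statistic for Welch's variant of the two-sample t-test (which does not assume equal variances; the episode does not say which variant it has in mind). The sweetness readings are made up.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of each sample mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical sweetness measurements from two grapevines.
vine_a = [3.1, 3.4, 3.3, 3.6, 3.2]
vine_b = [3.8, 3.9, 3.7, 4.1, 4.0]

t_stat, dof = welch_t(vine_a, vine_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(vine_a, vine_b, equal_var=False)` performs the same test and reports the p-value.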

Oct 17 2014

17mins

Rank #8: Zillow Zestimate

Zillow is a leading real estate information and home-related marketplace. We interviewed Andrew Martin, a data science Research Manager at Zillow, to learn more about how Zillow uses data science and big data to make real estate predictions.

Sep 01 2017

37mins

Rank #9: Quantum Computing

In this week's episode, Scott Aaronson, a professor at the University of Texas at Austin, explains what a quantum computer is, various possible applications, the types of problems they are good at solving and much more. Kyle and Scott have a lively discussion about the capabilities and limits of quantum computers and computational complexity.

Dec 01 2017

47mins

Rank #10: Crypto

How do people think rationally about small probability events?

What is the optimal statistical process by which one can update their beliefs in light of new evidence?

This episode of Data Skeptic explores questions like these as Kyle consults a cast of previous guests and experts to try to answer the question "What is the probability, however small, that Bigfoot is real?"

Jul 17 2015

1hr 24mins

Rank #11: [MINI] Logistic Regression on Audio Data

Logistic regression is a popular classification algorithm. In this episode, we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes that the log-odds of an output variable (isLinhda) are a linear combination of available features, which are spectral bands in the discussion on this episode.
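
Here is a minimal Python sketch of the idea, fitting a logistic regression by plain gradient descent. The two "spectral band" features and the labels are invented toy values, not the audio data from the episode, and the learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class: sigmoid of a linear combination."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)

def fit(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            err = predict_proba(weights, bias, features) - label
            for j, f in enumerate(features):
                grad_w[j] += err * f
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical energy in two spectral bands per clip; label 1 plays the role of isLinhda.
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.2, 0.8], [0.3, 0.9], [0.1, 0.7]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit(X, y)
probabilities = [predict_proba(weights, bias, features) for features in X]
print([round(p, 3) for p in probabilities])
```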

Keep an eye on the dataskeptic.com blog this week as we post more details about this project.

Thanks to our sponsor this week, the Data Science Association.  Please check out their upcoming conference in Dallas on Saturday, February 18th, 2017 via the link below.

dallasdatascience.eventbrite.com

Jan 27 2017

20mins

Rank #12: [MINI] Activation Functions

In a neural network, the output value of a neuron is almost always transformed in some way using a function. A trivial choice would be a linear transformation, which can only scale the data. However, other transformations, like a step function, allow non-linear properties to be introduced.

Activation functions can also help to standardize your data between layers. Some functions such as the sigmoid have the effect of "focusing" the area of interest on data. Extreme values are placed close together, while values near its point of inflection change more quickly with respect to small changes in the input. Similarly, these functions can take any real number and map all of them to a finite range such as [0, 1], which can have many advantages for downstream calculation.

In this episode, we give an overview of the concept and discuss a few reasons why you might select one function versus another.
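
The three functions mentioned above are easy to compare side by side. This short Python sketch (with an arbitrary scale factor for the linear case) shows how the sigmoid squeezes extreme inputs together while staying sensitive near its inflection point at zero.

```python
import math

def linear(z, scale=2.0):
    """A linear activation only rescales its input (scale is arbitrary here)."""
    return scale * z

def step(z):
    """A step function jumps from 0 to 1 at the threshold."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Maps any real number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:6.1f}  linear={linear(z):6.1f}  step={step(z):.0f}  sigmoid={sigmoid(z):.5f}")

# How much the sigmoid output changes over one unit of input, far out vs. near zero.
change_at_extreme = sigmoid(10.0) - sigmoid(9.0)
change_near_zero = sigmoid(0.5) - sigmoid(-0.5)
```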

Jun 16 2017

14mins

Rank #13: Opinion Polls for Presidential Elections

Recently, we've seen opinion polls come under some skepticism. But is that skepticism truly justified? The recent Brexit referendum and US 2016 Presidential Election are examples where some claim the polls "got it wrong". This episode explores this idea.

Apr 28 2017

52mins

Rank #14: [MINI] Multiple Regression

This episode is a discussion of multiple regression: the use of observations that are a vector of values to predict a response variable. For this episode, we consider how features of a home such as the number of bedrooms, number of bathrooms, and square footage can predict the sale price.

Unlike a typical episode of Data Skeptic, these show notes are not just supporting material, but are actually featured in the episode.

The site Redfin graciously allows users to download a CSV of results they are viewing. Unfortunately, they limit this extract to 500 listings, but you can still use it to try the same approach on your own using the download link shown in the figure below.
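
If you want to experiment before downloading real listings, here is a minimal Python sketch of multiple regression that solves the least-squares normal equations directly. The six homes are made up, and their prices were deliberately constructed to be exactly linear in the features so the recovered coefficients are easy to check; real listing data will not fit this cleanly.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_linear(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical homes: (bedrooms, bathrooms, square footage in thousands).
homes = [(2, 1, 0.9), (3, 2, 1.5), (3, 1, 1.2), (4, 3, 2.4), (4, 2, 2.0), (5, 3, 2.6)]
prices = [190000, 280000, 240000, 400000, 350000, 440000]

coefficients = fit_linear(homes, prices)

def predict(home):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], home))

print([round(c) for c in coefficients])
```

Each coefficient reads as the change in predicted price for a one-unit change in that feature, holding the others fixed.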

Feb 19 2016

18mins

Rank #15: Data Science at Patreon

In this week's episode of Data Skeptic, host Kyle Polich talks with guest Maura Church, Patreon's data science manager. Patreon is a fast-growing crowdfunding platform that allows artists and creators of all kinds to build their own subscription content service. The platform allows fans to become patrons of their favorite artists, an idea similar to Renaissance times, when musicians would rely on benefactors to become their patrons so they could make more art. At Patreon, Maura's data science team strives to provide creators with insight, information, and tools, so that creators can focus on what they do best: making art.

On the show, Maura talks about some of her projects with the data science team at Patreon. Topics discussed during the episode include optical music recognition (OMR) to translate musical scores to electronic format, network analysis to understand the connection between creators and patrons, growth forecasting and modeling in a new market, and churn modeling to determine predictors of long-time support.

A more detailed explanation of Patreon's A/B testing framework can be found here

Other useful links to topics mentioned during the show:

OMR research

Patreon blog

Patreon HQ blog

Amanda Palmer

Fran Meneses

Mar 31 2017

32mins

Rank #16: [MINI] Automated Feature Engineering

If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process observed in deep learning, where each layer of the business reports up different types of observations, KPIs, and reports to be interpreted by the next layer of the business. In deep learning, this process can be thought of as automated feature engineering. DNNs built to recognize objects in images may learn structures that behave like edge detectors in the first hidden layer. Succeeding layers learn to compose more abstract features from lower level outputs. This episode explores that analogy in the context of automated feature engineering.

Linh Da and Kyle discuss a particular image in this episode. The image included below in the show notes is drawn from the work of Lee, Grosse, Ranganath, and Ng in their paper Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.

Feb 24 2017

16mins

Rank #17: [MINI] p-values

In this mini, we discuss p-values and their use in hypothesis testing, in the context of a hypothetical experiment on plant flowering, and end with a reference to the Particle Fever documentary and how statistical significance played a role.

Jun 13 2014

16mins

Rank #18: Data Science Hiring Processes

Kyle shares a few thoughts on mistakes he has observed job applicants make, along with a few procedural insights that listeners at early stages in their careers might find valuable.

Dec 28 2018

33mins

Rank #19: Applied Data Science in Industry

Kyle sits down with Jen Stirrup to ask about her experiences helping companies deploy data science solutions in a variety of settings.

Sep 06 2019

21mins

Rank #20: [MINI] The Chi-Squared Test

The χ² (Chi-Squared) test is a methodology for hypothesis testing. When one has categorical data, in the form of frequency counts or observations (e.g. Vegetarian, Pescetarian, and Omnivore), split into two or more categories (e.g. Male, Female), a question may arise such as "Are women more likely than men to be vegetarian?" or, put more accurately, "Does the frequency with which women report being vegetarian differ in a statistically significant way from the frequency with which men report that?"
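
As a sketch of how the question above turns into a calculation, this Python snippet computes the χ² statistic for a contingency table by comparing observed counts with the counts expected if diet were independent of gender. The survey counts are invented for illustration.

```python
def chi_squared(table):
    """Return the chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts. Rows: women, men. Columns: vegetarian, pescetarian, omnivore.
survey = [
    [30, 20, 50],
    [15, 15, 70],
]

stat, dof = chi_squared(survey)
print(f"chi-squared = {stat:.3f} with {dof} degrees of freedom")
```

With 2 degrees of freedom, the critical value at the 5% level is about 5.991, so a statistic above that would lead us to reject independence for these made-up counts.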

Feb 06 2015

17mins
