Data Skeptic

Updated 3 days ago

Technology
Science

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

iTunes Ratings

408 Ratings
5 stars: 279
4 stars: 80
3 stars: 21
2 stars: 19
1 star: 9

All things Microsoft

By almennur - Dec 10 2019
The recent episodes suspiciously feature only people from Microsoft who tout Azure.

Could be better

By LizRr78 - Nov 27 2019
Would rate higher, but the “banter” drives me up a wall.


Listen to:

Being Bayesian


This episode explores the root concept of what it is to be Bayesian: describing knowledge of a system probabilistically, having an appropriate prior probability, knowing how to weigh new evidence, and following Bayes's rule to compute the revised distribution.

We present this concept in a few different contexts but primarily focus on how our bird Yoshi sends signals about her food preferences.

Like many animals, Yoshi is a complex creature whose preferences cannot easily be summarized by a straightforward utility function the way they might in a textbook reinforcement learning problem. Her preferences are sequential, conditional, and evolving. We may not always know what our bird is thinking, but we have some good indicators that give us clues.
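That update rule is easy to sketch in code. Below is a minimal Python sketch, with made-up numbers for Yoshi's prior food preferences and for the likelihood of her chirping given each preference:

```python
# Hypothetical prior: how likely Yoshi prefers each food (made-up numbers).
prior = {"seeds": 0.5, "fruit": 0.3, "pellets": 0.2}

# Hypothetical likelihoods: P(she chirps | preferred food), also made up.
likelihood_of_chirp = {"seeds": 0.9, "fruit": 0.4, "pellets": 0.1}

def bayes_update(prior, likelihood):
    """Apply Bayes's rule: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Yoshi chirps: revise the distribution over her preferences.
posterior = bayes_update(prior, likelihood_of_chirp)
print(posterior)  # "seeds" now carries most of the probability mass
```

New evidence shifts mass toward the hypotheses that predicted it, which is the whole game.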

Oct 26 2018

24mins

Zillow Zestimate

Zillow is a leading real estate information and home-related marketplace. We interviewed Andrew Martin, a data science Research Manager at Zillow, to learn more about how Zillow uses data science and big data to make real estate predictions.

Sep 01 2017

37mins

[MINI] Convolutional Neural Networks

CNNs are characterized by their use of a group of neurons typically referred to as a filter or kernel.  In image recognition, this kernel is repeated over the entire image.  In this way, CNNs may achieve the property of translational invariance - once trained to recognize certain things, changing the position of that thing in an image should not disrupt the CNN's ability to recognize it.  In this episode, we discuss a few high-level details of this important architecture.
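A toy one-dimensional convolution illustrates the repeated-kernel idea (a sketch, not a real CNN layer; the kernel and signal here are invented):

```python
def convolve1d(signal, kernel):
    """Slide the kernel over the signal; one dot product per position."""
    k = len(kernel)
    return [sum(s * w for s, w in zip(signal[i:i + k], kernel))
            for i in range(len(signal) - k + 1)]

# A toy "edge detector" kernel, and a signal containing the same
# 0 -> 1 step pattern in two different positions.
kernel = [-1, 1]
signal = [0, 0, 1, 1, 0, 0, 1, 1]

responses = convolve1d(signal, kernel)
# The kernel fires (value 1) wherever the step occurs, regardless of
# position: the translational invariance discussed above.
print(responses)  # [0, 1, 0, -1, 0, 1, 0]
```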

May 19 2017

14mins

Advertising Attribution with Nathan Janos

A conversation with Convertro's Nathan Janos about methodologies used to help advertisers understand the effect each of their marketing efforts (print, SEM, display, skywriting, etc.) has on their overall return.

Jun 06 2014

1hr 16mins

[MINI] GPU CPU

There's more than one type of computer processor. The central processing unit (CPU) is typically what people mean when they say "processor". GPUs were introduced to be highly optimized for doing floating point computations in parallel. These types of operations were very useful for high end video games, but as it turns out, those same processors are extremely useful for machine learning. In this mini-episode we discuss why.

Apr 14 2017

11mins

Game Theory

Thanks to our sponsor The Great Courses.

This week's episode is a short primer on game theory.

For tickets to the free Data Skeptic meetup in Chicago on Tuesday, May 15 at the Mendoza College of Business (224 South Michigan Avenue, Suite 350), click here.

May 11 2018

24mins

Quantum Computing

In this week's episode, Scott Aaronson, a professor at the University of Texas at Austin, explains what a quantum computer is, various possible applications, the types of problems they are good at solving and much more. Kyle and Scott have a lively discussion about the capabilities and limits of quantum computers and computational complexity.

Dec 01 2017

47mins

Applied Data Science in Industry

Kyle sits down with Jen Stirrup to inquire about her experiences helping companies deploy data science solutions in a variety of different settings.

Sep 06 2019

21mins

[MINI] The T-Test

The t-test is this week's mini-episode topic. The t-test is a statistical testing procedure used to determine whether the means of two datasets differ by a statistically significant amount. We discuss how a wine manufacturer might apply a t-test to determine if the sweetness, acidity, or some other property of two separate grape vines differs in a statistically meaningful way.
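A rough sketch of how the wine example might look, using Welch's t-statistic (a common variant that does not assume equal variances; the measurements below are invented):

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's t-statistic: do the two sample means differ?"""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        # Sample variance, with the n - 1 (Bessel) correction.
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    se = math.sqrt(var(sample_a) / len(sample_a) + var(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

# Hypothetical sweetness measurements (degrees Brix) from two grape vines.
vine_a = [21.2, 20.8, 21.5, 21.0, 20.9]
vine_b = [19.8, 20.1, 19.9, 20.3, 20.0]
t = welch_t(vine_a, vine_b)
print(round(t, 2))  # a large |t| suggests a real difference in means
```

In practice the statistic would be compared against a t-distribution to get a p-value.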

Oct 17 2014

17mins

[MINI] Recurrent Neural Networks

RNNs are a class of deep learning models designed to capture sequential behavior.  An RNN trains a set of weights which depend not just on new input but also on the previous state of the neural network.  This directed cycle allows the training phase to find solutions which rely on the state at a previous time, thus giving the network a form of memory.  RNNs have been used effectively in language analysis, translation, speech recognition, and many other tasks.

Aug 18 2017

17mins

Data Science Hiring Processes

Kyle shares a few thoughts on mistakes commonly made by job applicants, along with a few procedural insights that listeners at early stages in their careers might find valuable.

Dec 28 2018

33mins

Crypto

How do people think rationally about small probability events?

What is the optimal statistical process by which one can update their beliefs in light of new evidence?

This episode of Data Skeptic explores questions like this as Kyle consults a cast of previous guests and experts to try and answer the question "What is the probability, however small, that Bigfoot is real?"

Jul 17 2015

1hr 24mins

Machine Learning Done Wrong

Cheng-tao Chu (@chengtao_chu) joins us this week to discuss his perspective on common mistakes and pitfalls that are made when doing machine learning. This episode is filled with sage advice for beginners and intermediate users of machine learning, and possibly some good reminders for experts as well. Our discussion parallels his recent blog post Machine Learning Done Wrong.

Cheng-tao Chu is an entrepreneur who has worked at many well-known Silicon Valley companies. His paper Map-Reduce for Machine Learning on Multicore is the basis for Apache Mahout. His most recent endeavor has just emerged from stealth, so please check out OneInterview.io.

Apr 01 2016

25mins

[MINI] p-values

In this mini, we discuss p-values and their use in hypothesis testing in the context of a hypothetical experiment on plant flowering, and end with a reference to the Particle Fever documentary and how statistical significance played a role.
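A sketch of an exact one-sided p-value for a hypothetical flowering experiment (the counts below are invented for illustration):

```python
from math import comb

def binomial_p_value(successes, n, p_null=0.5):
    """One-sided p-value: the probability, under the null hypothesis, of
    seeing a result at least as extreme as the one observed."""
    return sum(comb(n, k) * p_null**k * (1 - p_null)**(n - k)
               for k in range(successes, n + 1))

# Hypothetical experiment: 14 of 20 treated plants flowered; under the
# null hypothesis, each plant flowers with probability 0.5.
p = binomial_p_value(14, 20)
print(round(p, 4))  # compare against a significance threshold, e.g. 0.05
```

A small p-value says the observed result would be surprising if the null hypothesis were true; it is not the probability that the null hypothesis is true.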

Jun 13 2014

16mins

Opinion Polls for Presidential Elections

Recently, we've seen opinion polls come under some skepticism. But is that skepticism truly justified? The recent Brexit referendum and US 2016 Presidential Election are examples where some claim the polls "got it wrong". This episode explores this idea.

Apr 28 2017

52mins

The Master Algorithm

In this week’s episode, Kyle Polich interviews Pedro Domingos about his book, The Master Algorithm: How the quest for the ultimate learning machine will remake our world. In the book, Domingos describes what machine learning is doing for humanity, how it works, and what it could do in the future. He also hints at the possibility of an ultimate learning algorithm, one that would be able to derive all knowledge — past, present, and future.

Mar 16 2018

46mins

[MINI] Decision Tree Learning

Linhda and Kyle talk about Decision Tree Learning in this mini-episode. Decision Tree Learning is the algorithmic process of trying to generate an optimal decision tree to properly classify or forecast some future unlabeled element by following each step in the tree.
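A hand-built tree (not a learned one; the features and labels are made up) makes the "follow each step" idea concrete:

```python
# A decision tree as nested dicts: each internal node tests a feature,
# each leaf is a class label.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {"feature": "humidity",
                  "branches": {"high": "stay in", "normal": "play"}},
        "rainy": "stay in",
        "overcast": "play",
    },
}

def classify(tree, example):
    """Follow each test in the tree until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][example[node["feature"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # play
```

Learning algorithms such as ID3 or CART build a tree like this automatically by choosing, at each node, the feature split that best separates the training labels.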

Sep 05 2014

13mins

Let's Talk About Natural Language Processing

This episode reboots our podcast with the theme of Natural Language Processing for the next few months.

We begin with introductions of Yoshi and Linh Da and then get into a broad discussion about natural language processing: what it is, what some of the classic problems are, and just a bit on approaches.

Finishing out the show is an interview with Lucy Park about her work on the KoNLPy library for Korean NLP in Python.

If you want to share your NLP project, please join our Slack channel.  We're eager to see what listeners are working on!

http://konlpy.org/en/latest/

Jan 04 2019

36mins

[MINI] Bayesian Belief Networks

A Bayesian Belief Network is an acyclic directed graph composed of nodes that represent random variables and edges that imply a conditional dependence between them. It's an intuitive way of encoding your statistical knowledge about a system, and it makes it efficient to propagate belief updates throughout the network when new information is added.
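A minimal two-node sketch of such an update, with a single edge Rain -> WetGrass and made-up probabilities:

```python
# A two-node belief network: Rain -> WetGrass (illustrative numbers).
p_rain = 0.2
p_wet_given = {True: 0.9, False: 0.1}  # P(wet grass | rain?)

def p_rain_given_wet():
    """Propagate the observation 'grass is wet' back to the Rain node."""
    joint_rain = p_rain * p_wet_given[True]            # P(rain, wet)
    joint_no_rain = (1 - p_rain) * p_wet_given[False]  # P(no rain, wet)
    return joint_rain / (joint_rain + joint_no_rain)

posterior = p_rain_given_wet()
print(posterior)  # seeing wet grass raises our belief in rain above 0.2
```

With more nodes the same enumeration becomes expensive, which is why real belief-propagation algorithms exploit the graph structure.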

Aug 04 2017

17mins

[MINI] Markov Chain Monte Carlo

This episode explores how going wine tasting could teach us about using Markov chain Monte Carlo (MCMC).
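A sketch of the idea using the Metropolis algorithm, one common MCMC method, over five hypothetical wines whose "tastiness" scores we can evaluate but not normalize:

```python
import random

# Unnormalized scores for five wines (made-up numbers).
scores = [1.0, 2.0, 8.0, 4.0, 1.0]

def metropolis(n_samples, seed=0):
    """Random-walk Metropolis sampler over the wine indices."""
    rng = random.Random(seed)
    current = 0
    samples = []
    for _ in range(n_samples):
        # Propose a neighboring wine (wrapping around the ends).
        proposal = (current + rng.choice([-1, 1])) % len(scores)
        # Accept with probability min(1, score ratio); otherwise stay put.
        if rng.random() < scores[proposal] / scores[current]:
            current = proposal
        samples.append(current)
    return samples

samples = metropolis(20000)
# The chain visits each wine in proportion to its score, so wine 2
# (the highest-scoring one) shows up most often.
counts = [samples.count(i) for i in range(len(scores))]
print(counts)
```

The sampler only ever compares two scores at a time, yet its long-run visit frequencies approximate the full normalized distribution.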

Apr 03 2015

15mins

Serverless NLP Model Training

Alex Reeves joins us to discuss some of the challenges around building a serverless, scalable, generic machine learning pipeline. This is a technical deep dive on architecting solutions and a discussion of some of the design choices made.

Dec 10 2019

29mins

Team Data Science Process

Buck Woody joins Kyle to share experiences from the field and the application of the Team Data Science Process - a popular six-phase workflow for doing data science.

Dec 03 2019

41mins

Ancient Text Restoration

Thea Sommerschield joins us this week to discuss the development of Pythia - a machine learning model trained to assist in the reconstruction of ancient language text.

Dec 01 2019

41mins

ML Ops

Kyle met up with Damian Brady at MS Ignite 2019 to discuss machine learning operations.

Nov 27 2019

36mins

Annotator Bias

The modern deep learning approaches to natural language processing are voracious in their demands for large corpora to train on. Folk wisdom used to estimate that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to do transfer learning and achieve novel results on much smaller corpora.

Thanks to these advancements, an NLP researcher might get value out of fewer examples, since they can use transfer learning to get a head start and focus on learning the nuances of the language specifically relevant to the task at hand. Thus, small specialized corpora are both useful and practical to create.

In this episode, Kyle speaks with Mor Geva, lead author on the recent paper Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets, which explores some unintended consequences of the typical procedure followed for generating corpora.

Source code for the paper available here: https://github.com/mega002/annotator_bias

Nov 23 2019

25mins

NLP for Developers

While at MS Build 2019, Kyle sat down with Lance Olson from the Applied AI team about how tools like cognitive services and cognitive search enable non-data scientists to access relatively advanced NLP tools out of box, and how more advanced data scientists can focus more time on the bigger picture problems.

Nov 20 2019

29mins

Indigenous American Language Research

Manuel Mager joins us to discuss natural language processing for low and under-resourced languages.  We discuss current work in this area and the Naki Project which aggregates research on NLP for native and indigenous languages of the American continent.

Nov 13 2019

22mins

Talking to GPT-2

GPT-2 is yet another in a succession of models like ELMo and BERT which adopt a similar deep learning architecture and train an unsupervised model on a massive text corpus.

As we have been covering recently, these approaches are showing tremendous promise, but how close are they to an AGI? Our guest today, Vazgen Davidyants, wondered exactly that and had conversations with a chatbot running GPT-2. We discuss his experiences as well as some novel thoughts on artificial intelligence.

Oct 31 2019

29mins

Reproducing Deep Learning Models

Rajiv Shah attempted to reproduce an earthquake-predicting deep learning model.  His results exposed some issues with the model.  Kyle and Rajiv discuss the original paper and Rajiv's analysis.

Oct 23 2019

22mins

What BERT is Not

Allyson Ettinger joins us to discuss her work in computational linguistics, specifically in exploring some of the ways in which the popular natural language processing approach BERT has limitations.

Oct 14 2019

27mins

SpanBERT

Omer Levy joins us to discuss "SpanBERT: Improving Pre-training by Representing and Predicting Spans".

https://arxiv.org/abs/1907.10529

Oct 08 2019

24mins

BERT is Shallow

Tim Niven joins us this week to discuss his work exploring the limits of what BERT can do on certain natural language tasks such as adversarial attacks, compositional learning, and systematic learning.

Sep 23 2019

20mins

BERT is Magic

Kyle pontificates on how impressed he is with BERT.

Sep 16 2019

18mins

Building the howto100m Video Corpus

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. The availability of machine transcribed explainer videos offers a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self annotating", as hosts explain the actions they are taking on the screen.

This episode is a discussion of the HowTo100m dataset - a project which has assembled a video corpus of 136M video clips with captions covering 23k activities.

Related Links

The paper will be presented at ICCV 2019

@antoine77340

Antoine on Github

Antoine's homepage

Aug 19 2019

22mins

BERT

Kyle provides a non-technical overview of why Bidirectional Encoder Representations from Transformers (BERT) is a powerful tool for natural language processing projects.

Jul 29 2019

13mins

Onnx

Kyle interviews Prasanth Pulavarthi about the Onnx format for deep neural networks.

Jul 22 2019

20mins

Catastrophic Forgetting

Kyle and Linhda discuss some high-level theory of mind and overview the machine learning concept of catastrophic forgetting.

Jul 15 2019

21mins

Transfer Learning

Sebastian Ruder is a research scientist at DeepMind.  In this episode, he joins us to discuss the state of the art in transfer learning and his contributions to it.

Jul 08 2019

29mins

Facebook Bargaining Bots Invented a Language

In 2017, Facebook published a paper called Deal or No Deal? End-to-End Learning for Negotiation Dialogues. In this research, the reinforcement learning agents developed a mechanism of communication (which could be called a language) that made them able to optimize their scores in the negotiation game. Many media sources reported this as if it were a first step towards Skynet taking over. In this episode, Kyle discusses bargaining agents and the actual results of this research.

Jun 21 2019

23mins
