Data Science at Home

Updated 8 days ago


Technology, machine learning and algorithms. Come join the discussion! https://discord.gg/4UNKGf3


iTunes Ratings

44 Ratings
Average Ratings
5 stars: 26
4 stars: 11
3 stars: 2
2 stars: 2
1 star: 3

Yes

By stevensforjesus - Nov 04 2017
This is so good

Very well done

By Charliea472 - Aug 02 2016
Very clear explanations of data science concepts.


Data Science at Home

Latest release on Feb 22, 2020

The Best Episodes Ranked Using User Listens

Updated by OwlTail 8 days ago

Rank #1: What is wrong with reinforcement learning? (Ep. 82)


Join the discussion on our Discord server

After reinforcement learning agents did so well at playing Atari video games, mastering Go (AlphaGo), trading financial assets, and modeling language, let me tell you the real story here. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it?

Are you a listener of the Data Science at Home podcast? A reader of the Amethix Blog? Or did you subscribe to the Artificial Intelligence at your fingertips newsletter? In any case, let’s stay in touch! https://amethix.com/survey/

 

References

Oct 15 2019

21mins


Rank #2: Episode 25: How to become data scientist [RB]


In this episode, I speak about the requirements and the skills to become a data scientist and join an amazing community that is changing the world with data analytics.

Oct 16 2017

16mins


Rank #3: Episode 39: What is L1-norm and L2-norm?


In this episode I explain the differences between L1 and L2 regularization, which you can find in the function minimization of basically any machine learning model.
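
As a quick illustration of the difference (a minimal sketch using scikit-learn, not taken from the episode): the L1 penalty drives some coefficients exactly to zero, while the L2 penalty only shrinks them.

```python
# A minimal sketch (not from the episode): comparing L1 (lasso) and L2 (ridge)
# penalties on a noisy linear problem with only a few informative features.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])  # sparse ground truth
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# L1 penalty: alpha * sum(|w|) -> sparse weights (implicit feature selection)
lasso = Lasso(alpha=0.1).fit(X, y)
# L2 penalty: alpha * sum(w^2) -> small but mostly non-zero weights
ridge = Ridge(alpha=0.1).fit(X, y)

print("L1 coefficients:", np.round(lasso.coef_, 2))
print("L2 coefficients:", np.round(ridge.coef_, 2))
```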

Jul 19 2018

21mins


Rank #4: Episode 38: Collective intelligence (Part 1)


This is the first part of the amazing episode with Johannes Castner, CEO and founder of CollectiWise. Johannes is finishing his PhD in Sustainable Development at Columbia University in New York City, and he is building a platform for collective intelligence. Today we talk about artificial general intelligence and wisdom.

All references and shownotes will be published after the next episode. Enjoy and stay tuned!

Jul 12 2018

30mins


Rank #5: Have you met Shannon? Conversation with Jimmy Soni and Rob Goodman about one of the greatest minds in history (Ep. 81)


Join the discussion on our Discord server

In this episode I have an amazing conversation with Jimmy Soni and Rob Goodman, authors of “A Mind at Play”, a book entirely dedicated to the life and achievements of Claude Shannon. Claude Shannon needs no introduction. But for those who need a refresher, Shannon is the inventor of the information age.

Have you heard of binary code, entropy in information theory, data compression theory (the stuff behind mp3, mpg, zip, etc.), error correcting codes (the stuff that makes your RAM work well), n-grams, block ciphers, the beta distribution, the uncertainty coefficient?

All that stuff was invented by Claude Shannon :)

 
Articles:
https://medium.com/the-mission/10-000-hours-with-claude-shannon-12-lessons-on-life-and-learning-from-a-genius-e8b9297bee8f
https://medium.com/the-mission/on-claude-shannons-103rd-birthday-here-are-103-memorable-claude-shannon-quotes-maxims-and-843de4c716cf
http://nautil.us/issue/51/limits/how-information-got-re_invented
http://nautil.us/issue/50/emergence/claude-shannon-the-las-vegas-cheat

Claude's papers:
https://medium.com/the-mission/a-genius-explains-how-to-be-creative-claude-shannons-long-lost-1952-speech-fbbcb2ebe07f
http://www.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

A Mind at Play (book links):
http://amzn.to/2pasLMz -- Hardcover
https://amzn.to/2oCfVL0 -- Audio

Oct 10 2019

32mins


Rank #6: Episode 67: Classic Computer Science Problems in Python


Today I am with David Kopec, author of Classic Computer Science Problems in Python, published by Manning Publications.

His book deepens your knowledge of problem-solving techniques from the realm of computer science by challenging you with interesting and realistic scenarios, exercises, and of course algorithms. There are examples covering the major topics any data scientist should be familiar with, such as search, clustering, graphs, and much more.

Get the book from https://www.manning.com/books/classic-computer-science-problems-in-python and use coupon code poddatascienceathome19 to get a 40% discount.

 

References

Twitter https://twitter.com/davekopec

GitHub https://github.com/davecom

classicproblems.com

Jul 02 2019

28mins


Rank #7: Attacking machine learning for fun and profit (with the authors of SecML Ep. 80)


Join the discussion on our Discord server

As ML plays an increasingly relevant role in many domains of everyday life, it’s quite natural to see more and more attacks on ML systems. In this episode we talk about the most popular attacks against machine learning systems and some mitigations designed by researchers Ambra Demontis and Marco Melis, from the University of Cagliari (Italy). The guests are also the authors of SecML, an open-source Python library for the security evaluation of Machine Learning (ML) algorithms. Both Ambra and Marco are members of the PRAlab research group, under the supervision of Prof. Fabio Roli.
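
For a flavor of what an evasion attack looks like, here is a generic, hypothetical sketch of a gradient-based attack on a toy linear classifier; it is not the SecML API, whose documentation is linked in the references below.

```python
# A generic, hypothetical sketch of a gradient-based evasion attack on a
# linear classifier (FGSM-style); see the SecML library below for the real thing.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A toy "trained" logistic-regression model: p(y=1|x) = sigmoid(w.x + b)
w = np.array([1.5, -2.0, 0.5])
b = 0.1

x = np.array([2.0, -1.0, 0.5])           # input classified as positive
print("clean score:", sigmoid(w @ x + b))

# The gradient of the positive-class score w.r.t. the input is proportional
# to w, so stepping against sign(w) with budget eps lowers the score (evasion).
eps = 0.5
x_adv = x - eps * np.sign(w)
print("adversarial score:", sigmoid(w @ x_adv + b))
```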

SecML Contributors

Marco Melis (Ph.D. student, project maintainer, https://www.linkedin.com/in/melismarco/)
Ambra Demontis (postdoc, https://pralab.diee.unica.it/it/AmbraDemontis)
Maura Pintor (Ph.D. student, https://it.linkedin.com/in/maura-pintor)
Battista Biggio (assistant professor, https://pralab.diee.unica.it/it/BattistaBiggio)

References

SecML: an open-source Python library for the security evaluation of Machine Learning (ML) algorithms https://secml.gitlab.io/.

A. Demontis et al., “Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks,” in 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 321–338. https://www.usenix.org/conference/usenixsecurity19/presentation/demontis

P. W. Koh and P. Liang, “Understanding Black-box Predictions via Influence Functions,” in International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.04730

M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli, “Is Deep Learning Safe for Robot Vision? Adversarial Examples Against the iCub Humanoid,” in 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2017, pp. 751–759. https://arxiv.org/abs/1708.06939

B. Biggio and F. Roli, “Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning,” Pattern Recognition, vol. 84, pp. 317–331, 2018. https://arxiv.org/abs/1712.03141

B. Biggio et al., “Evasion attacks against machine learning at test time,” in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Part III, 2013, vol. 8190, pp. 387–402. https://arxiv.org/abs/1708.06131

B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in 29th Int’l Conf. on Machine Learning (ICML), 2012, pp. 1807–1814. https://arxiv.org/abs/1206.6389

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Seattle, 2004, pp. 99–108. https://dl.acm.org/citation.cfm?id=1014066

M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1703.01365

M. T. Ribeiro, S. Singh, and C. Guestrin, “Model-agnostic interpretability of machine learning,” arXiv preprint arXiv:1606.05386, 2016. https://arxiv.org/abs/1606.05386

W. Guo et al., “LEMNA: Explaining deep learning based security applications,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018. https://dl.acm.org/citation.cfm?id=3243792

S. Bach et al., “On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” PLoS ONE, vol. 10, no. 7, e0130140, 2015. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140

Oct 01 2019

34mins


Rank #8: Episode 53: Estimating uncertainty with neural networks


Have you ever wanted to get an estimate of the uncertainty of your neural network? Clearly Bayesian modelling provides a solid framework to estimate uncertainty by design. However, there are many realistic cases in which Bayesian sampling is not really an option and ensemble models can play a role.

In this episode I describe a simple yet effective way to estimate uncertainty, without changing your neural network’s architecture or your machine learning pipeline at all.
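
As a rough sketch of the general idea (ensemble spread as uncertainty; the exact method is in the post linked below), assuming scikit-learn and bootstrap resampling:

```python
# A minimal sketch of ensemble-based uncertainty (the general idea, not
# necessarily the exact method from the episode's post).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# Train the same architecture several times on bootstrap resamples.
ensemble = []
for seed in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=seed).fit(X[idx], y[idx])
    ensemble.append(model)

X_test = np.array([[0.0], [2.5], [6.0]])    # the last point is out of range
preds = np.stack([m.predict(X_test) for m in ensemble])
print("mean prediction:", preds.mean(axis=0))
print("uncertainty (std):", preds.std(axis=0))  # larger where data is scarce
```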

The post with mathematical background and sample source code is published here.

Jan 23 2019

15mins


Rank #9: Episode 44: The predictive power of metadata


In this episode I don't talk about data. In fact, I talk about metadata.

While many machine learning models rely on certain amounts of data, e.g. text, images, audio and video, it has been proven how powerful the signal carried by metadata is, that is, all the data that is invisible to the end user. Behind a tweet of 140 characters there are more than 140 fields of data that draw a much more detailed profile of the sender and the content she is producing... without ever considering the tweet itself.

References

You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information https://www.ucl.ac.uk/~ucfamus/papers/icwsm18.pdf

Aug 21 2018

21mins


Rank #10: Episode 63: Financial time series and machine learning


In this episode I speak to Alexandr Honchar, data scientist and owner of the blog https://medium.com/@alexrachnog. Alexandr has written very interesting posts about time series analysis for financial data. His blog is on my personal list of best tutorial blogs. We discuss financial time series and machine learning, what makes predicting the price of stocks a very challenging task, and why machine learning might not be enough. As usual, I ask Alexandr how he sees machine learning in the next 10 years. His answer - in my opinion quite futuristic - makes perfect sense.

You can contact Alexandr on

Enjoy the show!

Jun 04 2019

21mins


Rank #11: Episode 30: Neural networks and genetic evolution: an unfeasible approach


Despite what researchers claim about genetic evolution, in this episode we give a realistic view of the field.

Nov 21 2017

22mins


Rank #12: Episode 54: Reproducible machine learning


In this episode I speak about how important reproducible machine learning pipelines are. When you are collaborating with diverse teams, several tasks will be distributed among different individuals. Everyone will have good reasons to change parts of your pipeline, leading to confusion and a number of variants that quickly explodes. In all those cases, tracking data and code is extremely helpful for building models that are reproducible anytime, anywhere. Listen to the podcast and learn how.
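
As a hypothetical illustration of the kind of bookkeeping involved (dedicated tools such as DVC or MLflow do this properly), a minimal sketch that pins the random seed and fingerprints the training data:

```python
# A hypothetical sketch of minimal reproducibility bookkeeping: fix the seed
# and fingerprint the exact data a model was trained on.
import hashlib
import json
import numpy as np

SEED = 42
rng = np.random.default_rng(SEED)           # single, recorded source of randomness

X = rng.normal(size=(100, 5))
y = (X.sum(axis=1) > 0).astype(int)

# Hash the training data so any later change to it is detectable.
data_hash = hashlib.sha256(X.tobytes() + y.tobytes()).hexdigest()

run_manifest = {
    "seed": SEED,
    "data_sha256": data_hash,
    "numpy_version": np.__version__,
}
print(json.dumps(run_manifest, indent=2))   # store alongside the model artifact
```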

Mar 09 2019

11mins


Rank #13: Episode 40: Deep learning and image compression


Today’s episode will be about deep learning and compression of data, in particular compressing images. We all know how important compressing data is: reducing the size of digital objects without affecting their quality. As a very general rule, the more one compresses an image the lower the quality, due to a number of factors like bitrate, quantization error, etcetera. I am glad to be here with Tong Chen, researcher at the School of Electronic Science and Engineering of Nanjing University, China.

Tong developed a deep-learning-based compression algorithm for images that seems to improve over state-of-the-art approaches like BPG, JPEG2000 and JPEG.

 

Reference

Deep Image Compression via End-to-End Learning - Haojie Liu, Tong Chen, Qiu Shen, Tao Yue, and Zhan Ma School of Electronic Science and Engineering, Nanjing University, Jiangsu, China

Jul 24 2018

17mins


Rank #14: Episode 45: why do machine learning models fail?


The success of a machine learning model depends on several factors and events. True generalization to data that the model has never seen before is more a chimera than a reality. But under specific conditions a well-trained machine learning model can generalize well, with testing accuracy similar to the accuracy achieved during training.

In this episode I explain when and why machine learning models fail when moving from training to testing datasets.
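
A minimal sketch of how to observe this (hypothetical data and models, not from the episode): compare training and testing accuracy, the gap being the first symptom of failed generalization.

```python
# A minimal sketch: the train/test accuracy gap as a first symptom of
# failure to generalize (hypothetical data and model).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)  # noisy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree memorizes the training set (including label noise)...
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# ...while a depth-limited one is forced to generalize.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, model in [("deep tree", deep), ("shallow tree", shallow)]:
    print(name, "train:", model.score(X_tr, y_tr), "test:", model.score(X_te, y_te))
```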

Aug 28 2018

16mins


Rank #15: Episode 52: why do machine learning models fail? [RB]


The success of a machine learning model depends on several factors and events. True generalization to data that the model has never seen before is more a chimera than a reality. But under specific conditions a well-trained machine learning model can generalize well, with testing accuracy similar to the accuracy achieved during training.

In this episode I explain when and why machine learning models fail when moving from training to testing datasets.

Jan 17 2019

15mins


Rank #16: Episode 60: Predicting your mouse click (and a crash course in deeplearning)


Deep learning is the future. Get a crash course on deep learning. Now! In this episode I speak to Oliver Zeigermann, author of Deep Learning Crash Course published by Manning Publications at https://www.manning.com/livevideo/deep-learning-crash-course

Oliver (Twitter: @DJCordhose) is a veteran of neural networks and machine learning. In addition to the course - which teaches you concepts from prototype to production - he's working on a really cool project that predicts something people do every day... clicking their mouse.

If you use promo code poddatascienceathome19 you get a 40% discount on all products on the Manning platform.

Enjoy the show!

 

References:

Deep Learning Crash Course (Manning Publications)

https://www.manning.com/livevideo/deep-learning-crash-course?a_aid=djcordhose&a_bid=e8e77cbf

Companion notebooks for the code samples of the video course "Deep Learning Crash Course"

https://github.com/DJCordhose/deep-learning-crash-course-notebooks/blob/master/README.md

Next-button-to-click predictor source code

https://github.com/DJCordhose/ux-by-tfjs

May 16 2019

39mins


Rank #17: Episode 24: How to handle imbalanced datasets


In machine learning, and data science in general, it is very common to deal at some point with imbalanced datasets and class distributions. This is the typical case where the number of observations that belong to one class is significantly lower than the number belonging to the other classes. Actually this happens all the time, in several domains, from finance to healthcare to social media, just to name a few I have personally worked with. Think about a bank detecting fraudulent transactions among millions or billions of daily operations, or, equivalently, in healthcare the identification of rare disorders. In genetics, but also with clinical lab tests, this is a normal scenario, in which, fortunately, there are very few patients affected by a disorder and therefore very few cases with respect to the large pool of healthy (or unaffected) patients. No algorithm can take into account the class distribution or the amount of observations in each class if it is not explicitly designed to handle such situations. In this episode I speak about some effective techniques to handle imbalanced datasets, advising the right method, or the most appropriate one, for the right dataset or problem.

In this episode I explain how to deal with such common and challenging scenarios.
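
As one hedged example of such a technique (one of several; the episode covers more): class weighting in scikit-learn, which re-weights the loss instead of resampling the data.

```python
# One of several possible techniques (a sketch, not the episode's full list):
# class weighting, which penalizes mistakes on the rare class more heavily.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n_major, n_minor = 5000, 50                 # 100:1 imbalance, e.g. fraud detection
X = np.vstack([rng.normal(0, 1, (n_major, 2)),
               rng.normal(2, 1, (n_minor, 2))])
y = np.array([0] * n_major + [1] * n_minor)

# class_weight="balanced" scales each class inversely to its frequency.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Evaluated on the training set for brevity; use a held-out set in practice.
print(classification_report(y, clf.predict(X), digits=3))
```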

Oct 09 2017

21mins


Rank #18: Episode 26: Deep Learning and Alzheimer


In this episode I speak about Deep Learning technology applied to Alzheimer's disease prediction. I had a great chat with Saman Sarraf, machine learning engineer at Konica Minolta, former lab manager at the Rotman Research Institute at Baycrest, University of Toronto, and author of DeepAD: Alzheimer's Disease Classification via Deep Convolutional Neural Networks using MRI and fMRI.

I hope you enjoy the show.

Oct 23 2017

54mins


Rank #19: Episode 57: Neural networks with infinite layers


How are differential equations related to neural networks? What are the benefits of rethinking a neural network as a differential equation engine? In this episode we explain all this and provide some material that is worth learning. Enjoy the show!
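
A sketch of the connection, under the usual interpretation: a residual block computes x + h·f(x), which is exactly one explicit Euler step of the ODE dx/dt = f(x), so letting the number of blocks grow is the "infinite layers" view of [5] below.

```python
# A sketch of the residual-block / ODE connection: stacking residual blocks
# x <- x + h * f(x) is explicit Euler integration of dx/dt = f(x).
import numpy as np

def f(x):
    # a fixed, hypothetical "layer" function (stands in for a learned block)
    return np.tanh(x)

def resnet_forward(x, n_blocks, t_final=1.0):
    h = t_final / n_blocks          # the step size shrinks as depth grows
    for _ in range(n_blocks):
        x = x + h * f(x)            # one residual block = one Euler step
    return x

x0 = np.array([0.5, -1.0])
for depth in [1, 10, 100, 1000]:
    print(depth, "blocks:", resnet_forward(x0, depth))
# As depth grows, the output converges to the ODE solution at t = 1.
```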

 

[Figure: residual block]

 

References

[1] K. He, et al., “Deep Residual Learning for Image Recognition”, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778, 2016

[2] S. Hochreiter, et al., “Long short-term memory”, Neural Computation 9(8), pages 1735-1780, 1997.

[3] Q. Liao, et al., “Bridging the gaps between residual learning, recurrent neural networks and visual cortex”, arXiv preprint, arXiv:1604.03640, 2016.

[4] Y. Lu, et al., “Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equation”, Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018.

[5] T. Q. Chen, et al., “Neural Ordinary Differential Equations”, Advances in Neural Information Processing Systems 31, pages 6571-6583, 2018.

Apr 23 2019

16mins


Rank #20: Episode 43: Applied Text Analysis with Python (interview with Rebecca Bilbro)


Today’s episode is about text analysis with Python. Python is the de facto standard in machine learning: a large community and a generous choice of libraries, sometimes at the price of less performant execution. But overall it is a decent language for typical data science tasks.

I am with Rebecca Bilbro, co-author of Applied Text Analysis with Python, with Benjamin Bengfort and Tony Ojeda.

We speak about the evolution of applied text analysis, tools and pipelines, and chatbots.

Aug 14 2018

36mins
