Rank #189 in Tech News category

News
Tech News

Machine Learning – Software Engineering Daily

Updated 5 days ago

Machine learning and data science episodes of Software Engineering Daily.

iTunes Ratings

44 Ratings
Ratings breakdown
5 stars: 34
4 stars: 5
3 stars: 2
2 stars: 2
1 star: 1

Rank #1: TensorFlow in Practice with Rajat Monga

TensorFlow is Google’s open source machine learning library. Rajat Monga is the engineering director for TensorFlow. In this episode, we cover how to use TensorFlow, including an example of how to build a machine learning model to identify whether a picture contains a cat or not.
TensorFlow was built with the mission of simplifying the process of deploying a machine learning model from research to production, so we also talk about that, as well as how TensorFlow can be used effectively in combination with Google’s open-source cluster manager, Kubernetes.

Sponsors

SnapCI is a continuous integration tool built by ThoughtWorks. Go to snap.ci/softwareengineeringdaily to check it out. Alooma is your data pipeline as a service. Alooma is a fully managed tool for pulling from different data sources–MySQL, Postgres, Elasticsearch, Salesforce, and many others. Go to alooma.com/sedaily for more information.

The post TensorFlow in Practice with Rajat Monga appeared first on Software Engineering Daily.

Aug 18 2016
44 mins

Rank #2: TensorFlow Applications with Rajat Monga

Rajat Monga is a director of engineering at Google where he works on TensorFlow. TensorFlow is a framework for numerical computation developed at Google.

The majority of TensorFlow users are building machine learning applications such as image recognition, recommendation systems, and natural language processing–but TensorFlow is actually applicable to a broader range of scientific computation than just machine learning. TensorFlow has APIs for decision trees, support vector machines, and linear algebra libraries.

The current focus of the TensorFlow team is usability. There are thousands of engineers building data intensive applications with TensorFlow, but Rajat and the rest of the TensorFlow team would like to see millions more. In today’s show, Rajat and I discussed how TensorFlow is becoming more usable, as well as some of the developments in TensorFlow around edge computing, TensorFlow Hub, and TensorFlow.js, which allows TensorFlow to run in the browser.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Datadog was built to bring clarity to complex, dynamic applications—in the cloud, on-premises, in containers, or wherever they run. With beautiful dashboards, seamless integrations with more than 200 technologies, and distributed request tracing, Datadog provides deep, end-to-end visibility into the health and performance of modern applications. Visualize key metrics, set alerts to identify anomalies, and collaborate with your team to troubleshoot and fix issues fast. Try it yourself by starting a free, 14-day trial today. Listeners of this podcast will also receive a free Datadog T-shirt! softwareengineeringdaily.com/datadog


The octopus: a sea creature known for its intelligence and flexibility. Octopus Deploy: a friendly deployment automation tool for deploying applications like .NET apps, Java apps and more. Ask any developer and they’ll tell you it’s never fun pushing code at 5pm on a Friday then crossing your fingers hoping for the best. That’s where Octopus Deploy comes into the picture. Octopus Deploy is a friendly deployment automation tool, taking over where your build/CI server ends. Use Octopus to promote releases on-prem or to the cloud. Octopus integrates with your existing build pipeline–TFS and VSTS, Bamboo, TeamCity, and Jenkins. It integrates with AWS, Azure, and on-prem environments. Reliably and repeatedly deploy your .NET and Java apps and more. If you can package it, Octopus can deploy it! It’s quick and easy to install. Go to Octopus.com to trial Octopus free for 45 days. That’s Octopus.com


There’s no need to reinvent the wheel when it comes to making your app “realtime.” PubNub makes it simple, enabling you to build immersive and interactive experiences on the web, on mobile phones, embedded into hardware, and any other device connected to the Internet. With powerful APIs, and a robust global infrastructure, you can stream geolocation data, send chat messages, turn on your sprinklers, or rock your baby’s crib when they start crying (PubNub literally powers IoT cribs). 70 SDKs for web, mobile, IoT, and more means you can start streaming data in realtime without a ton of compatibility headaches, and no need to build your own SDKs from scratch. Go to PubNub.com/sedaily to get started. They offer a generous sandbox tier that’s free forever (until your app takes off).


GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out gocd.org/sedaily and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations–and you can check it out for yourself at gocd.org/sedaily.

The post TensorFlow Applications with Rajat Monga appeared first on Software Engineering Daily.

Apr 26 2018
56 mins

Rank #3: Self-Driving Engineering with George Hotz

In the smartphone market there are two dominant operating systems: one closed source (iPhone) and one open source (Android). The market for self-driving cars could play out the same way, with a company like Tesla becoming the closed source iPhone of cars, and a company like Comma.ai developing the open source Android of self-driving cars.

George Hotz is the CEO of Comma.ai. Comma makes hardware devices that allow users with “normal” cars to be augmented with advanced cruise control and lane assist features. This means you can take your own car–for example, a Toyota Prius–and outfit your car to have something similar to the Tesla Autopilot. Comma’s hardware devices cost under $1000 to order online.

George joins the show to explain how the Comma hardware and software stack works in detail–from the low level interface with a car’s CAN bus to the high level machine learning infrastructure.

Users who purchase the Comma.ai hardware drive around with a camera facing the front of their windshield. This video is used to orient the state of the car in space. The video from that camera also gets saved and uploaded to Comma’s servers. Comma can use this video together with labeled events from the user’s driving experience to crowdsource their model for self-driving.

For example, if a user is driving down a long stretch of highway, and they turn on the Comma.ai driving assistance, the car will start driving itself and the video capture will begin. If the car begins to swerve into another lane, the user will take over for the car and the Comma system will disengage. This “disengagement” event gets labeled as such, and when that data makes it back to Comma’s servers, Comma can use the data to update their models.
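
The labeling step described above can be sketched in a few lines. This is only an illustration of the idea: the show notes do not describe Comma's actual log format, so the `Frame` record, its fields, and `find_disengagements` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical per-frame record; not Comma's real log schema.
    timestamp: float       # seconds since the drive started
    assist_engaged: bool   # True while the driving assistance is steering


def find_disengagements(frames):
    """Return timestamps where assistance went from engaged to disengaged."""
    events = []
    for previous, current in zip(frames, frames[1:]):
        if previous.assist_engaged and not current.assist_engaged:
            events.append(current.timestamp)
    return events


# Made-up drive: assistance turns on at 1.0s, the driver takes over at 3.0s,
# turns it back on at 4.0s, and takes over again at 5.0s.
drive = [
    Frame(0.0, False),
    Frame(1.0, True),
    Frame(2.0, True),
    Frame(3.0, False),
    Frame(4.0, True),
    Frame(5.0, False),
]
disengagements = find_disengagements(drive)
print(disengagements)
```

Each returned timestamp marks a point in the video that could be labeled as a disengagement before the data is used for training.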

George is very good at explaining complex engineering topics, and is also quite entertaining and open to discussing the technology as well as other competitors in the autonomous car space. I have not been able to get many other people on the show to talk about autonomous cars, so this was quite refreshing! I hope to do more in the future.

The post Self-Driving Engineering with George Hotz appeared first on Software Engineering Daily.

Aug 08 2018
1 hour 4 mins

Rank #4: Convolutional Neural Networks with Matt Zeiler

Convolutional neural networks are a machine learning tool that uses layers of convolution and pooling to process and classify inputs. CNNs are useful for identifying objects in images and video. In this episode, we focus on the application of convolutional neural networks to image and video recognition and classification.
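
To make "layers of convolution and pooling" concrete, here is a minimal plain-Python sketch of one convolution step followed by one max-pooling step. The toy image, the kernel, and the function names are assumptions chosen for illustration. A real CNN learns its kernel values during training and would use a library such as TensorFlow rather than nested loops.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# 5x5 toy image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)  # 4x4 map, large where the step is
pooled = max_pool2d(feature_map)         # 2x2 summary of the feature map
print(feature_map)
print(pooled)
```

The convolution produces a high value only where the kernel's pattern appears, and pooling shrinks the map while keeping the strongest response in each region.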

Matt Zeiler is the CEO of Clarifai, an API for image and video recognition. Matt takes us through the basics of a convolutional neural network–you don’t need any background in machine learning to understand the content of the episode. He also discusses the subjective aspects of image and video recognition, and some of the tactics Clarifai has explored. This is far from a solved problem.

Matt also discusses the infrastructure of Clarifai–how they use Kubernetes, how models are deployed, and how models are updated.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view the transcript for this episode.

Sponsors


Deep learning promises to dramatically improve how our world works. To make deep learning easier and faster, we need new kinds of hardware and software–which is why Intel acquired Nervana Systems, a platform for deep learning. Intel Nervana is hiring engineers to help develop a full stack for AI, from chip design to software frameworks. Go to softwareengineeringdaily.com/intel to apply for a job at Intel Nervana. If you don’t know much about the company, check out the interviews I have conducted with engineers from the company. You can find these at softwareengineeringdaily.com/intel.


Oracle Dyn provides DNS that is as dynamic and intelligent as your applications. Dyn DNS gets your users to the right cloud service, CDN, or data center, using intelligent response to steer traffic based on business policies, as well as real-time internet conditions, like the security and performance of the network path. Get started with a free 30-day trial for your application by going to dyn.com/sedaily.  After the free trial, Dyn’s developer plans start at just $7 a month for world-class DNS. Rethink DNS. Go to dyn.com/sedaily to learn more and get your free trial of Dyn DNS.


Don’t let your database be a black box–drill down into the metrics of your database with 1-second granularity. VividCortex provides database monitoring for MySQL, Postgres, Redis, MongoDB, and Amazon Aurora. Database uptime, efficiency, and performance can all be measured using VividCortex. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can be proactive, and fix performance problems before customers are impacted. If you have a database that you would like to monitor more closely, check out vividcortex.com/sedaily. Github, DigitalOcean, and Yelp all use VividCortex to understand database performance. Learn more at vividcortex.com/sedaily, and request a demo!

The post Convolutional Neural Networks with Matt Zeiler appeared first on Software Engineering Daily.

May 10 2017
54 mins

Rank #5: Word2Vec with Adrian Colyer

Machines understand the world through mathematical representations. In order to train a machine learning model, we need to describe everything in terms of numbers.  Images, words, and sounds are too abstract for a computer. But a series of numbers is a representation that we can all agree on, whether we are a computer or a human.

In recent shows, we have explored how to train machine learning models to understand images and video. Today, we explore words. You might be thinking–“isn’t a word easy to understand? Can’t you just take the dictionary definition?” A dictionary definition does not capture the richness of a word. Dictionaries do not give you a way to measure similarity between one word and all other words in a given language.

Word2vec is a system for defining words in terms of the words that appear close to that word. For example, the sentence “Howard is sitting in a Starbucks cafe drinking a cup of coffee” gives an obvious indication that the words “cafe,” “cup,” and “coffee” are all related. With enough sentences like that, we can start to understand the entire language.
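
The idea of describing a word by its neighbors can be sketched without any machine learning library. The code below is not the word2vec training algorithm, which learns dense vectors with a shallow neural network. It only counts which words appear within a small window of each word and compares those count vectors with cosine similarity. Apart from the Starbucks sentence, the sentences, the window size, and the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sentences = [
    "howard is sitting in a starbucks cafe drinking a cup of coffee",
    "she ordered a cup of tea in the cafe",
    "he spilled a cup of coffee on the table",
    "the car is parked in the garage",
]
vectors = cooccurrence_vectors(sentences)

# Words used in similar contexts end up with similar vectors.
coffee_vs_tea = cosine(vectors["coffee"], vectors["tea"])
coffee_vs_garage = cosine(vectors["coffee"], vectors["garage"])
print(coffee_vs_tea, coffee_vs_garage)
```

Even on four sentences, "coffee" comes out closer to "tea" than to "garage", because both drinks appear next to "cup of".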

Adrian Colyer is a venture capitalist with Accel, and blogs about technical topics such as word2vec. We talked about word2vec specifically, and the deep learning space more generally. We also explored how the rapidly improving tools around deep learning are changing the venture investment landscape.

If you like this episode, we have done many other shows about machine learning with guests like Matt Zeiler, the founder of Clarifai, and Francois Chollet, the creator of Keras. You can check out our back catalog by downloading the Software Engineering Daily app for iOS, where you can listen to all of our old episodes, and easily discover new topics that might interest you. You can upvote the episodes you like and get recommendations based on your listening history. With 600 episodes, it is hard to find the episodes that appeal to you, and we hope the app helps with that.

Question of the Week: What is your favorite continuous delivery or continuous integration tool? Email jeff@softwareengineeringdaily.com and a winner will be chosen at random to receive a Software Engineering Daily hoodie. 

Sponsors


To build the kinds of things developers want to build today, they need better tools.  That’s why Amazon Web Services built Amazon Aurora. A relational database engine that’s compatible with MySQL and PostgreSQL, and provides up to five times the performance of standard MySQL—on the same hardware, at a tenth of the cost. Amazon Aurora from AWS can scale up to millions of transactions per minute. Automatically grow your storage up to 64 terabytes. And replicates data to three different Availability Zones. And you don’t have to manage a thing. There are no upfront charges, no commitments—you only pay for what you use. Check it out, at aurora.aws.


Toptal is the best place to find reasonably priced, extremely talented software engineers to build your projects from scratch or scale your workforce. Get a free pair of Apple Airpods when you use Toptal.com/sedaily to work with an engineer for at least 20 hours.


Cloudflare runs 10% of the Internet, providing performance and security to millions of websites. Many of you probably already use Cloudflare on your sites. We’re not talking about using Cloudflare today though, we’re here to talk about building on top of it. If you’re a developer you can build apps which can be installed by the millions of sites which rely on Cloudflare. You can even sell your apps; they can make you money every month. Visit cloudflare.com/sedaily to watch how you can build and deploy an app in less than 3 minutes.

The post Word2Vec with Adrian Colyer appeared first on Software Engineering Daily.

Sep 13 2017
1 hour 1 min

Rank #6: TensorFlow with Greg Corrado

“You don’t mind if failures slow things down, but it’s very important that failures do not stop forward progress.”

TensorFlow is an open source machine learning library intended to bring large-scale, distributed machine learning and deep learning to everyone. Google recently released the framework to the public as a second-generation API, having learned from the successes and failures of DistBelief.

Greg Corrado is a senior research scientist and tech lead at Google, where he focuses on the research areas of machine intelligence, machine perception and natural language processing.

Questions

  • From the end-user’s point of view, how does Smart Reply work?
  • How can teams blend research and engineering to make better products?
  • How did the DistBelief project shape TensorFlow?
  • How does TensorFlow differ from streaming frameworks that are more generalized like Spark or Storm?
  • Why would I want to do machine learning on my phone?
  • How is TensorFlow fault tolerant?
  • What are things the open source community should dive into in TensorFlow, to fix and improve it?

Sponsors

Hired.com is the job marketplace for software engineers. Go to hired.com/softwareengineeringdaily to get a $600 bonus upon landing a job through Hired.

Digital Ocean is the simplest cloud hosting provider. Use promo code SEDAILY for $10 in free credit.

The post TensorFlow with Greg Corrado appeared first on Software Engineering Daily.

Dec 15 2015
41 mins

Rank #7: Self-Driving Deep Learning with Lex Fridman

Self-driving cars are here. Fully autonomous systems like Waymo are being piloted in less complex circumstances. Human-in-the-loop systems like Tesla Autopilot navigate for drivers when it is safe to do so, and let the human take control in ambiguous circumstances.

Computers are great at memorization, but not yet great at reasoning. We cannot enumerate to a computer every single circumstance that a car might find itself in. The computer needs to perceive its surroundings, plan how to take action, execute control over the situation, and respond to changing circumstances inside and outside of the car.

Lex Fridman has worked on autonomous vehicles with companies like Google and Tesla. He recently taught a class on deep learning for semi-autonomous vehicles at MIT, which is freely available online. There was so much ground to cover in this conversation. Most of the conversation was higher level. How do you even approach the problem? What is the hardware and software architecture of a car?

I enjoyed talking to Lex, and if you want to hear more from him check out his podcast Take It Uneasy, which is about jiu jitsu, judo, wrestling, and learning.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Have you been thinking you’d be happier at a new job? If you’re dreaming about a new job and have been waiting for the right time to make a move, go to hired.com/sedaily. Hired makes finding work enjoyable. Hired uses an algorithmic job-matching tool in combination with a talent advocate who will walk you through the process of finding a better job. Check out hired.com/sedaily to get a special offer for Software Engineering Daily listeners–a $600 signing bonus from Hired when you find that great job that gives you the respect and salary that you deserve as a talented engineer. 


Bugsnag is an automatic error-monitoring platform that helps developers understand the impact of application errors and fix the ones that matter in a time-efficient and enjoyable way. Bugsnag’s open source error reporting libraries automatically capture errors and provide in-depth diagnostic reports across all major programming languages and frameworks. Diagnostic reports consolidate the information you need to reproduce errors in one place, including the stacktrace. Errors are grouped by root cause and mapped to users so you can easily identify which errors are the most widespread and affect the greatest amount of users. Integrate with Slack and PagerDuty to get notified in real-time of new errors and spikes in your error rate. Then, integrate with your issue tracker or use Bugsnag’s workflow to help you move errors through your debugging process and get them fixed for your users.  Airbnb, Lyft, and Shopify all use Bugsnag for error-monitoring. Get up and running in three minutes. Try all features free for 14 days at bugsnag.com/sedaily.


Toptal is the best place to find reasonably priced, extremely talented software engineers to build your projects from scratch or scale your workforce. Get a free pair of Apple Airpods when you use Toptal.com/sedaily to work with an engineer for at least 20 hours.


VividCortex is the best way to improve your database performance, efficiency, and uptime. It’s a cloud-hosted monitoring platform that eliminates your most critical visibility gap, providing insights at 1-second granularity into production database workload and query performance. It measures the execution and resource consumption of every statement and transaction, so you can proactively fix future database issues before they impact customers. To learn more, visit vividcortex.com/sedaily and find out why companies like Github, DigitalOcean, and Yelp all use VividCortex to see deeper into their database performance. Learn more at vividcortex.com/sedaily, and get started today!

The post Self-Driving Deep Learning with Lex Fridman appeared first on Software Engineering Daily.

Jul 28 2017
59 mins

Rank #8: Hedge Fund Artificial Intelligence with Xander Dunn

A hedge fund is a collection of investors that make bets on the future. The “hedge” refers to the fact that the investors often try to diversify their strategies so that the directions of their bets are less correlated, and they can be successful in a variety of future scenarios. Engineering-focused hedge funds have used what might be called “machine learning” for a long time to predict what will happen in the future.

Numerai is a hedge fund that crowdsources its investment strategies by allowing anyone to train models against Numerai’s data. A model that succeeds in a simulated environment will be adopted by Numerai and used within its real money portfolio. The engineers who create the models are rewarded in proportion to how well the models perform.
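
As a rough illustration of rewarding engineers "in proportion to how well the models perform," the sketch below splits a fixed reward pool by score. The show notes do not describe Numerai's actual payout rules, so the scoring scale, the pool size, the choice to pay nothing for non-positive scores, and the name `split_rewards` are all assumptions.

```python
def split_rewards(scores, pool):
    """Divide `pool` among models in proportion to their positive scores.

    Assumption for this sketch: models with a score of zero or below
    receive nothing.
    """
    total = sum(max(score, 0.0) for score in scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: pool * max(score, 0.0) / total
            for name, score in scores.items()}


# Hypothetical performance scores for three submitted models.
scores = {"model_a": 3.0, "model_b": 1.0, "model_c": -2.0}
payouts = split_rewards(scores, pool=1000.0)
print(payouts)
```

With these made-up numbers, the best model earns three times as much as the second, and the losing model earns nothing.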

Xander Dunn is a software engineer at Numerai and in this episode he explains what a hedge fund is, why the traditional strategies are not optimal, and how Numerai creates the right incentive structure to crowdsource market intelligence. This interview was fun and thought provoking–Numerai is one of those companies that makes me very excited about the future.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view or download the transcript for this show.

Sponsors


To understand how your application is performing, you need visibility into your database. VividCortex provides database monitoring for MySQL, Postgres, Redis, MongoDB, and Amazon Aurora. Database uptime, efficiency, and performance can all be measured using VividCortex. You can learn more about how VividCortex works at vividcortex.com/sedaily.


Good customer relationships define the success of your business. Zendesk helps you build better mobile apps and retain users. With Zendesk Mobile SDKs, you can bring native, in-app support to your app quickly and easily. If a user discovers a bug in your app, that user can view help content and start a conversation with your support team without leaving your app. Keep your customers happy with Zendesk. Check out zendesk.com/sedaily to support Software Engineering Daily, and get $177 off.


Exaptive simplifies data application development for the web. Work with the tech you know. Leave the other stuff and the glue code to the platform. Go to exaptive.com/sedaily to learn more and get a free account.

The post Hedge Fund Artificial Intelligence with Xander Dunn appeared first on Software Engineering Daily.

Apr 03 2017
58 mins
Play

Rank #9: Model Training with Yufeng Guo

Machine learning models can be built by plotting points in space and optimizing a function based off of those points.

For example, I can plot every person in the United States in a three-dimensional space: age, geographic location, and yearly salary. Then I can draw a function that minimizes the distance between my function and each of those data points. Once I define that function, you can give me your age and a geographic location, and I can predict your salary.
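
A minimal sketch of that idea is ordinary least squares: pick the straight line that minimizes the squared distance to the data points, then use it to predict. To keep the arithmetic short, this example uses age as the only input and four made-up data points. Geographic location would be one more input column in a fuller model. The function names are illustrative.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: ages and yearly salaries.
ages = [25, 35, 45, 55]
salaries = [41000, 49000, 61000, 69000]

slope, intercept = fit_line(ages, salaries)


def predict_salary(age):
    """Predict a salary from the fitted line."""
    return slope * age + intercept


print(slope, intercept, predict_salary(30))
```

The fitted line does not pass through every point. It is the line whose total squared error across all points is smallest, which is what "minimizes the distance" means here.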

Plotting these points in space is called embedding. By embedding a rich data set, and then experimenting with different functions, we can build a model that makes predictions based on those data sets. Yufeng Guo is a developer advocate at Google working on CloudML. In this show, we described two separate examples for preparing data, embedding the data points, and iterating on the function in order to train the model.

In a future episode, Yufeng will discuss CloudML and more advanced concepts of machine learning.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.


Digital Ocean Spaces gives you simple object storage with a beautiful user interface. You need an easy way to host objects like images and videos. Your users need to upload objects like pdfs and music files. Digital Ocean Spaces is modern object storage with a modern UI that you will love to use–it’s like the UI for Dropbox, but with the pricing of a raw object storage; I almost want to use it like a consumer product. To try Digital Ocean Spaces, go to do.co/sedaily and get 2 months of Spaces plus a $10 credit to use on any other Digital Ocean products–and you get this credit even if you have been with Digital Ocean for a while. It’s a nice added bonus just for trying out Spaces. If you become a customer, the pricing is simple: $5 per month, which includes 250GB of storage and 1TB of outbound bandwidth. There are no costs per request and additional storage is priced at the lowest rate available: $0.01 per GB transferred and $0.02 per GB stored. There won’t be any surprises on your bill. Digital Ocean simplifies the cloud–they look for every opportunity to remove friction from a developer’s experience. I love it, and I think you will too–check it out at do.co/sedaily.


The octopus: a sea creature known for its intelligence and flexibility. Octopus Deploy: a friendly deployment automation tool for deploying applications like .NET apps, Java apps and more. Ask any developer and they’ll tell you it’s never fun pushing code at 5pm on a Friday then crossing your fingers hoping for the best. That’s where Octopus Deploy comes into the picture. Octopus Deploy is a friendly deployment automation tool, taking over where your build/CI server ends. Use Octopus to promote releases on-prem or to the cloud. Octopus integrates with your existing build pipeline–TFS and VSTS, Bamboo, TeamCity, and Jenkins. It integrates with AWS, Azure, and on-prem environments. Reliably and repeatedly deploy your .NET and Java apps and more. If you can package it, Octopus can deploy it! It’s quick and easy to install. Go to Octopus.com to trial Octopus free for 45 days. That’s Octopus.com


Who do you use for log management? I want to tell you about Scalyr, the first purpose-built log management tool on the market. Most tools on the market utilize text indexing search, which is great… for indexing a book. But if you want to search logs, at scale, fast… it breaks down. Scalyr built their own database from scratch: the system is fast. Most searches take less than 1 second. In fact, 99% of their queries execute in <1 second. Companies like OKCupid, Giphy and CareerBuilder use Scalyr. It was built by one of the founders of Writely (aka Google Docs). Scalyr has a consumer-grade UI that scales infinitely. You can monitor key metrics, trigger alerts, and integrate with PagerDuty. It’s easy to use and, did we mention, lightning fast. Give it a try today. It’s free for 90 days at softwareengineeringdaily.com/scalyr.

The post Model Training with Yufeng Guo appeared first on Software Engineering Daily.

Oct 18 2017
49 mins

Rank #10: Machine Learning is Hard with Zayd Enam

Machine learning frameworks like Torch and TensorFlow have made the job of a machine learning engineer much easier. But machine learning is still hard. Debugging a machine learning model is a slow, messy process.

A bug in a machine learning model does not always mean a complete failure. Your model could continue to deliver usable results even in the presence of a mistaken implementation. Perhaps you made a mistake when cleaning your data, leading to an incorrectly trained model.

It is a general rule in computer science that partial failures are harder to fix than complete failures. In this episode, Zayd Enam describes the different dimensions on which a machine learning model can develop an error. Zayd is a machine learning researcher at the Stanford AI Lab, so I also asked him about AI risk, job displacement, and academia versus industry.

Show Notes

Why ML is hard

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view or download the transcript for this show.

Sponsors


GoCD is an on-premise, open source, continuous delivery tool. Get better visibility into and control of your teams’ deployments with GoCD. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.io for a free download. 


Apica System helps companies with their end-user experience, focusing on availability and performance. Test, monitor, and optimize your applications with Apica System. Apica is hosting an upcoming webinar about API basics for big data analytics. You can also find past webinars, such as how to optimize websites for fast load time.


Couchbase is a document database with the flexibility of NoSQL and the power of SQL. With Couchbase Server, you can build a fast, powerful NoSQL database that scales. Running Couchbase in containers on Kubernetes, Mesos, or OpenShift is easy, and at developer.couchbase.com you can find tutorials on how to build out your Couchbase deployment.

The post Machine Learning is Hard with Zayd Enam appeared first on Software Engineering Daily.

Feb 16 2017
54 mins

Rank #11: Deep Learning Topologies with Yinyin Liu

Algorithms for building neural networks have existed for decades. For a long time, neural networks were not widely used. Recent changes to the cost of compute and the size of our data have made neural networks extremely useful. Our smart phones generate terabytes of useful data. Lower storage costs make it economical to keep that data. Cloud computing democratized the ability to do large scale machine learning across deep learning hardware.

Over the last few years, these trends have been driving widespread use of deep learning, in which neural nets with a large series of layers are used to create powerful results in various fields of classification and prediction. Neural networks are a tool for making sense of unstructured data–text, images, sound waves, and videos.

“Unstructured” data is data with high volume or high dimensionality. For example, an image has a huge collection of pixels, and each pixel has a color value. One way to think about image classification is that you are finding correlations between those pixels. A certain cluster of pixels might represent an edge. After doing edge detection on pixels, you have a collection of edges. Then you can find correlations between those edges, and build up higher levels of abstraction.
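The pixel-to-edge step described above can be illustrated with a minimal sketch. It is not a neural network and not anything discussed in the episode: it simply compares each pixel with its right-hand neighbor and flags large brightness jumps. The function names, the tiny sample image, and the threshold of 50 are all assumptions made for illustration.

```python
# Minimal sketch: find vertical edges in a tiny grayscale image by
# comparing each pixel with its right-hand neighbor.
# All names and values here are illustrative assumptions.

def horizontal_gradient(image):
    """Return the absolute brightness difference between adjacent columns."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


def edge_mask(image, threshold=50):
    """Mark positions where the brightness jump exceeds the threshold."""
    return [
        [1 if diff > threshold else 0 for diff in row]
        for row in horizontal_gradient(image)
    ]


# A 3x4 image: dark on the left (0), bright on the right (255).
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

mask = edge_mask(image)
print(mask)  # each row marks the single dark-to-bright boundary
```

A deep network learns filters like this from data rather than having them hand-written, and later layers combine the resulting edge maps into higher-level features.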

Yinyin Liu is a principal engineer and head of data science at the Intel AI products group. She studies techniques for building neural networks. Each different configuration of a neural network for a given problem is called a “topology.” Engineers are always looking at new topologies for solving a deep learning application–such as natural language processing.

In this episode, Yinyin describes what a deep learning topology is and describes topologies for natural language processing. We also talk about the opportunities and the bottlenecks in deep learning–including why the tools are so immature, and what it will take to make the tooling better.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Segment allows us to gather customer data from anywhere and send that data to any analytics tool. Segment is the customer data infrastructure that has saved us from writing duplicate code across all of the different platforms that we want to analyze. And if you’re using cloud apps such as Mailchimp, Marketo, Intercom, AppNexus, or Zendesk, you can integrate with all of these different tools and centralize your customer data in one place with Segment. To get a free 90-day trial, sign up for Segment at segment.com and enter SEDaily in the “How did you hear about us?” box during signup.


Azure Container Service simplifies the deployment, management and operations of Kubernetes. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands—all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.


LiveRamp is one of the fastest growing companies in data connectivity in the Bay Area, and they are looking for senior level talent to join their team. LiveRamp helps the world’s largest brands activate their data to improve customer interactions on any channel or device. The infrastructure is at a tremendous scale: a 500-billion node identity graph generated from over a thousand data sources, running an 85PB Hadoop cluster; and application servers that process over 20 billion HTTP requests per day. The LiveRamp team thrives on mind-bending technical challenges. LiveRamp members value entrepreneurship, humility, and constant personal growth. If this sounds like a fit for you, check out softwareengineeringdaily.com/liveramp.


GoCD is a continuous delivery tool created by ThoughtWorks. GoCD agents use Kubernetes to scale as needed. Check out gocd.org/sedaily and learn about how you can get started. GoCD was built with the learnings of the ThoughtWorks engineering team, who have talked about building the product in previous episodes of Software Engineering Daily. It’s great to see the continued progress on GoCD with the new Kubernetes integrations–and you can check it out for yourself at gocd.org/sedaily.

The post Deep Learning Topologies with Yinyin Liu appeared first on Software Engineering Daily.

May 10 2018
1 hour

Rank #12: Machine Learning Deployments with Kinnary Jangla

Pinterest is a visual feed of ideas, products, clothing, and recipes. Millions of users browse Pinterest to find images and text that are tailored to their interests.

Like most companies, Pinterest started with a large monolithic application that served all requests. As Pinterest’s engineering resources expanded, some of the architecture was broken up into microservices and Dockerized, which makes the system easier to reason about.

To serve users with better feeds, Pinterest built a machine learning pipeline using Kafka, Spark, and Presto. User events are generated from the frontend, logged onto Kafka, and aggregated to build machine learning models. These models are deployed into Docker containers much like the production microservices.

Kinnary Jangla is a senior software engineer at Pinterest, and she joins the show to talk about her experiences at the company–breaking up the monolith, architecting a machine learning pipeline, and deploying those models into production.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Sumo Logic is a cloud-native, machine data analytics service that helps you Run and Secure your Modern Application. If you are feeling the pain of managing your own log, event, and performance metrics data, check out sumologic.com/sedaily. Even if you have tools already, it’s worth checking out Sumo Logic and seeing if you can leverage your data even more effectively, with real-time dashboards and monitoring, and improved observability – to improve the uptime of your application and keep your day-to-day runtime more secure. Check out sumologic.com/sedaily for a free 30-day Trial of Sumo Logic, to find out how Sumo Logic can improve your productivity and your application observability–wherever you run your applications. That’s sumologic.com/sedaily.


There’s a new open source project called Dremio that is designed to simplify analytics. It’s also designed to handle some of the hard work, like scaling performance of analytical jobs. Dremio is the team behind Apache Arrow, a new standard for in-memory columnar data analytics. Arrow has been adopted across dozens of projects – like Pandas – to improve the performance of analytical workloads on CPUs and GPUs. It’s free and open source, designed for everyone, from your laptop, to clusters of over 1,000 nodes. At dremio.com/sedaily you can find all the necessary resources to get started with Dremio for free. If you like it, be sure to tweet @dremiohq and let them know you heard about it from Software Engineering Daily. Thanks again to Dremio, and check out dremio.com/sedaily to learn more.


Amazon Redshift powers the analytics of your business–and Intermix.io powers the analytics of your Redshift. Intermix.io gives you the tools you need to analyze your Amazon Redshift performance and improve the toolchain of everyone downstream from your data warehouse. The team at Intermix has seen so many Redshift clusters, they are confident they can solve whatever performance issues you are having. Go to intermix.io/sedaily to get a free 30-day trial. Intermix collects all your Redshift logs and makes it easy to figure out what’s wrong so you can take action. All in a nice, intuitive dashboard. Go to intermix.io/sedaily to start your free 30-day trial.

The post Machine Learning Deployments with Kinnary Jangla appeared first on Software Engineering Daily.

Feb 14 2018
47 mins

Rank #13: Real Estate Machine Learning with Or Hiltch

Stock traders have access to high volumes of information to help them make decisions on whether to buy an asset. A trader who is considering buying a share of Google stock can find charts, reports, and statistical tools to help with their decision. There are a variety of machine learning products to help a technical investor create models of how a stock price might change in the future.

Real estate investors do not have access to the same data and tooling. Most people who invest in apartment buildings are using a combination of experience, news, and basic reports.

Real estate data is very different from stock data. Real estate assets are not fungible–each one is arguably unique from all others, whereas one share of Google stock is the same as another share. But there are commonalities between real estate assets.

Just like collaborative filtering can be applied to find a new movie that is similar to the ones you have watched on Netflix, comparable analysis can be used to find an apartment building that is very similar to another apartment building which recently appreciated in asset value.
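One way to picture comparable analysis is as a nearest-neighbor lookup over feature vectors. The sketch below is an assumption-laden illustration, not Skyline.ai's method: the building names, the three normalized features, and the choice of cosine similarity are all hypothetical.

```python
import math

# Minimal sketch: find the building most similar to a target building.
# Feature vectors are hypothetical and assumed to be normalized to 0..1
# (for example: relative size, age, neighborhood score).

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, candidates):
    """Return the name of the candidate whose features best match the target."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(target, candidates[name]),
    )


target = [0.8, 0.6, 0.9]
candidates = {
    "building_a": [0.79, 0.62, 0.88],
    "building_b": [0.1, 0.9, 0.2],
    "building_c": [0.9, 0.1, 0.1],
}

print(most_similar(target, candidates))
```

If the closest match recently appreciated in value, that is a signal worth investigating for the target building. A production system would need far richer features and a similarity measure validated against real sales data.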

Skyline.ai is a company that is building tools and machine learning models for real estate investors. Or Hiltch is the CTO at Skyline.ai and he joins the show to explain how to apply machine learning to real estate investing. He also describes the mostly serverless architecture of the company. This is one of the first companies we have talked to that relies so heavily on managed services and functions-as-a-service.

The post Real Estate Machine Learning with Or Hiltch appeared first on Software Engineering Daily.

Sep 11 2018
58 mins

Rank #14: Python Data Visualization with Jake VanderPlas

Data visualization tools are required to translate the findings of data scientists into charts, graphs, and pictures. Understanding how to utilize these tools and display data is necessary for a data scientist to communicate with people in other domains. In this episode, Srini Kadamati hosts a discussion with Jake VanderPlas about the Python ecosystem for data science and the different attempts at creating a data visualization library.

Jake VanderPlas is the Director of Research for Physical Sciences at the University of Washington’s eScience institute, where he also received his PhD in Astronomy. In addition to contributing to many Python data science libraries like scikit-learn, scipy, numpy, and matplotlib, he’s written multiple books that have been published by O’Reilly and has given many talks on data science tools and techniques. He’s also the co-creator of the Altair project, which is a declarative data visualization library for Python built on the Vega-Lite visualization grammar.

Sponsors


Dice.com helps you manage your career in tech. Dice.com has a huge index of tech job opportunities that it has developed from 20 years in the business of connecting tech professionals with job opportunities. To check out Dice and support Software Engineering Daily, go to dice.com/sedaily.


Saagie is an end-to-end data platform that lets you focus on deriving business value from data. Saagie helps you take control of your wide variety of data sources, and gets them in one place. Check it out at Saagie.com.


SnapCI is a continuous integration tool built by Thoughtworks. Go to snap.ci/softwareengineeringdaily to check it out.

The post Python Data Visualization with Jake VanderPlas appeared first on Software Engineering Daily.

Jan 16 2017
48 mins

Rank #15: Drishti: Deep Learning for Manufacturing with Krish Chaudhury

RECENT UPDATES:

Podsheets is our open source set of tools for managing podcasts and podcast businesses

New version of Software Daily, our app and ad-free subscription service

Software Daily is looking for help with Android engineering, QA, machine learning, and more

FindCollabs Hackathon has ended–winners will probably be announced by the time this episode airs; we will be announcing our next hackathon in a few weeks, so stay tuned

Drishti is a company focused on improving manufacturing workflows using computer vision.

A manufacturing environment consists of assembly lines. Each line is composed of sequential stations. At each station, a worker performs an operation on the item that is being manufactured. This type of workflow is used for the manufacturing of cars, laptops, stereo equipment, and many other technology products.

With Drishti, the manufacturing process is augmented by adding a camera at each station. Camera footage is used to train a machine learning model for each station on the assembly line. That machine learning model is used to ensure the accuracy and performance of each task that is being conducted on the assembly line.

Krish Chaudhury is the CTO at Drishti. From 2005 to 2015 he led image processing and computer vision projects at Google before joining Flipkart, where he worked on image science and deep learning for another four years. Krish had spent more than twenty years working on image and vision related problems when he co-founded Drishti.

In today’s episode, we discuss the science and application of computer vision, as well as the future of manufacturing technology and the business strategy of Drishti.

The post Drishti: Deep Learning for Manufacturing with Krish Chaudhury appeared first on Software Engineering Daily.

Apr 17 2019
59 mins

Rank #16: Deep Learning Hardware with Xin Wang

Training a deep learning model involves operations over tensors. A tensor is a multi-dimensional array of numbers. For several years, GPUs were used for these linear algebra calculations. That’s because graphics chips are built to efficiently process matrix operations.
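To make "multi-dimensional array" concrete, here is a minimal sketch that represents tensors as nested Python lists and implements matrix multiplication, the core linear algebra operation mentioned above. The helper names are assumptions, and real frameworks use optimized array libraries running on GPUs or specialized chips rather than nested lists.

```python
# Minimal sketch: tensors as nested lists, plus a plain matrix multiply.
# Helper names are illustrative; this is not how a framework stores tensors.

def shape(tensor):
    """Return the size of each dimension of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# A rank-3 tensor: a batch of two 2x2 matrices.
batch = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
swap_columns = [[0, 1], [1, 0]]

print(shape(batch))                    # three dimensions
print(matmul(batch[0], swap_columns))  # first matrix with its columns swapped
```

Each output cell of the matrix multiply is an independent sum of products, which is why hardware that runs many such sums in parallel suits this kind of workload.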

Tensor processing consists of linear algebra operations that are similar in some ways to graphics processing–but not identical. Deep learning workloads do not run as efficiently on these conventional GPUs as they would on specialized chips, built specifically for deep learning.

In order to train deep learning models faster, new hardware needs to be designed with tensor processing in mind.

Xin Wang is a data scientist with the artificial intelligence products group at Intel. He joins today’s show to discuss deep learning hardware and Flexpoint, a way to improve the efficiency of space that tensors take up on a chip. Xin presented his work at NIPS, the Neural Information Processing Systems conference, and we talked about what he saw at NIPS that excited him. Full disclosure: Intel, where Xin works, is a sponsor of Software Engineering Daily.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Azure Container Service simplifies the deployment, management and operations of Kubernetes. Eliminate the complicated planning and deployment of fully orchestrated containerized applications with Kubernetes. You can quickly provision clusters to be up and running in no time, while simplifying your monitoring and cluster management through auto upgrades and a built-in operations console. Avoid being locked into any one vendor or resource. You can continue to work with the tools you already know, such as Helm, and move applications to any Kubernetes deployment. Integrate with your choice of container registry, including Azure Container Registry. Also, quickly and efficiently scale to maximize your resource utilization without having to take your applications offline. Isolate your application from infrastructure failures and transparently scale the underlying infrastructure to meet growing demands—all while increasing the security, reliability, and availability of critical business workloads with Azure. Check out the Azure Container Service at aka.ms/sedaily.


Your company needs to build a new app, but you don’t have the spare engineering resources. There are some technical people in your company who have time to build apps–but they are not engineers. OutSystems is a platform for building low-code apps. As an enterprise grows, it needs more and more apps to support different types of customers and internal employee use cases. OutSystems has everything that you need to build, release, and update your apps without needing an expert engineer. And if you are an engineer, you will be massively productive with OutSystems. Find out how to get started with low-code apps today–at OutSystems.com/sedaily. There are videos showing how to use the OutSystems development platform, and testimonials from enterprises like FICO, Mercedes Benz, and SafeWay. OutSystems enables you to quickly build web and mobile applications–whether you are an engineer or not. Check out how to build low-code apps by going to OutSystems.com/sedaily.


Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.

The post Deep Learning Hardware with Xin Wang appeared first on Software Engineering Daily.

Jan 29 2018
57 mins

Rank #17: Deep Learning with Adam Gibson

Deep learning uses neural networks to identify patterns. Neural networks allow us to sequence “layers” of computing, with each layer using learning algorithms such as unsupervised learning, supervised learning, and reinforcement learning. Deep learning has taken off in the last few years, but it has been around for much longer.

Adam Gibson founded Skymind, the company behind Deeplearning4j. Deeplearning4j is a distributed deep learning library for Scala and Java. It integrates with Hadoop and Spark, and is specifically designed to run in business environments on distributed GPUs and CPUs. Adam joins the show today to discuss the history and future of deep learning.

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view or download the transcript for this show.

Sponsors


Exaptive simplifies your data application development. Exaptive is a data application studio that is optimized for rapid development of rich applications. Go to exaptive.com/sedaily to get a free trial and start building applications today.


Couchbase is a document database with the flexibility of NoSQL and the power of SQL. With Couchbase Server, you can build a fast, powerful NoSQL database that scales. Running Couchbase in containers on Kubernetes, Mesos, or OpenShift is easy, and at developer.couchbase.com you can find tutorials on how to build out your Couchbase deployment.


GoCD is an on-premise, open source, continuous delivery tool. Get better visibility into and control of your teams’ deployments with GoCD. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.io for a free download. 

The post Deep Learning with Adam Gibson appeared first on Software Engineering Daily.

Feb 10 2017
50 mins
Play

Rank #18: Training the Machines with Russell Smith

Automation is changing the labor market.

To automate a task, someone needs to put in the work to describe the task correctly to a computer. For some tasks, the reward for automating a task is tremendous–for example, putting together mobile phones. In China, companies like FOXCONN are investing time and money into programming the instructions for how to assemble your phone. Robots execute those instructions.

FOXCONN spends millions of dollars deploying these robots, but it is a worthwhile expense. Once FOXCONN pays off the capital investment in those robots, they have a tireless workforce that can build phones all day long. Humans require training, rest, and psychological considerations. And with robots, the error rate is lower. Your smart phone runs your life, and you do not want the liability of human imperfection involved in constructing that phone.

As we race towards an automated future, the manual tasks that get automated first depend on their economic value. The manual labor cost of smartphone construction is a massive expense for corporations. This is also true for truck driving, food service, and package delivery. The savings that will be reaped from automating these tasks are tremendous, regardless of how we automate them.

There are two ways of building automated systems: rule-based systems and machine learning.

With rule-based systems, we can describe to the computer exactly what we want it to do–like following a recipe. With machine learning, we can train the computer by giving it examples and let the computer derive its own understanding of how to automate a task.
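The contrast between the two approaches can be sketched with a toy example. Everything here is hypothetical: a made-up temperature sensor, a hand-picked limit of 80 degrees, and a very simple learner that places its cutoff midway between the average "safe" and average "overheated" readings (a one-dimensional nearest-centroid classifier, chosen only because it is short).

```python
# Minimal sketch: the same decision made by a hand-written rule and by a
# rule derived from labeled examples. All values are hypothetical.

def rule_based_should_stop(temperature):
    """Rule-based: an engineer writes the threshold down explicitly."""
    return temperature > 80


def learn_threshold(examples):
    """Learned: place the cutoff midway between the two class averages."""
    overheated = [temp for temp, label in examples if label]
    safe = [temp for temp, label in examples if not label]
    return (sum(overheated) / len(overheated) + sum(safe) / len(safe)) / 2


# Labeled examples: (temperature reading, was the machine overheated?)
examples = [
    (60, False), (65, False), (70, False),
    (90, True), (95, True), (100, True),
]

threshold = learn_threshold(examples)


def learned_should_stop(temperature):
    """Apply the cutoff that was derived from the examples."""
    return temperature > threshold


print(threshold)
print(rule_based_should_stop(85), learned_should_stop(85))
```

With the rule-based version, changing the behavior means editing the code. With the learned version, changing the behavior means supplying different examples.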

Both approaches to automation have difficulties. A rule-based approach requires us to enumerate every single detail to the machine. This might work well in a highly controlled environment like a manufacturing facility. But rule-based systems don’t work well in the real world, where there are so many unexpected events, like snowstorms.

As we reported in a previous episode about how to build self-driving cars, engineers still don’t quite know what the right mix of rule-based systems and machine learning techniques is for autonomous vehicles. But we will continue to pour money into solving this problem, because the investment is worth figuring out how to train the machine.

The routine tasks in our world will be automated given enough time. How soon something will be automated depends on how expensive that task is when it is performed by a human, and how hard it is to design an artificial narrow intelligence to perform the task instead of a human.

Manual software testing is another type of work that is being automated today.

If I am building a mobile app to play podcast episodes, and I make a change to the user interface, I want to have manual quality assurance (QA) testers run through tests that I describe to them, to make sure my change did not break anything. QA tests describe high level application functionality. Can the user register and log in? Can the user press the play button and listen to a podcast episode on my app?

Unit tests are not good enough, because unit tests only verify the logic and the application state from the point of view of the computer itself. Manual QA tests ensure that the quality of the user experience was not impacted.

With so many different device types, operating systems, and browsers, I need my QA test to be executed in all of the different target QA environments. This requires lots of manual testers. If I want manual testing for every deployment I push, that manual testing can get expensive.

RainforestQA is a platform for QA testing that turns manual testing into automated testing. The manual test procedures are recorded, processed by computer vision, and turned into automated tests. RainforestQA hires human workers from Amazon Mechanical Turk to execute the well-defined manual tests, and the recorded manual procedure is used to train the machines that can execute the same task in the future.

Russell Smith is the CTO and co-founder of RainforestQA, and he joins the show to explain how RainforestQA works: the engineering infrastructure, the process of recruiting workers from Mechanical Turk, and the machine learning system for taking manual tasks and automating them.

Show Notes: Andrej Karpathy Turk Story

Transcript

Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.

Sponsors


Dice helps you accelerate your tech career. Whether you’re actively looking for a job or need insights to grow in your role, Dice has the resources you need. Dice’s mobile app is the fastest and easiest way to get ahead. Search thousands of tech jobs, from software engineering to UI/UX to product management. Discover your worth with Dice’s Salary Predictor based on your unique skill set. Uncover new opportunities with Dice’s new career pathing tool, which can give you insights about the best types of roles to transition to and the skills you’ll need to get there. Manage your tech career and download the Dice Careers app on Android or iOS today. To check out Dice and support Software Engineering Daily, go to Dice.com/sedaily. Thanks to Dice for being a sponsor of Software Engineering Daily.


Incapsula can protect your API servers and microservices from responding to unwanted requests. To try Incapsula for yourself, go to incapsula.com/2017podcasts and get a free enterprise trial of Incapsula. Incapsula’s API gives you control over the security and performance of your application–whether you have a complex microservices architecture or a WordPress site, like Software Engineering Daily. Incapsula has a global network of over 30 data centers that optimize routing and cache your content. The same network of data centers that are filtering your content for attackers are operating as a CDN, and speeding up your application. To try Incapsula today, go to incapsula.com/2017podcasts and check it out. Thanks again, Incapsula.


Simplify continuous delivery with GoCD, the on-premise, open source, continuous delivery tool by ThoughtWorks. With GoCD, you can easily model complex deployment workflows using pipelines and visualize them end-to-end with the Value Stream Map. You get complete visibility into and control of your company’s deployments. At gocd.org/sedaily, find out how to bring continuous delivery to your teams. Say goodbye to deployment panic and hello to consistent, predictable deliveries. Visit gocd.org/sedaily to learn more about GoCD. Commercial support and enterprise add-ons, including disaster recovery, are available.

The post Training the Machines with Russell Smith appeared first on Software Engineering Daily.

Nov 17 2017
1 hour

Rank #19: Bridging Data Science and Engineering with Greg Lamp

Current infrastructure makes it difficult for data scientists to share analytical models with the software engineers who need to integrate them.

Yhat is an enterprise software company tackling the challenge of how data science gets done. Their products enable companies and users to easily deploy data science environments and translate analytical models into production code.

Greg Lamp is the Co-founder and CTO of Yhat and previously worked as a product manager in financial services. Yhat was part of the Y Combinator winter 2015 class.

Questions

  • At a software company, what is the typical relationship between data scientists and software engineers?
  • Does Yhat turn data scientists into HTTP endpoints?
  • What was the most counterintuitive advice you received at Y Combinator?
  • What is the moonshot goal for Yhat?
  • Is it easier to teach data science to an engineer or engineering to a data scientist?

The post Bridging Data Science and Engineering with Greg Lamp appeared first on Software Engineering Daily.

Oct 05 2015
47 mins

Rank #20: Machine Learning Deployments with Diego Oppenheimer

Machine learning models allow our applications to perform highly accurate inferences. A model can be used to classify a picture as a cat, or to predict what movie I might want to watch. But before a machine learning model can be used to make these inferences, the model must be trained and deployed.

In the training process, a machine learning model consumes a data set and learns from it. The training process can consume significant resources. After the training process is over, you have a trained model that you need to get into production. This is known as the “deployment” step.

Deployment can be a hard problem. You are taking a program from a training environment to a production environment. A lot can change between these two environments. In production, your model is running on a different machine–which can lead to compatibility issues. If your model serves a high volume of requests, it might need to scale up. In production, you also need caching, and monitoring, and logging.

Large companies like Netflix, Uber, and Facebook have built their own internal systems to control the pipeline of getting a model from training into production. Companies who are newer to machine learning can struggle with this deployment process, and these companies usually don’t have the resources to build their own machine learning platform like Netflix.

Diego Oppenheimer is the CEO of Algorithmia, a company that has built a system for automating machine learning deployments. This is the second cool product that Algorithmia has built, the first being the algorithm marketplace that we covered in an episode a few years ago.

In today’s show, Diego describes the challenges of deploying a machine learning model into production, and how that product was a natural complement to the algorithms marketplace. Full disclosure: Algorithmia is a sponsor of Software Engineering Daily.

The post Machine Learning Deployments with Diego Oppenheimer appeared first on Software Engineering Daily.

Jul 13 2018
1 hour

Similar Podcasts