Computational Biologist and Founder of Protocols.io, Lenny Teytelman (Part two)
Security Tools Podcast
Reminder: it's not "your data".
It's the patients' data
It's the taxpayers' data
It's the funder's data
-----------------
If you're in industry or self-fund the research & don't publish, then you have the right not to share your data. Otherwise, it's not your data.
— Lenny Teytelman (@lteytelman) July 16, 2018

We continue our conversation with Protocols.io founder Lenny Teytelman. In part two of our conversation, we learn more about his company and the use cases that made his company possible. We also learn about the pros and cons of mindless data collection, what happens when data isn’t leading you in the right direction, and his experience as a scientist amassing enormous amounts of data.

Transcript

Lenny Teytelman: I am Lenny Teytelman, and I am a geneticist and computational biologist by training. I did graduate school at Berkeley and then postdoctoral research at MIT. And since 2012, I have been the co-founder and CEO of Protocols.io, which is a GitHub- and Wikipedia-like central repository of research recipes: science methods detailing exactly what scientists have done.

Cindy Ng: Welcome, Lenny. Why don't you tell us a little bit more about what you do at Protocols.io and some of the goals and use cases?

Lenny Teytelman: I had no entrepreneurial ambitions whatsoever. I was on a straight academic path as a yeast geneticist, driven purely by curiosity about the projects I was working on. My experience at MIT as a postdoc was that, literally, the first year and a half of my project went into fixing just one step of the research recipe, the protocol, that I was using. Instead of one microliter of a chemical, it needed five. Instead of a 15-minute incubation, it needed an hour. And the insane part is that, at the end of the day, that's not a new technique. I can't publish an article on it, because it's just a correction of something that was previously published, and there is no good infrastructure. There's no GitHub of science methods.
There's no good infrastructure for updating and sharing such corrections and optimizations. So the end result of that year and a half was that I got no credit for it, because I couldn't publish it, and everybody else using the same recipe was either getting completely misleading results or had to spend a year or two rediscovering what I knew, what I would have loved to share but couldn't. That led to this obsession with creating a central, open-access place that makes it easy for scientists to detail precisely what their research steps and recipes were, and then, after they've published, gives them the space to keep those recipes current by sharing corrections and optimizations, and makes that knowledge discoverable.

Cindy Ng: There's a hole in the process, and you're connecting what you can potentially do now with what you did previously, so you don't lose all that work. That's brilliant.

Lenny Teytelman: I shouldn't take too much credit for it, because a lot of people have had this same idea over the last 20 years, and there have been several attempts to create a central place. One of the hard things is that this isn't just about technology, building a website, and creating a good UI and UX for people to share. It's a culture change, right? We are used to publishing brief methods that say things like "contact author for details," or "we roughly followed the same procedure as reported in another paper," and then good luck figuring out what "roughly" means and what the slight modifications were. So one of the hard things is the culture change: getting scientists to adopt platforms like this.

Cindy Ng: So it sounds like the scientists before you who wanted to create something like Protocols.io were ahead of their time.

Lenny Teytelman: I think so, yes. I know of a number of efforts to create exactly what we've done.
Some of the people behind those efforts have actually been huge supporters, advisors, and partners, helping us avoid their mistakes and helping us succeed. So it's a long quest, a long journey, but I give a lot of them credit for the same idea, and it's exactly what you said: being ahead of your time.

Cindy Ng: You're a scientist with a lot of expertise in collecting enormous amounts of data. A lot of companies nowadays, because data's the new oil, think, "Oh, we should just collect everything. We might be able to solve a new business problem, or we might be able to use it much later on." But research has actually been done showing that's not a good idea, because you end up solving really silly problems. What is your approach?

Lenny Teytelman: There are sort of two different camps. One argues that you should be very targeted with the data you collect. You should have a hypothesis, a research question that guides you toward an experiment and toward the data you're collecting. The other says, let's be more descriptive: let's just get data, then look inside and see what pops out, see what is surprising. I know both types of scientists. I was more in one camp than the other, but there is value to both. The tricky part in science is when you are not aware of the statistics, of p-hacking, of just what it means to go fishing in large datasets, particularly in genomics, and particularly now, with a lot of new technology for generating massive datasets across different conditions and different organisms, right? You can sort of drown in data, and then, if you're not careful, you start looking for signal.
If you're not thinking about the statistics, if you're not thinking about multiple-testing correction, you can get false positives in science, where something looks unusual but really is just chance. Because you're running a lot of tests and slicing the data 100 different ways, one time out of 100 you get something, purely by chance, that looks like an outlier, that looks very puzzling or interesting, but it's actually chance. So I don't know about industry in particular. But if you're a business just trying to grab everything, feeling that something useful will come out of it, and you're not in the business of doing science but in the business of actual business, it seems to me, intuitively, that you will become very distracted, and it's probably not the best use of your time or resources. In science, though, both approaches are valuable. You just have to be really careful if you're analyzing data without a particular question and trying to see what's there that's interesting.

Cindy Ng: If you're collecting everything, do you have a team or a group of people that you work with to suss out the wrong ideas?

Lenny Teytelman: I see more and more journals, and more and more academics, becoming aware that "Oh, I need to learn something about statistics," or "I need to collaborate with biostatisticians who can help me be careful about this." Some journals have started statistical review. So it might be a biology paper, but depending on the data and the statistics in it, it might go to an expert statistician, who reviews it to make sure you've used the appropriate methods and thought through the pitfalls I'm describing. But there's a lot more to do on this side. And again, there's a spread: there are teams that collaborate, with data scientists or computational biologists and statisticians who are more used to thinking about data.
Then you also have people like me who used to do both. I wasn't a great computational biologist and I wasn't a great geneticist, but my strength was the ability to do both. So, again, it's all over the map, and a lot of training and education still needs to happen to improve how we handle large datasets.

Cindy Ng: Do you think working with data is about getting the numbers right and working with statisticians, or is it more the qualitative side of things, where even if the data shows one thing, your experience, let's say, says otherwise?

Lenny Teytelman: Oh, I've been misled nonstop by data that I've generated or had access to. As a scientist, I've given talks on things I thought were exciting that turned out to be artifacts of how I was doing the analysis, and I've experienced that many times. I think at the end of the day, whether we try to be careful or not, we as scientists always have made mistakes and always will. And that's why I feel it's so essential for us to share the data: we think we're doing things correctly, but reviewers and other scientists reading our papers really can't tell unless they have access to the data we've used and can run the analysis themselves or analyze it with different tools. That's where problems come up; that's where mistakes are identified. So I think science can improve more through sharing, and less through trying to make the people who generate the data and publish the stories into perfectionists. Both are important, but I think there's more opportunity to ensure reproducibility, and to get mistakes fixed, by sharing the data.

Cindy Ng: Yeah. And when you're solving really complicated, hard problems, it helps to have many people working on them too. Even though it might seem like there are too many chefs in the kitchen, I imagine it can only help.

Lenny Teytelman: Absolutely. That's what peer review is for.
It's getting eyeballs from people who haven't been listening to you give this presentation, evolving over time, for the last five years. It's people who don't necessarily trust you the same way, or who have different strengths. So it does help to have people from the outside take a look. But even reviewers are not going to rerun all of your analyses. They're not going to spend years digging into your data. They're going to read the paper and mostly try to tell: Is it clear? Do I trust what they're saying? Have they done the controls? At the end of the day, figuring out which papers are correct, and which hypotheses and conclusions stand the test of time, really does require time. And that's where sharing the data shortens the time it takes to see what is and isn't true.
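Lenny's warning about slicing a dataset 100 different ways can be made concrete with a little arithmetic. The sketch below is a minimal Python illustration, not anything described in the episode: it assumes independent tests, a 0.05 significance level, and p-values that are uniform on [0, 1] when there is no real effect, and it uses the Bonferroni correction as one simple example of multiple-testing correction. The function names are hypothetical.

```python
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests
    when there is no real effect anywhere (every null hypothesis is true)."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def simulate_any_hit(m: int, threshold: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated 'fishing expeditions' that report at least one hit.

    Each expedition runs m tests on pure noise. Under the null hypothesis a
    p-value is uniform on [0, 1), so rng.random() stands in for it (an assumption
    that holds for continuous test statistics).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < threshold for _ in range(m)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    m, alpha = 100, 0.05
    print(f"Analytic chance of a spurious hit: {familywise_error_rate(alpha, m):.3f}")  # 0.994
    print(f"Simulated, uncorrected threshold:  {simulate_any_hit(m, alpha, 2000):.3f}")
    corrected = bonferroni_threshold(alpha, m)
    print(f"Bonferroni per-test threshold:     {corrected}")
    print(f"Simulated, corrected threshold:    {simulate_any_hit(m, corrected, 2000):.3f}")
```

Under these assumptions, running 100 tests on pure noise at the 0.05 level produces at least one "interesting" result about 99% of the time; dividing the per-test threshold by the number of tests brings that back under 5%. Bonferroni is deliberately conservative, which is one reason statisticians weigh other corrections depending on the study design.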
Geneticist and Founder of Protocols.io, Lenny Teytelman (Part one)
Security Tools Podcast
Reminder: it's not "your data".
It's the patients' data
It's the taxpayers' data
It's the funder's data
-----------------
If you're in industry or self-fund the research & don't publish, then you have the right not to share your data. Otherwise, it's not your data.
— Lenny Teytelman (@lteytelman) July 16, 2018

A few months ago, I came across Protocols.io founder Lenny Teytelman’s tweet on data ownership. Since we’re in the business of protecting data, I was curious what inspired Lenny to tweet out his value statement, and I also wanted to learn how academics and science-based businesses approach data analysis and data ownership. We’re in for a real treat, because it’s rare that we get to hear what scientists think about data in their search for discoveries and innovations.

Transcript

Lenny Teytelman: I am Lenny Teytelman, and I'm a geneticist and computational biologist by training. I did graduate school at Berkeley and then postdoctoral research at MIT. And since 2012, I have been the co-founder and CEO of Protocols.io, which is a GitHub- and Wikipedia-like central repository of research recipes: science methods detailing exactly what scientists have done.

Cindy Ng: Welcome, Lenny. We first connected on Twitter through a tweet of yours, and I'm going to read it. It says, "Reminder: it's not 'your data.' It's the patients' data, it's the taxpayers' data. It's the funder's data. If you're in industry or self-fund the research and don't publish, then you have the right not to share your data. Otherwise, it's not your data." So can you tell us a little bit more about your point of view, your ideas about data ownership, and what inspired you to tweet out your value statement?

Lenny Teytelman: Thank you, Cindy.
This is something that comes up periodically in the research community, particularly in the past 5 to 10 years, as different funders and publishers pay more and more attention to reproducibility challenges in published research and introduce guidelines and policies that encourage or require the sharing of data as a prerequisite for publication or as a condition of getting funding. So we're seeing more and more of that, and I think the vast majority of the research community, of the scientists, are in favor of those policies. They understand that it's important, that being able to reproduce, verify, and validate other people's results, and not just take them at their word, is one of the pillars of science. We all make mistakes, right? But there is a minority that is upset about these kinds of requirements, and periodically, either in person or on Twitter, someone will say, "Hey, I've spent so long sailing the oceans and collecting the data. I don't want to just give it away. I want to spend the next 5, 10 years publishing, and it's my data." So that's what I'm reacting to. There are some scientists who forget who's funding them and who actually has the rights to the data.

Cindy Ng: Why do they feel like it's their data rather than the patients' data or the taxpayers' data or the funder's data?

Lenny Teytelman: It's understandable, particularly when the data generation takes a long time. Imagine you go on expeditions two or three months away from your family, sampling bacteria in the oceans or digging in the desert. It can take a really long time to get the samples and the data, and you start to feel ownership. It's also your career: the more publications you get from a given dataset, the stronger your resume, and the higher your chances of getting fellowships, faculty positions, and so on.
People become a little bit possessive and take ownership of the data: if you've put so much into it, "It's mine."

Cindy Ng: Before data was digitized, who owned the data?

Lenny Teytelman: Well, I guess universities can also lay some claim to the intellectual property rights. I'm not an attorney, so it's tricky. But I think there was always the understanding in the science world that you should be able to provide, on request, the tables and datasets you're publishing on. With paper journals, though, there really just wasn't space to make all of that available. We're now in a different environment, where we have repositories: there's GitHub for code, and there are many repositories where data can be shared. So with the web, we're no longer in that "contact author for details" world, and we're now in a place where journals can say, "If you want to publish in our journal, you have to make the data available." And some have put in place very stringent data policies.

Cindy Ng: Who sets those parameters in terms of the kind of data you publish and the stringency behind it? Do a bunch of academics, chairs, and scientists come together and decide on best practices, or does it vary from publication to publication?

Lenny Teytelman: Both. It depends on the community. Take the genomics community, for example: back when the human genome was being sequenced, and even before that, there were a lot of meetings of the leaders in the field agreeing on best practices, and depositing DNA sequences in the central repository GenBank, run by the U.S. government, became expected by the community and by the journals. So that really was community-led best practice. But more recently, I also see funders simply putting out mandates: when you agree to receive funding, you agree to the foundation's data-sharing policies. The same goes for journals.
Now, more and more journals are putting in statements requiring data, but that doesn't mean they're necessarily enforcing them. Requirements are one thing; enforcement is another.

Cindy Ng: What is the difference between academic scientific research and science-based companies? For instance, pharmaceutical companies hire a lot of PhDs, so there must be a close connection between the two.

Lenny Teytelman: There is certainly overlap. You're right that, particularly in biomedicine, I think most of the people who get PhDs actually don't stay in academia. Not all of them go into industry; they go into a broad spectrum of different careers, but a lot do end up in industry. There is some overlap where industry funds some of the research. So Novartis could give a grant to UC Berkeley, or British Petroleum could be doing ecological research, and those cases tend to be very interesting, because there may be a push from the industry side to keep the data private; you can imagine tobacco companies sponsoring something. So there's some conflict of interest, and universities usually try to frame these arrangements in a way that gives the researchers the right to publish regardless of the results, and to make them available, so that the funder does not get a yea-or-nay vote. Those are the collaborations where some funding comes in from industry. But in general, there is basic science, academic science, where the expectation is that you publish and make the results open, and then there is the industry side. Of course, I'm broadly generalizing. There are things you keep private in academia; there's competitiveness in academia as well, and you're afraid of getting scooped. But broadly speaking, academia tends to publish and be very open, and your reputation and career prospects are really tied to your publications.
On the industry side, it's not so much about publications as about the company's actual bottom line and the vaccines, drug targets, and molecules you're discovering, and those you're not necessarily sharing. So there's a lot of research that happens in industry, and my understanding is that the vast majority of it is actually not published.

Cindy Ng: I think even though they have different goals, the thread running through all of them really is the data, because regardless of what industry you're in (I hate the phrase "data is the new oil"), it's considered one of the most valuable assets around. I'm wondering, is there a philosophy around how much you share amongst scientists, regardless of the industry?

Lenny Teytelman: In academia, it tends to be all over the place. In industry, I think they're very careful about security; they're very, very concerned about a breach and somebody getting access to their trials or the molecules they're considering. The competition is very intense, and they take intellectual property and security very seriously. On the academic side, it really varies. There are groups that, long before they're ready to publish, generate data and say: we've done the sequencing of these species or of these tissues from patients, and we're going to anonymize the patient names and release the information and sequences as soon as we've generated them, even before the story is finished, so other people can use them. There are some academic projects funded as resources, where you are expected to share the data as they come online. The producers might request that you not publish from the data before they do, so there can be community standards. But there are many examples in academia where data are shared as they're produced, even before publication.
And then you also have groups that are extremely secretive. Until they're ready to publish, no one else has access to the data, and sometimes even after they publish, they try to prevent other people from getting access to it.

Cindy Ng: So it's back to the possessiveness aspect of it.

Lenny Teytelman: My feeling, just anecdotally, from the 13 years I was at the bench as a student and postdoc, is that the vast majority of scientists in academia are open and collaborative, and that it's a tiny minority who try to hoard data. But I'm sure that varies by field.

Cindy Ng: In the healthcare industry, people try to anonymize data and release it for researchers to study, but there are also a few security and privacy pros who have said that you can re-identify anonymized data. Has that been a problem?

Lenny Teytelman: Yes, this is something that comes up a lot in discussions. When you're working with patient data, everyone goes through a concerted effort to anonymize the information. But usually, when people opt in to participating in these studies and projects, the disclaimers do warn the participants that, yes, we'll go through anonymizing steps, but it is possible to re-identify the anonymized data, as you said, and figure out who it really is, no matter how hard you try. So there are a lot of conversations in academia about this, and it is important to be very clear with patients about it. There are concerns, but I don't know of actual examples of people re-identifying data for any kind of malicious purpose. There might be space and opportunity for doing that, and I'm not saying the concerns are not valid, but I don't know of examples where this has happened with genomic data, DNA sequencing, or individuals.

Cindy Ng: What about Henrietta Lacks, where she was being treated for...I can't remember what problem she had, and then it was a hospital...
Lenny Teytelman: Yes, that's a major...there's a book on this, right? There's a movie. That was a major fiasco, and a learning opportunity for the research community, where there was no consent.

Cindy Ng: Did you ever see the movie "Three Identical Strangers," about triplets who found each other?

Lenny Teytelman: No, I haven't.

Cindy Ng: All three of those triplets had been adopted, and they thought, "Hmm, that's really strange." They had a wonderful reunion, and then, later down the line, they realized they had been used in a study. Researchers went to the adoptive families' homes every single week to study the kids, and knew they were all brothers, but neglected to tell the families, until the brothers found each other by chance. Then they realized they were part of a study, and the researchers refused to release the data. So I found Henrietta Lacks and this new movie really fascinating. I guess that's why there are regulations, so that scenarios like these don't happen, where you find out as an adult that you were part of a strange experiment.

Lenny Teytelman: That's fascinating. I don't know this movie, but on a related note, I'm thinking back (I don't remember the names) on the serial killer who was recently identified, not through his own DNA being in a database, but through relatives who took part in ancestry sequencing, submitting their cells for personal genomics and genotyping, and the police having access and tracing the serial killer through that. There certainly are implications of the data we are sharing. I don't know what the biggest concerns are, but there are a lot of fascinating issues that the scientific community, patients, and regulators have to grapple with.
Cindy Ng: Since you're a geneticist, what do you think about the latest DNA testing companies working with pharmaceutical companies to potentially find cures, with a lot of privacy alarms being raised by advocates?

Lenny Teytelman: Yeah, it has to be done ethically. You do have to think about these issues. My personal feeling is that there's a lot for the world and for humans to gain from sharing DNA and personal information; the positives outweigh the risks. That's a very vague statement, so, you know, I think about the opportunity to do studies where a drug is not just tested for whether it works or not, but where, depending on people's DNA, you can figure out which subpopulations, which types of people, will have adverse reactions to it, and who is unlikely to benefit from it. So there is such a powerful opportunity for good use of this. Obviously, we can't dismiss the privacy risks and the potential for abuse and misuse, but it would be a real shame if we backed away from the research and the opportunity it offers altogether, instead of carefully thinking through the implications and trying to do this in an ethical way.
06 Speeding up biomedical research by sharing protocols with Lenny Teytelman
Bio2040 - Bottlenecks & Future of Science, Healthcare & Drug Discovery
Lenny is the founder and CEO of protocols.io, where scientists can freely share scientific protocols with each other. Similar to GitHub, others can then try them out in their own labs, comment on them, improve them, or fork them. This brings many benefits both to the original author and to the scientists who gain access to new protocols. 11,000 scientists are already signed up, and more are joining each day.
Protocol Reproducibility and Protocols.io with Lenny Teytelman
Talkin Immunology with BioLegend
We return to the Talkin' Immunology Podcast with guest Lenny Teytelman to discuss his website, protocols.io, and the challenges of protocol reproducibility, journal paywalls, and more.

Topics:
Protocols.io
BioLegend supports The Reproducibility Initiative
BioLegend protocols
Why I, a founder of PLOS, am forsaking open access
Cuckoo for Cocoa Puffs publication

Keywords: Lenny Teytelman, protocols, reproducibility initiative, journals, paywalls, troubleshooting, reagents, antibodies, protocols.io, PLOS, Michael Eisen
I interview Dr. Lenny Teytelman of the protocols repository Protocols.io. We discuss grad school stress, including that pesky bugger imposter syndrome (the feeling many grad students get that they aren't good enough), and how to deal with it. Lenny also shares why grad school is great preparation for many non-faculty positions, including at start-ups.

Find Lenny: @lteytelman, Protocols.io
Find BoldAdulting: @BoldAdulting, BoldAdulting.com

Sponsored by the BoldAdulting online class: How to deal with the 3 most stressful parts of grad school: AKA All grad students think they suck. Free preview at bit.ly/how-to-deal-with-grad-school-stress-slides