The Biorevolution Podcast

Transcript

00:00:04: The Biorevolution podcast.

00:00:06: Your hosts:

00:00:07: Louise von Stechow

00:00:08: and Andreas Horchler. The Biorevolution podcast.

00:00:14: Welcome to the show again.

00:00:16: Easy as always.

00:00:17: Of course, we start with quotes, but only after I've told everybody the

00:00:21: title of today's story: AI-driven protein design, beyond nature's blueprint.

00:00:27: This is what we have at stake.

00:00:34: But let's start with a quote.

00:00:35: Yes, let's start with a quote.

00:00:37: And I would start with one quote from Heiner Linke.

00:00:47: the Nobel Prize in Chemistry to John Jumper and Demis Hassabis

00:00:53: and David Baker. One of the discoveries being recognized this year concerns the construction of spectacular proteins.

00:01:01: The other is about fulfilling a fifty-year-old dream: predicting protein structures from their amino acid sequences.

00:01:08: Both of these discoveries open up vast possibilities.

00:01:11: Despite that fact, we're not going to talk about structure so much.

00:01:16: The hype surrounding AI-based protein design comes a little bit from the successes of being able to predict structure based on sequence, as recognized by the Nobel Prize committee, and from quite a large amount of funding for biotechs in this space.

00:01:35: And we have a biotech founder, the CEO of Provolute, with us today: Ingmar Schuster, who also brought us a quote about maybe the AI hype in protein design and biotech per se.

00:01:48: So yeah, currently I think what most people perceive AI for biology and biotechnology as is a windshield wiper when it's raining heavily.

00:02:01: So biology is super complex, AI helps a bit,

00:02:04: but then the next problem comes around. And I think everybody understands that, but it is not as fast to realize the potential

00:02:16: as compared to areas where there are vast amounts of data, like text, images, and video.

00:02:23: And this, I think, is a nice starter to dive into this episode.

00:02:28: Ingmar, just to introduce you very briefly: you are, as I said, founder and CEO of Provolute, a biotech concerned with AI-driven protein design, and we'll dive into that.

00:02:41: And I really like the word on your home page, that you're multi-passionate:

00:02:46: that you don't only care about AI, but also about computer science, linguistics, and psychology.

00:02:52: And I think psychology is a good point, because we'll dive into how you actually get people together to speak the same language when trying to conduct these kinds of projects.

00:03:05: Your background is in computer science, as I take it,

00:03:08: so, biotech space, right?

00:03:11: So you are by nature not in the biomedicine field but have transitioned there.

00:03:17: Maybe we can actually start with that.

00:03:19: How does it feel?

00:03:20: Is data better on the other side of the fence, or do you like biomedical data?

00:03:26: I studied two things: computer science and linguistics.

00:03:30: Actually, I did a Diplom and a Magister, and my PhD was on modeling the meaning of natural language. And I wasn't very good, like, I didn't have revolutionary results

00:03:45: in my PhD, but they came later, obviously. And I used machine learning for this.

00:03:51: It's a thing that many natural language processing people say: proteins are basically like human natural language.

00:04:00: There is some truth to that.

00:04:02: Proteins are represented by a few letters that are ordered in a sequence.

00:04:07: So it's a simple data structure

00:04:09: if you have this data science, machine learning viewpoint. The big problem, I think, for many researchers is that there is not a lot of data, and acquiring one data point is extremely costly as compared to natural language.

00:04:26: Why are we so excited about new proteins?

00:04:28: Why do we need more proteins?

00:04:30: Nature has gifted us a lot of proteins.

00:04:32: I asked ChatGPT yesterday how many proteins there are across species, and I did not get a satisfactory answer, but we have a lot of proteins, right?

00:04:41: We have our proteome, all the proteomes of plants, animals, microbes, and fungi on the planet.

00:04:48: Isn't that enough?

00:04:49: So I think there are two reasons.

00:04:52: One reason is the simplicity of how proteins are built, at least in terms of sequence. And then, of course, there comes structure prediction and whatnot.

00:05:00: But in principle, statistically speaking, all the information is in the sequence, and I may be glossing over a few complexities.

00:05:08: All that information is in the sequence, so it's a simple space to design in.

00:05:15: When you look at general molecules, those are not sequences, so the complexity multiplies.

00:05:22: I think that's one reason.

00:05:24: As for your question, don't we have enough proteins?

00:05:26: Proteins in nature have not evolved to do the things we want them to.

00:05:30: One very simple example of this is pharma production, a field that Provolute has worked on as a consulting company partnering with big pharma.

00:05:41: So big pharma wants to produce their therapeutic compounds, and nature typically catalyzes reactions,

00:05:52: so proteins that catalyze reactions, enzymes, they catalyze reactions from one tiny molecule to another tiny molecule.

00:06:01: And quoting a scientist from Sanofi: in pharma production, it's bulky substrate, bulky substrate.

00:06:09: And that never happens in nature.

00:06:10: So you are dealing with big molecules that you're trying to assemble, basically, through biocatalysis.

00:06:18: It is just not something that nature ever wanted to do.

00:06:22: It isn't interested.

00:06:25: Beyond what nature has given us because the tasks that we as humans want to do with proteins are not captured by nature.

00:06:33: So

00:06:34: beyond pharma production, where would you...?

00:06:36: I mean,

00:06:37: there is a lot of potential across industries, right? From crop science to,

00:06:43: I don't know, carbon fixation in battling climate change, and biomaterials, in any area. So I tried to figure out a little bit how big this space is.

00:06:55: It's a little bit fragmented, but there is massive economic potential in designing better proteins, right?

00:07:02: Yes, yes. In principle, there's massive potential.

00:07:06: In reality, the incentives push protein design towards certain areas, and that would be pharma, very clearly. And in pharma, mostly designing new therapeutics, where protein design is a small part, in the beginning really.

00:07:26: Pharma production, and then other high-priced products like baby formula, because people are willing to pay a lot for that.

00:07:35: Cosmetics... collagen!

00:07:39: Which

00:07:40: does, by big meta-analysis, not improve your skin, just as a disclaimer.

00:07:45: And you shouldn't eat it because it's a bad source of protein.

00:07:48: But the reality in incentives is that, for example, carbon fixation is extremely difficult to do from an economic perspective without a CO₂ tax, and a big one, because biological technologies are very expensive to develop, so it rarely makes economic sense.

00:08:14: It's one of the things that you have to give to the EU administration: it sees something like this and then does it.

00:08:22: The problem is we don't have the economic firepower in order for it to matter right now. And that's why there isn't a price in Europe if nobody else follows.

00:08:34: But I would say lowering the thresholds of protein production, so making it cheaper and more accessible, will also make it more accessible to other areas.

00:08:44: So I think probably the principles that you would apply for pharma production are applicable across the board, right?

00:08:51: It's just improving the properties of a protein...

00:08:54: Yeah, and I do see this happening.

00:08:55: Like, agriculture is doing stuff that pharma did fifteen years ago, because now it's cheap.

00:09:03: Same with chemistry, because chemical companies operate on very low margins. BASF, for example, where my dad worked, operates on very low margins.

00:09:14: They can't afford any fancy stuff,

00:09:17: so they're using what pharma used like a decade or two ago.

00:09:30: In general, for those who are not in the field, in the scientific field, how would you describe the path from the wet lab to in silico? In general, kind of, what was the initial game changer

00:09:42: or to

00:09:45: replace parts of the wet lab with in silico?

00:09:49: In part, people have just realized that this is possible and they start to believe it.

00:09:54: Overall, I've been in machine learning since before

00:09:56: it was a hype. And when it became a hype, I was looking at what people said, oh, you can do this with AI and machine learning, and I was like, yeah, we were able to do this five years ago in exactly the same quality.

00:10:12: You've only now realized that you can.

00:10:16: So that's one part, it is just perception.

00:10:19: The other part is a combination with the parts where you have a lot of data. Namely, you have a lot of sequences and you don't know what they are doing, but on big databases of protein sequences, people trained models to predict one part given another part, which is kind of a brain-dead task.

00:10:41: It's the brain-dead task that chatbots are also fulfilling, but as we have come to realize, with this brain-dead task, if you give it enough data, you will draw some knowledge out of it.

00:10:53: And now there's knowledge that you can use to represent a sequence.

00:10:58: And if you then combine this, for example, with wet lab data,

00:11:08: a thousand...

00:11:10: You get twenty different protein sequences, and you can predict what would happen if you put a twenty-first or twenty-second in the wet lab without having to do it, which gives you the power of prioritizing. So you have twenty that are actually measured, you predict this property for twenty million, and filter down to the most promising, I don't know, ninety

00:11:37: six.
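
A minimal Python sketch of the prioritization described above: start from a handful of measured sequences, predict the property for many unmeasured candidates, and keep a shortlist for the lab. Everything in it is an assumption made for illustration and not the model discussed in the episode. The parent sequence, the `toy_assay` stand-in for a wet lab measurement, and the Hamming-distance kernel predictor are all hypothetical, and the candidate pool is 2,000 rather than twenty million to keep it quick.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid letters


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(seq, rng, n_mutations):
    """Return a copy of seq with n_mutations positions redrawn at random."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_mutations):
        chars[pos] = rng.choice(AMINO_ACIDS)
    return "".join(chars)


def toy_assay(seq):
    """Hypothetical stand-in for a lab measurement: counts lysines (K)."""
    return float(seq.count("K"))


def predict(candidate, measured, length_scale=2.0):
    """Kernel-weighted average of measured values (closer sequences count more)."""
    weights = [math.exp(-hamming(candidate, s) / length_scale) for s, _ in measured]
    return sum(w * y for w, (_, y) in zip(weights, measured)) / sum(weights)


def shortlist(candidates, measured, top_k):
    """Rank unmeasured candidates by predicted value and keep the best top_k."""
    return sorted(candidates, key=lambda s: predict(s, measured), reverse=True)[:top_k]


rng = random.Random(0)
parent = "MKTAYIAKQR"  # made-up parent sequence

# Twenty variants that were "actually measured" (here: by the toy assay).
measured = [(v, toy_assay(v)) for v in (mutate(parent, rng, 2) for _ in range(20))]

# A larger pool of unmeasured candidates, scored only in silico.
candidates = [mutate(parent, rng, 3) for _ in range(2000)]
picks = shortlist(candidates, measured, top_k=96)
print(len(picks), "sequences shortlisted for the lab")
```

As the conversation suggests, a real predictor would more likely sit on top of learned sequence representations than on raw Hamming distance; the measure, predict, and filter loop is the part this sketch is meant to show.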

00:11:38: So in the end it's sheer numbers, a multitude of zillions and zeroes.

00:11:45: You are trying to replace expensive wet lab work with much cheaper computations?

00:11:53: I feel like the payoff would be dual: on the one hand, just really saving time and money, which is huge in pharma

00:12:01: especially at

00:12:02: times... On the other

00:12:04: hand,

00:12:05: would you say, with the help of AI, you are able to design proteins that you would not have been able to reach via directed evolution?

00:12:17: Absolutely!

00:12:20: You would never even dare.

00:12:21: I mean, in all papers people are extremely cautious because it's so easy to mess up a protein.

00:12:31: It is okay we can put an E

00:12:35: So it's really easy to ruin a protein.

00:12:38: If you don't have guardrails, you're trying to be conservative, and that is also what we've seen with people we worked with: they were super conservative in what they recommended us to do.

00:12:52: Sometimes we were fresh, like, why not introduce many more mutations?

00:12:57: We failed, actually.

00:12:59: So it's tricky how to best explore the design space.

00:13:06: What I like to say is, okay, you have a great predictor of your properties, but if you look in the wrong part of the space, then you just predict what's the best out of bad sequences, and that doesn't help

00:13:20: much.

00:13:21: Do you think it also influences training data?

00:13:24: In the past, research has been quite conservative, so probably more exciting or more deviant sequences would also not be covered, right?

00:13:34: In part.

00:13:35: But I think what's been worse is that people did not report negative data, so they only reported sequences that worked for the purpose that they were interested in and not ones that didn't work.

00:13:48: And when we started out, we told everybody: please keep bad sequences, because it helps machine learning to weed them out.

00:13:59: They hadn't kept this data around, they just didn't think of it.

00:14:03: Yeah, of course, I mean, this is something that we discuss quite often, and there's a bias towards reporting positive data.

00:14:10: This is very clear!

00:14:12: You mentioned, right,

00:14:13: that there was hype in the field.

00:14:15: Where would you say, when you look at what people are doing, like large language models for designing new proteins, like Profluent, for example, is doing, or rather the Rosetta- or AlphaFold-based approaches,

00:14:30: or also what you guys are doing, of course, do you feel there's a certain model that will prevail?

00:14:37: Or do you think we're just in a space in flux where models develop further and we have to keep an open mind about which direction it is moving in?

00:14:48: So I used to tell everybody we're not interested in structure.

00:14:51: Structure is completely captured by sequence, which is, statistically speaking, true. I had a guest on my podcast on machines and molecules, Martin Steinegger, who is very well known for his bioinformatics toolkits.

00:15:05: And he said: yeah,

00:15:06: I thought sequence captures everything, but structure's really useful, I found out. And I think it will not be a certain approach that will prevail, but people who want to solve problems and don't care so much how they solve them.

00:15:21: As long as you solve the problem, those will prevail.

00:15:24: It's not about being religious, only doing this or that.

00:15:28: Is there one thing where you'd say this is clearly overhyped, in the range of what you want to share?

00:15:34: I wouldn't say a method that was overhyped.

00:15:40: Again, overhyped is a certain type of business model where companies would say, okay, our business model is not that

00:15:48: we develop a therapeutic or we develop some final product, but we are a pure digital service provider.

00:15:55: This has been a recurring theme over the past few decades, that people came again and again

00:16:00: and thought, oh, we'll just do the digital part.

00:16:03: We want to develop actual products.

00:16:05: Sometimes companies came out that are profitable and doing okay, but overall it's not been a recipe for success.

00:16:13: And also this time around, it seems

00:16:15: like... Can we talk about your company a little bit, where you come from,

00:16:27: where the starting point was? This expertise in data and biology,

00:16:33: biotech and proteins, where

00:16:35: did it come from?

00:16:37: How many people are you?

00:16:38: What are your current projects?

00:16:41: To learn more about them.

00:16:43: Sure.

00:16:44: So we are six right now.

00:16:46: The co-founders have backgrounds in machine learning,

00:16:49: that's me, then business and biochemistry,

00:16:52: that's Philippe and Jelena.

00:16:54: Jelena was a cancer researcher for her academic career... So she is

00:17:01: an MD?

00:17:02: No!

00:17:03: She has a PhD in biochemistry.

00:17:04: Oh, okay, got it, all

00:17:05: right.

00:17:06: Yeah, basically she had to put up with me saying let's do a purely digital company

00:17:11: until we hit several walls.

00:17:13: We've tried to go through them, and in part we did, working with big pharma companies,

00:17:19: now AstraZeneca, which here is quite the achievement, because it's super difficult to get work there at all.

00:17:30: But yeah, in the end, probably we should do therapeutic development.

00:17:35: So

00:17:37: you're investing in your own pipeline?

00:17:39: Yeah, exactly.

00:17:40: We are working on our

00:17:40: own pipeline.

00:17:41: Which field?

00:17:43: Cancer, colorectal and breast right now, so epithelial cancers.

00:17:47: We are working on a new mode of action, basically trying to make the application of mustard gas for patients more specific.

00:17:57: I recently read that it seems like our chemotherapy is all descended from it.

00:18:03: Is this because of the guardrails?

00:18:05: Because of the protective way big pharma acts?

00:18:09: I mean, we had big pharma on this podcast as well in the past, and of course in our daily work we have an impression of the actions and the mode of big pharma,

00:18:20: a little bit, a glimpse at least.

00:18:23: So was it an intrinsic decision that you made

00:18:29: to

00:18:29: feed your own pipeline,

00:18:31: or is it more out of a sense of frustration when it comes to dealing with big pharma?

00:18:38: Frustration in the sense of learning, yes.

00:18:40: That

00:18:42: plays a role, but I would rather say we were just excited about the approach

00:18:49: that

00:18:50: we were approached with.

00:18:51: So we were excited about this new modality and thought it really could work.

00:18:58: It would be very flexible if things go wrong, and there's one thing in therapeutic development that is for sure: things go wrong.

00:19:06: So yeah... that was a large part of it!

00:19:09: And as for this frustration in the sense of learning: you realize, among other things, you will never have an outsized impact by being a pure service provider.

00:19:24: Thank you for the openness.

00:19:26: Makes sense.

00:19:27: Do you feel that your background in linguistics, but also your interest in psychology, helps you navigate these kinds of interactions?

00:19:37: Because, I mean, you have founded the company,

00:19:42: but you've gone through different business models, probably many client discussions.

00:19:47: Is there anything you can maybe share with our listeners who work in biotech

00:19:54: that could be a good hint or tip, something maybe for yourself, how to deal with something that doesn't go so well, but also how to bring people together who I think speak very different languages, right, in this

00:20:06: space?

00:20:07: Has it helped me?

00:20:09: Yes, on a personal level.

00:20:12: I'm the one always obsessing over wording all of the time. Leading a team, I think this interest is super important for good and nurturing relationships.

00:20:28: Something that I had to learn over the course of this company is, how should I put it?

00:20:38: It's sometimes something that feels hard on a personal level, what you have to do because of the reality of economics and cash flow, basically!

00:20:50: Coming to your question about bringing people together: my input is super important there,

00:20:58: not so much as to, I don't know, moderating how people talk, but rather getting the right people together who want to solve problems first and foremost, who try to see the bigger picture and adapt to it, and always take a step back to see the big picture and take a step closer to

00:21:18: see the details.

00:21:20: Yeah, that's a very typical balance that has to be

00:21:23: struck.

00:21:26: Do you feel like, when you're dealing with people, there are any misconceptions people have about AI-based protein design?

00:21:34: So something where people think this should be possible,

00:21:37: I've read it in the news, and then it's not possible, or also the other way around:

00:21:41: they are too conservative in their thinking and a lot more is possible.

00:21:47: People don't realize what these systems look like,

00:21:49: how they work.

00:21:51: So basically, when I tell them the limits

00:21:56: of

00:21:56: an approach.

00:21:57: They say, yeah, I understand, it's fine if the results don't look great, but then when you come with a result and it does not look great, it's like: why didn't this work?

00:22:07: I think what comes in there are really different viewpoints, or the fact that biochemists and molecular biologists never do this kind of data science work.

00:22:19: What counts for them, and that's why they're so conservative, is getting something that works.

00:22:26: And there are a hundred million things that can go wrong, and you never know which problem it was that ruined your results.

00:22:32: When you're a service provider especially, it is very easy to be blamed as the service provider. Sometimes it's difficult then to get the next gig, of course, but I think there's a big difference inside a company, where we learn how to speak each other's language.

00:22:51: Pharma has been very disappointed:

00:22:54: AI people promised so much, now they don't deliver.

00:22:57: It's like, yeah, if AI people hadn't promised so much, you would never have worked with them.

00:23:02: But I also see this frustration, even in a small sense, in that many of the biotech or pharma clients we come in contact with have very high expectations of how AI can solve things

00:23:15: that AI has nothing to do with.

00:23:16: It's more like digitization for their company, which would also be a nice thing,

00:23:23: but then I cannot solve it without an AI model.

00:23:28: That needs to be implemented.

00:23:30: Talking about AI really quickly... I'd be interested in your take. A friend of mine, who is a philosophy professor and a professor of data and AI, said a few years back in a conversation we had that maybe the next stage of AI will be to not rely on data dumps but to become more autonomous,

00:23:51: solve problems with smaller datasets, and get more innovative, if you will.

00:23:58: Leaving

00:23:58: less data making sense?

00:24:01: I mean, yeah, that was our specialty when we worked in pharma production. Like, the really worst thing that we had was thirteen data points for thirteen different proteins

00:24:13: that a pharma client came to us with, and we had to make sense of it. There are big limitations on how much you can make from this,

00:24:21: but after quite some work, there is information.

00:24:27: You developed the algorithm in a fashion that it was able to draw more sense out of this small data set.

00:24:36: Can you share anything about your model, or is that completely secret, behind the curtain?

00:24:42: I mean,

00:24:42: CIA.

00:24:44: Depends on

00:24:44: what you mean with the model.

00:24:46: There are a lot of things

00:24:47: we have worked on in the meantime, so I can say something about the initial thing we worked on.

00:24:53: So the initial model we worked on was prediction of measurement values for new proteins, be it stability, binding, or catalytic rate for enzymes.

00:25:05: So

00:25:05: functional assay outcomes?

00:25:08: Yes. In the very beginning, I massaged an algorithm that I had from time series prediction, which seems very different.

00:25:22: So one of the most important things was to get a prediction algorithm that also allows you to do design of experiments.

00:25:30: You don't want to measure.

00:25:32: If you have one sequence and a very similar sequence, and they both are supposed to be good, then do we put them both in the lab?

00:25:41: Maybe not, because they're so similar that you don't learn too much from the second one, so you leave it out.

00:25:46: That's the main thing for design of experiments.
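
The selection rule described here, skipping a candidate that is nearly identical to one already chosen, can be sketched as a greedy filter in Python. This is a hedged illustration only: the sequences, the predicted scores, the `select_diverse` helper, and the `min_distance` threshold are all hypothetical and not taken from the episode. Design-of-experiments methods in practice often weigh predicted value against model uncertainty rather than using a hard distance cutoff.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_diverse(ranked, budget, min_distance):
    """Walk candidates from best to worst predicted score and skip any
    sequence closer than min_distance mutations to one already chosen."""
    chosen = []
    for seq, score in ranked:
        if all(hamming(seq, kept) >= min_distance for kept, _ in chosen):
            chosen.append((seq, score))
        if len(chosen) == budget:
            break
    return chosen


# Made-up candidates, already sorted by predicted score (highest first).
ranked = [
    ("MKTAYIAKQR", 0.95),
    ("MKTAYIAKQK", 0.94),  # one mutation away from the top candidate
    ("MKTWYIAEQR", 0.90),
    ("MATAYIGKQR", 0.85),
    ("MKTAYIAKQA", 0.84),
]

picks = select_diverse(ranked, budget=3, min_distance=2)
for seq, score in picks:
    print(seq, score)
```

With these toy inputs the second candidate is left out even though its predicted score is high, because measuring it would add little beyond what the first one already tells you.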

00:25:50: In the meantime,

00:25:52: we've worked

00:25:52: on very specialized things that use other modules, like for structure prediction, stability prediction, and expressibility,

00:26:03: all of them in a mix, to be able to predict:

00:26:07: if I split a protein, will I be able to reassemble it later at the tumor site so that it activates a mechanism, basically killing the tumor?

00:26:18: This is super specialized to our particular approach.

00:26:34: How important do you feel are collaborations in your space, like some way of sharing data or bringing together different models,

00:26:44: or learning from one another?

00:26:45: Okay, collaboration is extremely rare.

00:26:49: Pharma companies don't tend to share data

00:26:52: unless they want you to

00:26:53: do something

00:26:54: with it.

00:26:55: They don't share it with each other,

00:26:56: typically. Would it be important?

00:26:59: I'm not a hundred percent sure right now, at the current state, because the data that is in pharma companies is not in good shape, typically.

00:27:10: They can't even make sense of it themselves,

00:27:12: oftentimes. There is lots of data

00:27:15: curation that needs to be done. Like, going into clinical trials,

00:27:20: you're encoding the gender of the patient in incompatible ways.

00:27:25: I think once we get to a very general mechanism, like language models are, of making sense of rather unstructured data, then it makes sense to share the data.

00:27:37: And if you say that the data in pharma companies is not in good shape, how do you feel about

00:27:44: some companies who have taken it on themselves to really produce data now?

00:27:48: Like Recursion, Azara, or others, which say they in part build these huge data sets to train their models. Would you say this is something that makes sense, or... is it more of a rule?

00:28:03: If you have the right thing that you are measuring, then yes, and the right things might not be what you think.

00:28:10: I don't know, there are a lot of companies springing up around designing binders for certain receptors, but I think that's because it's kind of a low-hanging fruit.

00:28:22: It is not so difficult to predict this anymore. I think there are probably more difficult things to predict that are also more valuable.

00:28:30: One thing for biologics would probably be immunogenicity.

00:28:36: There are much more complex properties which are harder to measure, but which we could very well be able to

00:28:42: predict.

00:28:43: Also, ten years from now, what will be in the pipeline

00:28:47: for your company, what will you be working on then?

00:28:50: The next therapeutics,

00:28:52: I guess.

00:28:52: Do you stick to cancer,

00:28:54: do you think, or do you think there are other exciting diseases in that sense?

00:29:00: We'll stick to everything that can make a difference.

00:29:03: What answer?

00:29:03: That is really the driver.

00:29:06: There are lot of diseases that ruin people lives From idea it.

00:29:13: Regenerative medicine is also very interesting.

00:29:16: It didn't bring many results, but it's interesting to think about how to keep the body in a good shape so that it doesn't develop illnesses.

00:29:26: Interesting!

00:29:27: We've been talking about that many times on this podcast, and I mean, it is also fancy stuff not only in Silicon Valley but

00:29:34: worldwide as

00:29:34: well.

00:29:35: All of what we're talking about are lab-produced proteins,

00:29:38: basically. Are there any approaches to put AI-designed proteins into organisms or into human gene therapies?

00:29:46: In human gene therapy, yes.

00:29:48: We've worked

00:29:48: on one.

00:29:50: AAVantgarde is an Italian-English

00:29:55: startup.

00:29:55: They are working on gene therapy for diseases of the eye.

00:30:00: We have been working with them in the past, and yeah, definitely, for gene therapy

00:30:05: it's a big promise.

00:30:08: Other than that, people like Toby Abb are working on putting engineered proteins in plants.

00:30:16: He's working at a very low level, so he is interested in the biological mechanisms.

00:30:24: That was when we worked with him.

00:30:26: I'm not sure how far he's gone into getting this into production, quote unquote, one of the big problems being

00:30:39: How far, I think,

00:30:40: maybe as a final question,

00:30:42: do you think...

00:30:44: This is a little bit philosophical, but how far do you think we'll deviate in our proteomes in the near future?

00:30:52: So if...

00:30:54: we as humans,

00:30:55: or we as a planet of different species, will we start to accelerate evolution through AI-driven protein design, or is this more like a small gimmick that will not have any influence on our evolution

00:31:10: at all?

00:31:11: If you ask Chris

00:31:12: Bahl,

00:31:12: a friend of mine from AI Proteins, he says, yeah, in twenty to forty years we'll all be different, like, we as humans will have changed ourselves.

00:31:21: Let me think about it, because it isn't a question I've actually asked myself so much.

00:31:27: Probably... probably, I guess... I think certain things are a no-brainer to change genetically.

00:31:36: Also, like, I don't know, something super simple is if a fetus has a genetic coding problem where they can't make a certain enzyme, and you could do gene therapy on the fetus so that maybe when it comes out it actually produces the enzyme rather than needing enzyme replacement therapy.

00:31:56: That's a no-brainer, a big win.

00:31:58: And there are other things that aren't genetic diseases.

00:32:03: There's certain forms of breast cancer that are genetically predisposed, and you could just change it when you have a daughter.

00:32:13: It doesn't change mankind, though, I think.

00:32:16: In that sense, it's been a diplomatic answer, because there is deeper stuff that is thinkable,

00:32:24: not only if you're a George Orwell fan.

00:32:27: i

00:32:28: mean honestly we would lose much.

00:32:31: If everybody was as smart as they can be and healthy, I think that's a good idea.

00:32:37: With chip implant or without?

00:32:39: I am a believer that we'll soon have neuroimplants, to sync with

00:32:44: my friend Peter Schleich.

00:32:45: Thank you very much, Ingmar.

00:32:46: Can I ask one extra question for people who are actually viewing our podcast?

00:32:50: What is on your sweater?

00:32:53: Oh yes,

00:32:55: yeah it's an analyzed protein And then maybe this is up for interpretation.

00:33:01: Okay,

00:33:02: so there's an alpha helix

00:33:03: and some globular sequences... And the nematode!

00:33:09: Yeah maybe exactly.

00:33:16: Many, many thanks.

00:33:17: Thank you

00:33:18: for being our guest today on The Biorevolution podcast.

00:33:22: Please leave a comment on Spotify if you want to request topics we might talk about here at The Biorevolution podcast.

00:33:32: science-tales.com.