ML in PL Workshop, Generative methods in drug discovery, a practical introduction

Last weekend, I had the opportunity to attend a fascinating workshop on generative methods in drug discovery. The workshop was led by two experts in the field - Stanislaw Jastrzebski, CTO and Chief Scientist at Molecule.one, and Tomasz Danel, lead machine learning scientist at Insitro.

It was run as part of the ML in PL conference.

Overview of the Workshop

The day started with an overview of the traditional drug discovery process and how ML can accelerate certain steps. The presenters talked about:

Virtual screening using molecular fingerprints and QSAR modeling
Retrosynthesis prediction to figure out how to synthesize molecules
Molecular generation with deep learning
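
To make the first item concrete, here is a minimal sketch of similarity-based virtual screening. It is not the workshop's code: the fingerprints are made-up sets of "on" bit indices, and the compound names and the `screen` helper are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def screen(query_fp: set, library: dict, top_k: int = 2) -> list:
    """Rank library compounds by similarity to a known active compound."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical fingerprints: each set holds the indices of bits that are on.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
    "cmpd_D": {4, 9, 20, 21},
}

hits = screen(known_active, library)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

The idea is that compounds sharing many substructure bits with a known active are more likely to be active themselves, so the most similar ones are tested first.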

We then jumped into hands-on tutorials in Google Colab notebooks. We practiced basic workflows like:

Sampling random compounds from ZINC, a database of readily purchasable compounds used for virtual screening, and testing their activity against a target protein
Mutating top compounds by tweaking their SMILES strings
Building QSAR models to predict activity using scikit-learn and PyTorch
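
As a rough illustration of the mutation step, the sketch below swaps a single aliphatic atom in a SMILES string. It is a toy under stated assumptions, not the notebook's implementation: `mutable_positions` and `mutate_smiles` are hypothetical helpers, only C, N and O outside square brackets are touched, and nothing checks that the result is chemically valid. A real pipeline would parse each mutant with a toolkit such as RDKit and discard the ones that fail.

```python
import random

SWAPPABLE = "CNO"  # assumption: only these single-letter aliphatic atoms are mutated


def mutable_positions(smiles: str) -> list:
    """Indices of C/N/O atoms outside [...] that are not part of 'Cl'."""
    positions = []
    in_bracket = False
    for i, ch in enumerate(smiles):
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch in SWAPPABLE and smiles[i:i + 2] != "Cl":
            positions.append(i)
    return positions


def mutate_smiles(smiles: str, rng: random.Random) -> str:
    """Replace one eligible atom with a different atom from SWAPPABLE."""
    positions = mutable_positions(smiles)
    if not positions:
        return smiles  # nothing to mutate (e.g. purely aromatic input)
    i = rng.choice(positions)
    replacement = rng.choice([atom for atom in SWAPPABLE if atom != smiles[i]])
    return smiles[:i] + replacement + smiles[i + 1:]


rng = random.Random(0)
parent = "CC(=O)Nc1ccc(O)cc1"  # paracetamol
children = [mutate_smiles(parent, rng) for _ in range(3)]
print(children)  # mutants are NOT validated; some may be chemically invalid
```

Naive string edits like this often break valence rules, which is one reason SELFIES is attractive for mutation: it is designed so that every string decodes to a valid molecule.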

Next we leveled up to more advanced ML techniques:

Implementing an active learning loop that iteratively collects data, retrains models, and proposes new compounds. This was super cool to see in action.
Using graph neural networks on molecular graphs to build predictive models. I’m excited to explore GNNs more.
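
The active learning loop can be sketched in a few lines. Everything here is an assumption made for illustration: each candidate is reduced to one made-up descriptor value, `oracle` stands in for the expensive activity measurement, the surrogate model is a 1-nearest-neighbour lookup, and the acquisition rule is purely greedy. The workshop's loop used real QSAR models instead.

```python
import random


def oracle(x: float) -> float:
    """Hypothetical stand-in for an expensive assay; activity peaks at x = 0.7."""
    return 1.0 - (x - 0.7) ** 2


def predict(x: float, labeled: list) -> float:
    """1-nearest-neighbour surrogate: reuse the label of the closest measured point."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def active_learning(pool, n_rounds=5, batch_size=3, seed=0):
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Round 0: measure a random starting batch.
    labeled = [(x, oracle(x)) for x in unlabeled[:batch_size]]
    unlabeled = unlabeled[batch_size:]
    for _ in range(n_rounds):
        # "Retraining" is implicit: the surrogate reads the updated labeled set.
        unlabeled.sort(key=lambda x: predict(x, labeled), reverse=True)
        # Propose the candidates with the highest predicted activity...
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # ...and collect their true labels from the oracle.
        labeled.extend((x, oracle(x)) for x in batch)
    return labeled


pool = [i / 50 for i in range(51)]  # toy descriptor values in [0, 1]
results = active_learning(pool)
best_x, best_activity = max(results, key=lambda pair: pair[1])
print(f"measured {len(results)} of {len(pool)} candidates, best x = {best_x}")
```

A greedy rule like this exploits the current model and can get stuck near the starting points; mixing in some random or uncertainty-driven picks is a common way to keep exploring.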

The workshop finished up with a friendly competition to find high-affinity binders for two drug targets. It was a race against the clock to come up with the best compounds.

Key Takeaways

Here are some of my biggest takeaways from the workshop:

The end-to-end process of hit finding, lead generation, and optimization
How to intelligently search chemical space by iterating on active compounds
The power of combining machine learning with wet lab experiments
How far deep learning has come in modeling molecular properties and activities

And we got to work with awesome Python libraries:

RDKit for cheminformatics features
SELFIES for representing and mutating molecular graphs as strings
Scikit-learn for QSAR modeling
PyTorch for graph neural networks

The instructors were phenomenal, and I really appreciated the hands-on, practical nature of the content. I’m amazed at the quality of the workshop and immensely grateful to Stanislaw and Tomasz for the time and effort they put into it!

Links to content

Codebase (server, solutions, notebook)

Slides

Further Reading

Using GPT4 to generate git logs for OpenSource projects in the style of conventional commits via a terminal

Deploying Llama2 on A100 GPUs using vLLM

Code and Coffee Meetup - Notes on LLM tokenizers