How to keep track of phages

Issue 167 | March 11, 2022

17 min read

Capsid and Tail — Photo by Andrea Leopardi on Unsplash.

As your phage collection grows, keeping track of your phages and associated data becomes increasingly important. This week, Jan Zheng explores how to rein in all that data before it gets out of control.

What’s New

Paper: Digital phagograms: predicting phage infectivity through a multilayer machine learning approach.

Paper: A multiwell-plate Caenorhabditis elegans assay for assessing the therapeutic potential of phages against clinical pathogens.

Paper: Interactions between mobile genetic elements: An anti-phage gene in an integrative and conjugative element protects host cells from predation by a temperate phage.

Review: Tall tails: cryo-electron microscopy of phage tail DNA ejection conduits.

Podcast: Listen in to Steffanie Strathdee MD, PhD (US) explain the challenges and opportunities of Bacteriophage Therapy at the recent ICMM World Congress.

Latest Jobs

Join BioNTech as Director, Discovery to lead the development of the early endolysin pipeline! In this position you can advance your career of leadership in science, push the boundaries of synthetic biology and work on the next frontier of antimicrobial therapy.

Senior Project Manager, Phage Therapy: Creative Biolabs is seeking a highly motivated talent to join our R&D team as a Senior Project Manager with a focus on phage discovery and cocktail optimization. The desired individual should have a passion to develop and pursue multiple avenues to complete research objectives. The ideal candidate needs to have significant experience in independently designing and implementing novel research approaches and prioritizing multiple concurrent projects.

Research Fellow: Join the Department of Microbiology and Immunology to develop a rapid bacteriophage isolation methodology using flow cytometry and cell-sorting, in combination with multiplexed cultures of target bacteria.

Community Board

Anyone can post a message to the phage community — and it could be anything from collaboration requests, post-doc searches, sequencing help — just ask!

iVoM Season 2, episode #5: Environmental impact of virus-host interactions
Date: 2 pm CET, Tuesday, March 15th, 2022
Register at https://ivom.phage.directory

Talks
Coupling metagenomics to correlative microscopy for identification of novel viruses in the deep biosphere

Prof. Alexander Probst, University of Duisburg-Essen, Germany

From phage to shark: microbial and macrobial predation govern state transitions on coral reefs

Dr. Cynthia Silveira, International Center for Genetic Engineering and Biotechnology (ICGEB)

Harnessing virus for alleviating plant stress

Dr. Neeti Sanan-Mishra, International Center for Genetic Engineering and Biotechnology (ICGEB)

Chairs

Dr. Matthias Fischer, Max Planck Institute for Medical Research, Heidelberg, Germany
Prof. Corina Brussaard, Royal Netherlands Institute for Sea Research, Netherlands

Phage Directory’s new structured peer feedback platform, Instill Science, is live!

Would you jump on a 30 min Zoom call to help a fellow phage researcher talk through a problem? Or provide a second set of eyes on someone’s work?

Check if there’s anything you can help with this week

Submit your own request for help

How to keep track of phages

Product designer and co-founder of Phage Directory

Co-founder, Product Designer

Iredell Lab, Phage Directory, The Westmead Institute for Medical Research, Sydney, Australia, Phage Australia

Email [email protected]

Twitter @yawnxyz

Website https://janzheng.com

Skills

Bioinformatics, Data Science, UX Design, Full-stack Engineering

I am a co-founder of Phage Directory, and have a Master of Human-Computer Interaction degree from Carnegie Mellon University and a computer science and psychology background from UMBC.

For Phage Directory, I take care of the product design, full-stack engineering, and business / operations aspects.

As of Feb 2022, I’ve recently joined Jon Iredell’s group in Sydney, Australia to build informatics systems for Phage Australia. I’m helping get Phage Australia’s phage therapy system up and running here, working to streamline workflows for phage sourcing, biobanking and collection of phage/bacteria/patient matching and monitoring data, and integrating it all with Phage Directory’s phage exchange, phage alerts and phage atlas systems.

As our collections of phages grow, keeping track of our phages will become increasingly important. As we run more experiments across our growing phage collections, the trove of data we collect about our phages will increase exponentially. In this highly informal, conversational series, I’ll explore ideas on how we could rein in all that phage, microbial, and experimental data before it gets out of control.

In this first piece, I’ll introduce myself, and cover the idea of why we should track phages, what to track, and roughly how to track them.

Let me start by briefly introducing my background:

I come from the world of computer science and “information architecture” (Read: What is Information Architecture?). I’ve worked on many projects at companies like L’Oreal to make complex data problems more manageable and understandable.

Recently, I’ve been building systems that help Phage Australia manage biobanking and clinical/characterization/experimental phage data. This data will in turn help clinicians and microbiologists make decisions that relate to phage therapy.

Throughout the year, I will write a series of informal posts about designing and building databases and web applications for managing phage data.

As more labs are starting their own biobanks and phage databases, I hope that my explorations could prove useful. Much of the work is iterative and works-in-progress, and I’ll probably get things wrong along the way. I will also probably have more questions than answers. I welcome any suggestions, tips and corrections — send them to me at [email protected].

Why track our phages?

When we work with just a few phages, we might not need a “system” — a lab notebook will suffice. Even then, our phages should have names, and retrieving experimental and bioinformatics data should be effortless.

Once we have more than a few dozen phages, or once we’ve decided to create a phage biobank, the amount of data and experiments we collect about our phages will grow exponentially. Once we start running a large number of experiments, it might get easier to lose track of our phages, along with all the experimental data.

Knowing where our phages and data exist is also important when students graduate and fresh ones arrive. Without a system, our lab could lose several weeks of lab time. Experimental data collected by former students months or years ago might take new students several hours to several days to find. Some data might even be lost forever.

Knowing where experimental data exists on our phages also helps speed up our data analysis, lab work, and publishing cadence. If anything, it would free up time to conduct more experiments. Without a system, we could easily spend much of our time tracking down images of plates/TEMs, charts and graphs, collections of sequencing data, and multiple versions of Excel files all labeled “final”, spread across several lab computers… for every single phage. If we had trouble finding data for 50 phages, imagine how difficult it could get to find data for 500 or 5,000 phages.

Keeping phage data clearly organized could also speed up onboarding collaborators. Team members could quickly find and share relevant experimental data, without back-and-forth emailing and requesting permissions.

How to think about “how to think about” our phages

In computing, we have this concept of “data”. Data just means a collection of information about some thing. There’s also the concept of “metadata”, which is the data that describes the data. A passport, for example, will have attributes like first, middle and last names. It might also have gender, hair length, or hair color. These attributes make up the metadata of a passport.

In the phage world, metadata describes what we collect about a phage, like “plaque size” and “host range”. Data describes what we know about the phage, like “1 mm” and “kills S. aureus”.

When we consider “what do we need to know?” about our phages, we are thinking about what metadata we want to collect about them. We call the collection of metadata that we choose to describe our phages our “phage schema”.
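To make the split between metadata and data concrete, here is a minimal Python sketch. The field names in `PHAGE_SCHEMA`, the `check_record` helper, and the example phage are all hypothetical, chosen only to mirror the “plaque size” and “host range” examples above; a real lab would choose its own fields.

```python
from typing import List

# Hypothetical schema: the metadata, i.e. which attributes we choose to collect.
PHAGE_SCHEMA = {
    "name": str,
    "plaque_size_mm": float,
    "host_range": list,
}


def check_record(record: dict, schema: dict = PHAGE_SCHEMA) -> List[str]:
    """Compare one record against the schema and return a list of problems."""
    problems = []
    for field_name, expected_type in schema.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"wrong type for {field_name}")
    for field_name in record:
        if field_name not in schema:
            problems.append(f"unexpected field: {field_name}")
    return problems


# A record: the data, i.e. what we know about one (made-up) phage.
example_phage = {
    "name": "ExamplePhage1",
    "plaque_size_mm": 1.0,
    "host_range": ["S. aureus"],
}
```

Calling `check_record(example_phage)` returns an empty list, since the record supplies exactly the fields the schema asks for. The schema says what we collect; each record says what we know.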

Where did they come from, where do they go?

The most fundamental qualities we want to know about our phages are probably “what do we call our phage” and “where do we find our phage” — the “name” and “location” of the phage. We might also want to know “who’s the person in charge of the phage”, “is it currently expired? (e.g. does it need to be re-amplified?)” and “where and when was it isolated?”. We probably will want to know either “where is the data on our phage” or at least, “who has the experimental data on our phage”.

We probably also want to know which bacterial strain was used to isolate the phage, and the number (and results) of experiments that have been conducted on the phage.

Going deeper, we might consider the intellectual property, material transfer agreements and usage rights of the phage (e.g. what can and can’t we do with a phage, which we received from another lab).

Oh, and we should probably also track things like “who sent us the phage” and “who we sent our phage to” — along with the agreements and restrictions we put in place for the receiving labs. And also “where did we publish about this phage”. The list of metadata that we could collect goes on and on.

The most crucial piece of information we need about our phage, however, is its identity. What makes a phage a phage, and when should we consider a “slightly different phage” an altogether different phage? How should we think about how these two phages relate? Are they ancestors, twins, siblings, or cousins twice removed?

How we decide to identify a phage will depend on which aspects of our phages we care about.

What do we track about our phages?

What a lab tracks about its phages depends on what the lab cares about. An ecology lab’s interests and needs will differ from those of a plant pathogen lab or a phage therapy lab.

Every lab should divide what they know about their phages into two main categories: Core Identifying Characteristics and Conditional/Mutable Characteristics.

Core Identifying Characteristics define “what makes this phage unique”. These should answer questions like “what are its defining characteristics”, “where did it come from”, “where is it”, “where/how can I get it”, “who is responsible for it”, and “is it expired”.

Core Identifying Characteristics are the things printed on our passports, driver’s licenses and social security / tax ID cards. These are essential, as they help us find the phage, which we could use to derive all of its other characteristics. Clearly being able to identify a phage is also necessary for communicating about it in papers and social media, and for describing/comparing/sharing the phage.

Conditional or Mutable Characteristics are attributes that might or might not hold true under all circumstances. This is like my current hair color, or my current friends and coworkers. These can change over time. They can have different magnitudes. They can be different in various conditions, be temporary or permanent, or only exist relative to other objects like strains, phages, and antibiotics.

When establishing our phage’s identity, there’s a “gray area” around what actually defines a phage’s identity. For example, my hair color is on my passport. If I dye it blue tomorrow, does that make me a different person? Does it make my passport invalid? By contrast, my friends and coworkers would never appear on my passport, as those fluctuate over time.

Similarly, if I learned a new skill tomorrow, does that make me a different person? If I know how to ski, but I’m at the beach and haven’t been observed skiing — would I still be a skier? Would “skier” be a part of my identity?

In the phage world, we classify phages based on what we observe. These observations are sometimes conditional, mutable characteristics. Sometimes they’re abilities that depend on our observation conditions. Other times, they could be core identifying characteristics.

What we define as a phage’s identity depends on what we care about. If we cared about friends, family and coworkers, we’d absolutely define that as part of a person’s identity (Facebook and LinkedIn do this). If we cared about someone’s skiing abilities, we’d define that as part of someone’s identity.

When building out our phage data system for Phage Australia, I think of a phage’s name, isolation host, origin (e.g. where was it found, or what lab did it come from), and sequencing information as Core characteristics.

If a phage is engineered or evolved to be “significantly different” from another phage, I would try to determine its degrees of similarity to other phages in the database. This could potentially draw on phenotypic data, but most likely I would rely on the differences in the genome sequence data. And of course, “significantly different” is subjective.
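As a toy illustration of putting a number on “degrees of similarity”, the sketch below compares two sequences by the overlap of their k-mers (Jaccard similarity). This is a stand-in I picked for illustration, not the method used at Phage Australia; the function names, the choice of `k`, and the short sequences are assumptions, and real genome comparisons would rely on dedicated tools and a considered threshold.

```python
def kmers(sequence: str, k: int = 4) -> set:
    """Return the set of all overlapping substrings of length k."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Shared k-mers divided by all distinct k-mers across both sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        # At least one sequence is shorter than k: nothing to compare.
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sequences, far shorter than any real phage genome.
identical = jaccard_similarity("ACGTACGT", "ACGTACGT")
unrelated = jaccard_similarity("AAAA", "CCCC")
partial = jaccard_similarity("ACGTA", "ACGTT")
```

A score of 1.0 means the two k-mer sets are identical and 0.0 means they share nothing. Where to draw the line for “significantly different” remains a judgment call.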

Any observed characteristics and abilities that can change based on conditions and relationships (e.g. host bacteria) I would classify as “conditional or mutable characteristics”. Some characteristics or abilities could be considered “temporary” or “impermanent”. These would also be considered “conditional” and not part of a phage’s core identity.

Characteristics like host range, growth curves, antibiotic synergy, and even morphology data like plaque size would depend on a phage’s observed conditions. Some conditions like temperature, media, and presence of bacteria and antibiotics could change our observed characteristics. It’s important to note that many of these observations occur relative to other entities like antibiotics and bacteria.

Take host range for example. Instead of a list of strains that a phage could lyse, I would want to know its plaque morphologies on the strains it had been tested against. This gives me a better picture of the full range of hosts, and the various degrees of potency against strains that phage has been tested against.

Core Identifying Characteristics	Conditional/Mutable Characteristics
Name	Life cycle
Phage Identifier / Accession ID	Host range
Isolation Strain	Plaque info (e.g. plaque morphology)
Origin / discovery info	Antibiotic synergy
Genomic characteristics*	Genomic characteristics*
Morphotype	Propagation info
Publication info	Annotation info
Genbank info	Other experimental data
Material transfer agreements
Intellectual property info
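One way to carry the two columns above into a data model is to keep the core identity fixed and attach conditional observations to it as a growing list. The Python sketch below is only an illustration under that assumption; the class names, field names, and example values are hypothetical and cover just a few rows of the table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class CoreIdentity:
    """Core identifying characteristics: frozen, so they cannot be reassigned."""
    name: str
    accession_id: str
    isolation_strain: str
    origin: str


@dataclass
class Observation:
    """One conditional/mutable characteristic, stored with its conditions."""
    kind: str    # e.g. "plaque_size" or "antibiotic_synergy"
    value: str
    conditions: Dict[str, str] = field(default_factory=dict)


@dataclass
class PhageRecord:
    core: CoreIdentity
    observations: List[Observation] = field(default_factory=list)

    def add_observation(self, kind: str, value: str, **conditions: str) -> None:
        """Append a new observation without touching the core identity."""
        self.observations.append(Observation(kind, value, dict(conditions)))


# Made-up example values.
record = PhageRecord(
    CoreIdentity("ExamplePhage1", "EX-0001", "S. aureus strain A", "wastewater sample")
)
record.add_observation("plaque_size", "1 mm", host="S. aureus strain A", temperature="37C")
```

Freezing `CoreIdentity` is a design choice: new experiments add observations, but changing a name or isolation strain means creating a new identity on purpose.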

What do we care about?

How we identify our phages depends entirely on what we care about. For Phage Australia, our host ranges will always be derived from successful and unsuccessful plaque results, and the underlying conditions.
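The idea of deriving host range from individual plaque results, rather than storing it as a fixed list, can be sketched as follows. The `PlaqueResult` fields, the strain names, and the rule used in `lysed_strains` (at least one successful result) are assumptions for illustration, not a description of Phage Australia’s actual system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlaqueResult:
    strain: str
    plaques_observed: bool
    morphology: str = ""    # free text, e.g. "clear, 1 mm"
    conditions: str = ""    # free text, e.g. "37C"


def summarize_host_range(results: List[PlaqueResult]) -> Dict[str, Dict[str, int]]:
    """Count successful and unsuccessful plaque results for each strain."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"positive": 0, "negative": 0}
    )
    for result in results:
        key = "positive" if result.plaques_observed else "negative"
        summary[result.strain][key] += 1
    return dict(summary)


def lysed_strains(results: List[PlaqueResult]) -> List[str]:
    """Derived view: strains with at least one successful plaque result."""
    return sorted({r.strain for r in results if r.plaques_observed})


# Made-up results for one phage.
results = [
    PlaqueResult("Strain A", True, "clear, 1 mm", "37C"),
    PlaqueResult("Strain A", False, "", "25C"),
    PlaqueResult("Strain B", False, "", "37C"),
]
host_range_summary = summarize_host_range(results)
simple_host_range = lysed_strains(results)
```

Because the raw results are kept, the simple list can always be recomputed under a different rule, while the failed attempts and their conditions stay visible.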

Just like listing “natural hair color” might or might not make sense on a passport, “natural host range” might or might not make sense on a phage passport. It just depends, and there are no obvious answers.

However, how we define a phage’s identity affects how we record data about a phage. For example, Sydney is known as a “sunny” city. Except it’s the rainiest city I have ever lived in. Since moving here almost three weeks ago, it’s rained almost every single day. Flash floods have washed away roads and caused evacuations (thankfully we’re fine). Should I classify Sydney as “Rainy” or “Sunny”? Or do I instead not call it anything, and just show the number of days that it’s rained vs. the number of days it’s been sunny? Do I record the amount of rain per day? Or do I not talk about the weather at all?

As we develop better sequence analysis tools and characterize more phage genes, we’ll get a better understanding of both core and conditional/mutable phage characteristics. While genome size is probably an identifying characteristic, a phage’s known antibiotic resistance genes will probably change over time.

Additionally, how do we classify engineered, mutated and synthetic phages?

Looking Ahead

Creating identities around phages is really hard. There are many characteristics we could use to define the identity of our phages. Which ones we choose will depend on our lab and funding needs.

I previously mentioned that establishing the “identity” of our phage is crucial. And the most crucial characteristic of a phage’s identity is its Name. A name enables us to access, discuss and communicate our phage. But most importantly, a name lets us compare our phage against other phages.

In our next post, we’ll geek out on the importance of both generic and memorable phage names, the many ways we could name our phage, and explore why comparing phages is necessary.

——

Special thanks to Jessica Sacher, Evelien Adriaenssens, and the Phage Australia team (Ruby Lin, Nouri Ben Zakour, Stephanie Lynch, Jon Iredell) for helping me hash some of these ideas out.

More special thanks to the various phage labs and biobanks we’ve spoken to over the years about data management. Some of these labs include: Queen Astrid Military Hospital, Sciensano, the Félix d’Hérelle Reference Center for Bacterial Viruses, DSMZ, ATCC, NCTC, TAILOR, Israeli Phage Bank, The Bacteriophage Bank of Korea, Fagenbank, Citizen Phage Library, Japan Phage Bank, and many more. Thanks so much for putting up with my incessant questioning!