Software Development with Darren Platt - how to design good lab software?
Updated: Sep 21, 2020
Hi everyone, it’s Amber Shao, founder and CEO of AduroSys, a laboratory data management software company. Welcome to the AduroSys Lab Software podcast. Joining me today is a very special guest, Darren Platt. Darren is Chief Information Officer and President at Demetrix. One of his roles is to oversee the team that handles data management and analysis for Demetrix’s cell engineering platform. Prior to Demetrix, Darren was VP of Data Science at Amyris and Head of Research at 23andMe, and he led the computing efforts at the Joint Genome Institute from DOE and at Exelixis, overseeing several multi-million dollar software projects and large teams of software developers in biotech settings covering DNA sequencing, consumer genomics, and synthetic biology.
Amber: Great to have you back Darren.
Darren: Absolute pleasure to be here.
Amber: In the last episode, we talked about some general questions around buying commercial software or building in-house for labs. Usually, people think buy versus build is the hard question, but once it's decided, let's say you decide to build it in-house, the hard part actually just starts, right? Because then you have to find the right resources, gather requirements, do the analysis, design it, develop it, test it, etc. But those are all pretty much standard procedure. It doesn't matter if you're building software for a lab or a consumer app. I think what's unique for us, for lab software, is the environment we release the software to. It's a lab environment with lab users. It's not an environment that most people interact with on a daily basis, right? Also, the type of software that we're developing is not something with simple functions where you can just give it to the users and they play around with it and figure it out. A lot of the time it involves calculations and very sophisticated features, so it's hard to figure out. At the same time, there are no millions of users in a lab. Typically, there are just groups of users with dedicated functions and roles. Some do wet lab work, some do dry lab work, and sometimes they do both. And we also have users who don't like to use software but were told that they have to, right? So there's this mix of different types of users, and you have to make sure they are all happy with your software. This is why I think it's really important, before you develop lab software, to think carefully about your design and make sure it is suitable for the lab environment. So today we're going to talk about how to design the software correctly so it's suited to the lab environment. Once you know the requirements, how do you come up with the right design?
What are the different factors that we should consider during the design of our lab software?
Darren: I think that is a wonderful question. I've built so many of these things, and I've seen things go really well and I've seen things go really badly. And afterward you always try to work out where things went right or wrong on that project. I think the good news is most people implementing these systems have a captive audience of maybe a few dozen to maybe a few hundred users, and they can go and talk to them. Your audience is right there. So it's not like selling software anonymously through the Internet. You get to know them really well and watch them. You can learn. If you make a mistake, you can fix it really quickly. There's an enormous amount of flexibility. I think the bad news is there's also an enormous amount of flexibility. Most bio labs are a fast-moving target where the requirements are constantly changing and people expect the software to sort of follow them around. And when you roll out that software, you're actually dictating a whole lot of people's lives. You're controlling how that lab is going to run. I'm a big fan of building software after there's been a little bit of process in place for a while, maybe with Excel spreadsheets, so people actually know what they're doing. You really want to make sure that they've standardized the process. You're not trying to build four different workflows. You're building the one that they've decided on. And then you really want to build a minimum viable product, get it in their hands, see if they like it, and then iterate on the design.
You can take some shortcuts. If you're going to have things that are done on the command line for a while, or by uploading spreadsheets, or maybe even with SQL, that's fine if it gets something to them a little more quickly. The area I never like to compromise on is the data structure piece. If you're going to spend time working one piece out, that's the one: how you represent the data in the database will have really long-term consequences for all sorts of things, and it's very, very hard to change. The actual user interfaces, how you collect things, you can constantly tweak and improve. But definitely understand what they're trying to represent and make sure you get a really solid representation. If you're dealing with strains, make sure you have a really good data model for representing them.
The other thing when you're designing it is that you should challenge every single assumption. When they tell you they always do this, you want to ask about the unhappy paths, about when things go wrong in the lab. Do they ever vary things? When they say that they will always have a name, make sure that is really true. That's basically the core of the design: challenge assumptions. It's good if you can actually watch a working process, then build the smallest thing possible, get it in their hands, and then constantly add features to it. A lot of people back into that sort of accidentally. They just start with a spreadsheet, and then somehow something gets built on top of it without a lot of thought.
Amber: Right. I couldn't agree more. So basically, it's better for the lab to decide on their process first, right? Once they understand the process, you can build something that's really tangible rather than trying to tackle everything, right?
Darren: I'd almost go as far as saying that if the process of rolling out a LIMS doesn't actually force them to standardize a little bit and become a little more formal with their process, you're probably not doing it the right way. If the contract is that they can continue doing whatever they like and you'll try to track it, then that's going to end badly for them, and probably for you.
Amber: Right. So what are some of the common mistakes that you've seen people make?
Darren: Yeah, I think it's sort of related to my last point. Almost the worst thing you can do is build exactly what they ask for. If you just blindly go down a requirements list that says it must do this, must do that, and you implement it without challenging the assumptions, you build something they don't like. I remember building a process for freezer check-in where they wanted to record which freezer, rack, row, and position every single thing went in. We built the system. We rolled it out, and then they complained that they had to click too much. We said, well, you have to click too much because you have to write all that down. And, well, it turned out they didn't really want to write all that down. So, again, you will probably change the lab a little bit as you do this, so make sure they're up for it. Codifying an unscalable process, or a process that is non-standard, is almost always a mistake. Push them on primary keys for things. If the primary key is just a free-text field describing the thing and they're not prepared to do things like roll out unique numbers, then it's pretty dangerous to build a database on top of that.
Challenge them on scale. You might build a beautiful interface that enables them to check in one plate at a time when they're hoping that the LIMS will enable them to go really fast, or they're suddenly doing a hundred of these a day, and the thing you designed is now sort of hopeless for the scale they're operating at. So anticipate success and getting to larger scales, and then really help them understand that this is a contract. Once that software's there, even though we say software and "soft" implies that it's flexible, the software piece is often one of the least flexible elements of running a lab. They can go in and do a different experiment every day, but can the database actually capture and record it? So there's usually some implicit contract that you're building a pipeline meant for high-throughput operations. That doesn't mean they can do anything they like and the software will be infinitely flexible.
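To make the points about primary keys and scale concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `plate` table, its columns, and the `check_in_plates` helper are hypothetical names chosen for illustration, not anything described in the conversation. The idea is that the system issues the unique number, the free-text label is just an optional description, and check-in accepts a whole batch rather than one plate at a time.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical plate table keyed by a system-issued number."""
    conn.execute(
        """
        CREATE TABLE plate (
            plate_id INTEGER PRIMARY KEY,  -- unique number issued by the database
            label    TEXT                  -- optional free text for humans, not the key
        )
        """
    )


def check_in_plates(conn, labels):
    """Check in a batch of plates in one transaction and return the issued IDs."""
    ids = []
    with conn:  # commits if the block succeeds, rolls back if it raises
        for label in labels:
            cursor = conn.execute("INSERT INTO plate (label) VALUES (?)", (label,))
            ids.append(cursor.lastrowid)
    return ids


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Duplicate or missing labels are harmless because the number is the key.
ids = check_in_plates(conn, ["growth assay", "growth assay", None])
print(ids)
```

Because the key never depends on what people type, the same function serves three plates or a few hundred per call, and the descriptive text can change later without breaking references.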
Amber: Yeah, it's really interesting that you brought up the point about scaling because, in my experience, a lot of people don't think about scale at the very beginning either. They always think, oh, once we're able to process five plates, that's the throughput we need, and that's all we need right now. A lot of the time, a few months later, you realize you need to process 10 plates or 100 plates, and that's where it fails. And this goes back to the point that you made about the right design, about the data structure. Often, when you can't scale from 10 plates to 100 plates, it has something to do with your database design, your data structures, and there are some fundamental issues behind them. That's why it's better to get that right in the first place and think about scaling right from the beginning.
Darren: The other thing: I got a good quote this week from somebody who said, "I'm never going to quote a timeline for a piece of software without looking at the customer's data again." You build something. You think the data's in great shape. And then when you try to actually load the data into the system, you realize that you've got three months of pretty aggressive curation just to put existing information into the database, because it's not consistent with the new data model everybody agreed on. There's missing information and wrong information, things like that.
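One way to act on that quote is to audit a sample of the existing data against the agreed data model before estimating anything. The sketch below is only an illustration under assumed names: `audit_records`, the field names, and the sample rows are all hypothetical. It counts missing required values and duplicate keys so the curation effort is visible up front.

```python
from collections import Counter


def audit_records(records, required_fields, key_field):
    """Count problems that would block loading `records` into the new model."""
    problems = Counter()
    seen_keys = set()
    for record in records:
        # A required field counts as missing if it is absent, None, or blank.
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                problems["missing " + field] += 1
        # The key must be unique across the whole data set.
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                problems["duplicate " + key_field] += 1
            seen_keys.add(key)
    return problems


# Hypothetical legacy rows, as they might come out of a spreadsheet export.
legacy_rows = [
    {"strain_id": 1, "name": "parent strain", "freezer": "F1"},
    {"strain_id": 2, "name": "", "freezer": "F1"},
    {"strain_id": 2, "name": "variant", "freezer": None},
]

report = audit_records(legacy_rows, ["name", "freezer"], "strain_id")
for problem, count in sorted(report.items()):
    print(problem, count)
```

A report like this does not fix the data, but it turns "the data's in great shape" into numbers that can be checked before a timeline is quoted.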
Amber: I'd also like to share something while we're talking about design. It's a secret sauce that I don't even know if you remember, but I learned it from you in the early days: the best software doesn't have to be complicated. Sometimes the most elegant solution is really the simplest solution, right? Believe it or not, that's actually a motto I live by in my everyday work. I think it's part of human nature that most of us tend to make things overly complicated. So how do we simplify things when it comes to designing software? I find you really have the eyes and the mind to simplify things and find the most essential elements. How do you do that?
Darren: I think the first thing you've got to do is really be quite pushy with the users. This is probably the hardest bit about being a good LIMS architect: you're serving somebody, and at the same time you sometimes really have to argue with them. You've got to tell them, no, I'm not going to do that, this is just too complicated. I'll give you one example, which is that I've found biologists often make artificial distinctions between things. In my line of work, for example, they have bacteria and they have yeast, and they like to number them differently. So there's a number for the bacteria and a number for the yeast. I'm a computer scientist, so I walk up to that and just say, look, they're all cells. Why don't we just call them cells and number them the same way? If I have my way, everything will have just one numbering scheme, one to N, with bacteria and yeast mixed together. But sometimes you'll have an argument. The users will say no, no, it has to be two tables, one for this and one for that, because they're a little bit different. The second one is always pushing for numeric IDs for things. People love writing what I call a novel in the label for something: this is the thing that came from the thing that begat the thing that was the third version of the second. And I'm like, no, it's 22117, and we can put some human text next to it. So you'll fight over primary keys.
I think sometimes, if you're building a small-scale process that's going to be a high-throughput process at some point, it's almost worth skipping straight to the automation. Sometimes it's cheaper to build the system to talk to the robot. Robot data collection is usually the cheapest way to get data into a system, whereas if you allow humans to enter things, you have to build very complex user interfaces. There are a couple of other examples of simplification. In the DNA world, you can treat every piece of DNA as a part, everything from a linker to a plasmid to a chromosome, and you can come up with a very, very simple representation. Same thing for the vessels. You can just have a vessel that can be everything from a fermentation tank to an oligo tube. A computer scientist will say that's a nice uniform representation. I also love ontologies. A standard ontology, of the kind widely used in biology, is a sort of hierarchical system for labeling things. If you have one of those, you'll find you can use it over and over again for the descriptive part.
The final bit of advice is to convince the customer to live with the minimum viable product for a while. They will want a million features, but sometimes, living with that MVP, they realize that's actually most of what they needed, and that it's more important to build something else than to add all the bells and whistles that were in the original set of requirements.
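The single numbering scheme and the uniform vessel representation Darren describes could look something like the following sketch. Everything here is an assumption made for illustration: the `Cell`, `CellRegistry`, and `Vessel` names, their fields, and the in-memory storage are hypothetical and are not taken from any system mentioned in the interview.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Cell:
    cell_id: int      # the key: a plain number drawn from one shared sequence
    organism: str     # "bacteria", "yeast", ... kept as an attribute, not a table
    description: str  # the human-readable "novel", stored next to the number


@dataclass(frozen=True)
class Vessel:
    vessel_id: int
    kind: str         # anything from "fermentation tank" to "oligo tube"
    volume_ml: float


class CellRegistry:
    """One registry and one numbering scheme (1 to N) for every kind of cell."""

    def __init__(self):
        self._next_id = count(1)
        self._cells = {}

    def register(self, organism, description=""):
        cell = Cell(next(self._next_id), organism, description)
        self._cells[cell.cell_id] = cell
        return cell

    def get(self, cell_id):
        return self._cells[cell_id]


registry = CellRegistry()
first = registry.register("bacteria", "cloning host")
second = registry.register("yeast", "third version of the second build")
print(first.cell_id, second.cell_id)

# The same Vessel type covers very different containers.
tank = Vessel(1, "fermentation tank", 2_000_000.0)
tube = Vessel(2, "oligo tube", 1.5)
```

Keeping the organism and the vessel kind as attributes rather than separate tables is the trade-off under discussion: lookups and numbering stay uniform, while anything specific to one kind has to be modelled as optional data.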
Amber: OK, once you have a design, what do you do next? Do you spend time trying to figure out whether your design is correct? Do you write a prototype or a mockup? Do you get users involved in testing your design, or do you go straight to development?
Darren: I think once I understand the requirements, I'm pretty comfortable going to the actual writing of the software and building it, but it usually takes me a long time before I'm convinced I understand the requirements. I always tell my software engineers I would rather they spend an extra week or two really nailing down what the customer wants. That can be sharing Balsamiq mockups of the user interface with them. Sometimes it’s brutal two- or three-hour meetings where they go over how they label their test tubes on Mondays as well as Fridays and stuff like that, and we write it all down. There are a lot of concept boards, sketches, photographs of whiteboards, things like that. It's really making sure everybody understands, particularly the data structure. That's, again, the thing I wouldn't compromise on, because you can fix the user interface, but it's very, very hard to fix a schema once you've built everything on top of it. We've even trained some of our customers to use Balsamiq, so they're comfortable mocking up user interface designs the way they want them to look. Once that conversation is really good, we basically race to build the minimum viable product. Our customers are actually able to pull the code and build it locally so they can look at the software as it comes together. We can also push it out to a staging environment. We try to show it to them early and often so we catch anything we've missed. And then once it goes live, you just race to fill in all the things that you forgot about. So the race doesn't stop once the software arrives. I usually tell my software engineers that now you've caught the bus and attached yourself to the side of it, and that bus is going to continue to tear down the road, and you just need to keep up with it once they're dependent on the software operating all the time.
Amber: So those are excellent points on design. Thank you so much for sharing with our listeners.
Darren: Yup, my pleasure.
Amber: So we have another episode coming up, where we'll be talking about questions around development resources, for example, how to find the right people to develop your own software and what challenges you face in finding them. For listeners who would like to hear that episode, do check back with us in two weeks. I look forward to talking to you again, Darren.
Darren: Absolute pleasure, Amber. Thanks.