Speaker 1  0:05  
Uh, thanks. Don Gail, nice to see you all. As she mentioned, we're going to talk about open source and technology transactions. So this presentation is geared primarily for licensing and IP folks, but, you know, everyone's welcome, who are handling software licensing for universities and are, you know, wondering how the open source licenses related to the software they're licensing impact the technology transactions they're trying to execute. So let me first introduce the people who will be presenting to us today. So first I have Megan Forbes. She is the program manager for the Open Source Programs Office here at Johns Hopkins University. The Open Source Programs Office, or OSPO, is tasked with being, like, the front face and interface for open source at the university. And Megan could tell you more about what that entails, but we're very lucky to have her. It is not a position that's common in universities, and so we're lucky to have her expertise on this call. I'll be talking second. I do IP and licensing work for Johns Hopkins Technology Ventures, which is the tech transfer organization here at Johns Hopkins University. You know, we do normal tech transfer stuff. And then we also do, you know, venture creation and corporate relations and all that good stuff to help translate technology out of the university into the real world. And then we also are fortunate to have with us Fiona Kaufman. Fiona is the deputy general counsel for the company DataStax. DataStax is a database provider to enterprise, largely supporting AI solutions for those enterprise customers. DataStax is also a major contributor to the Cassandra open source project, and I think we'll hear more from Fiona, but it's fair to say that DataStax was founded as an open source business related to that Cassandra project.
So it's good to hear from her, from outside academia, about how these types of companies can operate, because that's a large sector of the software world, and understanding how those companies work is important, I think, for structuring our transactions appropriately. So let me just talk real quick about the agenda, and then I'll turn it over to Megan. Let me go back one slide, Megan, sorry. So first, Megan's going to talk about, you know, as it says, establishing open source projects on campus. So that's where they come from, what you can do to support those projects, you know, what type of resources those projects need, and the type of work she does to help build open source on campus, which is, you know, both organic projects here and then, like, you know, helping faculty and students meaningfully contribute to public projects or projects that extend beyond campus. I'll then talk about kind of the nitty gritty of tech transactions when we're dealing with open source. So, you know, a lot of that is how intellectual property works with open source, both, you know, patents and copyrights, and how those are handled in software when you're licensing software that has open source components or open sourcing a project itself. And then I think it's also important to think about the business models that these types of open source businesses can have. You know, a lot of what I'll say about IP applies to any sort of software licensing, regardless of whether they're an open source business or not, but particularly if you're an open source business, I think it's important to think about the business model you want to pursue, because a lot of the IP decisions you make can help or hurt that particular business model. And then Fiona will talk about DataStax and give us kind of an inside view about how these companies operate and how they think about protecting their intellectual property.
So with that, I will turn it over to Megan, and she can take you through open source on campus.

Unknown Speaker  4:12  
Thanks, Andrew,

Speaker 2  4:14  
and thank you all for joining us today. It's nice to see so many folks not just wait for the recording, but actually come to the presentation. So as Andrew said, I'm the PM for the Open Source Programs Office at Hopkins, and I'm just going to talk a little bit about what open source software is, who is creating it on our campuses and why, and what types of support OSPOs and tech transfer offices can provide for open source creators. So first, it's always good to confirm that we are talking about the same thing when we talk about open source software. So the official definition is that open source software is software with publicly available source code that allows use, inspection, modification, and distribution by anyone. The rights to do all that are granted by the open source license, which is an IP license and legal agreement. So this is not just, you know, I've put it out there publicly, and therefore it is open source and anyone can use it. You do actually have to license it to then give those permissions to folks. So there's a long list of open source licenses that are approved by the Open Source Initiative, but they generally fall into two buckets. One is copyleft or reciprocal licenses. Those are going to require users to release any modifications made to the software under the same license. So if I download and install something, and I decide I want to make a change, and so I create some more code and want to contribute it back, that all has to be licensed under that same copyleft or reciprocal license. The other bucket is permissive licenses, and those really allow significant freedom to use the software for any purpose, including within commercial projects. Some examples of that might be the Apache, BSD, and MIT licenses. And Andrew will talk a bit more about licensing in his presentation. I will say copyleft licensed code can be used in commercial projects, but many will avoid it. So many for-profit organizations will avoid copyleft licensed code.
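To make the two buckets concrete, here is a minimal Python sketch of how an office might sort a project's dependencies by license type. It is an illustration only, not legal advice: the mapping is a small, hand-picked sample keyed by SPDX-style identifiers, and the dependency names (`libstats`, `fastjson`, `mystery`) and the `Custom-1.0` license are hypothetical.

```python
# Illustrative, non-exhaustive mapping from SPDX-style license identifiers
# to the two buckets described in the talk. Assumption: weak-copyleft
# licenses (LGPL, MPL) are grouped with "copyleft" for simplicity.
LICENSE_BUCKETS = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "GPL-2.0-only": "copyleft",
    "GPL-3.0-only": "copyleft",
    "LGPL-2.1-only": "copyleft",
    "MPL-2.0": "copyleft",
}


def bucket_for(license_id: str) -> str:
    """Return 'permissive', 'copyleft', or 'unknown' for a license id."""
    return LICENSE_BUCKETS.get(license_id, "unknown")


def summarize(dependencies: dict[str, str]) -> dict[str, list[str]]:
    """Group dependency names by license bucket.

    `dependencies` maps a dependency name to its license identifier.
    """
    summary: dict[str, list[str]] = {"permissive": [], "copyleft": [], "unknown": []}
    for name, license_id in sorted(dependencies.items()):
        summary[bucket_for(license_id)].append(name)
    return summary


if __name__ == "__main__":
    # Hypothetical dependency list for a research project.
    deps = {"libstats": "GPL-2.0-only", "fastjson": "MIT", "mystery": "Custom-1.0"}
    print(summarize(deps))
```

Anything that lands in the "unknown" bucket is a prompt to read the actual license text rather than a conclusion about it.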
So where is all this open source coming from on campus? So obviously it's coming from our faculty, from our researchers. So there are software ventures, right? Just like folks are starting companies around biomedical tools or all kinds of things, people are writing software intended for wide adoption, intended for potential commercialization. There is also a huge swath of software being created that is research software, and so that's going to be specifically tools or code or code libraries that allow researchers to generate data or analyze and make meaning of existing data. So that's going to be a really big tranche of the open source coming out of campuses. Students, of course, are writing tons of code, and lots of big open source projects that you might be familiar with started as student projects. Drupal and Linux are two great examples. University staff are using, contributing to, and creating open source to manage administrative tasks of the universities. So your university might have an open source repository for things like theses and dissertations or publications. There are open source learning management systems, integrated library systems, so big enterprise tech in use at the university, a lot of that is also open source. And, you know, your librarian might be a contributor to one of those. So obviously the answer is everyone, right? Everybody on campus who's writing code is probably writing some open source code, or certainly using open source code. So as folks are creating this code and choosing to add an open source license to it, there are a number of intersecting policies and mandates that affect their choices, and so I'll have some examples up for each of the things I'm going to talk about.
But I would strongly encourage you to look up your own university's policies, because, as you see, it's very different from university to university, and so you really want to make sure that, you know, you're following the rules of your own. I just want to ask my fellow panelists, I've got some other screens popping up. Can you see those? Or no, you can still see the slides? Okay. Thanks. Thanks, Jenna. So the first is university IP policies, right? Who owns what. And so some, like Hopkins, you'll see that the university owns the IP, right, that's coming out of Hopkins. If you're using university resources, if you're employed by the university, Hopkins owns your IP. On the other side, we have Georgia Tech, where creators own their own copyrights, right? They own their own copyrights, but they're granting a universal license to Georgia Tech. So again, you really want to make sure that you understand what your university IP policy is. That's contrasted with open access policies, which lots of universities have. So at Hopkins, the university expects that every scholarly article produced by a full-time faculty member be accessible in an open access repository. Open access meaning no paywall. You know, as soon as it's published, or even a preprint, you have to go ahead and put it in that open access repository. Versus George Washington, GW, where they strongly encourage you, right? It's not a policy, it's not a mandate. It's strongly encouraged. Georgia Tech, again, looking for that positive open access policy. Code is not yet included in many open access policies. Data is often becoming included. Code's not there yet. So that's something that we're working on, and that's something that a lot of OSPOs are really interested in, is expanding those open access policies so that it's not just the publication, but, for example, if there's a piece of software required to replicate the results of that publication, that that code itself should also be open access.
Unsurprisingly, the federal government has a big hand in this. So much of the research on campus is funded by federal grants, and if it is, it's subject to things like the Nelson memo from the Office of Science and Technology Policy, which requires that all publications resulting from federally funded research be deposited into open access repositories. So again, sort of regardless of what your university says, if it's federally funded research, the Nelson memo says you have to make it available, open access, and most recently, no embargo, even. It used to be you could wait 12 months. Now it's as soon as it's published, right? And so each division, so the National Science Foundation and the National Institutes of Health, NASA, everybody's releasing their own specific guidance. What repository should you put it in? Does it include data? You know, things like that. But we're seeing all of those roll out this year, those specific recommendations. Again, most at this time are only requiring the publication, and that is all the guidance from the OSTP says. But some agencies have gone a little further. So NASA does require that your code be included. And again, I think that as we move along here, we're going to start seeing data and code required, just like publications are, because they are an integral part of that publication. So individual universities also have mandates around open source disclosure. So some examples here are the Applied Physics Lab at Hopkins. So because so much of the work that they do is either Department of Defense, Department of Energy, or a private contractor, you can't just make something open source. You have to disclose it first, and then they will let you know if you can make it open source. MIT is another example, where they have an open access policy and they request that authors submit software disclosure forms.
Of course, as we all know, you can say authors must submit software disclosure forms all you want, but in reality, we all need to be proactive about going out, finding the work that has been done, and encouraging disclosure. So how might we, either the OSPO or the tech transfer office, encourage those disclosures? It's by offering support for open source creators, right? We want them to come to us because we can help them with what they're trying to do. And so I just want to talk through a couple of ways that OSPOs and tech transfer offices, separately or together, can help open source creators, again, to form that welcoming place where they want to come and they want to disclose because we're helping them improve their software. So one way is to offer support for open scholarship, right? That includes open access publications, data, OERs, open source code, all other forms of openness in the scholarly and research ecosystem. So support might look like helping folks with software citations, right? We're trying to get to a place where people can get credit for the code that they write. So just helping them write good citations, helping faculty create software management plans, you know, to include with grant proposals, might be some examples. We can also help by providing technical infrastructure. If we want people to share code in public repositories, we can provide support for platforms like GitHub. We can also provide support for skills that they might not have, that research software engineers might not have, things like continuous integration or security scans.

Speaker 2  13:48  
We can help open source code creators choose appropriate licenses for their work. So different projects have different long-term goals, so their license choices should support their goals. And I know Andrew is going to talk about this a bit more. So on the tech transfer front, for example, some licenses specifically call out patent rights, and for others, you know, those are just implied. So if you're looking to commercialize your open source software, the license you choose, you want to make sure it's compatible with your business model, and that's a place where a tech transfer office or an OSPO can really provide some help and support. We can help creators plan for the long-term sustainability of their code. It's relatively easy to get that first grant and write some code. It is harder to turn it into a program with end users, with a governance model, with sustainable technology. So OSPOs and tech transfer offices can bring that strategic planning expertise to software creators. And then, last up, we can help with community development. So there's plenty of open source software out there, and again, a lot of the research software really falls into this, that it's like a potted plant. You know, it was created for a research project. No one's ever really going to look at it again. Nobody's trying to reuse it. But for the projects with bigger ambitions, we can help with healthy community engagement. So one place tech transfer offices can lend expertise is with contributor license agreements. So these are going to be legal agreements between projects and contributors that explicitly grant those projects the right to use contributed code. So I should say that not everyone likes CLAs, but especially for projects that are thinking about commercialization, whether or not to have a CLA should definitely be a conversation, and again, something that the tech transfer office can help facilitate.
So with that, I'm going to turn things over to Andrew to talk more about IP and business models.

Unknown Speaker  15:47  
Great.

Speaker 1  15:50  
Thank you. And I think, to put this in context, the way I think about a lot of what Megan is doing is getting at the problem that access to code is necessary for doing good science and computational research. And so having a function that does the things that Megan described just enables sort of the fundamental mission of universities, which is, you know, doing good research and sharing the output of that. A subset of that research that happens on the university campus might be commercializable. And I think more broadly, we should think about, you know, there's a real problem in open source in general, but also, you know, for technology transfer offices, which is we're all really trying to solve the problem of making the research and the software that's created sustainable and reliably usable by the outside world for whatever it's doing. And so one way that Megan's getting at that problem is helping build better open source and helping people build communities around that open source that are self-sustaining, whether through volunteer contributions, but also through, you know, seeking foundation support or other types of sponsorship or collaborations with industry. The tech transfer office gets at that problem in a related way, which is, you know, we're helping people think through business models and come up with, you know, a commercially viable pathway to sustain that software, and both are really getting at the same goal. So in order to do that, though, I think there's some details about how intellectual property works within software and open source that need to be part of your analysis when you're in a tech transfer office.
And then, as I said before, I think thinking through what the model for that software can be earlier on, or at least not, you know, out of context of thinking about the IP, I think is important, you know, when trying to decide what you want to do with some technology that's disclosed to your office. So let's get into the IP side of things. Next slide, please, Megan. Thank you. So, you know, almost any computational research that comes across your office will have an open source component, and oftentimes, like, many, and a deep reliance on existing open source projects. And all these open source projects come with their own licenses. You know, there are terms to using them. Some of them have terms that, you know, make it very easy. These are generally the permissive licenses that Megan referenced. And then some of them have more onerous terms that limit or put requirements on how you can use that code. For research purposes, they're usually all fine. You know, just doing the research and being able to see what the software can do usually doesn't require a whole lot more thought. But if you intend to move that project out of the university, in software parlance, a distribution, a lot of the obligations of these licenses get triggered, and then you have to worry about, you know, what the implications of that are. So when you're thinking about how you want to license your project, or some of the licenses within the project that, you know, you want to make subject to, like, a normal tech transfer proprietary software license, you have to kind of understand these implications. You may not need to do a super deep dive, certainly nothing compared to, like, what industry does. But I think you want to know whether, you know, something is licensable or not before you go about spending time and energy trying to find a commercial partner.
So that's why I say, you know, we need to have some understanding of the project dependencies and the overall project license and the impact of those licenses on what you want to do before you start spending time trying to, you know, find licensees, or do a startup, or whatever the circumstances dictate. So next slide, please. So, a little bit more detail on the two general categories of licenses that Megan described. I think some of this is just fundamental information you need to have in your back pocket when you're thinking about this. So I think, for our purposes, there's, you know, two broad categories. There's reciprocal, or copyleft licenses as they're sometimes called, and then there's permissive licenses. And generally speaking, those names imply what the obligations are. So reciprocal licenses mean that there's some reciprocal obligation on you for getting to use that code, and those obligations differ depending on what license you use. And then permissive licenses mean, you know, there's still some obligations, but they're usually pretty easy to comply with, and so they don't cause much concern. When you're thinking about the software license, one of the main obligations that tech transfer offices worry about is how these licenses deal with patents. So a lot of, you know, tech transfer offices are founded on filing patents and licensing them to the outside world. And so when you start doing that with something that is also granting a license to that patent, you know, you can undermine the value of that license, or certainly affect the perceived value of that license, because, you know, the code that someone wants to use under that license is free for anyone else to use, regardless of your patent. So the top boxes there are examples of licenses.
The first, the GPL family of licenses, is broadly what we call reciprocal, and then there's a subset of older versions of them, the Lesser General Public License, or LGPL, and the standard GPL license, both of which mention patents in the introduction or outside the license language discussion, but the actual licenses don't expressly grant patent rights to the recipient. Similarly, the MIT license and the BSD license are generally considered some of the most permissive licenses, and don't talk about patents at all, whereas the latter category, the newer generation of the GPL family of licenses, and then one of the most popular licenses, the Apache Foundation licenses, they have express patent grants. And so you want to sort of, at a broad level, understand these things, because, again, if you're licensing software where you're relying on patents to drive the IP value in that, but the code that you're licensing is also granting anyone else the right to practice that code without a separate patent license, that can really affect the value of the patent license you're trying to pursue. Quick note that even for the licenses that don't expressly grant patent rights, there's a largely academic argument about whether a patent will be enforceable against someone using the code under that license. The basic idea is, if you're giving someone software to use, and you're not telling them you have a patent, you're forgoing your ability to then later enforce that patent against them. I don't want to get into any further detail about that. None of these things have been definitively litigated, which makes open source a very difficult area to say anything for certain about. But that's the general thinking in the field about how these licenses operate. So next slide, Megan. One more. Yeah. So we talked about, like, what is in your code. I took this diagram from Wikipedia. It's as good as any other that I've seen.
It basically talks about how all the licenses for the dependencies can relate to each other and which licenses are compatible and which are not. This might look a little intimidating as far as the number of different licenses out there. Largely, you want to focus on the things all the way to the left, MIT, BSD, Apache, and then all the way to the right, the GPL family of licenses, the LGPLs, and the Mozilla MPL licenses. You don't see that as often. But the basic gist of this is, if something's available to you under a certain license, any further license that you offer that same thing under needs to not contradict the license that you got. You know, you can't change the terms of the thing that you got. So the way this diagram works is, as you go to the right, you know, anything you start with on the left, you can go to any further license that's adding further restrictions to the right, and you're going to be compatible left to right, but you can't go right to left. So a lot of, for example, statistical software is available under the GPL family of licenses, and a lot under GPL version two licenses. It makes it very difficult to relicense that under any other open source or proprietary license, because GPL is one of the family of reciprocal licenses that say, you know, if I'm using this software under this license, any further modification or new version or redistribution I have of that same software, I have to include the same terms of the GPL license, and so you can't go to some more permissive or easier licenses on the left.
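The left-to-right rule can be sketched as a reachability check on a directed graph. The Python below is a simplified illustration, not a reproduction of the Wikipedia diagram and not legal advice: the edge list is a small assumed subset, the `Proprietary` node is added for illustration, and the dependency names are hypothetical. An edge A to B means code received under A may be included in a work distributed under B.

```python
from collections import deque

# Assumed, simplified compatibility edges (a small subset for illustration).
# GPL nodes have no outgoing edges: you cannot move from them to anything
# more permissive, and GPL-2.0-only and GPL-3.0-only do not flow into
# each other.
CAN_FLOW_INTO: dict[str, set[str]] = {
    "MIT": {"BSD-3-Clause"},
    "BSD-3-Clause": {"Apache-2.0", "GPL-2.0-only"},
    "Apache-2.0": {"GPL-3.0-only", "Proprietary"},
    "GPL-2.0-only": set(),
    "GPL-3.0-only": set(),
    "Proprietary": set(),
}


def can_relicense(source: str, target: str) -> bool:
    """True if `target` is reachable from `source` by following edges."""
    if source == target:
        return True
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in CAN_FLOW_INTO.get(current, set()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def blocking_dependencies(dependencies: dict[str, str], target: str) -> list[str]:
    """Names of dependencies whose license cannot flow into `target`."""
    return sorted(
        name for name, lic in dependencies.items() if not can_relicense(lic, target)
    )


if __name__ == "__main__":
    # Hypothetical project: which dependencies block a proprietary license?
    deps = {"stats_lib": "GPL-2.0-only", "http_lib": "MIT", "ml_lib": "Apache-2.0"}
    print(blocking_dependencies(deps, "Proprietary"))
```

A license identifier missing from the edge list is treated as blocking, which errs on the side of flagging it for human review.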

Speaker 1  25:10  
Big picture, I think you know, if you're a corporate software company, you're doing a deep dive on this all the time. You're paying for code based analytics tools like black duck or Fauci or any of the other types of things out there, and you're having a good understanding that all the licenses that make up your code are compatible and are going to work for what you want to do. I think for us, we just want to look out for stuff that's problematic, because we're not. Universities aren't selling software. We're largely selling research that has code attached to it, and so we just want to make sure that some important parts of it aren't going to be unavailable for commercialization. So you're looking for largely GPL problems with your code, or anything where you're purporting to be able to use it over something on the left, when major components of it are listing licenses on the right. So next slide, please. Megan, so that's kind of backward looking. This is another nice diagram that is taken from Wikipedia that describes kind of how, when you combine a project, the licenses interact, and what flexibility that does or does or does not give you going forward. So you know when, when? When faculty take existing open source software and they do something new with it. So they'll take a machine learning model that's under an open source license and they'll build, they'll, they'll adapt it to a particular use case and build some some implementation that makes it useful for, I don't know, like diagnosing cancer or what have you, they'll be adding new code, and they'll be combining new code with this existing open source code. And at the top is, you know, the new stuff you're doing is represented in orange, and then the open source code is represented in varying shades of green, and if you look at the bottom, you know they'll talk about once you combine that work, what are you doing? 
Are you creating what's considered a new work under copyright law that triggers certain obligations under the reciprocal licensing regimes? Or are you merely combining those works in a way that doesn't do so, I would say, for the GPL family of stuff there, it's very hard to determine when a work is a derivative work versus when it's combined in a sense that doesn't form a derivative work. Some people will tell you they know the answer to that. None of that has been definitively tested. And in general, I think the default is you're going to assume that the reciprocal obligations are going to attach. If there's any copyright software experts in the call who have more, stronger opinions about that, I'm interested to hear them. But I think that's largely how universities, you know, have a capacity to think about it. So we're looking more at the stuff on the right, as far as giving you freedom to license it as you see fit. So you'll see on the far right, when you have permissive components, whether you're creating a new work, a derivative work, or you're just combining works in different ways that they be considered separate works, you still have a lot of flexibility on how you license that and how you can think about different ways of distributing the software that you create. So I wanted to touch on that. So next slide, please. Megan, regardless of the open source license, I think it's important to think about what patents do versus what code does, as far as how much you're going to worry about the implications of all this. So, you know, as a university, we file lots of patents. We file lots of patents on software. We also have a lot of open source software. When this open source software that you're using comes with a patent grant, whether express or implied any any licensee of those patents in that code are also getting permission, you know, getting permission to practice those patent rights. 
So when you go to proprietary license something to a third party and tell them, Well, you have the exclusive right to practice this patent if practicing that patent is covered by what the code does, or if you know the valuable stuff that that code does is, is what you know what the patent is focused on. You're not really giving them a lot of exclusive rights. However, there's this whole field to the left of what a lot of patents do versus what code does. And I think this happens a lot more than we realize. You know, when you write a patent claim, it covers a lot of different things, sometimes, or it recovers implementations or certain methods or applications in the real world that are broader than what the code actually does. The code a lot of times for stuff I see is a, you know, machine learning model that claims to give some indication of of whether, you know, a material has a crack. In it, or if you know if a person has a cancer cell in them, but it doesn't tell you what to do with that. It doesn't report that out. It doesn't make a diagnosis, or it doesn't compare that to other work, or give some sort of output. And a lot of those other activities are covered in a patent claim, but that the software doesn't do, and so that stuff outside of what the software actually does can still be valuable patent rights. And so when you, I guess, to wrap that side of it up, like a lot of times, you'll get a package of technology, and there'll be some stuff will be really like, patentable subject matter. You'll think there's a valuable patent there. And then other stuff will be, like, minor implementations, or maybe some scoring method or or, you know, separate components of the software that makes up the technology that the faculty want to open source license because they're useful, but they're not really driving the product. 
And I think simply because there are components or parts of it that you know they want to open source doesn't mean the patents that you could file on it aren't also valuable, but you have to understand what the software does and what the different components does, so it raises the level of investigation and consideration when you get a disclosure in for this type of project. So there's a lot more I could talk to about this, but I just wanted to highlight kind of how patents and open source can work together and logically, sort of what happens when you have both patent rights and an open and copyrights in an open source code at the same time. So next slide, please. So I wanted to say one thing about this, and I'm going to quickly talk about some business models. So the other thing that's difficult on universities is because we don't, oftentimes have a lot of control over what is done with code and where it sits and who uses it. When you license or purport to exclusively license a project you are, you know, promising that outside party that they're the only one who gets to use this. And there are, you know, there are circumstances where faculty may not be aware of that or may not understand that. 
And some technology package that you exclusively give to some outside company could include code that is valuable and useful to them that the faculty later, whether because of their grant obligates them to, which is a lot of cases, or their publication obligates them to, they then share that code under an open source license, which could be giving, could be removing the exclusivity for some aspects of what you promised exclusively, you know, or faculty, just over time forget, or you know, don't, don't understand, and as a matter of course, are publishing the research open source, so exclusive licenses and and when you're exclusive, proprietary licensing, when you have software involved, that's an important part of what you're doing, they require, like, a degree of management of, or at least understanding by all the parties and the faculty in the lab that's building it, about what the ongoing obligations are of that product. And I think, as a result, what, what, you know, I see a lot is, is really not a willingness to license copyrights exclusively. You know, the probably the one you know, the case where it might make the most sense is where you have, like, a fact, a startup that's tightly tied to the faculty, and they have a real, under good understanding about why open source this code is or is not beneficial to them, and the interests are aligned with that faculty, and they're managing that project closely. And you know that if you exclusively give these copyrights to that startup, but the faculty are not going to take it and use it for something else later and publish it open source and and you can manage it that way, but, but outside of those narrow contexts, I think it's really hard for tech transfer offices to manage this kind of stuff, and so it's generally difficult to grant open to grant exclusive licenses and copyrights and code. All right, Megan, can we talk about business models real quick? I'm mindful of our time. 
So, a couple of high-level things. Merely open sourcing your code doesn't create sustainable code, regardless of your model. If your goal is to have impact with your code, slapping an open source license on it without more doesn't do a whole lot. You need to have some model for how that's going to be sustained and maintained and provided, so that you get the impact that you're looking for.

Speaker 1  34:30  
Everything I just said about normal software licensing is important, because all of the software that we're getting at this point has open source components. But if you have a project where they're proactively thinking about starting a business around an open source project, and want to create a commercial plan for open source software, you really need to think about the business model even more carefully, and think about some of this diligence more carefully. The last thing I would say is that a way to sum this up, which you hear a lot in the open source world, is: your project is not your product. What you sell is your product, and the open source aspect of what you sell is your project. They can coexist, but they are different, and they need to be treated differently, and IP and business models and your contracting structure and all those sorts of things need to work together to maintain that distinction. Next slide, Megan. So I'm going to go through real quickly some of the known business models for open source software, and then I'm going to turn it over to Fiona, who can talk about her actual company that pursues one or more of these models, and give you a little bit more of the background. The most general and well-established model, or historically established model, you hear called different things, but it's basically services and support. The way this works is, the general problem with open source software is it's free to use. So when you are trying to sell something to someone that's free to use, you need to be selling them something else besides what they can already get for free. And that something else can take lots of different forms, but generally that all resolves into what you define as your product.
So the services and support model, largely, is that you are going to build and provide an open source project and support it, along with the rest of the community of people who are using it, and keep that open source. But then, whether because you are the world's expert in it or because you have particular resources that make you more efficient, you are also providing services and expertise and technical support to people that are relying on it. A lot of times that comes with normal business warranties or other types of promises that industry likes to see when they're using a product. So you're professionalizing an open source product, and then you're charging for that. The paradigm example of that is Red Hat, probably still one of the largest, if not the largest, open source companies, and this is largely what they do. Next slide, please, Megan. Related to that, in a way, is the software as a service model, where you still take an open source project and you're still largely professionalizing delivery of it, but you're using technology to support that. So generally, you're hosting some software service that someone is logging into through the internet and using, or you're also maybe providing data storage or some type of compute for that data, as well as the outputs. But largely, you're providing access to computations that you're running and controlling. You're not giving software to someone else, so there's not really a license. You're not really selling software. You're selling the benefits of software to someone else. And so again, this is how you can find revenue from something that is otherwise freely available. Next slide, please. Dual licensing is something that universities can do, probably more easily than other types of models. A lot of these are models we would advocate for our startups, but this is something that can happen on campuses themselves.
You see a lot of computational research come out of universities. Sometimes you can attach these reciprocal copyleft licenses, or universities often have non-commercial, academic use licenses that they'll use, which basically say you can use this, but you can't do commercial things with it, and if you want to do commercial things with it, you've got to come back to us for a separate license. So in this case, you are selling software. There is a license to code, and it's done like a normal tech transfer license, but you're selling commercial rights, basically, to use code that is otherwise available for non-commercial purposes for free. There are lots of companies that have pursued this as a business model. And I imagine a lot of universities on this call have projects that have some reciprocal or non-commercial license that is published and used, but are also gaining revenue from normal software licenses to commercial customers. Next slide. Real quickly, there's a broad category called open core, which I think of as a mix of the different types of things I've already talked about. Other folks are more serious about it. This model basically says you have an open source project. There's a component of it at its origin that is open source, that is useful, and that is maintained by the company, but the company builds proprietary parts on top of it. The value that they are getting back is that they may distribute a project that has open source components, but the proprietary components are valuable enough that people are willing to pay for access to them. And so there are different versions, or different degrees to which you have a closed source project versus an open source project. But again, the idea is you're building proprietary parts that other people will pay for on top of what is otherwise freely available to folks. Next slide.
Lastly, something that I think is intuitive, but I don't think people recognize enough that it's really a business model: be a normal software company. Sell closed source software, or sell a SaaS product that does something valuable that people will pay for, but that gets better, or is optimized, by an open source project. You get revenue from one and you support the other. The reason companies do this is that oftentimes the open source project gains customers for them, or provides an access point where you don't have to pay anything to use this open source project. It's doing something useful for you, and if you really want to use this reliably, we have this other proprietary thing that really amps it up or does something special or makes it work better in some sense. You can think of Kubernetes by Google as doing this, where they basically made containerization in the cloud very efficient and useful. It's been wildly successful, it's totally free for everyone to use, and a large portion of those people are also paying for Google Cloud services. There's a bunch of revenue that comes to Google as a benefit of that. Next slide, please. Again, I'm going to skip over consortiums. I think it's really important, but it's almost a presentation unto itself. So in the interest of time, let's go one more slide. That's it. Okay, so that was a quick dive on IP and software licensing and business models, and now I'll turn it over to Fiona to talk about DataStax.

Speaker 3  41:46  
Thanks, Andrew. Fiona Kaufman, I'm Deputy General Counsel of DataStax, and happy to be here today. Megan, do you mind going to the next slide? So what is DataStax, and what do we do? At a high level, we say we provide data-driven solutions to enable customers and our developers to create generative AI applications at scale. We offer both an on-prem solution and a hosted cloud solution for organizations to build these applications. We also provide support for both open source projects and proprietary software, and we are a major contributor to open source projects on the code we work on. Next slide, please. So how did DataStax start? One of our founders initially found the Apache Cassandra project very fascinating, and at the beginning they felt that there was something there that they could really dig into. So initially, we started as a company providing support and training for Cassandra. The goal, and our goal continues to be, to find pain points in open source software and then figure out what is something someone would pay for, and add that to either our product or to the community. So we feel like we're in a really unique position where we can add value to open source projects by solving for pain points, but also create revenue-generating products for our company. And the main goal of that, what we're driving, is adoption. At our core, we offer database software, and proprietary database software is very expensive. So when a company can use an open source component for that, it becomes appealing because of the cost differential. So when we provide solutions to the pain points of using open source software, we're really appealing to a much larger market, and it allows customers to lower their costs for database software. Next slide, please. So what is our philosophy at DataStax around open source? Our main projects that we contribute to today are Apache Cassandra and Apache Pulsar.
We do have a couple of open source projects that we own as projects, and we continue to be very, very active in the communities. As a company, our philosophy is we want to be a good open source citizen. A key component is that we don't want to be adversarial, and we want to be able to give back to the community. So we really encourage our engineers and developers to publish tools, scripts, demos, and other projects to the open source community and to the project itself, and it's really part of our business strategy to donate code and contribute features back to projects. One of the key aspects for us, though, is that when we work with open source projects, we end up learning how to improve and build our own products, and we have the flexibility to do that. We're able to test with customers. We're able to see what works and doesn't work. We're able to push that out as, say, our own product, but then, over time, we're able to give something back to the community once it's tested and solid and ready to go back to the project. Next slide, please. So how do we work with open source projects? We actually have resources within our company that are committed to both Apache Cassandra and Apache Pulsar. We publish testing papers. We have a lot of our own employees who work exclusively on the open source projects. And we also have a very strong advocacy program in the community. We do a lot of community development. In fact, today we're hosting an event on AI agents and development of an agent. So we're very committed to building foundational aspects from the developer community on up, and that's really important to us. Next slide. And then, how do we use open source within our products?
As Andrew said, at our core, our products are based on open source, and we build around it. The proprietary software is built on top of the open source, and our value add is that we build the code, but we also test it and certify it, and we package it for customers so they're able to use the open source components along with our proprietary software components. We've found that customers are very okay with using a mix of open source software, proprietary software, hosted software, and on-prem software, so we are able to really cater to a specific customer's needs, whether they need an on-prem solution or a cloud solution, and then also provide the support for both the open source and the proprietary software. One aspect that's really important to us, though, is that we want to try to maintain synchronization with the open source project. We don't want our proprietary piece to deviate too far from the core. I think we've seen other companies deviate too far, and they can't come back to the core of the open source software. So it's actually very important to us that we still maintain a lot of synchronization with the basic open source part of our products. Next slide. And then what do we do to keep our proprietary software safe? I think this all really comes down to having internal procedures and policies in place to ensure that what is happening within the company is able to stay proprietary software. Some examples are having strong policies around security, secure coding practices, secure configuration management, annual trainings, and obviously making sure that, at some level, legal is brought in to review any license or legal requirements on what our developers and engineers are doing. Next slide. And then lastly, I just thought I'd share some customer stories on how large enterprise companies use our software. These are all our DSE product.
These are all on-prem solutions that are deployed to these companies' own infrastructure. I'll just point out I actually really like the Home Depot story. It launched during the pandemic, where they used our product, DSE, and were able to launch curbside pickup within their own app within 30 days, and ended up having a 100% increase in their digital platform sales. So to me, that's just a real-life example of what our software does. For a non-engineer, saying, oh, we host database software, people don't know what it is. But you say, oh, you know the Home Depot app, they were able to do this, and it's very, very concrete. So for some of the other ones up there, there's a lot more on our website. If anyone's interested, I would encourage you to go and check out the customer stories. And I think with that, we'll open up for questions, if there's any left.

Speaker 1  48:41  
Yeah, that was great. So whoever put the first question in the chat, I apologize. I think I killed it before I answered it. So if you want to repost it, I forgot what it was, we can try to answer

Unknown Speaker  48:59  
something happened there.

Speaker 2  49:04  
I think they're all still there, Andrew. I think if you just go to the answered questions, they'll still be there. So the first one was whether the offices that require disclosure have a special disclosure form. So yes. Most often I've seen web forms. Some others have a Word template that you can download, depending on what sort of information you're trying to get from the creators. And then new versions, right, that's going to be office to office, depending on how significant the changes are to an application, whether it requires an entirely new disclosure. So a new version, for example, that's just a security update, or even just adding new features and functionality,

Speaker 1  49:47  
probably wouldn't. Yeah, I'd add to that. I know several universities have separate disclosure forms or submission processes for software versus other technology. I think it's a good idea. I think we might do that eventually, largely because I think you need additional or different information than you typically get under a normal disclosure form. To the question about exclusivity language and reps and warranties, I think you certainly don't warrant anything more. When we license stuff, we are still completely buyer beware, hands off. We don't guarantee anything: that it works, that we own it, that there's any IP or no IP in it. I think that gets to what we talked about as far as looking at dependencies and things. I don't think you want to say anything about what is or isn't included in your licenses, but I do think you want to understand it, because any sophisticated licensee is going to have a look at it before you license. And I think you want to encourage them to have a good understanding of what the state of the code is before they pay for it, because otherwise that only creates problems down the line. So you're doing a lot of that diligence just to make sure there are no surprises when they look at it that are going to be game breakers. The next question: researchers may not have a good understanding of the provenance of code components. Are there good practices or approaches? Megan can probably talk a little bit more about this. GitHub has a built-in function, I think it's called dependency graph, that does a decent job. If someone has their code in a GitHub repository, it'll give you a good start at understanding what the dependencies are and what the licenses attached to those dependencies are. Megan, do you have anything to add to that?

Speaker 2  51:54  
Yeah, there's also a good tool or website, and I can put it in the chat, called deps.dev. They go and scrape all public repositories, like on GitHub and GitLab and so on, so you can go on there and find both a dependency listing and also all of the security updates and things like that for your code. One thing I will say, coming from the university context, as Andrew recently discovered, is that R packages do not have a standard way of mapping dependencies. So that's a little bit more of a manual process. In that case, we would just really strongly encourage that your faculty who are building R packages dynamically link out to other packages, rather than having them be part of their code and pulling them in, again, just to avoid getting into the reciprocal license situation. So with R packages, it's a little more complex. But again, deps.dev is a great site where you can put in your repo and see a list of dependencies and vulnerabilities.

Speaker 1  53:10  
Yes, seconded. If you want to quickly learn about dependencies, try to untangle someone trying to relicense an R project under something outside of the GPL ecosystem. It's basically impossible. Aiden, I don't think I understand your question. Can you elaborate? Aiden asks, how are you thinking about commercially licensing non-commercial open access data? You mean just generally, like treating a data license as a dual licensing context? So you've published data under a non-commercial license, for open access purposes, but someone wants to commercialize it. I'm going to guess that's what you're saying. I think, so long as all the other data concerns are met by institutions, and institutions are very careful about how data moves, and as long as you've done the diligence on all that, the terms of a data license will look different than a software license. I think you're going to want a stronger contractual basis for enforcing what the licensee can or can't do with that data. And you're of course not going to rely on any copyright interest that might exist in it, because that's fairly thin in most cases. But in general, the concept I agree with. I think I always just jump right to human subjects data, which is just a mess for us. It's very difficult to do that safely and ethically. So I tend to react poorly to thinking about data licensing, but assuming you've overcome all those hurdles, I think that makes sense.

Speaker 1  55:17  
Okay. I think Do we have more questions? Oh, there you are, okay.

Speaker 1  55:34  
Oh, is that the I haven't used that site, Megan. It's interesting. I

Unknown Speaker  55:46  
Okay, I think we're done with questions. Donial, do we have anything else to do here? Or

Speaker 4  55:54  
Surely so. Thank you, Andrew. On behalf of AUTM, I would like to thank our panelists for the informative presentation today, and thank you again to our sponsor, Marshall Gerstein.

Transcribed by https://otter.ai