
Open Genetic Databases Undermine Access and Benefit Sharing

Genetics 2023-03-07, 10:26pm


Genetic resource 5. Pic by Neil Palmer (CIAT). Creative Commons.



We are pleased to share with you a new Briefing Note on the nature, role and impact of databases and their current practices on the fair and equitable sharing of benefits arising from the utilisation of genetic resources. These issues are particularly important in the discussions around digital sequence information (DSI) on genetic resources, for which the Parties to the Convention on Biological Diversity (CBD) have explicitly agreed to share the benefits arising from its use.

Further issues for consideration will be taken up by the Ad Hoc Open-ended Working Group on Benefit-sharing from the Use of Digital Sequence Information on Genetic Resources, including the issue of “principles of data governance”. This Briefing Note attempts to show why discussing these principles is important. The CBD Secretariat has also called for written inputs from Parties, other Governments, indigenous peoples and local communities and relevant organizations before 31st March 2023.

- Third World Network

“Open” Databases Undermine Access and Benefit Sharing

by Nithin Ramakrishnan and Chetali Rao

Introduction

Parties to the Convention on Biological Diversity (CBD) adopted, at their 15th meeting of the Conference of the Parties (COP 15) in December 2022, a decision on “digital sequence information on genetic resources”. Decision 15/9 calls for a solution to the problems and challenges of fair and equitable sharing of benefits arising from the utilization of digital sequence information (DSI) on genetic resources, without prejudice to the rights and obligations under the CBD and its Nagoya Protocol on Access and Benefit Sharing. It establishes a multilateral mechanism for such benefit sharing, including a global fund, but recognizes that exceptions to the solution may be identified during the course of further analysis.

In this regard, an Ad Hoc Open-ended Working Group on Benefit-sharing from the Use of Digital Sequence Information on Genetic Resources has been established to look into issues for further consideration, and to develop and operationalize the multilateral mechanism in line with certain parameters established in the decision. The issues for further consideration are listed in the Annex to Decision 15/9 (see Annex). The CBD Secretariat has also called for written inputs from Parties, other Governments, indigenous peoples and local communities and relevant organizations before 31st March 2023.

One of the issues for further consideration is “principles of data governance”. In the run up to COP 15, there had been very little discussion regarding the operations of current databases in the previous Working Group negotiations on DSI, and in the Informal Co-Chair’s Advisory Group on DSI on Genetic Resources (IAG). The policy options contained in the IAG report had also been developed and analysed without taking into account the role of databases in providing access to DSI and in sharing the benefits arising therefrom.

Decision 15/9, on the other hand, makes some references to databases.

First, it recognizes “further the value of depositing data in public databases”.

Second, it welcomes “the efforts of databases, including the International Nucleotide Sequence Database Collaboration, to encourage the tagging of records with information on geographical origin”.

Third, it acknowledges “the FAIR and CARE principles, the framework for data governance provided by the Organisation for Economic Co-operation and Development ‘Recommendation on Enhancing Access to and Sharing of Data’, and the recommendations set out in the United Nations Educational, Scientific and Cultural Organization ‘Recommendation on Open Science’”.

Fourth, it notes that “the differences between public and private databases should be considered in the development of a solution on benefit-sharing from the use of digital sequence information on genetic resources”.

Finally, in an operative paragraph, it “encourages the depositing of more digital sequence information on genetic resources, with appropriate information on geographical origin and other relevant metadata, in public databases”.

Nonetheless, the nature, role and impact of databases and their current practices on the fair and equitable sharing of benefits arising from the utilisation of genetic resources are still not well understood or clarified. This briefing note seeks to address some of the issues by examining the current practices of databases and their implications for fair and equitable benefit sharing.

Current practices of databases undermine access and benefit sharing

Access to genetic resources provided to both commercial and non-commercial researchers and users from different parts of the world is governed by domestic legislation, which is generally in line with the provisions of the CBD and the Nagoya Protocol. Researchers and users who access these genetic resources are obliged to share the benefits they generate from utilising them fairly and equitably with the country of origin, usually through a benefit sharing agreement signed at the time of access.

The same is applicable when specific access and benefit sharing (ABS) mechanisms are developed by international organisations such as the World Health Organization (WHO). For example, pharmaceutical companies that develop pandemic influenza vaccines are required to share a specified quantity of doses of vaccines with the WHO, which facilitated access to virus strains for these companies, based on standard material transfer agreements. This is governed by the Pandemic Influenza Preparedness (PIP) Framework adopted by the World Health Assembly on 24th May 2011.

Although several Parties to the CBD and the Nagoya Protocol have domestic legislation governing access to genetic resources and their physical transfer from one country to another, several of them do not currently address the use of digital technology and bioinformatics explicitly. Such technologies allow researchers, academia and industries to utilise genetic resources simply by using the genetic sequence information digitally, in other words, DSI. Developments in the field of molecular biology, fuelled by high throughput sequence screening approaches, have led to a proliferation of databases which are being continuously accessed for free.

Due to the growing scientific advancements and developments in the field, cross border physical transfer of genetic resources is no longer required in many cases. Consequently, scientific and academic researchers and their commercial counterparts escape requirements to sign benefit sharing agreements, which they would otherwise undertake during the physical transfer of genetic resources.

Much of this free information also lands in the hands of ventures with commercial interests, which then profit from these shared resources. The lack of any tracking mechanism allows them to eschew fairly and equitably sharing any benefits derived from the use of such data. Similar concerns were noted by the PIP Framework Advisory Group in 2018. Its meeting report explicitly records that the Advisory Group finds this circumventing of contractual agreements a concern.

Databases and their workings

There are different kinds of databases, such as primary, secondary and collaborative databases. Primary databases are mostly free, for example the Global Initiative on Sharing All Influenza Data (GISAID). Then, there are secondary databases which curate the information mostly received from the free databases and provide a little more value-added content and services. Many of them obtain monetary benefits from such services. For example, databases like PROSITE, a protein database curated by the Swiss Institute of Bioinformatics, charge their commercial users a subscription fee. There are also database collaborations such as the International Nucleotide Sequence Database Collaboration (INSDC), which combine resources from various other databases. In addition, there are “in-house” databases hosted by various biotechnology industry players.

A major portion of genetic resources or knowledge associated with them, including DSI, is accessed through three primary database systems. These databases collate information based on data types, heterogeneity and scope. They collaborate with each other and together form the INSDC: (1) the DNA Data Bank of Japan (DDBJ), based at the National Institute of Genetics, Japan; (2) GenBank, based at the National Center for Biotechnology Information (NCBI), United States; and (3) the European Nucleotide Archive, based at the European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), United Kingdom. (EMBL-EBI is an intergovernmental organization with 21 Member States.)

These primary databases mostly store raw sequences and structural information provided to them through scientific research, and the data are mostly archival in nature. There is a continuous exchange of DSI among the three databases. The datasets contained in all the three databases are identical but may differ in terms of the user platforms or in the analysis they provide, which users choose based on the scientific analyses to be carried out.

These primary databases, sponsored by developed country governments, for-profit corporate organisations, and philanthropic organisations, share DSI from different countries with their users. However, this is done without first checking whether the DSI was uploaded by persons or entities with authority to give or receive data under their respective national laws. For instance, the databases make no effort to verify whether the DSI is sourced from genetic material accessed in accordance with the rules and regulations governing such materials, including the CBD and the Nagoya Protocol. They do not require users to sign user agreements. Some of them do not even require user registration, i.e. the data is freely available without any registration, login or user fee. They do have general terms and conditions for the use of their websites, but they do not conduct any due diligence to ensure that their platforms do not become instruments of illegality that potentially violate national biodiversity laws and regulations.

The bulk of information stored with these databases is thus widely disseminated on the pretext of open public access for scientific advancements, allowing scientific users, as well as their industrial counterparts and commercial entities, to freely access the information and exploit the gaps in national laws and lack of data governance rules. The “anonymity” and sometimes “confidentiality” provided to users allow data downloading, transfer and use of the dataset, and processing by several kinds of users, including those who may be infringing their own national laws, knowingly or unknowingly. It must be noted that when the WHO PIP Advisory Group’s Technical Working Group on the sharing of influenza genetic sequence data conducted a survey amongst these databases, GISAID EpiFlu Database and OpenFluDB admitted they have unidentified users. In fact, GISAID specifically pointed to “unidentified not permitted” users.

Secondary databases, which are essentially commercial databases, process and produce curated information. They largely collect their input data from primary databases. These databases employ complex algorithms and computational and manual analyses to derive information and provide them in a curated format. Access to such databases is not “free of cost” but comes with monetary considerations in the form of user fees or licenses. They may also be in the form of specialised databases hosting specialised features like model organism-specific information, among others.

Tentative annual subscription rates for single academic users of some of these commercial databases are recorded below (data does not reflect the current pricing):

Database                            Tentative cost

Strand NGS                          $4500
CLC Genomics Workbench (Qiagen)     $5500
Full Lasergene Suite (DNASTAR)      $5950
Sequencher                          $2500
NextGENe                            $4049

In addition, there are also in-house databases which are owned privately by companies. They use the information available in public databases like INSDC and integrate it into their in-house databases. Such in-house databases download bulk data from public databases at regular intervals and curate them according to their organizational needs. All such private in-house databases are only accessible to their own personnel or affiliates, to whom they apply specific terms and conditions for processing of the data.

Primary databases, like GISAID, and collaborations, like INSDC, claim a high moral ground by highlighting their contribution to open access to knowledge. They also excuse themselves from all responsibilities by arguing that they are only depositories and are not utilising the genetic resources themselves. They contend that they do not violate any rights and obligations since they declare in their terms and conditions that there may be third parties who hold rights over the data such as patent rights, copyright, and other intellectual property rights, biodiversity-related access and benefit-sharing rights, etc.

Nevertheless, these databases ingeniously obscure the fact that by retrieving the genetic sequence data or information from the genetic materials, they have in fact already accessed the DSI. This accessing or sharing of DSI is largely done without the authorization of the respective authorities of the countries of origin of the source material. Scholars note that “obtaining digitalized information about genetic resources and their genetic or biochemical composition” is one of the practical ways of accessing genetic resources.

No commitment on, or conditions for, benefit sharing

It is clear that these databases are, in effect, accessing genetic resources without complying with the conditions of access as regulated by national authorities. The access they make and provide to genetic information may also be facilitating illegal exploitation of genetic resources, especially those from developing countries, which have limited ability to monitor the uploading of data into the databases.

For example, several secondary databases tap into the free databases and then sell the data after some improvements and curation. It is not clear whether they share the monetary gains with the actual providers and their countries even though this is arguably within the scope of “commercial and other utilisation of the genetic resources” under Article 15(7) of the CBD and Article 5 of the Nagoya Protocol. Nevertheless, since there is no agreement with databases for sharing the benefits, returns seldom accrue to provider countries. According to officials from the Brazilian Department of Genetic Heritage, who spoke at a side event on ABS initiatives at COP 15 in Montreal in December 2022, no database currently shares any monetary benefits with the department.

Even in this scenario, INSDC, currently supported by the governments of the United States, Japan and 21 Member States of Europe, asserts:

“The INSD has a uniform policy of free and unrestricted access to all of the data records their databases contain. Scientists worldwide can access these records to plan experiments or publish any analysis or critique. Appropriate credit is given by citing the original submission, following the practices of scientists utilising published scientific literature.

The INSD will not attach statements to records that restrict access to the data, limit the use of the information in these records, or prohibit certain types of publications based on these records. Specifically, no use restrictions or licensing requirements will be included in any sequence data records, and no restrictions or licensing fees will be placed on the redistribution or use of the database by any party.”

Similarly, GISAID’s Data Access Agreement requires that the user:

“grant (i) GISAID and (ii) all users and Data providers that have agreed to be bound by the GISAID EpiFlu™ Database Access Agreement and that continue to abide by its terms (collectively “Authorized Users”) a non-exclusive, worldwide, royalty-free, and irrevocable licence to collect, store, reproduce, access, modify, display, distribute, coordinate, arrange, and otherwise use the Data submitted by You (user) as contemplated by this Agreement.”

This means the Parties to the CBD such as Japan and E.U. Member States, which fund these databases, are acting inconsistently with their benefit sharing obligations under the CBD and its Nagoya Protocol.

“Open” or “illusively open”?

Furthermore, the so-called open access to DSI through the existing databases is not really “open access”, since these databases use their terms and conditions to carve out a unilateral right to suspend access to any user for any reason whatsoever, without even disclosing it to the user. More significantly, most of these databases are hosted in developed countries, which often exercise a high degree of sovereignty over their digital space. The monopolistic and cartelistic nature of the rights exercised by the developed countries may have long term effects on data access. For example, a political fallout between countries may result in complete cessation of data access.

INSDC’s website is maintained by EMBL-EBI and the terms of use read as follows:

“EMBL-EBI will make all reasonable effort to maintain continuity of these online services and provide adequate warning of any changes or discontinuities. However, EMBL-EBI accepts no responsibility for the consequences of any temporary or permanent discontinuity in service.”

Similarly, GISAID also reserves its rights as follows:

“Without limitation of any other term or condition of this Agreement, You (users) acknowledge and agree that GISAID may, subject to any applicable laws, suspend access to all or any part of the GISAID EpiFlu™ Database and/or Data without any prior notice or liability to You (user).”

GISAID is a database hosted by the Federal Republic of Germany with technical facilities provided by its Federal Office for Food and Agriculture through a public-private partnership with Freunde von GISAID e. V. (GISAID), a registered not-for-profit association. Access and transparency concerns have been raised with regard to GISAID.

Therefore, it is clear that neither “open access” nor fair and equitable sharing of benefits arising from the use of DSI is guaranteed by the current practices of databases. Instead, these databases undertake a large-scale collection of genetic information and resources and maintain control over its storage and distribution, with an obvious right to retain perpetual access to the data submitted to them. In fact, EMBL-EBI explicitly retains a right to permanently store data submitted to it. Its terms of use state that “when you (a user) contribute scientific data to a database through our website or other submission tools this information will be released at a time and in a manner consistent with the scientific data and we may store it permanently.”

Inability to exercise control over downloadable data?

Due to growing industry and academic partnerships, the line distinguishing scientific or academic research from commercial research has become blurred. The fluidity of information exchange from the publicly available databases makes it impossible to determine the nature of the final product or research use that the information will result in. As an example, a sequence which is accessed purely for academic research purposes may eventually pass through multiple users and ultimately end up being used by multiple commercial ventures.

Databases and scholars who prefer unconditional access to DSI often offer a convenient and popular excuse: that technology is unable to exercise control over downloadable data and its uses. While this may be true to an extent, these databases currently utilise outdated data technology models which allow data downloads by all users without conditions on data destruction or on commercial and non-commercial benefits derived from such shared data.

In reality, science and technology are far more advanced than this in the field of data access or sharing. It is even possible that data access can be executed without options to download. Interestingly, the industrial and commercial proponents of open access and open science, when they share data amongst themselves, implement better terms and conditions for avoiding misappropriation of data, protection of confidentiality, subsequent utilisation of data other than for the purposes mentioned in their agreements, and the sharing of benefits etc.

Nothing prevents or makes it difficult for databases to enter into agreements with data uploaders at the time of uploading the data, and also to enter into agreements with data downloaders before providing the download options. This would ensure minimum legal certainty for fair and equitable sharing of benefits arising from the utilisation of data.

The Technical Working Group on the Sharing of Influenza Genetic Sequence Data has submitted recommendations/options underlining optimal characteristics of an influenza genetic sequence data sharing system under the PIP Framework. It calls for databases to provide “users with a data access and use agreement that contains a statement that outlines the expectations of the PIP Framework, and requires users to agree to the terms of the data access and use agreement.” This means the technical possibilities of data governance have already been studied and accepted by the experts serving important institutions like the WHO, and similar efforts are necessary in the CBD context.

Neglect of data governance

The IAG, which was requested to look at the issue of data governance, delegated its work to the CBD Secretariat. The Secretariat undertook a study on the principles of data governance and attached a note to the report of the IAG Co-leads as Annex II. The terms of reference for the Secretariat’s work were not clear. Going by the explanation in the Annex, the Secretariat was asked to study the principles of data governance, yet the annexed document is titled “data management”.

According to leading authors and practitioners in data management and data governance, the concepts are different. “Data management” is about doing the right thing with the data and “data governance” is about ensuring the right thing is done therewith. In other words, management is about managing the data in order to achieve certain goals and governance is for ensuring management is done properly.

In fact, the Secretariat’s note primarily deals with two sets of principles, i.e. FAIR and CARE. The former is identified as data management principles while the latter is identified as data governance principles, but solely from indigenous peoples’ perspective. FAIR stands for “Findable”, “Accessible”, “Interoperable” and “Reusable”. CARE stands for “Collective benefit”, “Authority to control”, “Responsibility”, and “Ethics” in the context of indigenous peoples’ data governance. The FAIR principles, developed by a consortium of authors, merely ask users and other stakeholders involved in data exchange to ensure that data are findable, accessible, interoperable (between various digital platforms) and reusable. They do not address data governance.

CARE principles, on the other hand, deal with questions on who controls data, who benefits from it, and who is responsible, but are limited to the extent that they impact indigenous peoples or communities. These are not intended to serve the larger public purpose, but are a good basis for further discussions on the issue of data governance.

The Secretariat’s note also failed to take into consideration other data protection, usage or processing rules from a public perspective or from the perspective of data sovereignty of States.

The IAG report has thus created a grey area when it comes to data governance – an essential capacity which each Party must have if it wants to retain sovereignty over its data and also to obtain value or benefit from it. As such, the discussions on DSI preceding and at COP 15 did not make serious attempts to discuss data governance relating to DSI sharing and storage. Yet, meaningful solutions on DSI cannot be generated if principles of data governance are not seriously taken into account. COP 15, however, did not have time to enter into in-depth negotiations on this front. The focus then was to break the political deadlock on sharing of benefits arising from the utilization of DSI – which has been successful.

Now is the time therefore for the Ad Hoc Open-ended Working Group on DSI to seriously take into account the issues relating to digital sovereignty of Parties, as well as the digital rights of peoples as it develops and operationalizes the solution to DSI. Equally, it would be important to conceptualize what it really means to be identified as a public database.