How to Perform Bulk Sequence-Based FTO Searches in a Cost-Effective Manner

Biological sequences (nucleotide and amino acid sequences) appear in patents filed in technology areas such as new drug targets, recombinant proteins, and gene therapy. Such patents focus on molecules that comprise or express biological sequences, which are submitted as sequence listings when the patent application is filed.

Determining the scope of protection for sequences, or for molecules derived from specific sequences, requires computational search and analysis. Several sequence alignment algorithms are available through open access and paid databases. Just as a prior art search identifies documents that disclose the subject matter of interest, a sequence search helps determine whether a sequence-based product is clear for use, that is, whether there is freedom to operate (FTO).

When is Bulk Sequence Search Needed?

During drug discovery and development, multiple recombinant cells expressing the target genes are studied for screening and characterization. Developing a sequence-based method or product involves access to sequences for general use, modification, or comparative analysis. This is particularly important when similar sequences that differ in length or at specific base pairs are involved, since a large set of variable and recombinant sequences can be part of the analysis. The same need arises during the characterization of new sequences or markers.

Further, mining of microorganisms (e.g., bacteria identified through 16S rRNA sequences) for waste treatment, using information on microbial communities, is another area that requires sequence analysis. The recent shift toward culture-independent metagenome sequencing has driven this research forward.

To validate more than one drug target or to pursue variable microbial strains, multiple sequences can be searched for FTO clearance. For instance, the insert and full production sequences for a plurality of protein targets can be considered in the search. Furthermore, to verify whether the sequences of interest are owned by someone or are freely available to work on, searches targeting a plurality of sequences related to a gene/protein need to be performed.

How is Bulk Sequence Search Feasible?

Sequence similarity programs identify sequences that share a certain level of homology with the query sequence over a stretch of a specified length. Similarity search tools such as BLAST, PSI-BLAST, and megaBLAST are used to find such matches. For protein sequences, a significant match also indicates a likelihood of similar protein structures.

Because a sequence match is established from statistical estimates computed over the aligned sequences, results vary with the tool and the database. Paid databases usually provide a more sensitive analysis, owing to sequence size optimization, noise reduction, and database coverage. The accuracy of a search therefore depends on the algorithm specific to the database, and a paid database may rank higher than an open access database.
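To make "level of homology over a stretch of a given length" concrete, here is a minimal sketch that computes percent identity and query coverage from one gapped pairwise alignment. The function name `alignment_stats` and the example sequences are hypothetical. Identity is taken as identical columns divided by alignment length, which is an assumption about the definition in use; individual tools and databases may define these figures differently.

```python
from __future__ import annotations


def alignment_stats(aligned_query: str, aligned_subject: str,
                    query_length: int) -> tuple[float, float]:
    """Return (percent identity, percent query coverage) for one alignment.

    Both rows must be the same length, with '-' marking gaps.
    query_length is the length of the full, ungapped query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    columns = len(aligned_query)
    # A column counts as a match only if both rows hold the same residue.
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    # Query residues that fall inside the aligned region.
    query_residues = columns - aligned_query.count("-")

    identity = 100.0 * matches / columns if columns else 0.0
    coverage = 100.0 * query_residues / query_length if query_length else 0.0
    return identity, coverage


if __name__ == "__main__":
    # Hypothetical 12-residue query, of which 9 residues are aligned.
    identity, coverage = alignment_stats("ACGT-ACGTA", "ACGTTACCTA", 12)
    print(f"identity={identity:.1f}% coverage={coverage:.1f}%")
```

A short, highly identical stretch can score well on identity while covering little of the query, which is why both figures are usually checked together.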

For bulk searches, a feasible approach is to search in phases, so that sequences already identified in patents need not be re-run on a paid database. The approach is beneficial when there is a high probability that the sequences of interest are not covered under the scope of claims in any patent. Based on this approach, Sagacious IP provides bulk sequence search services that are divided into multiple phases. Sequences identified as “flagged”, i.e., claimed in a patent, are analysed directly with respect to the technical concept, bypassing the need to run additional sequence searches.
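The routing logic behind the phased approach can be sketched as follows. This is only an illustration: the `Phase1Result` record and the `split_for_phase2` function are hypothetical names, not part of any database or tool.

```python
from dataclasses import dataclass


@dataclass
class Phase1Result:
    """Outcome of the open access search for one sequence (hypothetical record)."""
    sequence_id: str
    flagged: bool  # True if a similar sequence was found claimed in a patent


def split_for_phase2(results):
    """Split sequences into (direct analysis, paid database search).

    Flagged sequences go straight to analysis of the matching documents.
    Only unflagged sequences are queued for the paid database.
    """
    to_analysis = [r.sequence_id for r in results if r.flagged]
    to_paid_search = [r.sequence_id for r in results if not r.flagged]
    return to_analysis, to_paid_search


if __name__ == "__main__":
    phase1 = [
        Phase1Result("seq_001", flagged=True),
        Phase1Result("seq_002", flagged=False),
        Phase1Result("seq_003", flagged=False),
    ]
    analysis, paid = split_for_phase2(phase1)
    print("Analyse directly:", analysis)
    print("Run on paid database:", paid)
```

The saving comes from the second list being shorter than the full set: every sequence flagged in the open access phase is one fewer paid query.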

The methodology for search phases is illustrated below:

Phase 1: Performing quick searches on an open access database

The following steps define this phase of the project.

1. Understanding the technology through client input.
2. Performing BLAST/megaBLAST searches to identify any relevant prior art disclosing the sequences (with date and jurisdiction restrictions applied as required).
3. Searching for sequences that meet similarity and coverage thresholds, e.g., 90% or higher length coverage and sequence homology.
4. Validating whether the sequences are claimed (FTO perspective).
5. Flagging the sequences for which similar sequences are identified as Red and the others as Green.

Phase 2: Validating the search on the STN database

The following steps define this phase of the project.

1. If no relevant document is identified for the sequences in Phase 1, the search proceeds to Phase 2, i.e., conducting more sensitive searches on STN Express – DGENE (a paid database), with date and jurisdiction restrictions applied as required.
2. Flagging the sequences for which similar sequences are identified as Red and the others as Green.
3. Determining the documents that disclose the flagged sequences.

Phase 3 [Optional]: Detailed keyword-based analysis

The following steps define this phase of the project.

1. If a reference is identified that discloses sequences matching the sequences of interest, the reference is further analysed with keyword-based searches.
2. Separating false positives (sequences not relevant to the technology).
3. Determining the most relevant documents from the keyword analysis.
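The Red/Green flagging step of Phase 1 can be sketched in code, assuming the search results are available in BLAST's default 12-column tabular output. The function name `flag_sequences`, the 90% defaults (taken from the example threshold above), and the sequence and document identifiers are illustrative assumptions. Whether a matched sequence is actually claimed still has to be checked by reading the document.

```python
import csv
import io

# Default column order of BLAST tabular output.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def flag_sequences(tabular_text, query_lengths,
                   min_identity=90.0, min_coverage=90.0):
    """Return {query id: "Red" or "Green"} for every query in query_lengths.

    A query is flagged Red if at least one hit meets both thresholds.
    query_lengths maps each query id to its full sequence length.
    """
    flags = {seq_id: "Green" for seq_id in query_lengths}
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        query_id = row["qseqid"]
        if query_id not in query_lengths:
            continue  # hit for a sequence outside this batch
        # Share of the query covered by this single aligned region.
        span = abs(int(row["qend"]) - int(row["qstart"])) + 1
        coverage = 100.0 * span / query_lengths[query_id]
        if float(row["pident"]) >= min_identity and coverage >= min_coverage:
            flags[query_id] = "Red"
    return flags


if __name__ == "__main__":
    # Hypothetical hits; the subject ids are made up for illustration.
    rows = [
        ["seqA", "DOC1_SEQ3", "95.0", "100", "5", "0",
         "1", "100", "1", "100", "1e-50", "180"],
        ["seqB", "DOC2_SEQ7", "99.0", "40", "0", "0",
         "1", "40", "10", "49", "1e-10", "70"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    print(flag_sequences(text, {"seqA": 105, "seqB": 200, "seqC": 150}))
```

Sequences left Green here are the ones that would move on to the paid database in Phase 2.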

Keyword-based analysis matters because the sequences of interest might match sequences that are unrelated to the gene/protein of interest. This happens when the matched sequences merely share a likely protein structure or an intermediate sequence, and it is more prevalent for shorter sequences. Keyword-based analysis is also used to correlate the identified sequences with other technical parameters such as recombinant cells, targeted diseases, and modified residues.
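A minimal sketch of the keyword screen is shown below. It assumes each hit carries the text of the matching document (for example, its abstract or claims) under a `text` key. The record layout, the function name `filter_hits_by_keywords`, and the sample documents are hypothetical, and plain substring matching stands in for whatever search syntax a real database offers.

```python
def filter_hits_by_keywords(hits, keywords, min_matches=1):
    """Keep hits whose document text mentions enough of the keywords.

    hits: list of dicts, each with at least a "text" entry.
    Returns copies of the kept hits with a "matched_keywords" list added.
    """
    wanted = [k.lower() for k in keywords]
    relevant = []
    for hit in hits:
        text = hit["text"].lower()
        # Case-insensitive substring match for each keyword.
        matched = [k for k in wanted if k in text]
        if len(matched) >= min_matches:
            relevant.append({**hit, "matched_keywords": matched})
    return relevant


if __name__ == "__main__":
    hits = [
        {"document": "DOC-1",
         "text": "Recombinant CHO cells expressing the kinase for oncology"},
        {"document": "DOC-2",
         "text": "Plant promoter sequence for drought tolerance"},
    ]
    for hit in filter_hits_by_keywords(hits, ["recombinant", "kinase"]):
        print(hit["document"], hit["matched_keywords"])
```

Hits that match on sequence but on none of the keywords are candidates for false positives, not automatic exclusions; they still merit a quick manual look.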

Case Study

A leading US-based research institute working on the identification of unmapped proteins in drug development approached Sagacious IP for bulk FTO searches on certain sequences of interest. The client was running studies between known drugs and drug-targeted proteins to prepare markers.

For this purpose, they were interested in determining protein sequences with identifiable markers and relevant applications. They provided sequences of interest and further information related to conserved residues, cellular activity, and linked diseases.

Sagacious IP prepared a three-phase methodology starting with quick searches on open access databases. Flagged sequences were identified in active patent documents. Once their relevance was established using keywords related to the supporting information about the markers/proteins, the in-house team shared the sequence status with the client. Only those sequences that were not identified in the open access database were run on a paid database for confirmation. While the sequence searches were being run and the documents analysed, the client received regular updates so that they could proceed with the cleared sequences. This methodology helped the client save time and costs while also enabling them to efficiently screen the relevant targets and pursue the next steps in R&D.

Final Thoughts

R&D at the molecular level requires characterization of a plurality of targets, which increases the number of sequences that must be checked for similarity and clearance. FTO analysis of bulk sequences is challenging because sequence search databases vary in their algorithms and coverage. A phase-wise search approach helps narrow the set down to a limited number of “flagged” sequences for further analysis, while saving cost and time by screening out the cleared sequences. This approach has been acknowledged as useful by many innovators working on multiple molecular targets in a short span of time. Sagacious IP’s Freedom to Operate (FTO) Search service is intended to help you identify potential barriers, guide decisions related to product design, and reveal design-around options.

–  Devika Saini (Life Sciences) and the Editorial Team
