Patent Search with AI

Walid Amamou
11 min read · Aug 14, 2024


Patent Abstracts Word Cloud

Introduction

At the heart of the patent submission process lies the critical task of prior art searches — a meticulous examination of existing knowledge to determine the novelty and non-obviousness of an invention. However, as the volume of global information continues to expand at an unprecedented rate, the challenge of conducting comprehensive and efficient prior art searches has become increasingly daunting.

The patent process is a complex journey that begins with an inventor’s novel idea and culminates in the granting of exclusive rights to that invention. One of the most crucial steps in this journey is the prior art search. This involves a thorough investigation of all publicly available information relevant to the invention’s claims. Prior art can include previously published patents, scientific literature, public disclosures, and any other form of publicly accessible information that predates the filing of the patent application.

The sheer volume of information available today, driven by the exponential growth of technological advancements, presents a significant challenge for patent examiners and inventors alike. About 388 thousand patents are published each year, and the number keeps rising. Patent examiners spend 40% of their total time on prior art searches, manually sifting through vast databases, so there is a growing risk of overlooking crucial pieces of prior art, which can lead to the granting of invalid patents or the rejection of truly novel inventions.

Throughout this article, we will show the implications of AI for patent searches and provide a step-by-step tutorial on how to search large numbers of patents in a fraction of the time. We will emphasize the precision, speed, and versatility that AI-powered solutions bring to the table, demonstrating how these technologies can significantly impact the work of patent applicants.

Understanding Prior Art Searches

Prior art is the bedrock upon which the patent system is built. It encompasses all information that has been made available to the public in any form before a given date that might be relevant to a patent’s claims of originality. The concept of prior art is crucial in determining whether an invention is novel and non-obvious — two key requirements for patentability.

The significance of prior art in patent processes cannot be overstated. It serves several critical functions:

1. Determining Novelty: By comparing a proposed invention against existing prior art, patent examiners can assess whether the invention is truly new. If any prior art discloses the same invention, the patent application will likely be rejected on grounds of lack of novelty.

2. Assessing Non-obviousness: Prior art also helps in determining whether an invention would have been obvious to a person having ordinary skill in the art (PHOSITA). If the invention is a mere combination of known elements with predictable results, it may be deemed obvious and thus unpatentable.

3. Defining the Scope of Patent Claims: Prior art helps in delineating the boundaries of what can be claimed in a patent. Inventors must carefully craft their claims to avoid encompassing existing prior art while still protecting the core of their invention.

4. Preventing Patent Infringement: A thorough understanding of prior art can help inventors and companies avoid infringing on existing patents, potentially saving them from costly legal battles in the future.

Various Challenges in Conducting Prior Art Searches

Despite its importance, conducting comprehensive prior art searches is fraught with challenges:

  1. Volume of Information: The sheer amount of information that needs to be sifted through is staggering. This includes not just patents and patent applications from around the world, but also scientific publications, technical disclosures, product manuals, and even public demonstrations or sales.
  2. Open Access Movement: The trend towards open access publishing in academia means that more research is freely available online, contributing to the growth of searchable prior art.
  3. Technological Complexity: As technology becomes more sophisticated and interdisciplinary, it becomes increasingly difficult for any single individual to have comprehensive knowledge across all relevant fields.
  4. Artificial Intelligence and Machine Learning: Ironically, while AI can help in searching prior art, it’s also contributing to the creation of new prior art at a rapid pace, particularly in fields like computer science and data analytics.
  5. Time Constraints: Patent examiners and inventors often work under significant time pressure, which can limit the thoroughness of their searches.
  6. Inconsistent Terminology: Different inventors or authors may use varying terms to describe similar concepts, making keyword-based searches less effective.
  7. Non-Patent Literature: While patent databases are well-organized, other forms of prior art like academic papers, technical reports, or product brochures can be more challenging to search systematically.
  8. Hidden or Obscure Prior Art: Some relevant prior art may be buried in obscure publications or may not be readily accessible online, making it easy to overlook.
  9. Rapidly Evolving Fields: In fast-moving technological areas, new prior art is constantly being generated, making it challenging to stay up-to-date.

In the next section, we will explore how AI-based document understanding is rising to meet these challenges. We will examine how AI can not only handle the volume and complexity of modern prior art but also uncover insights and connections that manual searches tend to miss.

The Role of AI and Machine Learning in Prior Art Searches

In the context of prior art searches, AI and ML offer several key advantages:

  1. Data Processing Capacity: AI systems can analyze images and extract concepts such as material properties, temperatures, processes and tasks from enormous volumes of data at speeds far beyond human capability.
  2. Pattern Recognition: ML algorithms excel at identifying subtle patterns and relationships within data that might not be apparent to human observers.
  3. Continuous Learning: ML systems can continuously improve their performance as they are exposed to more data, adapting to new patterns and trends in patent filings and technical literature.
  4. Semantic Search: AI-powered semantic search goes beyond simple keyword matching. It understands the context and meaning of search queries, allowing it to identify relevant documents even when they don’t contain exact keyword matches.
  5. Concept-based Searching: AI can identify and search for related concepts, expanding the search beyond the specific terms used in the query.
  6. Cluster Analysis: ML algorithms can group similar documents together, helping searchers quickly identify different aspects or approaches related to their query.

Our patent search analysis is structured as follows:

  1. Named entity extraction from patent abstracts using custom AI models
  2. Structured data export in CSV format
  3. Semantic search engine creation with Claude Artifact

Entity Extraction from Patents

Named entity extraction plays a pivotal role in analyzing concepts from patents. Named entities are specific items such as people, places, organizations, or other objects that can be identified and classified within text data. Named Entity Recognition (NER) systems are designed to automatically extract these entities from digital documents and classify them into predefined categories. For example, a named entity like “John Wayne” would be classified under the category “person,” while “Mexico City” would fall under “city”. This process is crucial in patent analysis as it helps in identifying and categorizing technical information such as material properties, annealing temperatures, processes and more, thereby improving information retrieval and knowledge extraction systems.
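To make the idea concrete, here is a minimal sketch of entity extraction using a fixed vocabulary lookup. This is only a toy stand-in for illustration: the vocabulary and the function name extract_entities are my own assumptions, and a trained NER model learns to recognize entities from annotated examples rather than from a hard-coded list.

```python
import re

# Hypothetical mini-vocabulary, for illustration only. A trained NER model
# generalizes from annotated examples instead of relying on a fixed list.
GAZETTEER = {
    "Materials": ["graphene oxide", "graphene", "silica", "polymer"],
    "Processes": ["annealing", "microwave radiation", "mixing"],
    "Tasks": ["sulfur removal", "energy storage"],
}


def extract_entities(text):
    """Return {label: [matched terms]} using case-insensitive whole-word matching."""
    found = {}
    for label, terms in GAZETTEER.items():
        hits = []
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append(term)
        found[label] = hits
    return found


abstract = (
    "A blend of graphene nanoparticles and a polymer is formed "
    "using microwave radiation."
)
entities = extract_entities(abstract)
print(entities)
```

A lookup like this misses synonyms and unseen terms, which is exactly the gap that a trained model or an LLM is meant to close.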

For this tutorial, we are going to focus on analyzing patent abstracts related to materials science, and more specifically graphene, but the same process can be generalized to any domain.

The first step involves extracting the named entities Materials, Processes, and Tasks from the patent abstracts. I have chosen these entities as they are the most relevant to the materials science domain. For different domains, different entities can be more relevant. There are multiple options to accomplish the entity extraction:

  • Train a custom AI model to extract these specific entities
  • Use a Large Language Model (LLM) such as Gemini or Claude to extract any entities on demand
  • Use a combination of custom AI model and LLM to achieve comprehensive extraction

In this tutorial, we are going to use our own AI model, which has been trained on these specific entities. We will use the intelligent document processing platform kudra.ai, chosen for its ease of use and friendly user experience, to perform the actual entity extraction from hundreds of patents.

Kudra Entity Extraction Configuration Window
Kudra Extraction Interface

Next, we export the data in a CSV format containing all the materials, processes, and tasks in a structured format:

Structured data extracted from Kudra
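As a rough sketch of how such an export can be loaded for analysis, the snippet below reads a CSV and splits each entity cell into a list. The column names (patent_id, abstract, materials, processes, tasks) and the semicolon separator are assumptions on my part; the actual export layout may differ, so adjust them to match your file.

```python
import csv
import io


def load_patents(csv_file, entity_columns=("materials", "processes", "tasks"), sep=";"):
    """Read rows from an open CSV file and split entity cells into lists."""
    patents = []
    for row in csv.DictReader(csv_file):
        for col in entity_columns:
            cell = row.get(col) or ""
            # Normalize each entity: trim whitespace, lowercase, drop empties.
            row[col] = [item.strip().lower() for item in cell.split(sep) if item.strip()]
        patents.append(row)
    return patents


# Small in-memory sample standing in for the exported file (hypothetical data).
SAMPLE_CSV = """patent_id,abstract,materials,processes,tasks
P1,Graphene oxide removes sulfur compounds from fuel.,graphene oxide,adsorption,sulfur removal
P2,Graphene nanoparticles are blended with a polymer.,Graphene; polymer,microwave radiation; blending,
"""

patents = load_patents(io.StringIO(SAMPLE_CSV))
for patent in patents:
    print(patent["patent_id"], patent["materials"], patent["processes"], patent["tasks"])
```

To read a real file, pass open("your_export.csv", newline="", encoding="utf-8") instead of the in-memory sample.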

Creating a Semantic Search Engine using Claude Artifact

Traditional prior art search relies on keyword and boolean search, which is quickly becoming ineffective due to the increasing complexity and volume of patent data. This method often struggles to capture the nuances of language and the context of inventions, leading to incomplete or inefficient searches. As a result, there is a growing shift towards more advanced techniques, such as using artificial intelligence and machine learning, to enhance the accuracy and efficiency of prior art searches.

In this tutorial, we are going to build our own search engine thanks to the new coding capability of Claude Artifact. To run and deploy the generated code, we are going to use Streamlit.

Here is the prompt that we are going to give to Claude to create the app:

“Attached is a CSV file containing data extracted from patents. Can you please create an interactive app in streamlit with the following feature:

  • The user uploads a CSV file
  • Keyword boolean search with OR and AND search
  • Entity (Materials, Process, and Tasks) filter based on user input with fuzzy matching
  • Summarize the text of the patents found using GPT4-o
  • Pie chart showing the distribution of materials mentions
  • Word cloud of materials, processes, and tasks
  • Create 4 networks of cluster graphs of similar patents based on raw text, materials, processes and tasks”

After a few back and forths with Claude, we get the desired code:

Snippet of the code output from Claude Artifact

Now that we have the code in hand, we save it as a .py file and run it with Streamlit using a simple command: “streamlit run patent_analysis_app.py”. I have used the PyCharm IDE, but it can be run from any other IDE or from Google Colab as well.

Patent search engine made with Claude Artifact
Pie chart and word cloud
Network graph to find similar patents
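One simple way to decide which patents to connect in such a graph is to compare their entity sets. The sketch below uses Jaccard similarity with a threshold; this is my own illustrative choice, not necessarily what the generated app does, and the field names and the 0.3 threshold are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_edges(patents, column="materials", threshold=0.3):
    """Return (id_1, id_2, score) for every pair at or above the threshold."""
    edges = []
    for p, q in combinations(patents, 2):
        score = jaccard(p[column], q[column])
        if score >= threshold:
            edges.append((p["patent_id"], q["patent_id"], round(score, 2)))
    return edges


# Hypothetical sample records.
sample = [
    {"patent_id": "P1", "materials": ["graphene", "polymer"]},
    {"patent_id": "P2", "materials": ["graphene", "silica"]},
    {"patent_id": "P3", "materials": ["copper"]},
]

edges = similarity_edges(sample)
print(edges)
```

Each returned tuple can be added as a weighted edge to a graph library of your choice, and the same function can be pointed at the processes or tasks column to build the other graphs.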

The control sidebar on the left allows me to perform a boolean search by keyword and also by the entities Materials, Process, and Tasks. For example, I can semantically search for patents that have graphene AND nanoparticles as materials. The results are displayed in the table. I can click on Generate Summary to feed the abstracts to GPT-4o and get a summary and insights about the patents found.
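The search logic behind such a sidebar can be sketched in a few lines. This is a simplified illustration rather than the generated app's code: the function names, the record fields, and the 0.8 similarity threshold are my assumptions, and fuzzy matching here relies on Python's built-in difflib.

```python
from difflib import SequenceMatcher


def keyword_match(text, keywords, mode="AND"):
    """Case-insensitive substring search combining keywords with AND or OR."""
    text = text.lower()
    hits = [kw.lower() in text for kw in keywords]
    return all(hits) if mode == "AND" else any(hits)


def fuzzy_contains(entities, query, threshold=0.8):
    """True if any entity is similar enough to the query (ratio in [0, 1])."""
    query = query.lower()
    return any(
        SequenceMatcher(None, query, entity.lower()).ratio() >= threshold
        for entity in entities
    )


def filter_patents(patents, material_queries, mode="AND"):
    """Keep patents whose materials fuzzily match all (AND) or any (OR) queries."""
    combine = all if mode == "AND" else any
    return [
        p for p in patents
        if combine(fuzzy_contains(p["materials"], q) for q in material_queries)
    ]


# Hypothetical sample records.
SAMPLE = [
    {"patent_id": "P1", "materials": ["graphene", "polymer"]},
    {"patent_id": "P2", "materials": ["graphene", "nanoparticles"]},
    {"patent_id": "P3", "materials": ["silica"]},
]

# Misspelled and singular queries still match thanks to fuzzy matching.
both = filter_patents(SAMPLE, ["graphen", "nanoparticle"], mode="AND")
either = filter_patents(SAMPLE, ["silica", "polymer"], mode="OR")
print([p["patent_id"] for p in both], [p["patent_id"] for p in either])
```

Lowering the threshold makes the filter more forgiving of spelling variants at the cost of more false matches.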

Within these filtered patents, I can look at the distribution of the Process entities mentioned, as shown in the pie chart below, to get an idea of the processes that this material combination was involved in. For example, we can see that graphene nanoparticles were mixed with polymers to create a blend.
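The numbers behind such a pie chart are just relative frequencies of entity mentions. Here is a minimal sketch of that calculation, assuming each patent record carries a list of process entities; the field name and the sample values are hypothetical.

```python
from collections import Counter


def entity_distribution(patents, column="processes"):
    """Return [(entity, share)] sorted by frequency; shares sum to 1."""
    counts = Counter(entity for p in patents for entity in p[column])
    total = sum(counts.values())
    if total == 0:
        return []
    return [(entity, n / total) for entity, n in counts.most_common()]


# Hypothetical filtered results.
filtered = [
    {"processes": ["blending", "microwave radiation"]},
    {"processes": ["blending"]},
    {"processes": ["reduction"]},
]

distribution = entity_distribution(filtered)
for entity, share in distribution:
    print(f"{entity}: {share:.0%}")
```

The resulting shares can be passed directly to any plotting library as the slice sizes.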

The GPT summary of the results is very insightful as well:

These three patents all revolve around the utilization and synthesis of graphene-based nanocomposites. The first patent discusses a method for utilizing a carbonaceous material, which could include graphene oxide, in removing sulfur compounds from fuel. The second patent talks about the “green” method of synthesizing a nano composite using reduced graphene oxide (rGO) and silica (SiO2), invoking a specific plant extract for the process. The third patent describes a method for forming a blend of graphene nanoparticles with a certain type of polymer, namely poly(styrene-co-methylmethacrylate), using microwave radiation to induce the bonding. These patents collectively promote the exploration and advancement of graphene-based solutions in various uses, from chemical processing to material synthesis.

Results and discussion

The implementation of AI-powered tools in patent searches has demonstrated significant improvements in efficiency, accuracy, and depth of analysis compared to traditional methods. The results of our study highlight several key advantages:

  1. Entity extraction at scale: The use of custom NER models enabled the accurate extraction of specific entities such as materials, processes, and tasks from patent abstracts. This structured approach to data extraction facilitated more nuanced and targeted searches. By leveraging the custom NER model in Kudra, we were able to transform vast amounts of unstructured patent data into structured information, including technical concepts, at scale. Structuring the data is critical for performing in-depth analysis in the next steps.
  2. Semantic search capabilities: The semantic search engine developed using Claude Artifact and implemented through Streamlit demonstrated the power of context-aware searching. By understanding the meaning behind search queries rather than relying solely on keyword matches, the system was able to identify relevant patents that might have been missed by traditional Boolean search methods.
  3. Interactive visualization: The integration of visual elements such as pie charts, word clouds, and network graphs provided intuitive ways to analyze patent data. These visualizations allowed for quick identification of trends, relationships, and clusters within the patent landscape, enhancing the researcher’s ability to draw insights from the data.
  4. Intelligent summarization: The use of GPT-4o for generating summaries of search results proved to be a valuable feature. This AI-driven summarization provided researchers with quick, coherent, and insightful overviews of relevant patents.

Conclusion

It is worth contemplating what we have accomplished here. Using Claude Artifact, we were able to create a custom app on demand. With a simple natural language prompt, we can add functionality without writing the code ourselves. As software becomes more commoditized thanks to genAI, well-structured data will be more essential than ever to derive good insights.

However, it’s important to note that these tools are designed to augment human expertise rather than replace it entirely. The role of subject matter experts remains crucial in interpreting results, understanding context, and making final judgments on patentability.

This article demonstrates that genAI-powered tools, when combined with well-structured data, can dramatically enhance the efficiency, accuracy, and depth of patent searches, offering several key benefits:

1. Improved processing of large volumes of patent data at unprecedented speeds
2. More accurate and context-aware search capabilities through semantic understanding
3. Enhanced visualization and analysis tools for identifying patterns and relationships
4. Time-saving features such as automated summarization and entity extraction

As these technologies continue to evolve, it will be important to:

1. Continuously refine and update the AI models to keep pace with emerging technologies and patent trends
2. Develop best practices for the use of AI in patent searches to ensure consistency
3. Provide training for patent professionals to effectively leverage these new tools

In conclusion, the integration of AI into patent searches represents a promising step toward more efficient and thorough prior art analysis. As these technologies mature and become more widely adopted, they have the potential to significantly improve the patent system’s ability to promote and protect innovation in an increasingly complex technological landscape.

To learn more about Kudra, visit our website at https://kudra.ai


Walid Amamou

Founder of UBIAI, an annotation tool for NLP applications | PhD in Physics.