Building an Autonomous Agent for Extracting Insights from Financial News
Timely and accurate information is critical for making informed decisions when investing in public companies. Financial analysts face the challenge of processing vast amounts of financial news to extract relevant data efficiently, and manually reviewing each article for critical information is impractical given the time constraints and sheer volume involved. Extracting insights from data at this scale simply is not feasible without advanced analytics.
In this tutorial, we will explore how to automatically extract critical information from news articles and analyze it using an autonomous agent to derive new insights.
Data Collection
Our data pipeline begins with Google News as the primary news aggregation source. Using SerpAPI's Google News endpoint, we scrape all news articles related to a specific company (in this case, Intel) from outlets such as Yahoo, Bloomberg, and Reuters to get comprehensive coverage. Key considerations in the collection phase include:
- Company-specific queries using ticker symbols (e.g., “AAPL” OR “Apple Inc”)
- Temporal considerations: the scraping frequency directly impacts the granularity of potential analysis. In our methodology, we typically employ daily aggregation, collecting each day's news for the company.
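The collection step above can be sketched as follows. This is a minimal sketch, not the exact pipeline code: it assumes SerpAPI's `google_news` engine and its documented response shape (`news_results` entries with `title`, `source.name`, `date`, `link`); the function names and the Intel query string are our own illustrations.

```python
import json
import urllib.parse
import urllib.request

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"

def build_news_url(query: str, api_key: str) -> str:
    """Assemble a SerpAPI Google News request URL for a company-specific query."""
    params = {
        "engine": "google_news",
        "q": query,  # e.g. '"INTC" OR "Intel Corporation"'
        "api_key": api_key,
    }
    return SERPAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)

def fetch_news(query: str, api_key: str) -> list:
    """Fetch news results and keep only the fields we need downstream."""
    with urllib.request.urlopen(build_news_url(query, api_key)) as resp:
        results = json.load(resp).get("news_results", [])
    return [
        {
            "title": item.get("title"),
            "source": (item.get("source") or {}).get("name"),
            "date": item.get("date"),
            "link": item.get("link"),
        }
        for item in results
    ]
```

Running `fetch_news('"INTC" OR "Intel Corporation"', api_key)` once per day and appending the rows to a CSV gives the daily aggregation described above.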
Data Extraction
Once the data has been collected in CSV format, we need to post-process it to extract critical information from the raw unstructured text. To do so, we use the Kudra.ai intelligent document processing tool to automate the extraction process. We are interested in extracting the following fields:
- Company_name
- Ticker_symbol
- Corporate_action: Actions taken by the company discussed in the article
- Earning_announcement: The sentence announcing earnings
- Competitor_names: Any mentions of competitors
- Macroeconomic_indicator: Mentions of macroeconomic indicators
- Person_name
- Locations
- Organization_name
- Regulation_mentions: Any sentences mentioning regulation
- Analyst_rating
- Earnings_figures: Earnings figures, if any
Using Kudra’s workflow builder, we add the GPT Entity Extractor service, a custom LLM service that extracts entities when given the name of each label along with its description.
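Kudra configures the label/description pairs in its UI, but the underlying idea can be sketched outside Kudra as a single structured-extraction prompt. This is a hypothetical reconstruction, not Kudra's actual implementation; the label dictionary mirrors the field list above.

```python
# Hypothetical reconstruction of the label -> description scheme behind a
# GPT-based entity extractor (Kudra configures this in its UI).
ENTITY_LABELS = {
    "Company_name": "Name of the company the article is about",
    "Ticker_symbol": "Stock ticker symbol of the company",
    "Corporate_action": "Actions taken by the company discussed in the article",
    "Earning_announcement": "Sentence announcing earnings",
    "Competitor_names": "Any mentions of competitors",
    "Macroeconomic_indicator": "Mentions of macroeconomic indicators",
    # ...remaining labels from the field list above
}

def build_extraction_prompt(article_text: str) -> str:
    """Turn the label/description pairs into one structured-extraction prompt."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in ENTITY_LABELS.items())
    return (
        "Extract the following fields from the news article below.\n"
        "Return a JSON object keyed by field name; use null for absent fields.\n\n"
        f"Fields:\n{fields}\n\nArticle:\n{article_text}"
    )
```

The prompt is then sent to the LLM once per article, and the returned JSON becomes one row of the structured dataset.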
Thanks to the modularity of the workflow builder, we can chain multiple AI services to increase the depth of our analysis. In this case, we are adding a new AI service called the “Sentiment analyzer,” which calls the GPT-4o-mini API to analyze the text and provide us with a sentiment score between 1 and 10. For the best results, we recommend tuning the prompt further based on your needs.
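The sentiment step can be sketched as a prompt plus a small parsing function. The prompt wording and the clamping logic here are our own illustration (the reply would come from a GPT-4o-mini chat completion); clamping guards against the model occasionally answering outside the 1–10 range.

```python
import re

# Illustrative prompt for the sentiment step; the model is asked for a
# single integer between 1 and 10.
SENTIMENT_PROMPT = (
    "On a scale of 1 (very negative) to 10 (very positive), rate the sentiment "
    "of the following financial news article toward the company it covers. "
    "Reply with a single integer only.\n\n{article}"
)

def parse_score(reply: str, lo: int = 1, hi: int = 10) -> int:
    """Pull the first integer out of the model's reply and clamp it to range."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no score found in reply: {reply!r}")
    return min(hi, max(lo, int(match.group())))
```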
Finally, we chain another GPT-4o-mini service to generate a summary of the financial news article.
Thanks to Kudra’s workflow builder, we can process thousands of news articles at once in a matter of minutes, saving us a considerable amount of time during the extraction process.
Building An Agent for Data Analysis Using Natural Language Interface
With the emergence of LLMs, natural language interfaces have become the standard way to chat with AI models. However, natural language interfaces for querying structured data are far less developed. Such an interface would provide tremendous benefits:
1. Democratization of Data Access
   - Analysts without SQL expertise can perform complex queries
   - Reduced time-to-insight for business users
2. Enhanced Productivity
   - Faster query iteration and refinement
   - More intuitive exploration of data relationships
While traditional text-to-SQL methods, where the user’s question is converted into an SQL query using an LLM, are more commonly used, they suffer from many limitations, such as the potential for hallucination in query generation and errors in the SQL code itself.
For this tutorial, we will use a different approach: instead of relying on text-to-SQL generation, we will feed a CSV file of our data to an autonomous agent and ask it to generate the necessary code to query the data and answer any questions from the user.
Here is an example of a natural language query that we can ask our autonomous agent:
prompt = "show me the sentiment trend over time in a colorful bar plot?"
Data Analysis with Natural Language Query
The first step is to upload the CSV file to a Google Colab instance and load it into a DataFrame so that the agent can access it.
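The loading step might look like the following sketch. The filename and column names (`date`, `sentiment`, etc.) are illustrative, mirroring the fields extracted earlier; in Colab, the file would first be uploaded via `google.colab.files.upload()`.

```python
import pandas as pd

def load_articles(csv_source) -> "pd.DataFrame":
    """Load the extracted-fields CSV, parse dates, and sort chronologically
    so that time-series questions (e.g. sentiment trend) work out of the box.
    Column names are illustrative, mirroring the fields extracted earlier."""
    df = pd.read_csv(csv_source, parse_dates=["date"])
    return df.sort_values("date").reset_index(drop=True)
```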
Once the data has been loaded, we can ask the agent questions in English, such as, “Show me the sentiment trend over time in a colorful bar plot?” The agent will then generate the necessary Python code to respond to the question. Furthermore, it will handle any errors that may arise, such as missing values in the data or missing libraries in the code, and fix them autonomously. This is one of the main advantages of using this agentic approach.
Under the hood, the agent sends the DataFrame schema and the user's question to an LLM of choice (in our case, GPT-4o) to generate the necessary Python code and execute it in the instance.
Once the code is executed, the agent reads the execution logs and sends any errors back to the LLM for resolution. The number of LLM iterations varies with the complexity of the question; sometimes it takes up to 5–6 API calls to get the answer, and in some cases the agent may enter an infinite loop and never arrive at a solution.
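The loop just described can be sketched in a few lines. This is a simplified illustration of the agentic pattern, not the actual agent's implementation: `ask_llm` stands in for a real chat-completion call (e.g. to GPT-4o), and the iteration budget guards against the infinite-loop case mentioned above.

```python
import traceback

def run_agent(question, df, ask_llm, max_iters=6):
    """Sketch of the code-generating agent loop: the LLM sees the DataFrame
    schema and the question, returns Python code that assigns to `result`,
    and any runtime error is fed back for a corrected attempt."""
    schema = ", ".join(f"{col} ({dtype})"
                       for col, dtype in zip(df.columns, df.dtypes.astype(str)))
    prompt = (
        f"DataFrame `df` columns: {schema}\n"
        f"Question: {question}\n"
        "Return Python code that assigns the answer to a variable named `result`."
    )
    for _ in range(max_iters):
        code = ask_llm(prompt)
        scope = {"df": df}
        try:
            exec(code, scope)  # run the generated code in this instance
            return scope.get("result")
        except Exception:
            # feed the full traceback back to the LLM for a corrected attempt
            prompt += "\nThe previous code failed with:\n" + traceback.format_exc()
    raise RuntimeError("agent did not converge within the iteration budget")
```

Each failed attempt grows the prompt with its traceback, which is exactly why complex questions can consume several API calls before converging.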
Below is another, more complex question that requires the agent to correlate corporate actions with earnings announcements.
prompt = "What is the relationship between corporate actions and earnings announcements?"
which results in:
✅ ✅ ✅ ✅ FINAL RESULT ✅ ✅ ✅ ✅
The analysis of both corporate actions and earnings announcements has yielded information about their most frequent occurrences. However, due to truncation, it appears the qualitative text of some entries overshadowed the numerical analysis. Here’s a breakdown:
1. **Corporate Actions**:
- A variety of corporate actions are mentioned, which might include acquisitions, buyouts, and investments.
- The text also highlights specific instances and impacts of these actions.
2. **Earnings Announcements**:
- The focus seems to be on quantitative metrics and their implications, such as earnings per share, revenue figures, and guidance outcomes.
To summarize:
- Common corporate actions may have strategic impacts that coincide with earning announcements.
- We observed a diverse range of thematic instances of how corporate actions might influence or coincide with earnings announcements.
### Next Steps
- **Summarize the Key Findings**: Use the gathered insights to provide a coherent overview of how corporate actions relate to earnings announcements.
Here's the summarized information of the most frequent trends or patterns:
- **Acquisitions and Strategic Changes**: Often, significant corporate actions such as acquisitions coincide with announcements of financial results, possibly affecting guidance and investor sentiment.
- **Earnings Performance Impact**: Corporate actions like buyouts or restructuring often accompany earnings results to explain variance in performance.
If more statistical or visual representation is required, please let me know. Otherwise, this summary adequately covers the relationship between corporate actions and earnings announcements in the dataset.
The task is done.
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅
The possibilities are limitless. Agents are truly capable of carrying out complex data analysis tasks such as clustering, statistical analysis, and even building ML models for forecasting.
Conclusion
Through our exploration of Kudra’s capabilities and the implementation of natural language interfaces, we’ve demonstrated a powerful workflow that transforms the way financial analysts interact with vast amounts of unstructured data. Consistent and accurate extraction of key entities is essential for producing standardized, high-quality data for downstream analysis. Natural language querying of structured data with autonomous agents will become the norm in the years to come, removing many of the technical barriers business users face when extracting intelligence from their unstructured documents.
The powerful combination of intelligent document processing, autonomous agents, and natural language interfaces will open up new frontiers in financial analysis and other industries.
As we’ve seen through our exploration of Kudra and autonomous agents, the future of financial analysis is not just about processing more data — it’s about processing it more intelligently. The possibilities are indeed limitless, and we’re only scratching the surface of what’s possible when we combine human expertise with AI-powered analysis tools.
If you would like to learn more about Kudra, visit https://kudra.ai for more information.