Daily RAG research paper hub with arXiv, Gemini AI, and Notion

Author: dongou


Important notice

This workflow is provided as-is. Please review and test before using in production.

Overview

Fetch research papers on a user-chosen topic from arXiv on a daily schedule, clean and structure the data, write the results to a Notion database, and deliver a daily summary by email and IM.

Paper Topic: single query keyword
Update Frequency: Daily updates, with fewer than 20 entries expected per day
Tools:
- Platform: n8n, for end-to-end workflow configuration
- AI Model: Gemini-2.5-Flash, for daily paper summarization and data processing
- Database: Notion, with two tables — Daily Paper Summary and Paper Details
- Message: Feishu (IM bot notifications), Gmail (email notifications)

1. Data Retrieval

arXiv API

arXiv provides a public API for querying research papers by topic or by predefined category.

arXiv API User Manual

Key Notes:

Response Format: The API returns results as an Atom feed (XML).
Timezone & Update Frequency:
- The arXiv submission process operates on a 24-hour cycle.
- Newly submitted articles become available in the API only at midnight after they have been processed.
- Feeds are updated daily at midnight Eastern Standard Time (EST).
- Therefore, a single request per day is sufficient.
Request Limits:
- The maximum number of results per call (max_results) is 30,000.
- Results must be retrieved in slices of at most 2,000 at a time, using the max_results and start query parameters.
Time Format:
- The expected format for a date-range filter is [YYYYMMDDTTTT+TO+YYYYMMDDTTTT].
- TTTT is the time of day in 24-hour format to the minute (HHMM), in GMT.
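
The request limits above imply paging with the start and max_results parameters. The following is a minimal sketch of how the page URLs could be generated. The helper name buildPagedUrls and the example query are hypothetical and not part of the original workflow; the endpoint is the public arXiv API query URL. Because URLSearchParams encodes the query itself, the search string is passed with plain spaces here.

```javascript
// Sketch only: split a large result set into arXiv API page requests.
// buildPagedUrls is a hypothetical helper, not taken from the workflow.
const ARXIV_API = "http://export.arxiv.org/api/query";
const MAX_SLICE = 2000;  // slice size limit stated in the notes above
const MAX_TOTAL = 30000; // max_results limit stated in the notes above

function buildPagedUrls(searchQuery, total, sliceSize = MAX_SLICE) {
  const size = Math.min(sliceSize, MAX_SLICE);
  const capped = Math.min(total, MAX_TOTAL);
  const urls = [];
  for (let start = 0; start < capped; start += size) {
    const params = new URLSearchParams({
      search_query: searchQuery,
      start: String(start),
      // The last slice may be smaller than the slice size.
      max_results: String(Math.min(size, capped - start)),
    });
    urls.push(`${ARXIV_API}?${params.toString()}`);
  }
  return urls;
}
```

With fewer than 20 entries expected per day, a single request is normally enough; the loop only matters for backfills or broad queries.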

Scheduled Task

Execution Frequency: Daily
Execution Time: 6:00 AM
Time Parameter Handling (JS):
According to arXiv’s update rules, the scheduled task should query the previous day’s (T-1) submittedDate data.
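
The T-1 handling described above could look roughly like the sketch below, written in the style of an n8n Code node. The function names are hypothetical, and the sketch assumes the whole previous UTC day (00:00 to 23:59) is the intended window; the original workflow may define its window differently. The returned string uses literal `+` separators, so it is meant to be appended to the request URL directly rather than URL-encoded again.

```javascript
// Format a Date as YYYYMMDD using its UTC fields (arXiv expects GMT).
function toArxivDay(date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${y}${m}${d}`;
}

// Hypothetical helper: submittedDate filter covering the UTC day before `now`.
function previousDayFilter(now = new Date()) {
  // Date.UTC normalizes day 0 to the last day of the previous month.
  const prev = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() - 1)
  );
  const day = toArxivDay(prev);
  return `submittedDate:[${day}0000+TO+${day}2359]`;
}

// Hypothetical helper: combine a single keyword with the date filter.
function buildSearchQuery(keyword, now = new Date()) {
  return `all:${keyword}+AND+${previousDayFilter(now)}`;
}
```

Computing the date from UTC fields keeps the window independent of the timezone the n8n instance runs in.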

2. Data Extraction

Data Cleaning Rules (Convert to Standard JSON)

Remove Header
- Keep only the 【entry】【/entry】 blocks representing paper items.
Single Item
- Each 【entry】【/entry】 represents a single item.
Field Processing Rules
- 【id】【/id】 ➡️ id
  Extract content.
  Example:
  【id】http://arxiv.org/abs/2409.06062v1【/id】 → http://arxiv.org/abs/2409.06062v1
- 【updated】【/updated】 ➡️ updated
  Convert timestamp to yyyy-mm-dd hh:mm:ss
- 【published】【/published】 ➡️ published
  Convert timestamp to yyyy-mm-dd hh:mm:ss
- 【title】【/title】 ➡️ title
  Extract text content
- 【summary】【/summary】 ➡️ summary
  Keep text, remove line breaks
- 【author】【/author】 ➡️ author
  Combine all authors into an array
  Example: [ "Ernest Pusateri", "Anmol Walia" ] (for Notion multi-select field)
- 【arxiv:comment】【/arxiv:comment】 ➡️ Ignore / discard
- 【link type="text/html"】 ➡️ html_url
  Extract URL
- 【link type="application/pdf"】 ➡️ pdf_url
  Extract URL
- 【arxiv:primary_category term="cs.CL"】 ➡️ primary_category
  Extract term value
- 【category】 ➡️ category
  Merge all 【category】 values into an array
  Example: [ "eess.AS", "cs.SD" ] (for Notion multi-select field)
Add Empty Fields
- github
- huggingface
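
The cleaning rules above can be sketched as a small parser, where 【entry】 stands for the Atom `<entry>` element. This is an illustration under stated assumptions, not the workflow's actual Code node: it uses regular expressions instead of a real XML parser, does not decode XML entities such as `&amp;`, and all function names are hypothetical.

```javascript
// Text content of the first <tag>...</tag> inside `xml`, or "" if absent.
function tagText(xml, tag) {
  const m = xml.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`));
  return m ? m[1].trim() : "";
}

// href of the first <link> whose type attribute equals `type`, or "".
function linkHref(xml, type) {
  for (const m of xml.matchAll(/<link\b[^>]*>/g)) {
    if (m[0].includes(`type="${type}"`)) {
      const href = m[0].match(/href="([^"]*)"/);
      if (href) return href[1];
    }
  }
  return "";
}

// "2024-09-09T20:52:12Z" -> "2024-09-09 20:52:12" (still UTC).
function toTimestamp(ts) {
  return ts.replace("T", " ").replace(/Z$/, "");
}

// Convert one entry body into the flat JSON object described by the rules.
function parseEntry(entry) {
  const primary = entry.match(/<arxiv:primary_category\b[^>]*\bterm="([^"]*)"/);
  return {
    id: tagText(entry, "id"),
    updated: toTimestamp(tagText(entry, "updated")),
    published: toTimestamp(tagText(entry, "published")),
    title: tagText(entry, "title"),
    summary: tagText(entry, "summary").replace(/\s*\n\s*/g, " "),
    author: [...entry.matchAll(/<author>[\s\S]*?<name>([\s\S]*?)<\/name>[\s\S]*?<\/author>/g)]
      .map((m) => m[1].trim()),
    html_url: linkHref(entry, "text/html"),
    pdf_url: linkHref(entry, "application/pdf"),
    primary_category: primary ? primary[1] : "",
    category: [...entry.matchAll(/<category\b[^>]*\bterm="([^"]*)"/g)].map((m) => m[1]),
    github: "",      // empty placeholder field
    huggingface: "", // empty placeholder field
  };
}

// Drop the feed header and keep only the entry blocks.
function parseFeed(xml) {
  return [...xml.matchAll(/<entry>([\s\S]*?)<\/entry>/g)].map((m) => parseEntry(m[1]));
}
```

The `arxiv:comment` element is simply never read, which has the same effect as discarding it.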

3. Data Processing

Analyze and summarize paper data using AI, then standardize output as JSON.

Single Paper Basic Information Analysis and Enhancement
Daily Paper Summary and Multilingual Translation

4. Data Storage: Notion Database

Create a corresponding database in Notion with the same predefined field names.
In Notion, create an integration under Integrations and grant access to the database. Obtain the corresponding Secret Key.
Use the Notion "Create a database page" node to configure the field mapping and store the data.

Notes

"Create a database page" only adds new entries; existing entries are never updated.
The updated and published timestamps of arXiv papers are in UTC.
Notion single-select and multi-select fields only accept arrays. They do not automatically parse comma-separated strings. You need to format them as proper arrays.
Notion rejects null property values with a 400 error, so empty fields must be omitted or given a valid value.
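
The array and null-value notes above can be handled in a Code node before the data reaches Notion. The sketch below is an assumption-heavy illustration: it builds property values in the shape used by the Notion REST API (for example when calling it through an HTTP Request node), whereas the workflow itself uses the "Create a database page" node. The column names mirror the cleaned field names, and both helper names are hypothetical.

```javascript
// True for values that should be left out of the payload entirely.
const isBlank = (v) =>
  v === null || v === undefined || v === "" || (Array.isArray(v) && v.length === 0);

// Turn an array or a comma-separated string into multi-select options.
// Commas are stripped from option names because Notion does not allow them.
function toOptions(value) {
  const items = Array.isArray(value) ? value : String(value).split(",");
  return items
    .map((s) => s.replace(/,/g, " ").trim())
    .filter(Boolean)
    .map((name) => ({ name }));
}

// Hypothetical mapping from a cleaned paper object to Notion page properties.
// Blank fields are skipped instead of being sent as null.
function toNotionProperties(paper) {
  const props = {};
  if (!isBlank(paper.title)) {
    props.title = { title: [{ text: { content: paper.title } }] };
  }
  if (!isBlank(paper.summary)) {
    // A single rich-text object holds at most 2,000 characters.
    props.summary = { rich_text: [{ text: { content: paper.summary.slice(0, 2000) } }] };
  }
  for (const key of ["author", "category"]) {
    if (!isBlank(paper[key])) props[key] = { multi_select: toOptions(paper[key]) };
  }
  for (const key of ["html_url", "pdf_url", "github", "huggingface"]) {
    if (!isBlank(paper[key])) props[key] = { url: paper[key] };
  }
  return props;
}
```

Omitting blank keys, rather than sending empty strings, avoids the 400 error for the initially empty github and huggingface fields.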

5. Data Delivery

Set up two channels for message delivery: EMAIL and IM, and define the message format and content.

Email: Gmail

GMAIL OAuth 2.0 – Official Documentation
Configure your OAuth consent screen

Steps:

Enable Gmail API
Create OAuth consent screen
Create OAuth client credentials
Audience: Add Test users under Testing status

Message format: HTML
(An OpenAI GPT model was used to design the HTML email template.)

IM: Feishu (LARK)

Bots in groups
Use bots in groups
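
As an illustration of the IM message format, the sketch below builds a plain-text payload of the kind a Feishu custom group bot webhook accepts (`msg_type` plus `content.text`). It assumes a custom bot webhook is used and that each paper object carries `title` and `html_url`; the function name and message wording are hypothetical, and the webhook URL itself is specific to each group and is not shown.

```javascript
// Hypothetical helper: build the JSON body for a Feishu custom bot text message.
// The result would be sent as the POST body to the group's webhook URL.
function buildFeishuMessage(date, papers) {
  const header = `arXiv daily papers for ${date}: ${papers.length} new`;
  const lines = papers.map((p, i) => `${i + 1}. ${p.title}\n${p.html_url}`);
  return {
    msg_type: "text",
    content: { text: [header, ...lines].join("\n") },
  };
}
```

In n8n, the returned object could be passed as the JSON body of an HTTP Request node.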


Nodes

@n8n/n8n-nodes-langchain.chainllm @n8n/n8n-nodes-langchain.lmchatgooglegemini n8n-nodes-base.code n8n-nodes-base.if n8n-nodes-base.scheduletrigger n8n-nodes-base.switch n8n-nodes-base.httprequest n8n-nodes-base.gmail

Complexity

advanced

Published 23 Sept 2025


