POST with Python requests - how do I get the correct table data I'm requesting?
I am trying to get historical economic calendar data from https://www.investing.com/economic-calendar/ for the dates 1 Feb 2020 to 5 Feb 2020.
Today is February 4, 2020.
If I post to the https://www.investing.com/economic-calendar/ URL as in the script below, I can pull the table using BeautifulSoup, but I can't select any day other than the current day. The script returns the table for today (February 4, 2020) only.
import requests
import pandas as pd
from bs4 import BeautifulSoup
payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
"dateFrom":"2020-02-01",
"dateTo":"2020-02-05",
"timeZone":"8",
"timeFilter":"timeRemain",
"currentTab":"custom",
"limit_from":"0"}
urlheader = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
url = "https://www.investing.com/economic-calendar/"
req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")
The table variable looks like this
In the browser, I can see that the page sends a POST request to "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData" whenever I change the date range or filter settings.
Here is the request data I found.
Here is the POST link
So I use the following code instead, as I want to select the dates.
import requests
import pandas as pd
from bs4 import BeautifulSoup
payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
"dateFrom":"2020-02-01",
"dateTo":"2020-02-05",
"timeZone":"8",
"timeFilter":"timeRemain",
"currentTab":"custom",
"limit_from":"0"}
urlheader = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
"X-Requested-With": "XMLHttpRequest"
}
url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"
req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")
But this time there is no element with the id economicCalendarData in the response, so soup.find returns None and the table variable is empty. The soup variable does contain data, but not the table.
This is the table I'm trying to save.
As I said earlier, if I use https://www.investing.com/economic-calendar/ as the URL, I get the table data for the current day only (4 Feb 2020), no matter what dates I put in the payload (dateFrom, dateTo).
When I post to https://www.investing.com/economic-calendar/Service/getCalendarFilteredData instead, the table is empty: the soup variable contains data, but not in the form I'm looking for. What am I doing wrong? How can I save the table for a date range of my choosing?
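You are really close. The getCalendarFilteredData endpoint does not return a full page; it returns JSON, and the table rows come back as an HTML string under the data key. That is why there is no table with the id economicCalendarData to find. If I understand your requirements, the following should help you: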
import requests
from bs4 import BeautifulSoup

url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"
payload = {
    "country[]": ["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
    "dateFrom": "2020-02-01",
    "dateTo": "2020-02-05",
    "timeZone": "8",
    "timeFilter": "timeRemain",
    "currentTab": "custom",
    "limit_from": "0",
}
req = requests.post(url, data=payload, headers={
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
})
# The response is JSON; the table rows are an HTML string under "data".
soup = BeautifulSoup(req.json()['data'], "lxml")
# Print the cell text of every row, one list per row.
for items in soup.select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)
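Since you asked about saving the table, here is a minimal sketch of one way to collect the rows instead of printing them and turn them into CSV text. The helper names rows_from_fragment and rows_to_csv are my own, and the sample fragment is made up for illustration; it only stands in for req.json()['data'], and the real rows will look different. I use the built-in "html.parser" backend here so the sketch needs nothing beyond bs4, but "lxml" as above should work too.

```python
import csv
import io

from bs4 import BeautifulSoup


def rows_from_fragment(html_fragment):
    """Return one list of cell texts per <tr> found in an HTML fragment."""
    soup = BeautifulSoup(html_fragment, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.select("th,td")]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text; rows may have different lengths."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for req.json()['data'] (an assumption, not real data).
sample_fragment = (
    "<tr><td>Monday, February 3, 2020</td></tr>"
    "<tr><td>09:45</td><td>USD</td><td>Example event</td></tr>"
)
rows = rows_from_fragment(sample_fragment)
print(rows_to_csv(rows))
```

With the real response you would call rows_from_fragment(req.json()['data']) and write the result of rows_to_csv to a file. Rows can have different lengths (day headers span the whole table, for example), so you may want to filter or pad them before loading them into pandas.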