97 changes: 96 additions & 1 deletion README.md
@@ -38,7 +38,7 @@ In [ALFRED](http://research.stlouisfed.org/tips/alfred/) there is the concept of

- date: the date the value is for
- realtime_start: the first date the value is valid
- realitime_end: the last date the value is valid
- realtime_end: the last date the value is valid

For instance, there have been three observations (data points) for the GDP of 2014 Q1:

@@ -50,6 +50,18 @@ For instance, there has been three observations (data points) for the GDP of 201

This means the GDP value for Q1 2014 has been released three times. First release was on 4/30/2014 for a value of 17149.6, and then there have been two revisions on 5/29/2014 and 6/25/2014 for revised values of 17101.3 and 17016.0, respectively.
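
The revision history described above can be sketched offline with plain pandas. This is only an illustration: the three values and release dates are the ones quoted in the paragraph, while the variable names (`releases`, `revisions`) are made up for this example and are not part of fredapi.

```python
import pandas as pd

# The three GDP releases for 2014 Q1 quoted above, keyed by release date.
releases = pd.Series(
    [17149.6, 17101.3, 17016.0],
    index=pd.to_datetime(["2014-04-30", "2014-05-29", "2014-06-25"]),
    name="GDP_2014Q1",
)

first_release = releases.iloc[0]       # value as first published
latest_release = releases.iloc[-1]     # value after both revisions
revisions = releases.diff().dropna()   # change introduced by each revision

print(first_release, latest_release)
print(revisions.round(1))
```

Each release is a separate observation of the same quarter, so "first release" and "latest release" are just the two ends of this small series.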

If you pass realtime_start and/or realtime_end to `get_series`, you will get a pandas.DataFrame with a pandas.MultiIndex instead of a pandas.Series.

For instance, with observation_start and observation_end set to 2015-01-01 and
realtime_start set to 2015-01-01, one will get:
```
                                      GDP
obs_date   rt_start   rt_end
2015-01-01 2015-04-29 2015-05-28  17710.0
           2015-05-29 2015-06-23  17665.0
           2015-06-24 9999-12-31  17693.3
```
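
To get a feel for the shape of that result without calling the API, the same frame can be built by hand. This is a sketch under assumptions: the values are copied from the output above, and `pd.Timestamp.max` is used as a stand-in for the open-ended `9999-12-31` vintage.

```python
import pandas as pd

obs = pd.to_datetime(["2015-01-01"] * 3)
rt_start = pd.to_datetime(["2015-04-29", "2015-05-29", "2015-06-24"])
# 9999-12-31 is outside the range of nanosecond-resolution pandas timestamps,
# so the still-current vintage is represented by pd.Timestamp.max here.
rt_end = [pd.Timestamp("2015-05-28"), pd.Timestamp("2015-06-23"), pd.Timestamp.max]

index = pd.MultiIndex.from_arrays(
    [obs, rt_start, rt_end], names=["obs_date", "rt_start", "rt_end"]
)
gdp = pd.DataFrame({"GDP": [17710.0, 17665.0, 17693.3]}, index=index)

# Select the vintage that is still current (open-ended rt_end).
latest = gdp.loc[gdp.index.get_level_values("rt_end") == pd.Timestamp.max]
print(latest)
```

Selecting on an index level with `get_level_values` keeps the three-level index intact, which is convenient when the result is later joined with other series.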

### Get first data release only (i.e. ignore revisions)

```python
@@ -83,6 +95,40 @@ this outputs:
2014-04-01 17294.7
dtype: float64
```

### Get latest data for multiple series for the latest release
```python
data = fred.get_dataframe(['SP500', 'GDP'], frequency='q')
data.tail()
```
this outputs:
```
              SP500      GDP
2014-07-31  1975.91  17599.8
2014-10-31  2009.34  17703.7
2015-01-31  2063.69  17693.3
```

Note that if you do not specify the frequency, each series is output at its
own native frequency, which introduces NaN wherever the dates do not line up.
```python
data = fred.get_dataframe(['GDP', 'PAYEMS'])
data.tail()
```
outputs:
```
                GDP  PAYEMS
2014-07-31  17599.8  139156
2014-08-31      NaN  139369
2014-09-30      NaN  139619
2014-10-31  17703.7  139840
2014-11-30      NaN  140263
2014-12-31      NaN  140592
2015-01-31  17693.3  140793
```
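
The NaN pattern above is ordinary pandas alignment and can be reproduced without any network access. The sketch below uses two small made-up series (`quarterly` and `monthly` are hypothetical names and values, not FRED data) to show where the gaps come from and one way to drop them.

```python
import pandas as pd

# Hypothetical stand-ins for a quarterly and a monthly series.
quarterly = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2014-07-31", "2014-10-31"]),
    name="Q",
)
monthly = pd.Series(
    [10, 11, 12, 13],
    index=pd.to_datetime(["2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31"]),
    name="M",
)

# Column-wise concat aligns on the union of dates; dates that exist only in
# the monthly series get NaN in the quarterly column.
combined = pd.concat([quarterly, monthly], axis=1)
print(combined)

# Keeping only dates present in every series removes the gaps.
aligned = combined.dropna()
print(aligned)
```

Whether to drop, forward-fill, or keep the NaN rows depends on the analysis; requesting a common frequency from FRED, as in the previous example, avoids the question entirely.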


### Get latest data known on a given date

```python
@@ -215,6 +261,55 @@ this outputs:
</tbody>
</table>

### Get multiple series at multiple points in time

This works the same way as for the latest release; just add
realtime_start, realtime_end, or both.

```python
data = fred.get_dataframe(['GDP', 'CP'], observation_start='7/1/2014',
observation_end='1/1/2015', realtime_start='7/1/2014')
data.tail()
```
outputs:
```
                                      GDP      CP
obs_date   rt_start   rt_end
2014-07-01 2014-10-30 2014-11-24  17535.4     NaN
           2014-11-25 2014-12-22  17555.2  1872.7
           2014-12-23 NaT         17599.8     NaN
                      2015-07-29      NaN  1894.6
           2015-07-30 NaT             NaN  1761.1
2014-10-01 2015-01-30 2015-02-26  17710.7     NaN
           2015-02-27 2015-03-26  17701.3     NaN
           2015-03-27 NaT         17703.7     NaN
                      2015-07-29      NaN  1837.5
           2015-07-30 NaT             NaN  1700.5
2015-01-01 2015-04-29 2015-05-28  17710.0     NaN
           2015-05-29 2015-06-23  17665.0  1893.8
           2015-06-24 NaT         17693.3     NaN
                      2015-07-29      NaN  1891.2
           2015-07-30 NaT             NaN  1734.5
```

The advantage of this approach is that all the information is downloaded
up front, so further transformations can be applied without more web queries.

For instance:
```python
dfo = data.reset_index(level=[1, 2])  # move rt_start and rt_end to columns.
target = pd.to_datetime('2015-06-01')
dfo[(dfo.rt_start < target) & (target < dfo.rt_end)].groupby(level=0).first()
```
will output the value of the series as of the `target` date:
```python
             rt_start     rt_end      GDP      CP
obs_date
2014-07-01 2014-12-23 2015-07-29  17599.8  1894.6
2014-10-01 2015-03-27 2015-07-29  17703.7  1837.5
2015-01-01 2015-05-29 2015-06-23  17665.0  1893.8
```
```
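
The as-of lookup can also be wrapped in a small helper. The following is a minimal, self-contained sketch rather than part of fredapi: `as_of` is a hypothetical function name, the GDP vintages are copied from the example earlier in this README, and `pd.Timestamp.max` is assumed to mark the vintage that is still current.

```python
import pandas as pd

# GDP vintages for one observation date; pd.Timestamp.max marks the vintage
# that is still current (an assumption made for this sketch).
rows = [
    ("2015-01-01", "2015-04-29", pd.Timestamp("2015-05-28"), 17710.0),
    ("2015-01-01", "2015-05-29", pd.Timestamp("2015-06-23"), 17665.0),
    ("2015-01-01", "2015-06-24", pd.Timestamp.max, 17693.3),
]
frame = pd.DataFrame(rows, columns=["obs_date", "rt_start", "rt_end", "GDP"])
for col in ["obs_date", "rt_start", "rt_end"]:
    frame[col] = pd.to_datetime(frame[col])


def as_of(frame, target):
    """Return the rows whose validity window contains `target` (inclusive)."""
    target = pd.Timestamp(target)
    mask = (frame["rt_start"] <= target) & (target <= frame["rt_end"])
    return frame.loc[mask].set_index("obs_date")


print(as_of(frame, "2015-06-01"))
```

Unlike the strict inequalities in the snippet above, this sketch uses inclusive bounds so that a lookup on a release date itself returns that release; which convention is appropriate is a design choice.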

### Get all vintage dates
```python
from __future__ import print_function
163 changes: 150 additions & 13 deletions fredapi/fred.py
@@ -20,11 +20,40 @@


class Fred(object):

    """Main interface to Fred.

    Attributes:
        earliest_realtime_start: minimum rt_start for Fred series.
        latest_realtime_end: maximum rt_end for Fred series.
        latest_time_stamp: timestamp substituted for latest_realtime_end when
            parsing dates. Defaults to pandas.Timestamp.max.
    """

    earliest_realtime_start = '1776-07-04'
    latest_realtime_end = '9999-12-31'
    latest_time_stamp = pd.Timestamp.max
    nan_char = '.'
    max_results_per_request = 1000
    root_url = 'https://api.stlouisfed.org/fred'
    # Maps Fred frequency code to pandas frequency code.
    __freq_map = {'d': 'B',  # business days.
                  'w': 'W',  # weekly.
                  'bw': '2W',  # bi-weekly.
                  'm': 'M',  # monthly.
                  'q': '3M',  # quarterly (not checked).
                  'sa': '6M',  # semi-annual.
                  'a': '12M',  # annual (not checked).
                  'wef': 'W-FRI',  # weekly, ending Friday.
                  'weth': 'W-THU',  # weekly, ending Thursday.
                  'wew': 'W-WED',  # weekly, ending Wednesday.
                  'wetu': 'W-TUE',  # weekly, ending Tuesday.
                  'wem': 'W-MON',  # weekly, ending Monday.
                  'wesu': 'W-SUN',  # weekly, ending Sunday.
                  'wesa': 'W-SAT',  # weekly, ending Saturday.
                  'bwew': '2W-WED',  # bi-weekly, ending Wednesday.
                  'bwem': '2W-MON',  # bi-weekly, ending Monday.
                  }

def __init__(self,
api_key=None,
@@ -69,17 +98,27 @@ def __fetch_data(self, url):
return root

def _parse(self, date_str, format='%Y-%m-%d'):
"""Helper function to convert FRED date string into datetime.datetime.

FRED max value 9999-12-31 is converted to Fred.latest_time_stamp.

Returns:
Time stamp as datetime.datetime or Fred.latest_time_stamp.

"""
helper function for parsing FRED date string into datetime
"""
if date_str == self.latest_realtime_end:
return self.latest_time_stamp
rv = pd.to_datetime(date_str, format=format)
if hasattr(rv, 'to_datetime'):
rv = rv.to_datetime()
return rv

def get_series_info(self, series_id):
"""
Get information about a series such as its title, frequency, observation start/end dates, units, notes, etc.
Get information about a series.

Information includes things such as its title, frequency, observation
start/end dates, units, notes, etc.

Parameters
----------
@@ -98,25 +137,35 @@ def get_series_info(self, series_id):
info = pd.Series(root.getchildren()[0].attrib)
return info

def get_series(self, series_id, observation_start=None, observation_end=None, **kwargs):
def get_series(self, series_id, observation_start=None,
observation_end=None, realtime_start=None,
realtime_end=None, **kwargs):
"""
Get data for a Fred series id. This fetches the latest known data, and is equivalent to get_series_latest_release()

Parameters
----------
series_id : str
Fred series id such as 'CPIAUCSL'
observation_start : datetime or datetime-like str such as '7/1/2014', optional
earliest observation date
observation_end : datetime or datetime-like str such as '7/1/2014', optional
latest observation date

observation_start : datetime or datetime-like str such as '7/1/2014'
earliest observation date (optional)
observation_end : datetime or datetime-like str such as '7/1/2014'
latest observation date (optional)
realtime_start : datetime or datetime-like str such as '7/1/2014'
earliest as-of date (optional)
realtime_end : datetime or datetime-like str such as '7/1/2014'
latest as-of date (optional)
kwargs : additional parameters
Any additional parameters supported by FRED. You can see https://api.stlouisfed.org/docs/fred/series_observations.html for the full list
Any additional parameters supported by FRED. You can see
https://api.stlouisfed.org/docs/fred/series_observations.html
for the full list

Returns
-------
data : Series
a Series where each index is the observation date and the value is the data for the Fred series
a pandas Series where each index is the observation date and the
value is the data for the Fred series
"""
url = "%s/series/observations?series_id=%s" % (self.root_url, series_id)
if observation_start is not None:
@@ -126,20 +175,108 @@ def get_series(self, series_id, observation_start=None, observation_end=None, **
if observation_end is not None:
observation_end = pd.to_datetime(observation_end, errors='raise')
url += '&observation_end=' + observation_end.strftime('%Y-%m-%d')
if realtime_start is not None:
realtime_start = pd.to_datetime(realtime_start, errors='raise')
url += '&realtime_start=' + realtime_start.strftime('%Y-%m-%d')
if realtime_end is not None:
realtime_end = pd.to_datetime(realtime_end, errors='raise')
url += '&realtime_end=' + realtime_end.strftime('%Y-%m-%d')
if kwargs.keys():
url += '&' + urlencode(kwargs)
root = self.__fetch_data(url)
if root is None:
raise ValueError('No data exists for series id: ' + series_id)
data = {}
realtime = (realtime_start or realtime_end)
values = []
obsdates = []
rtstarts = []
rtends = []
for child in root.getchildren():
val = child.get('value')
if val == self.nan_char:
val = float('NaN')
else:
val = float(val)
data[self._parse(child.get('date'))] = val
return pd.Series(data)
values.append(val)
obsdates.append(self._parse(child.get('date')))
if realtime:
rtstarts.append(self._parse(child.get('realtime_start')))
rtends.append(self._parse(child.get('realtime_end')))
if realtime:
names = ['obs_date', 'rt_start', 'rt_end']
index = pd.MultiIndex.from_arrays([obsdates, rtstarts, rtends],
names=names)
return pd.DataFrame(values, index=index, columns=[series_id])
else:
return pd.Series(values, index=obsdates)

    def get_multi_series(self, series_ids, observation_start=None,
                         observation_end=None, realtime_start=None,
                         realtime_end=None, **kwargs):
        """Get multiple series in one dataframe.

        Pass a frequency in kwargs to specify the release frequency of
        interest. This also saves a call to the series info, which is
        otherwise needed to find out at what frequency each series is
        released.

        If no frequency is specified in kwargs, each series keeps its native
        release frequency; where those frequencies do not match, the dataframe
        will show NaN.

        Parameters
        ----------
        series_ids : list of str
            Fred series ids such as ['CPIAUCSL', 'SP500']
        observation_start : datetime or datetime-like str such as '7/1/2014'
            earliest observation date (optional)
        observation_end : datetime or datetime-like str such as '7/1/2014'
            latest observation date (optional)
        realtime_start : datetime or datetime-like str such as '7/1/2014'
            earliest as-of date (optional)
        realtime_end : datetime or datetime-like str such as '7/1/2014'
            latest as-of date (optional)
        frequency : str
            Values for frequency are expected to be lowercase codes (e.g. w, m,
            q, ...). For more examples, see
            https://api.stlouisfed.org/docs/fred/series_observations.html#frequency
        kwargs : additional parameters
            Any additional parameters supported by FRED. For more info, see
            https://api.stlouisfed.org/docs/fred/series_observations.html

        Returns
        -------
        info : pandas.DataFrame
            a DataFrame where each row is an observation date and each column
            holds the values for one Fred series.

        """
        all_series = []
        columns = []
        freq_override = None
        if 'frequency' in kwargs:
            freq_override = kwargs['frequency']
        for series_id in series_ids:
            if freq_override:
                freq = freq_override
            else:
                info = self.get_series_info(series_id)
                freq = info['frequency_short'].lower()
            serie = self.get_series(series_id,
                                    observation_start=observation_start,
                                    observation_end=observation_end,
                                    realtime_start=realtime_start,
                                    realtime_end=realtime_end,
                                    **kwargs)
            # If the series is not stored as a DataFrame, turn it into one.
            if hasattr(serie, 'to_frame'):
                serie = serie.to_frame(series_id)
            actual_start = serie.index[0]
            if freq not in self.__freq_map.keys():
                raise ValueError('unknown frequency {} for {}'.
                                 format(freq, series_id))
            all_series.append(serie)
            columns.append(series_id)
        return pd.concat(all_series, axis=1)

def get_series_latest_release(self, series_id):
"""