# How to do cross validation for time series?

This recipe helps you do cross validation for time series.

When we evaluate a model on a single train/test split, we might get lucky with a favorable test set, and the resulting score can hide the fact that the model overfits or underfits. It is therefore advisable to perform cross validation, i.e. to split the data several times and take the mean of the scores across the splits.

So this recipe is a short example of how to do cross validation on a time series. Let's get started.

```
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARMA
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
```

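Let's pause and look at these imports. NumPy and pandas are general-purpose libraries. statsmodels.tsa.arima_model provides the ARMA class used to build the model (this module exists in older statsmodels releases; it was removed in version 0.13 in favor of statsmodels.tsa.arima.model.ARIMA). TimeSeriesSplit generates the cross validation splits. Unlike random splitting, it preserves temporal order, so each training set contains only observations that come before its test set. mean_squared_error is used to score the predictions.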
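To make the splitting behavior concrete, here is a minimal sketch that runs TimeSeriesSplit on a toy array of ten observations. The array `observations` and the list `splits` are illustrative names, not part of the recipe.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy stand-in for a time-ordered dataset (assumption: ten observations)
observations = np.arange(10)

tscv = TimeSeriesSplit(n_splits=4)
splits = []
for train_index, test_index in tscv.split(observations):
    # Every training index precedes every test index in the same fold
    splits.append((train_index.tolist(), test_index.tolist()))
    print("train:", train_index.tolist(), "test:", test_index.tolist())
```

Each training window starts at the first observation and grows from one fold to the next, while the test window always sits immediately after it.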

```
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()
```

Here, we load a time series dataset hosted on GitHub and parse its date column as datetimes.

Now our dataset is ready.

```
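# Four expanding-window splits that keep the observations in time order
tscv = TimeSeriesSplit(n_splits=4)
rmse = []
for train_index, test_index in tscv.split(df):
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    # Fit an MA(1) model, i.e. ARMA with order (p=0, q=1), on the training fold
    model = ARMA(cv_train.value, order=(0, 1)).fit()
    # Predict over the positions covered by the test fold
    predictions = model.predict(cv_test.index.values[0], cv_test.index.values[-1])
    true_values = cv_test.value
    # Store the root mean squared error for this fold
    rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))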
```

First, we set the number of splits to 4. Then we loop over the splits. Each time, the dataset is split into a training set and a test set, the model is fitted on the training set, predictions are made for the test period, and the RMSE is calculated for that split. Note that RMSE is an error measure rather than an accuracy, so lower values are better.
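As a quick illustration of the metric itself, the sketch below computes RMSE by hand with NumPy and averages it over two made-up folds. The helper `root_mean_squared_error` and the numbers are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between truth and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Two made-up folds (assumption: values chosen only for easy arithmetic)
fold_scores = [
    root_mean_squared_error([3.0, 5.0], [2.0, 6.0]),            # errors -1 and 1
    root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]),  # errors 0, 0 and 3
]
mean_score = float(np.mean(fold_scores))
print(fold_scores, mean_score)
```

The cross validation score reported by the recipe is this kind of average, taken over the per-fold RMSE values.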

```
print(np.mean(rmse))
```

Here, we have printed the mean RMSE across the four splits.

Once we run the above code snippet, we will see:

6.577393548356742

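You might get a slightly different result, for example with a different library version, but it should be close to the value above because TimeSeriesSplit produces the same splits on every run.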
