Johan Nilssons Lifestream

Cross validation and Regression

I'm trying to run a regression on my data to predict the optimal delay for my product sales.

   ID           Delay(days)
   50            120

I used train_test_split to split my data and everything ran fine, but the R² score came out between -0.12 and 0.07.

My first question: how should I interpret this, and how can I improve it?
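For context on reading such scores, here is a minimal sketch of how R² is computed, using made-up delay values (the numbers and the helper name `r_squared` are illustrative assumptions, not taken from my data). It shows that a model which always predicts the mean scores 0, and a model that does worse than the mean scores below 0, so values around 0 suggest the features explain almost none of the variance.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

# Made-up delays in days, for illustration only
delays = [100, 120, 140, 160]
mean_baseline = [130, 130, 130, 130]    # always predicts the mean
worse_than_mean = [160, 140, 120, 100]  # predicts the trend backwards

print(r_squared(delays, delays))           # perfect predictions
print(r_squared(delays, mean_baseline))    # no better than the mean
print(r_squared(delays, worse_than_mean))  # worse than the mean
```

This should match what sklearn's `r2_score` and the `score` method of its regressors report.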

My second step is to use LeaveOneOut or KFold, so I copied the example code from the sklearn documentation, but I get an error: IndexError: indices are out-of-bounds.

Can someone explain how this works in terms of code?

Thank you

EDIT: Here is my code. I have two datasets: 'train', which has all sold and thrown-away products, and 'test', which has all current products.

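import xgboost as xgb
from sklearn.model_selection import LeaveOneOut

# Target is the delay in days; every other column is a feature
y = train['Delay']
X = train.drop('Delay', axis=1)

loo = LeaveOneOut()
for train_index, test_index in loo.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    # Index X and y directly with the positional indices from split()
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print(X_train, X_test, y_train, y_test)

# Runs after the loop, so it only sees the last split
model = xgb.XGBRegressor()
model.fit(X_train, y_train)

score = model.score(X_test, y_test)

# Predict the delay for the current products
ypredict = model.predict(test)

print(score)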
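A likely cause of the IndexError, assuming X and y are a pandas DataFrame and Series: `split()` yields positional row indices, while `X[train_index]` on a DataFrame selects columns. The sklearn documentation example uses NumPy arrays, where that indexing picks rows. Below is a hedged sketch of the loop using `.iloc` instead. The `train` frame, its `price` column, and the helper `leave_one_out_predictions` are hypothetical, and `LinearRegression` is swapped in for `xgb.XGBRegressor` only so the example runs without xgboost.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut

# Hypothetical stand-in for `train`, with a non-default index on purpose
rng = np.random.default_rng(0)
train = pd.DataFrame(
    {"price": rng.uniform(1, 10, size=20)},
    index=np.arange(100, 120),
)
train["Delay"] = 30 + 10 * train["price"] + rng.normal(0, 1, size=20)

def leave_one_out_predictions(X, y, make_model):
    """Return one held-out prediction per row of X."""
    predictions = np.empty(len(y))
    for train_index, test_index in LeaveOneOut().split(X):
        # split() yields positional indices, so select rows with .iloc
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train = y.iloc[train_index]
        # Fit a fresh model inside the loop, once per split
        model = make_model()
        model.fit(X_train, y_train)
        predictions[test_index] = model.predict(X_test)
    return predictions

X = train.drop("Delay", axis=1)
y = train["Delay"]

predictions = leave_one_out_predictions(X, y, LinearRegression)

# R² is not defined for a single test row, so score all held-out
# predictions together instead of calling model.score() per split
loo_r2 = r2_score(y, predictions)
print(loo_r2)
```

With the real data, passing `xgb.XGBRegressor` as `make_model` should work the same way, since it follows the same fit/predict interface. The same loop applies to `KFold` by replacing `LeaveOneOut()` with, for example, `KFold(n_splits=5)`.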

via Stack Overflow
