These are my two dataframes saved in two variables:
> print(df.head())
      club_name  tr_jan  tr_dec  year
0  ADO Den Haag    1368    1422  2010
1  ADO Den Haag    1455    1477  2011
2  ADO Den Haag    1461    1443  2012
3  ADO Den Haag    1437    1383  2013
4  ADO Den Haag    1386    1422  2014

> print(ranking_df.head())
      club_name  ranking  year
0  ADO Den Haag       12  2010
1  ADO Den Haag       13  2011
2  ADO Den Haag       11  2012
3  ADO Den Haag       14  2013
4  ADO Den Haag       17  2014
I'm trying to merge these two using this code:
new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')
I added how='left' because ranking_df has fewer data points than df.
The expected result is:
> print(new_df.head())
      club_name  tr_jan  tr_dec  year  ranking
0  ADO Den Haag    1368    1422  2010       12
1  ADO Den Haag    1455    1477  2011       13
2  ADO Den Haag    1461    1443  2012       11
3  ADO Den Haag    1437    1383  2013       14
4  ADO Den Haag    1386    1422  2014       17
But I get this error:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
But I do not want to use concat, since I want to merge the dataframes on their shared columns, not just append one to the other.
Another thing that seems odd to me is that the merge works if I save the first df to .csv and then load that .csv back into a dataframe.
The code for that:
df = pd.DataFrame(data_points, columns=['club_name', 'tr_jan', 'tr_dec', 'year'])
df.to_csv('preliminary.csv')
df = pd.read_csv('preliminary.csv', index_col=0)
ranking_df = pd.DataFrame(rankings, columns=['club_name', 'ranking', 'year'])
new_df = df.merge(ranking_df, on=['club_name', 'year'], how='left')
I think it has to do with the index_col=0 parameter, but I have no idea how to fix it without saving to a file first. It doesn't matter much, but it is an annoyance to have to do that.
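For what it's worth, index_col=0 only tells read_csv to use the first CSV column as the index. A more likely explanation for the round trip helping is that read_csv infers numeric dtypes when it parses the file, so year comes back as int64. The sketch below assumes that year (and the other numbers) in data_points are strings, which would make the column object dtype and trigger the error; the sample rows are made up for illustration and are not the real data. It checks the dtypes and then converts the columns in memory with pd.to_numeric instead of going through a file.

```python
import pandas as pd

# Hypothetical stand-in data: the numbers in data_points arrive as strings
# (an assumption, e.g. from scraping), while rankings holds real integers.
data_points = [
    ["ADO Den Haag", "1368", "1422", "2010"],
    ["ADO Den Haag", "1455", "1477", "2011"],
]
rankings = [
    ["ADO Den Haag", 12, 2010],
    ["ADO Den Haag", 13, 2011],
]

df = pd.DataFrame(data_points, columns=["club_name", "tr_jan", "tr_dec", "year"])
ranking_df = pd.DataFrame(rankings, columns=["club_name", "ranking", "year"])

# Compare the dtypes of the merge keys; a mismatch on 'year' is what
# the ValueError is complaining about.
print(df.dtypes)
print(ranking_df.dtypes)

# Convert the string columns to numbers in memory, which is roughly what
# the CSV round trip was doing implicitly.
for col in ["tr_jan", "tr_dec", "year"]:
    df[col] = pd.to_numeric(df[col])

new_df = df.merge(ranking_df, on=["club_name", "year"], how="left")
print(new_df)
```

If the dtypes printed for your real frames already match on year, then this assumption is wrong and the cause lies elsewhere.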