Value Error: Negative Dimensions Are Not Allowed When Merging
Solution 1:
On a 32-bit machine, the default NumPy integer dtype is int32. On a 64-bit machine, the default NumPy integer dtype is int64.
The largest integers representable by an int32 and int64 are:
In [88]: np.iinfo('int32').max
Out[88]: 2147483647
In [87]: np.iinfo('int64').max
Out[87]: 9223372036854775807
So the integer index created by pd.merge will support a maximum of 2147483647 = 2**31-1 rows on a 32-bit machine, and 9223372036854775807 = 2**63-1 rows on a 64-bit machine.
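To see which case applies to your own machine, you can ask NumPy directly. This is a minimal sketch: it reports the dtype NumPy picks for a plain Python integer and the dtype NumPy uses for indexing (np.intp). It assumes that the platform integer type is what limits the index; it does not inspect pandas internals. The variable names are illustrative.

```python
import numpy as np

# The dtype NumPy picks for a plain Python int on this platform.
default_int = np.array([1]).dtype

# np.intp is the integer type NumPy uses for indexing.
index_int = np.dtype(np.intp)

print("default integer dtype:", default_int)
print("indexing dtype:", index_int)
print("largest value of indexing dtype:", np.iinfo(index_int).max)
```

If the last line prints 2147483647, the 32-bit limit described above is the one to worry about.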
In theory, two 290000-row DataFrames merged with an outer join may have as many as 290000**2 = 84100000000 rows. Since
In [89]: 290000**2 > np.iinfo('int32').max
Out[89]: True
the 32-bit machine may not be able to generate an integer index big enough to index the merged result.
And although the 64-bit machine can in theory generate an integer index big enough to accommodate the result, you may not have enough memory to build an 84-billion-row DataFrame.
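To put a rough number on the memory concern, here is a back-of-the-envelope estimate. It assumes the worst case of 290000**2 rows and counts only a single int64 column, so a real merged DataFrame with several columns would need more.

```python
import numpy as np

rows = 290000 ** 2                              # worst case from the text
bytes_per_value = np.dtype('int64').itemsize    # 8 bytes per int64 value
gib_per_column = rows * bytes_per_value / 2**30

print(f"{gib_per_column:.1f} GiB per int64 column")
```

That is several hundred GiB for one column, well beyond the RAM of a typical workstation.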
Now, of course, the merged DataFrame may have fewer than 84 billion rows (the exact number depends on how many duplicate values appear in df1['POINTID'] and df2['POINTID']) but the above back-of-the-envelope calculation shows that the behavior you are seeing is consistent with having a lot of duplicates.
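You can compute the exact row count before attempting the merge. The sketch below uses two small stand-in DataFrames (the data is hypothetical; only the column name POINTID comes from the question). Each key that appears in both frames contributes count-in-df1 times count-in-df2 rows, and in an outer join each unmatched row contributes one more. It ignores missing keys, since value_counts drops NaN by default.

```python
import pandas as pd

# Small stand-ins for the real df1 and df2 (hypothetical data).
df1 = pd.DataFrame({'POINTID': [1, 1, 2, 3], 'a': range(4)})
df2 = pd.DataFrame({'POINTID': [1, 1, 1, 2, 4], 'b': range(5)})

# How many times each key occurs in each frame.
c1 = df1['POINTID'].value_counts()
c2 = df2['POINTID'].value_counts()

# Keys present in both frames produce count1 * count2 rows each.
both = c1.index.intersection(c2.index)
matched = int((c1[both] * c2[both]).sum())

# In an outer join, rows whose key has no partner appear once each.
only_left = int(c1.drop(both).sum())
only_right = int(c2.drop(both).sum())

expected_rows = matched + only_left + only_right
print(expected_rows)
```

Running the same calculation on the real frames tells you whether the result fits in an int32 index, and in memory, without building it. A handful of keys with very large counts is the usual sign of the duplicate problem described above.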
PS. You can get negative values when adding or multiplying positive integers in NumPy arrays if there is arithmetic overflow:
In [92]: np.int32(290000)*np.int32(290000)
Out[92]: -1799345920
My guess is that this is the reason for the exception:
ValueError: negative dimensions are not allowed
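That guess can be reproduced in isolation, without pandas. The sketch below multiplies two int32 arrays so the product wraps around to a negative number, then passes that number to np.empty as a size. It only shows that an overflowed size leads to the same message; it does not show where inside pd.merge the overflow would occur.

```python
import numpy as np

n = np.array([290000], dtype=np.int32)
product = n * n    # int32 array arithmetic wraps around on overflow
print(product[0])  # negative, although both factors are positive

message = None
try:
    # Asking NumPy for an array of negative length fails.
    np.empty(int(product[0]))
except ValueError as exc:
    message = str(exc)

print(message)
```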