Skip to content Skip to sidebar Skip to footer

Pareto Distribution: R Vs Python - Different Results

I'm trying to replicate R's fitdist() results (reference, cannot modify R code) in Python using scipy.stats. The results are totally different. Does anyone know why? How can I repl

Solution 1:

The discrepancy is mainly due to the differing pdfs.

Python

In python st.pareto.fit() uses a Pareto distribution defined via this pdf:

enter image description here

import scipy.stats as st
data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
print(st.pareto.fit(data, floc = 0, scale = 1))

# (0.4019785013487883, 0, 1399.0339889072732)

R

Whereas your R code is using a Pareto with this pdf:

enter image description here

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto', "mle")$estimate

#    shape        scale 
#    0.76016410066.274196

Make R Mirror Python

To get R to use the same distribution as st.pareto.fit() use actuar::dpareto1():

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto1', "mle")$estimate

#     shape          min 
#   0.40289211399.0284977

Make Python Mirror R

And here is one approach to approximate your R code in Python:

import numpy as np
from scipy.optimize import minimize

defdpareto(x, shape, scale):
    return shape * scale**shape / (x + scale)**(shape + 1)

defnegloglik(x):
    data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
    return -np.sum([np.log(dpareto(i, x[0], x[1])) for i in data])

res = minimize(negloglik, (1, 1), method='Nelder-Mead', tol=2.220446e-16)
print(res.x)

# [7.60082820e-01 1.00691719e+04]

Post a Comment for "Pareto Distribution: R Vs Python - Different Results"