
Transferring Data From Production Datastore To Local Development Environment Datastore In Google App Engine (Python)

TL;DR: I need to find a real solution for downloading my data from the production datastore and loading it into the local development environment. The detailed problem: I need to test my app in loc…

Solution 1:

It sounds like you should be using a remote sandbox

Even if you get this to work, the localhost datastore still behaves differently from the production datastore.

If you want to truly simulate your production environment, I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project ID (e.g. appcfg.py update . -A sandbox-id), use Datastore Admin to create a backup of production in Google Cloud Storage, and then use Datastore Admin in your sandbox to restore that backup there.

Cloning production data into localhost

I do prime my localhost datastore with some production data, but it is not a complete clone: just the core required objects and a few test users.

To do this, I wrote a Google Dataflow job that exports selected models and saves them to Google Cloud Storage in JSONL format. Then, on localhost, I have an endpoint called /init/ which launches a task queue job that downloads these exports and imports them.
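A minimal sketch of the import side, assuming each export file is JSONL with one entity dict per line. `iter_jsonl` and `import_jsonl` are hypothetical names, not the author's code; `make_entity` is where you would create a model instance and call `set_from_dto` on it, and on App Engine `save_batch` could be `ndb.put_multi`. The Dataflow export and the task queue plumbing are left out.

```python
import json


def iter_jsonl(lines):
    """Yield one dict per non-blank line of a JSONL (JSON Lines) export."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except ValueError as e:
            raise ValueError("Bad JSON on line %d: %s" % (lineno, e))


def import_jsonl(lines, make_entity, save_batch, batch_size=100):
    """Build an entity from each JSONL record and save them in batches.

    make_entity: callable(dict) -> entity (hypothetical; e.g. create a
        model instance and call set_from_dto on it).
    save_batch: callable(list) -> None (on App Engine, ndb.put_multi).
    Returns the number of records imported.
    """
    batch, count = [], 0
    for dto in iter_jsonl(lines):
        batch.append(make_entity(dto))
        count += 1
        if len(batch) >= batch_size:
            save_batch(batch)
            batch = []  # start a fresh list; the saved one is left untouched
    if batch:
        save_batch(batch)  # flush the final partial batch
    return count
```

Batching keeps each datastore write small and lets a task that hits its deadline be retried on a smaller chunk; the batch size of 100 is an arbitrary default.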

To do this, I reuse my JSON REST handler code, which can convert any model to JSON and back.

In theory you could do this for your entire datastore.

EDIT - This is what my to-json/from-json code looks like:

All of my ndb.Model classes subclass my BaseModel, which contains the generic conversion code:

from google.appengine.ext import ndb
from google.appengine.ext.ndb import msgprop  # ndb's EnumProperty lives in msgprop

# Conversion helpers (dt_to_timestamp, key_to_dto, str_to_dto, timestamp_to_dt,
# dto_to_key, float_from_dto, strip) are defined elsewhere and not shown here.

get_dto_typemap = {
    ndb.DateTimeProperty: dt_to_timestamp,
    ndb.KeyProperty: key_to_dto,
    ndb.StringProperty: str_to_dto,
    msgprop.EnumProperty: str,
}
set_from_dto_typemap = {
    ndb.DateTimeProperty: timestamp_to_dt,
    ndb.KeyProperty: dto_to_key,
    ndb.FloatProperty: float_from_dto,
    ndb.StringProperty: strip,
    ndb.BlobProperty: str,
    ndb.IntegerProperty: int,
}

class BaseModel(ndb.Model):
    """Adds generic dict (DTO) conversion to every model."""

    def to_dto(self):
        dto = {'key': key_to_dto(self.key)}
        for prop in self._properties.itervalues():
            # _name is the datastore name; _code_name is the Python attribute.
            value = getattr(self, prop._code_name)
            convert = get_dto_typemap.get(prop.__class__)
            if convert is not None and value is not None:
                if prop._repeated:
                    value = [convert(v) for v in value]
                else:
                    value = convert(value)
            dto[prop._name] = value
        return dto

    def set_from_dto(self, dto):
        for prop in self._properties.itervalues():
            if isinstance(prop, ndb.ComputedProperty):
                continue  # computed values are derived, never assigned
            key = prop._name
            if key not in dto:
                continue
            value = dto[key]
            convert = set_from_dto_typemap.get(prop.__class__)
            try:
                if convert is not None and value is not None:
                    # Convert repeated values element by element, mirroring to_dto.
                    if prop._repeated:
                        value = [convert(v) for v in value]
                    else:
                        value = convert(value)
                setattr(self, prop._code_name, value)
            except Exception as e:
                raise ValueError("Error setting %s.%s to %r: %s"
                                 % (self.__class__.__name__, key, value, e))

class User(BaseModel):
    # user fields, etc.
    pass
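The BaseModel code relies on conversion helpers that the answer does not show. Below is a hypothetical sketch of the pure-Python ones, assuming timestamps are seconds since the Unix epoch and that DateTimeProperty values are naive datetimes in UTC (the ndb convention). `key_to_dto` and `dto_to_key` are left out because they need ndb; since the handlers look entities up by urlsafe id, they probably wrap `key.urlsafe()` and `ndb.Key(urlsafe=...)`, but that is a guess. `str_to_dto` is not guessed at.

```python
import calendar
import datetime

_EPOCH = datetime.datetime(1970, 1, 1)


def dt_to_timestamp(dt):
    """Naive UTC datetime -> float seconds since the epoch (None passes through)."""
    if dt is None:
        return None
    return calendar.timegm(dt.utctimetuple()) + dt.microsecond / 1e6


def timestamp_to_dt(ts):
    """Seconds since the epoch -> naive UTC datetime (None passes through)."""
    if ts is None:
        return None
    return _EPOCH + datetime.timedelta(seconds=float(ts))


def strip(value):
    """Trim surrounding whitespace from incoming strings."""
    return value.strip() if value is not None else None


def float_from_dto(value):
    """Accept numbers or numeric strings; treat '' as 'no value' (an assumption)."""
    if value is None or value == '':
        return None
    return float(value)
```

Adding from a fixed epoch with `timedelta` avoids both local-timezone surprises and the deprecated `utcfromtimestamp`.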

My request handlers then use set_from_dto and to_dto like this (BaseHandler also provides some convenience methods, such as converting JSON payloads to Python dicts):

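from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import ndb

class RestHandler(BaseHandler):
    # Subclasses set MODEL to the ndb.Model class they expose.
    # BaseHandler (not shown) is assumed to serialize return values to JSON.
    MODEL = None

    def _load(self, resource_id):
        # ndb.Key(urlsafe=...) cannot be combined with a positional kind,
        # so decode the id first, then check that the kind matches.
        try:
            key = ndb.Key(urlsafe=resource_id)
        except Exception:
            key = None  # malformed id
        if key is None or key.kind() != self.MODEL._get_kind():
            self.abort(422, "Unknown id")
        obj = key.get()
        if obj is None:
            self.abort(422, "Unknown id")
        return obj

    def put(self, resource_id=None):
        if resource_id:
            obj = self._load(resource_id)
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()
        else:
            self.abort(405)

    def post(self, resource_id=None):
        if resource_id:
            self.abort(405)
        else:
            obj = self.MODEL()
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()

    def get(self, resource_id=None):
        if resource_id:
            return self._load(resource_id).to_dto()
        else:
            cursor_key = self.request.GET.pop('$cursor', None)
            cursor = Cursor(urlsafe=cursor_key) if cursor_key else None
            # Query-string values are strings: convert before clamping to 10..200.
            limit = max(min(200, int(self.request.GET.pop('$limit', 200))), 10)
            qs = self.MODEL.query()
            # ... other code that handles query params
            results, next_cursor, more = qs.fetch_page(limit, start_cursor=cursor)
            return {
                '$cursor': next_cursor.urlsafe() if more else None,
                'results': [result.to_dto() for result in results],
            }

class UserHandler(RestHandler):
    MODEL = User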
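Because the list endpoint keeps returning a `$cursor` until there are no more results, a client (for example, a script that seeds localhost from a deployed copy) can page through an entire kind. A minimal sketch, assuming `get_page` is a hypothetical callable that performs the GET with the given query parameters and returns the decoded JSON body:

```python
def fetch_all(get_page, limit=200):
    """Collect every result from a RestHandler-style list endpoint.

    get_page(params) -> dict like {'$cursor': str or None, 'results': [...]}
    (hypothetical; supplied by the caller).
    """
    results = []
    params = {'$limit': limit}
    while True:
        page = get_page(dict(params))  # pass a copy so callers can't mutate our state
        results.extend(page['results'])
        cursor = page.get('$cursor')
        if not cursor:
            return results  # server signals the last page with a null cursor
        params['$cursor'] = cursor
```

With the third-party `requests` library, `get_page` could be `lambda params: requests.get(url, params=params).json()`, where `url` points at a list route for `UserHandler` (the route itself is not shown in the answer). The server clamps `$limit` to the range 10 to 200 regardless of what the client sends.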
