Django-celery Infrastructure Over Multiple Servers, Broker Is Redis
Solution 1:
What will greatly simplify your processing is some shared storage, accessible from all cooperating servers. With such a design, you can distribute the work among more servers without worrying about which server will perform the next processing step.
Using AWS S3 (or similar) cloud storage
If you can use some cloud storage, like AWS S3, use that.
If your servers are also running at AWS, you do not pay for traffic within the same region, and transfers are quite fast.
The main advantage is that your data are available from all the servers under the same bucket/key name, so you do not have to track which server is processing which file, as all of them share the storage on S3.
Note: if you need to get rid of old files, you can set up a lifecycle policy on the bucket, e.g. to delete files older than 1 day or 1 week.
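As a rough illustration, here is a minimal sketch of the S3 round trip with boto3; the bucket name, key and local paths are placeholders, and credentials are assumed to come from the environment or an IAM role:
import boto3

s3 = boto3.client("s3")  # credentials from environment variables, config files or an IAM role

# Front-end server: upload the received file under a key all workers agree on
s3.upload_file("/tmp/upload-123.csv", "my-shared-bucket", "uploads/upload-123.csv")

# Any worker: fetch the same object by bucket/key before processing it
s3.download_file("my-shared-bucket", "uploads/upload-123.csv", "/tmp/upload-123.csv")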
Using other types of shared storage
There are more options:
- Samba
- central file server
- FTP
- Google storage (very similar to AWS S3)
- Swift (from OpenStack)
- etc.
For small files you could even keep the data in Redis itself, but for good reasons such solutions are rather rare.
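Whichever backend you pick, going through Django's storage abstraction keeps the rest of the code independent of it; a minimal sketch, with a hypothetical file name and content:
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

# Save through whatever backend DEFAULT_FILE_STORAGE points at (S3, a Samba/NFS mount, Swift, ...)
name = default_storage.save("uploads/report.csv", ContentFile(b"raw,data\n"))

# Read it back on any server configured with the same storage backend
with default_storage.open(name) as f:
    data = f.read()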
Solution 2:
Celery actually makes this pretty simple, since you're already putting the tasks on a queue. All that changes with more workers is that each worker takes whatever's next on the queue - so multiple workers can process at once, each on their own machine.
There are three parts to this, and you've already got one of them.
- Shared storage, so that all machines can access the same files
- A broker that can hand out tasks to multiple workers - redis is fine for that (a broker config sketch follows after this list)
- Workers on multiple machines
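For the broker part, a minimal settings sketch for a django-celery setup (host, port and database numbers are placeholders) could look like this:
# settings.py - every front-end and worker machine points at the same broker
import djcelery
djcelery.setup_loader()  # requires 'djcelery' in INSTALLED_APPS

BROKER_URL = "redis://broker-host:6379/0"             # all workers pull tasks from here
CELERY_RESULT_BACKEND = "redis://broker-host:6379/1"  # optional: store task results in redis too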
Here's how you set it up:
- User uploads a file to the front-end server, which stores it in your shared storage (e.g. S3, Samba, NFS, whatever) and stores the reference in the database
- Front-end server kicks off a celery task to process the file, e.g.
def my_view(request):
    # ... deal with storing the file
    file_in_db = store_file(request)
    my_process_file_task.delay(file_in_db.id)  # Use PK of DB record
    # do rest of view logic...
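The task itself lives in a tasks.py that every worker machine can import; a sketch of what my_process_file_task might look like, assuming Celery 3.1+ and a hypothetical UploadedFile model plus a hypothetical process() helper:
# myapp/tasks.py - discovered by the celery worker on every machine
from celery import shared_task

from myapp.models import UploadedFile          # hypothetical model holding the FileField
from myapp.processing import process           # hypothetical helper doing the heavy lifting

@shared_task
def my_process_file_task(file_id):
    # Only the primary key travels through the broker; the file comes from shared storage
    uploaded = UploadedFile.objects.get(pk=file_id)
    data = uploaded.file.read()                # FileField backed by the shared storage backend
    result = process(data)
    uploaded.mark_processed(result)            # hypothetical model method to record the outcome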
- On each processing machine, run a celery worker:
python manage.py celery worker --loglevel=INFO -Q default -E
Then as you add more machines, you'll have more workers and the work will be split between them.
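You can also scale within a single machine before adding more of them; the standard celery worker flags pass through manage.py, e.g. --concurrency to control the number of worker processes:
python manage.py celery worker --loglevel=INFO -Q default -E --concurrency=4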
Key things to ensure:
- You must have shared storage, or this gets much more complicated
- Every worker machine must have the right Django/Celery settings to be able to find the redis broker and the shared storage (e.g. S3 bucket, keys, etc.); a storage-settings sketch follows below
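For the shared-storage half of that, a minimal sketch assuming django-storages with its boto-based S3 backend (newer versions use storages.backends.s3boto3.S3Boto3Storage instead; bucket name and keys are placeholders):
# settings.py on every machine, alongside the broker settings shown earlier
DEFAULT_FILE_STORAGE = "storages.backends.s3boto.S3BotoStorage"
AWS_ACCESS_KEY_ID = "..."            # or rely on an instance IAM role instead
AWS_SECRET_ACCESS_KEY = "..."
AWS_STORAGE_BUCKET_NAME = "my-shared-bucket"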