Launch an instance with sufficient memory and CPU (c3.large with a 160 GB SSD disk) and the right security configuration
Install these on the instance:
sudo apt-get update
sudo apt-get install awscli
sudo apt-get install postgresql-client-common
sudo apt-get install postgresql-client
sudo apt-get install pigz
sudo apt-get install htop
sudo apt-get install python-pip
sudo pip install boto
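A quick way to confirm the installs above succeeded is to check that each tool is on the PATH. This is a minimal sketch, not part of the original scripts; the helper name missing_tools is hypothetical, and the executable names (aws, psql, pigz, htop, pip) are the ones these packages normally provide.

```python
import shutil

# Executables the packages above are expected to provide (assumed names).
REQUIRED_TOOLS = ["aws", "psql", "pigz", "htop", "pip"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools are installed.")
```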
Run aws configure to set up your AWS keys
Set up the following environment variables (used by different modules):
export AWS_ACCESS_KEY_ID=<key>
export AWS_SECRET_ACCESS_KEY=<secret>
export aws_access_key_id=<key>
export aws_secret_access_key=<secret>
unset PGPASSWORD
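Before running the migration it can help to verify the environment. The sketch below is an illustration only (the function name environment_problems is hypothetical): it reports any of the four variables that are missing, and flags PGPASSWORD if it is still set, since psql would use it instead of falling back to ~/.pgpass.

```python
import os

# The four variables exported above, in both spellings.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "aws_access_key_id",
    "aws_secret_access_key",
]


def environment_problems(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = ["%s is not set" % name for name in REQUIRED_VARS if not env.get(name)]
    if "PGPASSWORD" in env:
        problems.append("PGPASSWORD is set; unset it so psql uses ~/.pgpass")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print(problem)
```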
Create ~/.pgpass so that psql can run from bash without prompting for a password. Add one line each for the RDS replica and the Redshift cluster, in the following format: hostname:port:database:user:password
chmod 600 ~/.pgpass
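The ~/.pgpass file can also be generated programmatically. The following is a minimal sketch under stated assumptions: the helper names pgpass_line and write_pgpass are hypothetical, and the hostnames, database, and user in the usage comment are placeholders. It escapes ':' and '\' inside fields, which the PostgreSQL password file format requires, and sets the file mode to 0600.

```python
import os


def pgpass_line(hostname, port, database, user, password):
    """Format one ~/.pgpass entry: hostname:port:database:user:password."""
    def escape(field):
        # A literal ':' or '\' inside a field must be preceded by a backslash.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(escape(f) for f in (hostname, port, database, user, password))


def write_pgpass(path, lines):
    """Write the entries and restrict the file to its owner (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.chmod(path, 0o600)  # also fixes the mode if the file already existed


# Example usage (placeholder hosts and credentials, not real values):
# write_pgpass(os.path.expanduser("~/.pgpass"), [
#     pgpass_line("rds-replica.example.com", 5432, "mydb", "admin", "secret"),
#     pgpass_line("redshift.example.com", 5439, "mydb", "admin", "secret"),
# ])
```

Port 5432 is the PostgreSQL default and 5439 is the Redshift default; use whatever ports your instances actually listen on.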
Create a directory named Copy to hold the bash scripts for copying data from S3 to Redshift
Copy these scripts into the home directory
upload.py
get_data.sh
get_count.sh
upload.py is set up to import daily tables from device_sensor and tracker_motion.
python upload.py [table-name-prefix] [YYYY] [MM] [DD] [download_data] [split_data]
To migrate device_sensor for 2015-08-12:
python upload.py device_sensors_par 2015 08 12 yes yes > migrate_2015_08_12.log 2>&1
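When several days need migrating, the command line above can be generated per day. This is a rough sketch, not part of upload.py: the helper names daily_table_name and upload_commands are hypothetical, and it only assumes the argument order shown above and the log-file naming used in the example.

```python
import datetime


def daily_table_name(prefix, yyyy, mm, dd):
    """Build the daily table name, e.g. device_sensors_par_2015_08_12."""
    # datetime.date raises ValueError for an impossible date such as 2015-02-30.
    day = datetime.date(int(yyyy), int(mm), int(dd))
    return "%s_%s" % (prefix, day.strftime("%Y_%m_%d"))


def upload_commands(prefix, start, end, download_data="yes", split_data="yes"):
    """Yield one shell command per day from `start` to `end` inclusive."""
    day = start
    while day <= end:
        stamp = day.strftime("%Y %m %d")
        log = "migrate_%s.log" % day.strftime("%Y_%m_%d")
        yield "python upload.py %s %s %s %s > %s 2>&1" % (
            prefix, stamp, download_data, split_data, log)
        day += datetime.timedelta(days=1)


if __name__ == "__main__":
    first = datetime.date(2015, 8, 12)
    last = datetime.date(2015, 8, 14)
    for command in upload_commands("device_sensors_par", first, last):
        print(command)
```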
The sequence of events:
device_sensors_par_2015_08_12 will be created
device_sensors_par_2015_08_12.csv (the downloaded data)
device_sensors_par_2015_08_12-00000 (the split data)
[main-bucket]/device_sensors_2015_08/2015_08_12/ (the S3 location)
copy_device_sensors_par_2015_08_12.sh is created in the Copy directory with the command to upload data to Redshift.
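To give an idea of what the generated copy script might contain, here is a hypothetical sketch. The script that upload.py actually writes may differ. Everything below is an assumption except the script name and the Copy directory: the function names, the host, database, and user parameters, the comma delimiter, the GZIP option (guessed from pigz being installed), and the use of the lowercase credential variables inside the COPY statement.

```python
import os


def copy_script_path(table, directory="Copy"):
    """Path of the per-table script, e.g. Copy/copy_<table>.sh."""
    return os.path.join(directory, "copy_%s.sh" % table)


def copy_script(table, s3_prefix, host, database, user, port=5439):
    """Return the text of a bash script that loads `table` from S3 via psql.

    The password is not passed here; psql reads it from ~/.pgpass.
    """
    # Redshift COPY loads every S3 object whose key starts with the prefix.
    # $aws_access_key_id and $aws_secret_access_key are expanded by bash,
    # because the whole statement sits inside double quotes.
    copy_sql = (
        "COPY {table} FROM 's3://{s3_prefix}' "
        "CREDENTIALS 'aws_access_key_id=$aws_access_key_id;"
        "aws_secret_access_key=$aws_secret_access_key' "
        "DELIMITER ',' GZIP;"
    ).format(table=table, s3_prefix=s3_prefix)
    command = 'psql -h {host} -p {port} -U {user} -d {database} -c "{sql}"'.format(
        host=host, port=port, user=user, database=database, sql=copy_sql)
    return "#!/bin/bash\n" + command + "\n"


if __name__ == "__main__":
    table = "device_sensors_par_2015_08_12"
    print(copy_script_path(table))
    print(copy_script(table, "my-bucket/device_sensors_2015_08/2015_08_12/",
                      "redshift.example.com", "mydb", "admin"))
```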