We currently have a MySQL database table that holds about 2 TB of data in approximately 9 billion rows. Each row contains one second of power data from an electricity meter, with 31 columns in each row. The data is indexed on a date-time, a controller id and a power meter id.
The database is a production database, and the table is constantly being written to. We intend to create a read replica of the database, and then break the link between the read replica and the master. From the read replica, we'll be able to dump the data to AWS Glacier without impacting our production system. However, we would like to put in place a service that will keep this table to a more manageable size.
* The service should be written in Node.js >= v5.0.0.
* The service should comply with the Airbnb JavaScript Style Guide.
* The service should be written as an NPM module, i.e. it can be required into another Node.js application.
* The service should not severely impact the production database: it should minimise the number of connections and resources it uses on the database, and it should not try to pull all the data in one query. The service can take its time to complete.
* The service should only archive data that is over a year old.
* For data that is between 1 and 2 years old, the service should archive it to S3.
* For data that is over 2 years old, it should archive it to Glacier.
* The service should also scan S3 for files that are older than 2 years and migrate them to Glacier.
* The archived data should be stored in CSV files, with each file containing the data for one controller id and one meter id, with the starting date-time, controller id and meter id in the filename.
* Each file should contain one week of data.
* The service must be able to resume from where it left off if it is restarted, loses its connection to the database, etc.
* The completed project must contain tests.
* Applicants should be able to provide an outline of how they will implement their solution.
* All code is to be stored in a private git repo provided by us.
* The service can use the AWS SDK and any available AWS resources.
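
To make the age, file and naming rules above concrete, here is a minimal sketch of how one week of data could be bucketed, named and routed. It is an illustration only, not a required design. The function names, the filename layout, the Monday 00:00 UTC week boundary, the 365-day year and the choice to measure age from the end of the week are all assumptions; applicants are free to propose alternatives.

```javascript
'use strict';

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const MS_PER_WEEK = 7 * MS_PER_DAY;
// Assumption: a "year" is approximated as 365 days.
const MS_PER_YEAR = 365 * MS_PER_DAY;
// The Unix epoch (1970-01-01) was a Thursday, so the first Monday is 4 days later.
const MONDAY_OFFSET_MS = 4 * MS_PER_DAY;

// Returns the start of the week containing `date`.
// Assumption: weeks start on Monday at 00:00 UTC.
function weekStart(date) {
  const shifted = date.getTime() - MONDAY_OFFSET_MS;
  const weeks = Math.floor(shifted / MS_PER_WEEK);
  return new Date((weeks * MS_PER_WEEK) + MONDAY_OFFSET_MS);
}

// Builds a filename holding the starting date-time, controller id and meter id.
// The layout (hypothetical) is: 20160307T000000Z_controller-12_meter-3.csv
function archiveFileName(controllerId, meterId, start) {
  const stamp = start.toISOString()
    .replace(/[-:]/g, '')     // drop date and time separators
    .replace(/\.\d{3}/, '');  // drop milliseconds
  return `${stamp}_controller-${controllerId}_meter-${meterId}.csv`;
}

// Decides where a one-week file belongs.
// Assumption: age is measured from the END of the week, so a file is only
// archived once every row in it is over a year old.
// Returns null (leave in MySQL), 's3' or 'glacier'.
function archiveTier(start, now) {
  const ageMs = now.getTime() - (start.getTime() + MS_PER_WEEK);
  if (ageMs <= MS_PER_YEAR) {
    return null;
  }
  if (ageMs <= 2 * MS_PER_YEAR) {
    return 's3';
  }
  return 'glacier';
}

module.exports = { weekStart, archiveFileName, archiveTier };
```

Because each file is identified by a deterministic (week, controller id, meter id) key, one possible way to meet the restart requirement is to record the last key that was fully uploaded and resume from the next one.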
Sample data will be made available for development and testing purposes.
Applicants should provide a cost and time estimate and examples of their previous work.