Alipay, Hangzhou

Archive for the ‘Hadoop’ Category

Make MySQL, Hadoop and Hive Work Together

Friday, July 6th, 2012

Companies are always looking for cheaper data storage solutions. Recently, some of them have moved their RDBMS from Oracle to PostgreSQL and then moved their ETL from Oracle RAC to Greenplum.

At many Web 2.0 companies, MySQL is the RDBMS of choice. However, MySQL is not well suited to data warehousing. Not only is it much harder to scale out than RAC or Greenplum, but its SQL parser is also far less sophisticated than Oracle's. The same limitations are why we cannot use MongoDB or other NoSQL databases as a data warehouse. How to glue an RDBMS and an ETL platform together reliably and cheaply is therefore a very interesting problem for a DBA.

At my current company, I have found that MySQL and Hive work well together. Here is a diagram of the overall architecture.


Database Structure of Hive’s Metastore

Monday, July 2nd, 2012