According to Inmon, famous author for several data warehouse books, "A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process".
DW & Data Mart
A data warehouse is a relational/multidimensional database that is designed for query and analysis rather than transaction processing. A data warehouse usually contains historical data that is derived from transaction data. It separates analysis workload from transaction workload and enables a business to consolidate data from several sources.
Star Schema is a relational database schema for representing multimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of data.
A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables. i.e dimension table hierachies are broken into simpler tables. In star schema example we had 4 dimensions like location, product, time, organization and a fact table(sales).
The centralized table in a star schema is called as FACT table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
ETL Tools are meant to extract, transform and load the data into Data Warehouse for decision making. Before the evolution of ETL Tools, the above mentioned ETL process was done manually by using SQL code created by programmers. This task was tedious and cumbersome in many cases since it involved many resources, complex coding and more work hours. On top of it, maintaining the code placed a great challenge among the programmers.
These difficulties are eliminated by ETL Tools since they are very powerful and they offer many advantages in all stages of ETL process starting from extraction, data cleansing, data profiling, transformation, debuggging and loading into data warehouse when compared to the old method.
There are a number of ETL tools available in the market to do ETL process the data according to business/technical requirements. Following are some those.
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.
Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration software and ETL softwares.
Informatica - Transformations
In Informatica, Transformations help to transform the source data according to the requirements of target system and it ensures the quality of the data being loaded into target.
ETL Solution's product "Transformation Manager(TM)" is a stand-alone Windows® or Linux® software suite of meta-data driven code generator programs that provides for the authoring, testing, and debugging of data transforms between virtually all types of data, whether XML, databases, flat files, JavaT classes or spreadsheets, with flexible deployment options in both J2EET and Microsoft®.NET architectures.
Database - RDBMS
There are a number of relational databases to store data. A relational database contains normalized data stored in tables. Tables contain records and columns. RDBMS makes it easy to work with individual records. Each row contains a unique instance of data for the categories defined by the columns