Data Warehousing

Data warehousing is the electronic storage of a large amount of information by a business. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve and easy to manage. The concept of data warehousing originated in 1988 with the work of IBM researchers Barry Devlin and Paul Murphy. The need to warehouse data evolved as computer systems became more complex and handled increasing amounts of data

.

Data warehousing is an electronic method of organizing information.

A data warehouse essentially combines information from several sources into one comprehensive database. For example, in the business world, a data warehouse might incorporate customer information from a company’s point-of-sale systems (the cash registers), its website, its mailing lists and its comment cards. Alternatively, it might incorporate all the information about employees, including time cards, demographic data, salary information, etc.

By combining all of this information in one place, a company can analyze its customers in a more holistic way, ensuring that it has considered all the information available. Data warehousing also makes data mining possible, which is the task of looking for patterns in the data that could lead to higher sales and profits.

 

Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point-of-view of the end user who may need access to specialized, sometimes local databases. The latter idea is known as the data mart.

Approaches of Data Warehouse :

There are two approaches to data warehousing, top down and bottom up. The top down approach spins off data marts for specific groups of users after the complete data warehouse has been created. The bottom up approach builds the data marts first and then combines them into a single, all-encompassing data warehouse.

Typically, a data warehouse is housed on an enterprise mainframe server or increasingly, in the cloud. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted for use by analytical applications and user queries

 

 

There are three different mechanisms to load the data

  • FastLoad
  • MultiLoad
  • TPump

Fast Load  : is design to load flat file from the main frame or Lan directly in to an empty data table or block level  . FastLoad is a command-driven utility to load large data into an empty table on a Relational Database Management System (RDBMS).

MultiLoad (sorted)

 

Tpump  : locks an individual row

If we make a change on dept_no 50 having Hash value [1100] , only this row will be locked . other rows are available for access T