FerencMantfeld
22nd January 2010, 03:54 PM
Since BI Plus supports many data objects, there is sometimes confusion as to which approach to take.
there are basically 3 types of data objects:
1. Memory-based
2. Repository-based (database)
3. Inline View.
A Memory based data object actually loads the data into HSQLDB, which is tightly integrated with BI Plus and is seamless to the end user. This database is a pure-java memory-resident database. Every record written to HQSLDB will consume more memory. Provided there is no page fault or excessive swapping at the O/S level, performance should be very fast.
Repository based data objects. These data objects write their contents to the database that hosts the repository. Prior to BI Plus 3.5, Oracle was the only option for the repository (since the engine was written in PL/SQL). Since 3.5 the engine has been rewritten and now runs on the seamlessly integrated java database (Derby). the limitations with Oracle are that if you are running Oracle XE (Express Edition) 10.2 the physical size of the USERS tablespace was capped at 4 GB. Any (Oracle) Repository-based data objects would be eating into this storage. 4 GB is a LOT of data for most data objects. With 3.5 when running non-Oracle (Derby), the size of the DB is limited only by the amount of disk space on the file system.
Inline View Data Objects. These data objects do not materialize any data at any time, but rather transform complex queries into inline views and they support the native underlying database constructs and are applicable to ANY datasource (never tried Excel though or any Jet datasource). So this is a select from a select statement. An example would be to show data skewing, like:
select CNT, count (Cnt) as HowManyValues from
(select postal_code, Count (*) as CNT from Sales group by postal_code) IL
The piece (subqurey) in the 2nd line above represents the inline view. It is resolved first, then its dataset is fed to the out outer query to select from.
When your number of rows are small (< 20,000 ), it make a lot of sense to run from memory. HSQLDB supports ANSI-SQL and is pretty limited in its SQL abilities when compared to more ribust database engines like Oracle or Sql Server or Postgres. The HSQLDB documentation can be found here: http://hsqldb.org/web/hsqlDocsFrame.html
When you want to do some powerful data manipulation and you run on an Oracle repository, you can use native Oracle syntax to achieve this but setting the data object to reside in Oracle. Be careful about the number of rows. Processing millions of rows this way is highly discouraged.
Data Objects are used to integrate data from disparate sources and should conform to the guidelines above.
When you have a large set of data (let's say 10 million rows), it makes no sense to go to the trouble of creating a data object, even if you're going to cache it. In this instance, it is recommended that you run some ETL processes and populate a staging table in a target database and then point your reports to that database.
there are basically 3 types of data objects:
1. Memory-based
2. Repository-based (database)
3. Inline View.
A Memory based data object actually loads the data into HSQLDB, which is tightly integrated with BI Plus and is seamless to the end user. This database is a pure-java memory-resident database. Every record written to HQSLDB will consume more memory. Provided there is no page fault or excessive swapping at the O/S level, performance should be very fast.
Repository based data objects. These data objects write their contents to the database that hosts the repository. Prior to BI Plus 3.5, Oracle was the only option for the repository (since the engine was written in PL/SQL). Since 3.5 the engine has been rewritten and now runs on the seamlessly integrated java database (Derby). the limitations with Oracle are that if you are running Oracle XE (Express Edition) 10.2 the physical size of the USERS tablespace was capped at 4 GB. Any (Oracle) Repository-based data objects would be eating into this storage. 4 GB is a LOT of data for most data objects. With 3.5 when running non-Oracle (Derby), the size of the DB is limited only by the amount of disk space on the file system.
Inline View Data Objects. These data objects do not materialize any data at any time, but rather transform complex queries into inline views and they support the native underlying database constructs and are applicable to ANY datasource (never tried Excel though or any Jet datasource). So this is a select from a select statement. An example would be to show data skewing, like:
select CNT, count (Cnt) as HowManyValues from
(select postal_code, Count (*) as CNT from Sales group by postal_code) IL
The piece (subqurey) in the 2nd line above represents the inline view. It is resolved first, then its dataset is fed to the out outer query to select from.
When your number of rows are small (< 20,000 ), it make a lot of sense to run from memory. HSQLDB supports ANSI-SQL and is pretty limited in its SQL abilities when compared to more ribust database engines like Oracle or Sql Server or Postgres. The HSQLDB documentation can be found here: http://hsqldb.org/web/hsqlDocsFrame.html
When you want to do some powerful data manipulation and you run on an Oracle repository, you can use native Oracle syntax to achieve this but setting the data object to reside in Oracle. Be careful about the number of rows. Processing millions of rows this way is highly discouraged.
Data Objects are used to integrate data from disparate sources and should conform to the guidelines above.
When you have a large set of data (let's say 10 million rows), it makes no sense to go to the trouble of creating a data object, even if you're going to cache it. In this instance, it is recommended that you run some ETL processes and populate a staging table in a target database and then point your reports to that database.