FIDAL - Financial Data Access Library

Data Management

Design Draft

This document describes some NEW features planned for the data management of FIDAL.

Some of the features discussed in this document are not yet implemented. Users of the library should read the API documentation first.

The interface is defined in the C header file c\include\fidal.h.

Protection Scheme

(not implemented)
For now, FIDAL implements a "merging" logic that works down to each individual daily price bar. This works well provided all data sources are calculated on the same basis; for example, all data sources need to be split/dividend adjusted to the same date. This requires the user to have a deep understanding of (and control over) the data returned by each data source, which is far from a sure thing.

The goal of a good protection scheme is to provide an alternative source when another one fails, but that should never come at the cost of building a time series of doubtful validity. Consequently, it might be better to return data from ONE and ONLY ONE selected data source for a given FD_HistoryAlloc call. The possible selection logics are:

Flags Rules
 FD_SELECT_BY_SRT
  1. (Size) Select the source that provides the largest amount of data.

  2. (Rank) If tie, select the source that has the highest user definable ranking.

  3. (Time) If tie, select the data source that provides the most recent data.

 FD_SELECT_BY_TRS
  1. (Time) Select the data source that provides the most recent data.

  2. (Rank) If tie, select the source that has the highest user definable ranking.

  3. (Size) If tie, select the source that provides the largest amount of data.

 FD_SELECT_BY_RST
  1. (Rank) Select the source that has the highest user definable ranking.

  2. (Size) If tie, select the source that provides the largest amount of data.

  3. (Time) If tie, select the data source that provides the most recent data.

 FD_SELECT_BY_TSR
  1. (Time) Select the data source that provides the most recent data.

  2. (Size) If tie, select the source that provides the largest amount of data.

  3. (Rank) If tie, select the source that has the highest user definable ranking.

 FD_SELECT_BY_STR
  1. (Size) Select the source that provides the largest amount of data.

  2. (Time) If tie, select the data source that provides the most recent data.

  3. (Rank) If tie, select the source that has the highest user definable ranking.

 FD_SELECT_BY_RTS
  1. (Rank) Select the source that has the highest user definable ranking.

  2. (Time) If tie, select the data source that provides the most recent data.

  3. (Size) If tie, select the source that provides the largest amount of data.

If a tie remains after applying these selection rules, the data source that was added first with FD_AddDataSource is selected.

The user-definable ranking is an optional parameter to FD_AddDataSource() and is simply an integer value (a higher value means a higher ranking).

When none of these flags is used, FIDAL performs its data source merging logic.
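Each selection flag above is an ordered, lexicographic comparison over three criteria. The following minimal C sketch illustrates that idea; it is not FIDAL code. The SourceStats structure, compareOn() and selectSource() are hypothetical names, and the sketch assumes each source can report its amount of data, its ranking and the timestamp of its most recent bar before selection takes place.

```c
#include <stddef.h>

/* Hypothetical per-source summary; not part of fidal.h. */
typedef struct {
    long size;          /* amount of data (e.g. number of price bars) */
    int  rank;          /* user-definable ranking, higher is better   */
    long lastTimestamp; /* timestamp of the most recent data          */
} SourceStats;

/* Compare a and b on one criterion: 'S' (size), 'R' (rank), 'T' (time).
 * Returns >0 if a is preferred, <0 if b is preferred, 0 on a tie. */
static int compareOn(char criterion, const SourceStats *a, const SourceStats *b)
{
    long da = 0, db = 0;
    switch (criterion) {
    case 'S': da = a->size;          db = b->size;          break;
    case 'R': da = a->rank;          db = b->rank;          break;
    case 'T': da = a->lastTimestamp; db = b->lastTimestamp; break;
    }
    return (da > db) - (da < db);
}

/* Return the index of the selected source, or -1 when n is 0.
 * 'order' is the criteria sequence of a flag, e.g. "SRT" for
 * FD_SELECT_BY_SRT. Sources are assumed to be stored in the order they
 * were added, so keeping the earlier candidate on a full tie gives the
 * "first added wins" rule. */
int selectSource(const SourceStats *src, size_t n, const char *order)
{
    if (n == 0) return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        for (const char *c = order; *c != '\0'; c++) {
            int cmp = compareOn(*c, &src[i], &src[best]);
            if (cmp > 0) { best = i; break; } /* candidate wins this rule */
            if (cmp < 0) break;               /* current best wins        */
            /* tie: fall through to the next rule */
        }
    }
    return (int)best;
}
```

For example, selectSource(stats, n, "SRT") would mimic FD_SELECT_BY_SRT, and selectSource(stats, n, "TRS") would mimic FD_SELECT_BY_TRS.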

Source Naming

(not implemented)
It might be useful for the user to know which data sources contributed data. Each data source is simply identified by a string, which the user specifies when the data source is added with FD_AddDataSource. When no string is specified, FIDAL provides a default one. The source name does not need to be unique.

Example:

/* Display the list of data sources that
 * contributed to this FD_History.
 */
FD_History *history;
retCode = FD_HistoryAlloc( ..., &history );
if( retCode == FD_SUCCESS )
{
   for( i=0; i < history->nbSource; i++ )
   {
      printf( "%s\n", history->source[i] );
   }
   FD_HistoryFree( history );
}

Only the contributing sources are listed; consequently, you will get more than one source listed only when FIDAL merges data.

Because data sources are named, the user will have the option to name a specific data source when calling FD_HistoryAlloc(), FD_QuoteAlloc() and FD_InfoAlloc().

Example:

/* Get data but use exclusively the named data source "CSI" */
FD_History *history;
retCode = FD_HistoryAlloc( udb, "CSI", "US.NASDAQ.STOCK",
                           "MSFT", ..., &history );

Requesting data from a particular data source is a new feature planned for version 0.1.2.

Meta-Data

(not implemented)
Some data sources might provide additional pieces of information for a given category/symbol, e.g. company name, CUSIP, EPS... The data is always returned as a string.

Each data source can define its own set of meta-data. In fact, from an implementation point of view, the requested meta-data is quite "transparent" to FIDAL: it is simply passed down to each data source "driver", which evaluates whether or not it can resolve the requested meta-data. Obviously, FIDAL becomes truly useful once multiple data sources use the same meta-data identifiers, e.g. "CompanyName" would be understood the same way by all data source drivers. Eventually, a guideline should be written.
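To illustrate the "transparent" pass-down, here is a minimal C sketch under stated assumptions: the Driver structure, the MetaResolver callback and collectMeta() are hypothetical, and two toy drivers stand in for real data source drivers. Each driver either resolves the identifier or returns NULL, and every answer is kept together with the name of the source that provided it, much like the value and source arrays of FD_Info in the examples that follow.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver hook: return the value for a meta-data identifier
 * the driver understands, or NULL when it cannot resolve it. */
typedef const char *(*MetaResolver)(const char *symbol, const char *metaId);

typedef struct {
    const char  *name;    /* user-given data source name */
    MetaResolver resolve;
} Driver;

/* Pass the request down to every driver (or only to the drivers named
 * onlySource when it is not NULL) and collect each answer along with the
 * name of its source. Returns the number of values stored. */
size_t collectMeta(const Driver *drv, size_t nbDriver, const char *onlySource,
                   const char *symbol, const char *metaId,
                   const char **value, const char **source, size_t maxValues)
{
    size_t n = 0;
    for (size_t i = 0; i < nbDriver && n < maxValues; i++) {
        if (onlySource != NULL && strcmp(onlySource, drv[i].name) != 0)
            continue;                 /* a named source was requested */
        const char *v = drv[i].resolve(symbol, metaId);
        if (v != NULL) {              /* this driver resolved the request */
            value[n]  = v;
            source[n] = drv[i].name;
            n++;
        }
    }
    return n;
}

/* Two toy drivers, for illustration only. */
static const char *toyDriverA(const char *symbol, const char *metaId)
{
    if (strcmp(metaId, "CompanyName") == 0 && strcmp(symbol, "IBM") == 0)
        return "International Business Machines";
    return NULL;
}

static const char *toyDriverB(const char *symbol, const char *metaId)
{
    (void)symbol;
    return strcmp(metaId, "CompanyName") == 0 ? "IBM Corp." : NULL;
}
```

Passing NULL as onlySource corresponds to the NULL source argument of FD_InfoAlloc() below, while a name such as "MySource" restricts the request to that source.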

Example 1:

/* Get the company name. */
FD_Info *info;
retCode = FD_InfoAlloc( udb, NULL, "US.NYSE.STOCK", "IBM",
                        "CompanyName", &info );
if( retCode == FD_SUCCESS )
{
   for( i=0; i < info->nbValue; i++ )
      printf( "Company Name [%s]\n", info->value[i] );
   FD_InfoFree(info);
}

Example 2:

/* Get the earnings. In addition, display which data
 * source provides the data, and the timestamp for each value.
 */
FD_Info *info;
retCode = FD_InfoAlloc( udb, NULL, "US.NYSE.STOCK", "IBM",
                        "Earnings", &info );
if( retCode == FD_SUCCESS )
{
   for( i=0; i < info->nbValue; i++ )
   {
      int year  = FD_GetYear( info->timestamp[i] );
      int month = FD_GetMonth( info->timestamp[i] );
      int day   = FD_GetDay( info->timestamp[i] );
      printf( "%d/%d/%d ", month, day, year );
      printf( "Earnings [%s] Source [%s]\n", info->value[i], info->source[i] );
   }
   FD_InfoFree(info);
}

Example 3:

/* Same as previous, but this time get the data from only
 * a particular named data source ("MySource").
 */
FD_Info *info;
retCode = FD_InfoAlloc( udb, "MySource", "US.NYSE.STOCK", "IBM",
                        "Earnings", &info );
if( retCode == FD_SUCCESS )
{
   /* ... same display loop as in Example 2 ... */
   FD_InfoFree(info);
}

Data Retrieval Timeout

(not implemented)
When calling FD_AddDataSource, two optional timeouts can be specified. These timeouts are used to terminate prematurely calls to FD_HistoryAlloc and FD_QuoteAlloc, respectively. These calls terminate prematurely with an FD_TIMEOUT return code only when ALL the involved data sources time out.
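The "ALL sources must time out" rule can be sketched in C as follows. The SourceOutcome and CallResult enums are hypothetical stand-ins (RC_TIMEOUT plays the role of FD_TIMEOUT). This draft does not say what to return when some sources time out while the others simply have no data; the sketch assumes a separate "no data" result for that case.

```c
#include <stddef.h>

/* Hypothetical outcome of one data source for a single call. */
typedef enum { SRC_DATA, SRC_NO_DATA, SRC_TIMEOUT } SourceOutcome;

/* Hypothetical call results; RC_TIMEOUT stands in for FD_TIMEOUT. */
typedef enum { RC_SUCCESS, RC_NO_DATA, RC_TIMEOUT } CallResult;

/* The call reports a timeout only when every involved source timed out;
 * a single source returning data is enough for the call to succeed. */
CallResult combineOutcomes(const SourceOutcome *outcome, size_t n)
{
    size_t nbTimeout = 0;
    int gotData = 0;
    for (size_t i = 0; i < n; i++) {
        if (outcome[i] == SRC_TIMEOUT)
            nbTimeout++;
        else if (outcome[i] == SRC_DATA)
            gotData = 1;
    }
    if (n > 0 && nbTimeout == n)
        return RC_TIMEOUT;
    return gotData ? RC_SUCCESS : RC_NO_DATA; /* assumed "no data" path */
}
```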

Quote

(not implemented)
The FD_QuoteAlloc() and FD_QuoteFree() functions would allow retrieving the most recent market price (last, bid, ask, etc.). Only one quote is returned (the most up-to-date one). Data sources are added with FD_AddDataSource.
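Keeping only the most up-to-date quote among the data sources could look like the C sketch below. QuoteCandidate and mostRecentQuote() are hypothetical. Breaking a timestamp tie in favor of the source added first is an assumption borrowed from the protection scheme; this draft does not specify a tie-break for quotes.

```c
#include <stddef.h>

/* Hypothetical candidate quote as returned by one data source. */
typedef struct {
    const char *source;    /* name of the data source            */
    long        timestamp; /* time of the quote, e.g. in seconds */
    double      last;      /* last traded price                  */
} QuoteCandidate;

/* Return the index of the most recent quote, or -1 when n is 0.
 * Only a strictly newer timestamp replaces the current choice, so on a
 * tie the source added first (lowest index) is kept. */
int mostRecentQuote(const QuoteCandidate *q, size_t n)
{
    if (n == 0) return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (q[i].timestamp > q[best].timestamp)
            best = i;
    return (int)best;
}
```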

Example:


FD_Quote *quote;
retCode = FD_QuoteAlloc( udb, NULL, "US.NYSE.STOCK", "IBM", FD_ALL, &quote );
if( retCode == FD_SUCCESS )
{
   int hour = FD_GetHour( quote->timestamp[0] );
   int min  = FD_GetMin ( quote->timestamp[0] );
   int sec  = FD_GetSec ( quote->timestamp[0] );
   printf( "Time %02d:%02d:%02d\n", hour, min, sec );
   if( quote->last )
      printf( "Last = %f\n",  quote->last[0] );
   if( quote->ask )
      printf( "Ask = %f\n",  quote->ask[0] );
   /* ... */
   printf( "Source [%s]\n", quote->source[0] );
   FD_QuoteFree(quote);
}

Begin versus End Timestamp Boundary

FIDAL supports two interpretations of timestamps. The user can choose their preferred interpretation (beginning versus end of period), while each data source can independently provide its data using either the beginning or the end of the period. FIDAL automatically converts the timestamps to the interpretation requested by the user.
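As a rough illustration of what such a conversion involves, the C sketch below assumes timestamps expressed in seconds and a fixed period length (e.g. 86400 for daily bars), so that an end-of-period timestamp is simply the begin timestamp plus one period. FIDAL's actual timestamp type and conversion rule may differ (some conventions stamp the last second of the period instead, and trading sessions complicate daily bars); Timestamp, TimestampLogic and the functions below are hypothetical names. convertSeries() does no work at all when the source and user interpretations already match.

```c
#include <stddef.h>

/* Hypothetical timestamp: seconds since some epoch. */
typedef long Timestamp;

typedef enum { LOGIC_BEGIN, LOGIC_END } TimestampLogic;

/* begin->end: the period starting at 'begin' ends one period later. */
Timestamp beginToEnd(Timestamp begin, long periodSeconds)
{
    return begin + periodSeconds;
}

/* end->begin: the inverse conversion. */
Timestamp endToBegin(Timestamp end, long periodSeconds)
{
    return end - periodSeconds;
}

/* Convert a series in place from the source logic to the user logic.
 * Nothing is done when both already use the same logic. */
void convertSeries(Timestamp *ts, size_t n, long periodSeconds,
                   TimestampLogic from, TimestampLogic to)
{
    if (from == to) return;
    for (size_t i = 0; i < n; i++)
        ts[i] = (from == LOGIC_BEGIN) ? beginToEnd(ts[i], periodSeconds)
                                      : endToBegin(ts[i], periodSeconds);
}
```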

This functionality is implemented, but not with the most speed-efficient approach. The following describes how the code will progressively move toward the optimal version, the goal being to minimize the number of conversions.

Implementation plan (long term)

All "Mix" scenarios involve merging data sources that use a mix of begin and end logic.
begin->end : Timestamp converted from begin logic to end logic.
end->begin : Timestamp converted from end logic to begin logic.
The consolidation logic differs slightly depending on whether its input uses begin or end logic.

Alias

(not implemented)
It would be useful to be able to build "soft links" into a unified database. That way, a user can map new categories/symbols by referring to other existing entries. An alias can be used like any other category/symbol entry with FD_HistoryAlloc and other functions. Lists of aliases could be saved to and loaded from one or more ASCII CSV files.

The file format is:
      <Alias Category>,<Alias Symbol>,<Target Category>,<Target Symbol>

Example of Alias File:

MyStockList,MSFT,US.NASDAQ.STOCK,MSFT
MyStockList,IBM,US.NYSE.STOCK,IBM
List.Energy.Coal,BTU,US.NYSE.STOCK,BTU
List.Basic Material.Paper & Paper Products,ABY,US.NYSE.STOCK,ABY
List.NASDAQ Most Active,1,US.NASDAQ.STOCK,LNUX
List.NASDAQ Most Active,2,US.NASDAQ.STOCK,MSFT
List.NASDAQ Most Active,3,US.NASDAQ.STOCK,CIEN
List.Correlation.1,1,US.NASDAQ.STOCK,CIEN
List.Correlation.1,2,US.NASDAQ.STOCK,SCMR
List.Correlation.1,3,US.NASDAQ.STOCK,CIEN
NASDAQ,*,US.NASDAQ.STOCK,*
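Reading such a file line by line is straightforward. Here is a minimal C sketch of a parser for one line; AliasEntry, parseAliasLine() and the 128-character field limit are hypothetical, and it assumes fields never contain commas (this draft does not define any quoting). It tolerates a single trailing comma at the end of a line.

```c
#include <string.h>

#define ALIAS_FIELD_MAX 128  /* assumed maximum field length */

/* One line of an alias file. */
typedef struct {
    char aliasCategory[ALIAS_FIELD_MAX];
    char aliasSymbol[ALIAS_FIELD_MAX];
    char targetCategory[ALIAS_FIELD_MAX];
    char targetSymbol[ALIAS_FIELD_MAX];
} AliasEntry;

/* Copy the next comma-delimited field of *p into dst and advance *p past
 * the delimiting comma. Returns 0 if the field is empty or too long. */
static int nextField(const char **p, char *dst)
{
    size_t len = strcspn(*p, ",\r\n");
    if (len == 0 || len >= ALIAS_FIELD_MAX) return 0;
    memcpy(dst, *p, len);
    dst[len] = '\0';
    *p += len;
    if (**p == ',') (*p)++;
    return 1;
}

/* Parse "<Alias Category>,<Alias Symbol>,<Target Category>,<Target Symbol>".
 * Spaces inside a field are kept as-is. Returns 1 on success, 0 for a
 * malformed line (missing field or extra data after the fourth field). */
int parseAliasLine(const char *line, AliasEntry *e)
{
    const char *p = line;
    if (!nextField(&p, e->aliasCategory) || !nextField(&p, e->aliasSymbol) ||
        !nextField(&p, e->targetCategory) || !nextField(&p, e->targetSymbol))
        return 0;
    /* only an end-of-line may follow */
    return p[strspn(p, "\r\n")] == '\0';
}
```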

The first two lines put IBM and MSFT into your MyStockList category. You are solely responsible for making sure the symbols in MyStockList are unique; in case of conflict, the most recently added alias replaces the existing one. One suggestion is to combine the target category and target symbol into the alias symbol, e.g. put the symbols "MSFT@US.NASDAQ.STOCK" and "IBM@US.NYSE.STOCK" in MyStockList.

Lines 3 and 4 show how someone can build a sector/industry list.

Lines 5, 6 and 7 show that the alias category and symbol names do not have to be related to the target names at all. Here a "most active" list is built using numbers as symbol names. A function will be provided to resolve the "root" of a category/symbol. That way, the user will be able to figure out that "1" maps to LNUX in US.NASDAQ.STOCK.

Lines 8, 9 and 10 show another example, where lists might be created based on data correlation. Offering such correlation functions is one of the long-term goals of FIDAL.

The last line shows how a complete category can be aliased. Here the user will be able to access the same list of symbols using either "NASDAQ" or "US.NASDAQ.STOCK" category.
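Resolving the "root" of a category/symbol could simply follow aliases until none matches. The C sketch below is hypothetical (Alias, findAlias() and resolveAlias() are not FIDAL functions). It assumes that an exact alias takes precedence over a whole-category "*" alias and that the most recently added alias wins, and it stops after a fixed number of hops to guard against cyclic aliases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory alias entry; "*" as alias symbol aliases a
 * whole category, and "*" as target symbol keeps the requested symbol. */
typedef struct {
    const char *aliasCat, *aliasSym, *targetCat, *targetSym;
} Alias;

#define MAX_ALIAS_HOPS 16  /* guards against alias cycles */

/* Search newest first, so the last added alias wins. */
static const Alias *findAlias(const Alias *tab, size_t n,
                              const char *cat, const char *sym)
{
    for (size_t i = n; i-- > 0; )
        if (strcmp(tab[i].aliasCat, cat) == 0 &&
            strcmp(tab[i].aliasSym, sym) == 0)
            return &tab[i];
    return NULL;
}

/* Follow aliases from (*cat, *sym) down to the root entry.
 * Returns 1 on success, 0 if too many hops suggest a cycle. */
int resolveAlias(const Alias *tab, size_t n, const char **cat, const char **sym)
{
    for (int hop = 0; hop < MAX_ALIAS_HOPS; hop++) {
        const Alias *match = findAlias(tab, n, *cat, *sym);
        if (match == NULL)
            match = findAlias(tab, n, *cat, "*");  /* whole-category alias */
        if (match == NULL)
            return 1;                              /* reached the root */
        *cat = match->targetCat;
        if (strcmp(match->targetSym, "*") != 0)
            *sym = match->targetSym;
    }
    return 0;
}
```

With entries from the example file loaded, resolving ("List.NASDAQ Most Active", "1") would yield ("US.NASDAQ.STOCK", "LNUX"), and ("NASDAQ", "MSFT") would yield ("US.NASDAQ.STOCK", "MSFT").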

Implementation details: the creation, deletion, saving and loading of aliases is mostly under the control of the FIDAL user. This functionality will also be accessible to the FIDAL data source drivers. That way, a driver can map the same "data" into multiple distinct category/symbol naming conventions.

Realtime Update

Approach #1 - Polling (implemented)
Every time FD_HistoryAlloc() is called, a new time series is built from scratch. In other words, periodically calling FD_HistoryAlloc() allows an application to get its hands on the most recent data. This polling approach can be used with FIDAL right now, without further development.

Approach #2 - Callback on change (not implemented)
Almost as simple as polling, except that the application can register a callback function for a given source/category/symbol. FIDAL will perform a callback every time new data is available. The application can then perform a new FD_HistoryAlloc() to get the most recent data.
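A minimal C sketch of the bookkeeping behind such a registration follows. The callback signature, CallbackRegistry, registerCallback() and notifyNewData() are hypothetical and not part of fidal.h. As proposed above, the callback only signals that new data exists; the application would then call FD_HistoryAlloc() itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: invoked when new data is available. */
typedef void (*NewDataCallback)(const char *source, const char *category,
                                const char *symbol, void *userData);

/* The strings are stored by pointer and must outlive the registration. */
typedef struct {
    const char     *source, *category, *symbol;
    NewDataCallback fn;
    void           *userData;
} Registration;

#define MAX_REGISTRATIONS 32

typedef struct {
    Registration reg[MAX_REGISTRATIONS];
    size_t       count;  /* must start at 0 */
} CallbackRegistry;

/* Register fn for one source/category/symbol. Returns 0 when full. */
int registerCallback(CallbackRegistry *r, const char *source,
                     const char *category, const char *symbol,
                     NewDataCallback fn, void *userData)
{
    if (r->count == MAX_REGISTRATIONS) return 0;
    Registration *g = &r->reg[r->count++];
    g->source = source; g->category = category; g->symbol = symbol;
    g->fn = fn; g->userData = userData;
    return 1;
}

/* Called when a data source reports new data: invoke every matching
 * callback. Returns the number of callbacks invoked. */
size_t notifyNewData(const CallbackRegistry *r, const char *source,
                     const char *category, const char *symbol)
{
    size_t fired = 0;
    for (size_t i = 0; i < r->count; i++) {
        const Registration *g = &r->reg[i];
        if (strcmp(g->source, source) == 0 &&
            strcmp(g->category, category) == 0 &&
            strcmp(g->symbol, symbol) == 0) {
            g->fn(source, category, symbol, g->userData);
            fired++;
        }
    }
    return fired;
}

/* Toy callback for illustration: counts notifications in *userData. */
static void countNotifications(const char *source, const char *category,
                               const char *symbol, void *userData)
{
    (void)source; (void)category; (void)symbol;
    ++*(int *)userData;
}
```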

Approach #3 - Dynamic Update (not implemented, maybe never)
A different approach is to dynamically provide only the "changes" to the application. This is the "natural" approach that many developers I talked with seem to prefer. It looks nice in principle, but personally I think it comes at the cost of unnecessary complexity at the application level: think about coordinating the handling of corrections and/or data source failover. Overall, your application needs to build a time series at one point or another, so why not leave it to FIDAL to build it with FD_HistoryAlloc()? Of course, rebuilding from scratch every time has a speed cost, but how bad that is in practice still needs to be evaluated. You need to monitor 100000 symbols in real time? My approach would be to make the application distributed/redundant/recoverable instead of trying to cram everything onto one server and one application. With a distributed approach, overall scaling is better, and rebuilding the historical data from scratch is less of a factor (I would even re-perform the analysis from scratch).


Copyright © 2006 TicTacTec LLC. All Rights Reserved. Last Update: 07/21/06