Building a Data Warehouse Step 3 — Designing Master Data and Transactional Tables for a Data Warehouse


Designing Tables for a Data Warehouse Requires Some Thought

In this next edition of our series on building a data warehouse, we are going to focus on designing master data and transactional tables for a data warehouse.  Our example is going relatively basic, but even with at a rudimentary level of complexity we can generate copious amounts of insightful data analysis.  The source accounting/ERP system generating the data used in this example could be coming from anyone of the typical midsize business systems, such as Fishbowl, Sage xx0 or even QuickBooks. We have an example of the final dashboard that you can view to better understand the final goal of this series.

We describe specifics for designing master data and transaction tables for a data warehouse, that support a midsize business data analytics needs. Click to Tweet

Access Tables Used in Our Data Warehouse

The list of tables in our Access database is displayed below.  We will describe each table in more detail in a moment.  The Calendar table was described in our earlier post that explained how to build an accounting calendar, which you may want to review if you have not already done so.

Our data warehouse application consists of three master data tables, including: 1)  MasterCust for customer master; 2) MasterProdHier for Product IDs and other dimensions pertaining to Products/Parts; and 3) MasterReps as the Sales Rep Master.  There are also three transactional tables as follows: 1) ME_Inventory contains the month-end physical inventory status; 2) TransactionShipments contains the daily shipment history for each invoice line item and 3) OutstandingPO has the purchase order history at each month end.  The ReorderLevels table is more of a scratch pad.

Image from Access Object list with the 8 tables used in our data warehouse (Calendar, Customer Master, Product Master, Sales Rep Master, Month End Inventory, Month End Shipments, Outstanding POs and Reorder Levels.

Master Product/Part Table with Product Hierarchy

The MasterProdHier table contains most of the important dimensions required for our data warehouse application.  The focus of our application is go to be finished goods, but raw material could be referenced in this table as well, though we would want to add a field identifying which parts were finished goods and which were raw materials.

List if fields with data types for Product Master.

Each of the fields are as follows:

  • PartNum: Unique part number
  • SKU: Stock Keeping Unit or generally the customer facing part number or a UPC code.
  • Item Desc: Brief item description
  • UnitOfMeas: Unit of Measure or UOM
  • PkgQnty: Quantity contained in a single unit.  So if the UOM is ounces and there are 16 ounces in a single sale unit, then the PkgQnty is 16.
  • Service Level: This is a statistical measure to quantity the probability that an item will be in the warehouse when the next order is received.  One minus the Service Level probability is the probability of a stock out.  For example, if there’s a 95% service level for a given product, then there is a 5% chance of a stock out at any given time. 
  • Lead Time Months: This is the number of months required for the time a new Purchase Order is confirmed with a vendor until the time that the stock replenishment is received in our warehouse.
  • Case Qnty: This customer sells by eaches and cases.  This is the number of units in a case.
  • Status: Active or Inactive
  • ROP: Reorder Point
  • OUL: Order Up To Level
  • Lvln: Product Hierarchy level.  The levels are a bit artibrary–Segment, Line, Family and Type, but it is meant as a way to logically organize products.  In addition to being able to perform summaries by the hierarchy, some systems allow you to set up customer discounts based on the product hierarchy.  There are many uses for a product hierarchy, but that is beyond the scope of this post.

Customer Master Table

Our Customer Master table (MasterCust) is rather basic in that it does not even the customer location, other than state, but it does serve its purpose. 

List if fields with data types for Customer Master.

Each of the fields is as follows:

  • Name: Customer Name
  • Top 10: Logical to reflect if this is one of business’ Top 10 customers
  • CustGroup: Field to group like customers into a single group name
  • State: State for bill to address

Sales Rep Master

This is a simple listing of the sales reps.  If desired, you could group sales reps by region or territory to produce summaries.  But in our example, we have a small number of sales reps.

List if fields with data types for Sales Rep Master.

Each of the fields is as follows:

  • Rep: The short rep name, usually the first name.
  • RepName: The full rep name.

Month-End Inventory Transactional Table

This table is used for a number of analyses, including looking at how many days of inventory are hand for each item, as well as reorder statistics.

List if fields with data types for month-end Inventory transactional table.

Each of the fields is as follows:

  • PartNum: Part Number
  • ME_Date: Month End date in YYYY-MM format
  • On_Hand: Quantity on hand at month end
  • dME_Date: The number Month-End Date
  • Ext_Cost: The extended cost of the inventory at month end.

Daily Shipment History Transactional Table

Each record in this table was a line item on an invoice. 

List if fields with data types for daily Shipment transactional table.

Each of the fields is as follows:

  • Type: Transaction type, e.g. invoice, credit memo, debit memo, etc.
  • Num: Sales Order number
  • PartNum: Part/Product number
  • Customer: Customer name
  • Ship Date: Ship date
  • Rep: Sales Rep
  • TotPrice: Total price for the line item on the invoice
  • TotCost: Total cost for the line item on the invoice
  • AvgSalesPrice: Average Sales Price (TotPrice / Units Shipped)
  • AvgUnitCost: Average Unit Cost: (TotCost / Units Shipped)
  • Units: Units shipped
  • UnitOfMeas: Unit of measure (units, ounces, pounds, feet, etc.)
  • PckQty: Pieces in a unit pack.  So if 1 pound and UOM is ounces, the PckQty would be 16.
  • GrProfit: This is the result of TotPrice – TotCost
  • Pieces: This is Units * PckQty.  If an invoice sold 10 items at 16 ounce each item, there would be 160 ounces.  Our data warehouse could then compute the profitability down to the ounce, if desired.

Outstanding POs is a transactional table

This table when combined with reorder points and units on hand provides some helpful information when making purchasing decisions.

List if fields with data types for month-end Outstanding Purchase Order transactional table.

Each of the fields is as follows:

  • ME Date: Month-end date in YYYY-MM format
  • Group: Physical inventory location
  • Vendor: Vendor
  • PO #: Purchase Order Number
  • PartNum: Part/Product Number
  • Part Desc: Description
  • Ord Qnty: Order Quantity
  • Status: Purchase Order Status
  • Due Date: Purchase Order Estimated Receipt Date
  • Unit Cost: Unit cost from vendor
  • Ext Cost: Extended cost from vendor

Reorder Levels is a Master Data table, sort of.

We just use this table to update the ROP and OUL metrics in the MasterProdHier table.

List if fields with data types for Reorder Point transactional table.

Table Relationships for our Data Warehouse

These are the relationships we established in our data warehouse.  As you can see, the TransactionalShipments is the central table in the application.

Diagram showing the table relationships and linking fields for the data warehouse.

Data Transformation

One thing that we did not address how to retrieve both master and transactional data from the accounting/ERP system to this data warehouse.  Given that there are so many different accounting/ERP systems, we decided to ignore that aspect for now.  This example was sourced from a Fishbowl Inventory system using both SQL data exports and some CSV exports.

If you would like to learn more about transforming CSV files from Fishbowl standard reports, please see our post from September 2018 to learn how we do just that.  This will be the platform that our data warehouse will utilize moving forward.  If you would like a copy of this Access file, please reach out to us.  Also, please join our mailing list for notifications and special offers from Profitwyse.

If you are a business owner that is looking for ways to leverage your data analytics, please contact us today

 

About the Author
About the Author
Chase Morrison provides CFO services, utilizing Profitwyse’s 3D Growth Platform™, enabling his business owner clients to more readily achieve their goals for wealth creation and family legacy.  Contact him today to learn how your business can hit the accelerator using Profitwyse’s proven platform.

 

Building a Data Warehouse Step 2 — How a Data Warehouse Can Leverage an Accounting Calendar


Using Calendar Dimensions in Your Financial Reporting

In our previous post, we described the structure of our Accounting Calendar and defined all the fields in the Calendar table, as well as provided a link to our a CSV file with all the detail Calendar table records, from 1/1/2008 through 12/31/2028 , that can be imported into your own application.  In this post, we are going to illustrate how a data warehouse can leverage an accounting calendar that we are sure you will find helpful.  We are going to use our accounting calendar to generate a rolling twelve-month data extraction for a Power Pivot application. 

Summarizing results by year or month or even by day is s fairly simple task, but generating a rolling twelve-month data extraction is more complicated, given that we want our application to be able to determine which twelve months are the most current without having the user provide the date ranges.

Learn how a data warehouse can leverage an accounting calendar to create time-based summaries of your transactional data to create better insight. Click to Tweet

First, create a query that can determine the inclusive dates for our twelve-month rolling window

Though we are getting ahead of ourselves, our data warehouse application has a number of both master data tables (customer master, sales rep master, etc.) and transactional tables (shipment history, inventory level history, etc.).  We will get into more details regarding these tables in the next post, but suffice for now that we have a shipment history table, named TransactionalShipments, that contains, as one of its 12 fields, a date field named Ship Date. The Ship Date field represents the date that each line item was shipped to the respective customer.

Using the Access Query Design window, we have created a query that references TransactionalShipments, and our Calendar table.  As you can see we are linking the two tables on date fields–Calendar:Date is linked to TransactionalShipments:Ship Date.  Take a look at the snippet below.

LastMonth query design using Access query design screen

The query above is going to return four fields, all of which are going to be driven by the last or max date in TransactionalShipments.  Here are the fields that will be returned:

  • MaxMMDDYYYY: This is the last date from the TransactionalShipments table and really drives all the reporting.  All the other dates from the Calendar table are a function of the last date of transactional data in TransactionalShipments, which is 9/28/2018.
  • MaxYYYY_MM: This comes from the Calendar table and is the Max YYYY-MM formated month for September 2018, which is 2018-09.
  • LastMonth: If you look closely at the Calendar table description in the prior post, there is a field named MonthNum.  MonthNum is the sequential month starting from the beginning of the table.  In the Calendar table, September 2018 is the 129th month.
  • FirstTrailing12: Many of our reports are going to be based on twelve months of rolling history.  In this instance, the most recent month is September 2018 making the first month in our twelve-month rolling history October 2017.  If you look at the query, we are using the Max function in an expression to compute the MonthNum index that is 12 months prior to the most recent month end.  The formula is Max(MonthNum)-11, which equals 118.  In our Calendar table, month 118 corresponds to October 2017. 

When we execute this new query, that we named LastMonth, it returns the following data:

LastMonth query result showing the current lastest month number (129) and the 12 month earlier month number (118)

As you can see above, the query returned 2018-09 as our last month of actual shipments, 9/28/2018 as the numeric date of the last shipment, 129 as the last numeric month of shipment history in our index of months and 118 as the first month in our rolling sequence of twelve months (e.g. Sep 2018 = 129, Aug 2018 = 128, …. Oct 2017 = 118).  This query will be used as a subquery by our next query in this example to filter the amount of shipment history data returned, eventually to a Power Pivot example.

Next, how to use our subquery (LastMonth) to limit the data returned from our data warehouse

Again, using the Access Query Design window, we have created a new query by adding in the TransactionalShipments table, Calendar table, linked on their respective date fields, and the LastMonth subquery.  Note that the LastMonth subquery is not linked to any of the tables, but is referenced in the Where clause for the MonthNum field.  This is where the filter will be set to limit the records returned to a rolling twelve months.

32.07 - Rolling 12 Month Shipment HIstory Query

Here is a little close look at the Where clause inthe MonthNum field:

The where clause from our twelve-month rolling history query that is used to limit the returned data to 12 months.

The above field will limit the data using the LastMonth subquery by limiting the records returned to only the months that are greater than or equal LastMonth:FirstTrailing12 (which was equal to month index 118 and equates to Oct 2017) and less than or equal to LastMonth:LastMonth (which was equal to month index 129 and equates to Sep 2018).  Here is the SQL that was generated as part of the query above, that you can paste into the Access query window:

SELECT TransactionalShipments.ItemNum, TransactionalShipments.[Ship Date], Calendar.[YYYY-MM Format], Sum(TransactionalShipments.TotPrice) AS SumOfTotPrice, Sum(TransactionalShipments.TotCost) AS SumOfTotCost, Sum(TransactionalShipments.Units) AS SumOfUnits
FROM LastMonth, Calendar INNER JOIN TransactionalShipments ON Calendar.Date = TransactionalShipments.[Ship Date]
WHERE (((Calendar.MonthNum)>=[LastMonth]![FirstTrailing12] And (Calendar.MonthNum)<=[LastMonth]![LastMonth]))
GROUP BY TransactionalShipments.ItemNum, TransactionalShipments.[Ship Date], Calendar.[YYYY-MM Format]

This is why I really prefer using the Query Design window, which is more intuitive for me than thinking through the SQL.  We named this query 12-Month Rolling Shipments, which will be used in our Power Pivot example next.

Using the 12-Month Rolling Shipments query in a Power Pivot example to filter for the most recent rolling twelve months of shipment history

Now we shift over to Excel to illustrate how to your data warehouse app can really begin to leverage an accounting calendar.  Starting with a blank Excel workbook, we are going to navigate to the Get External Data function from the Power Pivot Data Model function (Excel -> Power Pivot -> Data Model -> Manage -> Get External Data.  This post is not meant to be a detailed primer on how to use Power Pivot, but you can use this link to learn more from a YouTube video on the same subject.  Once on the Excel screen below, you will select to get data from Access as shown below.  This will bring up a dialog that allows you to navigate to the Access database for your data warehouse application bring up a list of tables and queries that can be imported into your Power Pivot data model.

Building a Power Pivot Excel file using Access tables and queries Step #1

Once we were able to select the tables and queries in our Access database, we selected to import the 12-Month Rolling Shipments query and the Calendar table.  We want the Calendar table because it will allow us to summarize our query in various ways, such as by year, month, quarter, weekday, etc.  Just remember that we are only returning the 12 months of shipment history in this example.  Generally you would want to limit the data going to your report consumers so that they are not overwhelmed with data.

Refining our Power Pivot Data Model by linking the Calendar table and 12-Month Rolling Shipment history query

Now quickly navigating to the Home -> View -> Diagram View you will see the two data objects we retrieved from Access and now using the cursor we can link the Calendar:Date field to the 12-Month Rolling Shipment:Ship Date field and we are good to go for now.

Building a Power Pivot Excel file using Access tables and queries Step #3

Adding a Power Pivot table using our new Data Model

Given that the Access data has been imported into a Power Pivot model and linked, we can now create a new Pivot Table by going to Home -> PivotTable which will add the new Pivot Table to the Excel file we currently have open.  Shown below we added the Total Price (total invoice price for each line item) as the Values, the ItemNum as the Rows, Weekday as the Columns and “YYYY-MM Format” as a filter.  You can see the result below, which is an example of using our Calendar table to enable analysis by weekday.

Building a Power Pivot Excel file using Access tables and queries Step #4

The nice thing about setting up a Pivot Table in this fashion, is that once the next, as well as all future, month-end is complete and the Access file is updated, then all that is required is for one of your report consumers to perform a Refresh All to update the Data model with most recent 12 months of shipment history.   This is how you really being to leverage an accounting calendar, because we are going to use the same approach to limit month-end inventory results, compute days of inventory on hand, etc.

There is much, much more you do with this data utilizing Measures and DAX, but this was just intended to show you how a data warehouse can leverage an created Accounting Calendar table to great effect.

We hope you found this useful in your endeavors for creating a more insightful data warehouse for your business.  If you are a business owner that is looking for ways to leverage your data analytics, please contact us today

 

About the Author
About the Author
Chase Morrison provides CFO services, utilizing Profitwyse’s 3D Growth Platform™, enabling his business owner clients to more readily achieve their goals for wealth creation and family legacy.  Contact him today to learn how your business can hit the accelerator using Profitwyse’s proven platform.

 

Access example of a versatile accounting calendar fields

Building a data warehouse for robust analytics requires a versatile accounting calendar


A 1st Step to Building a Data Warehouse that Supports Robust Data Analytics is a Versatile Accounting Calendar

Though it may not seem obvious relative to your thoughts on a data warehouse for your business, building a versatile accounting calendar is an important first step.  The first consideration is what type of accounting calendar your business uses.  There are predominately two types: 1) a normal accounting calendar based on the natrual calendar period endings, such as January 31st, February 28th, March 31st, etc.; and 2) an accounting calendar that when split into quarters, has 13 weeks in each quarter and within each quarter two months comprising 4 weeks and 1 month comprising 5 weeks.  Accounting calendars are frequently referred to by the pattern of 4-week and 5-week months are used to get the 13 weeks.  The three options are 4-4-5 (most common), 5-4-4 and 4-5-4 (the later two being less common).  The final consideration is what month the fiscal year ends.

A good example of how these vary is with the recent ending of calendar 3Q of 2019.  A business using a normal calendar would have ended the quarter on Monday, September 30th.  A business using an accounting calendar, would most likely have ended the quarter on Friday, September 27th, with the first Monday of the 4th quarter falling on September 30th, and 4th quarter ending on Friday, December 27th, which is 13 weeks later.  Hopefully that makes sense.  But the key takeaway is that you need to accommodate the rules around your accounting calendar in the data warehouse calendar.

In this post, we are going to explain how to create an accounting calendar based on the normal calendar, with a December 31st year end.  If you would like, you can download a  copy of our accounting calendar, as a CSV file, that you can import into your system.  If you are using some other calendar, the changes to our example should be fairly straight forward.

Building a data warehouse for robust analytics requires a versatile accounting calendar to enable powerful time-based analyses that produce actionable insight. Click to Tweet

More detail considerations for implementing your accounting calendar

We are going to utilize Microsoft Access as the database engine for our data warehouse application.  There are many reasons why, but one of the first is that Access is nearly ubiquitous on Windows-based machines, low cost, and the tables/queries are easily transportable to more robust solutions, such as Microsoft SQL Server, as demands dictate.

You can see a snippet of the table for our calendar below, which consists of 15 different fields that are used to summarize analytics using various calendar methods.  Again, you can download a CSV version of the accounting calendar used in this post, if you wish.  The calendar starts with January 1, 2008 and runs through December 31, 2028.  

Access accounting calendar for financial reporting

The fields are as follows:

    • DateNum: This is a sequential integer, starting at 1 for January 1, 2008 and ending at 7,671 for December 31, 2028.  This field has a number of uses, but at the most basic level, it can be used to easily determine days between two dates.
    • Date: This is a date field that contains the actual date.  Below we discuss the format of this field, which is very important given that we used this field to link to the date file in other data sources.
    • Weekday: This is the abbreviated weekday.
    • Lmonth: This is the long month name, e.g. January, February, etc.
    • Smonth: This is the short month name, e.g. Jan, Feb, etc.
    • Nmon: This is the numeric month number, assuming a December year end.  If you have a different fiscal month start and end, then this would need to be adjusted accordingly.
    • Nday: This is the day within the month.
    • Year: This is the year.
    • Nquarter: This is the numeric quarter.
    • Quarter: This is the quarter name, e.g. 1st, 2nd, etc.
    • YYYY-MM Format: This is a representation for each month and is used to organize monthly data by column or whatever.  In our calendar, the month of October, 2019 would be represented as 2019-10.  But if your business has a September fiscal year end, then October would be the first month of the new fiscal year.  In this case, October 2019 might be 2020-01 given that October would be the first of month of the new 2020 fiscal year.  You need to think about how you want your monthly data summarized in PowerPivot tables, Crosstab queries, etc.
    • YYYY-Q Format:  This is the same as the YYYY-MM monthly field, just it is for quarters.
  • DayOfYear: A serial number with January 1 being the numeral 1 and the last day being either 365 or 366.
  • WeekNum: This is the sequential week number in the entire table and starts at 1 for the first week in January 2008 and ends at 1,097, which is the last week, though only a partial week, in December 2028, starting on Sunday.
  • MonthNum:  This is the sequential month number, like the week number, staring at 1 and running through month 252, which is December 2028.  A frequent use of this field is to produce query results that are time bound.  For example, if only want users to have access to the most recent months of shipment history, we can easily use this field to filter the data.  More on this later.

Accounting calendar field data types used

This is a snippet of our Calendar table using the Design View so you can see what data types are used for the 15 fields.  All the numeric fields are Integers.

Access accounting calendar Design View

Some considerations when using date fields to link tables together

An important consideration or accommodation that needs to be addressed when using dates to link the calendar table to any other table that contains a date field is dealing with the time component embedded in date fields.  Though it goes a bit beyond the scope of this blog post, data warehouse designers need to be cognizant of the fact that date fields are in reality date and time fields.  As you can see below, I changed the Date format in our Access Calendar table to include the time.  You want to note that all the times are 12:00:00 AM.

Accounting calendar dates reflect 12:00:00 AM Time or time = 0 every day

Date/Time fields are decimal numbers, where the whole number part represents the date and the decimal part presents the time.  This means that you need to always be scrubbing any dates you import into your data warehouse and use the same time  (unless it is an important dimension in your application).  Take a look at the Excel snippet below with different dates:

Excel version showing how applications reflect date and time with time as decimal part of numeric

Access and many other applications treat dates/times in a similar fashion.  Compare the cells in B2 vs. C2.  You can see that both have the same date, yet have different times.  If you were to perform a comparison, as is done in D2, the two fields are not equal.  This also means that you cannot link these two fields together.  Also, at B3 you can see the decimal interpretation of 10/3/2019 @ 10:17 AM, which Excel perceives as 43743.42911782.  Excel is transforming 43743.42911782 into a corresponding date and time.  But if you can run your dates through a function, like Int(), you can remove the decimal part, which will make it very easy to link the date fields as is shown in B4 vs. C4.

Scrubbing dates is a fairly simple process given that you just need to use the INT() function or something similar to remove the decimal part. The INT() function will round down to the nearest integer.  So even if you have a date, that has a time one second before midnight, the INT() function will just round down to that same day, at 12:00 AM, e.g. 43743.99999999 will be rounded to 43743.00000000.  Contact us if that does not make sense.

That is about it for the primary concerns regarding the creation of an all important Accounting Calendar for your data warehouse application.  We hope you found this “how to” on building a data warehouse for robust analytics requires a versatile accounting calendar of value.  In the next installment of the data warehouse application “how to” we are going to show you ways to use your accounting calendar to limit or filter data from your data warehouse to your user apps such as PowerPivot or Power BI, where you want to limit the total data set, by date, that goes out to your user base.

If you are a business owner that is looking for ways to leverage your data analytics, please contact us today

 

About the Author
About the Author
Chase Morrison  provides CFO services, utilizing Profitwyse’s 3D Growth Platform™, enabling his business owner clients to more readily achieve their goals for wealth creation and family legacy.  Contact him today to learn how your business can hit the accelerator using Profitwyse’s proven platform.