Malcolm McRoberts's Public Library

  • Utilizing a data profiling tool that shows the range and value distributions of fields in a data set. This can be employed during testing and in production to compare source and target data sets and point out any data anomalies from source systems that may be missed even when the data movement is correct.
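Such a source-versus-target profile comparison can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation: the field names and row structure are hypothetical, and a real profiling tool would also compare histograms, null counts, and value patterns.

```python
from collections import Counter

def profile(rows, field):
    """Summarize one field: row count, distinct values, min/max, frequencies."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
        "frequencies": Counter(values),
    }

def compare_profiles(source_rows, target_rows, field):
    """Diff the source and target profiles of a field; return only the mismatches."""
    src, tgt = profile(source_rows, field), profile(target_rows, field)
    return {k: (src[k], tgt[k])
            for k in ("count", "distinct", "min", "max")
            if src[k] != tgt[k]}

source = [{"amount": 10}, {"amount": 20}, {"amount": 20}]
target = [{"amount": 10}, {"amount": 20}]  # one row lost in flight
print(compare_profiles(source, target, "amount"))
# → {'count': (3, 2)}
```

Even when every moved row is individually correct, a profile diff like this surfaces dropped rows or skewed distributions that row-level checks miss.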

    • “One Size Fits All” principle doesn’t work here...


      ETL Test Process is different from Standard Test Process. How?

      •   Test Objective is to enable customers to make intelligent decisions based on accurate and timely analysis of data.

        Test focus should be on verifying and validating the business transformations applied to the data that support accurate and timely decisions, e.g. finding the items with the most sales in a particular area within the last two years.

      •   Consolidation & frequent retrieval of data (OLAP) takes precedence over frequent storage/rare retrieval (OLTP)

        The emphasis here is mostly on consolidating and modelling data from various disparate sources into OLAP form to support faster retrieval, in contrast with the frequent storage and rare retrieval of data in OLTP systems.

      •   Freshness and accuracy of the data are the key to success

        Timely availability of accurate and recent data is extremely critical for BI/DW applications to support accurate, on-time decisions. The Service Level Agreement (SLA) for the availability of the latest and historic data has to be met, even though the volume of data and the size of the warehouse remain largely unpredictable due to their dynamic nature.

      •   Need to maintain history of data; space required is huge

        Data warehouses typically maintain a history of the data, so the storage requirement is huge compared to transactional systems, which primarily focus on recent data of immediate relevance.

      •   Performance of retrieval is important: De-normalization preferred over Normalization

        Warehouses are typically de-normalized, with fewer tables and star and/or snowflake schemas, in contrast with OLTP systems, which follow Codd’s well-known normalization approach. Data in the warehouse is often stored multiple times, in its most granular form, to speed up retrieval.

      •   Importance of Data Security

        PII (Personally Identifiable Information) and other sensitive information are of High Business Impact (HBI) to customers. Maintaining the confidentiality of PII fields such as customer name, account details, and contact details is a top priority for any DW application. Data has to be closely analyzed, and programs designed to protect PII and expose only the required information.
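One common way to protect PII fields while keeping the data usable downstream is one-way masking. The sketch below is an illustrative assumption, not a prescription: the field names are hypothetical, and a production system would use per-record salts, a vaulted key, or format-preserving encryption rather than a hard-coded demo salt.

```python
import hashlib

# Hypothetical PII column names for illustration
PII_FIELDS = {"customer_name", "account_number", "contact_email"}

def mask_row(row, salt="static-salt-for-demo"):
    """Replace PII fields with a salted one-way hash; pass other fields through."""
    masked = {}
    for field, value in row.items():
        if field in PII_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            masked[field] = digest[:12]  # short token, still join-able across tables
        else:
            masked[field] = value
    return masked

row = {"customer_name": "Jane Doe", "contact_email": "jane@example.com", "region": "EMEA"}
print(mask_row(row))
```

Because the hash is deterministic, masked values can still be used as join keys across tables, which lets testers validate transformations without ever seeing the raw PII.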

    • The Biggest Issues with Data Warehouse Testing:

      • Data warehouse/ETL testing is typically performed manually.
      • This means that far less than 10% of the data is tested, leaving the possibility of many data errors.
      • Comparison of the data is typically done using Excel or “stare and compare”, i.e. verifying the data by viewing or “eyeballing” the results.
      • This is a huge problem because a single test query can return as many as 10 million rows with 200 columns, making it practically impossible to find data errors by inspection.
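The “stare and compare” step described above is exactly what lends itself to automation: a keyed diff of source against target checks every row, not a sampled handful. A minimal Python sketch (the column and key names are hypothetical; a real harness would stream rows from both databases rather than load them into memory):

```python
def diff_datasets(source, target, key):
    """Compare two datasets on a business key; report missing, extra, changed rows."""
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    missing = sorted(set(src) - set(tgt))   # in source, never reached target
    extra = sorted(set(tgt) - set(src))     # in target, with no source row
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"missing": missing, "extra": extra, "changed": changed}

source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
target = [{"id": 1, "amt": 10}, {"id": 2, "amt": 99}]
print(diff_datasets(source, target, "id"))
# → {'missing': [3], 'extra': [], 'changed': [2]}
```

A diff like this runs unattended against the full result set, so the 10-million-row case that defeats eyeballing becomes routine.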

  • Ideally Suited for SaaS Applications


    We provide fully managed solutions that encompass compute, storage, network, data protection, monitoring, analytics, and compliance. We combine real-world deployment expertise and proven best practices with hardened operational processes and always-available seasoned experts. As a result, our clients can increase their business velocity, bring SaaS applications to market faster with superior uptime, and comply with the most stringent requirements, including HIPAA and PCI.


    Our solutions are tailored and personalized to meet a variety of deployment models and performance criteria. We are ideally suited to help you build out your SaaS applications in a cost-effective development environment that can rapidly be converted to a full production infrastructure. This includes addressing the need to run in geographically distributed topologies with multiple connectivity points to ensure business continuity.

  • Run high performance analytics directly on Hadoop at the speed and scale needed to help your organization transform big data into business value

  • The MapR Distribution including Apache Hadoop provides you with an enterprise-grade distributed data platform to reliably store and process big data.

  • Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

  • var q = require('q'),
        itemsToProcess = ["one", "two", "three", "four", "five"];

    function getDeferredResult(prevResult) {
      return (function (someResult) {
        var deferred = q.defer();
        // any async function (setTimeout for now will do, $.ajax() later)
        setTimeout(function () {
          var nextResult = (someResult || "Initial_Blank_Value ") + ".." + itemsToProcess[0];
          // splice(1) removes and returns everything after the first element,
          // so this drops the item that was just consumed
          itemsToProcess = itemsToProcess.splice(1);
          console.log("tick", nextResult, "Array:", itemsToProcess);
          deferred.resolve(nextResult);
        }, 600);
        return deferred.promise;
      }(prevResult));
    }

    // build a sequential chain: each loop iteration appends another .then()
    var chain = q.resolve("start");
    for (var i = itemsToProcess.length; i > 0; i--) {
      chain = chain.then(getDeferredResult);
    }

  • driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
  • WebDriverWait wait = new WebDriverWait(driver, 10);
    WebElement element = wait.until(ExpectedConditions.elementToBeClickable(By.id("someid")));

  • Rolling up our sleeves with Tableau
