Analytics Testing Archives - Codoid
Select Page

Blog

Category Selected: Analytics Testing

5 results Found

Analytics Testing

What Makes Data Analytics in Manufacturing Beneficial

Analytics Testing

An End-to-End DBT Tutorial for Testing

Data Analytics Testing

Importance of ETL Testing

Data Analytics Testing

ETL Data Quality Testing Best Practices

Data Analytics Testing

Data Quality Checks for Data Warehouse/ETL


People also read

Blog

Top Errors in Automated Agile Testing and How to Avoid Each

QA Outsourcing

How QA Outsourcing Companies Perform Testing?

Accessibility Testing

Understanding Success Criterion 2.5.2 Pointer Cancellation

What Makes Data Analytics in Manufacturing Beneficial

What Makes Data Analytics in Manufacturing Beneficial

Manufacturing is, for the most part, the world economy’s engine. No matter what economy is in question, it’s one of the most valuable industries with many benefits. Two of the largest manufacturing hubs in the world, the USA and China, have a GDP whose contributions were at $2.3 trillion and $4 trillion respectively.

Manufacturing Is Key

Accessing today’s beloved services and products would be impossible without manufacturing. Phones we use for communication, cars and other vehicles that are driven…there’s so much going on in the world that the manufacturing industry delivers on.

The approach people have taken to manufacturing over the years has shifted since the industrial revolution. Streamlining the process is something people are consistently trying to find new ways to achieve. Manufacturing processes of big companies are outsourced to offshore plants that are more efficient. Production processes are consistently being automated so that quality improves and the chances of human errors lower considerably.

There are many companies that invest in the latest developments in manufacturing technology in order to increase their productivity. As a result, modern manufacturers can now produce thousands of products with minimal human involvement.

Data Analytics In Manufacturing: The Advantages

All told, manufacturing takes a huge chunk of means to improve through data analytics. There are so many benefits that go a long way over time. Some of the most important ones include, but are not limited to:

Automated Manufacturing Gets Empowered By Analytics Algorithms

Manufacturers can cut down on errors and miscommunications with data analytics that are integrated throughout their business. This, in turn, leads human interaction to be minimized in terms of quality control of finished products and raw material delivery.

Data Analytics Helps Product Design Improve

Product design to the market used to involve plenty of trial and error. If anything, the initial iteration of a product design did not really get a warm welcome from customers. This is usually from the design and ergonomics being far from ideal.

Thanks to the advancement of technology in today’s modern times, data analytics allow manufacturers to test designs for ergonomics and efficiency without a physical prototype being made. 

Traditionally, automotive manufacturers have used hardware-intensive methods to create car design mock-ups in order to test them in a wind tunnel. However, the advent of data analysis and artificial intelligence has allowed manufacturers to use software to test a design before making physical copies of it.

Product Management Gets More Efficient

Being able to project how many products to manufacture is critical for any manufacturer. Overproducing may cost millions of dollars, but the alternative—underestimating market demand—could tarnish a company’s reputation if it cannot meet customer demands in a timely fashion. 

Making the most of profit is best done by manufacturers through accurate demand estimation.

Conclusion

Since the industrial revolution, people’s views on manufacturing as a whole have taken a turn. Today’s data analytics plays a major role in the manufacturing industry. Benefits include more efficiency in product management and improvements to product design without unnecessary wasting of any resources.

Trying to find manual testing companies to work with on data analytics? Contact Codoid today! We’re an industry leader in Quality Assurance that’s passionate about helping to lead and guide the whole QA community.

An End-to-End DBT Tutorial for Testing

An End-to-End DBT Tutorial for Testing

Currently, we are extracting the data from different source systems, transforming it, and then finally loading it into the data warehouses. But when it comes to DBT, the raw source data from different source systems are extracted and directly loaded into the warehouse. After which, DBT allows us to create new materialized tables into which the transformed data is fed as it helps showcase the meaningful data for business analysis purposes by using simple select statements. But what makes DBT stand tall is it allows both the development and production environment teams to work in the same place by creating models & testing the scripts for transformation purposes. Being one of the leading QA companies, we will be mainly focusing on the testing part of the above process in this DBT Tutorial by exploring how we test our models.

Data flow of DBT Tutorial

What is DBT?

If you’re new to the DBT tool, worry not, we have got some basics in this DBT Tutorial for you as well. The Data Build Tool (DBT) is an open-source test automation tool and a command-line tool. It mainly focuses on the transformation part in the “Extract load and transform” pipeline. DBT allows both data analysts and data engineers to build the models and transform the data in their warehouses to make it meaningful.

New to DBT tool?

Let’s walk you through some of the basics to help you get a quick overview of DBT and the necessary steps that have to be done to create an account. Basically, there are two ways for DBT to be used:

1. DBT CLI(Command Line Interface):

The DBT CLI variant allows users to write their scripts and test them locally by using the DBT command-line tool.

2. DBT Cloud:

The DBT Cloud is a facilitated variant that smooths out an advancement with an online Integrated Development Environment, an interface to write our DBT test scripts and run them on a schedule.

In this DBT Tutorial, we will be covering the DBT cloud variant. It is worth noting that the ideas and practices can be stretched out to the CLI variant as well. So let’s get started.

Data warehouse:

As stated earlier, the DBT is used to handle the transformation part of the ELT for data warehousing. “But how does it work?” you might ask. Well, DBT helps us create a connection with the warehouse and then lets us write the SQL simple select statements against the warehouse to transform the data.

Supported Warehouses:

We have listed the names of the supported warehouses in DBT, and they are

• Postgres

• Redshift

• BigQuery

• Snowflake

• Apache Spark

• Databricks and

• Presto (Partially Supported).

We would be using the Snowflake warehouse in this blog for demonstration purposes.

DBT Tutorial for the Setup:

Now that we have seen an overview of DBT, let’s find out how to set up the DBT cloud and use its interface. If you’ve not yet created an account in DBT, you can sign up for DBT cloud by visiting their Sign up page.

Once you are done with your DBT cloud sign-up process, let’s find out how to set up your very first DBT project.

DBT recommends using GitHub to store our deployments. So the first step would be to create an empty repository in GitHub as shown below.

Setting up DBT tutorial

Enter your repository name and click on ‘Create repository’.

Once you have signed up with your DBT cloud, just select the snowflake warehouse and start configuring it using your snowflake credentials. We have shown a sample connection in the below images for your reference.

Set up Database connection - DBT tutorial

Development Credentials

After entering your warehouse and database details, make sure you have entered the snowflake user name and password in development credentials. Next, click on ‘Continue’ to get this confirmation message “Connection Test Success”. Now, you have to click on the ‘Continue’ button for configuring the GitHub. Since you’re done with GitHub configuration, we’re good to start with DBT.

DBT Tutorial for its Interface:

DBT has a lot of inbuilt options that help access and understand it easily. After setting up with DBT, the interface of DBT that we get to see initially will look like how it is shown in the image below.

DBT Interface

As you can see, DBT might initially look a bit empty and even show that the compilation error at the bottom right. So we would have to click on ‘Initialize your Project’ to start our first project. Doing this will make the DBT will provide us with a skeleton as shown in the below image for our project automatically.

Starting of DBT tutorial

We can clearly see that the compilation message that was displayed at the right bottom has now changed to become ‘ready’, implying that it is time to start our project. So let’s get started.

DBT Tutorial Database

First off, let’s find out the scope of some of the functions,

1. The commit button will handle all our git configurations like creating a new branch, pull request, commit, merge, etc.

2. After writing any SQL statements we can check the query then and there by using the dbt “preview data” and “compile SQL” buttons.

3. We can run and test our models by writing simple DBT commands (e.g. dbt debug, dbt run, dbt test).

4. DBT also generates documentation that will come in very handy for analytical purposes as it will contain all the details about the project. You can access it by clicking on the “view docs” option as shown in the above pic.

Note: Before running any model or a test, make sure that you have saved it.

Loading training data into the Warehouse:

So as stated earlier, DBT cloud currently supports only BigQuery, Postgres, Redshift & Snowflake warehouses, and we have decided to use Snowflake for the demo. So to illustrate how to import or load your training data into the warehouse, we will be loading the two tables that we have created.

Snowflake:

It is important to note that we have written the below instructions assuming that you have the database named compute_wh in your Snowflake account. So make sure that you have the required privileges to create the objects in this database.

Run the following commands in your Snowflake Warehouse SQL Runner:

To create the Database:

Create database STUDENT;
Create Schema STUDENT_INFO:
Use schema STUDENT_INFO;

To create the Tables:

Table 1: Student_Profile

Table 1 will contain the personal information of the students.

Create table student_profile (
  S_no int not null,
  Std_id int not null,
  Std_name varchar(45) null,
  Std_dep varchar(45) null,
  DOB datetime null,
  Primary key (Std_id));
  insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('1', '294', 'Afshad', 'CSE', '03/10/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('2', '232', 'Sreekanth', 'Mech', '02/09/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('3', '276', 'John', 'EEE', '10/06/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('4', '303', 'Rahul', 'ECE', '12/05/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('5', '309', 'Sam', 'Civil', '09/04/1999');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('6', '345', 'Ram', 'CA', '03/11/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('7', '625', 'Priya', 'CSE', '03/12/1996');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('8', '739', 'Bob', 'MEC', '06/07/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('9', '344', 'Ganesh', 'Mech', '07/09/2024');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('10', '123', 'Neha', 'ECE', '12/09/1998');
Table 2: STUDENT_RESULTS

Table 2 will contain the results of those students.

Create table student_results (
  S_no int not null,
  Std_id int not null,
  Marks int null,
  Result varchar(45) null,
  Primary key (Std_id));
insert into student_results (S_no, Std_id, Marks, Result) values ('1', '294', '78', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('2', '232', '56', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('3', '276', '88', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('4', '303', '67', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('5', '309', '38', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('6', '345', '90', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('7', '625', '87', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('8', '739', '45', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('9', '344', '97', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('10', '123', '49', 'Fail');

Now that you have the training data in your warehouse, you are one step away from starting your DBT testing.

DBT Tutorial for the available Models:

In analytics, the process of modeling is changing the data from being the raw data to the final transformed data. Typically the data engineers are responsible for building tables that represents your source data, and on top of that, they also build the tables/views that transform the data step by step.

The models are just SQL select statements in your dbt project. These models are created inside the Models directory with the .sql extension.

What’s great about dbt is you don’t need to know DDL/DML to build the tables or views.

Note: The name of the file is used as the model or table name in your warehouse.

Creating Models:

Inside the model’s directory, we have created two simple models for better understanding.

Model's Directory - DBT Tutorial

Model 1:Student_profile.sql

{{ config(materialized='table') }}

with source_data as (
select 
    S_No,
    Std_Id,
    Std_Name,
    Std_Dep,
    DOB
 from 
STUDENT.STUDENT_INFO.STUDENT_PROFILE

)

select *
from source_data

Model 2: Student_Results.sql

{{ config(materialized='table') }}

with source_data as (
select 
    S_No,
    Std_Id,
    Marks,
    Result 
 from 
STUDENT.STUDENT_INFO.STUDENT_RESULTS

)

select *
from source_data

Once you’ve built your models, make sure to save and run the models to load them into the warehouse. You can use the ‘dbt run’ command to run your created models, or if you want to run any particular model, you can use the ‘dbt run –Student_Profile’ command.

Successful Completion of DBT function

DBT will show detailed information about the models when you run your models as shown in the above image. So we have successfully completed all the required steps and can move forward to find out how to perform testing in DBT.

DBT Tutorial to perform Testing:

We all know that testing plays a major role in deployment, so tests in analytics are just assertions that you have about your data. But what makes these assertions important is that if these assertions are met, it’ll be instrumental in helping others trust you by looking at the data.

The great part of DBT testing is that by default it provides four data quality checks. They are Unique, Not Null, Relationships (Referential Integrity), and Accepted Value checks.

Basically, DBT provides us with two types of testing. So let’s take a look at the steps to be done to perform the first type, the Schema test.

1. Schema Test:

Create a YAML file in the same model’s directory with a .yml Extension (Ex. Schema.yml).

Configure your tests (Unique, Not Null, Relationships, and Accepted Values) in the YAML file based on your table data and run it by using the command “dbt test”.

Note: Yaml is a data serialization and human-readable language that can be easily understood even by non-technical people. It is mainly used for configuration purposes as it supports all programming languages.

We have attached one YAML file for your reference below.

Schema.yml:

version: 2

models:
    - name: student_profile
      description: "Table contains all the information about the students"
      columns:
          - name: std_id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null

    - name: student_results
      description: "Table contains the student results info"
      columns:
          - name: std_id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null
                - relationships:
                    to: ref('student_profile')
                    field: std_id
          - name: result
            description: "Result of a student"
            tests:      
                - accepted_values:
                    values:
                      - pass
                      - fail

Once you run this test, you will get detailed information on all your tests. If you need to run only one particular schema test, you can use this command “dbt test –m Student_results”.

2. Data Test:

But if you are looking to write custom test scripts based on the requirements against the data, that also can be done by using tests of this type.

Start by creating the SQL files with your test name inside the test directory of the DBT project as shown below.

Test Directory

Now, add your test script to your test file.

Duplicate.sql:

select std_id, count(std_id) 
  from {{ref('student_profile')}} 
  group by std_id 
  having count(std_id)>2

referential_integrity.sql:

select std_id 
from {{ref('student_profile')}} 
where std_id not in (
    select std_id 
    from {{ref('student_results')}} )

Note:

Here ref function is used to refer one or more specific files into the script.

Finally, you can run the test by using the “dbt test –data” command.

Running the test by DBT command

DBT Tutorial for Documentation:

As discussed earlier, documentation is one of the greatest features of DBT. So once your project is completed, all you have to do is use the “dbt docs –generate” command to generate the document.

DBT Documentation

After running the command, just click on “view docs” at the top left of the DBT interface to view the complete information about your project in the form of a document. It even generates a lineage graph like the one shown below for a crystal clear understanding.

Lineage Graph

DBT also supports filters in the lineage graph, which is a great advantage when it comes to analysis.

Conclusion:

We hope that you have enjoyed reading this DBT Tutorial blog and also hope that DBT would be a great addition to your toolkit. To sum things up, DBT helps carry out the most complex and heavy transformations in a simple and easy manner by using simple select statements. It enables both data engineers and data analysts to work in the same environment and provides a unique experience for them to transform the data. The models in DBT enable faster execution and also make it so easy to test and modify. Finally, it even shows detailed information about the project by generating the document along with a lineage graph for clear analysis purposes. All thanks to such awesome features, DBT has been a resourceful tool that has helped us deliver the best software testing services to our clients.

Importance of ETL Testing

Importance of ETL Testing

ETL Testing (Extract-Transform-Load Testing) is an important component of Big Data Testing. It is a data-centric testing mechanism largely associated with Data Warehouse or Data Migration projects.

The Extract-Transform-Load process involves the extraction of data from the databases, the transformation of data depending on the data warehouse structure, and the loading of data to the data warehouses. Hence, the objective of ETL Testing is to test the ETL process in order to ensure that data is effectively managed in the Data Warehouse.

Importance and Types of ETL Testing

Importance of ETL Testing

ETL Testing holds great significance in the process of data warehousing and data migration as it validates the efficient loading of data from the source system to the data warehouse. It provides a consolidated view of the data to the enterprises, thereby enabling them to push better business decisions.

Some of the notable ETL Testing benefits that Software Testing companies generally highlight while endorsing the testing method to underscore the importance of the testing method are:

Helps in Identifying Problems with the Source InformationWith ETL Testing, the Extract-Transform-Load process can be tested at an early stage when data is extracted from the source systems. Testers can identify the issues with the source information from the start, prior to the stacking of the data in the data repository. They can also spot the discrepancies or ambiguities in business rules that have been designed by an enterprise for the management of data transformation and integration.

Facilitates the Transfer of Bulk DataETL Testing is necessary for Data Warehousing projects especially when the transfer of bulk data is involved. A Digital Assurance company may particularly propose ETL Testing to large businesses as they frequently implement data integration and data migration processes that require the transfer of data from one location to another. ETL Testing ensures that the bulk data has been moved completely to the new destination in a reliable manner.

Prevents Loss of Data and Duplication of RecordsThe loss of data and the duplication of records can be prevented by performing ETL Testing as it is an effective method for the authentication and validation of information in data warehouse systems. ETL Testing makes sure that the exchange of information from independent sources to the central data warehouse takes place in compliance with the transformation rules and is consistent with the validity tests.

Eliminates Possible Errors in Transmission of Information across an Enterprise

The ETL Testing method can eliminate possible human or system errors during the process of transfer of data from the extraction stage to the loading stage. Any human/system errors in the process can result in the transmission of inaccurate information across the enterprise. ETL Testing can eliminate the errors since it covers diverse strategies such as Data Completeness, Data Correctness, Data Integrity, Data Reconciliation, Data Transformation, Data Quality, and System Performance and Scalability.

Data Warehouse Testing

In Conclusion:

ETL Testing is a critical testing procedure for Data Warehousing projects since it helps to build, control, and protect integrated or migrated data. The ETL Testing process validates, verifies, and qualifies data in order to avert data loss and duplicated records. ETL Testing is gaining importance in the present times since Agile, DevOps, Artificial Intelligence, and Cloud Technologies are fast emerging as the top trends in software development industry. Work with the best – connect with us.

ETL Data Quality Testing Best Practices

ETL Data Quality Testing Best Practices

Why do my ETL data quality testing techniques fail to identify bad data?

Bad data costs organizations dearly in correction activities, lost customers, missed opportunities, and incorrect decisions.

How to perform ETL Data Quality Testing to find bad data?

Impact of poor data

  • Poor data can lead to bad business decisions
  • Delays in delivering data to decision makers
  • Lost customers through poor service
  • ETL Data Quality Testing

    ETL Testing Techniques

  • Number of Records Validation
  • Data Completeness
  • Not Null Validation
  • Validate valid values
  • Frequency Distribution
  • Min and Max Validations
  • Duplicate Records
  • Pattern Check
  • Consistency
  • Precision
  • Timeliness
  • Business Rules
  • Data Type Check
  • Size & Length Validation
  • A firm’s basis for competition . . . has changed from tangible products to intangible information. A firm’s information represents the firm’s collective knowledge used to produce and deliver products and services to consumers. Quality information is increasingly recognized as the most valuable asset of the firm. Firms are grappling with how to capitalize on information and knowledge. Companies are striving, more often sliently, to remedy business impacts rooted in poor quality information and knowledge.

    – Kuan-Tsae Huang, Yang W. Lee and Richard Y. Wang

    Data Quality Checks for Data Warehouse/ETL

    Data Quality Checks for Data Warehouse/ETL

    Data should be perceived as a strategic corporate tool, and data quality must be regarded as a strategic corporate responsibility. The corporate data universe is made up of a wide range of databases that are connected by infinite real-time and batch data feeds. Data is an ever-constant movement, and transition, the core of any solid and thriving business is high-quality data services which will, in turn, make for efficient and optimal business success.

    However, the huge conundrum here is that data quality, solely on its own cannot improve. In truth, most IT processes have a negative impact on the quality of data. Therefore, if nothing is done, the quality of data will continue to plummet until the point that data will be considered a burden. It is also pertinent to point out that data quality is not a feat that can easily be achieved and after that, you brand the mission complete and heap praises on yourself for eternity. Rather, it must be viewed as a garden that must be continuously looked after.

    Data Warehouse testing is becoming increasingly popular, and competent testers are being sought after. Data-driven decisions are termed to be accurate. A good grasp of data modelling and source to target data mappings can assist QA analysts with the relevant information to draw up an ideal testing strategy.

    Data Quality Check-Verify Field Data Type and Length

    Authenticate source and target fields data type and length.

    Data Quality Check-Verify Field Data Type and Length

    In the verification pictured above, we have a mismatch of the data type and length in the target table. It is a good habit to verify data type and length uniformity between the source and target tables.

    Data Quality Check-Verify Not Null Fields

    Authenticate Null Values in a column which features a NOT NULL CONSTRAINT

    Data Quality Check-Verify Not Null Fields

    When a source has a Not Null Constraint, it should not feature Null Values as demonstrated above.

    Data Quality Check-Verify For Duplicate Records

    If your data warehouse features duplicated records, then your business decisions will be inaccurate and undependable. Poor data that is marred with inaccurate and duplicate data records will not enable stakeholders to properly forecast business targets.

    Data Quality Check-Verify For Duplicate Records

    A QA team should ensure that source data files are verified to spot duplicate records and any error in data. Similarly, you can integrate automated data testing by adopting Python to authenticate all data rather than just checking an example or subset.

    Data Quality Check-Verify for Orphan Records

    A situation where there are child records with no matching parent records, then such records are termed “Orphan Records.” The business should outline Strategic relationship rules amongst parent and child tables.

    Data Quality Check-Verify for Orphan Records

    Data Quality Check-Verify for Unknown Data

    On some occasions, unknown data validation is necessary to ensure that the right changes happened as planned for value encoding. For instance, consider during the transformation process, if there is a logic to encode the value “Male” as “M”, subsequently, the QA team should cross-check the Gender column to ensure that it does not feature a different encoded value.

    Data Quality Check-Verify for Unknown Data

    Conclusion

    Data quality is a broad and complicated field with many aspects. Corporate data universe is made up of different databases, linked in countless real-time and batch data interfaces. Irrespective of the industry, revenue size or its target market, virtually every organization depends on credible data to produce useful information for appropriate business decisions. Data quality can be jeopardized at any level; reception, entering, integration, maintenance, loading or processing.

    Thus, efforts must be made to ensure that quality data via Data Warehouse testing/ETL testing is aimed at guaranteeing the production, availability, and use of high-quality data within an organization.