Analytics Testing Archives

SQL Queries Cheat Sheet

by Arthur Williams | Feb 16, 2023 | Analytics Testing, Blog | 0 comments

We all know that searching for a needle in a haystack is a very difficult or almost impossible task. We can figuratively compare it with searching for data in a large database or data warehouse. But with the help of SQL (Structured Query Language), you can do that with ease. Though having a strong grasp of all the SQL queries is important, a practical SQL Queries Cheat Sheet can be beneficial, especially if you’re a beginner.

We have split our SQL Queries Cheatsheet into different categories to help you easily locate the command you are looking for. We have categorized them based on the most common use cases such as creating a table, inserting a value into the table, and so on as mentioned below.

Database

CREATE Database

Creating a new database is one of the most basic functions one must know and you can do so by using the CREATE DATABASE keyword. We have specified the syntax you’ll need and given an SQL command as well. Likewise, we would have followed a similar procedure throughout our SQL Queries Cheat Sheet to make it easier for you.

SYNTAX:

CREATE DATABASE database_name

SQL Query:

CREATE DATABASE Codoid;

Here ‘Codoid’ is the database name

USE DATABASE

The USE keyword is used when there are multiple databases present and you want to use a specific database.

SYNTAX:

USE database_name;

SQL Query:

USE Codoid;

Once again, ‘Codoid’ is the database name here.

Next up in our SQL Queries Cheat Sheet, we will be seeing how to create a table.

CREATE A TABLE

You can use the CREATE TABLE keyword to create a new table as shown in the below syntax. You’ll have to specify the datatype for the columns based on your needs. Once we have created a table, we’ll see how we can display it as well.

SYNTAX:

CREATE TABLE table_name 
(
    column1 datatype,
    column2 datatype,
    column3 datatype,
   ....
);

CONSTRAINTS

Using CONSTRAINTS, you will be able to define the rules for the columns while creating or updating the table. Though there are different types of CONSTRAINTS, we will be seeing only the most used ones in our SQL Queries Cheat Sheet. We have specified the CONSTRAINTS and briefly explained their purpose as well.

NOT NULL – It doesn’t accept the NULL values in the column
UNIQUE – It will accept only NULL values and not DUPLICATE values
PRIMARY KEY – It doesn’t accept both NULL and DUPLICATE values
FOREIGN KEY – It will create a link between the two tables. The FOREIGN KEY value is usually the primary key of the foreign table.
DEFAULT – It will set a default value for a column if we don’t provide any particular values.

Now let’s create two new tables (Project and Employee) using the CREATE TABLE and CONSTRAINTS

SQL Query:

CREATE TABLE Project 
(
ID int primary key, name varchar(100)
);

So in the above example, we are creating a table by the name ‘Project’ in the Codoid database. It has two columns (ID and Name). We have defined the ID column to have an integer data type and it is also set as the primary key. So the ID column must be unique and shouldn’t be null.

And the name column has a varchar data type with a maximum length of 100 characters.

SQL Query:

CREATE TABLE Employee (
    Name Varchar(100) NOT NULL,
    Contact bigint UNIQUE,
    City varchar(100) DEFAULT 'Boston',
    Id int,
    ProjectId int,
    PRIMARY KEY (Id),
    FOREIGN KEY (ProjectId) REFERENCES Project(ID)
);

Same as in the prior example, we are creating another table in the Codoid database and assigning constraints. So the Employee table has 5 columns: Name, Contact, City, Id, and ProjectId.

Name – Varchar type field with a max length of 100 characters and shouldn’t be left blank.
Contact – bigint type where each value must be unique.
City – Varchar type that can store 100 characters. If it is left blank, it will automatically assign “Boston” as the default value.
Id – It is int type and is also the primary key for the table.
ProjectId – int type and is a foreign key that references the ID column from the Project table we created earlier.

DESC

Now that we have created the table, you can display the table structure by using the DESC keyword.

SYNTAX:

DESC table_name;

SQL Query:

DESC Employee;

Here, ‘Employee’ is the name of the table.

Output:

INSERT VALUES INTO THE TABLE

Next up in our SQL Queries Cheat Sheet, we’ll be focusing on how to insert values in the table we have created. You can insert a single value or multiple values into your table based on your requirements.

INSERT SINGLE VALUE

We have given an example where we insert the single value ‘001’ into the Project table we had created earlier.

SYNTAX

INSERT INTO table_name VALUE ( value1, value2,…..);

SQL Query:

INSERT INTO Project VALUE (001, ‘ETL Testing’);
INSERT MULTIPLE VALUES

If you wish to insert multiple values into the table, you can do so by specifying the list of columns you will be inserting the data into and then mentioning the multiple values as shown below.

SYNTAX:

INSERT INTO table_name (column_list) VALUES (value_list_1),(value_list_2),….);

SQL Query:

INSERT INTO Employee (Name, Contact, City, Id, ProjectId)
VALUES
     ('Robert', '8594230688','Willisburg',2201,001 ),
     ('Peter', '6155805664','Smithville',2202,001 );

SELECT

We have seen how we can insert the values into the table, now let’s see how to display the values which we inserted into the tables by using the SELECT.

SYNTAX:

SELECT * FROM table_name

To print all the columns from the table.

SQL Query:

SELECT * FROM Employee

SYNTAX:

SELECT column_name1, column_name2,... FROM table_name

To print selective columns from the table

SQL Query:

SELECT id,contact,city FROM Employee

GET DATA FROM A SINGLE TABLE

There will definitely be scenarios where you’ll have to get data from a table. And so we will now be covering the different ways you can get data from a single table. There are ways to get data from multiple tables as well and we will be covering that next in our SQL Queries Cheat Sheet.

When it comes to getting data from a single table, you can either display all columns value from the table or choose which column values have to be displayed.

All Columns

So previously in our SQL Queries Cheat Sheet, we had created a table by the name ‘Employee’. It had details such as Name, ID, Contact, City, & ProjectID. So let’s see how to get all this data now.

SYNTAX:

SELECT * FROM table_name;

SQL Query:

SELECT * FROM Employee;

Output:

Specific Column Values

If at all you need only one of the column values, you can use the below syntax to get it. In our example, we will be getting only the names of the employees from the table.

SYNTAX:

SELECT column_name FROM table_name

SQL Query:

SELECT Name FROM Employee;

This query will display only Name column values.

GET DATA FROM MULTIPLE TABLES

Based on your needs, you might encounter scenarios where you’ll have to get data from multiple tables. We have already created two tables by the names ‘Project’ and ‘Employee’ in our SQL Queries cheat sheet.

So let’s use these tables in our examples to illustrate the different SQL Queries listed below.

Joins
- Inner Join
- Left Join
- Right Join
- Full Join
- Cross Join
UNION
UNION ALL
Content

JOINS

As the name suggests, these SQL queries can be used to get data from multiple tables. As mentioned above, there are multiple types of Joins. Let’s take a look at them one by one in our SQL Queries Cheat Sheet.

INNER JOIN

Just JOIN or INNER JOIN can be used to get or display records that are present in both tables. In our example, we will be getting the names from one table and the project IDs from another. We have also shown the output along with the Syntax and SQL queries to make it easier for you to understand.

SYNTAX:

SELECT column_name1, column_name2 FROM table1
INNER JOIN table2 ON table1.column_name=table2.column_name
SQL Queries:
 SELECT Project.Name, Employee.Name FROM Employee
 INNER JOIN Project ON Project.Id=Employee.ProjectId

Output

LEFT JOIN

LEFT JOIN can be used to display all the rows from the LEFT table with corresponding rows from the RIGHT table. If there’s no matching row, NULLs will be returned as values from the RIGHT table.

SQL Query:

SELECT Project.Name, Employee.Name FROM Employee
 LEFT JOIN Project ON Project.Id=Employee.ProjectId

Output:

RIGHT JOIN

Likewise, RIGHT JOIN can be used to display all rows from the RIGHT table with corresponding rows from the LEFT table. Same as before, NULLs values will be returned if there are no matching rows on the LEFT table.

SQL Query:

SELECT Project.Name, Employee.Name FROM Employee
 RIGHT JOIN Project ON Project.Id=Employee.ProjectId

Output:

FULL JOIN

If you wish to get all the data from both the left & right tables, you can use the FULL JOIN SQL query. If there are no matching rows on either side, it will return NULL values.

SQL Query:

SELECT Project.Name, Employee.Name FROM Employee
FULL JOIN Project ON Project.Id=Employee.ProjetctId

Output:

Note: FULL JOIN is not accessible in MYSQL

CROSS JOIN

This join is different from the pattern of join queries we have seen earlier in our SQL Queries cheat sheet. It can be used to generate a paired combination of each row of the first table with each row of the second table. But it is different as it will eliminate null values from both tables.

SQL Query:

SELECT Project.Name AS Project_name, Employee.Name AS Employee_name FROM Project

CROSS JOIN Employee

Output:

UNION

Union query can be used to combine the result set of two or more SELECT statements without any data duplications in the result set. In our example, we are selecting the names from both the Project & Employee tables.

SQL Query:

SELECT NAME FROM PROJECT 
UNION 
SELECT NAME FROM EMPLOYEE;

Output:

We have gotten an output without any duplicates.

UNION ALL

If there is a scenario where you don’t want the duplicates to be removed when using the UNION query, you can make use of this query. So we are selecting the same data from the same tables as before.

SQL Query:

SELECT NAME FROM PROJECT 
UNION ALL
SELECT NAME FROM EMPLOYEE;

Output:

But we can see that the ‘ETL Testing’ duplicate hasn’t been removed.

Related Blogs

An End-to-End DBT Tutorial for Testing

Importance of ETL Testing

FILTERING DATA FROM THE TABLE

We can filter the data from a SQL query by adding CLAUSE, OPERATORS, and DISTINCT to the statement. There are different types of filters that you can use to get customized results based on your needs. That is what we’re going to look at now in our SQL Queries check sheet.

CLAUSES

SQL clauses are part of SQL statements that contain an expression, predicate, or query. We will be seeing an example for each of the 5 clauses we have listed below as they have their own use case.

WHERE
GROUP BY
HAVING
ORDER BY
LIMIT

WHERE

It is used to filter the rows in the result set based on a specific condition. The specific condition is usually an expression that involves one or more columns from the table being queried.
For example, we are going to filter data using the name in the Employee table. We have taken the name ‘Tony’ in the below example.

SQL Query:

SELECT * FROM EMPLOYEE
WHERE NAME='Tony'

Output:

GROUP BY

It is used to group rows in the result set according to specified columns or expressions. The GROUP BY clause is usually used along with aggregate functions such as SUM, AVG, and COUNT.
So in the below example in our SQL Queries cheat sheet, we have grouped the cities to identify the number of times each city is present in the same Employee table.

SQL Query:

SELECT CITY,COUNT(*) FROM EMPLOYEE 
GROUP BY CITY;

Output:

HAVING

It is usually used alongside the GROUP BY clause to filter groups in the result set based on specific conditions. It works in a way that is similar to the WHERE clause with the difference being that it operates on the grouped data.

So in the below example, we have grouped the results data by cities that have a count that is greater than 1.

SQL Query:

SELECT CITY,COUNT(*) as COUNT FROM EMPLOYEE 
GROUP BY CITY HAVING COUNT>1;

Output:

ORDER BY

It is easy to guess the functionality of this clause with just the name. By default, the result set is sorted in Ascending order. And this clause can be used to sort rows in the result set in ascending or descending order. It also supports multiple sorting criteria if you have more complex sorting needs.
So we’re going to change the sorting order of the cities from ascending to descending in the Employee table.

SQL Query:

SELECT CITY FROM EMPLOYEE 
ORDER BY CITY DESC

Output:

LIMIT

The last clause we are going to see in our SQL Queries cheat sheet is the LIMIT clause. You could be working with large amounts of data and may not want a very long list of data in your results. So you can limit the number of rows returned from a query by using the LIMIT clause.

So we have limited the number of rows returned from our query to 4 in the below example.

SQL Query:

SELECT * FROM EMPLOYEE 
LIMIT 4;

Output:

OPERATORS

An operator is a symbol or keyword that is used to perform an action in SQL. Operators can be used to search, combine, calculate, or compare data in a database.

The 4 different types of operators we are going to see in our SQL Queries cheat sheet are,

LIKE
ARITHMETIC
COMPARISON
LOGICAL

LIKE OPERATORS

The LIKE operator allows you to perform wildcard searches using the two major wildcards, the percentage ( % ) and underscore( _ ). The placement of the (%) symbol is particularly important when you define the pattern you want to search for.

SYNTAX:

SELECT column_name1, column_name2, …
FROM table_name
WHERE column_name LIKE pattern

SQL Query:

If you want to find all words that start with the letter ‘P’, then you’ll have to specify the % symbol after your target letter as shown below.

SELECT NAME FROM EMPLOYEE
WHERE NAME LIKE 'P%';

Output:

It has found the words starting with the letter ‘P’.

SQL Query:

Likewise, if you wish to find words that are ending with a particular letter, you’ll just have to shift the position of the % symbol to before the targetted letter as shown below.

SELECT NAME FROM PROJECT
WHERE NAME LIKE '%g';

Output:

It has found the words ending with the letter ‘g’.

SQL Query:

But what if you have to find an exact word? You can enclose the targetted word within two % symbols as shown below. We’re going to search for data that have the word ‘Testing’ in any position.

SELECT NAME FROM PROJECT
WHERE NAME LIKE '%Testing%';

Output:

Data with the word ‘Testing’ has been found.

ARITHMETIC OPERATORS

Arithmetic operators are addition( + ), subtraction ( – ), multiplication ( * ), division ( / ), and modulus( % ). They can be used to perform arithmetic calculations. We can also use these operators in WHERE clauses.

SQL Query:

SELECT 1+2 as ADDITION, 4-2 as SUBTRACTION, 4*2 as MULTIPLICATION, 
4/2 as DIVISION, 8%3 as MODULUS

AND

If you want to check two conditions are true, you can use this logical operator as shown in the example. Since there could be multiple people with the same name, we are going to find a person named ‘Robert’ with 2201 as the ID.

SQL Query:

SELECT * FROM EMPLOYEE 
WHERE NAME='Robert' AND id =2201

Output:

OR

When it comes to OR, it’ll extract data that satisfy any one of the two mentioned conditions. In our example, we will be searching for a particular person named ‘Robert’ or with 2200 as their ID. Though Robert’s ID isn’t 2200, it will string bring that data as the name is correct.

SQL Query:

SELECT * FROM EMPLOYEE 
WHERE NAME='Robert' AND id =2200

Output:

NOT

It can be used to check if the condition is not satisfying. So if you want everyone’s information other than Robert’s, you can use this operator as shown below in our SQL Queries cheat sheet.

SQL Query:

SELECT * FROM EMPLOYEE 
WHERE NOT NAME='Robert'

Output:

BETWEEN

You can even assign upper and lower limits to extract data that are in that range. So we will be getting the list of employees who have their IDs between 22202 and 2204.

SQL Query:

SELECT * FROM EMPLOYEE 
WHERE ID BETWEEN 2202 AND 2204

Output:

DISTINCT

It is used to return unique records in a result set by eliminating the duplicate values in a specific column or expression list to return only one instance of each distinct row. There are multiple cities mentioned in our Employee table and there are a few duplications as well. So we will be using this SQL query to get only the unique cities from that table.

SQL Query:

SELECT DISTINCT CITY FROM EMPLOYEE

Output:

CASE EXPRESSION

It is an expression that works the same as an If Then Else loop. It will evaluate the condition one by one to find the first out which is the first condition to be true and then returns a corresponding result.

So we will be identifying the cities of the employees and printing an appropriate statement.

SYNTAX

CASE
     WHEN condition1 THEN result1
      WHEN condition2 THEN result2
      ELSE result3
END

SQL Query:

SELECT NAME, CITY,
CASE
 WHEN CITY='Willisburg' THEN 'He belongs to Willisburg'
    WHEN CITY='Boston' THEN 'He belongs to Boston'
    ELSE 'He does not belong to Willisburg and Boston'
end as Belongs_To
FROM employee;

So using the above SQL statement, we are selecting the name, city, and case statement from the Employee table that we have. The function of the case statement is to check whether the employee belongs to Willisburg or Boston and then output a string that reflects that. If an employee doesn’t belong to any of the two cities, it will output a message saying that the person does not belong to either as shown below.

Output:

DROP

The DROP keyword is used to remove the data from a database table or object. It can also be used to delete entire tables, individual rows, or specific columns from a table. So let’s take a look at the queries you’ll need to perform these actions in our SQL Queries cheat sheet.

DROP DATABASE

It is used to delete an entire database.

SYNTAX:

DROP DATABASE database name;

DROP TABLE

It is used to delete a particular table.

SYNTAX:

DROP TABLE table name;

DROP COLUMN

It is used to delete a particular column from a particular table.

SYNTAX:

ALTER TABLE table name
DROP COLUMN column_name;

Related Blogs

The Importance of SQL for QA Engineers

The 5 Basic SQL Queries Explained with Examples

Conclusion

We hope you are now clear on what the important SQL queries are and how you can use them after reading our SQL Queries cheat sheet. Being a leading Data Analytics testing company, we are sure that these SQL Queries cover all the important ones that you will definitely need to know. We will also be publishing informative blogs such as this, so make sure you subscribe to our newsletter to never miss out on any of our updates.

The Importance of SQL for QA Engineers

by Hannah Rivera | Jan 18, 2023 | Analytics Testing, Blog | 0 comments

SQL for QA Engineers is a very important skill set to have to become a complete tester. Though you may not need to know SQL to perform front-end testing, it becomes a must-have to perform back-end testing. The reason is that you’ll have to verify the values in the database that were inserted, updated, or deleted to be sure there are no errors after processing. Being a leading Data Testing Services company, we have a team of testers who are experts in SQL. So in this blog, we will be explaining why SQL for QA Engineers is important by mentioning the different types of tests that will require SQL knowledge. We will also be using examples to help you understand the concepts with ease.

The Importance of SQL for QA Engineers

Data Validation
Data Setup and Cleaning
ETL Testing
Data Migration Testing
Data Integration Testing
Debugging
Usability Testing
Performance Testing
Security Testing

Data Validation

Data validation refers to checking the accuracy and quality of source data before using, importing, or processing the data. As it is one of the building blocks of backend testing, SQL for QA engineers plays a pivotal role. You can use SQL queries for retrieving specific data from the database and comparing it with the expected output to see if it meets the expected requirements.

Example: Let’s consider a scenario where you have to test an eCommerce application that stores customer data in the database. Though the customer data might have been collected properly, you’ll have to verify if the specific person’s data such as their name and address are accurate by retrieving them using SQL.

Data Setup and Cleaning

You might even need to set up test data in the database before beginning a test. So you can insert, edit, or delete data using SQL and clean up the test data once the tests have been completed.

Example: A medical application that stores patient data in a database will be a good example for this scenario. Even before the test begins, you can use SQL to insert test data for a patient, such as their contact information and medical background to see if everything is working as expected.

ETL Testing

ETL testing is one of the main reasons why SQL for QA engineers is so important. It validates the procedure used for extracting data from diverse sources, transforming it into the desired format, and loading it into a target database. The ETL (Extract, Transform & Load) method is used to combine data from several systems and guarantee its accuracy and coherence.

In order to verify that the data is accurate, you can use SQL to retrieve it from the source and destination databases and compare it.

Example: Let’s assume you have to perform ETL testing to see if the data from the sales database is properly modified and inserted into the reporting database. You can use SQL queries to extract data from both the sales & reporting databases and compare them to see if they are accurate.

Data Migration Testing

There will come a scenario where data from one database has to be migrated to another. This is different from ETL testing as the data will not be modified during the process; it will only be migrated.

Once the migration process has been completed, it is critical to ensure that the data is accurate and complete. So you can create SQL queries to retrieve data from both the source and destination databases and compare the results.

You can also utilize your SQL expertise as a tester to confirm the precision and comprehensiveness of the data being moved.

Example: Let’s assume that a company is moving employee data from an outdated HR Database to a new one. To check if all the employee data has been migrated properly, you can use SQL to extract the data for a specific employee from both the old and new databases and compare it.

Data Integration Testing

Whenever data from various systems or applications are merged or integrated, it has to be tested to ensure that the integrated data are accurate and consistent. SQL for QA Engineers will come in handy here as you can use SQL queries to retrieve and compare the data from several databases.

Example: Consider that client information from two databases; one for sales and one for customer support are being merged or integrated. To make sure that the client’s data is consistent across both systems, you can use SQL to retrieve the data for a single customer from both databases and compare it.

Debugging

Identifying the root cause of a bug or an issue is also a part of a good tester’s job. This is yet another reason why SQL for QA engineers is important. You’ll have to use SQL queries to identify the root cause of the issue.

Example: Let’s take the example of a banking application that is not logging all the transactions properly. You can use SQL to query the database to check for faults or data inconsistencies.

Usability Testing

Testing a software application’s usability involves determining how simple it is for users to complete tasks and reach their objectives. Likewise, you can test a database’s usability by running a variety of queries and making sure the desired results are provided.

Example: Let’s imagine that you are testing a database application that enables users to perform keyword searches on data. In such a scenario, SQL can be used to run a search query and make sure the expected results are returned. This might assist you in determining whether the database is simple enough for users to use and navigate.

Performance Testing

SQL for QA engineers has more use cases than just functional testing. You can use SQL to even test the performance of the database. Few databases might hold enormous volumes of data and face performance problems or bottlenecks.

Example: For this scenario, let’s take a social media platform as it will have a large user base using the platform from different parts of the world. You can use SQL to assess how well the database handles the load with different traffic conditions.

Security Testing

Whenever unvetted data gets used in SQL statements, there is a chance for a security vulnerability known as a SQL injection attack to happen. This vulnerability enables an attacker to run any SQL command they choose and obtain unauthorized access to a database.

As a tester, you can test a database’s security by attempting SQL injection attacks. SQL queries with suspicious input can be written in an attempt to exploit the vulnerability and obtain unauthorized access to the database.

Example: Let’s take a secure login that requires a username and a password. You can try to launch a SQL Injection attack to bypass the login form and access the database without authorization by submitting a username and password that contain malicious SQL commands.

Related Blogs

Importance of ETL Testing

Data Quality Checks for Data Warehouse/ETL

An End-to-End DBT Tutorial for Testing

Conclusion:

With the help of SQL, you can significantly improve your capacity to test and debug database-dependent programs. Being an experienced data analytics testing company, our testers are experts in using SQL in their tests. So we hope our examples and the blog on the whole have helped you understand the importance of SQL for QA Engineers. We will be publishing informative blogs on a weekly basis and recommend you subscribe to our Newsletter to never miss out on any of our content.

The 5 Basic SQL Queries Explained with Examples

by Hannah Rivera | Dec 5, 2022 | Analytics Testing, Blog | 1 comment

SQL (Structured Query language) can be used to interact with relational database management systems. It is designed for the retrieval, creation, manipulation, updation, storage, and management of data in a relational database. For a better understanding, we can compare SQL to a car key. We just use the key or push the button to start the car without worrying about the background process that happens automatically. We just have to throw simple SQL queries & statements to manage and retrieve the data from the database without having to understand or work with it under the counter of the internal database operations. In addition to that, both proprietary and open-source relational database management systems such as Microsoft SQL Server, Oracle Database, IBM DB2, and MySQL have been built around SQL. That is why it is important to know the 5 basic SQL queries we’ll be exploring in our blog.

SQL syntax is fairly simple and consists of commands as written statements like CREATE, TABLE, and ALTER COLUMN. These commands can be used to modify database tables & index structures, add, update, & delete rows of data, and retrieve subsets of information from the database for transactional processing and analytical operations. Being a leading data analytics testing company, we have vast experience in performing such actions in our testing process. But the basic SQL queries we will be discussing in this blog will be beneficial to everyone in general and not just for testers. But before that, let’s find out what is an SQL query and why you will need it.

What is an SQL query and Why do we need it?

While working with SQL language, you will use numerous keywords and SQL commands to compose statements and queries. Queries are basically requests we can use to communicate our requirements to the database and get the appropriate outputs. You can use SQL queries to pre-aggregate, filter, and select only the data necessary for the project you are working on. The selected data are generally called views. So if you have a table that contains millions of rows of data, creating a view of them will enable you to perform analysis and arrive at results much faster. Let’s now take a look at the 5 basic SQL queries that will help you create a view.

Getting Data from Columns using SELECT & FROM
To fetch non-repeated records by using DISTINCT
Filtering Data based on requirements by using WHERE Clause
Query using Not in operator
Sort the Result set by using the ORDER BY Keyword

Getting Data from Columns using SELECT & FROM

SELECT * FROM WORKER_TABLE;

Explanation:

SELECT – It is used to specify the column of the table from which the data has to be fetched & displayed. In the example, we have used ‘*’ instead of a specific column name as ‘*’ is used to denote all the columns.

FROM – It is used to specify the database from which the data has to be fetched from. In our query, we have specified ‘worker_table 1’ to be the database we are gathering data from.

Output:

To fetch non-repeated records by using DISTINCT

SELECT DISTINCT DEPT AS DEPARTMENT FROM WORKER_TABLE

Explanation:

There is a high chance that the database we selected has a lot of duplicate values. So DISTINCT is the next query we’re going to see from our list of basic SQL queries as it can be used to retrieve only the unique data and avoid duplication.

By using DISTINCT in the above example, we will be collecting unique data from the dept column shown in the worker_table 1 and show it under the alias name ‘Department’. An Alias is generally created to give a temporary name to the unique data by using AS as the keyword.

Output:

Filtering Data based on requirements by using WHERE Clause

WHERE EMPLOYEE_ID >105

Explanation:

We have now used 2 of our basic SQL queries to extract data that has no duplicates. But in most scenarios, you’ll need more refinement than that. So you can use the WHERE clause to filter the data based on your requirements.

Let’s say you want to get the data of the employees who have joined the company after a specific date. So you can set a requirement that the employee_id should be greater than 105.

Output:

Query using Not in operator

SELECT * FROM WORKER_TABLE 
WHERE FIRST_NAME NOT IN ('Zoe', 'Sadie')

Explanation:

You can further drill down your results by adding exceptions using the NOT IN command. NOT IN is an important command that can be used along with the WHERE clause. It can be used to retrieve data from rows that don’t satisfy the specified condition.

In this case, we are retrieving a list of the employees whose first names are not Amitabh and Aruna. Here, Amitabh and Aruna are the exceptions that we wanted to avoid.

Output:

Sort the Result set by using the ORDER BY Keyword

SELECT * FROM WORKER_TABLE
ORDER BY DEPT DESC;

Explanation:

By using all the basic SQL queries, you would now be able to have a refined set of data as per your need. The last query we are going to discuss, ORDER BY can be used to sort the result we have according to the specified column.

You have to specify the column by which the data has to be sorted by placing it after the ORDER BY keyword. By default, the column will be alphabetically sorted (i.e) asc in ascending order. But if you wish to fetch the rows in descending order, you can do so by adding ‘desc’ after the column name as mentioned above.

Output:

Conclusion

Hopefully, you now have a clear understanding of what makes these 5 basic SQL queries so important. They lay the foundation that you can build upon to grow your expertise in SQL. We have been providing data testing services for many years and our testers have found SQL to be very important. We will be publishing more informative content, so make sure you subscribe to our newsletter to never miss out on any of our latest content.

What Makes Data Analytics in Manufacturing Beneficial

by Charlotte Johnson | Mar 28, 2022 | Analytics Testing, Blog | 0 comments

Manufacturing is, for the most part, the world economy’s engine. No matter what economy is in question, it’s one of the most valuable industries with many benefits. Two of the largest manufacturing hubs in the world, the USA and China, have a GDP whose contributions were at $2.3 trillion and $4 trillion respectively.

Manufacturing Is Key

Accessing today’s beloved services and products would be impossible without manufacturing. Phones we use for communication, cars and other vehicles that are driven…there’s so much going on in the world that the manufacturing industry delivers on.

The approach people have taken to manufacturing over the years has shifted since the industrial revolution. Streamlining the process is something people are consistently trying to find new ways to achieve. Manufacturing processes of big companies are outsourced to offshore plants that are more efficient. Production processes are consistently being automated so that quality improves and the chances of human errors lower considerably.

There are many companies that invest in the latest developments in manufacturing technology in order to increase their productivity. As a result, modern manufacturers can now produce thousands of products with minimal human involvement.

Data Analytics In Manufacturing: The Advantages

All told, manufacturing takes a huge chunk of means to improve through data analytics. There are so many benefits that go a long way over time. Some of the most important ones include, but are not limited to:

Automated Manufacturing Gets Empowered By Analytics Algorithms

Manufacturers can cut down on errors and miscommunications with data analytics that are integrated throughout their business. This, in turn, leads human interaction to be minimized in terms of quality control of finished products and raw material delivery.

Data Analytics Helps Product Design Improve

Product design to the market used to involve plenty of trial and error. If anything, the initial iteration of a product design did not really get a warm welcome from customers. This is usually from the design and ergonomics being far from ideal.

Thanks to the advancement of technology in today’s modern times, data analytics allow manufacturers to test designs for ergonomics and efficiency without a physical prototype being made.

Traditionally, automotive manufacturers have used hardware-intensive methods to create car design mock-ups in order to test them in a wind tunnel. However, the advent of data analysis and artificial intelligence has allowed manufacturers to use software to test a design before making physical copies of it.

Product Management Gets More Efficient

Being able to project how many products to manufacture is critical for any manufacturer. Overproducing may cost millions of dollars, but the alternative—underestimating market demand—could tarnish a company’s reputation if it cannot meet customer demands in a timely fashion.

Making the most of profit is best done by manufacturers through accurate demand estimation.

Conclusion

Since the industrial revolution, people’s views on manufacturing as a whole have taken a turn. Today’s data analytics plays a major role in the manufacturing industry. Benefits include more efficiency in product management and improvements to product design without unnecessary wasting of any resources.

Trying to find manual testing companies to work with on data analytics? Contact Codoid today! We’re an industry leader in Quality Assurance that’s passionate about helping to lead and guide the whole QA community.

An End-to-End DBT Tutorial for Testing

by admin | Aug 2, 2021 | Analytics Testing, Blog, Latest Post | 1 comment

Currently, we are extracting the data from different source systems, transforming it, and then finally loading it into the data warehouses. But when it comes to DBT, the raw source data from different source systems are extracted and directly loaded into the warehouse. After which, DBT allows us to create new materialized tables into which the transformed data is fed as it helps showcase the meaningful data for business analysis purposes by using simple select statements. But what makes DBT stand tall is it allows both the development and production environment teams to work in the same place by creating models & testing the scripts for transformation purposes. Being one of the leading QA companies, we will be mainly focusing on the testing part of the above process in this DBT Tutorial by exploring how we test our models.

What is DBT?

If you’re new to the DBT tool, worry not, we have got some basics in this DBT Tutorial for you as well. The Data Build Tool (DBT) is an open-source test automation tool and a command-line tool. It mainly focuses on the transformation part in the “Extract load and transform” pipeline. DBT allows both data analysts and data engineers to build the models and transform the data in their warehouses to make it meaningful.

New to DBT tool?

Let’s walk you through some of the basics to help you get a quick overview of DBT and the necessary steps that have to be done to create an account. Basically, there are two ways for DBT to be used:

1. DBT CLI(Command Line Interface):

The DBT CLI variant allows users to write their scripts and test them locally by using the DBT command-line tool.

2. DBT Cloud:

The DBT Cloud is a facilitated variant that smooths out an advancement with an online Integrated Development Environment, an interface to write our DBT test scripts and run them on a schedule.

In this DBT Tutorial, we will be covering the DBT cloud variant. It is worth noting that the ideas and practices can be stretched out to the CLI variant as well. So let’s get started.

Data warehouse:

As stated earlier, the DBT is used to handle the transformation part of the ELT for data warehousing. “But how does it work?” you might ask. Well, DBT helps us create a connection with the warehouse and then lets us write the SQL simple select statements against the warehouse to transform the data.

Supported Warehouses:

We have listed the names of the supported warehouses in DBT, and they are

• Postgres

• Redshift

• BigQuery

• Snowflake

• Apache Spark

• Databricks and

• Presto (Partially Supported).

We would be using the Snowflake warehouse in this blog for demonstration purposes.

DBT Tutorial for the Setup:

Now that we have seen an overview of DBT, let’s find out how to set up the DBT cloud and use its interface. If you’ve not yet created an account in DBT, you can sign up for DBT cloud by visiting their Sign up page.

Once you are done with your DBT cloud sign-up process, let’s find out how to set up your very first DBT project.

DBT recommends using GitHub to store our deployments. So the first step would be to create an empty repository in GitHub as shown below.

Enter your repository name and click on ‘Create repository’.

Once you have signed up with your DBT cloud, just select the snowflake warehouse and start configuring it using your snowflake credentials. We have shown a sample connection in the below images for your reference.

After entering your warehouse and database details, make sure you have entered the snowflake user name and password in development credentials. Next, click on ‘Continue’ to get this confirmation message “Connection Test Success”. Now, you have to click on the ‘Continue’ button for configuring the GitHub. Since you’re done with GitHub configuration, we’re good to start with DBT.

DBT Tutorial for its Interface:

DBT has a lot of inbuilt options that help access and understand it easily. After setting up with DBT, the interface of DBT that we get to see initially will look like how it is shown in the image below.

As you can see, DBT might initially look a bit empty and even show that the compilation error at the bottom right. So we would have to click on ‘Initialize your Project’ to start our first project. Doing this will make the DBT will provide us with a skeleton as shown in the below image for our project automatically.

We can clearly see that the compilation message that was displayed at the right bottom has now changed to become ‘ready’, implying that it is time to start our project. So let’s get started.

First off, let’s find out the scope of some of the functions,

1. The commit button will handle all our git configurations like creating a new branch, pull request, commit, merge, etc.

2. After writing any SQL statements we can check the query then and there by using the dbt “preview data” and “compile SQL” buttons.

3. We can run and test our models by writing simple DBT commands (e.g. dbt debug, dbt run, dbt test).

4. DBT also generates documentation that will come in very handy for analytical purposes as it will contain all the details about the project. You can access it by clicking on the “view docs” option as shown in the above pic.

Note: Before running any model or a test, make sure that you have saved it.

Loading training data into the Warehouse:

So as stated earlier, DBT cloud currently supports only BigQuery, Postgres, Redshift & Snowflake warehouses, and we have decided to use Snowflake for the demo. So to illustrate how to import or load your training data into the warehouse, we will be loading the two tables that we have created.

Snowflake:

It is important to note that we have written the below instructions assuming that you have the database named compute_wh in your Snowflake account. So make sure that you have the required privileges to create the objects in this database.

Run the following commands in your Snowflake Warehouse SQL Runner:

To create the Database:

Create database STUDENT;
Create Schema STUDENT_INFO:
Use schema STUDENT_INFO;

To create the Tables:

Table 1: Student_Profile

Table 1 will contain the personal information of the students.

Create table student_profile (
  S_no int not null,
  Std_id int not null,
  Std_name varchar(45) null,
  Std_dep varchar(45) null,
  DOB datetime null,
  Primary key (Std_id));
  insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('1', '294', 'Afshad', 'CSE', '03/10/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('2', '232', 'Sreekanth', 'Mech', '02/09/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('3', '276', 'John', 'EEE', '10/06/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('4', '303', 'Rahul', 'ECE', '12/05/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('5', '309', 'Sam', 'Civil', '09/04/1999');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('6', '345', 'Ram', 'CA', '03/11/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('7', '625', 'Priya', 'CSE', '03/12/1996');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('8', '739', 'Bob', 'MEC', '06/07/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('9', '344', 'Ganesh', 'Mech', '07/09/2024');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('10', '123', 'Neha', 'ECE', '12/09/1998');

Table 2: STUDENT_RESULTS

Table 2 will contain the results of those students.

Create table student_results (
  S_no int not null,
  Std_id int not null,
  Marks int null,
  Result varchar(45) null,
  Primary key (Std_id));
insert into student_results (S_no, Std_id, Marks, Result) values ('1', '294', '78', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('2', '232', '56', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('3', '276', '88', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('4', '303', '67', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('5', '309', '38', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('6', '345', '90', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('7', '625', '87', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('8', '739', '45', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('9', '344', '97', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('10', '123', '49', 'Fail');

Now that you have the training data in your warehouse, you are one step away from starting your DBT testing.

DBT Tutorial for the available Models:

In analytics, the process of modeling is changing the data from being the raw data to the final transformed data. Typically the data engineers are responsible for building tables that represents your source data, and on top of that, they also build the tables/views that transform the data step by step.

The models are just SQL select statements in your dbt project. These models are created inside the Models directory with the .sql extension.

What’s great about dbt is you don’t need to know DDL/DML to build the tables or views.

Note: The name of the file is used as the model or table name in your warehouse.

Creating Models:

Inside the model’s directory, we have created two simple models for better understanding.

Model 1:Student_profile.sql

{{ config(materialized='table') }}

with source_data as (
select 
    S_No,
    Std_Id,
    Std_Name,
    Std_Dep,
    DOB
 from 
STUDENT.STUDENT_INFO.STUDENT_PROFILE

)

select *
from source_data

Model 2: Student_Results.sql

{{ config(materialized='table') }}

with source_data as (
select 
    S_No,
    Std_Id,
    Marks,
    Result 
 from 
STUDENT.STUDENT_INFO.STUDENT_RESULTS

)

select *
from source_data

Once you’ve built your models, make sure to save and run the models to load them into the warehouse. You can use the ‘dbt run’ command to run your created models, or if you want to run any particular model, you can use the ‘dbt run –Student_Profile’ command.

DBT will show detailed information about the models when you run your models as shown in the above image. So we have successfully completed all the required steps and can move forward to find out how to perform testing in DBT.

DBT Tutorial to perform Testing:

We all know that testing plays a major role in deployment, so tests in analytics are just assertions that you have about your data. But what makes these assertions important is that if these assertions are met, it’ll be instrumental in helping others trust you by looking at the data.

The great part of DBT testing is that by default it provides four data quality checks. They are Unique, Not Null, Relationships (Referential Integrity), and Accepted Value checks.

Basically, DBT provides us with two types of testing. So let’s take a look at the steps to be done to perform the first type, the Schema test.

1. Schema Test:

Create a YAML file in the same model’s directory with a .yml Extension (Ex. Schema.yml).

Configure your tests (Unique, Not Null, Relationships, and Accepted Values) in the YAML file based on your table data and run it by using the command “dbt test”.

Note: Yaml is a data serialization and human-readable language that can be easily understood even by non-technical people. It is mainly used for configuration purposes as it supports all programming languages.

We have attached one YAML file for your reference below.

Schema.yml:

version: 2

models:
    - name: student_profile
      description: "Table contains all the information about the students"
      columns:
          - name: std_id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null

    - name: student_results
      description: "Table contains the student results info"
      columns:
          - name: std_id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null
                - relationships:
                    to: ref('student_profile')
                    field: std_id
          - name: result
            description: "Result of a student"
            tests:      
                - accepted_values:
                    values:
                      - pass
                      - fail

Once you run this test, you will get detailed information on all your tests. If you need to run only one particular schema test, you can use this command “dbt test –m Student_results”.

2. Data Test:

But if you are looking to write custom test scripts based on the requirements against the data, that also can be done by using tests of this type.

Start by creating the SQL files with your test name inside the test directory of the DBT project as shown below.

Now, add your test script to your test file.

Duplicate.sql:

select std_id, count(std_id) 
  from {{ref('student_profile')}} 
  group by std_id 
  having count(std_id)>2

referential_integrity.sql:

select std_id 
from {{ref('student_profile')}} 
where std_id not in (
    select std_id 
    from {{ref('student_results')}} )

Note:

Here ref function is used to refer one or more specific files into the script.

Finally, you can run the test by using the “dbt test –data” command.

DBT Tutorial for Documentation:

As discussed earlier, documentation is one of the greatest features of DBT. So once your project is completed, all you have to do is use the “dbt docs –generate” command to generate the document.

After running the command, just click on “view docs” at the top left of the DBT interface to view the complete information about your project in the form of a document. It even generates a lineage graph like the one shown below for a crystal clear understanding.

DBT also supports filters in the lineage graph, which is a great advantage when it comes to analysis.

Conclusion:

We hope that you have enjoyed reading this DBT Tutorial blog and also hope that DBT would be a great addition to your toolkit. To sum things up, DBT helps carry out the most complex and heavy transformations in a simple and easy manner by using simple select statements. It enables both data engineers and data analysts to work in the same environment and provides a unique experience for them to transform the data. The models in DBT enable faster execution and also make it so easy to test and modify. Finally, it even shows detailed information about the project by generating the document along with a lineage graph for clear analysis purposes. All thanks to such awesome features, DBT has been a resourceful tool that has helped us deliver the best software testing services to our clients.

Frequently Asked Questions

What is dbt tool used for?

DBT (Data Build Tool) is used to perform the Transform part of the ETL (Extract Transform Load) process in a simplified manner by writing transformations as queries and orchestrating them effectively.
What is DBT testing?

DBT testing is the process of ensuring the data you have is reliable by using assertions you have about the data as tests. There are 2 types of dbt testing; namely Schema Test and Data Test.
What are the four generic tests that dbt ships with?

The 4 generic tests that dbt ships with are Unique, Not Null, Relationships (Referential Integrity), and Accepted Value checks.
What does dbt stand for data?

DBT stands for Data Build Tool and as the name suggests, dbt can be used to transform data in warehouses in an effective manner.
Can I use dbt for free?

Yes, you can use dbt Core for free as it is an open-source tool released under an Apache 2.0 License. But dbt Cloud is not a completely free tool as few features are restricted in the free version.

Importance of ETL Testing

by admin | Jun 8, 2019 | Analytics Testing, Blog | 0 comments

The ETL Testing method can eliminate possible human or system errors during the process of transfer of data from the extraction stage to the loading stage. Any human/system errors in the process can result in the transmission of inaccurate information across the enterprise. ETL Testing can eliminate the errors since it covers diverse strategies such as Data Completeness, Data Correctness, Data Integrity, Data Reconciliation, Data Transformation, Data Quality, and System Performance and Scalability.

ETL Testing is a critical testing procedure for Data Warehousing projects since it helps to build, control, and protect integrated or migrated data. The ETL Testing process validates, verifies, and qualifies data in order to avert data loss and duplicated records. ETL Testing is gaining importance in the present times since Agile, DevOps, Artificial Intelligence, and Cloud Technologies are fast emerging as the top trends in software development industry. Work with the best – connect with us.

« Older Entries

Category Selected: Analytics Testing

People also read

Categories

Subscribe

Talk to our Experts

Amazing clients who trust us

SQL Queries Cheat Sheet

Database

CREATE Database

USE DATABASE

CREATE A TABLE

CONSTRAINTS

DESC

INSERT VALUES INTO THE TABLE

INSERT SINGLE VALUE

SELECT

GET DATA FROM A SINGLE TABLE

All Columns

Specific Column Values

GET DATA FROM MULTIPLE TABLES

JOINS

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL JOIN

CROSS JOIN

CROSS JOIN Employee

UNION

UNION ALL

Related Blogs

FILTERING DATA FROM THE TABLE

CLAUSES

WHERE

SQL Query:

GROUP BY

HAVING

ORDER BY

LIMIT

OPERATORS

LIKE OPERATORS

ARITHMETIC OPERATORS

COMPARISON OPERATOR

EQUAL TO ( = )

NOT EQUAL TO ( != )

GREATER THAN ( > )

LESSER THAN ( < )

IN OPERATOR

LOGICAL OPERATORS

AND

OR

NOT

BETWEEN

DISTINCT

CASE EXPRESSION

DROP

DROP DATABASE

DROP TABLE

DROP COLUMN

Related Blogs

Conclusion

The Importance of SQL for QA Engineers

Data Validation

Data Setup and Cleaning

ETL Testing

Data Migration Testing

Data Integration Testing

Debugging

Usability Testing

Performance Testing

Security Testing

Related Blogs

Conclusion:

What is an SQL query and Why do we need it?

Getting Data from Columns using SELECT & FROM

Explanation:

Output:

To fetch non-repeated records by using DISTINCT

Explanation:

Output:

Filtering Data based on requirements by using WHERE Clause

Amazing clients who
trust us