by Arthur Williams | Feb 16, 2023 | Analytics Testing, Blog |
We all know that searching for a needle in a haystack is a difficult, almost impossible task. Searching for data in a large database or data warehouse can feel much the same. But with the help of SQL (Structured Query Language), you can do it with ease. Though having a strong grasp of all the SQL queries is important, a practical SQL Queries Cheat Sheet can be beneficial, especially if you’re a beginner.
We have split our SQL Queries Cheatsheet into different categories to help you easily locate the command you are looking for. We have categorized them based on the most common use cases such as creating a table, inserting a value into the table, and so on as mentioned below.
SQL Queries Cheat Sheet
Database
CREATE Database
Creating a new database is one of the most basic functions one must know, and you can do so by using the CREATE DATABASE keyword. We have specified the syntax you’ll need and given an SQL command as well. We have followed a similar procedure throughout our SQL Queries Cheat Sheet to make it easier for you.
SYNTAX:
CREATE DATABASE database_name
SQL Query:
Here ‘Codoid’ is the database name
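The query itself is not shown above; assuming the ‘Codoid’ database name mentioned in the text, it would be:

```sql
-- Create a new database named Codoid
CREATE DATABASE Codoid;
```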
USE DATABASE
The USE keyword is used when there are multiple databases present and you want to use a specific database.
SYNTAX:
SQL Query:
Once again, ‘Codoid’ is the database name here.
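Assuming the same ‘Codoid’ database, the syntax and query would be:

```sql
-- Syntax: USE database_name;
-- Switch the session to the Codoid database
USE Codoid;
```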
Next up in our SQL Queries Cheat Sheet, we will be seeing how to create a table.
CREATE A TABLE
You can use the CREATE TABLE keyword to create a new table as shown in the below syntax. You’ll have to specify the datatype for the columns based on your needs. Once we have created a table, we’ll see how we can display it as well.
SYNTAX:
CREATE TABLE table_name
(
column1 datatype,
column2 datatype,
column3 datatype,
....
);
CONSTRAINTS
Using CONSTRAINTS, you will be able to define the rules for the columns while creating or updating the table. Though there are different types of CONSTRAINTS, we will be seeing only the most used ones in our SQL Queries Cheat Sheet. We have specified the CONSTRAINTS and briefly explained their purpose as well.
- NOT NULL – It doesn’t accept NULL values in the column
- UNIQUE – It accepts NULL values but not duplicate values
- PRIMARY KEY – It accepts neither NULL nor duplicate values
- FOREIGN KEY – It creates a link between two tables. The FOREIGN KEY column usually references the primary key of another table.
- DEFAULT – It sets a default value for a column if we don’t provide a particular value.
Now let’s create two new tables (Project and Employee) using CREATE TABLE and the constraints listed above.
SQL Query:
CREATE TABLE Project
(
ID int PRIMARY KEY,
Name varchar(100)
);
So in the above example, we are creating a table by the name ‘Project’ in the Codoid database. It has two columns (ID and Name). We have defined the ID column to have an integer data type and it is also set as the primary key. So the ID column must be unique and shouldn’t be null.
And the name column has a varchar data type with a maximum length of 100 characters.
SQL Query:
CREATE TABLE Employee (
Name Varchar(100) NOT NULL,
Contact bigint UNIQUE,
City varchar(100) DEFAULT 'Boston',
Id int,
ProjectId int,
PRIMARY KEY (Id),
FOREIGN KEY (ProjectId) REFERENCES Project(ID)
);
Same as in the prior example, we are creating another table in the Codoid database and assigning constraints. So the Employee table has 5 columns: Name, Contact, City, Id, and ProjectId.
- Name – Varchar type field with a max length of 100 characters and shouldn’t be left blank.
- Contact – bigint type where each value must be unique.
- City – Varchar type that can store 100 characters. If it is left blank, it will automatically assign “Boston” as the default value.
- Id – It is int type and is also the primary key for the table.
- ProjectId – int type and is a foreign key that references the ID column from the Project table we created earlier.
DESC
Now that we have created the table, you can display the table structure by using the DESC keyword.
SYNTAX:
SQL Query:
Here, ‘Employee’ is the name of the table.
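The syntax and query are not shown above; based on the surrounding text, they would be:

```sql
-- Syntax: DESC table_name;
-- Display the structure (columns, types, keys) of the Employee table
DESC Employee;
```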
Output:
INSERT VALUES INTO THE TABLE
Next up in our SQL Queries Cheat Sheet, we’ll be focusing on how to insert values in the table we have created. You can insert a single value or multiple values into your table based on your requirements.
INSERT SINGLE VALUE
We have given an example where we insert a single row (001, ‘ETL Testing’) into the Project table we had created earlier.
SYNTAX
INSERT INTO table_name VALUES (value1, value2, …);
SQL Query:
INSERT INTO Project VALUES (001, 'ETL Testing');
INSERT MULTIPLE VALUES
If you wish to insert multiple values into the table, you can do so by specifying the list of columns you will be inserting the data into and then mentioning the multiple values as shown below.
SYNTAX:
INSERT INTO table_name (column_list) VALUES (value_list_1), (value_list_2), …;
SQL Query:
INSERT INTO Employee (Name, Contact, City, Id, ProjectId)
VALUES
('Robert', '8594230688','Willisburg',2201,001 ),
('Peter', '6155805664','Smithville',2202,001 );
SELECT
We have seen how to insert values into the table; now let’s see how to display the values we inserted by using the SELECT statement.
SYNTAX:
To print all the columns from the table.
SQL Query:
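The syntax and query are not shown above; to print all the columns from the Employee table, they would be:

```sql
-- Syntax: SELECT * FROM table_name;
-- Fetch every column and row from the Employee table
SELECT * FROM Employee;
```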
SYNTAX:
SELECT column_name1, column_name2,... FROM table_name
To print selective columns from the table
SQL Query:
SELECT id,contact,city FROM Employee
GET DATA FROM A SINGLE TABLE
There will definitely be scenarios where you’ll have to get data from a table. And so we will now be covering the different ways you can get data from a single table. There are ways to get data from multiple tables as well and we will be covering that next in our SQL Queries Cheat Sheet.
When it comes to getting data from a single table, you can either display all column values from the table or choose which column values have to be displayed.
All Columns
So previously in our SQL Queries Cheat Sheet, we had created a table by the name ‘Employee’. It had details such as Name, ID, Contact, City, & ProjectID. So let’s see how to get all this data now.
SYNTAX:
SELECT * FROM table_name;
SQL Query:
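The query is not shown above; applying the syntax to the Employee table gives:

```sql
-- Fetch every column and row from the Employee table
SELECT * FROM Employee;
```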
Output:
Specific Column Values
If you need only specific column values, you can use the below syntax to get them. In our example, we will be getting only the names of the employees from the table.
SYNTAX:
SELECT column_name FROM table_name
SQL Query:
SELECT Name FROM Employee;
This query will display only Name column values.
GET DATA FROM MULTIPLE TABLES
Based on your needs, you might encounter scenarios where you’ll have to get data from multiple tables. We have already created two tables by the names ‘Project’ and ‘Employee’ in our SQL Queries cheat sheet.
So let’s use these tables in our examples to illustrate the different SQL Queries listed below.
- Joins
- Inner Join
- Left Join
- Right Join
- Full Join
- Cross Join
- UNION
- UNION ALL
JOINS
As the name suggests, these SQL queries can be used to get data from multiple tables. As mentioned above, there are multiple types of Joins. Let’s take a look at them one by one in our SQL Queries Cheat Sheet.
INNER JOIN
Just JOIN or INNER JOIN can be used to get or display records that are present in both tables. In our example, we will be getting the names from one table and the project IDs from another. We have also shown the output along with the Syntax and SQL queries to make it easier for you to understand.
SYNTAX:
SELECT column_name1, column_name2 FROM table1
INNER JOIN table2 ON table1.column_name=table2.column_name
SQL Queries:
SELECT Project.Name, Employee.Name FROM Employee
INNER JOIN Project ON Project.Id=Employee.ProjectId
Output
LEFT JOIN
LEFT JOIN can be used to display all the rows from the LEFT table with corresponding rows from the RIGHT table. If there’s no matching row, NULLs will be returned as values from the RIGHT table.
SQL Query:
SELECT Project.Name, Employee.Name FROM Employee
LEFT JOIN Project ON Project.Id=Employee.ProjectId
Output:
RIGHT JOIN
Likewise, RIGHT JOIN can be used to display all rows from the RIGHT table with corresponding rows from the LEFT table. As before, NULL values will be returned if there are no matching rows in the LEFT table.
SQL Query:
SELECT Project.Name, Employee.Name FROM Employee
RIGHT JOIN Project ON Project.Id=Employee.ProjectId
Output:
FULL JOIN
If you wish to get all the data from both the left & right tables, you can use the FULL JOIN SQL query. If there are no matching rows on either side, it will return NULL values.
SQL Query:
SELECT Project.Name, Employee.Name FROM Employee
FULL JOIN Project ON Project.Id=Employee.ProjectId
Output:
Note: FULL JOIN is not supported in MySQL
CROSS JOIN
This join is different from the pattern of join queries we have seen earlier in our SQL Queries cheat sheet. It generates the Cartesian product: every row of the first table paired with every row of the second table. Because it pairs all rows unconditionally, no ON condition is used.
SQL Query:
SELECT Project.Name AS Project_name, Employee.Name AS Employee_name FROM Project
CROSS JOIN Employee
Output:
UNION
Union query can be used to combine the result set of two or more SELECT statements without any data duplications in the result set. In our example, we are selecting the names from both the Project & Employee tables.
SQL Query:
SELECT NAME FROM PROJECT
UNION
SELECT NAME FROM EMPLOYEE;
Output:
We have gotten an output without any duplicates.
UNION ALL
If there is a scenario where you don’t want the duplicates to be removed when using the UNION query, you can make use of this query. So we are selecting the same data from the same tables as before.
SQL Query:
SELECT NAME FROM PROJECT
UNION ALL
SELECT NAME FROM EMPLOYEE;
Output:
But we can see that the ‘ETL Testing’ duplicate hasn’t been removed.
FILTERING DATA FROM THE TABLE
We can filter the data from a SQL query by adding clauses, operators, and DISTINCT to the statement. There are different types of filters that you can use to get customized results based on your needs. That is what we’re going to look at now in our SQL Queries cheat sheet.
CLAUSES
SQL clauses are part of SQL statements that contain an expression, predicate, or query. We will be seeing an example for each of the 5 clauses we have listed below as they have their own use case.
- WHERE
- GROUP BY
- HAVING
- ORDER BY
- LIMIT
WHERE
It is used to filter the rows in the result set based on a specific condition. The specific condition is usually an expression that involves one or more columns from the table being queried.
For example, we are going to filter data using the name in the Employee table. We have taken the name ‘Tony’ in the below example.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE NAME='Tony'
Output:
GROUP BY
It is used to group rows in the result set according to specified columns or expressions. The GROUP BY clause is usually used along with aggregate functions such as SUM, AVG, and COUNT.
So in the below example in our SQL Queries cheat sheet, we have grouped the cities to identify the number of times each city is present in the same Employee table.
SQL Query:
SELECT CITY,COUNT(*) FROM EMPLOYEE
GROUP BY CITY;
Output:
HAVING
It is usually used alongside the GROUP BY clause to filter groups in the result set based on specific conditions. It works in a way that is similar to the WHERE clause with the difference being that it operates on the grouped data.
So in the below example, we have grouped the data by city and kept only the cities whose count is greater than 1.
SQL Query:
SELECT CITY,COUNT(*) as COUNT FROM EMPLOYEE
GROUP BY CITY HAVING COUNT>1;
Output:
ORDER BY
It is easy to guess the functionality of this clause with just the name. By default, the result set is sorted in Ascending order. And this clause can be used to sort rows in the result set in ascending or descending order. It also supports multiple sorting criteria if you have more complex sorting needs.
So we’re going to change the sorting order of the cities from ascending to descending in the Employee table.
SQL Query:
SELECT CITY FROM EMPLOYEE
ORDER BY CITY DESC
Output:
LIMIT
The last clause we are going to see in our SQL Queries cheat sheet is the LIMIT clause. You could be working with large amounts of data and may not want a very long list of data in your results. So you can limit the number of rows returned from a query by using the LIMIT clause.
So we have limited the number of rows returned from our query to 4 in the below example.
SQL Query:
SELECT * FROM EMPLOYEE
LIMIT 4;
Output:
OPERATORS
An operator is a symbol or keyword that is used to perform an action in SQL. Operators can be used to search, combine, calculate, or compare data in a database.
The 4 different types of operators we are going to see in our SQL Queries cheat sheet are,
- LIKE
- ARITHMETIC
- COMPARISON
- LOGICAL
LIKE OPERATORS
The LIKE operator allows you to perform wildcard searches using the two major wildcards, the percentage ( % ) and underscore( _ ). The placement of the (%) symbol is particularly important when you define the pattern you want to search for.
SYNTAX:
SELECT column_name1, column_name2, …
FROM table_name
WHERE column_name LIKE pattern
SQL Query:
If you want to find all words that start with the letter ‘P’, then you’ll have to specify the % symbol after your target letter as shown below.
SELECT NAME FROM EMPLOYEE
WHERE NAME LIKE 'P%';
Output:
It has found the words starting with the letter ‘P’.
SQL Query:
Likewise, if you wish to find words ending with a particular letter, you’ll just have to shift the position of the % symbol to before the targeted letter as shown below.
SELECT NAME FROM PROJECT
WHERE NAME LIKE '%g';
Output:
It has found the words ending with the letter ‘g’.
SQL Query:
But what if you have to find a word appearing at any position within a value? You can enclose the targeted word within two % symbols as shown below. We’re going to search for data that have the word ‘Testing’ in any position.
SELECT NAME FROM PROJECT
WHERE NAME LIKE '%Testing%';
Output:
Data with the word ‘Testing’ has been found.
ARITHMETIC OPERATORS
Arithmetic operators are addition( + ), subtraction ( – ), multiplication ( * ), division ( / ), and modulus( % ). They can be used to perform arithmetic calculations. We can also use these operators in WHERE clauses.
SQL Query:
SELECT 1+2 as ADDITION, 4-2 as SUBTRACTION, 4*2 as MULTIPLICATION,
4/2 as DIVISION, 8%3 as MODULUS
Output:
COMPARISON OPERATOR
It is used to compare two different values or expressions. These operators include =, !=, <, >, <=, >=, and IN. Now let’s see a sample for each in our SQL Queries cheat sheet.
EQUAL TO ( = )
In the below example, we are going to take a list of employees whose salary is 1000.
SQL Query:
SELECT NAME, SALARY
FROM EMPLOYEE WHERE SALARY=1000
Output:
NOT EQUAL TO ( != )
Similarly, we can also get the list of employees whose salaries are not 1000 by using this operator.
SQL Query:
SELECT NAME, SALARY
FROM EMPLOYEE WHERE SALARY!=1000
Output:
GREATER THAN ( > )
You can use this comparison operator to get data that is greater than a defined value. In this case, it is above 1001.
SQL Query:
SELECT NAME, SALARY
FROM EMPLOYEE WHERE SALARY>1001
Output:
LESS THAN ( < )
In the below example, we will be getting a list of the employees whose salary is below 2000.
SQL Query:
SELECT NAME, SALARY
FROM EMPLOYEE WHERE SALARY<2000
Output:
IN OPERATOR
The IN operator allows you to specify multiple values in a WHERE clause for a single condition. It tests whether a specified value matches any value in a list of values. So in this example, we will be getting a list of employees whose names are Robert & Tony.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE NAME IN ('Robert','Tony')
Output:
LOGICAL OPERATORS
AND, OR, NOT, and BETWEEN are logical operators that combine or negate conditions in a logical statement.
AND
If you want to check that two conditions are both true, you can use this logical operator as shown in the example. Since there could be multiple people with the same name, we are going to find a person named ‘Robert’ with 2201 as the ID.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE NAME='Robert' AND id =2201
Output:
OR
When it comes to OR, it’ll extract data that satisfies either of the two mentioned conditions. In our example, we will be searching for a person named ‘Robert’ or with 2200 as their ID. Though Robert’s ID isn’t 2200, it will still return that row as the name matches.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE NAME='Robert' OR id=2200
Output:
NOT
It can be used to negate a condition. So if you want everyone’s information other than Robert’s, you can use this operator as shown below in our SQL Queries cheat sheet.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE NOT NAME='Robert'
Output:
BETWEEN
You can even assign upper and lower limits to extract data in that range. So we will be getting the list of employees who have their IDs between 2202 and 2204.
SQL Query:
SELECT * FROM EMPLOYEE
WHERE ID BETWEEN 2202 AND 2204
Output:
DISTINCT
It is used to return unique records in a result set by eliminating duplicate values in a specified column or expression list, so that only one instance of each distinct row is returned. There are multiple cities mentioned in our Employee table, with a few duplications as well. So we will be using this SQL query to get only the unique cities from that table.
SQL Query:
SELECT DISTINCT CITY FROM EMPLOYEE
Output:
CASE EXPRESSION
It is an expression that works like an IF-THEN-ELSE statement. It evaluates the conditions one by one, returns the result corresponding to the first condition that is true, and otherwise returns the ELSE result.
So we will be identifying the cities of the employees and printing an appropriate statement.
SYNTAX
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result3
END
SQL Query:
SELECT NAME, CITY,
CASE
WHEN CITY='Willisburg' THEN 'He belongs to Willisburg'
WHEN CITY='Boston' THEN 'He belongs to Boston'
ELSE 'He does not belong to Willisburg and Boston'
END AS Belongs_To
FROM employee;
So using the above SQL statement, we are selecting the name, city, and case statement from the Employee table that we have. The function of the case statement is to check whether the employee belongs to Willisburg or Boston and then output a string that reflects that. If an employee doesn’t belong to any of the two cities, it will output a message saying that the person does not belong to either as shown below.
Output:
DROP
The DROP keyword is used to remove objects from a database. It can delete an entire database, an entire table, or a specific column from a table (the last one via ALTER TABLE). So let’s take a look at the queries you’ll need to perform these actions in our SQL Queries cheat sheet.
DROP DATABASE
It is used to delete an entire database.
SYNTAX:
DROP DATABASE database_name;
DROP TABLE
It is used to delete a particular table.
SYNTAX:
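The syntax is not shown above; it follows the same pattern as DROP DATABASE:

```sql
-- Delete an entire table and all of its data
DROP TABLE table_name;
```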
DROP COLUMN
It is used to delete a particular column from a particular table.
SYNTAX:
ALTER TABLE table_name
DROP COLUMN column_name;
Conclusion
We hope you are now clear on what the important SQL queries are and how you can use them after reading our SQL Queries cheat sheet. Being a leading Data Analytics testing company, we are sure these cover all the important SQL queries you will need to know. We will also be publishing informative blogs such as this, so make sure you subscribe to our newsletter to never miss out on any of our updates.
by Hannah Rivera | Jan 18, 2023 | Analytics Testing, Blog |
SQL for QA Engineers is a very important skill set to have to become a complete tester. Though you may not need to know SQL to perform front-end testing, it becomes a must-have to perform back-end testing. The reason is that you’ll have to verify the values in the database that were inserted, updated, or deleted to be sure there are no errors after processing. Being a leading Data Testing Services company, we have a team of testers who are experts in SQL. So in this blog, we will be explaining why SQL for QA Engineers is important by mentioning the different types of tests that will require SQL knowledge. We will also be using examples to help you understand the concepts with ease.
The Importance of SQL for QA Engineers
Data Validation
Data validation refers to checking the accuracy and quality of source data before using, importing, or processing the data. As it is one of the building blocks of backend testing, SQL for QA engineers plays a pivotal role. You can use SQL queries for retrieving specific data from the database and comparing it with the expected output to see if it meets the expected requirements.
Example: Let’s consider a scenario where you have to test an eCommerce application that stores customer data in the database. Though the customer data might have been collected properly, you’ll have to verify if the specific person’s data such as their name and address are accurate by retrieving them using SQL.
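A minimal sketch of such a validation query (the Customers table and its column names here are hypothetical, not from the application under test):

```sql
-- Retrieve a specific customer's stored details so they can be
-- compared against the expected values from the test data
SELECT Name, Address
FROM Customers
WHERE CustomerId = 1001;
```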
Data Setup and Cleaning
You might even need to set up test data in the database before beginning a test. So you can insert, edit, or delete data using SQL and clean up the test data once the tests have been completed.
Example: A medical application that stores patient data in a database will be a good example for this scenario. Even before the test begins, you can use SQL to insert test data for a patient, such as their contact information and medical background to see if everything is working as expected.
ETL Testing
ETL testing is one of the main reasons why SQL for QA engineers is so important. It validates the procedure used for extracting data from diverse sources, transforming it into the desired format, and loading it into a target database. The ETL (Extract, Transform & Load) method is used to combine data from several systems and guarantee its accuracy and coherence.
In order to verify that the data is accurate, you can use SQL to retrieve it from the source and destination databases and compare it.
Example: Let’s assume you have to perform ETL testing to see if the data from the sales database is properly modified and inserted into the reporting database. You can use SQL queries to extract data from both the sales & reporting databases and compare them to see if they are accurate.
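One common check is a row-count comparison between the two sides (the database and table names below are illustrative assumptions):

```sql
-- Compare row counts between the source and target databases;
-- a mismatch indicates lost or duplicated rows during the ETL run
SELECT
  (SELECT COUNT(*) FROM sales_db.orders)     AS source_rows,
  (SELECT COUNT(*) FROM reporting_db.orders) AS target_rows;
```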
Data Migration Testing
There will come a scenario where data from one database has to be migrated to another. This is different from ETL testing as the data will not be modified during the process; it will only be migrated.
Once the migration process has been completed, it is critical to ensure that the data is accurate and complete. So you can create SQL queries to retrieve data from both the source and destination databases and compare the results.
You can also utilize your SQL expertise as a tester to confirm the precision and comprehensiveness of the data being moved.
Example: Let’s assume that a company is moving employee data from an outdated HR Database to a new one. To check if all the employee data has been migrated properly, you can use SQL to extract the data for a specific employee from both the old and new databases and compare it.
Data Integration Testing
Whenever data from various systems or applications are merged or integrated, it has to be tested to ensure that the integrated data are accurate and consistent. SQL for QA Engineers will come in handy here as you can use SQL queries to retrieve and compare the data from several databases.
Example: Consider that client information from two databases; one for sales and one for customer support are being merged or integrated. To make sure that the client’s data is consistent across both systems, you can use SQL to retrieve the data for a single customer from both databases and compare it.
Debugging
Identifying the root cause of a bug or an issue is also a part of a good tester’s job. This is yet another reason why SQL for QA engineers is important. You’ll have to use SQL queries to identify the root cause of the issue.
Example: Let’s take the example of a banking application that is not logging all the transactions properly. You can use SQL to query the database to check for faults or data inconsistencies.
Usability Testing
Testing a software application’s usability involves determining how simple it is for users to complete tasks and reach their objectives. Likewise, you can test a database’s usability by running a variety of queries and making sure the desired results are provided.
Example: Let’s imagine that you are testing a database application that enables users to perform keyword searches on data. In such a scenario, SQL can be used to run a search query and make sure the expected results are returned. This might assist you in determining whether the database is simple enough for users to use and navigate.
Performance Testing
SQL for QA engineers has more use cases than just functional testing. You can use SQL to test the performance of the database as well. Some databases hold enormous volumes of data and can face performance problems or bottlenecks.
Example: For this scenario, let’s take a social media platform as it will have a large user base using the platform from different parts of the world. You can use SQL to assess how well the database handles the load with different traffic conditions.
Security Testing
Whenever unvetted data gets used in SQL statements, there is a chance for a security vulnerability known as a SQL injection attack to happen. This vulnerability enables an attacker to run any SQL command they choose and obtain unauthorized access to a database.
As a tester, you can test a database’s security by attempting SQL injection attacks. SQL queries with suspicious input can be written in an attempt to exploit the vulnerability and obtain unauthorized access to the database.
Example: Let’s take a secure login that requires a username and a password. You can try to launch a SQL Injection attack to bypass the login form and access the database without authorization by submitting a username and password that contain malicious SQL commands.
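As a sketch, the classic probe is entering `' OR '1'='1` in the login form (the users table and columns here are hypothetical):

```sql
-- Intended query built from user input:
--   SELECT * FROM users WHERE username = '<input>' AND password = '<input>';
-- With the malicious input ' OR '1'='1 in both fields it becomes:
SELECT * FROM users
WHERE username = '' OR '1'='1' AND password = '' OR '1'='1';
-- The trailing OR '1'='1' makes the predicate true for every row,
-- bypassing authentication when the input is not sanitized.
```

A tester who sees this query return rows knows the form is concatenating raw input instead of using parameterized queries.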
Conclusion:
With the help of SQL, you can significantly improve your capacity to test and debug database-dependent programs. Being an experienced data analytics testing company, our testers are experts in using SQL in their tests. So we hope our examples and the blog on the whole have helped you understand the importance of SQL for QA Engineers. We will be publishing informative blogs on a weekly basis and recommend you subscribe to our Newsletter to never miss out on any of our content.
by Hannah Rivera | Dec 5, 2022 | Analytics Testing, Blog |
SQL (Structured Query Language) can be used to interact with relational database management systems. It is designed for the retrieval, creation, manipulation, updating, storage, and management of data in a relational database. For a better understanding, we can compare SQL to a car key. We just turn the key or push the button to start the car without worrying about the background process that happens automatically. Likewise, we just write simple SQL queries & statements to manage and retrieve data from the database without having to understand the internal database operations happening under the hood. In addition, both proprietary and open-source relational database management systems such as Microsoft SQL Server, Oracle Database, IBM DB2, and MySQL have been built around SQL. That is why it is important to know the 5 basic SQL queries we’ll be exploring in our blog.
SQL syntax is fairly simple and consists of commands written as statements such as CREATE TABLE and ALTER TABLE. These commands can be used to modify database tables & index structures; add, update, & delete rows of data; and retrieve subsets of information from the database for transactional processing and analytical operations. Being a leading data analytics testing company, we have vast experience in performing such actions in our testing process. But the basic SQL queries we will be discussing in this blog will be beneficial to everyone in general and not just for testers. But before that, let’s find out what an SQL query is and why you will need it.
What is an SQL query and Why do we need it?
While working with SQL language, you will use numerous keywords and SQL commands to compose statements and queries. Queries are basically requests we can use to communicate our requirements to the database and get the appropriate outputs. You can use SQL queries to pre-aggregate, filter, and select only the data necessary for the project you are working on. The selected data are generally called views. So if you have a table that contains millions of rows of data, creating a view of them will enable you to perform analysis and arrive at results much faster. Let’s now take a look at the 5 basic SQL queries that will help you create a view.
Getting Data from Columns using SELECT & FROM
SELECT * FROM WORKER_TABLE;
Explanation:
SELECT – It is used to specify the columns of the table from which the data has to be fetched & displayed. In the example, we have used ‘*’ instead of a specific column name, as ‘*’ denotes all the columns.
FROM – It is used to specify the table from which the data has to be fetched. In our query, we have specified WORKER_TABLE as the table we are gathering data from.
Output:
To fetch non-repeated records by using DISTINCT
SELECT DISTINCT DEPT AS DEPARTMENT FROM WORKER_TABLE
Explanation:
There is a high chance that the database we selected has a lot of duplicate values. So DISTINCT is the next query we’re going to see from our list of basic SQL queries as it can be used to retrieve only the unique data and avoid duplication.
By using DISTINCT in the above example, we collect the unique values from the DEPT column of WORKER_TABLE and show them under the alias name ‘Department’. An alias gives a temporary name to a column by using the AS keyword.
Output:
Filtering Data based on requirements by using WHERE Clause
Explanation:
We have now used 2 of our basic SQL queries to extract data that has no duplicates. But in most scenarios, you’ll need more refinement than that. So you can use the WHERE clause to filter the data based on your requirements.
Let’s say you want to get the data of the employees who have joined the company after a specific date. So you can set a requirement that the employee_id should be greater than 105.
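The query itself is not shown above; assuming the employee_id requirement just described (the EMPLOYEE_ID column name is an assumption), it would look like:

```sql
-- Keep only the rows where the employee id exceeds 105
SELECT * FROM WORKER_TABLE
WHERE EMPLOYEE_ID > 105;
```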
Output:
Query using Not in operator
SELECT * FROM WORKER_TABLE
WHERE FIRST_NAME NOT IN ('Zoe', 'Sadie')
Explanation:
You can further drill down your results by adding exceptions using the NOT IN command. NOT IN is an important command that can be used along with the WHERE clause. It can be used to retrieve data from rows that don’t satisfy the specified condition.
In this case, we are retrieving a list of the employees whose first names are not Zoe and Sadie. Here, Zoe and Sadie are the exceptions that we wanted to avoid.
Output:
Sort the Result set by using the ORDER BY Keyword
SELECT * FROM WORKER_TABLE
ORDER BY DEPT DESC;
Explanation:
By using all the basic SQL queries, you would now be able to have a refined set of data as per your need. The last query we are going to discuss, ORDER BY can be used to sort the result we have according to the specified column.
You have to specify the column by which the data has to be sorted by placing it after the ORDER BY keyword. By default, the rows are sorted alphabetically in ascending (ASC) order. But if you wish to fetch the rows in descending order, you can do so by adding DESC after the column name as shown above.
Output:
Conclusion
Hopefully, you now have a clear understanding of what makes these 5 basic SQL queries so important. They lay the foundation that you can build upon to grow your expertise in SQL. We have been providing data testing services for many years and our testers have found SQL to be very important. We will be publishing more informative content, so make sure you subscribe to our newsletter to never miss out on any of our latest content.
by Charlotte Johnson | Mar 28, 2022 | Analytics Testing, Blog |
Manufacturing is, for the most part, the world economy’s engine. No matter what economy is in question, it’s one of the most valuable industries with many benefits. Two of the largest manufacturing hubs in the world, the USA and China, contribute about $2.3 trillion and $4 trillion to GDP, respectively.
Manufacturing Is Key
Accessing today’s beloved services and products would be impossible without manufacturing. Phones we use for communication, cars and other vehicles that are driven…there’s so much going on in the world that the manufacturing industry delivers on.
The approach people have taken to manufacturing has shifted steadily since the industrial revolution, and they are consistently finding new ways to streamline the process. Big companies outsource manufacturing to offshore plants that are more efficient, and production processes are increasingly automated so that quality improves and the chance of human error drops considerably.
There are many companies that invest in the latest developments in manufacturing technology in order to increase their productivity. As a result, modern manufacturers can now produce thousands of products with minimal human involvement.
Data Analytics In Manufacturing: The Advantages
All told, manufacturing stands to gain a great deal from data analytics, with benefits that compound over time. Some of the most important ones include, but are not limited to:
Automated Manufacturing Gets Empowered By Analytics Algorithms
Manufacturers can cut down on errors and miscommunication by integrating data analytics throughout their business. This, in turn, minimizes the human interaction needed for quality control of finished products and raw material delivery.
Data Analytics Helps Product Design Improve
Bringing a product design to market used to involve plenty of trial and error. More often than not, the initial iteration of a product design did not get a warm welcome from customers, usually because the design and ergonomics were far from ideal.
Thanks to advances in technology, data analytics now allows manufacturers to test designs for ergonomics and efficiency without making a physical prototype.
Traditionally, automotive manufacturers have used hardware-intensive methods to create car design mock-ups in order to test them in a wind tunnel. However, the advent of data analysis and artificial intelligence has allowed manufacturers to use software to test a design before making physical copies of it.
Product Management Gets More Efficient
Being able to project how many products to manufacture is critical for any manufacturer. Overproducing may cost millions of dollars, but the alternative—underestimating market demand—could tarnish a company’s reputation if it cannot meet customer demands in a timely fashion.
Manufacturers maximize their profit best through accurate demand estimation.
Conclusion
Since the industrial revolution, people’s views on manufacturing as a whole have taken a turn. Today’s data analytics plays a major role in the manufacturing industry. Benefits include more efficiency in product management and improvements to product design without unnecessary wasting of any resources.
Trying to find manual testing companies to work with on data analytics? Contact Codoid today! We’re an industry leader in Quality Assurance that’s passionate about helping to lead and guide the whole QA community.
by admin | Aug 2, 2021 | Analytics Testing, Blog, Latest Post |
Traditionally, we extract data from different source systems, transform it, and then finally load it into the data warehouse. With DBT, however, the raw data from the different source systems is extracted and loaded directly into the warehouse. DBT then lets us create new materialized tables fed with the transformed data, showcasing meaningful data for business analysis purposes using simple select statements. But what makes DBT stand tall is that it allows both the development and production environment teams to work in the same place by creating models and testing the transformation scripts. Being one of the leading QA companies, we will be mainly focusing on the testing part of the above process in this DBT Tutorial by exploring how we test our models.
What is DBT?
If you’re new to the DBT tool, worry not, we have got some basics in this DBT Tutorial for you as well. The Data Build Tool (DBT) is an open-source command-line tool that mainly focuses on the transformation part of the “Extract, Load, and Transform” (ELT) pipeline. DBT allows both data analysts and data engineers to build models and transform the data in their warehouses to make it meaningful.
New to DBT tool?
Let’s walk you through some of the basics to help you get a quick overview of DBT and the necessary steps that have to be done to create an account. Basically, there are two ways for DBT to be used:
1. DBT CLI(Command Line Interface):
The DBT CLI variant allows users to write their scripts and test them locally by using the DBT command-line tool.
2. DBT Cloud:
The DBT Cloud is a hosted variant that streamlines development with an online Integrated Development Environment, an interface to write our DBT models and tests and run them on a schedule.
In this DBT Tutorial, we will be covering the DBT cloud variant. It is worth noting that the ideas and practices can be stretched out to the CLI variant as well. So let’s get started.
Data warehouse:
As stated earlier, DBT handles the transformation part of ELT for data warehousing. “But how does it work?” you might ask. Well, DBT helps us create a connection with the warehouse and then lets us write simple SQL select statements against it to transform the data.
Supported Warehouses:
We have listed the names of the supported warehouses in DBT, and they are
• Postgres
• Redshift
• BigQuery
• Snowflake
• Apache Spark
• Databricks and
• Presto (Partially Supported).
We would be using the Snowflake warehouse in this blog for demonstration purposes.
DBT Tutorial for the Setup:
Now that we have seen an overview of DBT, let’s find out how to set up the DBT cloud and use its interface. If you’ve not yet created an account in DBT, you can sign up for DBT cloud by visiting their Sign up page.
Once you are done with your DBT cloud sign-up process, let’s find out how to set up your very first DBT project.
DBT recommends using GitHub to store our deployments. So the first step would be to create an empty repository in GitHub as shown below.
Enter your repository name and click on ‘Create repository’.
Once you have signed up with your DBT cloud, just select the snowflake warehouse and start configuring it using your snowflake credentials. We have shown a sample connection in the below images for your reference.
After entering your warehouse and database details, make sure you have entered your Snowflake username and password in the development credentials. Next, click on ‘Continue’ to get the confirmation message “Connection Test Success”. Now, click on the ‘Continue’ button to configure GitHub. Once you’re done with the GitHub configuration, we’re good to start with DBT.
DBT Tutorial for its Interface:
DBT has a lot of inbuilt options that help access and understand it easily. After setting up with DBT, the interface of DBT that we get to see initially will look like how it is shown in the image below.
As you can see, DBT might initially look a bit empty and may even show a compilation error at the bottom right. So we would have to click on ‘Initialize your Project’ to start our first project. Doing this will make DBT provide us with a skeleton for our project automatically, as shown in the below image.
We can clearly see that the compilation message that was displayed at the right bottom has now changed to become ‘ready’, implying that it is time to start our project. So let’s get started.
First off, let’s find out the scope of some of the functions,
1. The commit button will handle all our git configurations like creating a new branch, pull request, commit, merge, etc.
2. After writing any SQL statements we can check the query then and there by using the dbt “preview data” and “compile SQL” buttons.
3. We can run and test our models by writing simple DBT commands (e.g. dbt debug, dbt run, dbt test).
4. DBT also generates documentation that will come in very handy for analytical purposes as it will contain all the details about the project. You can access it by clicking on the “view docs” option as shown in the above pic.
Note: Before running any model or a test, make sure that you have saved it.
Loading training data into the Warehouse:
So as stated earlier, DBT cloud currently supports only BigQuery, Postgres, Redshift & Snowflake warehouses, and we have decided to use Snowflake for the demo. So to illustrate how to import or load your training data into the warehouse, we will be loading the two tables that we have created.
Snowflake:
It is important to note that we have written the below instructions assuming that you have a warehouse named COMPUTE_WH (the Snowflake default) in your account. So make sure that you have the required privileges to create objects using this warehouse.
Run the following commands in your Snowflake Warehouse SQL Runner:
To create the Database:
Create database STUDENT;
Use database STUDENT;
Create schema STUDENT_INFO;
Use schema STUDENT_INFO;
To create the Tables:
Table 1: Student_Profile
Table 1 will contain the personal information of the students.
Create table student_profile (
S_no int not null,
Std_id int not null,
Std_name varchar(45) null,
Std_dep varchar(45) null,
DOB datetime null,
Primary key (Std_id));
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('1', '294', 'Afshad', 'CSE', '03/10/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('2', '232', 'Sreekanth', 'Mech', '02/09/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('3', '276', 'John', 'EEE', '10/06/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('4', '303', 'Rahul', 'ECE', '12/05/1997');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('5', '309', 'Sam', 'Civil', '09/04/1999');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('6', '345', 'Ram', 'CA', '03/11/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('7', '625', 'Priya', 'CSE', '03/12/1996');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('8', '739', 'Bob', 'MEC', '06/07/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('9', '344', 'Ganesh', 'Mech', '07/09/1998');
insert into student_profile (S_no, Std_id, Std_name, Std_dep, DOB) values ('10', '123', 'Neha', 'ECE', '12/09/1998');
Table 2: STUDENT_RESULTS
Table 2 will contain the results of those students.
Create table student_results (
S_no int not null,
Std_id int not null,
Marks int null,
Result varchar(45) null,
Primary key (Std_id));
insert into student_results (S_no, Std_id, Marks, Result) values ('1', '294', '78', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('2', '232', '56', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('3', '276', '88', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('4', '303', '67', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('5', '309', '38', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('6', '345', '90', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('7', '625', '87', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('8', '739', '45', 'Fail');
insert into student_results (S_no, Std_id, Marks, Result) values ('9', '344', '97', 'Pass');
insert into student_results (S_no, Std_id, Marks, Result) values ('10', '123', '49', 'Fail');
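As a quick sanity check (a hypothetical query, not part of the setup above), you can join the two tables on Std_id to confirm that the rows loaded as expected:

```sql
-- Join the two training tables on the shared Std_id key;
-- returns one row per student with name, department, marks, and result
SELECT p.Std_name, p.Std_dep, r.Marks, r.Result
FROM student_profile p
JOIN student_results r ON p.Std_id = r.Std_id
ORDER BY r.Marks DESC;
```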
Now that you have the training data in your warehouse, you are one step away from starting your DBT testing.
DBT Tutorial for the available Models:
In analytics, modeling is the process of transforming data from its raw form into its final transformed form. Typically, data engineers are responsible for building tables that represent your source data, and on top of that, they also build the tables/views that transform the data step by step.
The models are just SQL select statements in your dbt project. These models are created inside the Models directory with the .sql extension.
What’s great about dbt is you don’t need to know DDL/DML to build the tables or views.
Note: The name of the file is used as the model or table name in your warehouse.
Creating Models:
Inside the model’s directory, we have created two simple models for better understanding.
Model 1:Student_profile.sql
{{ config(materialized='table') }}
with source_data as (
select
S_No,
Std_Id,
Std_Name,
Std_Dep,
DOB
from
STUDENT.STUDENT_INFO.STUDENT_PROFILE
)
select *
from source_data
Model 2: Student_Results.sql
{{ config(materialized='table') }}
with source_data as (
select
S_No,
Std_Id,
Marks,
Result
from
STUDENT.STUDENT_INFO.STUDENT_RESULTS
)
select *
from source_data
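Both models above are straight pass-throughs. As a sketch of a more transformation-oriented model, a hypothetical file we might call Student_Report.sql could join the two models using dbt’s ref function:

```sql
-- Hypothetical model Student_Report.sql: joins the two models above
{{ config(materialized='table') }}

with report as (
    select
        p.Std_Id,
        p.Std_Name,
        p.Std_Dep,
        r.Marks,
        r.Result
    from {{ ref('Student_profile') }} p
    join {{ ref('Student_Results') }} r on p.Std_Id = r.Std_Id
)

select * from report
```

Note that the names passed to ref must match the model file names (here, Student_profile.sql and Student_Results.sql).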
Once you’ve built your models, make sure to save and run them to load them into the warehouse. You can use the ‘dbt run’ command to run all your models, or if you want to run one particular model, you can use the ‘dbt run --models Student_Profile’ command.
DBT will show detailed information about the models when you run your models as shown in the above image. So we have successfully completed all the required steps and can move forward to find out how to perform testing in DBT.
DBT Tutorial to perform Testing:
We all know that testing plays a major role in deployment, and tests in analytics are just assertions that you have about your data. What makes these assertions important is that when they are met, they go a long way in helping others trust the data.
The great part of DBT testing is that by default it provides four data quality checks. They are Unique, Not Null, Relationships (Referential Integrity), and Accepted Value checks.
Basically, DBT provides us with two types of testing. So let’s take a look at the steps to be done to perform the first type, the Schema test.
1. Schema Test:
Create a YAML file in the same model’s directory with a .yml Extension (Ex. Schema.yml).
Configure your tests (Unique, Not Null, Relationships, and Accepted Values) in the YAML file based on your table data and run it by using the command “dbt test”.
Note: YAML is a human-readable data serialization language that can be easily understood even by non-technical people. It is mainly used for configuration purposes and is supported across most programming languages.
We have attached one YAML file for your reference below.
Schema.yml:
version: 2

models:
  - name: student_profile
    description: "Table contains all the information about the students"
    columns:
      - name: std_id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null

  - name: student_results
    description: "Table contains the student results info"
    columns:
      - name: std_id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('student_profile')
              field: std_id
      - name: result
        description: "Result of a student"
        tests:
          - accepted_values:
              values:
                - Pass
                - Fail
Once you run this test, you will get detailed information on all your tests. If you need to run only one particular schema test, you can use this command “dbt test -m Student_results”.
2. Data Test:
But if you are looking to write custom test scripts against the data based on your requirements, that can also be done using this type of test.
Start by creating the SQL files with your test name inside the test directory of the DBT project as shown below.
Now, add your test script to your test file.
Duplicate.sql:
select std_id, count(std_id)
from {{ref('student_profile')}}
group by std_id
having count(std_id)>1
referential_integrity.sql:
select std_id
from {{ref('student_profile')}}
where std_id not in (
select std_id
from {{ref('student_results')}} )
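Along the same lines, a hypothetical third data test (say, marks_range.sql, not part of the original project) could flag any rows whose marks fall outside the valid 0–100 range:

```sql
-- Hypothetical data test marks_range.sql:
-- the test fails if this query returns any rows
select std_id, marks
from {{ ref('student_results') }}
where marks < 0 or marks > 100
```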
Note:
Here, the ref function is used to reference one or more models from within the script.
Finally, you can run the tests by using the “dbt test --data” command.
DBT Tutorial for Documentation:
As discussed earlier, documentation is one of the greatest features of DBT. So once your project is completed, all you have to do is use the “dbt docs generate” command to generate the document.
After running the command, just click on “view docs” at the top left of the DBT interface to view the complete information about your project in the form of a document. It even generates a lineage graph like the one shown below for a crystal clear understanding.
DBT also supports filters in the lineage graph, which is a great advantage when it comes to analysis.
Conclusion:
We hope that you have enjoyed reading this DBT Tutorial blog and also hope that DBT would be a great addition to your toolkit. To sum things up, DBT helps carry out the most complex and heavy transformations in a simple and easy manner by using simple select statements. It enables both data engineers and data analysts to work in the same environment and provides a unique experience for them to transform the data. The models in DBT enable faster execution and also make it so easy to test and modify. Finally, it even shows detailed information about the project by generating the document along with a lineage graph for clear analysis purposes. All thanks to such awesome features, DBT has been a resourceful tool that has helped us deliver the best software testing services to our clients.
Frequently Asked Questions
- What is dbt tool used for?
DBT (Data Build Tool) is used to perform the Transform part of the ELT (Extract, Load, Transform) process in a simplified manner by writing transformations as queries and orchestrating them effectively.
- What is DBT testing?
DBT testing is the process of ensuring the data you have is reliable by using assertions you have about the data as tests. There are 2 types of dbt testing; namely Schema Test and Data Test.
- What are the four generic tests that dbt ships with?
The 4 generic tests that dbt ships with are Unique, Not Null, Relationships (Referential Integrity), and Accepted Value checks.
- What does dbt stand for data?
DBT stands for Data Build Tool and as the name suggests, dbt can be used to transform data in warehouses in an effective manner.
- Can I use dbt for free?
Yes, you can use dbt Core for free as it is an open-source tool released under an Apache 2.0 License. But dbt Cloud is not a completely free tool as few features are restricted in the free version.