Tasker Database

The Tasker system

Over the next few assignments we are going to build a software system loosely based on TaskRabbit. TaskRabbit offers a web site and a mobile app where customers can post short jobs and get connected with people who can do those jobs for them. Typical tasks that customers post on TaskRabbit include help with moving to a new apartment, help assembling Ikea furniture, or doing simple home repairs.

The Tasker system we are going to build will manage the key aspects of posting jobs, selecting someone to do the job, and providing feedback when the job is done.

In these notes I will be working through the initial steps in setting up a database for our system. You can click the button above to download a set of dump files for the database I created. You can use the data import feature in the MySQL workbench to load that database.

Setting up the database

To start with, we will need to make a completely new database in the MySQL workbench. to do this, we right-click in the Schema area and select the command to make a new Schema. We will name the new schema 'tasker'. If you are importing my files you will need to do this step before running the import command.

Setting up tables in the MySQL workbench

To start setting up our database for the system we will begin by setting up the very first table, which stores basic information on users for our system, both workers and customers.

To make a new table, we right-click on the Tables area under our database and select the Create Table command. The screen that pops up will prompt us to enter a name for our table and to set up all of the columns the table needs.

Here is what that first table will look like:

The first thing to notice here is that every column in the table has an associated data type. The two datatypes in use here are int and varchar(n). The varchar(n) data type is a string type; specifically, it is a variable length string data type with a maximum length of n.

The most important column in this table is the first one, iduser. This column is the primary key for the table. The purpose of the iduser column is to serve as a unique identifier for rows in the table. One fact that reflects the importance of this column is the fact that it is not allowed to be empty, or NULL. To help ensure that this column is never empty I have also applied the auto increment feature so that MySQL will provide a value for this column even if none is given at insert time.

The second table stores additional information that is relevant to workers. Workers will need to provide ID and Social Security information. All of that will be stored in a separate worker_extra table:

Next, we need to think about jobs. Customers will be posting jobs they want done, and once those jobs are posted workers will want to bid to do those jobs.

Here is job table structure:

The job_start and job_end fields allow us to set up a range of times. The idea here is that when a customer posts a job they can specify a range times that they will be at home and available to welcome a worker.

When workers bid to do jobs those bids will go into a bid table:

When a worker places a bid for a job they will also indicate the span of time in which they are available to do the job. Customers can consider this when selecting a winning bidder for the job.

Once the customer selects a worker for a job, we will want to record information about the job, such as how many hours the worker worked on the job and the rating the customer gives the worker after the job is done. This information is stored in a rating table:

Advanced SQL features

The table structure we have set up follows the usual conventions for tables in a relational database. At this stage we can anticipate a few specific challenges that will arise when we start working with the database.

The first feature we need to implement is some sort of convenient display of information about the workers who have bid for a job. This information comes from both the bid table and the rating table. Let's consider the rating table first. Suppose we wanted to display information about workers and their ratings:

select worker, rating from tasker.rating where rating is not null;

This would give us all of the jobs that have ever been completed. Buried in that mass of results is rating information for different workers, which in principle would help us to see what worker we would like to select.

To make this information more useful, the next thing we can do is to aggregate it to give summary information about each worker. Here is some SQL code to do that:

select worker,  count(rating) as jobs, avg(rating) as rating from rating group by worker;

This statement uses the SQL group by construct, which groups the rows by worker. All of the rows for a given worker will go into a group, and we can then ask for summary information for that worker. To get the summary information we will use some SQL aggregation functions, such as count() and avg() on the rating field. To make the results more user friendly we also use the as construct to change the name of each of the resulting columns.

This is such a useful query that we are going to use this query as the basis for a view in the database. A view is a synthetic table whose entries are synthesized from data found in tables. The procecure for creating a view is simple. All you have to do is to construct the select statement that does what you want and then make a view from it. To make a new view in the workbench all we have to do is to right-click on the Views section of the database and select the Create View.. command. In the screen that appears we enter the details of the new view: the name for the view and the select statement that defines the view:

CREATE VIEW `rating_summary` AS
select worker,  
       count(rating) as jobs,
       avg(rating) as rating 
from rating group by worker;

Now that we have a way to display summary information about jobs that workers have done, we may want to supplement that information with information pulled from another table.

The standard way to handle that challenge in SQL is to use a join. A join merges two tables or views together into a single table that we can query. Here is an example that joins the user table with the rating_summary table:

select worker, real_name, email, jobs, rating from tasker.user join tasker.rating_summary on tasker.user.iduser=tasker.rating_summary.worker;

In every select statement we have a from clause which specifies what table to query. We can use a join in that from clause to join two tables together and then query that. Since a join will be connecting rows from one table with rows from another table, we need to specify which rows to join together. That is the purpose of the on clause: that part specifies how to select the rows to join together. In this case we are joining rows from the rating_summary view with rows in the user table where the worker id numbers match the user's iduser values. This useful select statement can also be turned in to a worker_summary view.

Finally, we can join that view with the bid table to display useful information a customer may want to consider when looking at bids:

select idbid, job, start_time, end_time, worker,
 real_name as name, jobs, rating
from bid join worker_summary 
on bid.worker = worker_summary.worker;