Problem Set 3
- Preliminaries
- Part I
- Part II
- Overview
- Getting started
- Code-reading and design questions
- General notes
- Problem 5: Marshalling column values
- Testing your marshall() method
- Problem 6: Completing INSERT commands
- Problem 7: Table iterators and unmarshalling
- Problem 8: SELECT * for a single table
- Sample interaction
- Submitting your work for Part II
Preliminaries
In your work on this assignment, make sure to abide by the collaboration policies of the course.
If you have questions, please come to office hours, post them on Piazza, or email cs460-staff@cs.bu.edu.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
Part I
45 points total
Creating the necessary folder
Create a subfolder called ps3 within your cs460 folder, and put all of the files for this assignment in that folder.
Creating the necessary file
This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to ps3_partI.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose File->Download->PDF document, and save the PDF file in your ps3 folder. The resulting PDF file (ps3_partI.pdf) is the one that you will submit. See the submission guidelines at the end of Part I.
Problem 1: Properties of schedules
18 points total
Below are two schedules in which actions by three transactions (T1, T2, and T3) are interleaved.
schedule 1:
w1(A); r3(A); w2(C); r2(B); c2; r1(C); c1; w3(B); c3
schedule 2:
r1(C); w2(B); w1(A); w2(C); c2; r3(B); w3(B); c3; r1(B); c1
For each of the above schedules, you should take the following steps:
In your copy of the ps3_partI template (see above), edit the diagram that we have provided for the schedule, making whatever changes are needed to construct the schedule’s precedence graph.
State whether the schedule is conflict serializable. Explain briefly why or why not. In addition, if the schedule is conflict serializable, state one possible equivalent serial schedule.
State whether the schedule is recoverable. Explain briefly why or why not.
State whether the schedule is cascadeless. Explain briefly why or why not.
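As a reminder of how a precedence graph is built, the following is a minimal sketch rather than a required tool. It adds an edge from one transaction to another whenever an earlier action conflicts with a later one (same item, different transactions, at least one write), and it reports a cycle if edges remain after repeatedly removing edges whose source has no incoming edge. The class name, the method names, and the example schedule are all hypothetical, and the schedule in main() is deliberately not one of the schedules above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PrecedenceGraphSketch {

    /** Returns the precedence-graph edges of a schedule, each written as "T1->T2". */
    static Set<String> edges(String schedule) {
        // Each action is stored as {type, transaction, item}; commits are skipped.
        List<String[]> ops = new ArrayList<String[]>();
        for (String token : schedule.split(";")) {
            String t = token.trim();
            if (t.isEmpty() || t.charAt(0) == 'c') {
                continue;
            }
            int open = t.indexOf('(');
            ops.add(new String[] {
                t.substring(0, 1), t.substring(1, open), t.substring(open + 1, t.indexOf(')'))
            });
        }
        Set<String> edges = new TreeSet<String>();
        for (int i = 0; i < ops.size(); i++) {
            for (int j = i + 1; j < ops.size(); j++) {
                String[] a = ops.get(i);
                String[] b = ops.get(j);
                boolean sameItem = a[2].equals(b[2]);
                boolean differentTxn = !a[1].equals(b[1]);
                boolean oneWrites = a[0].equals("w") || b[0].equals("w");
                if (sameItem && differentTxn && oneWrites) {
                    edges.add("T" + a[1] + "->T" + b[1]);
                }
            }
        }
        return edges;
    }

    /** Returns true if the edges contain a cycle, i.e., the schedule is not conflict serializable. */
    static boolean hasCycle(Set<String> edges) {
        Set<String> remaining = new TreeSet<String>(edges);
        boolean removed = true;
        while (removed) {
            removed = false;
            for (String e : new ArrayList<String>(remaining)) {
                String from = e.substring(0, e.indexOf("->"));
                boolean hasIncoming = false;
                for (String other : remaining) {
                    if (other.endsWith("->" + from)) {
                        hasIncoming = true;
                    }
                }
                // An edge whose source has no incoming edge cannot be part of a cycle.
                if (!hasIncoming) {
                    remaining.remove(e);
                    removed = true;
                }
            }
        }
        return !remaining.isEmpty();
    }

    public static void main(String[] args) {
        // Illustrative schedule only; it is not taken from this problem.
        Set<String> e = edges("r1(A); w2(A); w1(A); c1; c2");
        System.out.println(e + " cycle=" + hasCycle(e));
    }
}
```

Your written answer should still explain, in words, which conflicting actions give rise to each edge.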
Problem 2: Two-phase locking
8 points total
Consider these two transactions:
T1: r(A); r(B); r(C); w(C); commit
T2: r(B); r(A); r(C); w(A); commit
The following is the beginning of one possible schedule of these transactions, with appropriate lock instructions added:
T1
T2
sl(A); r(A)
sl(B); r(B)
...
sl(B); r(B)
sl(A); r(A)
...
Under regular two-phase locking (not strict or rigorous), could T2’s next action be to unlock B? Explain briefly why or why not.
Regardless of your answer to part 1, imagine that T1 actually goes next and completes two more actions:
T1
T2
sl(A); r(A)
sl(B); r(B)
sl(C); r(C)
...
sl(B); r(B)
sl(A); r(A)
...
Fill in the table that we have provided in ps3_partI to show one way in which this partial schedule could be successfully completed under strict two-phase locking. There is more than one possible answer to this question.
Important guidelines:
The actions that you add should go below the dotted lines in the table. Make sure to position them so that there is no confusion about the order in which they occur.
Make sure to include appropriate lock, unlock and commit actions. You should assume that the DBMS is not using update locks, and that it allows shared locks to be upgraded to exclusive locks.
To show that you understand the difference between strict and rigorous locking, at least one of the unlock actions should come before the commit action of the corresponding transaction.
Problems 3 and 4: Coming soon
19 points total
Submitting your work for Part I
Coming soon!
Part II
55 points total
All of Part II is pair-optional, which means that you may complete it with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.
Overview
In this assignment, you will implement portions of a simple relational database management system that supports a subset of the SQL language. We have provided you with two of the three components of the system:
- a SQL parser
- Berkeley DB, an embedded database system that will serve as the storage engine. More specifically, we will use Berkeley DB Java Edition.
Your job is to implement parts of the “middle layer” of the system, which takes the parsed version of a SQL command and performs the necessary lower-level actions to execute the command. To help you, we have given you a code framework for the middle layer that already provides some of the necessary functionality.
Getting started
You should begin by downloading the necessary files and configuring your work environment. The steps for doing so can be found here.
Please do this ASAP, so that you can be sure that you don’t run into any problems later on.
After configuring everything, you should spend some time familiarizing yourself with the files that we have given you in the dbms folder, and with Berkeley DB. In particular, you should review/read the following resources:
- the overview of the code framework
- the API documentation of the code framework
- the lecture notes on implementing a logical-to-physical mapping (pages 174-186 in the coursepack)
The following additional resources may also be helpful:
Code-reading and design questions
highly recommended
Before you begin coding, we strongly encourage you to answer the questions found here:
code-reading and design questions
General notes
In the code that you write, you must limit yourself to the packages that we’ve imported at the top of the starter files. You must not use classes from any other Java package. In addition, you must not use any Java features that were not present in Java 8.
As discussed on the separate configuration page, you will need to compile and run the code from the command line in the Terminal window of VS Code.
to compile:
javac -cp 'lib/*' -d classes *.java
(see below for the expected warning messages)
to run on Windows:
java -cp 'lib/*;classes' DBMS
to run on macOS:
java -cp 'lib/*:classes' DBMS
Note: The two commands for running the program are almost identical, but in the Windows version there is a semi-colon (;) before the word classes, whereas the macOS version uses a colon (:).
You will see one or more warnings when compiling your code (e.g., “Note: Parser.java uses unchecked or unsafe operations.”). These warnings are to be expected and should be ignored. Messages labeled as errors (not warnings) will keep your code from compiling and will need to be addressed. You shouldn’t see any errors when you compile the starter code that we’ve given you. If you do, let us know.
After making changes to the code, you will need to recompile it before you can try to re-run it. When you are at the command line of the Terminal, using the up arrow will allow you to access and reenter previously entered commands without needing to re-type them!
The code that we’ve given you can be run before you make any changes. It will begin by printing the following prompt:
Enter command (q to quit):
If you enter a valid SQL command, the program will parse the command and display a summary of some of the command’s components (see the notes on the DEBUG constant below for how to disable this summary). Entering a lower-case q will allow you to quit the program.
When you run the program for the first time, it will create a directory called db within your code directory. This is the home directory for the Berkeley DB environment, and it will be used to store the files that BDB creates for your database. If your program crashes for any reason, these files may be corrupted. As a result, we recommend that you remove all files from this directory after a crash.
There is a constant named DEBUG that is defined in DBMS.java. When it is set to true (as it is in the files that we have given you), the values of many of the tokens generated by the parser are printed after each SQL command is entered by the user. You may find this information helpful as you implement the various types of commands. You may also wish to add additional debugging code that is only executed when this constant is set to true. To eliminate the debugging messages, set DEBUG to false.
Problem 5: Marshalling column values
20 points
Important
Before you begin coding, make sure that you have completed the tasks listed under the Getting Started section above, and that you have answered the code-reading and design questions mentioned above.
In order to insert rows into a table, your DBMS needs to be able to marshall a collection of column values into a single Berkeley DB key/value pair. In this problem, you will add support for marshalling by implementing the key method of the InsertRow class.
As you saw when completing the code-reading questions, an InsertRow object is used by the execute() method for INSERT commands (the one in the InsertStatement class). That execute() method creates an InsertRow object to represent the row to be inserted, and it calls that object’s marshall() method to prepare the marshalled key/value pair for the row.
We have already implemented some of the other methods of this class for you:
- an InsertRow constructor that initializes the state of the object. It takes two parameters: an already opened Table object for the table to which the row will be added, and an array of type Object containing the values in the row to be inserted. We assume that the values are in the appropriate order – i.e., that element 0 of the array contains a value for the first column in the table, element 1 contains a value for the second column in the table, etc. We also assume that the values are valid and that they have been adjusted as needed to correspond to the types of the columns.
- a getKeyBuffer() method that returns a RowOutput object for the key portion of the marshalled key/value pair.
- a getValueBuffer() method that returns a RowOutput object for the value portion of the marshalled key/value pair.
- a toString() method that returns a String representation that includes:
  - the current contents of the offsets field, which we recommend that you use when determining the offset values that will appear at the start of the marshalled value
  - the current contents of the key buffer (i.e., the array of bytes that is inside the RowOutput for the key)
  - the current contents of the value buffer (i.e., the array of bytes that is inside the RowOutput for the value).
  This toString() method should help you when debugging your marshalling code.
You will implement the marshall() method, which should take the column values of the InsertRow object and marshall them into byte arrays for the key/value pair that will eventually be inserted into the B+tree for the table.
Important: marshall() should not interact with Berkeley DB at all. In particular, it should not create any DatabaseEntry objects or attempt to add them to the BDB database.
Rather, marshall() should only do the following:
Determine the correct offset values and store them in the array to which the offsets field in the InsertRow object refers.
Write the appropriate values into the buffers represented by the keyBuffer and valueBuffer fields, each of which refers to a RowOutput object.
See below for more detail about each of these tasks.
Notes:
Each key/value pair should have the format that we discussed in the lecture notes on the logical-to-physical mapping. The key portion of the key/value pair should be based on the value of the primary-key column. The value portion should consist of a header of offsets followed by the values of the non-primary-key, non-null columns.
The key portion of the key/value pair will be stored in the RowOutput object assigned to the keyBuffer field of the InsertRow object. The value portion will be stored in the RowOutput object assigned to the valueBuffer field.
Because RowOutput objects fill their associated byte arrays from left to right, you will need to determine all of the offsets that belong in the header before you begin marshalling the column values themselves. Store these offsets in the array to which the InsertRow object’s offsets field refers.
Once all of the offsets have been computed and stored in the offsets array, you can begin the process of writing into the RowOutput objects using the appropriate methods.
The InsertRow constructor takes a reference to the corresponding Table object as a parameter, and it stores that reference in a field called table. Your code can obtain any column information that it needs from the Table object and its associated Column objects.
The getLength() method in a Column object gives the actual length in bytes of all columns except VARCHARs. In the case of VARCHARs, you should determine the length by invoking the String.length() method on the actual value.
Because the column values are stored in an array of type Object, you will need to use type casts in order to treat them as objects of their actual types. For example, to treat values[i] as a String, you would need to do something like (String)values[i]. Consult the Column class for the method you should use to determine the type of a given column.
Integer values are stored in the values array as objects of Java’s Integer class, and real values are stored as objects of Java’s Double class. When marshalling these values, you will need to convert them to primitive values of type int and double, and you should use the Integer.intValue() and Double.doubleValue() methods to do so. For example, if you have an Integer object named val, you can convert it to an int by making the method call val.intValue().
The RowOutput methods that you will use for writing the offsets and column values are inherited from the DataOutputStream class, so you should make sure to review the API of that class.
When marshalling a String value, you should use the writeBytes() method, not the writeUTF() method.
You should assume that all offset values are small enough to be represented by a two-byte integer, and thus you should use the writeShort() method for them.
To keep the marshall() method from getting too large, you may want to add one or more private helper methods that can be called to do part of the overall task.
Important: If you write a helper method that uses one or more of the RowOutput methods, you must include a throws clause in the header of the method like the one we’ve given you for the marshall method:
public void marshall() throws IOException {
Review the Table, Column, RowOutput, and DataOutputStream classes as needed.
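To make the idea of "offsets first, values second" concrete, here is a minimal standalone sketch of the offset arithmetic. It does not use the InsertRow, Table, or Column classes: the class name, the computeOffsets() helper, and the PKEY_MARKER constant are hypothetical, the byte lengths are passed in directly, and the sketch assumes that every non-primary-key column is non-null. Handling NULL values is left to the format described in the lecture notes.

```java
import java.util.Arrays;

class OffsetSketch {
    // Special offset recorded for the primary-key column (hypothetical constant name).
    static final int PKEY_MARKER = -2;

    /**
     * lengths[i] is the number of bytes that column i's value occupies in this row.
     * Returns one offset per column plus a final end-of-record offset.
     * Assumption: every column other than the primary key is non-null.
     */
    static int[] computeOffsets(int[] lengths, int pkeyIndex) {
        int numCols = lengths.length;
        int[] offsets = new int[numCols + 1];
        // The header holds one two-byte offset per entry, so the first
        // column value starts right after it.
        int next = 2 * (numCols + 1);
        for (int i = 0; i < numCols; i++) {
            if (i == pkeyIndex) {
                offsets[i] = PKEY_MARKER;   // the key lives in the key buffer, not here
            } else {
                offsets[i] = next;
                next += lengths[i];
            }
        }
        offsets[numCols] = next;            // offset of the end of the record
        return offsets;
    }

    public static void main(String[] args) {
        // A 7-byte primary key, a 6-character string, and a 4-byte integer.
        System.out.println(Arrays.toString(computeOffsets(new int[] {7, 6, 4}, 0)));
    }
}
```

In your own marshall() method the lengths would come from the Column objects and from the actual VARCHAR values rather than from a hard-coded array.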
Testing your marshall() method
You should test your marshall() method thoroughly before proceeding to the next problem.
We’ve given you the following tools for doing so:
The RowOutput class includes a toString() method that shows the current contents of the underlying byte array.
The InsertRow class includes its own toString() method that shows the current values in the InsertRow object’s offsets array and the contents of the byte arrays underlying the RowOutput objects assigned to its keyBuffer and valueBuffer fields.
The starter code that we’ve given you in the execute() method of InsertStatement will create the necessary InsertRow object, call your marshall() method, and – if the DEBUG constant in the DBMS class is true – print the InsertRow object so that you can examine the values of its fields. (Note that the row won’t actually be inserted until you complete the execute() method as part of Problem 6, but the existing code is sufficient for testing the marshall() method.)
Given these tools, you can:
Compile and run the DBMS program as described above.
Create a table using a CREATE TABLE command. The starter code already includes everything needed to carry out this type of command.
Enter one or more INSERT commands for the newly created table, and see if the output from printing the InsertRow object looks correct.
For example, let’s say that you enter these two SQL commands:
CREATE TABLE Movie(id CHAR(7) PRIMARY KEY, name VARCHAR(64), runtime INT);
INSERT INTO Movie VALUES ('2294629', 'Frozen', 102);
If your marshall() method is working correctly, you should see the following as part of the output of the debugging print statement:
for the offsets field:
[-2, 8, 14, 18]
Because there are three columns, there are four offsets. The -2 indicates that the first column (id) is the primary key. The next two offsets (8 and 14) are the offsets of the name and runtime column values, and the 18 is the offset of the end of the record.
for the key buffer (i.e., the keyBuffer field):
[50, 50, 57, 52, 54, 50, 57]
The numbers in this byte array represent the ASCII codes for the characters in the id value '2294629': 50 for the character '2', 57 for the character '9', etc.
for the value buffer (i.e., the valueBuffer field):
[-1, -2, 0, 8, 0, 14, 0, 18, 70, 114, 111, 122, 101, 110, 0, 0, 0, 102]
This byte array begins with 8 bytes for the offset table:
The first two bytes ([-1, -2]) represent the special -2 offset for the primary-key column. When -2 is represented using a two-byte integer, the individual bytes end up being the 8-bit representations of -1 and -2.
In general, when you use multiple bytes to store a negative number whose absolute value is relatively small, the rightmost byte will show the negative number itself, and all of the remaining bytes will show -1. For example, if we stored -3 using two bytes, we would see [-1, -3] as its two bytes. If we stored -10 using four bytes, we would see [-1, -1, -1, -10].
The next two bytes ([0, 8]) represent the offset of the name column, which has an offset of 8 bytes because it comes immediately after the offset table, which has a length of 4*2 = 8 bytes.
The next two bytes ([0, 14]) represent the offset of the runtime column, which has an offset of 8 + 6 = 14 bytes in this particular row.
The next two bytes ([0, 18]) represent the offset of the end of the record, which is 14 + 4 = 18 in this particular row.
The next 6 bytes represent the ASCII codes for 'Frozen': 70 for 'F', 114 for 'r', etc.
The final 4 bytes ([0, 0, 0, 102]) represent the 4-byte integer stored for the runtime value of 102.
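If you want to see where these bytes come from without running the full DBMS, the following sketch reproduces the key and value bytes for the Movie row using only java.io classes. It is an illustration under stated assumptions, not the InsertRow implementation: the class and method names are hypothetical, the offsets are passed in already computed, and a plain DataOutputStream over a ByteArrayOutputStream stands in for RowOutput (whose writing methods, as noted above, are inherited from DataOutputStream).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ValueLayoutSketch {

    /** Key portion: just the characters of the primary-key value, one byte each. */
    static byte[] keyBytes(String id) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBytes(id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Value portion: a header of two-byte offsets, then the non-key column values. */
    static byte[] valueBytes(int[] offsets, String name, int runtime) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int offset : offsets) {
                out.writeShort(offset);   // two bytes per offset
            }
            out.writeBytes(name);         // one byte per character, no length prefix
            out.writeInt(runtime);        // four bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(keyBytes("2294629")));
        System.out.println(Arrays.toString(
            valueBytes(new int[] {-2, 8, 14, 18}, "Frozen", 102)));
    }
}
```

The sketch wraps IOException only so that it can be called from code without a throws clause; in the assignment, marshall() and any helper methods declare throws IOException instead.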
Note: When you store larger integers, the resulting bytes can be harder to interpret. Here are some examples:
If you stored a runtime of 150 in the Movie table that we created above, you would see the bytes [0, 0, 0, -106] for the runtime. This stems from the fact that when only one byte is used to store a signed integer (one that could be negative), it can store any value between -128 and 127. When we store 150 using two or more bytes, the 8 bits in the rightmost byte look like they represent a negative number, because 150 can’t actually be represented using an 8-bit signed integer.
If you stored a runtime of 300, you would see the bytes [0, 0, 1, 44] for the runtime. That’s because we need more than 8 bits to store 300 as a binary number. In fact, when we convert 300 to binary, we get a 9-bit number: 100101100. When these 9 bits are stored as part of a 32-bit integer, we get:
00000000 00000000 00000001 00101100
The bits in the rightmost byte represent the integer 44, and the bits in the byte to its left represent the integer 1.
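If you would like to check how a particular integer will appear in the debugging output, a small standalone helper along the following lines can print its bytes. This is only a sketch; the class and method names are hypothetical, and it uses DataOutputStream directly rather than RowOutput.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class IntBytesSketch {

    /** The four bytes that writeInt() produces for v, most significant byte first. */
    static byte[] asInt(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** The two bytes that writeShort() produces for v. */
    static byte[] asShort(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeShort(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("150 -> " + Arrays.toString(asInt(150)));
        System.out.println("300 -> " + Arrays.toString(asInt(300)));
        System.out.println("-10 -> " + Arrays.toString(asInt(-10)));
        System.out.println("-3 (short) -> " + Arrays.toString(asShort(-3)));
    }
}
```

Each printed byte is shown as a signed value between -128 and 127, which is why 150 shows up with a rightmost byte of -106 (150 - 256).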
Try inserting other rows as well, and convince yourself that your marshall() method is working in all cases. For example, does it work correctly when one of the column values is NULL?
Problem 6: Completing INSERT commands
7 points
We have given you the start of the execute() method of the InsertStatement class, which is used to carry out INSERT commands. As mentioned earlier, our provided code uses an InsertRow object to prepare the row for insertion – marshalling it into a key/value pair. You will need to complete the execute() method by writing code that:
uses the byte arrays from the RowOutput objects in the InsertRow object to construct the necessary Berkeley DB objects for the key/value pair
adds the key/value pair to the underlying BDB database.
Notes:
The insertion should fail if there is already a key/value pair with the specified key. You should choose the BDB insertion method that will return a special value when the specified key already exists. Your code should handle this return value by throwing an exception with an appropriate error message. See our CreateStatement code for examples of throwing an exception. Note: The error message will be printed by the code that we’ve provided in the catch block. You just need to create an exception with the appropriate error message and throw the exception. See our sample interaction below for what the error message should look like.
If the insertion is successful, your code should print an appropriate message to indicate that fact. See our sample interaction below for what it should look like.
Review the Table and InsertRow classes as needed, as well as the Berkeley DB Database class.
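The control flow described in the notes (attempt an insertion that refuses to overwrite, inspect the result, and throw if the key was already present) can be sketched with an ordinary Java map standing in for the BDB database. Everything here is an analogy under stated assumptions: the class name, the insert() method, and the exception message are hypothetical, and Map.putIfAbsent() is a java.util method, not a Berkeley DB one. Which BDB method provides the corresponding behavior is something you should determine from the documentation of the Database class.

```java
import java.util.Map;
import java.util.TreeMap;

class NoOverwriteSketch {
    // Stand-in for the table's database: keys are kept in sorted order.
    private final Map<String, String> rows = new TreeMap<String, String>();

    /** Adds the pair only if the key is new; otherwise throws without changing the map. */
    void insert(String key, String value) {
        String existing = rows.putIfAbsent(key, value);
        if (existing != null) {
            // Hypothetical message; see the sample interaction for the real wording.
            throw new IllegalStateException("key already exists: " + key);
        }
    }

    String get(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        NoOverwriteSketch table = new NoOverwriteSketch();
        table.insert("01000", "CS 460");
        table.insert("00050", "Math 123");
        try {
            table.insert("00050", "Physics 211");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(table.size() + " rows; 00050 is still " + table.get("00050"));
    }
}
```

The important property is that the failed insertion leaves the existing row untouched, which is what distinguishes this kind of insertion from one that silently overwrites.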
Problem 7: Table iterators and unmarshalling
20 points
In order to execute a SELECT command, your DBMS needs to be able to iterate over the rows in one or more tables, and to access the values of the columns in those rows. In this problem, you will complete the implementation of a table iterator that will be able to iterate over all or some of the rows in a single table and access the values of the columns. We can associate a WHERE clause with such an iterator, in which case it will only visit rows that satisfy the WHERE clause.
Each table iterator will be an instance of the provided TableIterator class. We have already implemented most of the methods of this class for you, including:
- a TableIterator constructor that takes an already opened table object and initializes the state needed by the table iterator, including:
  - a cursor for the underlying BDB database
  - two initially empty DatabaseEntry objects called key and value. These DatabaseEntry objects will be used by the cursor methods to retrieve the current key/value pair.
  The constructor also examines the columns mentioned in the SQL statement for which this iterator is needed, and it associates this iterator with those columns; doing so allows the code that evaluates the WHERE clause to use the iterator to obtain the column values that it needs.
- a first() method that positions the iterator on the first tuple of the table.
- a next() method that advances the iterator to the next tuple specified by the SELECT command.
- a getColumn() method that takes an index n and returns a Column object for the nth column in the table associated with the iterator. The leftmost column has an index of 0.
- a close() method that closes the cursor associated with the iterator.
- a printAll() method that will be called to iterate over all rows in the associated table and print them out.
For this assignment, you should implement the method called getColumnVal() that takes an index n and returns the value of the nth column in the tuple on which the iterator is currently positioned. To do so, it will need to unmarshall the appropriate value from the BDB key/value pair associated with that tuple, and it should use the metadata that you included when you marshalled the tuple to efficiently access the value of the specified column. See the notes below for more detail.
Notes:
We have already given you the code needed to handle the two types of exceptions that are mentioned in the comments before the method.
The code that you write should assume that the underlying cursor has already been positioned on an appropriate key/value pair. The key can be accessed using the DatabaseEntry object to which the TableIterator’s key field refers, and the value can be accessed using the DatabaseEntry object to which the TableIterator’s value field refers.
Your code will need to use one or two RowInput objects to unmarshall the value of the specified column.
For example, to create a RowInput object that is based on the value portion of the current key/value pair, you would do something like the following:
RowInput valIn = new RowInput(this.value.getData());
Your getColumnVal() method should not perform unnecessary reads. Rather, it should only read (1) the offset or offsets needed to determine where the column value is located and (when necessary) the length of the column value, and (2) the column value itself.
The RowInput class includes two methods for each type of value:
- one that reads a value at a specified offset from the start of the byte array (e.g., readIntAtOffset() and readDoubleAtOffset()). These methods jump to the specified position in the underlying byte array before performing the read.
- one that reads a value at the current offset from the start of the byte array (e.g., readNextInt() and readNextDouble()). When the RowInput object is created, the current offset is set to 0. After each read, the current offset is updated to be the offset of the byte that comes immediately after the value that was just read.
The RowInput class also includes a toString() method that you may find useful when debugging. It returns a string that includes the contents of the underlying byte array and the current offset within that array.
When unmarshalling the key portion of the key-value pair, you can use the getSize() method in its DatabaseEntry object to determine the key’s length in bytes.
Review the Table, Column, and RowInput classes as needed, as well as the Berkeley DB DatabaseEntry class.
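To illustrate what "only the necessary reads" means, the sketch below pulls individual column values out of the marshalled value bytes from the Movie example in Problem 5. It is a standalone illustration, not the getColumnVal() implementation: it uses java.nio.ByteBuffer as a stand-in for RowInput (whose exact method signatures are not assumed here), the class and method names are hypothetical, and it assumes that both the column's own offset and the next offset in the header are ordinary offsets rather than special markers. Primary-key columns and NULL values need separate handling.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class UnmarshallSketch {
    // Value bytes for the Movie row ('2294629', 'Frozen', 102) from Problem 5.
    static final byte[] MOVIE_VALUE = {
        -1, -2, 0, 8, 0, 14, 0, 18, 70, 114, 111, 122, 101, 110, 0, 0, 0, 102
    };

    /** Reads a string column: two offset reads give its start and its length. */
    static String readString(byte[] value, int colIndex) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        int start = buf.getShort(2 * colIndex);       // this column's offset
        int end = buf.getShort(2 * (colIndex + 1));   // assumed to be a regular offset
        return new String(value, start, end - start, StandardCharsets.US_ASCII);
    }

    /** Reads an integer column: one offset read, then the 4-byte value itself. */
    static int readInt(byte[] value, int colIndex) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        int start = buf.getShort(2 * colIndex);
        return buf.getInt(start);
    }

    public static void main(String[] args) {
        System.out.println(readString(MOVIE_VALUE, 1));
        System.out.println(readInt(MOVIE_VALUE, 2));
    }
}
```

Note that neither method scans the header or the earlier column values; each jumps directly to the bytes it needs, which is the point of storing the offsets at the start of the record.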
Problem 8: SELECT * for a single table
8 points
Implement the execute() method of the SelectStatement class, and any necessary helper methods. For this assignment, you will only need to support SELECT * commands involving a single table.
Your method will need to open the table associated with the SELECT command by using the open() method that we have provided for Table objects. (See the start of the execute() method for InsertStatement for an example of this.) The open() method will get the table’s catalog metadata and add it to the Table object, and it will also open the underlying BDB database if it isn’t already open.
Your code should then create a TableIterator for the appropriate table (assigning it to the variable iter that we have given you) and invoke the printAll() method on it. This method, which we have provided in TableIterator, will invoke the appropriate iterator methods to obtain the table’s column values and display them with appropriate formatting. Note that your SELECT-statement code does not need to advance the iterator by calling next(); printAll() already does all of the iteration – and all of the other work – for you, using the TableIterator method that you wrote.
Your execute() method should check for currently unsupported SELECT commands:
- those with more than one table in the FROM clause
- those with one or more columns specified in the SELECT clause.
If either of those cases holds, you should throw an exception with an appropriate error message. See our sample interaction below for what these messages should look like.
In addition, your method should make sure that there is an existing table with the given name; the open() method of the Table object should make it easy to do so. When there is no existing table with the specified name, you should create and throw an exception with no error message, because code in the open() method will already print the necessary error message.
If there are no errors, your method should finish by printing a message that includes the number of tuples selected. See our sample interaction below for what these messages should look like.
Notes:
When creating the TableIterator, make sure that you use the local variable iter that we have declared for you at the top of the method. Doing so will allow you to take advantage of the code that we have given you at the end of the method, which closes the iterator, and thus its underlying cursor.
The TableIterator constructor takes a reference to an object of type SQLStatement. You should pass in a reference to the SelectStatement object on which the execute method was invoked, which you can do by using the implicit parameter this. In addition, you should pass in true for the third parameter of the constructor:
iter = new TableIterator(this, ..., true);
When you call the printAll() method, you should pass in System.out as the parameter, so that the results will be displayed on the console. (The reason that we make printAll() take a parameter for this is for added flexibility. If we wanted to, we could pass in a parameter that corresponds to a text file, and the results would be written to that file instead of to the console.)
Review the Table, TableIterator, and SQLStatement classes as needed.
Sample interaction
To give you a sense of what your DBMS’s output should look like, we have provided a sample interaction below. Note: We set the DEBUG constant to false in DBMS.java before we ran these commands.
Enter command (q to quit): CREATE TABLE Course(name VARCHAR(20), enrollment INT);
Created table Course.
Enter command (q to quit): SELECT * FROM Course;
| name | enrollment |
---------------------------------------
Selected 0 tuples.
Enter command (q to quit): DROP TABLE Course;
Dropped table Course.
Enter command (q to quit): SELECT * FROM Course;
Course: no such table
Enter command (q to quit): CREATE TABLE Course(id CHAR(5) PRIMARY KEY, name VARCHAR(20));
Created table Course.
Enter command (q to quit): INSERT INTO Course VALUES ('01000', 'CS 460');
Added 1 row to Course.
Enter command (q to quit): INSERT INTO Course VALUES ('00050', 'Math 123');
Added 1 row to Course.
Enter command (q to quit): INSERT INTO Course VALUES ('02050', NULL);
Added 1 row to Course.
Enter command (q to quit): INSERT INTO Course VALUES ('00050', 'Physics 211');
There is an existing row with the specified primary key. Could not insert row.
Enter command (q to quit): SELECT * FROM Course;
| id | name |
----------------------------------
| 00050 | Math 123 |
| 01000 | CS 460 |
| 02050 | null |
Selected 3 tuples.
Enter command (q to quit): SELECT name FROM Course;
Specifying column names in the SELECT clause is not supported.
Enter command (q to quit): CREATE TABLE Foo(id CHAR(5) PRIMARY KEY, year INT);
Created table Foo.
Enter command (q to quit): SELECT * FROM Course, Foo;
Specifying multiple table names in the FROM clause is not supported.
Enter command (q to quit): q
Submitting your work for Part II
Coming soon!