PHP/MySQL Basics

Thursday, April 17th, 2008

1. Establishing a connection to a MySQL Database Using PHP:

You will want to create a variable to hold a reference to your DSN (Data Source Name). A DSN is a specific data structure used to describe a connection to a database. The structure of a DSN varies from language to language.

In PHP a basic DSN take the following form:

("$database_name","$database_user","$database_password")

This array is passed as arguments to the mysql_connect function as follows:

$dsn = mysql_connect("$server_name","$database_user","$database_password");

Remember a PHP variable always takes the form of $variable and can hold any kind of information: a string, a number, a function return, a boolean value, etc.

2. Selecting the Database you want to work with

Once you have established your MySQL connection, you will want to select the database with which you will be working. This is done through the mysql_select_db function:

mysql_select_db("my_database",$dsn);

The $dsn variable will now point to the database “my_database” on the server “localhost”. All table level queries will be executed against this database unless we change to a different database using the mysql_select_db function.

Notice how the mysql_select_db() statement is formated. This is the standard format for funtion calls in PHP:

function_name($argument1, $argument2,   $argument3, ...$argument_n);

Now we’re ready to begin executing queries against the tables in our selected database.

3. Basic queries:

MySQL queries in PHP are executed using the mysql_query function. The results of the mysql_query function must them be parsed by any of a number of available methods into user-intelligible output. For efficiency’s sake we assign the results of a MySQL command to a variable.

Here’s an example of a basic MySQL query executed from PHP:

$sql = "select Name, SurfaceArea, Population from Country limit 5";
$result = mysql_query($sql);

We can use the $result variable with a number of other commands to examine the data returned by our query.

mysql_num_rows tells us the number of rows of data (individual entries) returned:

$numRows = mysql_num_rows($result);
var_dump($numRows);
// returns int(5)

mysql_fetch_row retrieves the first row of data from our results (this is a more useful command in queries that are expected to only return one row). The row is returned as an array of the data in each of the fields in that row:

$rowValues = mysql_fetch_row($result);
var_dump($rowValues);
// returns
// array(3)
// { [0]=>  string(11) "Afghanistan"
// [1]=>  string(9) "652090.00"
// [2]=>  string(8) "22720000" }

mysql_fetch_assoc is used to retrieve an associative array of data from the table. The array takes the format of an array of rows represented by individual arrays of field values:

while ($assocValues = mysql_fetch_assoc($result))     {
var_dump($assocValues);
echo "<br />";
}
// returns
// array(3)
// { ["Name"]=>  string(11) "Afghanistan"
// ["SurfaceArea"]=>  string(9) "652090.00"
// ["Population"]=>  string(8) "22720000" }
// array(3)
// { ["Name"]=> string(11) "Netherlands"
// ["SurfaceArea"]=> string(8) "41526.00"
// ["Population"]=> string(8) "15864000" }
// array(3)
// { ["Name"]=> string(20) "Netherlands Antilles"
// ["SurfaceArea"]=> string(6) "800.00"
// ["Population"]=> string(6) "217000" }
// array(3)
// { ["Name"]=> string(7) "Albania"
// ["SurfaceArea"]=> string(8) "28748.00"
// ["Population"]=> string(7) "3401200" }
// array(3)
// { ["Name"]=> string(7) "Algeria"
// ["SurfaceArea"]=> string(10) "2381741.00"
// ["Population"]=> string(8) "31471000" }

MySQL queries executed from PHP use a data pointer object to describe the read/write position within the current results. Because of this, whenever we work with array data we will need to move the data pointer through the results by using a loop (as above) or by explicitly pointing to a particular record.

Moving the data pointer within the result dataset is accomplished using the mysql_data_seek function:

mysql_data_seek($result, 3);
$assocValues = mysql_fetch_assoc($result);
var_dump($assocValues);
// returns
// array(3)
// { ["Name"]=> string(7) "Albania"
// ["SurfaceArea"]=> string(8) "28748.00"
// ["Population"]=> string(7) "3401200" }

mysql_fetch_array allows you to retrieve either a numerical or associative array from your results set:

while($arrayValues = mysql_fetch_array($result, MYSQL_NUM)) {
var_dump($arrayValues);
}
// returns
// array(3)
// { [0]=> string(11) "Afghanistan"
// [1]=> string(9) "652090.00"
// [2]=> string(8) "22720000" }
// array(3)
// { [0]=> string(11) "Netherlands"
// [1]=> string(8) "41526.00"
// [2]=> string(8) "15864000" }
// array(3)
// { [0]=> string(20) "Netherlands Antilles"
// [1]=> string(6) "800.00"
// [2]=> string(6) "217000" }
// array(3)
// { [0]=> string(7) "Albania"
// [1]=> string(8) "28748.00"
// [2]=> string(7) "3401200" }
// array(3)
// { [0]=> string(7) "Algeria"
// [1]=> string(10) "2381741.00"
// [2]=> string(8) "31471000"}
while($arrayValues = mysql_fetch_array($result, MYSQL_ASSOC)) {
var_dump($arrayValues);
}
// returns same as mysql_fetch_assoc
while($arrayValues = mysql_fetch_array($result)) {
var_dump($arrayValues);
}
// returns the both a numeric and an     associative array describing the
// result set

Relational Database Basics

Thursday, April 17th, 2008

a database-
a system which allows for the quick storage, retrieval, and manipulation of data

there are several different types of database systems available, but we will be concentrating on Relational Databases, using the technology MySQL

OK, what is a relational database then? How does it differ from other databases?

A relational database (RDBMS, Relational Database Management System) is a database system which uses a set of tables to hold related data.
A table is a collection of data records in which all records contain the same fields.
A field can be described as a container for a basic unit of data.
An entry in an RDBMS table which defines one set of related fields is referred to as a row.
While the RDBMS model is not hierarchical, the description from set to unit of the system is as follows:

DATABASE -> TABLE -> ROW -> COLUMN -> FIELD

The above example is somewhat misleading, as there is no distinct hierarchy between column and row.
Each table by definition is composed of rows and columns, and field values may be accessed by referring to either
a column or a row or both. Furthermore, the order or columns and rows in the definition of the database is insignificant:
data is retrieved, stored, and manipulated based on field values as particular insertion points as opposed to in any sequential order.

What is an insertion point?
An insertion point is defined as a record within a database at which a particular row/column intersection occurs,
or at which a particular field value is returned. Insertion points are used to specify where to begin read/write operations
for a particular database query, and upon which records in the database to perform the given query’s operation.

What is a query?
A query is the term used to define a command given to a database.
For our purposes, a query is a command given via Structured Query Language (SQL) syntax.

Each of the entities in the relationship described above has its own set of properties, such as name,
value, data type, data length, etc.

Here’s an example:

DATABASE = movies
TABLE = movie_filename
ROW = 3
COLUMN = filename

In the above example, suppose we are dealing with an application which is meant to store and retrieve video files for playback in an online system.
The database storing the information which describes the video files used in the application is called “movies”.
Within the movie database are various tables which describe each aspect of the video files.
One such table is reserved for the filenames of the video files to be accessed.
Given the conditions above, which calls the “filename” field of the third entry in the table “movie_filename”, the following information might be returned:

  1. Field Name = “filename”
  2. Field Value = “dinosaurs.mov”
  3. Field Type = VARCHAR
  4. Field Length = 255
  5. NULL = NOT NULL
  6. Field Index = FALSE

What does the above tell us?

  1. The field we have asked for belongs to the column described as “filename”.
    The designer of the database is letting us know that data appropriate to the description and function of “filename” should be placed in this field.
  2. The value of the field “filename” in the third row of the table “movie_filename” is “dinosaurs.mov”.
  3. The type of data expected by the particular column to which this field belongs. MySQL data is strongly typed- that is, each column is defined as capable of storing only one kind of data. Attempts to store data that do not conform to the columns defined data type will fail.
    In the above example the data type is VARCHAR, or variable character, indicating that a string composed of any character set is permissible.
  4. The maximum length of data to be stored by a given column.
    Each field entry for this column must be no greater than this length; otherwise the entry will be truncated to the maximum column length or fail.
  5. Flags whether NULL values are allowed in the given field’s column.
    In this case the column is defined to not accept NULL values. NULL indicates the absence of data and type. It is recommended to always use real data types as opposed to NULL values to conform to our strongly typed data model. This allows us to test for data type when retrieving information from the database in order to assure the accuracy of our data.
  6. Indicates that the column in question is not an index; i.e. this column is not used to keep track of the order and number of occurrences of all entries within this particular table.
    An index is a close analog to what we might imagine as the index of a cookbook: it is used to look up related categories of data on a granular level. So while a higher level table of contents might give us an arbitrary category for main dishes, an index will give as a granular listing of all recipes which include garbanzo beans.
    An index is an organizational tool for atomic values.

What do you mean by atomic values?
“Atomicity” is an arbitrary term regarding the composition of a particular data field. An atomic value is defined as one that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions).” (Codd, E.F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.)

In practice there is no such thing as an atomic value, as almost any entity can be broken down into component entities (a sentence into words, a word into letters, a letter into a byte-code sequence, etc). The theory of atomic values is best applied by limiting each field entry to only one value of the data type and semantic type expected by the column. For example, in a database for an address book application, a table could be set up to hold information as such:

TABLE contacts
Column: contact_name (VARCHAR(255), NOT NULL)
Column: phone_number (TEXT, NOT NULL)
Column: email (VARCHAR(55), NOT NULL)

The above table has one column used as a primary index for quickly searching and organizing the data in the table, phone_numberID.
The second column is used to store phone numbers.
It has a data type of text, which will allow for the storage of an arbitrarily long entry of mixed characters.

Suppose one of the entries in the address book contained a home number, a work number, and cell number.
A storage method ignoring the concept of atomicity using the above table would place all three numbers into one entry as such:

contact_name: “john smith”
phone_number: “773-555-1212, 773-555-1234, 773-555-5678″
email: jsmith@null.net

The above fails the atomicity test as three distinct values for phone number are stored in the phone_number field.

This example also violates another principle of database design, Normalization.
Normalization is a database design technique that emphasizes the use of multiple tables to reduce the amount of duplicate information in a database.

The above example disregards the concept of normalization by storing the contact’s name in the same table as the phone number and email, and by storing more than one phone number in the phone_number field. This makes retrieving and parsing the data difficult, as one is more likely to retrieve more information than required to perform a specific task. Failure to adhere to this practice also makes updating the database more difficult, as updating a contact’s email address in the above requires modifications to the row containing all of the user’s information.

A normalized version of the above address book database would look something like this:

TABLE contacts
Column: contact_ID (INT(11), NOT NULL, PRIMARY KEY, AUTO INCREMENT)
Column: contact_fname (VARCHAR(255), NOT NULL)
Column: contact_lname (VARCHAR(255), NOT NULL)

TABLE phone_number
Column: phone_number_ID (INT(11), NOT NULL, PRIMARY KEY, AUTO INCREMENT)
Column: contact_ID (INT(11), NOT NULL, FOREIGN KEY)
Column: phone_number (VARCHAR(12), NOT NULL)

TABLE email
Column: email_ID (INT(11), NOT NULL, PRIMARY KEY, AUTO INCREMENT)
Column: contact_ID (INT(11), NOT NULL, FOREIGN KEY)
Column: email (VARCHAR(55), NOT NULL)

And the above entry would exist in the normalized database as follows:

TABLE contacts
Column: contact_ID: 1
Column: contact_fname: john
Column: contact_lname: smith

TABLE phone_number
Column: phone_number_ID: 1
Column: contact_ID: 1
Column: phone_number: “773-555-1212″
Column: phone_number_ID: 2
Column: contact_ID: 1
Column: phone_number: “773-555-5678″
Column: phone_number_ID: 3
Column: contact_ID: 1
Column: phone_number: “773-555-1234″

TABLE email
Column: email_ID: 1
Column: contact_ID: 1
Column: email: jsmith@null.net

The above design simplifies the data access methods necessary to retrieve the contacts information.
If we want to grab only the users first name and the second instance of their phone number, we can do so by simply
asking for those two fields as opposed to asking for an entire row from our table and then parsing it.

We can also more easily update and refine the information in our database.
If our contact gets a second email address, it can be easily inserted into the email table.
If the contact shuts of their LAN line we can simply remove that entry from the phone numbers table. All without ever touching our main entry for the contact.

If we want to know whether a phone number is a home or cell or business number, we can add the following table, with the following entries:

phone_number_type
column: phone_number_type_ID (INT(11), NOT NULL, PRIMARY KEY, AUTO INCREMENT)
column: phone_number_type (VARCHAR(4),NOT NULL)

phone_number_type_ID: 1
phone_number_type: home
phone_number_type_ID: 2
phone_number_type: cell
phone_number_type_ID: 3
phone_number_type: office

We can then update our phone_number table as follows:
1. add a new field, phone_number_type_ID
2. update our entries to include values for phone_number_type_ID

Again, all of this can be done without modifying any other tables or entries in the database. A similar procedure could be followed for email.
Other tables could be added for addresses, IMs, titles, associations, etc, again never modifying the structure of any of the existing tables and never compromising the integrity of our data.

The ability to relate and process information across multiple tables as in the above example is at the heart of what make relational databases tick.
Thoughtful, pragmatic database design begins with the concepts of atomicity and normalization.
Following these methods from the beginning of your project will save you more than a few grey hairs when you are in the midst of building your application.

links to online projects

Heat and the Heartbeat of the City, Andrea Polli
http://turbulence.org/Works/heat/index2.html

Earthquake Simulations:
http://memento.ieor.berkeley.edu/seismo.html?
http://www.ncedc.org/ftp/outgoing/userdata/quicklook/BKS.BK.LHZ.current.gif
http://memento.ieor.berkeley.edu/earthquake3.html
http://earthquake.usgs.gov/regional/nca/1906/18april/got_seismogram.php
http://www.abag.ca.gov/bayarea/eqmaps/eqtrans/result.html
http://gis.abag.ca.gov/?go=liq
http://americahurrah.com/images/EQmap.jpg
http://www.lib.berkeley.edu/EART/1906/g4364_s5_1906_w3_1.jpg
http://earthquake.usgs.gov/regional/nca/virtualtour/earthquake.php
http://www.ieor.berkeley.edu/~goldberg/art/Ballet-Mori/videos/Ballet-Mori-AVI.avi
http://allshookup.org/mpgvideo/mpgvideo.htm