Learning SQL for Arctos

(See these queries and more in a single SQL file)

1. The Arctos SQL interface tool is here

2. Write your SQL in an editor with syntax coloring, then paste it into the box. Good, free coding editors are BBEdit for Mac, and NotePad++ for Windows, but there are thousands of others to choose from.

3. As an Arctos user (i.e., not a developer) we can only use the SELECT statement of SQL (Not INSERT, DELETE, UPDATE…). This greatly reduces what you have to learn. Access via SQL is read only so you cannot mess anything up!

4. You will find that almost all the data you need to query is in a single table in Arctos called flat - this is a composite, flattened table made from the tens (hundreds) of tables in Arctos. This table also greatly reduces what you have to learn (at first) - you do not need to make cross-table JOINs in your SQL. To see the fields in flat enter:

SELECT * FROM flat LIMIT 1

(do not omit the LIMIT 1!) Or… see the field list here.

5. A basic SELECT statement goes like this:

SELECT   <output fields>
FROM     <table>
WHERE    <some set of logical filters>
ORDER BY <some output fields to sort by>
LIMIT    <how many lines you want>

An example:

SELECT guid, scientific_name, family
FROM flat
WHERE (dec_lat  > 50   AND
       dec_lat  < 60   AND
       dec_long < -160 AND
       dec_long > -170)
  AND  began_date > '2010-01-01'
  AND  guid_prefix = 'UAM:Herb'
ORDER BY family, scientific_name
LIMIT 100

6. Click the CSV button if you want to download the results.

7. Sometimes you will find that not all the data you need is in the flat table.

There are hundreds of tables in Arctos, linked together via shared field values and identifiers. The basic goal in a relational database is to “normalize” the data, which means that no information element is stored redundantly (e.g., in two or fields or two records). For example, if there are two collections from the same place, the longitude and latitude of that place should not be stored twice in the same collections table. Note: the cached flat table is denormalized table with lots of redundancy of this kind.

The link between tables can be visualized in an Entity-Relationship (“ER”) diagram. There is no recent ER diagram for Arctos, but here is one from 2013. You can however see a list of tables here; in the section ‘Constraints’ below the fields you can see which fields (‘OriginatesFrom’) link to fields in other tables (‘ReferencesColumn’).

See the SQL examples for how to JOIN tables.

8. As you start to write WHERE filters you will probably soon find Regular Expressions to be invaluable. These strings of wildcard symbols allow the matching of just about any substring you can imagine. In PostgreSL, you can apply them with the ~ operator. E.g. WHERE guid ~ 'UAMb?:(Herb|ALA)' will match ‘UAM:Herb’, ‘UAMb:Herb’ and ‘UAM:Alg’.

9. Even though this are ‘next level’ SQL, be aware that you can nest SELECT statements to query a table you create with an interior query: SELECT * FROM (SELECT ....) AS a WHERE a.x ... ;. Powerful stuff!

10. More resources: