Getting started with SKiN

 

Best practices for setting up data import

 

Now that you have SKiN, you want to get your data loaded, so that you can see your business on the map. Follow these simple guidelines to get there quickly.

 

1. FILE FORMAT

 

SKiN imports the following table files:   

 

Microsoft Excel format files .xls and .xlsx  

 

Comma Separated Values .csv files 

2. GEOGRAPHICAL REFERENCE

 

To map your data within SKiN, your data needs to contain a geographical indicator that SKiN can geocode and represent on the map. Geocoding describes a way of bringing non-map data into SKiN such that its geographic properties can be identified and the records positioned in space.

 

The three main types of Geographical Reference for SKiN are:

 

Postcodes: For the UK, SKiN will geocode the postcodes in your data against postcode data held in SKiN. SKiN stores the Grid Coordinates for every postcode in the country. The Grid Coordinates for the matching postcodes are copied across and joined to your data. This creates a new layer that can be added to a map within SKiN.

 

Coordinates (X, Y or Latitude, Longitude): These must be held in the LLWGS84 coordinate system. There are many different types of coordinate system, and each will place the “same” point at a different location.

 

SKiN uses the LLWGS84 coordinate system to map data geographically. To ensure that your data is mapped in the location you expect, you will need to provide valid X and Y coordinates for each row of data that you bring into SKiN, and these coordinates must be held in LLWGS84.

 

Google Address Matching: Rather than using a postcode or X, Y coordinates to geocode your records, you can use Google address matching. This is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739), which you can use to position your data on the map.
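
For the coordinate option described above, a quick check before import can catch rows whose values cannot be valid latitude/longitude in degrees, for example grid references in metres pasted in by mistake. The following is a minimal Python sketch and not part of SKiN. The function name is hypothetical, and it assumes that X holds longitude and Y holds latitude, both in decimal degrees.

```python
def is_valid_wgs84(x, y):
    """Return True if (x, y) could be a WGS84 longitude/latitude pair.

    Assumption: x is longitude and y is latitude, both in decimal degrees.
    """
    try:
        lon = float(x)
        lat = float(y)
    except (TypeError, ValueError):
        return False  # blank or non-numeric cell
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


# Example rows: (x, y) as they might appear in a spreadsheet.
rows = [
    ("-0.1276", "51.5072"),  # decimal degrees: plausible
    ("530000", "180000"),    # looks like grid metres, not degrees
    ("", "51.5"),            # missing X
]
for x, y in rows:
    print(x, y, is_valid_wgs84(x, y))
```

A range check like this only shows that a value could be a coordinate in degrees. It cannot confirm that the point is in the right place, so it is still worth viewing a few records on the map after import.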

 

With all three methods of geocoding, the level of accuracy is driven by how accurately SKiN can match the data you put in against the data within the system.
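
To illustrate why matching accuracy matters, the sketch below tidies UK postcodes and reports which ones fail to match a lookup table. It is a hypothetical Python example. The lookup dictionary stands in for the postcode data held in SKiN, its coordinate values are placeholders, and SKiN's own matching rules may differ.

```python
def normalise_postcode(raw):
    """Upper-case a UK postcode and put one space before the last 3 characters."""
    compact = "".join(str(raw).split()).upper()
    if len(compact) < 5:
        return compact  # too short to be a full postcode; leave for review
    return compact[:-3] + " " + compact[-3:]


def match_postcodes(postcodes, lookup):
    """Split postcodes into (matched, unmatched) against a lookup dict."""
    matched, unmatched = {}, []
    for raw in postcodes:
        key = normalise_postcode(raw)
        if key in lookup:
            matched[raw] = lookup[key]
        else:
            unmatched.append(raw)
    return matched, unmatched


# Placeholder lookup: postcode -> (x, y). Values are illustrative only.
lookup = {"SW1A 1AA": (0.0, 0.0), "M1 1AE": (1.0, 1.0)}
data = ["sw1a1aa", "M1  1AE", "ZZ99 9ZZ"]

matched, unmatched = match_postcodes(data, lookup)
print(f"Matched {len(matched)} of {len(data)}; check: {unmatched}")
```

Listing the unmatched values before import gives you the chance to correct them in the source file rather than losing those records from the map.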

3. DATA SIZE

 

SKiN imports data most effectively when the data set has no more than 10,000 rows, 10 or fewer populated columns, and fields of no more than 255 characters.
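
These limits can be checked on a CSV file before import. The following Python sketch uses only the standard csv module. The function name and the idea of running a check outside SKiN are assumptions for illustration, and the limits are taken from the guideline above.

```python
import csv
import io

MAX_ROWS = 10_000
MAX_COLUMNS = 10
MAX_FIELD_LENGTH = 255


def check_size(csv_file):
    """Return a list of warnings for a CSV file object with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    populated = [False] * len(header)
    row_count = 0
    long_fields = 0
    for row in reader:
        row_count += 1
        for i, value in enumerate(row):
            if i < len(populated) and value.strip():
                populated[i] = True  # column has at least one non-blank cell
            if len(value) > MAX_FIELD_LENGTH:
                long_fields += 1
    warnings = []
    if row_count > MAX_ROWS:
        warnings.append(f"{row_count} rows (guideline: {MAX_ROWS})")
    if sum(populated) > MAX_COLUMNS:
        warnings.append(f"{sum(populated)} populated columns (guideline: {MAX_COLUMNS})")
    if long_fields:
        warnings.append(f"{long_fields} fields longer than {MAX_FIELD_LENGTH} characters")
    return warnings


# Small in-memory example; for a real file use open(path, newline="").
sample = io.StringIO("Store,Postcode,Notes\nA,SW1A 1AA," + "x" * 300 + "\n")
print(check_size(sample))
```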

 

Rows: To keep within 10,000 rows, there are a number of actions you can take to ensure the data is set up most effectively:

 

Check the data for duplicate records - Use the Microsoft Excel Remove Duplicates feature to remove them.

 

Can the data be aggregated? - Use Microsoft Excel PivotTables to aggregate data where there are duplicate records that contain different information (e.g. two rows of store data representing sales volumes from two accounts).

 

Check for misspelt records - Where two rows should be the same, misspelt information means that the data is duplicated.

 

Check for empty rows or rows without a geocode - Ensure that you are not asking SKiN to bring in data that will not be usable.

 

In some cases you can significantly reduce the size of your data by going through these simple steps. If you require further information regarding the techniques mentioned above, please refer to the separate documents or speak to a member of the Geoplan Support team, who will be able to assist you.
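
The same clean-up steps can also be scripted outside Excel. The sketch below is a hypothetical Python example, not a SKiN feature, and the column names Store, Postcode and Sales are invented for illustration. It drops rows with no postcode, removes exact duplicates, and sums sales for rows that share a store and postcode.

```python
def clean_rows(rows, geocode_field="Postcode"):
    """Drop rows without a geocode and remove exact duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if not (row.get(geocode_field) or "").strip():
            continue  # nothing available to geocode
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return cleaned


def aggregate(rows, key_fields, sum_field):
    """Sum sum_field over rows that share the same key_fields values."""
    totals = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] = totals.get(key, 0) + float(row[sum_field])
    return [dict(zip(key_fields, k), **{sum_field: v}) for k, v in totals.items()]


rows = [
    {"Store": "Leeds", "Postcode": "LS1 4AP", "Sales": "100"},
    {"Store": "Leeds", "Postcode": "LS1 4AP", "Sales": "100"},  # duplicate
    {"Store": "Leeds", "Postcode": "LS1 4AP", "Sales": "250"},  # second account
    {"Store": "York", "Postcode": "", "Sales": "80"},           # no geocode
]
print(aggregate(clean_rows(rows), ["Store", "Postcode"], "Sales"))
```

Misspelt records are not caught by an exact comparison like this one, so they still need to be corrected by eye or with a separate matching step.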

 

Columns: To get the best out of SKiN, it is important that the data put into the system is relevant to the objectives that the output is trying to fulfil. With this in mind, it is recommended that the number of data columns within a dataset is limited, both for ease of use and so that it does not compromise the speed of use of the data within the system. There are a few key items that you need to consider when checking the number of columns:

 

Does the column add anything to the analysis? If not then remove the column.  

 

Can the data be split into separate files? If your data set is very big (10,000 rows or more), splitting the data columns out into separate files will help with performance.
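
One way to act on both questions is to write out only the columns needed for a given analysis, always keeping the geographical reference so that each file can still be mapped. This is a minimal Python sketch using the standard csv module. The function name and the column names are hypothetical.

```python
import csv
import io


def select_columns(source, target, keep):
    """Copy only the columns named in keep from source CSV to target CSV."""
    reader = csv.DictReader(source)
    missing = [name for name in keep if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    # extrasaction="ignore" drops any column that is not listed in keep.
    writer = csv.DictWriter(target, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory example; with real files use open(path, newline="") for both.
source = io.StringIO("Store,Postcode,Sales,Manager\nLeeds,LS1 4AP,350,J Smith\n")
sales_file = io.StringIO()
select_columns(source, sales_file, ["Store", "Postcode", "Sales"])
print(sales_file.getvalue())
```

Running the function again with a different list of columns produces a second file from the same source, which is one way to split a wide data set.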

 

Field Format: SKiN reads the first 255 characters within a cell. Fields with more than 255 characters are not readable. The options for dealing with this are:

 

Abbreviate the data  

 

Manually truncate  

 

Split any fields longer than 255 characters
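
Truncating or splitting can be scripted if many cells are affected. The functions below are a hypothetical Python sketch, not part of SKiN, and use the 255-character limit stated above.

```python
MAX_FIELD_LENGTH = 255


def truncate(value, limit=MAX_FIELD_LENGTH):
    """Cut a value down to at most limit characters."""
    return value[:limit]


def split_field(value, limit=MAX_FIELD_LENGTH):
    """Split a value into chunks of at most limit characters, one per new column."""
    return [value[i:i + limit] for i in range(0, len(value), limit)] or [""]


notes = "x" * 600
print(len(truncate(notes)))                         # length after truncation
print([len(part) for part in split_field(notes)])   # lengths of the split parts
```

Truncation loses the text beyond the limit, whereas splitting keeps it at the cost of extra columns, so the choice depends on whether the remaining text is needed for the analysis.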

4. DATA AND FILE STRUCTURE

 

When you work in Excel you will, quite rightly, format your data in a certain way, especially if you are using it for reporting or presentation purposes. However, if you wish to bring data into SKiN from Excel or csv format, you will need to check and cleanse your data for the following items:

 

Date Field Formats: SKiN represents the date in UK date format, i.e. DD/MM/YY HH:MM am/pm. If date/time data is important, format the input spreadsheet using one of the standard Excel date/time number formats.

 

Column names: To import an Excel spreadsheet or csv file into SKiN, it must have just one row of column names in the first row, and the data should start directly below that in the second row. The contents of the first row will be used as the names for the fields in your layer. Any blank lines, titles, or header information must be removed before the file is imported. Also, each column must have its own field name, so any field names that span two or more columns should be replaced with a name for each column.

 

Reserved words: If your data contains special characters, such as those in other languages like Japanese, Arabic, or Greek, or other special characters like smart quotes (i.e., curvy quotation marks), you will need to remove these.

 

Worksheets: A Microsoft Excel file can contain a number of worksheets, indicated by the tabs along the bottom of the Excel window. SKiN will read the first worksheet from an Excel file when you import the data. Therefore, if you regularly use Excel documents to import data into SKiN, it is good practice to save each worksheet as a separate file.
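
Some of these checks can be automated for csv files. The sketch below is a hypothetical Python helper, not part of SKiN. It flags blank or repeated column names in the first row, blank rows, and cells containing non-ASCII characters such as smart quotes. Treating every non-ASCII character as a problem is a simplifying assumption.

```python
import csv
import io


def check_structure(csv_file):
    """Return a list of structural problems found in a CSV file object."""
    rows = list(csv.reader(csv_file))
    if not rows:
        return ["file is empty"]
    problems = []
    header = [name.strip() for name in rows[0]]
    if any(not name for name in header):
        problems.append("blank column name in first row")
    if len(set(header)) != len(header):
        problems.append("repeated column name in first row")
    # Data rows start on line 2, directly below the column names.
    for line_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {line_no} is blank")
        for cell in row:
            if not cell.isascii():
                problems.append(f"row {line_no} has special characters: {cell!r}")
    return problems


sample = io.StringIO("Store,Store,\nLeeds,\u201cMain\u201d,1\n,,\n")
for problem in check_structure(sample):
    print(problem)
```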

Data set-up

 

There are a number of steps you need to go through prior to adding your data into SKiN, as covered in earlier sections. At a top level, you will by this stage have worked through the following steps regarding data:

Step One

You have checked that you have the data and the necessary information to achieve your objectives.

Step Two

You have defined the factors that the data need to represent

Step Three

You have considered the quality of your data and assessed whether or not it will compromise the quality of the required output.

Step Four

You are now at the stage where you need to set up the data ready for input into SKiN.

Data quality

 

There are four main areas where data quality can be compromised. It is therefore always important to consider these when using data within SKiN to help you with your analysis, decision making and production of results.

 

Original data collection

The data is only as good as the source of collection. Once it has been collected, it is almost impossible to improve its quality.

Data input to SKiN

There are a number of ways data can be input into SKiN, and each method has its advantages. The most appropriate method should therefore be chosen for the given piece of work. The decision making loop can be used as a method of verification for this.

Data analysis

Any data manipulation within SKiN can influence the quality of data, in particular changing the scale of the data and overlaying several layers.

Presentation of data

How you represent your data within SKiN, and outside SKiN for that matter, can affect the outcome of the decisions made when looking at or analysing it. For example, the choice of colour or size used may influence the significance of a particular feature.