ENGAGE-320: Refactor the import code for McCord data

Metadata

Source: ENGAGE-320
Type: Bug
Priority: Critical
Status: Closed
Resolution: Won't Fix
Assignee: N/A
Reporter: Michelle D'Souza
Created: 2010-02-03T16:33:14.000-0500
Updated: 2017-12-22T09:44:29.975-0500
Versions: 0.3b, 0.3
Fixed Versions: 0.5
Component: Data Import

Description

The code used to import the McCord data needs heavy refactoring, and its overall structure may need to be rearchitected. The following are code review notes on the current code:

There is too much functionality in 'main'. At a minimum, it should be moved into a method in the utilities file, though it may be better factored into a separate class.
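A minimal Java sketch of the shape this could take, assuming a hypothetical McCordImporter class that owns the import workflow and exposes it as a few named steps. The class, method, and step names are illustrative and the step bodies are placeholders, not the current code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class that owns the import workflow currently inlined in main.
class McCordImporter {
    // Records which steps ran, so the sequence can be checked.
    private final List<String> log = new ArrayList<>();

    // Runs the import as a sequence of clearly named steps.
    void runImport() {
        loadConfig();
        fetchSourceData();
        writeToDatabase();
    }

    // Placeholder steps: each would hold one part of the old main body.
    private void loadConfig()      { log.add("loadConfig"); }
    private void fetchSourceData() { log.add("fetchSourceData"); }
    private void writeToDatabase() { log.add("writeToDatabase"); }

    List<String> steps() { return log; }
}

class ImportMain {
    // main only wires things together and delegates the real work.
    public static void main(String[] args) {
        new McCordImporter().runImport();
    }
}
```

With this split, main stays a thin entry point, and each step can be tested or replaced on its own.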

line 25
a hardcoded config file directory

  • change to a path relative to the current directory
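A minimal Java sketch, assuming the config files live in a directory named config under the directory the import is launched from (the directory name and the ConfigLocator class are hypothetical). Relative paths passed to Paths.get are resolved against the current working directory:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ConfigLocator {
    // Hypothetical directory name; the real layout may differ.
    static final String CONFIG_DIR = "config";

    // Resolves the config directory against the current working directory
    // instead of a hardcoded absolute path.
    static Path configDirectory() {
        return Paths.get(CONFIG_DIR).toAbsolutePath().normalize();
    }

    // Builds the full path to a named file inside the config directory.
    static Path configFile(String fileName) {
        return configDirectory().resolve(fileName);
    }
}
```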

line 40

the empty response is hardcoded; if what an empty response returns ever changes, this check will break
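One possible sketch in Java, assuming the response is well-formed XML and that each result is wrapped in a known element (the ResponseCheck class and the recordTag parameter are hypothetical). Instead of comparing the body against a single hardcoded "empty" string, it checks whether any records are present:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ResponseCheck {
    // Treats a response as empty when it is blank or contains no record
    // elements. recordTag is an assumption: the element wrapping one record.
    static boolean isEmptyResponse(String body, String recordTag) {
        if (body == null || body.trim().isEmpty()) {
            return true;
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(recordTag).getLength() == 0;
        } catch (Exception e) {
            // Unparseable input is reported rather than silently treated as empty.
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }
}
```

If the responses turn out to be HTML rather than XML, an HTML parser would be needed in place of DocumentBuilder.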

the config file contains EL paths to data blocks that are in HTML

  • we should document this or change the structure of the config file to be more intuitive

prepareImport should use a 'converter' to externalize the actual conversion so that CouchDbUtility can be used for CSV, XML, etc. This means adding an API to XMLConverter that takes a URL.
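A minimal Java sketch of the converter idea as a strategy interface. Converter, CsvConverter, and ImportPipeline are hypothetical names, and the prepareImport signature shown is an assumption rather than the current one; the CSV parser is deliberately naive (no quoted fields):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical strategy interface: turns raw source text into records
// (field name -> value) that can then be written to CouchDB.
interface Converter {
    List<Map<String, String>> convert(String raw);
}

// Example strategy for CSV input whose first line holds the column names.
// Naive on purpose: commas inside quoted fields are not supported.
class CsvConverter implements Converter {
    public List<Map<String, String>> convert(String raw) {
        String[] lines = raw.trim().split("\\r?\\n");
        String[] header = lines[0].split(",");
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < header.length && c < cells.length; c++) {
                record.put(header[c].trim(), cells[c].trim());
            }
            records.add(record);
        }
        return records;
    }
}

// Depends only on the Converter interface, so swapping formats
// does not require changing this class.
class ImportPipeline {
    private final Converter converter;

    ImportPipeline(Converter converter) { this.converter = converter; }

    List<Map<String, String>> prepareImport(String raw) {
        return converter.convert(raw);
    }
}
```

An XML-based converter, including the URL-based entry point mentioned above, would implement the same interface, so the code that writes to CouchDB would not need to know which format it is importing.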

lines 67 and 71
hardcoded URLs to the McCord data sources

lines 79 and 80
hardcoded URLs to the database

line 84
hardcoded ids to exhibitions

lines 90 and 98
hardcoded config file name
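The hardcoded values listed in these notes (data source URLs, database URLs, exhibition IDs) could be moved into a properties file. A minimal Java sketch using java.util.Properties; the ImportSettings class and all property keys are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical holder for values that are currently hardcoded in the import code.
// Property keys are illustrative; none of them come from the existing code.
class ImportSettings {
    final String sourceUrl;
    final String databaseUrl;
    final List<String> exhibitionIds = new ArrayList<>();

    ImportSettings(Properties props) {
        sourceUrl = require(props, "mccord.source.url");
        databaseUrl = require(props, "couchdb.url");
        // Exhibition IDs are stored as a comma-separated list.
        for (String id : props.getProperty("exhibition.ids", "").split(",")) {
            if (!id.trim().isEmpty()) {
                exhibitionIds.add(id.trim());
            }
        }
    }

    // Fails fast with a clear message instead of falling back to a baked-in value.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value.trim();
    }

    // Reads settings from an already-opened properties file.
    static ImportSettings load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return new ImportSettings(props);
    }
}
```

The name of the properties file itself could then come from a command-line argument instead of being hardcoded; the caller opens that file and passes a Reader to ImportSettings.load.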

line 92
change how the accessnumber is retrieved so that it uses an EL path
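A minimal Java sketch of resolving a dot-separated EL path against parsed data held as nested Maps and Lists. The ElPath class and the example paths are hypothetical, and how the McCord records are actually represented after parsing is an assumption:

```java
import java.util.List;
import java.util.Map;

class ElPath {
    // Resolves a dot-separated path such as "fields.accessnumber" against
    // nested Maps, using numeric segments to index into Lists.
    // Returns null as soon as any segment cannot be resolved.
    static Object get(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(segment);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(segment);
                } catch (NumberFormatException e) {
                    return null;
                }
                if (index < 0 || index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            } else {
                return null;
            }
        }
        return current;
    }
}
```

The path to the accessnumber could then live in the config file next to the other EL paths rather than in the code.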

line 107
change the error handling to use logging
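A minimal Java sketch using the JDK's built-in java.util.logging. Whether the project would standardize on it or on another logging library is an assumption, as is the choice to log a failed record and continue with the rest; the ImportErrors class and logger name are hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ImportErrors {
    // Hypothetical logger name for the import code.
    private static final Logger LOG = Logger.getLogger("mccord.import");

    // Runs the work for one record. On failure, logs the error with its
    // stack trace and returns false so the caller can move on to the next record.
    static boolean importRecord(String accessNumber, Runnable work) {
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to import record " + accessNumber, e);
            return false;
        }
    }
}
```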

Comments

  • Michelle D'Souza commented 2010-02-03T16:34:07.000-0500

    The current code is in a patch on ENGAGE-290 but we will commit it to the scratchpad now so that other people can run imports more easily.

  • Colin Clark commented 2010-02-04T14:40:50.000-0500

    Yura and I have done some superficial refactoring of the import code to make it easier to read and somewhat better factored. This included:

    • Making hardcoded values (URLs, database names, config file paths) constants so it is clear that they are hardcoded
    • Breaking the main import loop into several methods that are more clearly named

    There's still more work to be done here, in particular the ability to swap out strategies for different import formats, better exception handling, and the like.

  • Justin Obara commented 2017-12-22T09:44:29.974-0500

    The repository has been archived.