Importing dates with pig into hive from a csv can be tricky. Depending on the format, the default datetime pig parser likely won't work.
No worries though, just import the date field as a standard chararray and then parse the string using the ToDate format.
For example, say we have a csv in the format:
"Name","City","State","Close Date" "My Cool Co","San Francisco","CA","31-Jan-10" "My Fast Co","Palo Alto","CA","15-Feb-12"
Create the table in hive:
DROP TABLE IF EXISTS financings; CREATE TABLE financings ( name STRING, city STRING, state STRING, close_date TIMESTAMP );
Here's the pig script, you'll want to run it with
-- load the piggybank library REGISTER 'hdfs:///usr/lib/pig/piggybank.jar' -- and the from the piggybank, the standard csvloader DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader(); -- import the csvs, KEY is your iam key and SECRET the associated secret, bucket-name is your bucket -- we could import csvs from an hdfs path as well, depending on where your data is stored -- this assumes my csvs are all in s3 in bucket-name/my/cool/path and are named in the format "ABunchOfCsvs-2015-01-30.csv" raw_financings_csvs = LOAD 's3n://KEY:[email protected]/my/cool/path/ABunchOfCsvs*.csv' using CSVLoader() AS (name:chararray, city:chararray, state:chararray, close_date:chararray); -- since the CSVLoader doesn't skip the csv headers, filter them out filtered_raw_financings_csvs = FILTER raw_financings_csvs BY (close_date != 'Close Date'); -- now parse the date string, transforming it into a pig date type -- we want to return null if close_date is empty, hence the ternary operation financings_csvs = FOREACH filtered_raw_financings_csvs GENERATE name, city, state, (close_date==''?NULL:ToDate(close_date, 'dd-MMM-yy')) AS close_date; -- write the results to hive STORE financings_csvs INTO 'financings' USING org.apache.hive.hcatalog.pig.HCatStorer();