Check: data format valid, including character encoding issues #12

Open
mbjones opened this issue Feb 8, 2023 · 2 comments

@mbjones
Member

mbjones commented Feb 8, 2023

Purpose

To ensure that the content of each data file is valid with respect to its declared data format. Relates to checks #9 and #2.

Components

For each object in a package, ensure the following:

  1. all characters in text files are valid within the declared character encoding for that file
    • e.g., ASCII files only contain characters in the range \x00 to \x7F
    • e.g., Unicode text files only contain byte sequences that are valid in the declared encoding (e.g., UTF-8)
  2. specialized text formats are valid according to their subtype
    • e.g., CSV files match the CSV format
    • e.g., XML files are well-formed XML
    • e.g., JSON files are well-formed JSON
  3. each binary file validates against the formatId declared for it
    • e.g., hdf5 files are valid HDF5
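
As a rough illustration of component 1, the character check could decode each line strictly with the declared encoding and record where decoding fails. This is only a sketch: the function name `find_invalid_lines` and its return shape are hypothetical, and splitting on `\n` bytes assumes an ASCII-compatible encoding (it would not be appropriate for UTF-16, for example).

```python
def find_invalid_lines(data: bytes, encoding: str = "ascii"):
    """Return (line_number, byte_offset_in_line, reason) for each undecodable line.

    Hypothetical helper: decodes line by line so that a failure can be
    reported with a 1-based line number.
    """
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode(encoding)  # strict error handling is the default
        except UnicodeDecodeError as err:
            problems.append((lineno, err.start, err.reason))
    return problems
```

Only the first invalid byte on each line is reported here; a real check might want every offending position.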

We might consider whether the text and binary checks in the list above would be better handled as separate checks.

Result

  • SUCCESS if all files in the package are valid
  • FAILURE if one or more files are not valid
    • output should include the reason for the validity problem and, if possible, the line number and a listing showing the problematic data
  • ERROR if the check system fails
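
One way the three outcomes above might be mapped in code is sketched below. The names `Status` and `run_check`, and the idea of passing a `validate` callable that returns a list of problems per file, are assumptions made for illustration and are not part of any existing check framework.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    ERROR = "ERROR"


def run_check(files, validate):
    """Apply `validate` to every file and summarise the outcome.

    files: mapping of file name -> file content (bytes)
    validate: callable returning a list of problems (empty if the file is valid)
    """
    failures = {}
    try:
        for name, data in files.items():
            problems = validate(data)
            if problems:
                failures[name] = problems
    except Exception as exc:
        # The check itself broke, which is distinct from a file being invalid.
        return Status.ERROR, {"error": repr(exc)}
    if failures:
        return Status.FAILURE, failures
    return Status.SUCCESS, {}
```

Keeping ERROR separate from FAILURE means a crash in the checker is never reported as a problem with the data.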

Example:

This example ADC data package contains a data file that should be ASCII-formatted CSV data, but contains erroneous non-ASCII characters, as shown below:

$ cat ASDN_Bird_nests.csv | pcregrep --color='auto' -n "[\x80-\xFF]"
1201:,2005,bylo,05byloagplbl01,amgp,blalibert��,20-Jun-05,171,73.18611,-79.95256,camp1_2kx4k,out,,08-Jul-05,189,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,20-Jun-05,8,20-Jun-05,171,24-Jun-05,175,10-Jun-05,161,13-Jun-05,,,float,inc,failed,observer,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,nest stepped on :(,out,0
1202:,2005,bylo,05byloagplbl02,amgp,blalibert��,23-Jun-05,174,73.14646,-80.0083,camp1_2kx4k,in,,16-Jul-05,197,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,23-Jun-05,3,28-Jun-05,179,3-Jul-05,184,18-Jun-05,169,21-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1203:,2005,bylo,05byloagplbl03,amgp,blalibert��,29-Jun-05,180,73.14302,-80.0186,camp1_2kx4k,in,,16-Jul-05,197,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,29-Jun-05,9,29-Jun-05,180,3-Jul-05,184,18-Jun-05,169,21-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1205:,2005,bylo,05byloagplmaude,amgp,mgrahamsauv��,09-Jul-05,190,73.16003,-79.95324,camp1_2kx4k,in,,17-Jul-05,198,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,9-Jul-05,18,18-Jul-05,199,18-Jul-05,199,20-Jun-05,171,23-Jun-05,18-Jul-05,199,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"1 chick, 2 starred, 1 holed on 199",in,0
1208:,2005,bylo,05bylobasabl04,basa,blalibert��,23-Jun-05,174,73.15517,-79.9869,camp1_2kx4k,in,,06-Jul-05,187,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,23-Jun-05,8,23-Jun-05,174,28-Jun-05,179,13-Jun-05,164,16-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1209:,2005,bylo,05bylobasabl05,basa,blalibert��,23-Jun-05,174,73.14708,-80.00437,camp1_2kx4k,in,,01-Jul-05,182,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,23-Jun-05,13,23-Jun-05,174,28-Jun-05,179,8-Jun-05,159,11-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1218:,2005,bylo,05bylowrsabl02,wrsa,blalibert��,19-Jun-05,170,73.1607,-79.95419,camp1_2kx4k,in,,05-Jul-05,186,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,22-Jun-05,8,27-Jun-05,178,3-Jul-05,184,11-Jun-05,162,15-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
1219:,2005,bylo,05bylowrsabl05,wrsa,blalibert��,21-Jun-05,172,73.15458,-80.00137,camp1_2kx4k,in,,11-Jul-05,192,SYSTEMATIC SEARCH,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,21-Jun-05,1,6-Jul-05,187,8-Jul-05,189,18-Jun-05,169,21-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1220:,2005,bylo,05bylowrsabl06,wrsa,blalibert��,26-Jun-05,177,73.13774,-80.04837,camp1_2kx4k,out,,06-Jul-05,187,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,26-Jun-05,11,29-Jun-05,180,3-Jul-05,184,13-Jun-05,164,16-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,out,0
1221:,2005,bylo,05bylowrsabl08,wrsa,blalibert��,02-Jul-05,183,73.15022,-79.96249,camp1_2kx4k,in,,12-Jul-05,193,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,6-Jul-05,15,10-Jul-05,191,11-Jul-05,192,19-Jun-05,170,22-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,"starred on 190 and 191, empty on 192",in,0
1234:,2005,bylo,05bylowrsanew,wrsa,blalibert��,09-Jul-05,190,73.15642,-80.00924,camp1_2kx4k,in,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,9-Jul-05,190,9-Jul-05,190,16-Jun-05,167,19-Jun-05,9-Jul-05,190,hatch,hatch,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,found with chicks in still wet on 190,in,0
1244:,2006,bylo,06bylobasaagm02,basa,agu��rettemontminy,15-Jun-06,166,73.16071,-79.94466,camp1_2kx4k,in,,03-Jul-06,184,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,16-Jun-06,4,16-Jun-06,167,17-Jun-06,168,10-Jun-06,161,13-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1245:,2006,bylo,06bylobasaagm04,basa,agu��rettemontminy,19-Jun-06,170,73.15542,-79.98701,camp1_2kx4k,in,,27-Jun-06,178,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,20-Jun-06,14,20-Jun-06,171,26-Jun-06,177,4-Jun-06,155,7-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1246:,2006,bylo,06bylobasabl01,basa,blalibert��,02-Jul-06,183,73.15481,-79.94515,camp1_2kx4k,in,,07-Jul-06,188,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,2-Jul-06,16,4-Jul-06,185,8-Jul-06,189,13-Jun-06,164,17-Jun-06,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
1258:,2006,bylo,06bylobasamg01,basa,mgrahamsauv��,20-Jun-06,171,73.15113,-79.9744,camp1_2kx4k,in,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,20-Jun-06,171,21-Jun-06,172,,,,,,,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1262:,2006,bylo,06bylowrsaagm02,wrsa,agu��rettemontminy,22-Jun-06,173,73.14624,-79.97444,camp1_2kx4k,in,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,22-Jun-06,173,27-Jun-06,178,,,,,,,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1263:,2006,bylo,06bylowrsabl01,wrsa,blalibert��,28-Jun-06,179,73.15834,-79.96612,camp1_2kx4k,in,,05-Jul-06,186,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,29-Jun-06,15,29-Jun-06,180,2-Jul-06,183,12-Jun-06,163,15-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
1305:,2008,bylo,08byloplarec01,bbpl,echalifour,24-Jun-08,176,72.85269,-79.91654,camp 2,,,15-Jul-08,197,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,24-Jun-08,4,14-Jul-08,196,14-Jul-08,196,16-Jun-08,168,20-Jun-08,14-Jul-08,196,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,0,,,"4 eggs on 192, 3 on 191, 2 on 190 starred, 1 on 200 with hole���",out,0
1315:,2008,bylo,08byloplbrmpm08,amgp,mpatenaudemonette,06-Jul-08,188,72.88495,-79.97607,camp 2,,,,NA,systematic search,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,6-Jul-08,,12-Jul-08,194,14-Jul-08,196,14-Jun-08,166,17-Jun-08,12-Jul-08,194,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,0,,,"stars on 192, no eggs on 196���",out,0
1343:,2010,bylo,10byloamgpejf02,amgp,echalifour,02-Jul-10,183,72.86557,-79.95411,tombatudlik,,,23-Jul-10,204,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,2-Jul-10,5,18-Jul-10,199,18-Jul-10,199,24-Jun-10,175,28-Jun-10,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,AUCUN SIGNE D'��CLOSION (199),out,0
1347:,2010,bylo,10byloamgpjb06,amgp,jbety,02-Jul-10,183,72.86,-79.92543,tombatudlik,,,19-Jul-10,200,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,2-Jul-10,9,2-Jul-10,183,2-Jul-10,183,20-Jun-10,171,24-Jun-10,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,1 OEUF BRIS�� PAR JB-RETIR��,out,0
1400:,2010,bylo,10bylopesa2,pesa,jflamarre,25-Jun-10,176,72.88213,-79.84883,camp 2,,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,25-Jun-10,176,6-Jul-10,187,,,,,,,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,NID GRATT��,out,0
1417:,2010,bylo,10byloplbrec09,amgp,echalifour,04-Jul-10,185,72.80629,-79.6411,dufour,,,12-Jul-10,193,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,4-Jul-10,18,NA,NA,,,13-Jun-10,164,17-Jun-10,,,float,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,V��RIFIER POUR NT AVEC CHALIF,out,0
1418:,2010,bylo,10byloplbrec10,amgp,echalifour,04-Jul-10,185,72.80466,-79.62079,dufour,,,20-Jul-10,201,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,4-Jul-10,10,NA,NA,,,21-Jun-10,172,25-Jun-10,,,float,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,V��RIFIER POUR NT AVEC CHALIF,out,0
1422:,2010,bylo,10byloplbrec14,amgp,echalifour,05-Jul-10,186,72.79911,-79.55968,dufour,,,14-Jul-10,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,5-Jul-10,17,NA,NA,,,15-Jun-10,166,19-Jun-10,,,float,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,V��RIFIER POUR NT AVEC CHALIF,out,0
1432:,2010,bylo,10bylowrsa1,wrsa,echalifour,26-Jun-10,177,72.88298,-79.93615,camp 2,,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,NA,NA,,,,,,,,,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,EN ��CLOSION (197--->198),out,0
1440:,2010,bylo,10bylowrsajfl04,wrsa,jflamarre,24-Jun-10,175,73.15967,-79.93832,camp1_2kx4k,in,,09-Jul-10,190,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,24-Jun-10,6,28-Jun-10,179,3-Jul-10,184,16-Jun-10,167,19-Jun-10,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,PR��DATEUR AVIAIRE,in,0
1455:,2011,bylo,11byloamgpec07,amgp,echalifour,04-Jul-11,185,72.80245,-79.52228,dufour,,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,NA,NA,,,,,,,,,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,FLOTTAISON OUBLI��E!,out,0
1470:,2011,bylo,11byloamgpjl02,amgp,jflamarre,22-Jun-11,173,73.15541,-79.92507,camp1_2kx4k,in,,14-Jul-11,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,22-Jun-11,4,7-Jul-11,188,9-Jul-11,190,15-Jun-11,166,19-Jun-11,,,float,inc,failed,abandoned,0,0,0,0,0,0,4,,4,NA,,NA,hummocky,B14,0,,,ABANDONN��,in,0
1499:,2011,bylo,11bylobasafb03,basa,fbilodeau,21-Jun-11,172,73.15149,-79.97725,camp1_2kx4k,in,,09-Jul-11,190,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,22-Jun-11,3,11-Jul-11,192,11-Jul-11,192,17-Jun-11,168,21-Jun-11,11-Jul-11,192,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,NID VIDE AVEC PETITS FRAGMENTS D'��CAILLES ET UNE ��CAILLE CISAILL��ES PAR UN JEUNE AVEC MEMBRANE,in,0
1503:,2011,bylo,11bylobasajb02,basa,jbety,08-Jul-11,189,72.83354,-79.64134,camp 2,,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,,,NA,NA,,,,,,,,,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,EN ��CLOSION=FEN��TRES,out,0
1520:,2011,bylo,11bylopesajo01,pesa,jotis,06-Jul-11,187,73.16264,-79.93579,camp1_2kx4k,in,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,6-Jul-11,,10-Jul-11,191,12-Jul-11,193,,,,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,lcp>0.5m,M5,40,,,"PEUT-ETRE PR��DAT��, MAIS ��UF AVAIENT COMMENC��S A ��CLORENT",in,0
1629:,2013,bylo,13byloamgpjb01,amgp,jbety,27-Jun-13,178,72.89016,-79.96943,camp 2,,,14-Jul-13,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,27-Jun-13,9,27-Jun-13,178,30-Jun-13,181,16-Jun-13,167,19-Jun-13,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,r��ginald=male!,out,0
1648:,2013,bylo,13byloamgpjl01,amgp,jflamarre,27-Jun-13,178,72.89508,-79.96201,camp 2,,,14-Jul-13,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,27-Jun-13,9,30-Jun-13,181,7-Jul-13,188,16-Jun-13,167,19-Jun-13,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,Nid pas trouv�� lors de la caract��risation de la v��g��tation,out,0
1888:,2014,bylo,14bylocrpljt01,crpl,jtherien,13-Jul-14,194,73.1303,-80.11757,camp1_2kx4k,out,,,NA,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,2,2,,0,13-Jul-14,194,27-Jul-14,208,,,,,,none,unknown,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,egg bits found but without membrans could suggest predation���,out,0
1920:3.06,2012,cakr,12cakrdunl204,dunl,edastous,06-Jun-12,158,67.11209,-163.48256,sepe,in,,24-Jun-12,176,searcher,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,6-Jun-12,3,6-Jun-12,158,11-Jun-12,163,2-Jun-12,154,5-Jun-12,,,float,inc,failed,predation,0,0,0,,0,4,0,0,0,0,0,NA,,wst,30,,,Only one bird seen at this nest. We know one bird is banded but we don���t know his combination. We didn���t see the mate before de depredation ,in,0
5050:7.03,2009,barr,09barrdunl504,dunl,cgovernali,09-Jun-09,160,71.2671,-156.52546,brw3,in,l9,01-Jul-09,182,single,,,-,-,-,-,-,-,-,-,-,-,0,,,4,3,,,14-Jun-09,165,14-Jun-09,165,7-Jun-09,158,10-Jun-09,,,lay,lay,failed,observer,0,4,0,0,0,0,0,,0,0,0,0,,,,,,All eggs fused together after being smashed�perhaps stepped on by human??; Eggs most likely stepped on by humans; ; ,in,0
5108:,2006,bylo,06bylorephc201,reph,mgrahamsauv��,25-Jun-06,176,72.8906,-79.88954,camp 2,,,14-Jul-06,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,3,,,2-Jul-06,183,7-Jul-06,188,23-Jun-06,174,26-Jun-06,,,lay,lay,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,out,0
5123:,2011,bylo,11bylowrsajo01,wrsa,jotis,15-Jun-11,166,73.15971,-79.91361,camp1_2kx4k,in,,06-Jul-11,187,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,3,,,16-Jun-11,167,20-Jun-11,171,13-Jun-11,164,16-Jun-11,,,lay,lay,failed,unknown,0,0,0,0,0,4,0,,0,NA,,NA,frost boil,U8,0,,,PR��DAT��,out,0
5129:,2012,bylo,12bylobasasc03,basa,scoulombe,22-Jun-12,174,73.15774,-79.95564,camp1_2kx4k,in,,13-Jul-12,195,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,3,29-Jun-12,7,4-Jul-12,186,9-Jul-12,191,20-Jun-12,172,23-Jun-12,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,RIGOUREUSEMENT: CC INCERTAIN; LAYING DATE �� 172 SEMBLE LE MEILLEUR COMPROMIS ENTRE L'ESTIMATION PAS FLOTAISON ET L'ESTIMATION PAR BACKWARD COUNTING,in,0
5157:,2013,cakr,13cakrwesa030,wesa,jkardiak,09-Jun-13,160,67.10956,-163.48288,sepe,in,,01-Jul-13,182,searcher,,,-,-,-,-,-,-,-,-,-,-,0,,,3,2,,,10-Jun-13,161,11-Jun-13,162,7-Jun-13,158,,,,lay,lay,failed,predation,0,0,0,0,unknown,3,0,,0,NA,,NA,non-patterned,dest,10,,,concealment post-depredation. Bird nested within 10 m of poop tent��� stood no chance with fox in area.,in,0
5291:,2013,bylo,13byloamgppr02,amgp,proyerboutin,19-Jun-13,170,72.90307,-79.87282,camp 2,,,16-Jul-13,197,incidental,,,-,-,-,-,-,-,-,-,-,-,0,,,4,2,,,19-Jun-13,170,23-Jun-13,174,18-Jun-13,169,21-Jun-13,,,lay,lay,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,geolocator_adult,,"2 parent pr��sent, male avec geo",out,0
5533:1.38,2011,colv,11colvsesa163,sesa,bwilkinson,17-Jun-11,168,70.42884,-150.67238,south,in,,09-Jul-11,190,searcher,,,-,-,-,-,-,-,-,-,-,-,0,,,3,1,,,10-Jul-11,191,11-Jul-11,192,17-Jun-11,168,20-Jun-11,10-Jul-11,191,lay,lay,hatched,,0,0,0,0,,0,1,,1,NA,,NA,hcp<.5m,b2,50,,,didn���t check for embryo in the one egg left after hatch,in,0
6097:,2014,eaba,14eababbe02,bbpl,,28-Jun-14,179,63.969,-81.67456,b,,,18-Jul-14,199,,,,-,-,-,-,-,-,-,-,-,-,0,,,4,4,28-Jun-14,5,16-Jul-14,197,16-Jul-14,197,18-Jun-14,169,,16-Jul-14,199,float,inc,hatched,,,0,0,,,0,,,0,NA,,NA,,,,,,"The day that we did the nest survey for BBE02, we saw the last chick in the nest. Parents were still in trhe area. All chickes hatched. (Don�t have date for final hatched chick though.",out,0
6443:,2005,bylo,05bylowrsabl01,wrsa,blalibert��,18-Jun-05,169,73.15775,-79.98205,camp1_2kx4k,in,,01-Jul-05,182,SYSTEMATIC SEARCH,,195149401,-,-,-,-,-,"r,lg",-,m,"wf,dg",-,1,,195149401,4,4,23-Jun-05,13,NA,NA,9-Jul-05,190,7-Jun-05,158,11-Jun-05,1-Jul-05,182,float,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
6444:,2005,bylo,05bylowrsabl03,wrsa,blalibert��,20-Jun-05,171,73.16901,-80.0028,north river,out,,03-Jul-05,184,systematic search,,195149402,-,-,-,-,-,"y,r",-,m,"wf,dg",-,1,,195149402,4,4,20-Jun-05,8,24-Jun-05,175,1-Jul-05,182,10-Jun-05,161,13-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,out,0
6445:,2005,bylo,05bylowrsabl04,wrsa,blalibert��,20-Jun-05,171,73.16838,-80.00104,north river,out,,09-Jul-05,190,systematic search,,195149403,-,-,-,-,-,"y,y",-,m,"wf,dg",-,1,,195149403,4,4,20-Jun-05,2,24-Jun-05,175,30-Jun-05,181,16-Jun-05,167,19-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,out,0
6449:,2005,bylo,05bylowrsabl07,wrsa,blalibert��,26-Jun-05,177,73.14569,-80.03134,camp1_2kx4k,out,,02-Jul-05,183,systematic search,,195149407,-,-,-,-,-,"o,db",-,m,"wf,dg",-,1,,195149407,4,4,26-Jun-05,15,29-Jun-05,180,3-Jul-05,184,9-Jun-05,160,12-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,out,0
6450:,2005,bylo,05bylowrsamg01,wrsa,mgrahamsauv��,25-Jun-05,176,73.15887,-79.9612,camp1_2kx4k,in,,07-Jul-05,188,incidental,,195149408,-,-,-,-,-,"o,lg",-,m,"wf,dg",-,1,,195149408,4,4,1-Jul-05,15,5-Jul-05,186,9-Jul-05,190,13-Jun-05,164,17-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
6451:,2005,bylo,05bylowrsamn01,wrsa,mgrahamsauv��,27-Jun-05,178,73.16268,-79.94869,camp1_2kx4k,in,,06-Jul-05,187,incidental,,195149409,-,-,-,-,-,"o,dg",-,m,"wf,dg",-,1,,195149409,4,4,1-Jul-05,16,2-Jul-05,183,5-Jul-05,186,12-Jun-05,163,16-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
6452:,2005,bylo,05bylowrsamg178,wrsa,mgrahamsauv��,27-Jun-05,178,73.15062,-79.98135,camp1_2kx4k,in,,11-Jul-05,192,incidental,,195149410,-,-,-,-,-,"r,r",-,m,"wf,dg",-,1,,195149410,4,4,6-Jul-05,16,6-Jul-05,187,9-Jul-05,190,17-Jun-05,168,21-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
6453:,2005,bylo,05bylowrsamn02,wrsa,mgrahamsauv��,27-Jun-05,178,73.14923,-79.96953,camp1_2kx4k,in,,12-Jul-05,193,incidental,,195149411,-,-,-,-,-,"r,y",-,m,"wf,dg",-,1,,195149411,4,4,2-Jul-05,13,NA,NA,,,16-Jun-05,167,20-Jun-05,,,float,inc,undetermined,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"eggs starred on 191, not checked afterwards",in,0
6456:,2005,bylo,05bylowrsabl09,wrsa,blalibert��,05-Jul-05,186,73.16387,-79.94289,camp1_2kx4k,in,,09-Jul-05,190,systematic search,,195149414,-,-,-,-,-,"r,dg",-,m,"wf,dg",-,1,,195149414,4,4,9-Jul-05,&gt;80%,11-Jul-05,192,11-Jul-05,192,17-Jun-05,168,21-Jun-05,11-Jul-05,192,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"starred and tapping on 190, 4 chicks around nest area with banded adult on 192",out,0
6457:,2005,bylo,05bylowrsabl10,wrsa,blalibert��,05-Jul-05,186,73.17669,-79.86963,camp1_2kx4k,out,,,NA,systematic search,,195149415,-,-,-,-,-,"y,o",-,m,"wf,dg",-,1,,195149415,4,4,5-Jul-05,&gt;80%,10-Jul-05,191,11-Jul-05,192,17-Jun-05,168,20-Jun-05,10-Jul-05,191,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"eggs starred on 190, nest empty on 192, banded adult with chicks around 192",out,0
6458:,2005,bylo,05bylowrsamg007,wrsa,mgrahamsauv��,30-Jun-05,181,73.14957,-79.97548,camp1_2kx4k,in,,12-Jul-05,193,incidental,,195149416,-,-,-,-,-,"y,db",-,m,"wf,dg",-,1,,195149416,4,4,6-Jul-05,15,6-Jul-05,187,9-Jul-05,190,19-Jun-05,170,22-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
6463:,2006,bylo,06bylowrsaagm01,wrsa,agu��rettemontminy,24-Jun-06,175,73.14699,-79.97087,camp1_2kx4k,in,,11-Jul-06,192,incidental,,195149421,-,-,-,-,-,"db,o",-,m,"wf,dg",-,1,,195149421,4,4,27-Jun-06,7,28-Jun-06,179,2-Jul-06,183,18-Jun-06,169,21-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
6628:,2010,bylo,10byloplarec01,bbpl,echalifour,01-Jul-10,182,72.8644,-79.96859,tombatudlik,,,19-Jul-10,200,incidental,,200346802,-,-,-,-,-,m,-,-,-,-,1,,200346802,4,4,1-Jul-10,7,18-Jul-10,199,18-Jul-10,199,21-Jun-10,172,25-Jun-10,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,199-��TOILES SUR 2 OEUFS,out,0
7146:,2008,bylo,08bylowrsagen01,wrsa,gouelletcauchon,30-Jun-08,182,73.16454,-79.91686,camp1_2kx4k,in,,06-Jul-08,188,incidental,,805164502,-,-,-,-,-,-,"db,db",m,"wf,dg",-,1,,805164502,4,4,1-Jul-08,16,1-Jul-08,183,5-Jul-08,187,12-Jun-08,164,16-Jun-08,,,float,inc,failed,abandoned,0,0,0,0,0,0,4,,4,NA,,NA,,,80,,,"abandoned, conceal was indicated 8��� guessed it coded for 80%",out,0
7860:,2012,bylo,12byloamgpmt06,amgp,mtrudel,05-Jul-12,187,72.89371,-79.97793,tombatudlik,,,26-Jul-12,208,incidental,160307904,,m,"wf,dg",geo,"dg,o",-,-,-,-,-,-,1,160307904,,4,4,5-Jul-12,5,5-Jul-12,187,7-Jul-12,189,28-Jun-12,180,1-Jul-12,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,geolocator_adult,,"R125 ET R500 TIR��E DE CARAC TRANSECT AU POINT TR083F, geo changed",out,0
7861:,2011,bylo,11byloamgpjb04,amgp,jbety,17-Jun-11,168,72.89155,-79.93819,tombatudlik,,,14-Jul-11,195,incidental,160307905,,m,"wf,dg",geo,"y,r",-,-,-,-,-,-,1,160307905,,4,3,1-Jul-11,12,1-Jul-11,182,14-Jul-11,195,16-Jun-11,167,19-Jun-11,,,lay,lay,failed,unknown,0,0,0,0,0,4,0,,0,NA,,NA,,,,geolocator_adult,,��TAIT AMGPPY01,out,0
7888:,2012,bylo,12byloamgpmc01,amgp,mcfrenette,08-Jul-12,190,73.14106,-80.02582,camp1_2kx4k,out,,25-Jul-12,207,incidental,160307947,,geo,"wf,dg",m,"r,dg",-,-,-,-,-,-,1,160307947,,4,4,8-Jul-12,9,8-Jul-12,190,10-Jul-12,192,27-Jun-12,179,30-Jun-12,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,geolocator_adult,,G��OLOC (RESIGHTING),out,0
7889:,2011,bylo,11byloamgpjo04,amgp,jotis,21-Jun-11,172,73.14354,-80.02256,camp1_2kx4k,in,,18-Jul-11,199,incidental,160307947,,geo,"wf,dg",m,"r,dg",-,-,-,-,-,-,1,160307947,,4,3,,,15-Jul-11,196,18-Jul-11,199,20-Jun-11,171,23-Jun-11,,,lay,lay,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,veg dune,B2,0,geolocator_adult,,1 ��UF CASS�� LORS DU TRAPPAGE,out,0
7890:,2011,bylo,11byloamgpjo05,amgp,jotis,21-Jun-11,172,73.14562,-80.02816,camp1_2kx4k,out,,18-Jul-11,199,incidental,160307948,,geo,"wf,dg",m,"y,o",-,-,-,-,-,-,1,160307948,,4,3,,,7-Jul-11,188,10-Jul-11,191,19-Jun-11,170,23-Jun-11,,,lay,lay,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,hummocky,B14,0,geolocator_adult,,PR��DAT�� (RE.RD),out,0
7891:,2012,bylo,12byloamgpjl03,amgp,jflamarre,23-Jun-12,175,73.1543,-79.94709,camp1_2kx4k,in,,16-Jul-12,198,incidental,160307949,,geo,-,m,"db,r",-,-,-,-,-,-,1,160307949,,4,4,23-Jun-12,3,23-Jun-12,175,24-Jun-12,176,18-Jun-12,170,21-Jun-12,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,geolocator_adult,,OISEAU BAGU�� G��OLOC (RESIGHTING),in,0
7892:,2011,bylo,11byloamgpjo06,amgp,jotis,05-Jul-11,186,73.16776,-79.89841,camp1_2kx4k,out,,,NA,incidental,160307950,,"db,lg",-,m,"wf,dg",-,-,-,-,-,-,1,160307950,,4,4,5-Jul-11,,10-Jul-11,191,12-Jul-11,193,,,,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,hummocky,B3,0,,,PR��DAT��,out,0
8870:,2005,bylo,05bylobasabl01,basa,blalibert��,13-Jun-05,164,73.15665,-79.97123,camp1_2kx4k,in,,05-Jul-05,186,systematic search,224125001,,"r,r",-,m,"wf,dg",-,-,-,-,-,-,1,224125001,,4,4,18-Jun-05,4,18-Jun-05,169,23-Jun-05,174,12-Jun-05,163,15-Jun-05,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
8872:,2005,bylo,05bylobasabl03,basa,blalibert��,18-Jun-05,169,73.15635,-80.01438,camp1_2kx4k,in,,05-Jul-05,186,systematic search,224125008,,"y,db",-,m,"wf,dg",-,-,-,-,-,-,1,224125008,,4,4,18-Jun-05,4,4-Jul-05,185,4-Jul-05,185,10-Jun-05,161,14-Jun-05,4-Jul-05,185,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"1 chick in nest, 1 chick in area on 185",in,0
8874:,2005,bylo,05bylobasamg02,basa,mgrahamsauv��,22-Jun-05,173,73.14922,-79.97513,camp1_2kx4k,in,,01-Jul-05,182,incidental,224125009,,"y,dg",-,m,"wf,dg",-,-,-,-,-,-,1,224125009,,4,4,27-Jun-05,17,2-Jul-05,183,6-Jul-05,187,7-Jun-05,158,11-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
8875:,2006,bylo,06bylobasaagm03,basa,agu��rettemontminy,19-Jun-06,170,73.15566,-79.97574,camp1_2kx4k,in,,11-Jul-06,192,incidental,224125009,,"y,dg",-,m,"wf,dg",-,-,-,-,-,-,1,224125009,,4,3,,,23-Jun-06,174,26-Jun-06,177,18-Jun-06,169,21-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,second nest for same bird,in,1
8877:,2005,bylo,05bylobasamg04,basa,mgrahamsauv��,27-Jun-05,178,73.14843,-79.97861,camp1_2kx4k,in,,03-Jul-05,184,incidental,224125011,,"r,lg",-,m,"wf,dg",-,-,-,-,-,-,1,224125011,,4,4,27-Jun-05,15,2-Jul-05,183,6-Jul-05,187,9-Jun-05,160,13-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
8878:,2005,bylo,05bylobasamn01,basa,mgrahamsauv��,27-Jun-05,178,73.146,-79.96274,camp1_2kx4k,in,,,NA,incidental,224125012,,"y,r",-,m,"wf,dg",-,-,-,-,-,-,1,224125012,,4,4,2-Jul-05,&gt;80%,2-Jul-05,183,6-Jul-05,187,,,,,,,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
8879:,2005,bylo,05bylobasabl08,basa,blalibert��,03-Jul-05,184,73.13899,-80.0403,camp1_2kx4k,out,,05-Jul-05,186,systematic search,224125013,,"y,y",-,m,"wf,dg",-,-,-,-,-,-,1,224125013,,4,4,,,5-Jul-05,186,6-Jul-05,187,11-Jun-05,162,15-Jun-05,5-Jul-05,186,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,out,0
8882:,2006,bylo,06bylobasaagm01,basa,agu��rettemontminy,15-Jun-06,166,73.15695,-79.98936,camp1_2kx4k,in,,02-Jul-06,183,incidental,224125020,,"o,dg",-,m,"wf,dg",-,-,-,-,-,-,1,224125020,,4,4,15-Jun-06,4,20-Jun-06,171,26-Jun-06,177,9-Jun-06,160,12-Jun-06,,,float,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,,,,,in,0
11233:,2011,bylo,11byloamgpjo02,amgp,jotis,20-Jun-11,171,73.16514,-79.92277,camp1_2kx4k,in,,13-Jul-11,194,incidental,160307901,160307903,m,"wf,dg",geo,"o,db",-,"lg,y",-,m,"wf,dg",-,2,160307901,160307903,4,4,20-Jun-11,3,12-Jul-11,193,12-Jul-11,193,12-Jun-11,163,16-Jun-11,11-Jul-11,192,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,frost boil,B3,0,geolocator_adult,,1 ��UF CASS�� LORS DU TRAPPAGE ET ENLEV��,out,0
11239:,2012,bylo,12byloamgper03,amgp,ereed,12-Jul-12,194,72.90274,-79.94324,tombatudlik,,,,NA,incidental,160307940,160307944,m,"wf,dg",-,-,-,geo,"wf,dg",m,"r,o",-,2,160307940,160307944,4,4,12-Jul-12,,14-Jul-12,196,15-Jul-12,197,,,,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,geolocator_adult,,"HERMA &amp; PHRODITE: PREMIER INDIVIDU CAPTUR�� BAGU�� COMME UNE FEMELLE, POSSIBLEMENT UN M��LE FI.LEMENT PARCE QUE L&apos;AUTRE INDIVIDU A AUSSI L&apos;AIR D&apos;UNE FEMELLE (ET EST PLUS PESANT)",out,0
11588:3.22,2012,cakr,12cakrwesa116,wesa,mvanderheyden,11-Jun-12,163,67.10913,-163.48199,sepe,in,,26-Jun-12,178,searcher,225189211,197100714,"gfe,o",-,"dg,y,o",m,ek,"gfe,o",-,m,"dg,lb,y",k0,2,225189211,197100714,3,3,11-Jun-12,5,27-Jun-12,179,27-Jun-12,179,5-Jun-12,157,7-Jun-12,27-Jun-12,179,hatch,inc,hatched,,0,1,0,0,0,0,0,1,1,0,2,NA,,dest,30,,,"1 egg broken by observer and finally retreived by the bird (We don���t know if they were an embryo development, and another egg abandon by the bird (left with one chick)",in,0
11671:,2013,cakr,13cakrsesa039,sesa,jkardiak,10-Jun-13,161,67.11324,-163.47762,nepe,in,,02-Jul-14,183,searcher,197100867,197100868,geo,-,m,-,-,geo,-,m,-,-,2,197100867,197100868,4,2,,,17-Jun-13,168,2-Jul-13,183,7-Jun-13,158,10-Jun-13,,,hatch,lay,unknown,,2,0,0,3,unknown,1,0,,0,NA,,NA,non-patterned,dest,80,geolocator_adult,,"2 eggs disappeared between 17 jun and 24 jun (likely depredation). One more gone between 24 jun and 29 jun (hatch? Possibly��� small 1 mm egg fragments found in nest). One egg failed to hatch (hole and dead chick seen in egg), but parents continued to incubate until it disappeared.",in,0
11672:,2013,cakr,13cakrsesa099,sesa,jkardiak,17-Jun-13,168,67.11441,-163.48059,nepe,in,,24-Jun-13,175,rope,197100758,197100869,-,-,m,"y,dg,lb",-,geo,-,m,-,-,2,197100758,197100869,3,3,,,29-Jun-13,180,25-Jun-13,176,3-Jun-13,154,5-Jun-13,,,hatch,inc,unknown,,1,0,0,3,3,0,0,,0,NA,,NA,non-patterned,bles w/ moss,30,geolocator_adult,,unknown fate but suspect hatch. Original hatch date incorrectly calculated based upon stage found being lay��� not true! This was found in incubate and not floated.,in,0
11674:,2013,cakr,13cakrwesa042,wesa,flin,03-Jun-13,154,67.11118,-163.47868,sepe,in,,25-Jun-13,176,searcher,198103374,197100874,m,"r,y,o",-,"gfe,o",1cp,m,"r,y,r",-,"gfe,o",78,2,198103374,197100874,4,2,,,27-Jun-13,178,26-Jun-13,177,2-Jun-13,153,,26-Jun-13,177,lay,lay,hatched,,2,0,0,1,4,0,0,,0,NA,,NA,non-patterned,dest,40,,,"26 jun-3 chicks in the nest, didn���t see the 4th egg or chick; 27 Jun-while rope dragging we confirmed r,y,r parent with 3 banded chicks",in,0
12118:,2005,bylo,05bylobasamg01,basa,mgrahamsauv��,20-Jun-05,171,73.14647,-79.97477,camp1_2kx4k,in,,05-Jul-05,186,incidental,224125006,224125010,"y,o",-,m,"wf,dg",-,"o,y",-,m,"wf,dg",-,2,224125006,224125010,4,4,25-Jun-05,11,NA,NA,8-Jul-05,189,11-Jun-05,162,15-Jun-05,5-Jul-05,186,float,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,,in,0
12119:,2005,bylo,05bylobasabl02,basa,blalibert��,17-Jun-05,168,73.15585,-79.98234,camp1_2kx4k,in,,07-Jul-05,188,systematic search,224125004,224125014,"r,db",-,m,"wf,dg",-,"lg,r",-,m,"wf,dg",-,2,224125004,224125014,4,4,17-Jun-05,1,6-Jul-05,187,7-Jul-05,188,13-Jun-05,164,17-Jun-05,,,float,inc,unknown,,0,0,0,0,0,0,0,,0,NA,,NA,,,,,,"eggs starred on 187, cup empty on 188",in,0
12131:,2008,bylo,08bylobasamav01,basa,mavaliquette,28-Jun-08,180,73.15356,-79.98344,camp1_2kx4k,in,,,NA,incidental,224125077,224125078,-,"dg,r",m,"wf,dg",-,-,"dg,y",m,"wf,dg",-,2,224125077,224125078,4,4,28-Jun-08,,1-Jul-08,183,2-Jul-08,184,,,,,,,inc,failed,predation,0,0,0,0,0,4,0,,0,NA,,NA,,,50,,,"starred on 183, predated 184, conceal was indicated 5��� guessed it coded for 50%",in,0
12132:,2008,bylo,08bylobasamad01,basa,mdoiron,20-Jun-08,172,73.15047,-79.96031,camp1_2kx4k,in,,28-Jun-08,180,incidental,224125073,224125079,-,"r,o",m,"wf,dg",-,-,"dg,o",m,"wf,dg",-,2,224125073,224125079,4,4,21-Jun-08,14,2-Jul-08,184,2-Jul-08,184,8-Jun-08,160,12-Jun-08,2-Jul-08,184,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,10,,,"2 chicks, 2 eggs in nest on 184, conceal was indicated 1��� guessed it coded for 10%",in,0
12133:,2008,bylo,08bylobasamc01,basa,mcmartin,30-Jun-08,182,73.15271,-79.94172,camp1_2kx4k,in,,,NA,incidental,224125052,224125080,m,"wf,dg","r,dg",-,-,-,"dg,dg",m,"wf,dg",-,2,224125052,224125080,4,4,,,3-Jul-08,185,3-Jul-08,185,10-Jun-08,162,13-Jun-08,3-Jul-08,185,hatch,inc,hatched,,0,0,0,0,0,0,0,,0,NA,,NA,,,10,,,"3 chicks out of nest on 186, ukn1 was banded in 2007, conceal was indicated 1��� guessed it coded for 10%",in,0
12995:,2013,chur,13TL01,sepl,,13-Jun-13,164,58.70394527,-81.8429727,,,,,,,96143339,234164237,,,,,,,,,,,2,96143339,234164237,4,4,,,1-Jul-13,182,4-Jul-13,185,,,,4-Jul-13,185,,inc,hatched,,,,,,,,,,,,,,,,,,,ONE EGG NEVER HATCHED � UNFERTILIZED?,,
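
A possible Python counterpart to the `pcregrep` command above is sketched here. In addition to flagging lines that contain bytes in the range \x80 to \xFF, it reports whether each flagged line would decode as UTF-8, which might help tell a wrongly declared encoding apart from corrupt bytes. The function name `classify_non_ascii_lines` is hypothetical.

```python
import re

# Same byte class as the pcregrep pattern: anything outside 7-bit ASCII.
NON_ASCII = re.compile(rb"[\x80-\xff]")


def classify_non_ascii_lines(data: bytes):
    """Return (line_number, is_valid_utf8) for each line with a non-ASCII byte."""
    results = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if NON_ASCII.search(line):
            try:
                line.decode("utf-8")
                results.append((lineno, True))
            except UnicodeDecodeError:
                results.append((lineno, False))
    return results
```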
@mbjones mbjones added the check label Feb 8, 2023
@mbjones
Member Author

mbjones commented Feb 8, 2023

See related report for the ADC dataset above in RT here: https://support.nceas.ucsb.edu/rt/Ticket/Display.html?id=25790

@jeanetteclark
Collaborator

This all sounds good. I think there are a few separate checks here:

  • check text characters valid
  • check text file valid
    + I already have this implemented for delimited text files, so I could just add XML and JSON to the existing function
  • check binary file valid
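
For the "check text file valid" item, a minimal sketch using only the Python standard library might look like the following. It is not the existing implementation mentioned above. The function names are hypothetical, and treating "every non-empty row has as many fields as the header" as the CSV criterion is an assumption about what matching the CSV format should mean.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def check_json(text: str):
    """Return None if text is well-formed JSON, else a message with the location."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_xml(text: str):
    """Return None if text is well-formed XML, else a message with the location."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, col = err.position
        return f"line {line}, column {col}: {err}"
    return None


def check_csv(text: str, delimiter: str = ","):
    """Return None if all non-empty rows match the header's field count."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    try:
        header = next(reader, None)
        if header is None:
            return "file is empty"
        for row in reader:
            if row and len(row) != len(header):
                return (f"line {reader.line_num}: expected {len(header)} "
                        f"fields, found {len(row)}")
    except csv.Error as err:
        return f"line {reader.line_num}: {err}"
    return None
```

Each function returns a message with a line number where one is available, which fits the requested FAILURE output.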

We may also consider breaking up the valid (there might be a better word for that, parsable?) checks into specific file types. I'm not sure whether lumping or splitting will be more maintainable. Broadly, I think the most common data file types we see, and that we should check, are:

  • csv
  • netcdf
  • geojson
  • geotiff
  • shp? maybe a bit of a minefield here
  • others??
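
For the binary types in this list, a cheap first pass could compare the leading bytes of a file against the format's published signature. The sketch below is only that: the dictionary keys are hypothetical labels rather than real formatIds, a matching signature is necessary but not sufficient for validity, and the HDF5 signature is checked at offset 0 only (it can appear at later offsets when a user block is present). A GeoTIFF would pass as a TIFF container without its geo tags being inspected.

```python
HDF5_SIG = b"\x89HDF\r\n\x1a\n"

# Leading-byte signatures per format label (labels are illustrative only).
SIGNATURES = {
    "hdf5": [HDF5_SIG],
    # netCDF classic, 64-bit offset, and CDF-5; netCDF-4 files are HDF5 files.
    "netcdf": [b"CDF\x01", b"CDF\x02", b"CDF\x05", HDF5_SIG],
    # TIFF little-endian and big-endian, then the BigTIFF variants.
    "tiff": [b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+"],
}


def has_expected_signature(data: bytes, fmt: str) -> bool:
    """Return True if data starts with one of the signatures known for fmt."""
    return any(data.startswith(sig) for sig in SIGNATURES[fmt])
```

A full check would still need to open the file with a format-aware library; this only rules out obvious mismatches early.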
