(packages/CREC-*/granules/*/summary): missing fields in members #149

Open
ryparker opened this issue Apr 3, 2024 · 10 comments

ryparker commented Apr 3, 2024

I noticed some missing members data in a congressional daily record granule summary for CREC-2024-02-28-pt1-PgH732.

curl --location 'https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/summary' \
--header 'X-Api-Key: <API_KEY>'

Response:

{
    "dateIssued": "2024-02-28",
    "packageId": "CREC-2024-02-28",
    "packageLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/summary",
    "collectionCode": "CREC",
    "detailsLink": "https://www.govinfo.gov/app/details/CREC-2024-02-28/CREC-2024-02-28-pt1-PgH732",
    "title": "WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF REPRESENTATIVES",
    "collectionName": "Congressional Record",
    "granuleClass": "HOUSE",
    "granuleId": "CREC-2024-02-28-pt1-PgH732",
    "download": {
        "premisLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/premis",
        "txtLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/htm",
        "zipLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/zip",
        "modsLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/mods",
        "pdfLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/pdf"
    },
    "bookNumber": "1",
    "pagePrefix": "H",
    "relatedLink": "https://api.govinfo.gov/related/CREC-2024-02-28-pt1-PgH732",
    "subGranuleClass": "ALLOTHER",
    "members": [
        {
            "role": "SPEAKING",
            "chamber": "H",
            "congress": "118",
            "bioGuideId": "N000002",
            "memberName": "Nadler, Jerrold",
            "state": "NY",
            "party": "D"
        },
        {
            "role": "SPEAKING",
            "chamber": "H",
            "congress": "118"
        }
    ],
    "time": {
        "from": "18:58:00",
        "to": "19:15:00"
    },
    "docClass": "CREC",
    "lastModified": "2024-03-01T12:04:40Z",
    "category": "Proceedings of Congress and General Congressional Publications",
    "granulesLink": "https://api.govinfo.gov/packages/CREC-2024-02-28/granules?offsetMark=*&pageSize=100",
    "granuleDate": "2024-02-28"
}

Notice one of the members is missing some common keys such as bioGuideId and memberName:

 "members": [
        {
            "role": "SPEAKING",
            "chamber": "H",
            "congress": "118",
            "bioGuideId": "N000002",
            "memberName": "Nadler, Jerrold",
            "state": "NY",
            "party": "D"
        },
        {
            "role": "SPEAKING",
            "chamber": "H",
            "congress": "118"
        }
    ],
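A quick way to spot entries like this in a summary response is to compare each member against the keys seen on a complete entry. This is only a sketch: `EXPECTED_MEMBER_KEYS` and `missing_member_keys` are names made up for illustration, and the key list is simply copied from the complete Nadler entry above, not from any documented schema.

```python
import json

# Keys observed on the complete member entry above (an assumption, not a documented contract).
EXPECTED_MEMBER_KEYS = ("role", "chamber", "congress", "bioGuideId", "memberName", "state", "party")


def missing_member_keys(summary: dict) -> list[tuple[int, list[str]]]:
    """Return (index, missing keys) for every member entry that lacks an expected key."""
    report = []
    for index, member in enumerate(summary.get("members", [])):
        missing = [key for key in EXPECTED_MEMBER_KEYS if key not in member]
        if missing:
            report.append((index, missing))
    return report


# Trimmed copy of the "members" array from the response above.
sample = json.loads("""
{"members": [
  {"role": "SPEAKING", "chamber": "H", "congress": "118", "bioGuideId": "N000002",
   "memberName": "Nadler, Jerrold", "state": "NY", "party": "D"},
  {"role": "SPEAKING", "chamber": "H", "congress": "118"}
]}
""")

print(missing_member_keys(sample))  # [(1, ['bioGuideId', 'memberName', 'state', 'party'])]
```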

When I look at the granule's mods.xml I notice that the broken member data is supposed to be Mr. SUOZZI:

mods request:

curl --location 'https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/mods' \
--header 'X-Api-Key: <API_KEY>'

mods response:

<?xml version="1.0" encoding="UTF-8"?>
<mods xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" version="3.3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" ID="id-CREC-2024-02-28-pt1-PgH732">
    <name type="corporate">
        <namePart>United States Government Publishing Office</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">publisher</roleTerm>
            <roleTerm authority="marcrelator" type="code">pbl</roleTerm>
        </role>
        <role>
            <roleTerm authority="marcrelator" type="text">distributor</roleTerm>
            <roleTerm authority="marcrelator" type="code">dst</roleTerm>
        </role>
    </name>
    <name type="corporate">
        <namePart>United States</namePart>
        <namePart>Congress</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
            <roleTerm authority="marcrelator" type="code">aut</roleTerm>
        </role>
        <description>Government Organization</description>
    </name>
    <typeOfResource>text</typeOfResource>
    <genre authority="marcgt">government publication</genre>
    <language>
        <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <extension>
        <collectionCode>CREC</collectionCode>
        <category>Proceedings of Congress and General Congressional Publications</category>
        <waisDatabaseName>2024_record</waisDatabaseName>
        <branch>legislative</branch>
        <dateIngested>2024-03-01</dateIngested>
    </extension>
    <titleInfo>
        <title>WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF REPRESENTATIVES</title>
        <partName>House</partName>
    </titleInfo>
    <relatedItem type="otherFormat" xlink:href="https://www.govinfo.gov/content/pkg/CREC-2024-02-28/html/CREC-2024-02-28-pt1-PgH732.htm">
        <identifier type="FDsys Unique ID">D09002ee1c54405ab</identifier>
    </relatedItem>
    <relatedItem type="otherFormat" xlink:href="https://www.govinfo.gov/content/pkg/CREC-2024-02-28/pdf/CREC-2024-02-28-pt1-PgH732.pdf">
        <identifier type="FDsys Unique ID">D09002ee1c5440662</identifier>
    </relatedItem>
    <identifier type="congressional record citation">170 Cong. Rec. H732</identifier>
    <identifier type="uri">https://www.govinfo.gov/app/details/CREC-2024-02-28/CREC-2024-02-28-pt1-PgH732</identifier>
    <identifier type="former granule identifier">cr28fe24-80</identifier>
    <location>
        <url displayLabel="Content Detail" access="object in context">https://www.govinfo.gov/app/details/CREC-2024-02-28/CREC-2024-02-28-pt1-PgH732</url>
        <url access="raw object" displayLabel="PDF rendition">https://www.govinfo.gov/content/pkg/CREC-2024-02-28/pdf/CREC-2024-02-28-pt1-PgH732.pdf</url>
        <url access="raw object" displayLabel="HTML rendition">https://www.govinfo.gov/content/pkg/CREC-2024-02-28/html/CREC-2024-02-28-pt1-PgH732.htm</url>
    </location>
    <physicalDescription>
        <extent>1 p.</extent>
    </physicalDescription>
    <part type="article">
        <extent unit="pages">
            <start>H732</start>
            <end>H732</end>
        </extent>
    </part>
    <name type="personal">
        <namePart>Jerrold Nadler</namePart>
        <affiliation>United States House of Representatives</affiliation>
        <role>
            <roleTerm type="text">speaking</roleTerm>
        </role>
        <description>United States Congress Member</description>
    </name>
    <name type="personal">
        <namePart>Mr. SUOZZI</namePart>
        <affiliation>United States House of Representatives</affiliation>
        <role>
            <roleTerm type="text">speaking</roleTerm>
        </role>
        <description>United States Congress Member</description>
    </name>
    <identifier type="preferred citation">170 Cong. Rec. H732</identifier>
    <extension>
        <searchTitle>WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF REPRESENTATIVES; Congressional Record Vol. 170, No. 36</searchTitle>
        <granuleClass>HOUSE</granuleClass>
        <accessId>CREC-2024-02-28-pt1-PgH732</accessId>
        <subGranuleClass>ALLOTHER</subGranuleClass>
        <pagePrefix>H</pagePrefix>
        <bookNumber>1</bookNumber>
        <chamber>HOUSE</chamber>
        <granuleDate>2024-02-28</granuleDate>
        <time from="18:58:00" to="19:15:00"/>
        <congMember bioGuideId="N000002" chamber="H" congress="118" party="D" role="SPEAKING" state="NY">
            <name type="parsed">Mr. NADLER</name>
            <name type="authority-fnf">Jerrold Nadler</name>
            <name type="authority-lnf">Nadler, Jerrold</name>
        </congMember>
        <congMember chamber="H" congress="118" role="SPEAKING">
            <name type="parsed">Mr. SUOZZI</name>
        </congMember>
    </extension>
    <relatedItem type="host" ID="P0b002ee1c53604a1">
        <titleInfo>
            <title>Congressional Record</title>
            <partNumber>Vol. 170, no. 36</partNumber>
        </titleInfo>
        <location>
            <url displayLabel="Content Detail" access="object in context">https://www.govinfo.gov/app/details/CREC-2024-02-28</url>
            <url displayLabel="PDF rendition" access="raw object">https://www.govinfo.gov/content/pkg/CREC-2024-02-28/pdf/CREC-2024-02-28.pdf</url>
        </location>
        <part type="issue">
            <extent unit="pages">
                <start>H699</start>
                <end>H740</end>
            </extent>
            <extent unit="pages">
                <start>S1019</start>
                <end>S1050</end>
            </extent>
            <extent unit="pages">
                <start>E183</start>
                <end>E191</end>
            </extent>
            <extent unit="pages">
                <start>D183</start>
                <end>D188</end>
            </extent>
        </part>
        <originInfo>
            <publisher>U.S. Government Publishing Office</publisher>
            <dateIssued encoding="w3cdtf">2024-02-28</dateIssued>
            <issuance>continuing</issuance>
            <frequency authority="marcfrequency">daily</frequency>
        </originInfo>
        <physicalDescription>
            <note type="source content type">deposited</note>
            <digitalOrigin>born digital</digitalOrigin>
            <extent>89 p.</extent>
        </physicalDescription>
        <classification authority="sudocs">X 1.1/A:</classification>
        <classification authority="sudocs">X/A.</classification>
        <identifier type="uri">https://www.govinfo.gov/app/details/CREC-2024-02-28</identifier>
        <identifier type="local">P0b002ee1c53604a1</identifier>
        <identifier type="stock number">752-002-00000-2</identifier>
        <identifier type="ILS system id">000568013</identifier>
        <identifier type="former package identifier">cr28fe24</identifier>
        <recordInfo>
            <recordContentSource authority="marcorg">DGPO</recordContentSource>
            <recordCreationDate encoding="w3cdtf">2024-03-01</recordCreationDate>
            <recordChangeDate encoding="w3cdtf">2024-03-01</recordChangeDate>
            <recordIdentifier source="DGPO">CREC-2024-02-28</recordIdentifier>
            <recordOrigin>machine generated</recordOrigin>
            <languageOfCataloging>
                <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
            </languageOfCataloging>
        </recordInfo>
        <accessCondition type="GPO scope determination">fdlp</accessCondition>
        <extension>
            <docClass>CREC</docClass>
            <accessId>CREC-2024-02-28</accessId>
            <volume>170</volume>
            <issue>36</issue>
            <congress>118</congress>
            <session>2</session>
            <bookCount>1</bookCount>
        </extension>
    </relatedItem>
</mods>

Notice the congMember blocks, where we can assume Mr. SUOZZI is the broken member:

      <congMember bioGuideId="N000002" chamber="H" congress="118" party="D" role="SPEAKING" state="NY">
          <name type="parsed">Mr. NADLER</name>
          <name type="authority-fnf">Jerrold Nadler</name>
          <name type="authority-lnf">Nadler, Jerrold</name>
      </congMember>
      <congMember chamber="H" congress="118" role="SPEAKING">   
          <name type="parsed">Mr. SUOZZI</name>
      </congMember>
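The same check can be run against the MODS document by listing congMember elements that have no bioGuideId attribute. A minimal sketch using Python's standard library; the helper name `unresolved_members` is hypothetical, and the sample is a trimmed copy of the extension block above.

```python
import xml.etree.ElementTree as ET

# MODS namespace as declared on the root element of the response above.
MODS_NS = {"m": "http://www.loc.gov/mods/v3"}


def unresolved_members(mods_xml: str) -> list[str]:
    """Return the parsed names of congMember elements that have no bioGuideId attribute."""
    root = ET.fromstring(mods_xml)
    names = []
    for member in root.iterfind(".//m:congMember", MODS_NS):
        if member.get("bioGuideId") is None:
            parsed = member.find("m:name[@type='parsed']", MODS_NS)
            names.append(parsed.text if parsed is not None else "")
    return names


# Trimmed copy of the congMember blocks from the MODS response.
sample = """<mods xmlns="http://www.loc.gov/mods/v3"><extension>
<congMember bioGuideId="N000002" chamber="H" congress="118" party="D" role="SPEAKING" state="NY">
<name type="parsed">Mr. NADLER</name></congMember>
<congMember chamber="H" congress="118" role="SPEAKING">
<name type="parsed">Mr. SUOZZI</name></congMember>
</extension></mods>"""

print(unresolved_members(sample))  # ['Mr. SUOZZI']
```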

Given that the title of the granule is "WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF REPRESENTATIVES", this seems to be a case of a new member of the House. According to congress.gov, this member was sworn in on the same day, so perhaps the bioGuideId wasn't available yet (we now know it is S001201).


A few questions based on these findings:

  • Are there docs available that describe which props/values are guaranteed and optional across the JSON and XML APIs (related to CREC)?
  • If no docs are available, should we expect members to always have the following props?
    • role
    • chamber
    • congress
    • bioGuideId
    • memberName
    • state
    • party
  • Do fields like memberName, state, party, rely on the bioGuideId? i.e. if bioGuideId is not available, should we expect these other fields to not be included?
  • Does this data automatically get reprocessed?
  • If there is a reprocess job, when is it scheduled/how often?
  • If there is a reprocess job, does that provide missing data once it is available? e.g. since Mr. Suozzi was new on the day this CREC was published, the bioGuideId may not have been available, so would the reprocessing job backfill the missing data?
@llaplant
Member

llaplant commented Apr 3, 2024

Hi, thank you for your questions.

GPO gets member authority information from the Congress.gov API.

For this example, see https://api.congress.gov/v3/member/S001201?api_key=DEMO_KEY.

  <item>
  <memberType> Representative </memberType>
  <congress> 118 </congress>
  <chamber> House of Representatives </chamber>
  <stateCode> NY </stateCode>
  <stateName> New York </stateName>
  <startYear> 2024 </startYear>
  <district> 3 </district>
  </item>

This member authority information is used to supplement information that GovInfo parses from the CREC text files.

If member information is able to be parsed from the CREC text file and a corresponding entry is found in our member authority information, the GovInfo metadata and API response will include the fields you listed. Example:

  <congMember bioGuideId="N000002" chamber="H" congress="118" party="D" role="SPEAKING" state="NY">
      <name type="parsed">Mr. NADLER</name>
      <name type="authority-fnf">Jerrold Nadler</name>
      <name type="authority-lnf">Nadler, Jerrold</name>
  </congMember>

If member information is able to be parsed from the CREC text file and a corresponding entry is not found in our member authority information, the GovInfo metadata and API response will include a subset of fields. Example:

  <congMember chamber="H" congress="118" role="SPEAKING">
      <name type="parsed">Mr. SUOZZI</name>
  </congMember>

We periodically reprocess entire GovInfo collections to enhance parsed metadata, but the frequency depends on the changes or enhancements that were recently deployed for a specific collection.

We'll ensure our authority information is up to date for S001201, kick off reprocessing for this package, and let you know when the GovInfo metadata and API response are updated.

@llaplant llaplant self-assigned this Apr 3, 2024
@llaplant
Member

llaplant commented Apr 3, 2024

Reprocessed CREC-2024-02-28. See https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/mods?api_key=DEMO_KEY. I also reprocessed CREC-2024-02-29. MODS looks good from CREC-2024-03-05 onward.

@ryparker
Author

ryparker commented Apr 4, 2024

Thank you

@ryparker ryparker closed this as completed Apr 4, 2024
@ryparker
Author

ryparker commented Apr 6, 2024

Follow up question: I noticed a {time} in the middle of the speech, do you know what this is?

Speech:

<html>
<head>
<title>Congressional Record, Volume 170 Issue 36 (Wednesday, February 28, 2024)</title>
</head>
<body><pre>
[Congressional Record Volume 170, Number 36 (Wednesday, February 28, 2024)]
[House]
[Page H732]
From the Congressional Record Online through the Government Publishing Office [<a href="https://www.gpo.gov">www.gpo.gov</a>]
       WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF 
                            REPRESENTATIVES
  The SPEAKER. Without objection, the gentleman from New York (Mr. 
Nadler) is recognized for 1 minute.
  There was no objection.
  Mr. NADLER. Mr. Speaker, as dean of the New York delegation, it is my 
distinct honor to rise today to welcome my good friend Congressman Tom 
Suozzi back to the people's House.
  The people of New York's Third Congressional District have elected a 
Representative with the experience, character, and commitment to solve 
problems confronting everyday Americans and deliver for his 
constituents.
  Tom is also a great family man. He is a devoted father, husband, and 
public servant who upholds the values instilled in him by his family. 
He has devoted most of his adult life to public service: first as the 
mayor of his hometown, Glen Cove, for 8 years; then as the county 
executive of Nassau County for 8 years, before serving as a United 
States Congressman for 6 years.
  From working tirelessly to secure investments for the Northport VA 
Medical Center in Long Island to helping secure billions in Federal 
support for New York in pandemic relief and infrastructure funding, 
Tom's outstanding record in Congress speaks for itself.
  Tom loves New York, he loves his country, and his love for public 
service runs deep. He is the kind of person we need serving in this 
House at this moment, and it gives me great pleasure to reintroduce him 
as our colleague, the gentleman from New York, Tom Suozzi.
  Mr. Speaker, I now yield to Mr. Suozzi.
  Mr. SUOZZI. Mr. Speaker, I thank Jerry and the New York delegation, 
all my colleagues, and my friends and supporters who are here tonight.
  Mr. Speaker, I never thought I would be back here, but the Lord works 
in mysterious ways, and God made a way when there was no way.
  I thank God for blessing me with this great responsibility, and I 
thank God for my best friend and partner for 30 years, my wife, Helene. 
She hates that.
  Mr. Speaker, on the night of my election victory, I promised the 
people of Long Island and Queens I would deliver a simple message to 
this Chamber: Wake up. The people are sick and tired of the finger-
pointing and the petty partisan bickering. They want us to work 
together.
  They want you guys to work together, too. What are you doing? You are 
supposed to be clapping for that.
  Mr. Speaker, I know there are so many good people in this Chamber on 
both sides of the aisle, but people are worried about the cost of 
living; they are worried about the chaos at the border; they are 
worried about Israel, Gaza, and Ukraine.
  They look to Congress, and what do they see? The extremists get all 
the attention. We are letting ourselves be bullied by our base. We 
aren't getting anything done. We need less chaos and more common sense.
  The last few months, I have talked with Democrats, Republicans, and 
Independents, and they all ask the same thing: What about me? What are 
you doing for me? Enough with the theater and the drama, enough with 
the hyperbole and the histrionics, enough with the shutdowns and the 
put-downs.
  The people aren't paying us to make things worse. The people pay us 
to be in the solutions business.
  Mr. Speaker, you and I came to Congress together in 2017. I remember 
when you founded the Honor and Civility Caucus. You said at the time it 
was to restore collegiality and encourage productive dialogue. Sign me 
up. Sign me up right away. Mr. Speaker, I know you believe in 
collegiality and productive dialogue. We need more of that and less of 
the hot air fanning the flames of anger that happens much too often in 
our country these days.
  Mr. Speaker, after my recent election you said something I must 
gently take exception to. You said: Tom Suozzi ran like a Republican. 
Now, I know you meant that as a compliment. Let me be clear. Mr. 
Speaker, I am a true blue, dyed-in-the-wool Democrat; but more 
important, like you, Mr. Speaker, and the men and women in this 
Chamber, I am a true blue, dyed-in-the-wool American.
  Like any patriot of the greatest country on Earth, I am willing to 
compromise to try and solve problems like the chaos at the border. The 
bipartisan Senate bill doesn't have everything I wanted. I believe that 
Dreamers and TPS recipients should be granted a pathway to citizenship, 
and millions of others should have a path to legalization, but I will 
support a bipartisan compromise.
                              {time}  1915
  To not do so will keep the border open, will endanger peace in 
Israel, and will empower Vladimir Putin.
  I know compromise is hard in this town, Mr. Speaker, but bring a 
bipartisan compromise to the floor, and I guarantee it will pass.
  All of the issues we face in this country are complicated, every 
single one of them, and you can't solve anything in an environment of 
fear and anger. We can't fix them with a tweet or a press conference or 
even a speech.
  I know many of you in this Chamber. I know a whole lot of you. You 
are inspired to do this work because of the command: Love thy neighbor. 
Let's actually do that. Let's do the hard work and get back to the 
solutions business.
  Sadly, many of the people in America believe Democrats and 
Republicans can't work together. They have told me, Tom, wake up. You 
have to face the real world.
  The real world is not something we must simply face. The real world 
is something that we as free men and women actively create. We make the 
real world.
  I love this country. My father came here from Italy as a young boy, 
was awarded the Distinguished Flying Cross during World War II, and 
went to Harvard Law School on the GI Bill.
  It is hard to imagine today, but he faced rampant discrimination as 
an Italian immigrant, and no one would hire him, even though he went to 
Harvard, so he started his own law firm.
  At 28 years old, he ran for city court judge and became the youngest 
judge in the history of New York State. What a country.
  My father lived a great American success story like many of the 
stories in this room, and I will do everything I can to honor my 
father's legacy. More importantly, I will do everything I can to honor 
this Nation's legacy.
  We all know what politics has become. Let's think about what it could 
be. While I may be the only one being sworn in today, what if we all 
see this as a fresh start?
  What if we all took this chance to break some of our bad habits? What 
if today we remembered why we ran for office in the first place? Let's 
get back into the solutions business.
  God bless the men and women of this Chamber. God bless the important 
work we do. God bless the United States of America.
                          ____________________
</pre></body>
</html>

@jonquandt
Member

In certain House granules, the time is recorded as part of the transcript, so this should mean that this portion of the speech occurred at 7:15 p.m.
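For anyone who wants to pull these markers out of the text rendition, here is a minimal sketch. It assumes the marker always sits on its own line as `{time}` followed by a four-digit 24-hour HHMM value, which is inferred from this single example rather than from any published format; `TIME_MARKER` and `time_markers` are illustrative names.

```python
import re
from datetime import time

# Assumed format: "{time}" on its own line, followed by a 24-hour HHMM value.
TIME_MARKER = re.compile(r"^[ \t]*\{time\}[ \t]+(\d{2})(\d{2})[ \t]*$", re.MULTILINE)


def time_markers(text: str) -> list[time]:
    """Return every {time} marker in the transcript text as a datetime.time."""
    return [time(int(hours), int(minutes)) for hours, minutes in TIME_MARKER.findall(text)]


# Excerpt from the speech quoted earlier in this thread.
sample = (
    "support a bipartisan compromise.\n"
    "                              {time}  1915\n"
    "  To not do so will keep the border open, will endanger peace in \n"
)

print(time_markers(sample))  # [datetime.time(19, 15)]
```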

@ryparker
Author

ryparker commented Apr 6, 2024

Thanks. It seems a little random in this example. What typically prompts this? Is it ordered or left to the clerk's discretion?

@ryparker
Author

ryparker commented Apr 6, 2024

@llaplant I've collected a list of CDR granules from the 118th Congress that have partial member data (missing bioGuideId). Could you trigger a reprocessing on these?

  • CREC-2023-12-14-pt1-PgH6947-3
  • CREC-2023-06-14-pt1-PgH2894-4
  • CREC-2023-04-17-pt1-PgH1744-2
  • CREC-2023-02-27-pt1-PgS505-2
  • CREC-2023-02-08-pt1-PgH762-6
  • CREC-2023-02-02-pt1-PgH635
  • CREC-2023-01-25-pt1-PgH326-6
  • CREC-2023-01-03-pt1-PgE1367-2

@ryparker ryparker reopened this Apr 7, 2024
@jonquandt
Member

Good morning,

I have reprocessed the affected packages and am doing some manual reviews and updates where needed - for instance, the https://www.govinfo.gov/content/pkg/CREC-2023-12-14/html/CREC-2023-12-14-pt1-PgH6947-3.htm granule appears to have a typo referring to Mr. Rogers of Washington, when it should be Mr. Rogers of Alabama (who is correctly recorded in the metadata). I have notified the appropriate folks to get that updated within the text, but I have moved the erroneous listing for the time being.

In some cases, I needed to make manual metadata edits. For example:
https://www.govinfo.gov/app/details/CREC-2023-01-25/CREC-2023-01-25-pt1-PgH326-6/summary
https://api.govinfo.gov/packages/CREC-2023-01-25/granules/CREC-2023-01-25-pt1-PgH326-6/summary?api_key=DEMO_KEY

did not have enough information within the granule text to directly identify that Mr. Jackson of North Carolina had submitted the amendment text. I manually added the additional information. This seems to have also added some additional blank fields in the metadata that shouldn't be propagating to the public side. I am entering a backlog item to resolve this.

In this instance, chamberIdCode, gpoId, authorityId, and houseRefId should not be present:

    "members": [{
        "chamberIdCode": "",
        "gpoId": "",
        "authorityId": "",
        "role": "SPEAKING",
        "chamber": "H",
        "congress": "118",
        "bioGuideId": "J000308",
        "memberName": "Jackson, Jeff",
        "houseRefId": "",
        "state": "NC",
        "party": "D"
    }],
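Until the backlog item lands, API consumers could drop the blank fields on their side. A minimal sketch, assuming that an empty string always means "not set" for these member fields; `drop_blank_fields` is a hypothetical helper, not part of any GovInfo client.

```python
def drop_blank_fields(member: dict) -> dict:
    """Return a copy of a member entry without keys whose value is an empty string."""
    return {key: value for key, value in member.items() if value != ""}


# The member entry from the summary response above.
sample = {
    "chamberIdCode": "",
    "gpoId": "",
    "authorityId": "",
    "role": "SPEAKING",
    "chamber": "H",
    "congress": "118",
    "bioGuideId": "J000308",
    "memberName": "Jackson, Jeff",
    "houseRefId": "",
    "state": "NC",
    "party": "D",
}

print(sorted(drop_blank_fields(sample)))
# ['bioGuideId', 'chamber', 'congress', 'memberName', 'party', 'role', 'state']
```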

Would it be helpful to have all forms of the member name incorporated in the member listing, like this:

"members": [{
        "chamberIdCode": "",
        "gpoId": "",
        "authorityId": "",
        "role": "SPEAKING",
        "chamber": "H",
        "congress": "118",
        "bioGuideId": "J000308",
        "memberName": "Jackson, Jeff",
        "name": {"parsed": "Mr. Jackson",
            "authority-lnf": "Jackson, Jeff",
            "authority-fnf": "Jeff Jackson"},
        "houseRefId": "",
        "state": "NC",
        "party": "D"
    }
],

We wouldn't remove the memberName value to avoid a breaking change.

@ryparker
Author

Would it be helpful to have all forms of the member name incorporated in the member listing, like this:

I'm typically on the side of more data is better. It could be helpful. It would be useful to help identify missing bioguide ids in the future.

How do you guys parse the authority/names from the doc? Do you use something like regex on the PDF text or do you have access to some private metadata?

@jonquandt
Member

jonquandt commented Apr 10, 2024

We parse from the text version (available at the html endpoint for the granule) using established patterns. Based on the parsed name, our parser performs a lookup against authority files for Members. These authority files are primarily derived from the Congressional Research Service and the Congress.gov API. Here's a sample top-level request:

https://api.congress.gov/v3/member?api_key=DEMO_KEY

At a high level, when the parser recognizes "Mr. Jackson of North Carolina" in a 2023 Congressional Record granule, it can determine the correct BioGuideID and authority information based on the 118th Congress, the date of the granule being within the Member's terms of service, and, in this case, the state to disambiguate. The parser then adds the standard authority information (fnf and lnf name, party, etc.) from our authority files.
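To make the lookup described above concrete, here is a rough sketch of how such a disambiguation could work. It is not GPO's parser: `AuthorityRecord`, `AUTHORITY`, `SPEAKER`, and `resolve` are hypothetical names, the second record is a placeholder that does not correspond to a real Member, and the term dates are simply the span of the 118th Congress used as a stand-in for real service terms.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AuthorityRecord:
    bio_guide_id: str
    last_name: str
    state_name: str
    congress: int
    term_start: date
    term_end: date


# Hypothetical stand-in for the authority files; dates are the 118th Congress span.
AUTHORITY = [
    AuthorityRecord("J000308", "Jackson", "North Carolina", 118, date(2023, 1, 3), date(2025, 1, 3)),
    # Placeholder record (not a real Member), present only to show state disambiguation.
    AuthorityRecord("X000000", "Jackson", "Texas", 118, date(2023, 1, 3), date(2025, 1, 3)),
]

# Assumed pattern for a parsed speaker name, with an optional "of <State>" suffix.
SPEAKER = re.compile(
    r"(?:Mr|Mrs|Ms)\. (?P<last>[A-Z][A-Za-z'-]+)"
    r"(?: of (?P<state>[A-Z][a-z]+(?: [A-Z][a-z]+)*))?"
)


def resolve(parsed_name: str, congress: int, granule_date: date) -> Optional[str]:
    """Return a bioGuideId only when exactly one authority record matches."""
    match = SPEAKER.fullmatch(parsed_name)
    if match is None:
        return None
    last, state = match.group("last").lower(), match.group("state")
    candidates = [
        record for record in AUTHORITY
        if record.last_name.lower() == last
        and record.congress == congress
        and record.term_start <= granule_date <= record.term_end
        and (state is None or record.state_name == state)
    ]
    return candidates[0].bio_guide_id if len(candidates) == 1 else None


print(resolve("Mr. Jackson of North Carolina", 118, date(2023, 1, 25)))  # J000308
print(resolve("Mr. Jackson", 118, date(2023, 1, 25)))  # None (ambiguous without a state)
```

Returning nothing when zero or several records match mirrors the behavior seen earlier in this thread, where an unresolved member keeps only the parsed fields.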
