(packages/CREC-*/granules/*/summary): missing fields in members #149
Comments
Hi, thank you for your questions. GPO gets member authority information from the Congress.gov API. For this example, see https://api.congress.gov/v3/member/S001201?api_key=DEMO_KEY.
This member authority information is used to supplement information that GovInfo parses from the CREC text files. If member information is able to be parsed from the CREC text file and a corresponding entry is found in our member authority information, the GovInfo metadata and API response will include the fields you listed. Example:
If member information is able to be parsed from the CREC text file and a corresponding entry is not found in our member authority information, the GovInfo metadata and API response will include a subset of fields. Example:
We periodically reprocess entire GovInfo collections to enhance parsed metadata, but the frequency depends on the changes or enhancements that were recently deployed for a specific collection. We'll ensure our authority information is up to date for S001201, kick off reprocessing for this package, and let you know when the GovInfo metadata and API response are updated. |
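For readers who want to spot granules in this state, here is a minimal sketch that checks a granule summary's members list for the fields named in the issue. It assumes the summary JSON has already been fetched and decoded into a dict; the function name and the sample payload are illustrative, and the exact subset of fields returned for an unmatched member is an assumption.

```python
# Field names come from the issue text; everything else here is illustrative.
EXPECTED_MEMBER_FIELDS = (
    "role", "chamber", "congress", "bioGuideId", "memberName", "state", "party",
)


def missing_member_fields(summary):
    """Return {member index: [missing field names]} for a granule summary dict."""
    report = {}
    for index, member in enumerate(summary.get("members", [])):
        # An absent key and an empty string are both treated as missing.
        missing = [name for name in EXPECTED_MEMBER_FIELDS if not member.get(name)]
        if missing:
            report[index] = missing
    return report


# Made-up payload: one fully matched member, one with only a subset of fields
# (which subset the API really returns is an assumption).
sample_summary = {
    "members": [
        {"role": "SPEAKING", "chamber": "H", "congress": "118",
         "bioGuideId": "J000308", "memberName": "Jackson, Jeff",
         "state": "NC", "party": "D"},
        {"role": "SPEAKING", "chamber": "H", "congress": "118"},
    ]
}

partial = missing_member_fields(sample_summary)
```

Any non-empty result marks a member entry that was parsed from the text but not matched to an authority record at processing time.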
Reprocessed CREC-2024-02-28. See https://api.govinfo.gov/packages/CREC-2024-02-28/granules/CREC-2024-02-28-pt1-PgH732/mods?api_key=DEMO_KEY. I also reprocessed CREC-2024-02-29. MODS looks good from CREC-2024-03-05 onward. |
Thank you |
Follow up question: I noticed a Speech: <html>
<head>
<title>Congressional Record, Volume 170 Issue 36 (Wednesday, February 28, 2024)</title>
</head>
<body><pre>
[Congressional Record Volume 170, Number 36 (Wednesday, February 28, 2024)]
[House]
[Page H732]
From the Congressional Record Online through the Government Publishing Office [<a href="https://www.gpo.gov">www.gpo.gov</a>]
WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF
REPRESENTATIVES
The SPEAKER. Without objection, the gentleman from New York (Mr.
Nadler) is recognized for 1 minute.
There was no objection.
Mr. NADLER. Mr. Speaker, as dean of the New York delegation, it is my
distinct honor to rise today to welcome my good friend Congressman Tom
Suozzi back to the people's House.
The people of New York's Third Congressional District have elected a
Representative with the experience, character, and commitment to solve
problems confronting everyday Americans and deliver for his
constituents.
Tom is also a great family man. He is a devoted father, husband, and
public servant who upholds the values instilled in him by his family.
He has devoted most of his adult life to public service: first as the
mayor of his hometown, Glen Cove, for 8 years; then as the county
executive of Nassau County for 8 years, before serving as a United
States Congressman for 6 years.
From working tirelessly to secure investments for the Northport VA
Medical Center in Long Island to helping secure billions in Federal
support for New York in pandemic relief and infrastructure funding,
Tom's outstanding record in Congress speaks for itself.
Tom loves New York, he loves his country, and his love for public
service runs deep. He is the kind of person we need serving in this
House at this moment, and it gives me great pleasure to reintroduce him
as our colleague, the gentleman from New York, Tom Suozzi.
Mr. Speaker, I now yield to Mr. Suozzi.
Mr. SUOZZI. Mr. Speaker, I thank Jerry and the New York delegation,
all my colleagues, and my friends and supporters who are here tonight.
Mr. Speaker, I never thought I would be back here, but the Lord works
in mysterious ways, and God made a way when there was no way.
I thank God for blessing me with this great responsibility, and I
thank God for my best friend and partner for 30 years, my wife, Helene.
She hates that.
Mr. Speaker, on the night of my election victory, I promised the
people of Long Island and Queens I would deliver a simple message to
this Chamber: Wake up. The people are sick and tired of the finger-
pointing and the petty partisan bickering. They want us to work
together.
They want you guys to work together, too. What are you doing? You are
supposed to be clapping for that.
Mr. Speaker, I know there are so many good people in this Chamber on
both sides of the aisle, but people are worried about the cost of
living; they are worried about the chaos at the border; they are
worried about Israel, Gaza, and Ukraine.
They look to Congress, and what do they see? The extremists get all
the attention. We are letting ourselves be bullied by our base. We
aren't getting anything done. We need less chaos and more common sense.
The last few months, I have talked with Democrats, Republicans, and
Independents, and they all ask the same thing: What about me? What are
you doing for me? Enough with the theater and the drama, enough with
the hyperbole and the histrionics, enough with the shutdowns and the
put-downs.
The people aren't paying us to make things worse. The people pay us
to be in the solutions business.
Mr. Speaker, you and I came to Congress together in 2017. I remember
when you founded the Honor and Civility Caucus. You said at the time it
was to restore collegiality and encourage productive dialogue. Sign me
up. Sign me up right away. Mr. Speaker, I know you believe in
collegiality and productive dialogue. We need more of that and less of
the hot air fanning the flames of anger that happens much too often in
our country these days.
Mr. Speaker, after my recent election you said something I must
gently take exception to. You said: Tom Suozzi ran like a Republican.
Now, I know you meant that as a compliment. Let me be clear. Mr.
Speaker, I am a true blue, dyed-in-the-wool Democrat; but more
important, like you, Mr. Speaker, and the men and women in this
Chamber, I am a true blue, dyed-in-the-wool American.
Like any patriot of the greatest country on Earth, I am willing to
compromise to try and solve problems like the chaos at the border. The
bipartisan Senate bill doesn't have everything I wanted. I believe that
Dreamers and TPS recipients should be granted a pathway to citizenship,
and millions of others should have a path to legalization, but I will
support a bipartisan compromise.
{time} 1915
To not do so will keep the border open, will endanger peace in
Israel, and will empower Vladimir Putin.
I know compromise is hard in this town, Mr. Speaker, but bring a
bipartisan compromise to the floor, and I guarantee it will pass.
All of the issues we face in this country are complicated, every
single one of them, and you can't solve anything in an environment of
fear and anger. We can't fix them with a tweet or a press conference or
even a speech.
I know many of you in this Chamber. I know a whole lot of you. You
are inspired to do this work because of the command: Love thy neighbor.
Let's actually do that. Let's do the hard work and get back to the
solutions business.
Sadly, many of the people in America believe Democrats and
Republicans can't work together. They have told me, Tom, wake up. You
have to face the real world.
The real world is not something we must simply face. The real world
is something that we as free men and women actively create. We make the
real world.
I love this country. My father came here from Italy as a young boy,
was awarded the Distinguished Flying Cross during World War II, and
went to Harvard Law School on the GI Bill.
It is hard to imagine today, but he faced rampant discrimination as
an Italian immigrant, and no one would hire him, even though he went to
Harvard, so he started his own law firm.
At 28 years old, he ran for city court judge and became the youngest
judge in the history of New York State. What a country.
My father lived a great American success story like many of the
stories in this room, and I will do everything I can to honor my
father's legacy. More importantly, I will do everything I can to honor
this Nation's legacy.
We all know what politics has become. Let's think about what it could
be. While I may be the only one being sworn in today, what if we all
see this as a fresh start?
What if we all took this chance to break some of our bad habits? What
if today we remembered why we ran for office in the first place? Let's
get back into the solutions business.
God bless the men and women of this Chamber. God bless the important
work we do. God bless the United States of America.
____________________
</pre></body>
</html> |
In certain House granules, the time is recorded as part of the transcript, so this should mean that that portion of the speech occurred at 7:15 pm. |
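If you need to pull these markers out of the granule text, a minimal sketch follows. It assumes the marker always sits on its own line in the form `{time} HHMM` on a 24-hour clock, as in the speech above; the helper name is hypothetical and other marker layouts are not handled.

```python
import re
from datetime import time

# Assumption: the marker occupies its own line, e.g. "{time} 1915".
TIME_MARKER = re.compile(r"^\s*\{time\}\s+(\d{2})(\d{2})\s*$", re.MULTILINE)


def find_time_markers(text):
    """Return every {time} marker in the text as a datetime.time, in order."""
    return [time(int(hh), int(mm)) for hh, mm in TIME_MARKER.findall(text)]


excerpt = "support a bipartisan compromise.\n{time} 1915\nTo not do so will keep the border open"
markers = find_time_markers(excerpt)
```

Here `markers` holds a single `time(19, 15)`, which is the 7:15 pm reading described above.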
Thanks. It seems a little random in this example. What typically prompts this? Is it ordered or left to the clerk's discretion? |
@llaplant I've collected a list of CDR granules from the 118th congress that have partial member data (missing
|
Good morning, I have reprocessed the affected packages and am doing some manual reviews and updates where needed. For instance, the https://www.govinfo.gov/content/pkg/CREC-2023-12-14/html/CREC-2023-12-14-pt1-PgH6947-3.htm granule appears to have a typo referring to Mr. Rogers of Washington, when it should be Mr. Rogers of Alabama (who is correctly recorded in the metadata). I have notified the appropriate folks to get that updated within the text, but I have moved the erroneous listing for the time being.

In some cases, I needed to make manual metadata edits. For example, one granule did not have enough information within its text to directly identify that Mr. Jackson of North Carolina had submitted the amendment text, so I manually added the additional information. This seems to have also added some blank fields to the metadata that shouldn't be propagating to the public side. I am entering a backlog item to resolve this. In this instance, chamberIdCode, gpoId, authorityId, and houseRefId should not be present:

"members": [{
"chamberIdCode": "",
"gpoId": "",
"authorityId": "",
"role": "SPEAKING",
"chamber": "H",
"congress": "118",
"bioGuideId": "J000308",
"memberName": "Jackson, Jeff",
"houseRefId": "",
"state": "NC",
"party": "D"
}],

Would it be helpful to have all forms of the member name incorporated in the member listing, like this:

"members": [{
"chamberIdCode": "",
"gpoId": "",
"authorityId": "",
"role": "SPEAKING",
"chamber": "H",
"congress": "118",
"bioGuideId": "J000308",
"memberName": "Jackson, Jeff",
"name": {"parsed": "Mr. Jackson",
"authority-lnf": "Jackson, Jeff",
"authority-fnf": "Jeff Jackson"},
"houseRefId": "",
"state": "NC",
"party": "D"
}
],

To avoid a breaking change, we wouldn't remove the memberName value. |
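Until the blank fields stop propagating, an API consumer could filter them out on the client side. This is only a sketch of that workaround, not a description of GovInfo's fix; the helper name is hypothetical and the sample record is the one quoted above.

```python
def drop_blank_fields(member):
    """Return a copy of a member record without keys whose value is an empty string."""
    return {key: value for key, value in member.items() if value != ""}


member = {
    "chamberIdCode": "", "gpoId": "", "authorityId": "",
    "role": "SPEAKING", "chamber": "H", "congress": "118",
    "bioGuideId": "J000308", "memberName": "Jackson, Jeff",
    "houseRefId": "", "state": "NC", "party": "D",
}

cleaned = drop_blank_fields(member)
```

After this, `cleaned` keeps only role, chamber, congress, bioGuideId, memberName, state, and party.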
I'm typically on the side of more data is better. It could be helpful. It would be useful to help identify missing bioguide ids in the future. How do you guys parse the authority/names from the doc? Do you use something like regex on the PDF text or do you have access to some private metadata? |
We parse from the text version (available at the html endpoint for the granule) using established patterns. Based on the parsed name, our parser performs a lookup against authority files for Members. These authority files are primarily derived from the Congressional Research Service and the Congress.gov API. Here's a sample top-level request: https://api.congress.gov/v3/member?api_key=DEMO_KEY

At a high level, when the parser recognizes "Mr. Jackson of North Carolina" in a 2023 Congressional Record granule, it can determine the correct BioGuideID and authority information based on the 118th Congress, the date of the granule being within the Member's terms of service, and, in this case, the state to disambiguate. The parser then adds the standard authority information (fnf and lnf name, party, etc.) from our authority files. |
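The lookup described above can be sketched roughly as follows. This is not GPO's parser: the regular expression, the `AuthorityRecord` shape, and `lookup_member` are all hypothetical, the term dates are placeholders spanning the 118th Congress rather than verified service dates, and the second record is invented purely to show disambiguation by state.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AuthorityRecord:
    """Hypothetical shape of one entry in a member authority file."""
    bio_guide_id: str
    last_name: str
    name_lnf: str      # last name first
    name_fnf: str      # first name first
    chamber: str
    state_name: str
    party: str
    term_start: date
    term_end: date


# Assumed pattern for names such as "Mr. Jackson of North Carolina".
SPEAKER = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s+(?P<last>[A-Z][A-Za-z'-]+)"
    r"(?:\s+of\s+(?P<state>[A-Z][A-Za-z ]+))?$"
)


def lookup_member(parsed_name, chamber, granule_date, authority):
    """Return the single matching authority record, or None if absent or ambiguous."""
    match = SPEAKER.match(parsed_name.strip())
    if match is None:
        return None
    last, state = match.group("last").lower(), match.group("state")
    candidates = [
        rec for rec in authority
        if rec.last_name.lower() == last
        and rec.chamber == chamber
        and rec.term_start <= granule_date <= rec.term_end
        and (state is None or rec.state_name.lower() == state.lower())
    ]
    return candidates[0] if len(candidates) == 1 else None


AUTHORITY = [
    # Identifiers for this record come from the thread; the dates are placeholders.
    AuthorityRecord("J000308", "Jackson", "Jackson, Jeff", "Jeff Jackson", "H",
                    "North Carolina", "D", date(2023, 1, 3), date(2025, 1, 3)),
    # Invented record, present only to create a same-surname collision.
    AuthorityRecord("X000000", "Jackson", "Jackson, Example", "Example Jackson", "H",
                    "Texas", "R", date(2023, 1, 3), date(2025, 1, 3)),
]

hit = lookup_member("Mr. Jackson of North Carolina", "H", date(2023, 12, 14), AUTHORITY)
```

With the state present the lookup resolves to J000308; a bare "Mr. Jackson" stays ambiguous in this sample and returns None, which is the kind of case that ends up with partial member data.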
I noticed some missing members data in a congressional daily record granule summary for CREC-2024-02-28-pt1-PgH732.

Response:

Notice one of the members is missing some common keys such as bioGuideId and memberName:

When I look at the granule's mods.xml I notice that the broken member data is supposed to be Mr. SUOZZI:

mods request:

mods response:

Notice the congMember blocks where we can assume Mr. SUOZZI is the broken member:

Given the title of the granule is "WELCOMING THE HONORABLE THOMAS R. SUOZZI TO THE HOUSE OF REPRESENTATIVES" this seems to be an instance of a new member of the House. According to congress.gov this member was sworn in on the same day so perhaps the bioGuideId wasn't available (we now know it is S001201).

A few questions based on these findings:
members to always have the following props? role, chamber, congress, bioGuideId, memberName, state, party

memberName, state, party, rely on the bioGuideId? i.e. if bioGuideId is not available, should we expect these other fields to not be included?

bioGuideId may not have been available, so would the reprocessing job backfill the missing data?