The data set I chose to review is the St. Louis monthly and seasonal temperatures since 1874. No author is credited, but the web address ends in .gov, meaning the creator is a government agency. If so, the agency is also the source of the data, since it most likely recorded the measurements itself. I think the data has been compiled for recordkeeping and to watch for any changes or irregularities. Climate change is an obvious motivation: the record can show whether temperatures have in fact been rising. The page does not say how the data has been used, but I would assume scientists and researchers have drawn on it to support the reality that mankind is increasingly warming the planet.

The format is entirely columns and tables of numbers. The data is structured very straightforwardly, conveying that it consists entirely of measurements. It may be structured that way to make it easy for researchers, or anyone else who needs the data, to pull from it as a source. In general, it is structured for easy viewing and access, and the effect of that structure is that the data can readily be used to show the public that temperatures have been rising over time.

The data set includes no description from its creator of how the measurements were taken or the averages calculated. That choice may have more negative consequences than anticipated: without details on measurement and calculation, the data is open to scrutiny, and an argument for untrustworthiness can be made if viewers are simply asked to trust that the calculations were correct and precise. The benefit of the government being the creator of this data is the absence of a profit motive; the data exists not for anyone to profit from but simply to inform the public. The government may also want this data to be public to inspire action against climate change. I would use this data for that purpose exactly.
Tag: Data review
For my data set, I examined the “Monthly and Seasonal Temperatures, St. Louis (since 1874).” Although no creator is explicitly credited for this data set, it comes from a .gov website, so it is reasonable to infer that it comes from the National Weather Service, a government agency responsible for providing weather forecasts and warnings for the United States. No sources of data are stated either, but it is safe to assume that various instruments were used to measure and collect temperature and other data. It was more than likely created to establish the average seasonal temperatures in St. Louis, which makes me wonder if this data is also being used to examine topics such as climate change in the St. Louis region. The data is formatted as a table.
The data is structured as a table with the seasons and their corresponding months across the top and the years down the side. I think the data set packs in a lot of information, because it includes the temperatures month by month and year by year, so there are many different ways to use it. This data could help answer questions about what the average temperature might be on a given day, how the temperature changes throughout the year and at what rate, or even whether climate change is happening in the region.
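A minimal sketch of the kind of question such a table can answer. The annual mean temperatures below are invented placeholders, not values from the NWS data set:

```python
# Hypothetical annual mean temperatures (degrees F) for a few years.
# These are made-up illustrative numbers, NOT real St. Louis readings.
yearly_mean_temp = {
    1875: 55.2, 1900: 55.8, 1950: 56.4, 2000: 57.1, 2020: 57.9,
}

def average(values):
    """Arithmetic mean of a list of temperatures."""
    return sum(values) / len(values)

# Compare the mean of the earlier years to the mean of the later years
# as a crude check for a warming trend.
years = sorted(yearly_mean_temp)
half = len(years) // 2
early = average([yearly_mean_temp[y] for y in years[:half]])
late = average([yearly_mean_temp[y] for y in years[half:]])
warming = late - early  # positive means the later period was warmer
```

With a real table, the same comparison could be run per season or per month to see where the change is concentrated.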
Although the data itself is fairly rich, there is little information about where it comes from or who created it. There are definite gaps in knowing how the data was cleaned, what it is supposed to be used for, and how the people creating it may have perspectives and biases that affect how they collect and display data. I personally would use this data to predict temperatures for a given month and to explore climate change and how it is impacting the region.
Data set profile:
- Who is credited as the creator and/or contributors of this data set? Who are they?
- The Project Coordinators are listed as Rebecca Maloney, National Tombstone Project Coordinator, and Debra Crosby, Tombstone Photo Project Coordinator. Though no single person is listed as the creator, the data is contributed by volunteers, who donate it to the USGenWeb Project Archives for their area. The specific tombstone project I looked at was Upper Sect 101 in Alton Cemetery in Alton, Illinois, which was led by someone named Sue Williams.
- What are the sources of their data?
- The web site lists the graveyard or cemetery itself as the intended source of the data, which should be organized according to the layout, or plot, of the graves. The site explains that some cleaning may be needed to plot accurately, and that plotting the graveyard or cemetery might need to happen before transcribing the graves, since not every site will have an existing layout to use.
- Why did they create or compile it?
- There is a stress on compiling data for preservation purposes, as well as for access, but also as a “tribute to our ancestors.”
- How has it been used?
- It seems as if it is primarily used for genealogical purposes, as well as just for data preservation.
- What format is the data set in?
- It is a raw text file with tombstones labeled by sect, listing all included text (name, birth/death dates, inscriptions) and, in some of the files, descriptions of the stone or marker.
Data set evaluation:
- Take a look at the data itself. How have they structured it? What fields have they chosen? What effect might that have on how it can be used?
- The data is really just a string of entries dumped into a text file, one after another: the first “column” holds a number denoting the order in which the graves were plotted, and the second “column” holds the text of the inscription and sometimes a description of the stone or marker. With only those two columns, the data is a little hard to parse and difficult to read, since the result is a very narrow, very long “document.”
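A two-column text dump like the one described can still be turned into structured records. The sample lines below are hypothetical and only imitate the general shape the review describes; the real archive files may be laid out differently:

```python
# Hypothetical lines in the style described above: a plot number,
# then the inscription text. Format assumed, not taken from the archive.
raw = """\
1  SMITH, John  b. 1841 d. 1903  "At Rest"
2  SMITH, Mary  b. 1845 d. 1910
3  DOE, Infant  (small marble lamb)"""

records = []
for line in raw.splitlines():
    # Split each line at the first run of two spaces: number vs. text.
    number, _, inscription = line.partition("  ")
    records.append({"plot": int(number), "inscription": inscription.strip()})
```

Once parsed this way, the entries can be sorted, searched by surname, or exported to a spreadsheet, which addresses the narrow-and-long readability problem.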
- Read the creators’ description of the data set. Have they described the choices they made in cleaning the data, and if so, how? What effect might those choices have on the data?
- There are rules for submitting data files on the Illinois county portal (three links are provided, but they all go to the same page). There are multiple types of data submission forms, depending on the way in which the data is being used, and what is being recorded.
- Consider the creators’ identities and goals in creating the data set. How might those things have shaped the data, either intentionally or inadvertently?
- I think the project is heavily focused on preservation for posterity, which is interesting. I would love to know more about the inception of the site, which started in 1997, and where its founder was located. Culturally speaking, I know there are a few communities in which ancestor and family mapping is really popular (the Church of Jesus Christ of Latter-day Saints chief among them), but this is a project I’m not especially familiar with. I also know of a few influencers on TikTok whose primary content is restoring stones in graveyards and cemeteries; in doing so they often provide information about the person and their life and death, should they have it. Because the project focuses on consolidating data, I think it unintentionally removes the data from the people it is intended to preserve, since it doesn’t include anything about their lives.
- What would you use this data for?
- As I mentioned, there are people who volunteer to restore and preserve the physical graves, so this information would be helpful long term for keeping records of sites that have deteriorated past reading. I also think this is a helpful project for contextualizing death in a death-avoidant culture that has a hard time reconciling with it.
The Illinois land cover dataset from the early 1800s, credited to the Illinois Natural History Survey (INHS), uses historical land survey records like the Public Land Survey System (PLSS) to recreate the pre-settlement environment of Illinois. This data aids ecological restoration, historical comparison, and understanding landscape changes due to agriculture and urbanization. It is used by scientists, land managers, and educators for conservation and environmental planning. Formatted as GIS-compatible shapefiles or raster datasets, it maps and analyzes spatial patterns of historical land cover. The dataset includes geographic units such as township-range sections, dominant vegetation types, estimated extent, and surveyor notes. Although useful for spatial analysis, it may oversimplify local variations in land cover. Data cleaning involved resolving inconsistent terminology and correcting biases, but assumptions made during vegetation classification and boundary definitions introduced uncertainty. The creators’ focus on native vegetation enhances ecological accuracy while potentially excluding other perspectives. This dataset is valuable for environmental research but requires careful application to avoid misleading conclusions.
Data Set Profile:
For my data set review I decided to cover the average weekly hours of all employees in the St. Louis metro area. Trading Economics is the only group credited with providing the data set. The primary source for this data is the United States Federal Reserve, and it has been updated as recently as March 2025. On Trading Economics’ About Us page they mention wanting to provide consistent, accurate economic data to people all around the world; this specific data set is just one of thousands they have created. It has been used for statistical analysis, specifically to find patterns or correlations in hours worked per week over time. Finally, the data is presented in two forms: for an individual year, the average hours per week are shown as a bar graph with a value for each month, while across multiple years the data is presented as a line graph.


Data Set Evaluation:
The data set is structured as a simple mapping of time frame to hours worked. This allows for simple analysis: you can spot patterns easily and get a visual sense of the increase or decrease in hours month by month or year by year. The creators do not explain their choices in gathering or cleaning the data; rather, they present examples of why they may have created it by showing the drastic differences between time frames. Trading Economics advertises itself as a website for finding accurate data about every area of the economy anywhere in the world. They mention constant fact-checking of material and the use of only firsthand sources, nothing third party. They also mention the site has been viewed and trusted by over two billion people across the world, which makes me feel safe trusting the sources and data they present. I think it would be interesting to use this data to visualize differences in work hours by time of year or from year to year. There is an interesting dip in hours in October of 2022 in this data, and I would like to know why. There is also a clear dip across the entire second through fourth quarters of 2020, for obvious reasons, but seeing it presented visually makes the sheer difference much easier to grasp.
The data set I reviewed was the Tombstone Transcription Project, a lasting tribute to our ancestors. Volunteers transcribe tombstone inscriptions and have that work archived for future generations, made easily accessible to all. The project coordinators are Debra Crosby and Rebecca Maloney. The project includes a poem titled “The Recording of a Cemetery” by Thelma Greene Reagan, which highlights the importance of remembering the dead and preserving history, ensuring that future generations can remember and honor those who have passed.
Data set profile:
- Who is credited as the creator and/or contributors of this data set? Who are they? They are mentioned above.
- What are the sources of their data? Records of tombstone inscriptions, genealogical societies, and gravestones/headstones.
- Why did they create or compile it? They created it because many of the gravestones at cemeteries were becoming difficult to read and had already faded due to time and weather, making identification hard.
- How has it been used? It has been used to identify ancestors and preserve cemetery data for future generations. It can also be used by genealogical societies.
- What format is the data set in? Although I’m not sure what format this data set would fall under, I would guess a database or spreadsheet.
Data set evaluation:
- Take a look at the data itself. How have they structured it? What fields have they chosen? What effect might that have on how it can be used? The chosen field is historical ancestors’ grave sites. The data can be used for analysis, identification, and genealogical purposes.
- Read the creators’ description of the data set. Have they described the choices they made in cleaning the data, and if so, how? What effect might those choices have on the data? N/A
- Consider the creators’ identities and goals in creating the data set. How might those things have shaped the data, either intentionally or inadvertently? I honestly don’t believe that their identities contributed to this data set, but their goals did. Thelma’s primary goal was to preserve gravestone inscriptions, both intentionally and inadvertently.
- What would you use this data for? Preservation and historical purposes
Data set profile:
- The creator of this data set is the University of Illinois Urbana-Champaign’s Prairie Research Institute. They are described as a “multidisciplinary research institute charged with providing objective research, expertise, and data on the natural and cultural resources of Illinois.”
- It seems they don’t have a source for their work. (Or at least I can’t find it in the mound of metadata.)
- “The purpose of this map is to provide a georeferenced characterization of vegetation in the early stages of Euro-American settlement. One of the research uses for the surveys nationally is for presettlement vegetation. This data can be used to analyze presettlement vegetation patterns for the purpose of determining natural community potential, productivity indexes, and patterns of natural disturbance.” (A direct quote found under the “Identification_Information” tab in metadata.)
- The original use was to create a more complete map of Illinois using the data collected by multiple cartographers. Now it seems to be used to study patterns and inconsistencies across the Illinois landscape.
- The data is in map form, but the index giving specific definitions of abbreviations and words is in table form.
Data set evaluation:
- There are two different types of data they have collected: cartography and land definitions. The cartography has been turned into an interactive map, where you can zoom in and see the topology, rivers, and biomes. The land definitions have been put into a table containing the land code value, definition, and land cover label from the original plat map. In choosing these specific fields, they may have lost some of the original data or had to summarize a definition.
- There is no direct reference to cleaning the data; they only mention how they scanned it and corrected what was wrong. Through scanning, some things could have been mistranscribed, considering these were captured from cartographers’ notebooks. However, they mention the revisions that were needed where data was incorrect, which gives the impression that any potentially incorrect data would be corrected.
- The original data is from the early 1800s. We don’t use the same language as we did in the past, so someone creating a data set from it runs a notable risk of misinterpretation. In today’s language we tend to cut out a lot of the fluff they used in the past, so descriptors and the like could have been lost.
- There is a huge table of abbreviations and types of land masses, which I find very interesting. I’ve heard most of the words, though there are a few I have never heard, or never heard used to describe land.
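The abbreviation index described above works like a lookup table from surveyor codes to land cover labels. Here is a tiny sketch of that idea with invented codes; the real table’s codes and labels may differ:

```python
# Hypothetical surveyor land codes mapped to land cover labels.
# These codes are illustrative, not copied from the INHS index.
land_cover_index = {
    "P": "prairie",
    "T": "timber",
    "W": "wet prairie",
    "B": "barrens",
}

def label_for(code):
    """Look up a surveyor's land code, defaulting to 'unknown'."""
    return land_cover_index.get(code.upper(), "unknown")
```

A default of “unknown” matters here: as the review notes, scanning and changing language mean some codes in the original notebooks may not match any modern label.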
The data set on “Average Weekly Hours of All Employees: Total Private in St. Louis, MO-IL (MSA)” is curated by Trading Economics, a financial and economic data platform that compiles indicators from government sources. The primary source of this data is the U.S. Bureau of Labor Statistics (BLS) through its Current Employment Statistics (CES) program. The CES conducts surveys with businesses and government agencies to track employment, hours, and earnings trends.
The primary goal of this data set is to provide insights into labor market trends in the St. Louis metropolitan area. Policymakers, economists, and businesses use it to understand workforce dynamics, assess economic health, and inform decision-making. The data has been applied in economic reports, industry studies, and regional workforce analyses to evaluate economic growth and stability.
The data set is formatted as a time series, recording the average weekly hours worked by private-sector employees at regular intervals. This structure makes it easy to analyze trends over time, but the data set lacks details on industry-specific trends, demographic breakdowns, or distinctions between full-time and part-time employees.
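A time series keyed by month lends itself to simple summaries like yearly averages and year-over-year changes. The sketch below uses invented values, not actual BLS/CES figures:

```python
# Hypothetical average weekly hours per month (made-up numbers,
# NOT real BLS Current Employment Statistics values).
hours = {
    "2020-01": 34.8, "2020-04": 33.5, "2020-07": 33.9,
    "2021-01": 34.6, "2021-04": 34.9, "2021-07": 35.0,
}

def yearly_average(series, year):
    """Mean of the monthly values whose key starts with the given year."""
    vals = [v for month, v in series.items() if month.startswith(year)]
    return sum(vals) / len(vals)

# A positive change suggests longer average workweeks in the later year.
change = yearly_average(hours, "2021") - yearly_average(hours, "2020")
```

Because the series aggregates all private-sector employees, even a correct computation like this hides the industry and demographic breakdowns the review points out are missing.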
The BLS applies rigorous data cleaning processes, including seasonal adjustments and validation methods. However, there are still potential sources of error, such as sampling limitations and revisions to initial data. The way Trading Economics presents the data could also introduce bias, as commercial platforms may highlight certain trends over others.
The data’s structure and intent influence how it can be used. Since it only tracks total private-sector employees, it may not capture sector-specific shifts or employment disparities within the workforce. A deeper analysis would require combining this data with other sources, such as wage trends or employment rates by industry.
This data set is useful for tracking labor trends, particularly for identifying economic expansions or contractions based on changes in work hours. However, its limitations require careful interpretation, ensuring that conclusions about the St. Louis labor market are contextualized within broader economic data.
Sometimes we think of data as something that just exists, that we discover or access. But data sets are shaped by the people who create them, who structure them, and who use them. In order to produce visualizations and analysis that are accurate, critical, and transparent about data sources and bias, we have to evaluate our data sets.
Choose a data set from the list below and answer the following questions in about 300-400 words total. The first four questions are about the sources of the data. You can answer these relatively briefly. If you can’t find the answers to any of them, note that in your review. The last questions are about how we can or should use it, challenges with the data, and any perceptible biases in it. These will take more reflection and should accordingly make up the bulk of your review. Submit the review as a blog post on the class website by Wednesday, March 19 at 1:30, with the tag “Data review.”
Data set profile:
- Who is credited as the creator and/or contributors of this data set? Who are they?
- What are the sources of their data?
- Why did they create or compile it?
- How has it been used?
- What format is the data set in?
Data set evaluation:
- Take a look at the data itself. How have they structured it? What fields have they chosen? What effect might that have on how it can be used?
- Read the creators’ description of the data set. Have they described the choices they made in cleaning the data, and if so, how? What effect might those choices have on the data?
- Consider the creators’ identities and goals in creating the data set. How might those things have shaped the data, either intentionally or inadvertently?
- What would you use this data for?
Data sets to choose from:
Average Weekly Hours of All Employees, STL Metro Statistical Area
Monthly and Seasonal Temperatures, St. Louis (since 1874)
Washington Park Cemetery North Reinterment Index
Tombstone Transcription Project (choose a local cemetery that’s been transcribed)
Illinois Landcover in the Early 1800s
If you have another data set you’d like to review instead, please check with me (the earlier, the better).