SoFunction
Updated on 2024-11-15

Solving the problem of extra rows when joining tables with pandas

The goal of this article is to match the paper and publication records in Table 1 with the publications and their indicators in Table 2.

Table 1: Paper publication information (contains null values)

Table 2: Publication indicator information

Use the pandas merge function to perform a left outer join of the two tables. In a left outer join, every row of the left table is kept and the matching rows of the right table are attached to it. Where the right table has no match, the right-hand columns of the merged result are filled with null values.

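import pandas as pd
# Left outer join: keep every row of paperPublication, attach matching indicator rows
paperPublicationIndicator = pd.merge(paperPublication, publicationIndicator,
    how='left', left_on='Publications', right_on='Name', sort=False)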

The merged table turned out to have 13 more rows than Table 1 (paper publications). Investigation showed that Table 2 (publication indicator information) contains more than one row for some publications, for example Publication A with Indicator 1 and Publication A with Indicator 2. Because a left join returns one output row for every matching row in the right table, each such publication is repeated after the merge:

Publications   Name           Indicator
Publication A  Publication A  Indicator 1
Publication A  Publication A  Indicator 2   # Redundant row
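The effect can be reproduced on a small scale. The sketch below uses made-up toy data, not the article's actual tables: the `Paper` and `Indicator` column names and all values are assumptions for illustration, while `Publications` and `Name` follow the merge call above. It counts the extra rows and lists the keys in the right table that occur more than once.

```python
import pandas as pd

# Hypothetical toy data (not the article's real tables).
paperPublication = pd.DataFrame({
    "Paper": ["P1", "P2", "P3"],
    "Publications": ["Publication A", "Publication B", None],  # Table 1 may contain nulls
})
publicationIndicator = pd.DataFrame({
    "Name": ["Publication A", "Publication A", "Publication B"],
    "Indicator": ["Indicator 1", "Indicator 2", "Indicator 3"],
})

merged = pd.merge(paperPublication, publicationIndicator,
                  how="left", left_on="Publications", right_on="Name", sort=False)

# How many rows did the join add compared with the left table?
extra_rows = len(merged) - len(paperPublication)

# Which rows of the right table share a key with another row?
dup_keys = publicationIndicator[
    publicationIndicator.duplicated(subset=["Name"], keep=False)
]

print(extra_rows)
print(dup_keys)
```

Checking `duplicated(subset=[...], keep=False)` on the right table before merging is a quick way to see which keys will multiply rows in the result.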

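The solution is to remove the duplicate publication rows from Table 2 (publication indicators) before merging. Note that keep='first' retains only the first indicator row for each publication and discards the others.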

# Keep only the first row for each publication name (modifies the table in place)
publicationIndicator.drop_duplicates(subset=['Name'],
    keep='first', inplace=True)

With the duplicates removed, each row of Table 1 matches at most one row of Table 2, so the left join no longer produces extra rows.
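As an optional safeguard, the merge itself can be asked to verify that the right-hand keys are unique. The sketch below again uses hypothetical toy data (the `Paper` and `Indicator` columns and all values are assumptions). It relies on the `validate` parameter of `pd.merge`, which raises `pandas.errors.MergeError` when the stated relationship does not hold. This is an addition to the approach described above, not part of the original workflow.

```python
import pandas as pd

# Hypothetical toy data (not the article's real tables).
paperPublication = pd.DataFrame({
    "Paper": ["P1", "P2", "P3", "P4"],
    "Publications": ["Publication A", "Publication A", "Publication B", None],
})
publicationIndicator = pd.DataFrame({
    "Name": ["Publication A", "Publication A", "Publication B"],
    "Indicator": ["Indicator 1", "Indicator 2", "Indicator 3"],
})

# Non-mutating variant of the deduplication step shown above.
deduped = publicationIndicator.drop_duplicates(subset=["Name"], keep="first")

# validate="many_to_one" checks that the join keys are unique in the right table.
merged = pd.merge(paperPublication, deduped,
                  how="left", left_on="Publications", right_on="Name",
                  sort=False, validate="many_to_one")
same_row_count = len(merged) == len(paperPublication)

# Without deduplication the same call fails instead of silently adding rows.
try:
    pd.merge(paperPublication, publicationIndicator,
             how="left", left_on="Publications", right_on="Name",
             sort=False, validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(same_row_count, raised)
```

Comparing the row count of the result with that of the left table is a cheap extra check that catches the same problem even when `validate` is not used.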
