First of all, the team leader at my company gave me a task: take part of the contents of a txt file and store it into a table with the same structure as an existing Hive table. So there are three main steps: first, process the data into the same structure as the Hive table's data; then create a new table modeled after the existing Hive table's structure; and finally load the local txt file into the new Hive table.
1: The structure of the data I was given and the structure of the Hive table do not match at all. The following figure shows the structure of the original Hive table and the structure of the txt file the team leader gave me:
As you can see, our original Hive table has 17 fields in total, while the data the team leader gave me has only 9 fields, the last of which is a JSON structure, and the field order doesn't match either. So we have to map each source field to its corresponding position, and leave the positions with no corresponding field empty.
2: A few things need attention here. The original data is tab-separated, so we count tabs to work out where each piece of information actually sits, and then rearrange the data into the field order of the original Hive table (a Python sketch of this step follows the example below). Here are the results:
Here line[0] = null, line[1] = 102, and so on.
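A minimal Python sketch of that remapping step. The position mapping, the JSON key, and the file names are all placeholders, since the real schemas are not reproduced here:

import json

# Position of each txt field within the 17-column hive layout.
# TXT_TO_HIVE_POS, "some_key", and the file names below are all
# hypothetical -- the real mapping depends on the actual schemas.
TXT_TO_HIVE_POS = {0: 3, 1: 0, 2: 7, 3: 1, 4: 5, 5: 9, 6: 12, 7: 14}
HIVE_FIELD_COUNT = 17

def remap(raw_line):
    """Split one tab-separated txt line and rearrange it into hive-table order."""
    line = raw_line.rstrip("\n").split("\t")  # line[0], line[1], ... as above
    extra = json.loads(line[-1])              # the 9th txt field is a JSON object
    row = [""] * HIVE_FIELD_COUNT             # positions with no source field stay empty
    for txt_idx, hive_idx in TXT_TO_HIVE_POS.items():
        row[hive_idx] = line[txt_idx]
    row[16] = extra.get("some_key", "")       # hypothetical key pulled from the JSON column
    return "\t".join(row)

with open("input.txt", encoding="utf-8") as src, \
        open("new_sft.txt", "w", encoding="utf-8") as dst:
    for raw in src:
        dst.write(remap(raw) + "\n")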
3: Now we import the local txt file into the Hive table. First we create a new table with the same structure as the original Hive table, and then load our data into it.
hive> create table new_sft(x1 string, x2 string, ..., xn string) partitioned by (d string) row format delimited fields terminated by '\t';
Once the table has been created, import the data into the new table. Since the table is partitioned, the load statement has to specify a partition value:
hive> load data local inpath '/home/opendev/' into table new_sft partition (d='...');
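If you'd rather drive the load from Python than type it into the hive shell, here is a minimal sketch using subprocess, assuming the hive CLI is on the PATH; the file path and partition value are placeholders:

import subprocess

# Placeholder path and partition value -- substitute your own.
load_stmt = (
    "load data local inpath '/path/to/new_sft.txt' "
    "into table new_sft partition (d='20240101');"
)
# Run the statement through the hive CLI (raises if hive exits non-zero).
subprocess.run(["hive", "-e", load_stmt], check=True)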
Finally, I'll show you my final results:
That's all I have to share about processing data with Python and storing it into a Hive table. I hope it gives you a useful reference, and I hope you'll continue to support me.