A travel route recommendation algorithm based on interest theme and distance matching

To address the low accuracy of traditional travel route recommendation algorithms, this paper proposes a travel route recommendation algorithm based on interest theme and distance matching. First, the real historical travel footprints of users are obtained through analysis. Then, the user's interest-theme and distance-matching preferences are derived from the user's stay at each scenic spot. Finally, a method for calculating the optimal travel route is designed under a given travel time limit, starting point, and end point. Experiments on a real data set from the Flickr social network show that the proposed algorithm achieves a higher accuracy rate and recall rate than the traditional algorithm that considers only the interest theme and the algorithm that considers only the distance matching.


Introduction
In recent years, research on recommender systems has developed rapidly. Recommender systems are widely used in e-commerce, social networking sites, e-tourism, Internet advertising, and many other fields, where they have shown strong results and good prospects [1][2][3]. With the rise of online travel websites (such as Expedia, Travelzoo, and Tuniu), a growing volume of online data describes users' interests and preferences. This has made tourism product recommendation one of the hotspots of recommender system research [4,5].
At present, many mature recommendation algorithms are widely used in traditional product recommendation, such as collaborative filtering [6], content-based recommendation [7], and hybrid recommendation [8]. However, a large number of existing studies show that tourism product recommendation is very different from traditional product recommendation, such as film recommendation [9][10][11]. The differences are as follows.
First, users do not purchase tourism products often or in batches, which makes the user-product correlation matrix sparse. Second, the description of tourism products is diverse and complex. Small parameter changes, such as in the scenic spots, schedule, hotel, or vehicle selection, lead to completely different tourism products. However, tourism products with this kind of inherent relevance point to the common interests of users. Third, users often do not pay attention to tourism products for long. They tend to leave visit records on e-tourism websites only after settling on their travel objectives and arrangements, and only then begin to browse tourism products, which leads to a large number of cold-start users in online travel data. Therefore, traditional recommendation algorithms are difficult to apply to tourism recommendation. Generally speaking, the abovementioned recommendation technologies solve the problem of travel recommendation to a certain extent. However, they are only suitable for tourism data with a relatively simple structure, or they rely on geographic information data, so it is difficult for them to fully capture users' real-time interest preferences.
The data used in this paper is the real web server log of tourism enterprises, which contains rich tourism product information and a large number of user click records. A recommendation engine can be built to accurately capture users' interests from their real-time click stream. Personalized tourism products are then recommended to users according to their interests. To improve the accuracy of the recommendation system, a travel route recommendation algorithm based on interest and distance matching is proposed. The core idea is to calculate each user's preferred topics and acceptable distance from that user's travel history data and to add them as weights to the recommendation model, giving a new personalized travel route recommendation algorithm.
The rest of the paper is organized as follows. We review the related work in the first section. The second section gives the necessary definitions and the algorithm. The third section gives the experimental results. Finally, the fourth section draws a conclusion.

The related work
With the continuous development of social networks, the volume of users' social network information is increasing. Effectively mining valuable information from it plays an irreplaceable role in the development of social networks [12]. In social networks, users can upload text, location, and time information and share it with friends and nearby people. At present, more and more scholars recognize the importance of social network information and devote themselves to research on social network information mining.
The idea of social network data mining is similar to that of GPS trajectory data mining. In GPS trajectory data mining, the main applications include association rules, abnormal behavior, travel mode, and GPS trajectory recommendation [13]. The data acquisition time is strictly limited to equal time intervals, as reflected in Shahed [14]. In social network trajectory data mining, applications mainly include location recommendation, path recommendation, and behavior preference recommendation [15]. Data collection times are discrete and random, which is the main difference between social network trajectory data and GPS trajectory data.
At present, there are many methods for social network data mining, including clustering, classification, and other traditional techniques. Among them, clustering is used to discover group patterns in social networks, and it works well for recommending user paths and locations. The MapReduce [16] framework is widely used in large-scale data processing. Methods that combine clustering algorithms with the MapReduce framework for big data analysis and processing are gradually being developed, such as the MapReduce-based DBSCAN clustering algorithm, which has achieved good results.
The main group patterns that are mined include flock, convoy, swarm, gathering, and platoon. The literature [17] introduces the different group pattern mining methods in detail. Swarm is a group pattern with weak time constraints: it only requires that the number of different trajectories appearing at the same point be greater than a set threshold. Although flock and convoy have stronger time constraints than swarm, this strong constraint also leads to a decline in accuracy. The platoon model is described in [18]. It combines the advantages of the above group models and adapts to different applications by allowing control of the consecutive-time constraint.
Personalized recommendation methods mainly include content-based recommendation, collaborative filtering recommendation, association rule-based recommendation, utility-based recommendation, knowledge-based recommendation, and combination recommendation [19]. There are also many recommendation strategies, and different strategies produce different recommendation results. However, in personalized travel route recommendation based on group patterns, traditional group pattern mining lacks semantic information, which leads to incomplete personalized recommendations.

Basic definition
Given a directed weighted graph $G = (V, E)$, $V$ is the set of nodes and $E$ is the set of edges. As shown in Fig. 1, a node $p \in V$ represents a POI. Each POI $p$ has a category attribute $Cat_p$ (such as church, museum, beach), a longitude, and a latitude, and the value on the node $p$ represents the score of POI $p$. $C$ represents the collection of categories of all POIs. In the attribute $(c_i, D_i)$ of the node $p_i$, $c_i$ represents the category attribute of the POI, and $D_i$ represents the distance of the POI. Each directed edge $(p_i, p_j)$ represents a feasible route between two POIs, the number of edges is $|E|$, and the weight on an edge represents the travel time (in hours) between the two consecutively visited POIs.
A travel route is a sequence of travel POIs, denoted as $R = \{p_1, p_2, \ldots, p_N\}$, where $p_i$ is a tourist location included in the route, and $N$ is the number of locations.
The travel time between two POIs. The travel time required by the user to go from POI $p_x$ to POI $p_y$ is defined as the distance between them divided by the travel speed, where $Dist(p_x, p_y)$ represents the distance between $p_x$ and $p_y$, calculated by the Haversine formula [20]. The user is assumed to travel on foot at a speed of 6 km/h.
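As an illustration of this definition, the sketch below computes the Haversine distance and the resulting walking time. The function names and the Earth radius value of 6371 km are assumptions made for the example; only the 6 km/h speed comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value)
WALKING_SPEED_KMH = 6.0    # walking speed stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_time_h(lat1, lon1, lat2, lon2, speed_kmh=WALKING_SPEED_KMH):
    """Travel time in hours: distance divided by speed."""
    return haversine_km(lat1, lon1, lat2, lon2) / speed_kmh
```

Because the speed is a parameter, the same helper could be reused for other travel modes, although the paper only considers walking.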
The preference vector of a user $u$ is expressed as $IntP(u) = \langle Int(u, c_1), Int(u, c_2), \ldots, Int(u, c_i) \rangle$, where $Int(u, c_i)$ represents the degree of preference of the user $u$ for the POI category $c_i$.
Given a user $u$ and the POIs he/she has visited, his/her historical travel footprint is defined as the sequence $S_u = (p_1, p_2, \ldots, p_n)$, where each visited POI $p_x$ has an arrival time $t^a_{p_x}$ and a departure time $t^d_{p_x}$. If $t^a_{p_{x+1}} - t^d_{p_x} > \tau$, $S_u$ is divided into a number of individual travel sequences (that is, subsequences of $S_u$). In other words, if the time between consecutive visits to two POIs is greater than the threshold $\tau$, the travel footprint is split at that point into different travel subsequences. The time threshold $\tau$ is set to 8 h in this paper.
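The splitting rule can be sketched as follows. The representation of a visit as a `(poi, arrival, departure)` tuple and the function name are assumptions made for illustration; the 8 h threshold is the one stated in the text.

```python
from datetime import datetime, timedelta

TAU = timedelta(hours=8)  # time threshold stated in the text


def split_footprint(visits, tau=TAU):
    """Split a chronologically ordered list of (poi, arrival, departure)
    visits into travel sequences wherever the gap between one departure
    and the next arrival exceeds tau."""
    sequences = []
    current = []
    for visit in visits:
        # visit[1] is this arrival, current[-1][2] is the previous departure
        if current and visit[1] - current[-1][2] > tau:
            sequences.append(current)
            current = []
        current.append(visit)
    if current:
        sequences.append(current)
    return sequences
```

A gap of exactly 8 h keeps the two visits in the same sequence, since the condition in the text is a strict inequality.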
Cheng EURASIP Journal on Advances in Signal Processing (2021) 2021:57
This paper gives a POI scoring method that considers location distance and user preferences. The score for POI $p_i$ is expressed as $score(p_i)$, where $c_i$ is the category of POI $p_i$, $D(p_i)$ is the distance of POI $p_i$, and $\alpha$ is a user-adjustable parameter that sets the relative weight of user interest preference and POI distance in the route.
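One weighting that is consistent with this description is a convex combination of the two terms. It is shown here only as an assumption, since the description does not fix which term $\alpha$ multiplies, how $D(p_i)$ is normalised, or whether a larger distance should raise or lower the score.

```latex
\[
\mathrm{score}(p_i) = \alpha \,\mathrm{Int}(u, c_i) + (1 - \alpha)\, D(p_i),
\qquad 0 \le \alpha \le 1
\]
```

Under this reading, $\alpha = 1$ would rank POIs by user interest alone and $\alpha = 0$ by the distance term alone.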

Tourist route recommendation framework
As shown in Fig. 2, the travel route recommendation in this paper is divided into the construction of the POI association graph, the learning of the user's interest preference, and the route recommendation. The construction of the POI association graph and the learning of the user's interest preference are performed offline, and the distance of each POI and the user's interest preference can be obtained by analyzing the photos taken by the user. The route recommendation is performed online, assuming that the user wants to visit a city with $m$ POIs, recorded as $P = \{p_1, p_2, \ldots, p_m\}$. Given the POI set $P$, the time budget $B$, the starting POI $p_1$, and the ending POI $p_n$, the route with the highest score is recommended to the user by the proposed algorithm, which combines user interest preferences and POI distance based on the orienteering problem.
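To make the online step concrete, the sketch below finds the highest-scoring route by exhaustive depth-first search. It is a brute-force stand-in that is only practical for a handful of POIs, not the solution method of the paper; the function name, the dictionary inputs, and the choice to count the scores of the start and end POIs are all assumptions.

```python
def best_route(scores, cost, start, end, budget):
    """Exhaustive search for the highest-scoring simple path from start
    to end whose total edge cost stays within budget.

    scores: dict mapping POI -> score
    cost:   dict mapping (POI, POI) -> time in hours for a feasible edge
    Returns (route, total_score), or (None, -inf) if no route fits.
    """
    best = {"score": float("-inf"), "route": None}

    def extend(route, used, total):
        last = route[-1]
        if last == end:
            # A route is complete once it reaches the end POI.
            if total > best["score"]:
                best["score"], best["route"] = total, list(route)
            return
        for nxt in scores:
            if nxt in route:
                continue  # each POI is visited at most once
            step = cost.get((last, nxt))
            if step is None or used + step > budget:
                continue  # no such edge, or the budget would be exceeded
            route.append(nxt)
            extend(route, used + step, total + scores[nxt])
            route.pop()

    extend([start], 0.0, scores[start])
    return best["route"], best["score"]
```

The search grows factorially with the number of POIs, so a realistic city would call for an integer programming solver or a heuristic.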

Construction of POI correlation graph
The construction of the POI association graph takes place offline. In this paper, the POIs in the travel sequences of all users are used as the nodes of the graph, which represent tourist places, and consecutive visits by users within a travel sequence generate the edges of the graph.
The structure of the photo data shared by the user is (PhotoID, UserID, time, longitude, latitude, category). With this structure, the photo data contains the exact spatiotemporal location of the user. Based on the longitude and latitude of each photo, the Haversine formula [20] is used to calculate the distance between each photo shared by the user and each POI in the city visited. If the distance is less than 200 m, the photo is considered to have been taken at that POI, which yields the user's POI list $S_u = (p_1, p_2, \ldots, p_n)$.
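A minimal sketch of this photo-to-POI matching is given below. The input shapes, the function names, and the tie-breaking rule (a photo within 200 m of several POIs is assigned to the nearest one) are assumptions that the text does not specify.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def poi_sequence(photos, pois, radius_m=200.0):
    """Map one user's photos to POIs.

    photos: list of (time, lat, lon) tuples
    pois:   dict mapping POI name -> (lat, lon)
    Returns a chronologically ordered list of (time, poi) for every photo
    taken less than radius_m from some POI.
    """
    visits = []
    for time, lat, lon in sorted(photos):
        nearest, best = None, radius_m
        for name, (plat, plon) in pois.items():
            d = haversine_m(lat, lon, plat, plon)
            if d < best:  # strictly closer than the radius or the current best
                nearest, best = name, d
        if nearest is not None:
            visits.append((time, nearest))
    return visits
```

Consecutive photos at the same POI would still need to be merged into a single visit with an arrival and a departure time before the footprint can be split into travel sequences.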
A time-based user interest preference is derived from the user's historical travel footprint. When a user visits a POI, he/she stays there for a certain period of time. The visit time (i.e., the stay time) at each POI that each user has visited is calculated from the historical travel footprints of all users according to definition 4, so that the average time users need to visit any given POI can be calculated. In the travel route recommendation in this paper, $V(p)$ denotes the average visit time at POI $p$. For each POI $p$, it is the total time users spent at $p$ divided by $n$, where $U$ represents all users, and $n$ represents the number of users in $U$ who visited $p$.
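The averaging step can be sketched as below. The input layout (a dictionary from user to a list of visits with arrival and departure times in hours) and the function name are assumptions; following the text, the total stay at a POI is divided by the number of distinct users who visited it.

```python
from collections import defaultdict


def average_visit_time(footprints):
    """Compute the average visit time per POI.

    footprints: dict mapping user -> list of (poi, arrival_h, departure_h)
    Returns a dict mapping poi -> total stay in hours divided by the
    number of distinct users who visited that POI.
    """
    stay = defaultdict(lambda: defaultdict(float))  # poi -> user -> hours
    for user, visits in footprints.items():
        for poi, arrival, departure in visits:
            stay[poi][user] += departure - arrival
    return {poi: sum(per_user.values()) / len(per_user)
            for poi, per_user in stay.items()}
```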
However, the average visit time at each POI does not by itself reflect how strongly an individual user prefers that type of POI. Therefore, a time-based user interest preference is proposed in this paper. The degree of preference of user $u$ for POI category $c$ is calculated from the user's visit times relative to the average visit times,
where $Cat_p$ represents the category attribute of POI $p$, and $\sigma(Cat_p = c)$ is an indicator that equals 1 when $Cat_p = c$ and 0 otherwise. This determines the interest of user $u$ in the category attribute $c$: it is calculated from the time the user spent at each POI with category attribute $c$, relative to the average visit time of all users at the same POI. In other words, a user is likely to spend more time at the types of POI he or she is interested in, and this time in turn determines the user's level of interest in such POIs.
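One way to read this description is that each visit contributes the ratio of the user's stay to the average stay at that POI, summed per category. The sketch below implements that reading; it is an assumption about the missing expression, and the function and argument names are hypothetical.

```python
def interest_preferences(visits, category, avg_time):
    """Time-based interest of one user per POI category.

    visits:   list of (poi, arrival_h, departure_h) for the user
    category: dict mapping poi -> category
    avg_time: dict mapping poi -> average visit time over all users (hours)
    Returns a dict mapping category -> sum over the user's visits to POIs
    of that category of (user stay / average stay).
    """
    prefs = {}
    for poi, arrival, departure in visits:
        ratio = (departure - arrival) / avg_time[poi]
        cat = category[poi]
        prefs[cat] = prefs.get(cat, 0.0) + ratio
    return prefs
```

A ratio above 1 means the user stayed longer than the average visitor at that POI, which is taken as a sign of stronger interest in its category.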

Proposed algorithm
The orienteering problem (OP) is described as follows. In a directed weighted graph $G = (V, E)$, $V$ is the set of all nodes and $E$ is the set of all edges. Each node has a score (which can be read as a gain), and each edge has a weight (here, the walking time between two nodes). The start and end points are specified. The task is to select a subset of nodes from $G$ and plan a path through the start point, the selected nodes, and the end point so that the total score of the path is maximized without exceeding a given time budget. OP has been widely used in travel route recommendation.
The route recommendation algorithm proposed here considers POI distance and user interest based on the orienteering problem. Given the set $P$, the time budget $B$, the starting POI $p_1$, and the ending POI $p_n$, the proposed algorithm recommends a route $R = \{p_1, p_2, \ldots, p_n\}$ that satisfies the time budget $B$ and has the highest score. The time consumed against the budget is calculated by the function $Cost(p_x, p_y)$. The travel route recommendation model in this paper can therefore be cast as an integer programming problem with multiple constraints, in which $x_{i,j} = 1$ if the route goes from $p_i$ to $p_j$ through the edge $(p_i, p_j)$, and $x_{i,j} = 0$ otherwise.
Equation (5) is the objective function, which maximizes the total score, combining POI distance and user interest preferences, over the recommended route. Eqs. (6) to (10) are the constraints of Eq. (5). Equation (6) ensures that the user's route starts at $p_1$ and ends at $p_n$. Equation (7) ensures that the route is connected and that each POI in the route is visited only once. Equation (8) guarantees that the time spent by the user on the entire trip is within the budget $B$. With $u_x$ denoting the position of POI $p_x$ in route $R$, Eqs. (9) and (10) ensure that the solution of the integer programming problem contains no sub-tours.
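For reference, the orienteering problem is commonly written in the following form, with Miller-Tucker-Zemlin-style constraints to rule out sub-tours. It is given here as an assumed reading of Eqs. (5) to (10), matched to the descriptions above; the exact indices and the definition of $Cost$ used in the paper may differ.

```latex
\begin{align}
\max \; & \sum_{i=2}^{n-1} \sum_{j=2}^{n} x_{i,j}\, \mathrm{score}(p_i) \tag{5} \\
\text{s.t.} \; & \sum_{j=2}^{n} x_{1,j} = \sum_{i=1}^{n-1} x_{i,n} = 1 \tag{6} \\
& \sum_{i=1}^{n-1} x_{i,k} = \sum_{j=2}^{n} x_{k,j} \le 1, \qquad \forall k = 2, \ldots, n-1 \tag{7} \\
& \sum_{i=1}^{n-1} \sum_{j=2}^{n} \mathrm{Cost}(p_i, p_j)\, x_{i,j} \le B \tag{8} \\
& 2 \le u_i \le n, \qquad \forall i = 2, \ldots, n \tag{9} \\
& u_i - u_j + 1 \le (n-1)(1 - x_{i,j}), \qquad \forall i, j = 2, \ldots, n \tag{10}
\end{align}
```

In this form, whenever $x_{i,j} = 1$ constraint (10) forces $u_j \ge u_i + 1$, so positions increase strictly along the route and a closed loop among intermediate POIs is impossible.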

Experimental data and the data preprocess
This paper uses the Flickr dataset of Web and Mobile Data Management (WAMDM), which has two parts. One is 319,110 photos of New York City, each with a photo ID (identity), photo location (latitude and longitude), photo time, photo tag information, and user ID. The other is the address book of 12,991 Flickr users. In this experiment, we use the leave-one-out cross-validation method [21], which is used in most recommender system research, to verify the algorithm. In each cycle, one item of the data set serves as the test set and the rest as the training set, and the evaluation indexes are calculated from the prediction results of each cycle. To facilitate the experiment, the data are processed as follows: (1) delete users who took fewer than 5 photos per POI and (2) delete users who visited only one or two POIs. After preprocessing, the final experimental data set consists of 55,451 images taken by users and 165,830 geotagged images.

Evaluation index
The accuracy of recommendation is the most important index for evaluating the algorithm. In this paper, precision and recall are used as the criteria for measuring the quality of the algorithm, and they are expressed as follows:
\[
\mathrm{Precision} = \frac{|P_r \cap P_v|}{|P_r|}, \qquad \mathrm{Recall} = \frac{|P_r \cap P_v|}{|P_v|}
\]
where $P_r$ represents the POI set in the recommended route, and $P_v$ represents the POI set visited by the user in the real travel sequence. Precision represents the probability that the user is interested in a recommended POI, and recall represents the probability that a POI the user actually visited is recommended. The higher the precision and recall, the better the recommendation.
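Treating $P_r$ and $P_v$ as sets, the two measures are the size of their overlap divided by the size of each set. The sketch below follows these usual definitions; the function name and the convention of returning 0 for an empty set are assumptions.

```python
def precision_recall(recommended, visited):
    """Precision and recall of a recommended route against the real one.

    recommended: iterable of POIs in the recommended route (P_r)
    visited:     iterable of POIs in the user's real travel sequence (P_v)
    """
    p_r, p_v = set(recommended), set(visited)
    hits = len(p_r & p_v)
    precision = hits / len(p_r) if p_r else 0.0
    recall = hits / len(p_v) if p_v else 0.0
    return precision, recall
```

For example, recommending four POIs of which two appear in a real sequence of three POIs gives a precision of 0.5 and a recall of 2/3.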

Experiment result
To verify the effectiveness of the algorithm, this paper compares it with traditional travel route recommendation algorithms, in which either user interest preference alone or POI distance alone is used as the criterion. The proposed algorithm, which is based on both POI distance and user interest preference, is compared with the traditional algorithms under different time budgets $B$. The experimental results are shown in Figs. 3 and 4. In terms of accuracy, Fig. 3 shows that the proposed algorithm is more accurate than the traditional algorithms. In terms of recall, Fig. 4 shows that the proposed algorithm also has a higher recall rate than the traditional algorithms. One contributing factor is that both the proposed algorithm and the interest-only algorithm take the user's interest into account, and users prefer to visit the places they are interested in. The high accuracy and recall rate show that the proposed algorithm can more accurately recommend routes that reflect users' real travel sequences.

Conclusion
Based on the orienteering problem, this paper establishes a travel recommendation model and proposes a personalized travel route recommendation algorithm. The algorithm jointly considers POI distance and user interest preference, recommends the most suitable route to users, and implements the travel route recommendation framework using a Flickr photo set with geographical tags. The experimental results show that the proposed algorithm has a higher recommendation accuracy and recall rate than the traditional algorithms that consider only POI distance or only user interest preference. The next step is to study intelligent optimization algorithms for solving the orienteering problem, so as to improve the efficiency of the algorithm and reduce its cost.

Fig. 1 An example of a route recommendation