A travel route recommendation algorithm based on interest theme and distance matching
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 57 (2021)
Abstract
To address the low accuracy of traditional travel route recommendation algorithms, this paper proposes a travel route recommendation algorithm based on interest theme and distance matching. First, users' real historical travel footprints are obtained through analysis. Then, each user's interest-theme preference and distance matching are derived from the time the user stays at each scenic spot. Finally, a method is designed to compute the optimal travel route under a given travel time limit, starting point, and end point. Experiments on a real data set from the Flickr social network show that the proposed algorithm achieves higher precision and recall than the traditional algorithm that considers only the interest theme and the algorithm that considers only distance matching.
1 Introduction
In recent years, research on recommender systems has developed rapidly. Recommender systems are widely used in e-commerce, social networking sites, e-tourism, Internet advertising, and many other fields, where they have shown strong results and promising prospects [1,2,3]. With the rise of online travel websites (such as Expedia, Travelzoo, and Tuniu), more and more online data describe users' interests and preferences. This has made tourism product recommendation one of the hotspots of recommender system research [4, 5].
At present, many mature recommendation algorithms are widely used in traditional product recommendation, such as collaborative filtering [6], content-based recommendation [7], and hybrid recommendation [8]. However, many existing studies show that tourism product recommendation differs greatly from traditional product recommendation such as film recommendation [9,10,11]. The differences are as follows.
First, users rarely purchase tourism products frequently or in bulk, which makes the user-product correlation matrix sparse. Second, tourism product information is diverse and complex: small changes in parameters such as scenic spots, schedule, hotel, or vehicle selection yield completely different tourism products. Nevertheless, tourism products that are inherently related point to users' common interests. Third, users do not follow tourism products over a long period. They typically arrive at an e-tourism website, first browse destinations and itineraries, and only then browse tourism products, which produces a large number of cold-start users in online travel data. Therefore, traditional recommendation algorithms are difficult to apply to tourism recommendation. The above-mentioned recommendation technologies solve the travel recommendation problem only to a certain extent: they suit only tourism data with a relatively simple structure, or they rely on geographic information, so they have difficulty fully capturing users' real-time interest preferences.
The data used in this paper are real web server logs of tourism enterprises, which contain rich tourism product information and a large number of user click records. A recommendation engine built on these data can accurately capture users' interests from their real-time click streams and then recommend personalized tourism products accordingly. To improve recommendation accuracy, a travel route recommendation algorithm based on interest and distance matching is proposed. The core idea is to compute each user's preferred topics and acceptable distance from his/her travel history and add them as weights to the recommendation model, yielding a new personalized travel route recommendation algorithm.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 gives the definitions of the necessary concepts and the algorithm. Section 4 presents the experimental results, and Section 5 concludes the paper.
2 Related work
With the continuous development of social networks, the amount of information users generate on them keeps growing, and effectively mining valuable information from it plays an irreplaceable role in the development of social networks [12]. In social networks, users can upload text, location, and time information and share it with friends and nearby people. More and more scholars now recognize the importance of social network information and devote themselves to social network information mining.
The idea of social network data mining is similar to that of GPS trajectory data mining. The main applications of GPS trajectory data mining include association rules, abnormal behavior detection, travel mode identification, and GPS trajectory recommendation [13]. GPS data are acquired at strictly equal time intervals, as reflected in SHAHED [14]. In social network trajectory data mining, the applications mainly include location recommendation, path recommendation, and behavior preference recommendation [15]. Social network data, in contrast, are collected at discrete and random times, which is the main difference between social network trajectory data and GPS trajectory data.
There are many processing methods for social network data mining, including clustering, classification, and other traditional techniques. Among them, clustering is used to discover group patterns in social networks and performs well in user path and location recommendation. The MapReduce framework [16] is widely used in large-scale data processing, and methods that combine clustering algorithms with MapReduce for big data analysis are gradually being developed; for example, the MapReduce-based DBSCAN clustering algorithm has achieved good results.
The main group patterns include flock, swarm, convoy, gathering, and platoon. Reference [17] introduces the mining methods for these group patterns in detail. Swarm is a group pattern with weak time constraints: it only requires that the number of distinct trajectories appearing at the same place exceed a given threshold. Flock and convoy impose stronger time constraints than swarm, but these strict constraints also reduce accuracy. The platoon model is described in [18]; it combines the advantages of the above group patterns and adapts to different applications by allowing control over the consecutive-time constraint.
Personalized recommendation methods mainly include content-based recommendation, collaborative filtering recommendation, association rule-based recommendation, utility-based recommendation, knowledge-based recommendation, and hybrid recommendation [19]. There are also many recommendation strategies, and different strategies produce different recommendation results. However, in personalized travel route recommendation based on group patterns, traditional group pattern mining lacks semantic information, which leads to incomplete personalized recommendation.
3 Algorithm implementation
3.1 Basic definition
Given a directed weighted graph \( G = (V, E) \), \( V \) is the set of nodes and \( E \) is the set of edges. As shown in Fig. 1, a node \( p \in V \) represents a POI. Each \( POI_p \) has a category attribute \( Cat_p \) (such as church, museum, or beach), a longitude, and a latitude, and the value on node \( p \) represents the score of \( POI_p \). \( C \) denotes the set of categories of all POIs. In the attribute \( (c_i, D_i) \) of node \( p_i \), \( c_i \) is the category attribute of the POI and \( D_i \) is the distance of the POI. Each directed edge \( (p_i, p_j) \) represents a feasible route between two POIs, the number of edges is \( |E| \), and the edge weight is the travel time (in hours) for visiting the two POIs consecutively.
A travel route is a sequence consisting of multiple travel POIs, denoted as R = {p_{1}, p_{2}, …, p_{N}}, where p_{i} is the tourist location included in the route, and N is the number of locations.
The travel time between two POIs. The travel time required for the user to go from \( POI_{p_x} \) to \( POI_{p_y} \) is defined as
\[ T^{Travel}(p_x, p_y) = \frac{Dist(p_x, p_y)}{v}, \]
where \( Dist(p_x, p_y) \) is the distance between \( p_x \) and \( p_y \), calculated by the Haversine formula [20], and \( v \) is the travel speed. The user is assumed to walk between POIs at \( v = 6 \) km/h.
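The travel-time definition can be illustrated with a short Python sketch. It assumes a mean Earth radius of 6371 km and represents each POI by a (latitude, longitude) tuple in degrees; `haversine_km` and `travel_time_h` are illustrative names, not functions from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed constant)
WALKING_SPEED_KMH = 6.0    # walking speed used in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def travel_time_h(p_x, p_y, speed_kmh=WALKING_SPEED_KMH):
    """T^Travel(p_x, p_y) in hours; each POI is a (lat, lon) tuple."""
    return haversine_km(*p_x, *p_y) / speed_kmh


# One degree of latitude is about 111.19 km, i.e. roughly 18.5 h of walking.
print(round(travel_time_h((0.0, 0.0), (1.0, 0.0)), 1))
```

These travel times are exactly the edge weights of the graph \( G \) described above.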
The preference vector of user \( u \) is \( IntP(u) = \langle Int(u, c_1), Int(u, c_2), \dots, Int(u, c_{|C|}) \rangle \), where \( Int(u, c_i) \) is the degree of preference of user \( u \) for POI category \( c_i \).
Given a user \( u \) and the POIs he/she has visited, the user's historical travel footprint in chronological order is \( S_u = \left((p_1, t_{p_1}^a, t_{p_1}^d), (p_2, t_{p_2}^a, t_{p_2}^d), \dots, (p_n, t_{p_n}^a, t_{p_n}^d)\right) \). Each triplet \( (p_x, t_{p_x}^a, t_{p_x}^d) \) consists of a visited \( POI_{p_x} \), the time \( t_{p_x}^a \) at which the user arrives at \( p_x \), and the time \( t_{p_x}^d \) at which the user leaves \( p_x \). The time of the first photo the user takes at a POI is taken as the arrival time, and the time of the last photo as the departure time. The user's access time at \( p_x \) (that is, user \( u \)'s stay at \( p_x \)) is the difference \( t_{p_x}^d - t_{p_x}^a \). Similarly, for the travel sequence \( S_u \), \( t_{p_1}^a \) and \( t_{p_n}^d \) are the start and end times of the journey, respectively. For simplicity, this paper writes \( S_u \) as \( S_u = (p_1, p_2, \dots, p_n) \).
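As a minimal sketch of how the triplets in \( S_u \) could be derived from photos, the code below assumes each user's photos are already labeled with a POI id and a timestamp in hours, and treats consecutive photos at the same POI as one visit (first photo = arrival, last photo = departure). The `Visit` class and `build_footprint` function are hypothetical helpers.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Visit:
    poi: str
    arrive: float   # t^a: time of the first photo at this POI (hours, assumed unit)
    depart: float   # t^d: time of the last photo at this POI

    @property
    def stay(self) -> float:
        """Access (stay) time t^d - t^a at this POI."""
        return self.depart - self.arrive


def build_footprint(photos):
    """photos: list of (poi_id, timestamp) for one user; returns S_u as Visits."""
    photos = sorted(photos, key=lambda ph: ph[1])   # chronological order
    footprint = []
    # groupby merges only *consecutive* photos at the same POI, so a later
    # return to the same POI becomes a separate visit.
    for poi, group in groupby(photos, key=lambda ph: ph[0]):
        times = [t for _, t in group]
        footprint.append(Visit(poi, times[0], times[-1]))
    return footprint
```

A POI with a single photo yields a visit with zero stay time under this rule.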
Given the travel footprint \( S_u = (p_1, p_2, \dots, p_n) \) of user \( u \), if \( t_{p_{x+1}}^a - t_{p_x}^d > \tau \), \( S_u \) is split at that point into separate travel sequences (that is, subsequences of \( S_u \)). In other words, whenever the time between consecutive visits to two POIs exceeds the threshold \( \tau \), the footprint is divided into different travel subsequences. In this paper, \( \tau = 8 \) h.
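The splitting rule can be sketched as follows, assuming the footprint is a time-ordered list of hypothetical `(poi, t_arrive, t_depart)` tuples in hours. A gap of exactly \( \tau \) does not split the sequence, because the rule uses a strict inequality.

```python
TAU_H = 8.0  # threshold tau from the text, in hours


def split_footprint(footprint, tau=TAU_H):
    """Split S_u into travel sequences.

    footprint: time-ordered list of (poi, t_arrive, t_depart) tuples.
    A new sequence starts whenever t^a_{p_{x+1}} - t^d_{p_x} > tau.
    """
    sequences = []
    current = []
    for visit in footprint:
        # Compare this arrival with the departure from the previous POI.
        if current and visit[1] - current[-1][2] > tau:
            sequences.append(current)
            current = []
        current.append(visit)
    if current:
        sequences.append(current)
    return sequences
```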
This paper gives a POI scoring method that considers both location distance and user preference. The score of \( POI_{p_i} \) is denoted \( score(p_i) \):
where \( c_i \) is the category of \( POI_{p_i} \), \( D(p_i) \) is the distance of \( POI_{p_i} \), and \( \alpha \) is a user-adjustable parameter that controls the relative weight of user interest preference and POI distance in the route.
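Purely as an illustration of how \( \alpha \) trades off the two terms, the sketch below assumes a convex combination \( \alpha \cdot Int(u, c_i) + (1 - \alpha) \cdot D(p_i) \), with both inputs already normalized to [0, 1] and a larger \( D(p_i) \) meaning a better distance match. This form and the name `poi_score` are assumptions made for the example, not the paper's definition.

```python
def poi_score(interest, distance_term, alpha=0.5):
    """Hypothetical score(p_i) = alpha * Int(u, c_i) + (1 - alpha) * D(p_i).

    ASSUMPTION: linear weighting with inputs normalized to [0, 1];
    alpha = 1 uses interest only, alpha = 0 uses distance only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * interest + (1.0 - alpha) * distance_term
```

Under this assumed form, the two extreme settings of \( \alpha \) reproduce the interest-only and distance-only baselines used in the experiments.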
3.2 Tourist route recommendation framework
As shown in Fig. 2, the travel route recommendation in this paper consists of constructing the POI association graph, learning the user's interest preference, and recommending the route. Constructing the POI association graph and learning the user's interest preference are performed offline; the POI distances and the user's interest preferences are obtained by analyzing the photos the user has taken. Route recommendation is performed online. Assume the user wants to visit a city with \( m \) POIs, denoted \( P = \{p_1, p_2, \dots, p_m\} \). Given the POI set \( P \), the time budget \( B \), the starting point \( POI_{p_1} \), and the ending point \( POI_{p_n} \), the proposed algorithm, which combines user interest preference and POI distance within the orienteering problem, recommends the highest-scoring route to the user.
3.3 Construction of the POI association graph
The POI association graph is constructed offline. The POIs in all users' travel sequences form the nodes of the graph, each representing a tourist place, and each pair of consecutive visits in a user's travel sequence generates an edge.
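A minimal sketch of this offline construction, assuming each travel sequence is a list of hypothetical POI ids. Besides the edge set, it records how often each consecutive pair was observed, which is an illustrative extra; the edge weight used in this paper is the travel time between the two POIs.

```python
from collections import defaultdict


def build_poi_graph(sequences):
    """Return (nodes, edges) of the POI association graph.

    sequences: iterable of POI-id lists, one per travel sequence.
    edges maps (p_i, p_j) -> number of times p_j was visited right after p_i.
    """
    nodes = set()
    edges = defaultdict(int)
    for seq in sequences:
        nodes.update(seq)
        # Every consecutive pair of visits yields a directed edge p_i -> p_j.
        for p_i, p_j in zip(seq, seq[1:]):
            if p_i != p_j:  # ignore self-loops from repeated POIs
                edges[(p_i, p_j)] += 1
    return nodes, dict(edges)
```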
Each photo shared by a user has the structure (PhotoID, UserID, time, longitude, latitude, category), so the photo data contain the user's exact spatio-temporal location. Using the longitude and latitude of each photo, the Haversine formula [20] is applied to compute the distance between each photo shared by the user and each POI in the visited city. If the distance is less than 200 m, the photo is considered to have been taken at that POI, which yields each user's POI visit sequence \( S_u = (p_1, p_2, \dots, p_n) \).
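The 200 m matching rule might be implemented as below. The sketch assumes POIs are given as a hypothetical `{poi_id: (lat, lon)}` dictionary and, when several POIs lie within 200 m, assigns the photo to the nearest one; that tie-breaking choice is an assumption the text does not specify.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(min(1.0, math.sqrt(a)))


def assign_photo(photo_lat, photo_lon, pois, max_dist_m=200.0):
    """Return the id of the nearest POI closer than max_dist_m, or None."""
    best_id, best_d = None, max_dist_m
    for poi_id, (lat, lon) in pois.items():
        d = haversine_m(photo_lat, photo_lon, lat, lon)
        if d < best_d:  # strict "less than 200 m", as in the text
            best_id, best_d = poi_id, d
    return best_id
```

Photos that match no POI are simply dropped from the user's visit sequence in this sketch.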
A time-based user interest preference is derived from the user's historical travel footprint. When a user visits a POI, he/she stays there for a certain period of time. The access time (i.e., stay time) of every POI visited by every user is computed from all users' historical travel footprints according to Definition 4, from which the average time any user spends at any POI can be computed. In this paper, \( \overline{V}(p) \) denotes the average access time of \( POI_p \) over all users and is calculated as
\[ \overline{V}(p) = \frac{1}{n} \sum_{u \in U} \sum_{p_x \in S_u} \left(t_{p_x}^d - t_{p_x}^a\right) \sigma(p_x = p), \]
where \( U \) is the set of all users, \( n \) is the number of users in \( U \) who visited \( p \), and
\[ \sigma(p_x = p) = \begin{cases} 1, & p_x = p \\ 0, & \text{otherwise.} \end{cases} \]
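Assuming each user's footprint is stored as hypothetical `(poi, t_arrive, t_depart)` tuples, \( \overline{V}(p) \) can be sketched as the total stay time at \( p \) divided by the number of distinct users who visited it. The function name is illustrative.

```python
from collections import defaultdict


def average_visit_duration(footprints):
    """Vbar(p): total stay at p over all users / number of users who visited p.

    footprints: dict user_id -> list of (poi, t_arrive, t_depart) tuples.
    """
    total = defaultdict(float)
    visitors = defaultdict(set)
    for user, visits in footprints.items():
        for poi, t_a, t_d in visits:
            total[poi] += t_d - t_a      # (t^d - t^a) * sigma(p_x = p)
            visitors[poi].add(user)      # n counts distinct users per POI
    return {poi: total[poi] / len(visitors[poi]) for poi in total}
```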
However, the average access time at each POI does not truly reflect a particular user's degree of interest in that type of POI. Therefore, this paper proposes a time-based user interest preference. The preference of user \( u \) for POI category \( c \) is calculated as
\[ Int(u, c) = \sum_{p_x \in S_u} \frac{t_{p_x}^d - t_{p_x}^a}{\overline{V}(p_x)} \, \sigma(Cat_{p_x} = c), \]
where \( Cat_p \) is the category attribute of \( POI_p \) and
\[ \sigma(Cat_{p_x} = c) = \begin{cases} 1, & Cat_{p_x} = c \\ 0, & \text{otherwise.} \end{cases} \]
This equation measures user \( u \)'s interest in category \( c \): for each POI of category \( c \) that the user visited, the time he/she spent there is compared with the average access time of all users at the same POI. In other words, a user tends to spend more time at the types of POI he or she is interested in, which in turn reveals the user's level of interest in such POIs.
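The time-based interest can be sketched in the same style. The function below is a hypothetical helper that takes one user's visits, a POI-to-category map, and precomputed average access times; POIs whose average access time is zero are skipped to avoid division by zero, which is an assumption not stated in the text.

```python
from collections import defaultdict


def time_based_interest(visits, category, avg_visit):
    """Int(u, c) for every category c visited by one user u.

    visits: list of (poi, t_arrive, t_depart) tuples from S_u.
    category: dict poi -> Cat_p.
    avg_visit: dict poi -> average access time Vbar(p) over all users.
    """
    interest = defaultdict(float)
    for poi, t_a, t_d in visits:
        if avg_visit[poi] > 0:  # skip POIs with zero average stay (assumption)
            # Stay time relative to the average visitor at the same POI.
            interest[category[poi]] += (t_d - t_a) / avg_visit[poi]
    return dict(interest)
```

A contribution above 1 from a single visit means the user stayed longer than the average visitor at that POI, which raises the interest score of its category.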
3.4 Proposed algorithm
The orienteering problem (OP) is described as follows. In a directed weighted graph \( G = (V, E) \), \( V \) is the set of all nodes and \( E \) is the set of all edges. Each node has a score (which can be regarded as a gain), and each edge has a weight (the walking time between its two nodes). The start and end points are specified. The task is to select a subset of nodes from \( G \) and plan a path from the start point through the selected nodes to the end point that maximizes the total score of the path without exceeding a given time budget.
OP has been widely used in travel route recommendation. Building on the orienteering problem, this paper proposes a route recommendation algorithm that considers POI distance and user interest. Given the set \( P \), the time budget \( B \), the starting point \( POI_{p_1} \), and the end point \( POI_{p_n} \), the proposed algorithm recommends the highest-scoring route \( R = \{p_1, p_2, \dots, p_n\} \) that satisfies the time budget \( B \). Time consumption is measured by the function \( Cost(p_x, p_y) = T^{Travel}(p_x, p_y) + \overline{V}(p_y) \), that is, the travel time from \( p_x \) to \( p_y \) plus the average access time at \( p_y \). The travel route recommendation model in this paper can therefore be formulated as an integer programming problem with multiple constraints:
where \( x_{i,j} = 1 \) if the route goes from \( p_i \) directly to \( p_j \) through edge \( (p_i, p_j) \), and \( x_{i,j} = 0 \) otherwise. The objective is subject to the following constraints:
Equation (5) is the objective function, which maximizes the combined POI distance and user interest preference score of the recommended route, and Eqs. (6) to (10) are its constraints. Equation (6) ensures that the route starts at \( p_1 \) and ends at \( p_n \). Equation (7) ensures that the route is connected and that each POI on it is visited at most once. Equation (8) guarantees that the total time of the trip is within the budget \( B \). Letting \( u_x \) denote the position of \( POI_{p_x} \) in route \( R \), Eqs. (9) and (10) ensure that the integer program admits no subtours.
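An integer program like this is normally handed to an ILP solver. For very small POI sets, the same objective and constraints can instead be checked by exhaustive depth-first search, sketched below as a substitute technique (not the paper's solver). The sketch assumes hypothetical inputs: a POI list, a `score` dictionary, a `travel_time(p, q)` function in hours, and an `avg_visit` dictionary holding \( \overline{V}(p) \). Because each POI is added at most once, the route cannot contain subtours, and the search time grows exponentially with the number of POIs.

```python
def best_route(pois, start, end, budget, score, travel_time, avg_visit):
    """Exhaustive orienteering search for small POI sets (start != end assumed).

    Returns (total_score, route) for the route start -> ... -> end that
    maximises the summed POI scores while the summed Cost stays within budget,
    with Cost(p_x, p_y) = travel_time(p_x, p_y) + avg_visit[p_y].
    Returns (-inf, None) if even start -> end exceeds the budget.
    """
    def cost(p_x, p_y):
        return travel_time(p_x, p_y) + avg_visit[p_y]

    best = (float("-inf"), None)

    def dfs(route, used, spent, gained):
        nonlocal best
        last = route[-1]
        # Option 1: close the route by moving to the fixed end point.
        if spent + cost(last, end) <= budget and gained + score[end] > best[0]:
            best = (gained + score[end], route + [end])
        # Option 2: extend the route with an unvisited intermediate POI.
        for p in pois:
            if p in used or p == end:
                continue
            new_spent = spent + cost(last, p)
            if new_spent <= budget:  # costs are non-negative, so prune here
                used.add(p)
                dfs(route + [p], used, new_spent, gained + score[p])
                used.remove(p)

    dfs([start], {start}, 0.0, score[start])
    return best
```

Since the start and end points are fixed, including their scores in the total does not change which route is optimal.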
4 Experimental results and analysis
4.1 Experimental data and data preprocessing
This paper uses the WAMDM (web and mobile data management) Flickr dataset, which contains two parts. The first is 319,110 photos of New York City, each with a photo ID (identity), photo location (latitude and longitude), photo time, photo tags, and user ID. The second is the contact list of 12,991 Flickr users. As in most recommender system research, the algorithm is evaluated with leave-one-out cross-validation [21]: in each round, one sample of the data set is held out as the test set and the rest serve as the training set, and the evaluation metrics are computed from the predictions of every round. For the experiments, the data are preprocessed as follows: (1) users who took fewer than 5 photos per POI are deleted, and (2) users who visited only one or two POIs are deleted. After preprocessing, the final experimental data set consists of 55,451 photos taken by users and 165,830 geotagged photos.
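Preprocessing rule (2) and the leave-one-out protocol can be sketched as follows. Footprints are assumed to be hypothetical lists of POI ids per user, the function names are illustrative, and rule (1) is omitted here.

```python
def filter_users(footprints, min_pois=3):
    """Keep only users who visited at least `min_pois` distinct POIs.

    footprints: dict user_id -> list of POI ids in visiting order.
    With min_pois=3 this drops users who visited only one or two POIs.
    """
    return {u: seq for u, seq in footprints.items() if len(set(seq)) >= min_pois}


def leave_one_out(samples):
    """Yield (train, test) pairs in which every sample is the test item once."""
    for i, test in enumerate(samples):
        yield samples[:i] + samples[i + 1:], test
```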
4.2 Evaluation metrics
Recommendation accuracy is the most important criterion for evaluating the algorithm. This paper uses precision and recall to measure the merits of the algorithm:
\[ Precision = \frac{|P_r \cap P_v|}{|P_r|}, \qquad Recall = \frac{|P_r \cap P_v|}{|P_v|}. \]
Precision represents the probability that a POI in the recommended route is of interest to the user, and recall represents the probability that a POI the user likes is recommended. The higher the precision and recall, the better the recommendation. \( P_r \) denotes the set of POIs in the recommended route, and \( P_v \) denotes the set of POIs visited by the user in the real travel sequence.
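Both metrics reduce to set operations on \( P_r \) and \( P_v \). A minimal sketch (the function name is hypothetical, and an empty set is assumed to score 0):

```python
def precision_recall(recommended, visited):
    """Precision = |P_r & P_v| / |P_r|, recall = |P_r & P_v| / |P_v|."""
    p_r, p_v = set(recommended), set(visited)
    hits = len(p_r & p_v)                       # correctly recommended POIs
    precision = hits / len(p_r) if p_r else 0.0
    recall = hits / len(p_v) if p_v else 0.0
    return precision, recall
```

For example, a recommended route S, A, B, E against a real sequence S, A, C, D, E shares three POIs, giving precision 3/4 and recall 3/5.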
4.3 Experimental results
To verify the effectiveness of the algorithm, this paper compares it with traditional travel route recommendation algorithms, which use either POI distance or user interest preference alone as the criterion. Under different time budgets \( B \), these traditional algorithms are compared with the proposed travel route recommendation algorithm, which combines POI distance and user interest preference. The experimental results are shown in Figs. 3 and 4.
In terms of precision, Fig. 3 shows that the proposed algorithm outperforms the traditional algorithms that consider only user interest or only POI distance. In terms of recall, Fig. 4 shows that the proposed algorithm likewise achieves a higher recall than these algorithms. One contributing factor is that both the proposed algorithm and the interest-only algorithm take user interest into account, and users prefer to visit places they are interested in. The higher precision and recall show that the proposed algorithm recommends routes that more closely reflect users' real travel sequences.
5 Conclusion
Based on the orienteering problem, this paper establishes a travel recommendation model and proposes a personalized travel route recommendation algorithm. The algorithm jointly considers POI distance and user interest preference to recommend the most suitable route to users, and the travel route recommendation framework is implemented on a Flickr photo set with geographic tags. The experimental results show that the proposed algorithm achieves higher precision and recall than traditional algorithms that consider only POI distance or only user interest preference. Future work will study intelligent optimization algorithms for solving the orienteering problem, to improve the algorithm's efficiency and reduce its cost.
Availability of data and materials
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.
Abbreviations
GPS: Global Positioning System
DBSCAN: Density-based spatial clustering of applications with noise
POI: Point of interest
WAMDM: Web and mobile data management
References
[1] M.F. Alhamid, M. Rawashdeh, H. Dong, et al., Exploring latent preferences for context-aware personalized recommendation systems. IEEE Transactions on Human-Machine Systems 46(4), 615–623 (2016)
[2] A. Klašnja-Milićević, M. Ivanović, B. Vesin, et al., Enhancing e-learning systems with personalized recommendation based on collaborative tagging techniques. Appl Intell 48(6), 1519–1535 (2018)
[3] D. Ayata, Y. Yaslan, M.E. Kamasak, Emotion based music recommendation system using wearable physiological sensors. IEEE Trans Consum Electron 64(2), 196–203 (2018)
[4] R. Colomo-Palacios, F.J. García-Peñalvo, V. Stantchev, et al., Towards a social and context-aware mobile recommendation system for tourism. Pervasive and Mobile Computing 38, 505–515 (2017)
[5] C. Benfares, Y.E.B. El Idrissi, A. Amine, Smart city: recommendation of personalized services in patrimony tourism, in 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt) (IEEE, 2016), pp. 835–840
[6] J. Chen, H. Zhang, X. He, et al., Attentive collaborative filtering: multimedia recommendation with item- and component-level attention, in International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, 2017), pp. 335–344
[7] L. Cui, L. Dong, X. Fu, et al., A video recommendation algorithm based on the combination of video content and social network. Concurrency and Computation: Practice and Experience 29(14), e3900 (2017)
[8] Y. HUANG, Y. Jinxin, S.U.N. Wei, Research of hybrid recommendation algorithm based on improved bipartite network and expert trust. Value Engineering 36(19), 160–164 (2017)
[9] Q. Liu, E. Chen, H. Xiong, et al., A cocktail approach for travel package recommendation. IEEE Transactions on Knowledge & Data Engineering 26(2), 278–293 (2013)
[10] C. Tan, Q. Liu, E. Chen, et al., Object-oriented travel package recommendation. ACM Trans Intell Syst Technol 5(3), 1–26 (2014)
[11] C.M. Lee, J.J. Thomas, Travel route recommendation based on geotagged photo metadata, in International Visual Informatics Conference (Springer, Cham, 2017), pp. 297–308
[12] L.Q. Nie, X.M. Song, T.S. Chua, Learning from multiple social networks. Synthesis Lectures on Information Concepts, Retrieval, and Services (Morgan & Claypool Publishers, 2016)
[13] P.F. Yin, M. Ye, W.C. Lee, Z.H. Li, Mining GPS data for trajectory recommendation, in Proc. of the 18th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD) (Springer-Verlag, 2014), pp. 50–61
[14] A. Eldawy, M.F. Mokbel, S. Al-Harthi, A. Alzaidy, K. Tarek, S. Ghani, SHAHED: a MapReduce-based system for querying and visualizing spatio-temporal satellite data, in Proc. of the ICDE (2015), pp. 1585–1596
[15] Y. Shen, L.G. Zhao, J. Fan, Analysis and visualization for hot spot based route recommendation using short-dated taxi GPS traces. Information 6(2), 134–151 (2015)
[16] Y.B. He, H.Y. Tan, W.M. Luo, S.Z. Feng, J.P. Fan, MR-DBSCAN: a scalable MapReduce-based DBSCAN algorithm for heavily skewed data. Frontiers of Computer Science 8(1), 83–99 (2014)
[17] Y.X. Li, J. Bailey, L. Kulik, Efficient mining of platoon patterns in trajectory databases. Data Knowl Eng 100, 167–187 (2015)
[18] Q. Fan, D.X. Zhang, H.Y. Wu, K.L. Tan, A general and parallel platform for mining co-movement patterns over large-scale trajectories. PVLDB 10(4), 313–324 (2016)
[19] T. Hasuike, H. Katagiri, H. Tsubaki, H. Tsuda, A route recommendation system for sightseeing with network optimization and conditional probability, in Proc. of the SMC (2015), pp. 2672–2677
[20] D. Gavalas, C. Konstantopoulos, K. Mastakas, et al., Review: mobile recommender systems in tourism. J Netw Comput Appl 39(1), 319–333 (2014)
[21] X. Cao, G. Cong, C.S. Jensen, Mining significant semantic locations from GPS data. VLDB Endowment (2010)
Acknowledgements
TangSeng Huang helped perform the analysis with constructive discussions.
Funding
Research on the Construction of Health Informationization of People in the Old Revolutionary Base Area, Dazhou City (SLQ2020SB015).
Author information
Contributions
Xi Cheng, as the primary contributor, completed the analysis, experiments, and paper writing. The author read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cheng, X. A travel route recommendation algorithm based on interest theme and distance matching. EURASIP J. Adv. Signal Process. 2021, 57 (2021). https://doi.org/10.1186/s13634-021-00759-x
DOI: https://doi.org/10.1186/s13634-021-00759-x