Subject Code: BUS5PB

Table of Contents

Task 1

Task 2

References

Task 1

a. Word embedding is a way of representing words as numerical vectors so that machines can process human language, with words that are used in similar ways receiving similar vectors. It is an invaluable part of Natural Language Processing (NLP) in data science.

Survey Analysis

Analysing survey responses requires time, staff or tools that most businesses do not have, which results in lost market understanding and lower returns on investment (Brunet et al., 2019). Word embeddings can help in such cases. Software is trained on the survey datasets using vector representations of words, which helps the machine understand the responses and their context. This, in turn, helps machine learning systems identify insights and suitable actions for the business.
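To illustrate how vector representations let software compare survey answers that use different words, the following Python sketch averages the word vectors in each response and compares responses with cosine similarity. The three-dimensional vectors and the word list are hypothetical values chosen for illustration; a real system would use vectors trained on a large corpus, and averaging is only one simple way to build a response vector.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only;
# a real system would load vectors trained on a large corpus).
WORD_VECTORS = {
    "slow":     [0.9, 0.1, 0.0],
    "delayed":  [0.8, 0.2, 0.1],
    "delivery": [0.1, 0.9, 0.0],
    "shipping": [0.2, 0.8, 0.1],
    "friendly": [0.0, 0.1, 0.9],
    "staff":    [0.1, 0.2, 0.8],
}

def response_vector(text):
    """Average the vectors of the known words in a survey response."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

a = response_vector("slow delivery")
b = response_vector("delayed shipping")
c = response_vector("friendly staff")
print(round(cosine(a, b), 3), round(cosine(a, c), 3))
```

With these toy values, "slow delivery" and "delayed shipping" come out as near neighbours while "friendly staff" does not, even though the first two responses share no words.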

Recommendation Systems

Earlier recommendation systems were based on content and aimed at future use. Now, with the popularity of streaming media services, platforms use word embedding and other forms of machine learning to recommend content for immediate enjoyment (Packer et al., 2018). These systems use each user's streaming habits as their data. The final recommendation is based on popularity, the user's general interests, and which content is frequently consumed together in the same situation.
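The idea of recommending content that is frequently used together can be sketched with a simple co-occurrence count over viewing sessions. This is a minimal Python illustration under assumed data: the session lists and titles are hypothetical, and embedding-based systems typically learn dense vectors from this kind of co-occurrence signal rather than ranking raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing sessions: titles one user watched in the same sitting.
SESSIONS = [
    ["drama_a", "drama_b", "comedy_a"],
    ["drama_a", "drama_b"],
    ["comedy_a", "comedy_b"],
    ["drama_b", "drama_c"],
]

def co_occurrence(sessions):
    """Count how often each pair of titles appears in the same session."""
    counts = Counter()
    for session in sessions:
        for x, y in combinations(sorted(set(session)), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def recommend(title, sessions, k=2):
    """Rank other titles by how often they were watched alongside `title`."""
    counts = co_occurrence(sessions)
    scores = {other: n for (t, other), n in counts.items() if t == title}
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

print(recommend("drama_a", SESSIONS))
```

Here `drama_b` is ranked first for a viewer of `drama_a` because the two titles appear together in two sessions.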

b. Neural network models such as word embeddings are subject to the biases in their training datasets. The most common result is implicit social discrimination in the system, especially with regard to race and gender, and women and people of colour often bear the brunt of it.

Racism

The datasets used to train word embedding models for simple tasks, such as detecting grammar errors or rude language, make these models more likely to flag text written by people of colour or people from low-income backgrounds (Papakyriakopoulos et al., 2020). This hampers self-expression and the smooth flow of communication for these groups. Similarly, speech transcription has higher error rates for people with non-standard English accents, particularly people of colour (Swinger et al., 2019). This bias is most likely caused by biased training datasets, which reflect the implicit discrimination present among many developers and in the available data, since socially powerful groups get more visibility than others (Brunet et al., 2019). As a result, accented speech is often left out of datasets and is given lower priority even when it is included. Overall, words and sentence structures commonly used in communities of colour or lower-income communities are flagged as abusive even when they clearly are not.

Sexism

The word embedding models used for predictive text show a stronger preference for certain gendered words in particular contexts (Papakyriakopoulos et al., 2020). For example, when predictive text is used to answer the question "Is the manager here?", the response "Yes, he is." receives a higher preference rating than "Yes, she is." This, too, is largely due to bias in the training data (Brunet et al., 2019). Certain professions and social situations have gendered associations, and the training data reflects them (Swinger et al., 2019). However, a system that simply assumes the gender associated with a profession creates a negative user experience (Packer et al., 2018).
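The kind of gendered association described above can be measured by comparing a word's vector with the vectors for "he" and "she". The Python sketch below uses hypothetical two-dimensional vectors in which the first axis stands in for a gender direction picked up from biased text. It also shows a simple mitigation that removes a word's component along that direction, similar in spirit to the neutralising step used in published debiasing methods; it is an illustration, not a complete debiasing procedure.

```python
import math

# Hypothetical 2-d vectors for illustration. Axis 0 stands in for a "gender"
# direction absorbed from biased text; axis 1 stands in for other meaning.
VECTORS = {
    "he":      [1.0, 0.0],
    "she":     [-1.0, 0.0],
    "manager": [0.4, 0.9],
    "nurse":   [-0.5, 0.8],
    "table":   [0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def gender_lean(word):
    """Positive: closer to 'he'; negative: closer to 'she'; zero: neutral."""
    w = VECTORS[word]
    return cosine(w, VECTORS["he"]) - cosine(w, VECTORS["she"])

def neutralize(word):
    """Subtract the word's projection onto the unit he-she direction."""
    w = VECTORS[word]
    g = [x - y for x, y in zip(VECTORS["he"], VECTORS["she"])]
    norm = math.hypot(*g)
    g = [x / norm for x in g]
    proj = sum(x * y for x, y in zip(w, g))
    return [x - proj * y for x, y in zip(w, g)]

for w in ["manager", "nurse", "table"]:
    print(w, round(gender_lean(w), 3))
print(neutralize("manager"))
```

With these toy values, "manager" leans towards "he", "nurse" towards "she", and "table" is neutral; after neutralisation, the vector for "manager" has no component along the he-she axis.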

Machine learning and word embedding aim to provide personalised user experiences, and generalisations in the training data that discriminate against some users undermine that aim (Packer et al., 2018).

c. The two bias aspects of word embedding discussed above are racism and sexism, and both should be eliminated from these systems. Basic computational linguistics can reveal interesting information about particular words in a given collection of documents. For example, one can count the number of times a given word is used in each document, or the number of documents that contain the word in a given year. In a collection of newspaper articles, one can look at where a word appears in order to understand what it means when it is used; however, this requires not only counting the uses of the word but also somehow measuring its meaning (Asgari et al., 2016). Word vectors attempt to do exactly this. This section gives an intuitive, minimally mathematical discussion of the general theory and technicalities of word vectorisation. Once it is clear how such models work, attention can turn to the particular problems they present for historical questions, and then to their use with small datasets, which suggests potential solutions. Two improvements can be made to modify the system, and they are:

Distributional Semantics 

Word vectorisation is an implementation of a much older semantic theory known as the distributional hypothesis. In essence, the distributional hypothesis argues that the meaning of a word can be derived by looking, across many texts, at the words that occur around it. It holds that, even without other context or grammatical order, a systematic collection of word collocations can be used to make semantic sense of a vocabulary. In other words, linguists have argued that the words appearing next to one another contain a surprising amount of information about the meaning of a given word, and that a difference in meaning corresponds to a difference in the distribution of those collocations (Dhouib et al., 2019). Words that appear in similar contexts have similar meanings. The strength of the distributional hypothesis lies in its generality. It can be applied to any language, or even to individual corpora within a given language, and it requires no prior input of dictionary definitions or syntactic structure. It infers (and here one might even say 'we infer', since there is evidence that this is how people learn language) the meanings of words by considering the collocations of all the terms that occur in a given corpus of texts. The distributional hypothesis is the core assumption that drives word vectorisation models (Kim et al., 2017).
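The distributional hypothesis can be made concrete by representing each word by counts of the words that appear near it. In this minimal Python sketch, the four-sentence corpus and the window of two words on each side are assumptions chosen for illustration: "cat" and "dog" share contexts such as "the", "drinks" and "chases", so their context vectors are similar, while "stocks" shares none with "cat".

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word][tokens[j]] += 1
    return contexts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical toy corpus.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell sharply today",
]
ctx = context_vectors(CORPUS)
print(cosine(ctx["cat"], ctx["dog"]), cosine(ctx["cat"], ctx["stocks"]))
```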

Turning Words Into Vectors

How can this distributional model of human language learning be approximated in order to teach it to a computer? One can start with simple word co-occurrence, a basic way of expressing the idea that words occurring in similar contexts have similar meanings. To examine the co-occurrence of words in a given text, we could take every possible pair of words in the text and then compute the conditional probability of word a occurring within, say, five words of word b. Because different words have different independent probabilities of occurring (some words are simply more common than others), we need to compute what is known as the pointwise mutual information (PMI) of the pair: essentially, how much more often a given pair of words occurs together than we would expect at random, given the independent probability of seeing each word (Asgari et al., 2016). PMI values can be positive or negative, and they are zero when words a and b are independent, that is, when the presence of one word has no effect on the probability of seeing the other. The PMI of a given pair of words is then approximated using linear algebra, with the model assuming that PMI can be approximated as the dot product of two vectors of equal length (Kim et al., 2017):

PMI(a, b) = log [ P(a, b) / ( P(a) · P(b) ) ] ≈ w_a · c_b

where w_a and c_b are the vectors learned for words a and b.
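The PMI calculation described above can be sketched in Python as follows. Probabilities are estimated directly from counts in a short, hypothetical token list; only pairs that actually co-occur within the window are scored (PMI is undefined for unseen pairs); and the final step of approximating PMI values with dot products of learned vectors is omitted.

```python
import math
from collections import Counter

def pmi_table(tokens, window=2):
    """PMI(a, b) = log(P(a, b) / (P(a) * P(b))) from windowed co-occurrence counts."""
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, a in enumerate(tokens):
        # Count each unordered pair once for every co-occurrence within the window.
        for j in range(i + 1, min(len(tokens), i + window + 1)):
            pair_counts[tuple(sorted((a, tokens[j])))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    table = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table

tokens = "the new york the cat the dog the new york".split()
table = pmi_table(tokens, window=1)
for pair in [("new", "york"), ("new", "the"), ("the", "york")]:
    print(pair, round(table[pair], 3))
```

Because "the" is frequent, its co-occurrences are discounted: "new"/"york" scores higher than "new"/"the", which in turn scores higher than "the"/"york".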

Task 2

a. The bank should adopt more secure systems to protect its customers. There should be proper governance to eliminate indiscriminate and unwarranted data gathering. The bank should also help Fred and Tamara. The software application, which draws on data from many brokers, should be improved so that it does not harm customers. The couple's bill-payment history was available to it, as was other information about the lifestyle of any customer. The bank's officers did not recognise the need to modify the system with regard to customer safety and privacy, nor did they consider the risks of using such software for critical decisions. The bank knows why Fred and Tamara's application failed but took no action to help them: the application was rejected by the bank's software algorithm, and bank personnel know the cause. The bank should prevent unnecessary third-party access to any banking procedure, as helping customers is the core ethical duty that every organisation should uphold.

b. A married couple, Fred and Tamara, wanted to set up their own business in their locality. Unfortunately, they were unable to get a loan from the bank because its newly installed software rejected their application without disclosing any reason. This denial harmed their life interests.

The couple wanted to be their own bosses; that is, they wanted to be entrepreneurs like many others. Tamara and Fred wanted to be independent, and they had a promising business idea but not enough funding for it. They discussed the idea with others and were encouraged, so they decided to apply for a loan from their local bank. Because of the software, however, a good business idea could not be implemented. Fred studied at a culinary school, and the couple love to cook, which is why they chose to open a restaurant. Through it, they wanted to develop local culinary skills. However, the bank's software overstepped ethical bounds and, based on certain personal data, refused to give them the loan.

Tamara is an accountant and Fred is a graduate of a prestigious culinary school. They are a young couple with dreams to fulfil. Fred loves to cook, and they had a good start-up plan to improve their financial situation. They also had a good record, with all their bills paid. But the new software looked far beyond what a loan officer would consider.

The software also checked every detail about the clients, such as what kinds of books they like to read or what kinds of movies they like to watch, in order to infer whether they are resilient enough to bear a loss or have suicidal tendencies. It also analysed their medical histories to check whether they were addicted to drugs or suffered from any serious disease, on the reasoning that they would then be unable to repay the loan. Some of this data was gathered without their knowledge, so the bank interfered in their personal lives and personal information, which is unethical.

Because the loan was denied, this young couple in their thirties could not fulfil their dream of becoming successful entrepreneurs. The couple trusted the bank and gave it their information for the loan, but the bank gathered additional, highly personal data beyond what they had provided. Above all, they wanted to become financially secure by setting up the business. The software is not fully reliable: it cannot think like a human being and works only with data, and it does not give a specific reason for rejecting the loan, which harmed Fred and Tamara's lives.
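One remedy for the missing explanation is to have every automated decision return the reasons behind it. The Python sketch below is a hypothetical rule-based check, not a description of the bank's software: the field names, thresholds and reason texts are assumptions for illustration, and a real credit model would need validated criteria.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    approved: bool
    reasons: list = field(default_factory=list)

# Hypothetical rules: (field name, test that must pass, reason shown if it fails).
RULES = [
    ("missed_payments_12m", lambda v: v <= 2,
     "More than 2 missed bill payments in the last 12 months"),
    ("debt_to_income", lambda v: v <= 0.4, "Debt-to-income ratio above 40%"),
    ("has_business_plan", lambda v: bool(v), "No business plan supplied"),
]

def assess(application):
    """Apply every rule and record a plain-language reason for each failure."""
    reasons = [msg for name, ok, msg in RULES if not ok(application[name])]
    return Decision(approved=not reasons, reasons=reasons)

print(assess({"missed_payments_12m": 0, "debt_to_income": 0.55, "has_business_plan": True}))
```

Because each rejection carries its reasons, an applicant can see what to change and a loan officer can check whether the criteria themselves are fair.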

c. Large-scale use of data science in loan evaluation can do significant harm to society. Not only Fred and Tamara but many customers face such problems in the loan approval process. Because so much personal data is held in this software, it becomes easier for hackers to reach their targets. Hacking can cause serious damage to anyone, whether financial or sometimes even physical, and not just a few people but the whole of society can be affected. Data gathered from a specific region can reveal minor to major information about the people of that region, which can benefit criminals, who may harm the community by stealing money or threatening lives. Hackers also often sell such data on the dark web, which raises the risks further. The system should therefore be modified to protect customer data so that major risks can be avoided, and banks in particular should take more care when modifying the data science systems they use.
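One way to reduce the damage from a breach is to keep direct identifiers out of the datasets used for analysis. This Python sketch, using only the standard library, replaces a customer identifier with a keyed hash (HMAC-SHA256) and drops fields such as name and address. The field names are hypothetical, and pseudonymisation lowers, but does not remove, the risk of re-identification.

```python
import hashlib
import hmac
import secrets

# Assumption: in practice this key would be held in a key-management
# service, separate from the data it protects.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymise(customer_id, key=SECRET_KEY):
    """Replace a customer identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_analysis(record, key=SECRET_KEY):
    """Drop direct identifiers (hypothetical field names) and keep a pseudonymous join key."""
    direct_identifiers = {"name", "address", "phone", "customer_id"}
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["pseudo_id"] = pseudonymise(record["customer_id"], key)
    return cleaned

print(prepare_for_analysis({"customer_id": "C-1001", "name": "Tamara",
                            "address": "12 Example St", "loan_amount": 50000}))
```

Using a keyed hash rather than a plain hash means that someone who steals the dataset cannot simply hash guessed customer IDs to re-identify records without also obtaining the key.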

d. Fred and Tamara, a young couple in their thirties, wanted to open a restaurant. Fred is a graduate of a prestigious culinary school and Tamara is an accountant. Everyone appreciated their business idea, but their loan application was rejected by newly installed loan-authorisation software. As a result, not only did the couple suffer, but the decision also harms society and others with good start-up ideas, who cannot fulfil their dreams without financial support. The software clearly has serious problems. Good loan management software should meet the following criteria:

Customizable- The software should be customisable. Different clients have different needs and criteria for their loans. For example, a client may have a good business idea but also some setback in their record; in that case, customisable software can be adjusted to find the best way to meet their needs.

Flexible- The software must be flexible enough to meet customers' needs, and cloud-based software makes this easier. Loan management software must also be user friendly, so that users can operate it from wherever they are; cloud-based software is the most convenient option in this respect.

Reliable & Security- The software vendor should be reliable: when the bank or organisation faces a problem with the software, the vendor should step in and provide adequate support.

The most important thing is security. A customer applying for a loan trusts the bank or organisation, so if it cannot keep their data safe, this is a serious setback for them. Likewise, it is a serious problem if the bank intrudes into their personal life or collects data without informing them. Cloud-based software carries a high risk of being hacked, and any bank that wants to implement loan management software must be aware of these basic issues. The bank should also offer different loan plans for different customers; otherwise it will not be able to retain them. The management system should be flexible; for example, in the present context, where many people work from home, they can operate the same system from another device if it is cloud based. Security is the basic thing customers expect from a bank or organisation: the loan management software must have strong security so that hackers cannot access the data, and the bank must not seek any personal data that has no connection with loan approval.
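The principle that a bank should not seek personal data unrelated to loan approval can be enforced in code with an allowlist of permitted fields. In this minimal Python sketch the allowlist and field names are hypothetical; deciding which fields are genuinely relevant remains a policy decision for the bank and its regulators.

```python
# Hypothetical allowlist: the only fields the loan model is permitted to receive.
LOAN_RELEVANT_FIELDS = frozenset({
    "income", "existing_debt", "missed_payments_12m", "loan_amount", "business_plan_score",
})

def minimise(applicant_data):
    """Split submitted data into fields the loan model may use and fields it must not see."""
    allowed = {k: v for k, v in applicant_data.items() if k in LOAN_RELEVANT_FIELDS}
    rejected = sorted(k for k in applicant_data if k not in LOAN_RELEVANT_FIELDS)
    return allowed, rejected

allowed, rejected = minimise({
    "income": 60000,
    "loan_amount": 50000,
    "favourite_books": ["..."],
    "medical_history": "...",
})
print(allowed, rejected)
```

Returning the rejected field names, rather than silently discarding them, gives auditors a record of what data the system was offered but refused to use.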

References

Asgari, E. and Mofrad, M.R.K., 2016. Comparing fifty natural languages and twelve genetic languages using word embedding language divergence (WELD) as a quantitative measure of language distance. arXiv preprint arXiv:1604.08561.

Brunet, M., Alkalay-Houlihan, C., Anderson, A. and Zemel, R., 2019. Understanding The Origins Of Bias In Word Embeddings. [ebook] Available at: <http://proceedings.mlr.press/v97/brunet19a/brunet19a.pdf> [Accessed 3 June 2020].

Caliskan, A., Bryson, J.J. and Narayanan, A., 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), pp.183-186.

Dhouib, M.T., Zucker, C.F. and Tettamanzi, A.G., 2019. An Ontology Alignment Approach Combining Word Embedding and the Radius Measure. In: International Conference on Semantic Systems, September 2019. Springer, Cham, pp.191-197.

Kim, S., Fiorini, N., Wilbur, W.J. and Lu, Z., 2017. Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. Journal of Biomedical Informatics, 75, pp.122-127.

Packer, B., Halpern, Y., Guajardo-Céspedes, M. and Mitchell, M., 2018. Text Embedding Models Contain Bias. Here's Why That Matters. [online] Google Developers Blog. Available at: <https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html> [Accessed 3 June 2020].

Papakyriakopoulos, O., Serrano, J., Hegelich, S. and Marco, F., 2020. Bias In Word Embeddings. [ebook] Barcelona. Available at: <https://dl.acm.org/doi/pdf/10.1145/3351095.3372843> [Accessed 3 June 2020].

Swinger, N., De-Arteaga, M., Heffernan, N., Leiserson, M. and Kalai, A., 2019. What Are The Biases In My Word Embedding? [ebook] Microsoft. Available at: <https://www.microsoft.com/en-us/research/uploads/prod/2019/01/What-are-the-biases-in-my-word-embedding_paper.pdf> [Accessed 3 June 2020].

 
